nowave.it

No-frills vector search on the JVM

Fri 27 December 2024

In this article, we informally analyze the performance of a naive vector search implementation in core Java, and benchmark it against a rust counterpart, as well as the HNSW implemented in Meta's faiss.

Binary Space Partitioning

A couple of months ago I came across this interesting article https://fennel.ai/blog/vector-search-in-200-lines-of-rust/, that implements vector search with a space partitioning algorithm similar to the one used in the popular annoy library.

Both annoy and fann boast good performance, even compared to more sophisticated families of algorithms. However, both of them are implemented using compiled, non-garbage-collected languages (C++ and Rust respectively).

I was curious to see how badly an implementation in a higher-level language would perform, so I implemented fann (with some tweaks) in core Java.

The source code is available at https://github.com/gmodena/searchy, jars are published to jitpack.io. Indexes can be built and queried with a fluent interface:

import io.github.gmodena.searchy.Index;

Index index = new Index.Builder()
        .withNumTrees(3)
        .withMaxNodeSize(15)
        .add(new float[]{1f, 2f, 3f})
        .add(new float[]{4f, 5f, 6f})
        .add(new float[]{7f, 8f, 9f})
        .build();

var result = index.query(new float[]{6.5f, 8f, 9f}, 2);

result.stream().map(c -> Arrays.toString(c.vector().raw() ) + " " + c.id() + " " + c.distance())
        .forEach(System.out::println);

Benchmarks

I ran fann benchmarks using their suite and stock parameter configuration, on a 12th Gen Intel(R) Core(TM) i5-1240P processor with 32GB of DDR4 RAM. This nix flake is a helper for setting up the rust and python runtimes.

As a point of reference, I compared faiss HNSW (configured with ef_search=16, ef_construction=40, and max_node_size=15) against a lightweight version of the Rust and Java indexes (num_trees=3, max_node_size=15). Below are some results taken from an arbitary run of fann, faiss and searchy libraries compared to an exhaustive search baseline. All indexes are built on 999,994 300-dimensional, wiki-news FastText embeddings.

Indexing time and query latencies

The table below shows timing information for the three implementations, measured in seconds. Query represents the average latency for a topK (K=20) search across 1,000 randomly sampled vectors. Query latency performance of searchy falls between HNSW and fann.

searchy (Java) HNSW (C++) fann (Rust)
Indexing 17.261 28.4794 15.451
Query 0.00015273 0.00679184 0.00007916

Indexing and average query latency times (seconds)

Search results

Below are topK (K=10) matches for the terms river and war, including their respective Euclidean distances from various implementations. Results are stochastic, and the topK will change across multiple runs. Variations between implementations reflect differences in algorithm behavior and, to some degree, floating-point implementation across different runtimes

Closest Matches for river

Exhaustive Search searchy HNSW fann
Word Distance Word Distance Word Distance Word Distance
river 0.0 river 0.0 river 0.0 river 0.0
River 1.391 river. 1.731 River 1.391 rivers 1.476
rivers 1.476 rivulet 1.974 swift-flowing 1.624 creek 1.637
river- 1.583 village 1.977 flood-swollen 1.638 waterway 1.783
swift-flowing 1.624 oxbow 2.040 river.The 1.682 park 2.053
creek 1.637 coulee 2.063 unfordable 1.692 swamp 2.092
flood-swollen 1.638 creeks 2.081 River- 1.695 side-wheeler 2.102
river.The 1.682 prarie 2.099 River.The 1.695 enterprise 2.207
river-bed 1.685 lands 2.105 Beult 1.733 marshes 2.261
unfordable 1.692 meadow 2.127 Ichamati 1.734 towboat 2.293

Top 10 results for the river query


Closest Matches for war

Exhaustive Search searchy HNSW fann
Word Distance Word Distance Word Distance Word Distance
war 0.0 war 0.0 war 0.0 war 0.0
war-- 1.384 war.It 1.470 war-- 1.384 War.The 1.736
war--a 1.449 war- 1.597 war--a 1.449 admistration 1.867
wars 1.459 un-winnable 1.640 war.It 1.470 -war 1.913
war.It 1.470 US-lead 1.679 unwinable 1.513 campiagn 1.914
unwinable 1.513 terrrorism 1.744 war.And 1.518 subject 1.927
war.And 1.518 crisises 1.770 Iraw 1.549 propaganda 1.959
hostilites 1.548 pre-Iraq 1.789 war.But 1.553 baby-killers 2.024
war.And 1.518 capaign 1.816 war.The 1.567 reports 2.032
war.And 1.553 Al-Quada 1.856 Veitnam 1.588 marshes 2.092

Top 10 results for the war query


Recall and error distribution

Some quick and dirty measurements showed recall metrics roughly matching what was reported in the fann article. Recall and query latency are proportional to the number of trees in fann and searchy indexes.