Fri 27 December 2024
In this article, we informally analyze the performance of a naive vector search implementation in core Java, and benchmark it against a rust counterpart, as well as the HNSW implemented in Meta's faiss.
A couple of months ago I came across this interesting article https://fennel.ai/blog/vector-search-in-200-lines-of-rust/, that implements vector search with a space partitioning algorithm similar to the one used in the popular annoy library.
Both annoy
and fann
boast good performance, even compared to more sophisticated families of algorithms. However, both of them are implemented using compiled, non-garbage-collected languages (C++ and Rust respectively).
I was curious to see how badly an implementation in a higher-level language would perform, so I implemented fann
(with some tweaks) in core Java.
The source code is available at https://github.com/gmodena/searchy, jars are published to jitpack.io. Indexes can be built and queried with a fluent interface:
import io.github.gmodena.searchy.Index;
Index index = new Index.Builder()
.withNumTrees(3)
.withMaxNodeSize(15)
.add(new float[]{1f, 2f, 3f})
.add(new float[]{4f, 5f, 6f})
.add(new float[]{7f, 8f, 9f})
.build();
var result = index.query(new float[]{6.5f, 8f, 9f}, 2);
result.stream().map(c -> Arrays.toString(c.vector().raw() ) + " " + c.id() + " " + c.distance())
.forEach(System.out::println);
I ran fann
benchmarks using their suite and stock parameter configuration, on a 12th Gen Intel(R) Core(TM) i5-1240P processor with 32GB of DDR4 RAM.
This
nix flake is a helper for setting up the rust and python runtimes.
As a point of reference, I compared faiss HNSW (configured with ef_search=16
, ef_construction=40
, and max_node_size=15
)
against a lightweight version of the Rust and Java indexes (num_trees=3
, max_node_size=15
).
Below are some results taken from an arbitary run of fann
, faiss
and searchy
libraries compared
to an exhaustive search baseline. All indexes are built on 999,994 300-dimensional, wiki-news FastText embeddings.
The table below shows timing information for the three implementations, measured in seconds.
Query represents the average latency for a topK (K=20) search across 1,000 randomly sampled vectors. Query
latency performance of searchy
falls between HNSW
and fann
.
searchy (Java) | HNSW (C++) | fann (Rust) | |
---|---|---|---|
Indexing | 17.261 | 28.4794 | 15.451 |
Query | 0.00015273 | 0.00679184 | 0.00007916 |
Indexing and average query latency times (seconds)
Below are topK (K=10) matches for the terms river
and war
, including their respective Euclidean distances from various implementations.
Results are stochastic, and the topK will change across multiple runs. Variations between implementations reflect differences in algorithm behavior and, to some degree, floating-point
implementation across different runtimes
Exhaustive Search | searchy | HNSW | fann | ||||
---|---|---|---|---|---|---|---|
Word | Distance | Word | Distance | Word | Distance | Word | Distance |
river | 0.0 | river | 0.0 | river | 0.0 | river | 0.0 |
River | 1.391 | river. | 1.731 | River | 1.391 | rivers | 1.476 |
rivers | 1.476 | rivulet | 1.974 | swift-flowing | 1.624 | creek | 1.637 |
river- | 1.583 | village | 1.977 | flood-swollen | 1.638 | waterway | 1.783 |
swift-flowing | 1.624 | oxbow | 2.040 | river.The | 1.682 | park | 2.053 |
creek | 1.637 | coulee | 2.063 | unfordable | 1.692 | swamp | 2.092 |
flood-swollen | 1.638 | creeks | 2.081 | River- | 1.695 | side-wheeler | 2.102 |
river.The | 1.682 | prarie | 2.099 | River.The | 1.695 | enterprise | 2.207 |
river-bed | 1.685 | lands | 2.105 | Beult | 1.733 | marshes | 2.261 |
unfordable | 1.692 | meadow | 2.127 | Ichamati | 1.734 | towboat | 2.293 |
Top 10 results for the river
query
Exhaustive Search | searchy | HNSW | fann | ||||
---|---|---|---|---|---|---|---|
Word | Distance | Word | Distance | Word | Distance | Word | Distance |
war | 0.0 | war | 0.0 | war | 0.0 | war | 0.0 |
war-- | 1.384 | war.It | 1.470 | war-- | 1.384 | War.The | 1.736 |
war--a | 1.449 | war- | 1.597 | war--a | 1.449 | admistration | 1.867 |
wars | 1.459 | un-winnable | 1.640 | war.It | 1.470 | -war | 1.913 |
war.It | 1.470 | US-lead | 1.679 | unwinable | 1.513 | campiagn | 1.914 |
unwinable | 1.513 | terrrorism | 1.744 | war.And | 1.518 | subject | 1.927 |
war.And | 1.518 | crisises | 1.770 | Iraw | 1.549 | propaganda | 1.959 |
hostilites | 1.548 | pre-Iraq | 1.789 | war.But | 1.553 | baby-killers | 2.024 |
war.And | 1.518 | capaign | 1.816 | war.The | 1.567 | reports | 2.032 |
war.And | 1.553 | Al-Quada | 1.856 | Veitnam | 1.588 | marshes | 2.092 |
Top 10 results for the war
query
Some quick and dirty measurements showed recall metrics roughly matching what was reported in the fann article. Recall and query latency are proportional to the number of trees in fann and searchy indexes.