No-frills vector search on the JVM

Fri 27 December 2024

In this article, we informally analyze the performance of a naive vector search implementation in core Java, and benchmark it against a rust counterpart, as well as the HNSW implemented in Meta's faiss.

Binary Space Partitioning

A couple of months ago I came across this interesting article https://fennel.ai/blog/vector-search-in-200-lines-of-rust/, that implements vector search with a space partitioning algorithm similar to the one used in the popular annoy library.

Both annoy and fann boast good performance, even compared to more sophisticated families of algorithms. However, both of them are implemented using compiled, non-garbage-collected languages (C++ and Rust respectively).

I was curious to see how badly an implementation in a higher-level language would perform, so I implemented fann (with some tweaks) in core Java.

The source code is available at https://github.com/gmodena/searchy, jars are published to jitpack.io. Indexes can be built and queried with a fluent interface:

import io.github.gmodena.searchy.Index;

Index index = new Index.Builder()
        .withNumTrees(3)
        .withMaxNodeSize(15)
        .add(new float[]{1f, 2f, 3f})
        .add(new float[]{4f, 5f, 6f})
        .add(new float[]{7f, 8f, 9f})
        .build();

var result = index.query(new float[]{6.5f, 8f, 9f}, 2);

result.stream().map(c -> Arrays.toString(c.vector().raw() ) + " " + c.id() + " " + c.distance())
        .forEach(System.out::println);

Benchmarks

I ran fann benchmarks using their suite and stock parameter configuration, on a 12th Gen Intel(R) Core(TM) i5-1240P processor with 32GB of DDR4 RAM. This nix flake is a helper for setting up the rust and python runtimes.

As a point of reference, I compared faiss HNSW (configured with ef_search=16, ef_construction=40, and max_node_size=15) against a lightweight version of the Rust and Java indexes (num_trees=3, max_node_size=15). Below are some results taken from an arbitary run of fann, faiss and searchy libraries compared to an exhaustive search baseline. All indexes are built on 999,994 300-dimensional, wiki-news FastText embeddings.

Indexing time and query latencies

The table below shows timing information for the three implementations, measured in seconds. Query represents the average latency for a topK (K=20) search across 1,000 randomly sampled vectors. Query latency performance of searchy falls between HNSW and fann.

	searchy (Java)	HNSW (C++)	fann (Rust)
Indexing	17.261	28.4794	15.451
Query	0.00015273	0.00679184	0.00007916

Indexing and average query latency times (seconds)

Search results

Below are topK (K=10) matches for the terms river and war, including their respective Euclidean distances from various implementations. Results are stochastic, and the topK will change across multiple runs. Variations between implementations reflect differences in algorithm behavior and, to some degree, floating-point implementation across different runtimes

Closest Matches for river

Exhaustive Search		searchy		HNSW		fann
Word	Distance	Word	Distance	Word	Distance	Word	Distance
river	0.0	river	0.0	river	0.0	river	0.0
River	1.391	river.	1.731	River	1.391	rivers	1.476
rivers	1.476	rivulet	1.974	swift-flowing	1.624	creek	1.637
river-	1.583	village	1.977	flood-swollen	1.638	waterway	1.783
swift-flowing	1.624	oxbow	2.040	river.The	1.682	park	2.053
creek	1.637	coulee	2.063	unfordable	1.692	swamp	2.092
flood-swollen	1.638	creeks	2.081	River-	1.695	side-wheeler	2.102
river.The	1.682	prarie	2.099	River.The	1.695	enterprise	2.207
river-bed	1.685	lands	2.105	Beult	1.733	marshes	2.261
unfordable	1.692	meadow	2.127	Ichamati	1.734	towboat	2.293

Top 10 results for the river query

Closest Matches for war

Exhaustive Search		searchy		HNSW		fann
Word	Distance	Word	Distance	Word	Distance	Word	Distance
war	0.0	war	0.0	war	0.0	war	0.0
war--	1.384	war.It	1.470	war--	1.384	War.The	1.736
war--a	1.449	war-	1.597	war--a	1.449	admistration	1.867
wars	1.459	un-winnable	1.640	war.It	1.470	-war	1.913
war.It	1.470	US-lead	1.679	unwinable	1.513	campiagn	1.914
unwinable	1.513	terrrorism	1.744	war.And	1.518	subject	1.927
war.And	1.518	crisises	1.770	Iraw	1.549	propaganda	1.959
hostilites	1.548	pre-Iraq	1.789	war.But	1.553	baby-killers	2.024
war.And	1.518	capaign	1.816	war.The	1.567	reports	2.032
war.And	1.553	Al-Quada	1.856	Veitnam	1.588	marshes	2.092

Top 10 results for the war query

Recall and error distribution

Some quick and dirty measurements showed recall metrics roughly matching what was reported in the fann article. Recall and query latency are proportional to the number of trees in fann and searchy indexes.