Implementation notes

Here are a few notes on implementation details for which we sometimes get questions. We describe the tradeoffs and maybe a few unexpected design choices or results.

Matrix multiplication to do many L2 distance computations

A typical operation in IndexFlatL2 is to exhaustively compare a set of nq query vectors and a set of nb database vectors in dimension d (then select the top-k smallest vectors).

Faiss has two implementations of this operation:

direct implementation that loops over nq, nb and the dimension of the vectors.
an implementation that uses the decomposition d(x, y) = ||x||^2 + ||y||^2 - 2 * <x, y>. This is faster because the most expensive operation in O(nq * nb * d) can be handed over to BLAS that normally does this efficiently.

We use implementation 1 when nq < 20 and d is a multiple of 4, and implementation 2 otherwise. The threshold 20 can be adjusted via global variable faiss::distance_compute_bias_threshold (accessible in Python via faiss.cvar.distance_compute_bias_threshold).

Note that solution 2 may be less stable numerically than 1 for vectors of very different magnitudes, see discussion in issue #297.