Vector Search Metrics

The Metric defines the distance measure between vectors during the vector search. From a mathematical point of view, the K nearest neighbors are those with the smallest distance from the search query vector. By contrast, in vector search higher score reflects greater similarity. Hyperspace bridges this gap using algebraic relations between the distance and score, as described below.

Metric
Distance Measure
Hyperspace Score

l2

d(X,Y)=i(xiyi)2d(X, Y) = \sum_{i} (x_i - y_i)^2

11+d\frac{1}{1 + d}

ip

d(X,Y)=ixiyid(X,Y)=-∑_i x _i ⋅y_ i

{11+dif d01dd<0\begin{cases} \frac{1}{1 + d} & \text{if } d \geq 0 \\ 1 - d & \text{d<0} \end{cases}

hamming

d(X,Y)=i=1nδ(xi,yi)d(X,Y)= \sum_{i=1}^{n} \delta(x_i, y_i)

11+d\frac{1}{1 + d}

Euclidean Metric (L2)

The L2L2 metric represents the distance between 2 points in a Euclidean space. The metric is derived from the Pythagorean theorem and represents the length of the shortest path between the two points.

The L2L2 metric is a special case of the LpLp metric, given by

Lp(X,Y)=(ixiyip)1pL^p(X, Y) = \left( \sum_{i} |x_i - y_i|^p \right)^{\frac{1}{p}}

In Hyperspace, we use the squared L2L2 metric.

Inner Product (ip)

The inner product metric, denoted as XYX \cdot Y , is a measure of similarity between two vectors. The mathematical operation calculates the projection of one vector on the other, which is 1 if the vectors are parallel and 0 if they are perpendicular. In a Euclidean distance, the inner product is given by

XY=ixiyiX \cdot Y =∑_i x _i ⋅y_ i

Hamming Distance (hamming)

The Hamming distance quantifies the difference between two strings of equal length by measuring the number of positions at which the corresponding symbols differ. For binary strings, it counts the bits that are different between two binary vectors. The lower the Hamming distance, the more similar the strings are considered to be, and vice versa.

The hamming distance is calculated as

HammingDistance(X,Y)=i=1nδ(xi,yi)\text{HammingDistance}(X, Y) = \sum_{i=1}^{n} \delta(x_i, y_i)

where δ(xi,yi)={1if xi=yi0otherwise \delta(x_i, y_i) =\begin{cases} 1 & \text{if } x_i=y_i\\ 0 & \text{otherwise} \end{cases} is the Kronecker delta. The above formula counts the number of characters that are different between XX and YY .

Last updated

#108: Max's Nov 6 changes

Change request updated