Vector Similarity Metrics

The vector search metric defines the distance measure between vectors during vector search. Mathematically, the K nearest neighbors are the vectors with the smallest distance from the query vector.

In vector search, a higher score reflects greater similarity, that is, a smaller distance. An algebraic conversion from distance to score is therefore required. This conversion is given in the Hyperspace Score column of the table below.

| Metric | Distance Measure | Hyperspace Score |
| --- | --- | --- |
| l2 | $d(X, Y) = \sum_{i} (x_i - y_i)^2$ | $\frac{1}{1 + d}$ |
| ip | $d(X, Y) = -\sum_{i} x_i \cdot y_i$ | $\begin{cases} \frac{1}{1 + d} & \text{if } d \geq 0 \\ 1 - d & \text{if } d < 0 \end{cases}$ |
| hamming | $d(X, Y) = \sum_{i=1}^{n} \delta(x_i, y_i)$ | $\frac{1}{1 + d}$ |
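
As a minimal illustration of the table above, the following NumPy sketch computes each distance measure and its Hyperspace score. The function names and the sample vectors are hypothetical and are not part of the Hyperspace API.

```python
import numpy as np

def l2_distance(x, y):
    # Squared Euclidean distance (the squared L2 form Hyperspace uses for l2)
    return float(np.sum((x - y) ** 2))

def ip_distance(x, y):
    # Negative inner product; assumes x and y are already normalized
    return float(-np.dot(x, y))

def hamming_distance(x, y):
    # Number of positions at which two binary vectors differ
    return int(np.sum(np.asarray(x) != np.asarray(y)))

def hyperspace_score(d):
    # Distance-to-score conversion from the table above
    return 1.0 / (1.0 + d) if d >= 0 else 1.0 - d

x = np.array([1.0, 0.0, 0.0])
y = np.array([0.6, 0.8, 0.0])
print(hyperspace_score(l2_distance(x, y)))  # ~0.56
print(hyperspace_score(ip_distance(x, y)))  # 1.6 (ip distance is -0.6)
```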

Euclidean Metric (l2)

The $L^2$ metric quantifies similarity as the distance between two points in a Euclidean space. The metric is derived from the Pythagorean theorem and represents the length of the shortest path between the two points:

$$L^2(X, Y) = \left( \sum_{i} |x_i - y_i|^2 \right)^{\frac{1}{2}}$$

Note that the $L^2$ metric is a special case of the $L^p$ metric:

$$L^p(X, Y) = \left( \sum_{i} |x_i - y_i|^p \right)^{\frac{1}{p}}$$

Hyperspace uses the squared $L^2$ metric for calculation efficiency. Since squaring is monotonic for non-negative distances, this does not affect the order of the candidates.
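
As a quick illustrative check (with made-up vectors), ranking candidates by squared $L^2$ gives the same order as ranking by true $L^2$:

```python
import numpy as np

query = np.array([0.0, 0.0])
candidates = np.array([[1.0, 1.0], [3.0, 4.0], [0.5, 0.0]])

l2 = np.linalg.norm(candidates - query, axis=1)          # true L2 distances
l2_squared = np.sum((candidates - query) ** 2, axis=1)   # squared L2 distances

# Squaring is monotonic for non-negative distances, so the ranking is identical
assert np.array_equal(np.argsort(l2), np.argsort(l2_squared))
```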

Inner Product (ip)

The inner product metric quantifies the similarity between two vectors as -1 times the projection of one vector onto the other. For normalized vectors this is -1 when the vectors are parallel and 0 when they are perpendicular; the minus sign ensures that minimal distance corresponds to maximum similarity. In a Euclidean vector space, the inner product distance is:

$$IP(X, Y) = -\sum_{i} x_i \cdot y_i$$

Vectors must be normalized before using the IP metric.
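
A minimal sketch (plain NumPy, not a Hyperspace client call) of normalizing vectors to unit length before computing the ip distance:

```python
import numpy as np

def normalize(v):
    # Scale a vector to unit length; required before using the ip metric
    norm = np.linalg.norm(v)
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return v / norm

x = normalize(np.array([3.0, 4.0]))
y = normalize(np.array([4.0, 3.0]))
d = -float(np.dot(x, y))  # ip distance: -1 for parallel unit vectors, 0 for perpendicular
print(d)                  # -0.96
```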

Hamming Distance (hamming)

The Hamming distance metric quantifies the similarity between two vectors by measuring the number of positions at which the corresponding symbols differ. The metric operates over binary strings and counts the bits that differ between the two binary vectors. The lower the Hamming distance, the more similar the vectors are considered to be, and vice versa.

The Hamming distance is given by

$$\text{Hamming}(X, Y) = \sum_{i=1}^{n} \delta(x_i, y_i)$$

where

$$\delta(x_i, y_i) = \begin{cases} 1 & \text{if } x_i \neq y_i \\ 0 & \text{otherwise} \end{cases}$$

is the mismatch indicator, i.e. one minus the Kronecker delta. The above formula counts the number of characters that differ between $X$ and $Y$.
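
A short, illustrative sketch of the Hamming distance and its score conversion over binary vectors (plain Python, not part of the Hyperspace API):

```python
def hamming(x, y):
    # Sum of the per-position mismatch indicator delta(x_i, y_i) from the formula above
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

a = [1, 0, 1, 1, 0, 1]
b = [1, 1, 1, 0, 0, 1]
d = hamming(a, b)
print(d, 1 / (1 + d))  # distance 2, Hyperspace score ~0.33
```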
