Hyperspace Hybrid Search

Keywords and metadata hold essential information about data. Keyword-based search (also called Classic Search) leverages this information by matching keywords and values to find related items. Similarity Search translates these keywords and values into queries to quickly identify objects with shared characteristics or patterns. It is a relatively fast method that minimizes the computation required to identify similar items, thus enabling fast retrieval. Like Vector Search, it is commonly used in a wide variety of applications.

Vector Search is a widely used technique for finding similar items using vector representations. It relies on vector embedding, which turns data into high-dimensional vectors that capture its essential traits. Instead of relying on keyword or exact matching, Vector Search identifies similar items by measuring the closeness between these vectors. This approach is frequently used in applications such as recommendation systems and content search.
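The idea of ranking items by vector closeness can be sketched in a few lines. The example below is illustrative only and does not use the Hyperspace API: the function names, the toy three-dimensional embeddings, and the choice of cosine similarity as the closeness measure are all assumptions made for the sake of the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def vector_search(query, embeddings, k):
    """Return the ids of the k embeddings most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in embeddings.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
embeddings = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
top = vector_search([1.0, 0.0, 0.0], embeddings, k=2)
print(top)
```

Here `doc_a` and `doc_b` point in nearly the same direction as the query, so they rank above `doc_c`, which is orthogonal to it.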

Hybrid Search combines vector and keyword/metadata searches, offering a versatile solution that caters to various types of applications and delivers the best of both worlds.

Hyperspace's engine allows hybrid search that merges full Classic Search and Vector Search. This combination leverages the strengths of each search type, offering a more accurate and comprehensive search experience.

Hyperspace Hybrid Index

Hyperspace enables enhanced search performance through smart indexing. Hyperspace's similarity search engine employs an inverted index that supports various data field types, ensuring O(1) data access. The search performance is further improved by providing cardinality hints at the configuration level.
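To illustrate why an inverted index gives constant-time access on average, here is a minimal sketch built on a Python dictionary. It is not Hyperspace's implementation: the `build_inverted_index` helper, the `(field, value)` key layout, and the sample documents are assumptions chosen for illustration.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each (field, value) pair to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, fields in documents.items():
        for field, value in fields.items():
            index[(field, value)].add(doc_id)
    return index

# Hypothetical documents with two metadata fields each.
documents = {
    1: {"color": "red", "category": "shoes"},
    2: {"color": "blue", "category": "shoes"},
    3: {"color": "red", "category": "hats"},
}
index = build_inverted_index(documents)

# Each lookup is a single hash-table access (O(1) on average);
# combining conditions is a set intersection.
red_shoes = index[("color", "red")] & index[("category", "shoes")]
print(red_shoes)
```

Looking up a value does not require scanning the documents, which is what keeps keyword and metadata filtering cheap.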

Hyperspace offers similar advantages in the realm of Vector Search. In particular, it is optimized for graph search, which Vector Search uses in many scenarios and where CPU-based search often performs poorly. To make full use of CPU cycles, data should be cached to some extent. However, graph traversal commonly has low predictability, which limits efficient caching.

Post- and Pre-Filtering

Hybrid K Nearest Neighbors (KNN)

For Hybrid Search that combines accurate KNN (brute force) with metadata filtering, Hyperspace uses the pre-filtering approach. In this approach, the documents are first filtered using Classic Search. Vector similarity is then calculated only for documents that pass this initial filtering. For KNN, this approach optimizes the query latency without reducing its recall.
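A minimal sketch of pre-filtering with brute-force KNN follows. It assumes Euclidean distance and a hypothetical document layout (`id`, `vector`, `meta`); neither is taken from the Hyperspace API.

```python
import math

def prefilter_knn(query, docs, predicate, k):
    """Filter on metadata first, then rank only the survivors by exact distance."""
    candidates = [d for d in docs if predicate(d["meta"])]
    candidates.sort(key=lambda d: math.dist(query, d["vector"]))
    return [d["id"] for d in candidates[:k]]

# Hypothetical documents: a 2-D vector plus one metadata field.
docs = [
    {"id": "a", "vector": [0.0, 0.0], "meta": {"lang": "en"}},
    {"id": "b", "vector": [0.1, 0.0], "meta": {"lang": "fr"}},
    {"id": "c", "vector": [0.2, 0.0], "meta": {"lang": "en"}},
    {"id": "d", "vector": [5.0, 5.0], "meta": {"lang": "en"}},
]
result = prefilter_knn([0.0, 0.0], docs, lambda m: m["lang"] == "en", k=2)
print(result)
```

Because distances are computed exactly over every document that passes the filter, the result is the true top k among matching documents, and the filter only reduces the amount of distance computation.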

Hybrid Approximate Nearest Neighbors (ANN)

While pre-filtering optimizes latency without reducing recall for brute-force KNN, applying it to Approximate Nearest Neighbor (ANN) search can lead to a sparse graph and a reduction in recall.

To avoid this outcome, Hyperspace uses the post-filtering approach for ANN, in which vector matching is performed before the metadata filtering. This approach optimizes query recall at the expense of latency.
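The order of operations in post-filtering can be sketched as follows. This is not Hyperspace's implementation: an exact nearest-neighbor scan stands in for the ANN graph search, and the `fetch` parameter (how many candidates to retrieve before filtering) is an assumption introduced for illustration.

```python
import math

def postfilter_search(query, docs, predicate, k, fetch):
    """Retrieve the `fetch` nearest vectors first, then apply the metadata filter."""
    # Stand-in for an ANN lookup: an exact scan over all vectors.
    nearest = sorted(docs, key=lambda d: math.dist(query, d["vector"]))[:fetch]
    matching = [d for d in nearest if predicate(d["meta"])]
    return [d["id"] for d in matching[:k]]

# Hypothetical documents: a 2-D vector plus one metadata field.
docs = [
    {"id": "a", "vector": [0.0, 0.0], "meta": {"lang": "en"}},
    {"id": "b", "vector": [0.1, 0.0], "meta": {"lang": "fr"}},
    {"id": "c", "vector": [0.2, 0.0], "meta": {"lang": "en"}},
    {"id": "d", "vector": [5.0, 5.0], "meta": {"lang": "en"}},
]

def is_english(meta):
    return meta["lang"] == "en"

enough = postfilter_search([0.0, 0.0], docs, is_english, k=2, fetch=3)
too_few = postfilter_search([0.0, 0.0], docs, is_english, k=2, fetch=2)
print(enough, too_few)
```

In this sketch the vector search runs over the full, unfiltered set, so the filter cannot thin out the structure being searched. The cost is that some retrieved candidates are discarded afterwards, so more candidates than `k` may need to be fetched, which is one way the extra latency arises.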
