Hyperspace Hybrid Search
Vector search is a widely used technique for finding similar items using vector representations. It relies on vector embedding, which turns data into high-dimensional vectors that capture its essential traits. Rather than relying on traditional keyword or exact matching, Vector Search identifies similar items by measuring the closeness, or similarity, between these vectors. This approach is used in a wide variety of applications, such as recommendation systems and content search.
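As a rough illustration of the idea, the following minimal sketch ranks items by cosine similarity to a query vector. The item names, the three-dimensional vectors, and the vector_search helper are all hypothetical and chosen for readability; real embeddings usually have hundreds of dimensions, and this is not Hyperspace's API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def vector_search(query, items, k=2):
    """Return the ids of the k items whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query, vec), item_id) for item_id, vec in items.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [item_id for _, item_id in scored[:k]]

# Hypothetical toy embeddings, for illustration only.
items = {
    "red_shirt": [0.9, 0.1, 0.0],
    "crimson_shirt": [0.8, 0.2, 0.1],
    "blue_jeans": [0.0, 0.2, 0.9],
}
top = vector_search([1.0, 0.0, 0.0], items, k=2)
print(top)
```

The two shirts rank above the jeans because their vectors point in nearly the same direction as the query, even though no keyword is compared.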
Keywords and metadata hold essential information about data. Keyword-based search (also called Classic Search) takes advantage of this by matching keywords and values to find related items. It translates these keywords and values into queries that quickly identify objects with shared characteristics or patterns. Because this kind of matching is relatively cheap, the approach keeps the computational cost of identifying related items low and enables fast retrieval. Like Vector Search, this method is commonly used in a wide variety of applications.
Hybrid Search combines vector and keyword/metadata searches, offering a versatile solution that suits many types of applications and delivers the best of both worlds.
Hyperspace's engine allows hybrid search that merges full classic search and vector search. This combination leverages the strengths of each search type, offering a more accurate and comprehensive search experience.
Hyperspace enables enhanced search performance through smart indexing. Hyperspace's similarity search engine employs an inverted index that supports various data field types, ensuring O(1) data access. The search performance is further improved by providing cardinality hints at the configuration level.
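To make the role of an inverted index concrete, here is a minimal sketch that assumes a plain in-memory dictionary keyed by (field, value) pairs. The build_inverted_index and classic_search helpers and the sample documents are hypothetical; the sketch does not model Hyperspace's actual index format or its cardinality hints. It only shows why a lookup takes constant time on average: each condition is a single hash-table access that returns the matching document ids.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each (field, value) pair to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, value in fields.items():
            index[(field, value)].add(doc_id)
    return index

def classic_search(index, **conditions):
    """Return the ids of documents matching ALL field=value conditions."""
    postings = [index.get((field, value), set()) for field, value in conditions.items()]
    if not postings:
        return set()
    # Each lookup above is an average-case O(1) hash access;
    # the intersection combines the per-condition results.
    return set.intersection(*postings)

# Hypothetical documents, for illustration only.
docs = {
    1: {"color": "red", "category": "shirt"},
    2: {"color": "blue", "category": "shirt"},
    3: {"color": "red", "category": "jeans"},
}
index = build_inverted_index(docs)
matches = classic_search(index, color="red", category="shirt")
print(matches)
```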
Hyperspace offers similar advantages in the realm of Vector Search. In many scenarios, Vector Search relies on graph-based indexes. To make full use of CPU cycles, the data must be cached to some extent. However, graph traversal commonly suffers from low predictability, which limits efficient caching. Hyperspace is optimized for graph search in many of the cases in which CPU-based search suffers from poor performance.
For Hybrid Search that combines exact KNN (brute force) with metadata filtering, Hyperspace uses the pre-filtering approach: documents are first filtered using classic search, and vector similarity is calculated only for the documents that pass this initial filtering. For exact KNN, this approach reduces query latency without reducing recall.
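The pre-filtering flow described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Hyperspace's implementation: the prefilter_knn helper, the document layout, and the use of Euclidean distance are all hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def prefilter_knn(query_vec, docs, predicate, k):
    """Filter on metadata first, then run exact (brute-force) KNN on the survivors."""
    candidates = [d for d in docs if predicate(d)]  # classic-search step
    # Distances are computed only for documents that passed the filter.
    candidates.sort(key=lambda d: euclidean(query_vec, d["vector"]))
    return [d["id"] for d in candidates[:k]]

# Hypothetical documents, for illustration only.
docs = [
    {"id": "a", "color": "red", "vector": [0.0, 0.0]},
    {"id": "b", "color": "blue", "vector": [0.1, 0.0]},
    {"id": "c", "color": "red", "vector": [1.0, 1.0]},
    {"id": "d", "color": "red", "vector": [0.2, 0.1]},
]
result = prefilter_knn([0.0, 0.0], docs, lambda d: d["color"] == "red", k=2)
print(result)
```

Because every document that passes the filter is scored exactly, the result is the true top k among the matching documents. Filtering only reduces the number of distance computations, which is why recall is unaffected.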
While pre-filtering optimizes latency without reducing recall for brute-force KNN, for approximate nearest neighbor (ANN) search it can lead to a sparse graph and reduced recall. To avoid this outcome, Hyperspace uses the post-filtering approach for ANN, in which vector matching is performed before the metadata filtering. This approach optimizes query recall at the expense of latency.
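A minimal sketch of post-filtering is shown below. It assumes that the vector index is asked for more candidates than the k results finally needed, so that enough of them survive the metadata filter. The ann_candidates function is only a stand-in for a real ANN index lookup (it performs an exact scan here), and the oversample parameter is a hypothetical knob rather than a documented Hyperspace setting.

```python
def ann_candidates(query_vec, docs, n):
    """Stand-in for an ANN index lookup (e.g. HNSW): returns the n nearest docs.

    A real ANN index would return approximate neighbors; an exact scan over
    squared Euclidean distance is used here to keep the sketch self-contained.
    """
    ranked = sorted(
        docs,
        key=lambda d: sum((x - y) ** 2 for x, y in zip(query_vec, d["vector"])),
    )
    return ranked[:n]

def postfilter_search(query_vec, docs, predicate, k, oversample=3):
    """Run the vector match first, then apply the metadata filter to the candidates."""
    candidates = ann_candidates(query_vec, docs, k * oversample)
    return [d["id"] for d in candidates if predicate(d)][:k]

# Hypothetical documents, for illustration only.
docs = [
    {"id": "a", "color": "red", "vector": [0.0, 0.0]},
    {"id": "b", "color": "blue", "vector": [0.1, 0.0]},
    {"id": "c", "color": "red", "vector": [1.0, 1.0]},
    {"id": "d", "color": "red", "vector": [0.2, 0.1]},
    {"id": "e", "color": "blue", "vector": [0.3, 0.3]},
]

def is_red(d):
    return d["color"] == "red"

enough = postfilter_search([0.0, 0.0], docs, is_red, k=2, oversample=2)
too_few = postfilter_search([0.0, 0.0], docs, is_red, k=2, oversample=1)
print(enough, too_few)
```

With no oversampling, the filter discards one of the two candidates and only one result is returned. Fetching extra candidates restores the full result set, at the cost of scoring more vectors, which reflects the latency trade-off noted above.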
Illustration of the effect of pre-filtering on approximate indexes (HNSW, IVF, etc.)