Overview

An introduction to the Hyperspace hybrid search database.

Hyperspace is a search database for AI-driven applications, combining high performance with maximum search relevancy. This cloud-native, hybrid search database is managed for you, eliminating infrastructure complexities. Hyperspace features an easy-to-use API and excels in delivering query results with minimal latency, even when dealing with billions of documents.

Fast queries

Using designated processing units in the cloud, Hyperspace provides query latencies 10 to 100 times lower than industry benchmarks, at a reduced cost. Its query syntax is native Python and supports advanced filtering and scoring in keyword, vector, and hybrid searches.

Hyperspace is built to power a wide variety of AI applications, including real-time recommendations, search, generation, fraud prevention, and threat detection.

Relevant search results

Hyperspace's hybrid search combines vector-based search and keyword search, providing a versatile and efficient approach to information retrieval. While vector search tends to excel at capturing semantic relationships, it can return unexpected results in many cases. A keyword search can pinpoint explicit matches and retrieve documents based on specific terms, improving relevancy when vector search falls short. By combining these two methods, hybrid search delivers comprehensive results with high relevancy.

Hyperspace stores vectors and metadata

Hyperspace collections hold documents whose fields have designated types. The list of supported types is available under data types. Most field types can also be stored as lists. Fields of the vector type are used in vector and hybrid search, while the other types (metadata) are used in lexical and hybrid search.

Hyperspace document prototype

document = {
        'some_counter'       : 8,                        # scalar integer
        'visit_times_1y'     : ['Saturday, August 21, 2010 11:22:31 AM',
                                'Friday, August 20, 2010 7:35:51 AM'],  # list of time - 2 elements
        'field0_embeded32'   : [0.45, 0.99, 0.543, 0.324],  # K elements vector
        'field2_int'         : None,                     # null integer
        'filed3_float'       : 7.45,                     # scalar float
        'filed4_list_float'  : [7.459],                  # list of float - 1 element
        'filed5_list_float'  : [7.459, 3.4],             # list of float - 2 elements
        'filed6_list_float'  : [],                       # list of float - 0 elements
        'filed7_str'         : None,                     # scalar string - null
        'filed8_str'         : 'jojo',                   # scalar string
        'filed10_list_str'   : [],                       # list of string - 0 elements
        'filed11_list_str'   : ['jojo'],                 # list of string - 1 element
        'filed12_list_str'   : ['jojo', 'koko'],         # list of string - 2 elements
        # IP addresses are passed as strings
        'filed21_list_ip'    : ['171.180.143.162', '211.34.144.18', '35.115.68.135'],  # list of IPs - 3 elements
        # many more fields
}
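
Because a document is an ordinary Python dictionary, values such as IP addresses and timestamps are written as strings. The following is a minimal sketch, using only the Python standard library, of checking such values before ingestion. The helper names `is_ip` and `is_timestamp` and the timestamp format string are assumptions made for illustration and are not part of the Hyperspace API.

```python
import ipaddress
from datetime import datetime

# Assumed format, chosen to match the sample timestamps in the prototype above.
TIMESTAMP_FORMAT = '%A, %B %d, %Y %I:%M:%S %p'

def is_ip(value):
    """Return True if value is a string holding a valid IPv4 or IPv6 address."""
    if not isinstance(value, str):
        return False
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False

def is_timestamp(value):
    """Return True if value is a string that parses with TIMESTAMP_FORMAT."""
    if not isinstance(value, str):
        return False
    try:
        datetime.strptime(value, TIMESTAMP_FORMAT)
        return True
    except ValueError:
        return False

# Example: check the list-valued fields of a document before uploading it.
ips_ok = all(is_ip(ip) for ip in ['171.180.143.162', '211.34.144.18'])
times_ok = all(is_timestamp(t) for t in ['Friday, August 20, 2010 7:35:51 AM'])
```

Whether Hyperspace accepts other timestamp formats depends on the schema; the check above only covers the format used in the sample document.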

Creating collections and ingesting documents

Create and manage collections, upload and modify data:

Creating a collection and adding two documents to it
import hyperspace

# hyperspace_client is assumed to be an already-initialized Hyperspace client
collection_name = 'collection_name'
hyperspace_client.create_collection('schema.json', collection_name)

documents = [
              {'document_id': '1',
               'metadata field 1': 'value 1',
               'metadata field 2': 'value 2',
               'dense_vector 1': [0.85, 0.2, 0.2, 0.1]
              },
              {'document_id': '2',
               'metadata field 1': 'value 4',
               'dense_vector 1': [0.2, 0.1, 0.2, 0.85],
               'dense_vector 2': [0.9, 0.3, 0.3, 0.1]
              },
            ]

# Number the documents from 1 so the batch ids match each 'document_id'
batch = [hyperspace.document(str(i), data_point) for i, data_point in enumerate(documents, start=1)]

hyperspace_client.add_batch(batch, collection_name)
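
For larger datasets it is common to upload documents in several smaller batches rather than in one call. The following is a minimal sketch of splitting a document list into fixed-size batches. The `chunked` helper and the batch size are hypothetical and not part of the Hyperspace API, and any batch size limits are not covered here.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of items, each with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError('batch_size must be at least 1')
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Five placeholder documents split into batches of two.
sample_documents = [{'document_id': str(i)} for i in range(1, 6)]
batches = list(chunked(sample_documents, 2))

# Each batch could then be wrapped and uploaded as in the example above, e.g.:
# for batch in batches:
#     hyperspace_client.add_batch(batch, collection_name)
```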

Creating hybrid queries

The score function specifies the keyword search behavior. It supports filtering and scoring (including TF/IDF). Below is an example of a simple score function that checks two metadata fields and adjusts the score according to which of them match. Score functions are covered in more detail in the following chapters.

Basic score function
def score_function(params, V):
    score = 0.0
    if match('metadata field 1'):
        score = 1.0
    if match('metadata field 2'):
        score = 2.0
    else:
        score = score + 1.0
    return score
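
To see which score each combination of matches produces, the branching of the basic score function above can be traced locally. This is only a sketch: the `match` stand-in below is hypothetical and simply tests whether the query and the candidate hold the same value for a field, which may differ from how Hyperspace evaluates `match`. The field names are taken from the sample documents.

```python
def evaluate(params, doc):
    """Replay the basic score function's branching with a local match() stand-in."""
    def match(field):
        # Hypothetical stand-in: equal values present on both sides count as a match.
        return field in params and field in doc and params[field] == doc[field]

    score = 0.0
    if match('metadata field 1'):
        score = 1.0
    if match('metadata field 2'):
        score = 2.0
    else:
        score = score + 1.0
    return score

query = {'metadata field 1': 'value 1', 'metadata field 2': 'value 2'}

both = evaluate(query, {'metadata field 1': 'value 1', 'metadata field 2': 'value 2'})
only_first = evaluate(query, {'metadata field 1': 'value 1', 'metadata field 2': 'other'})
only_second = evaluate(query, {'metadata field 1': 'other', 'metadata field 2': 'value 2'})
neither = evaluate(query, {'metadata field 1': 'other', 'metadata field 2': 'other'})
```

Under this stand-in, a candidate that matches either field (or both) scores 2.0 and a candidate that matches neither scores 1.0, because the second branch overwrites the score set by the first.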

Next, we specify the hybrid search behavior. Here, a simple keyword search is followed by a vector search on the resulting documents. The boost parameters specify how the two scores are merged; in this example, the keyword search is given twice the weight of the vector search. Document 2 is used as the query document.

Basic hybrid query
query_document = hyperspace_client.get_document(document_id='2',
                                            collection_name=collection_name)
hybrid_query = {
     'params': query_document,
     'knn': {
         'query': {'boost': 2},
         'vector': {'boost': 1}
      }
}
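
The exact merging formula is not given in this overview. One plausible reading of the boost values is a weighted sum of the two scores, sketched below. The `merge_scores` function, the weighted-sum formula, and the sample scores are all assumptions for illustration, not Hyperspace's documented behavior.

```python
def merge_scores(keyword_score, vector_score, query_boost=2.0, vector_boost=1.0):
    """Combine keyword and vector scores as a boost-weighted sum (assumed formula)."""
    return query_boost * keyword_score + vector_boost * vector_score

# Hypothetical candidates: (document id, keyword score, vector score).
candidates = [
    ('1', 1.0, 0.75),
    ('2', 2.0, 0.25),
]

# Rank candidates by merged score, highest first.
ranked = sorted(
    ((doc_id, merge_scores(kw, vec)) for doc_id, kw, vec in candidates),
    key=lambda pair: pair[1],
    reverse=True,
)
```

With the keyword score weighted twice as heavily, document 2 ranks first in this sketch even though its vector score is lower.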

Running the query

Lastly, we call the search API. In this call we specify the hybrid query, the number of documents to return, the score function name, and the collection name. The result of this call is stored in the 'results' dictionary, which contains the top document ids along with their scores.

Calling the search API
results = hyperspace_client.search(hybrid_query,
                                   size=15,
                                   function_name='score_function',
                                   collection_name=collection_name)
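
Once the call returns, the ids and scores can be read out of the dictionary. The sketch below assumes the hits are stored under a 'similarity' key as a list of entries with 'document_id' and 'score' fields. That layout, the `top_hits` helper, and the sample values are assumptions, so check them against the actual response.

```python
def top_hits(results, limit=5):
    """Return up to limit (document_id, score) pairs, highest score first."""
    hits = results.get('similarity', [])  # assumed key for the list of hits
    ranked = sorted(hits, key=lambda hit: hit['score'], reverse=True)
    return [(hit['document_id'], hit['score']) for hit in ranked[:limit]]

# Hypothetical response used only to exercise the helper.
sample_results = {
    'similarity': [
        {'document_id': '1', 'score': 2.75},
        {'document_id': '2', 'score': 4.25},
    ]
}

hits = top_hits(sample_results, limit=1)
```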

That's our first query!

In the following chapters, we'll discuss these features in greater detail.
