Overview

Hyperspace is a cloud search database that uses cloud hardware to improve search speed and relevance. It runs on a Search Processing Unit (SPU), a virtual chip with a domain-specific architecture optimized for search tasks. The SPU delivers very high performance for real-time applications at scale while staying cost-efficient and without restricting the complexity of your search logic.

Hyperspace is a managed SaaS solution that combines hardware-level speed with software-level flexibility. It is designed to support a wide range of AI applications, such as real-time recommendations, fraud prevention, ad tech, retrieval-augmented generation (RAG), and threat detection.

Fast queries at scale

Hyperspace returns query results with minimal latency, even across billions of documents. By running on dedicated processing units in the cloud, it delivers latencies 10 to 100 times lower than industry benchmarks while reducing costs. Its search functionality is also designed to operate at very large scale without sacrificing performance or stability.

Relevant search results

Hyperspace's hybrid search combines keyword search (term and value matching) with vector-based search, giving a versatile and efficient approach to information retrieval. Vector search tends to excel at capturing semantic relationships, but it can return unexpected results, for example when a query depends on an exact term. Keyword search pinpoints explicit matches and retrieves documents that contain specific terms, improving relevance where vector search falls short. With Hyperspace hybrid search, you can write complex scoring functions that combine the two methods to produce comprehensive, highly relevant results.
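The sketch below illustrates the general idea of hybrid scoring; it is not Hyperspace's internal algorithm. It combines a simple keyword score (the fraction of query terms found in a document) with the cosine similarity between vectors, using a weighted sum. The helper names, the toy data, and the equal 0.5 weights are illustrative assumptions.

```python
import math


def keyword_score(query_terms: set[str], doc_terms: set[str]) -> float:
    """Fraction of query terms that appear verbatim in the document."""
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_score(query_terms, query_vec, doc_terms, doc_vec,
                 keyword_weight=0.5, vector_weight=0.5) -> float:
    """Weighted sum of the keyword and vector scores (illustrative only)."""
    return (keyword_weight * keyword_score(query_terms, doc_terms)
            + vector_weight * cosine_similarity(query_vec, doc_vec))


# Toy data: doc_a contains the exact term, doc_b has the closer vector.
query_terms = {"sku-123"}
query_vec = [0.85, 0.2, 0.2, 0.1]
doc_a = ({"sku-123", "shoe"}, [0.2, 0.1, 0.2, 0.85])
doc_b = ({"shoe"}, [0.8, 0.25, 0.2, 0.1])
for name, (terms, vec) in {"a": doc_a, "b": doc_b}.items():
    print(name, round(hybrid_score(query_terms, query_vec, terms, vec), 3))
```

In this toy data, document a ranks higher because of its exact match on sku-123, even though document b's vector is closer to the query vector.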

Hyperspace documents store vectors and metadata

Hyperspace documents can contain fields of various types, including keywords, numerical values, lists, and vectors. The full list of supported types is available under data_types.
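As a rough illustration of these field types, the sketch below shows a document with one field of each kind and a hypothetical field_kind helper that guesses which kind a Python value represents. The field names and the rule that a list of floats is a vector are assumptions for this example; data_types remains the authoritative reference.

```python
def field_kind(value) -> str:
    """Guess the kind of a document field value (illustrative heuristic only)."""
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "keyword"
    if isinstance(value, list):
        # Assumption: a non-empty list made only of floats is a dense vector.
        if value and all(isinstance(v, float) for v in value):
            return "vector"
        return "list"
    return "unknown"


document = {
    "document_id": "1",
    "genre": "drama",                         # keyword
    "rating": 7.5,                            # numerical value
    "tags": ["award", "classic"],             # list
    "dense_vector 1": [0.85, 0.2, 0.2, 0.1],  # vector
}

for name, value in document.items():
    print(f"{name}: {field_kind(value)}")
```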

Creating collections and ingesting documents

Hyperspace stores data in collections, each with its own index. You can easily create and manage collections, and upload and modify data:

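# Assumes hyperspace_client is an already-connected Hyperspace client.
collection_name = 'my_collection'  # placeholder collection name
hyperspace_client.create_collection('schema.json', collection_name)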

documents = [
    {'document_id': '1',
     'field 1': 'value 1',
     'field 2': 'value 2',
     'dense_vector 1': [0.85, 0.2, 0.2, 0.1]},
    {'document_id': '2',
     'field 1': 'value 4',
     'dense_vector 1': [0.2, 0.1, 0.2, 0.85],
     'dense_vector 2': [0.9, 0.3, 0.3, 0.1]},
]

hyperspace_client.add_batch(documents, collection_name)
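For larger datasets, you may prefer to send documents in fixed-size batches rather than in a single call. The sketch below does that using only the add_batch(documents, collection_name) call shown above. The add_in_batches helper and the batch size of 500 are assumptions, not documented limits, and RecordingClient is a stand-in so the example runs without a Hyperspace connection.

```python
from typing import Any, Iterator


def chunked(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def add_in_batches(client: Any, documents: list[dict], collection_name: str,
                   batch_size: int = 500) -> int:
    """Upload documents via client.add_batch, batch_size per call.

    Returns the number of add_batch calls made.
    """
    calls = 0
    for batch in chunked(documents, batch_size):
        client.add_batch(batch, collection_name)
        calls += 1
    return calls


class RecordingClient:
    """Stand-in for hyperspace_client that records add_batch calls (testing only)."""

    def __init__(self):
        self.calls = []

    def add_batch(self, documents, collection_name):
        self.calls.append((len(documents), collection_name))


docs = [{"document_id": str(i), "field 1": f"value {i}"} for i in range(1200)]
stub = RecordingClient()
add_in_batches(stub, docs, "my_collection", batch_size=500)
print(stub.calls)  # batch sizes sent: 500, 500, 200
```

With a real client, you would pass hyperspace_client instead of the stub.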

Creating keyword-based queries

Hyperspace lets you run search queries written in the standard (Elasticsearch-style) query DSL:

search_query = {
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "genres": "genres_value"
              }
            },
            {
              "term": {
                "adult": "adult_value"
              }
            },
            {
              "bool": {
                "must_not": [
                  {
                    "term": {
                      "title": "title_value"
                    }
                  }
                ]
              }
            }
          ],
          "should": [
            {
              "range": {
                "rating": {
                  "gt": 7.0
                }
              }
            }
          ]
        }
      },
      "boost_mode": "multiply",
      "boost": 2.0
    }
  }
}
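Rather than editing the query literal by hand, you can generate it from parameters. The sketch below uses a hypothetical helper, build_search_query (not part of the Hyperspace client), that reproduces the structure of the query above. Under standard query DSL semantics, must clauses are required, must_not excludes matching documents, and should clauses raise the score of documents that satisfy them.

```python
def build_search_query(genre, adult, excluded_title, min_rating=7.0, boost=2.0):
    """Return a function_score DSL query with the same shape as search_query."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "must": [
                            # Required term matches.
                            {"term": {"genres": genre}},
                            {"term": {"adult": adult}},
                            # Exclude documents whose title matches.
                            {"bool": {"must_not": [{"term": {"title": excluded_title}}]}},
                        ],
                        # Optional clause: favor documents rated above min_rating.
                        "should": [{"range": {"rating": {"gt": min_rating}}}],
                    }
                },
                "boost_mode": "multiply",
                "boost": boost,
            }
        }
    }


search_query = build_search_query("genres_value", "adult_value", "title_value")
```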

Submitting a DSL query

We call the search_dsl API to submit the query. In this call, we pass the query, the number of documents to return (size), and the collection name. The call returns the top document IDs along with their scores, stored here in results.

results = hyperspace_client.search_dsl(search_query,
                                       size=10,
                                       collection_name=collection_name)

That's our first query!

In the following chapters, we'll discuss these features in greater detail.

Creating hybrid queries

A Hyperspace score function lets you define both keyword-search and vector-search behavior in Python syntax. Score functions support filtering as well as scoring (including TF-IDF). Below is an example of a simple hybrid score function that scores documents according to matches on two metadata fields and adds a weighted vector distance term.

Basic score function
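def score_function(params, V):
    # match() and distance() are built-in helpers available inside
    # Hyperspace score functions; they are not defined in this snippet.
    score = 0.0
    if match('metadata 1'):
        score = 1.0
    if match('metadata 2'):
        score = 2.0
    else:
        score = score + 1.0
    # Add the vector-field distance term with a weight of 2.
    return score + 2.0 * distance("vector_field")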
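To see how the branches combine, you can trace the function locally. The sketch below replaces match() and distance() with hypothetical stubs: they are not the Hyperspace implementations and compute no real distances. It then evaluates the same logic for each combination of matches. The trace shows, for instance, that a match on 'metadata 2' sets the base score to 2.0 whether or not 'metadata 1' matched, because that branch assigns rather than adds.

```python
# Hypothetical stand-ins for the Hyperspace built-ins: match() reports whether
# a field is in MATCHED_FIELDS and distance() returns a preset value.
MATCHED_FIELDS: set[str] = set()
DISTANCES: dict[str, float] = {}


def match(field: str) -> bool:
    return field in MATCHED_FIELDS


def distance(field: str) -> float:
    return DISTANCES.get(field, 0.0)


def score_function(params, V):
    score = 0.0
    if match('metadata 1'):
        score = 1.0
    if match('metadata 2'):
        score = 2.0
    else:
        score = score + 1.0
    return score + 2.0 * distance("vector_field")


def trace(matched: set[str], dist: float) -> float:
    """Run score_function with the given matched fields and vector distance."""
    MATCHED_FIELDS.clear()
    MATCHED_FIELDS.update(matched)
    DISTANCES["vector_field"] = dist
    return score_function(params={}, V=None)


for matched in [set(), {'metadata 1'}, {'metadata 2'}, {'metadata 1', 'metadata 2'}]:
    print(sorted(matched), trace(matched, dist=0.25))
```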

Here is a simple keyword search followed by a vector search on the resulting documents. The boost parameter specifies how the two scores are merged; in this example, the keyword search is given twice the weight of the vector search. Document 2 is used as the query document.

hyperspace_client.set_function(score_function, collection_name)

hybrid_query = {
    'params': {
        'metadata 1': {'Value 1', 'Value 2'},
        'metadata 2': {'Value 3', 'Value 4'},
        'vector_field': [...],  # query vector (elided in the original)
    },
    'knn': {
        'query': {'boost': 2},   # weight of the keyword-search score
        'vector': {'boost': 1},  # weight of the vector-search score
    },
}
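If you build hybrid queries programmatically, a small helper keeps the layout consistent. The sketch below is hypothetical: make_hybrid_query and assumed_merge are not Hyperspace client functions. The text above says only that the keyword search gets twice the weight of the vector search; assumed_merge shows one reading of that (a plain weighted sum), and Hyperspace's actual merge rule may differ. Taking the query vector from document 2 is also an illustrative assumption.

```python
from typing import Any


def make_hybrid_query(params: dict[str, Any], keyword_boost: float = 2,
                      vector_boost: float = 1) -> dict[str, Any]:
    """Build a query dict with the same layout as hybrid_query above."""
    return {
        "params": params,
        "knn": {
            "query": {"boost": keyword_boost},  # weight of the keyword part
            "vector": {"boost": vector_boost},  # weight of the vector part
        },
    }


def assumed_merge(keyword_score: float, vector_score: float,
                  query: dict[str, Any]) -> float:
    """ASSUMPTION: treat the boosts as weights in a plain weighted sum."""
    knn = query["knn"]
    return knn["query"]["boost"] * keyword_score + knn["vector"]["boost"] * vector_score


# Query vector taken from document 2's 'dense_vector 1', for illustration only.
query = make_hybrid_query(
    {"metadata 1": {"Value 1", "Value 2"},
     "metadata 2": {"Value 3", "Value 4"},
     "vector_field": [0.2, 0.1, 0.2, 0.85]},
    keyword_boost=2,
    vector_boost=1,
)
print(assumed_merge(keyword_score=0.5, vector_score=0.8, query=query))
```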

With the Python client, you can register a score function by passing either the function itself or the name of the file that contains it. The Java client accepts only a file name.

Submitting a Python query

Finally, we call the search API to submit the query. In this call, we pass the hybrid query, the number of documents to return, the name of the score function, and the collection name. The response contains the top document IDs along with their scores.

results = hyperspace_client.search(hybrid_query,
                                   size=15,
                                   function_name='score_function',
                                   collection_name=collection_name,
                                   async_req=True)
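The call above passes async_req=True. Python clients generated with OpenAPI Generator conventionally return an AsyncResult-style object in that mode, and you obtain the response by calling .get() on it. Assuming the Hyperspace client follows that convention (not confirmed here), the pattern looks like the sketch below. fake_search is a hypothetical stand-in for hyperspace_client.search, and its canned (document ID, score) response shape is made up.

```python
from multiprocessing.pool import ThreadPool

pool = ThreadPool(processes=1)


def fake_search(query, size, function_name, collection_name):
    """Hypothetical stand-in for hyperspace_client.search; returns canned data."""
    return [("2", 3.1)][:size]


def search_async(query, **kwargs):
    """Mimic async_req=True: start the call in a worker thread and return at once."""
    return pool.apply_async(fake_search, (query,), kwargs)


pending = search_async({"params": {}}, size=15, function_name="score_function",
                       collection_name="my_collection")
# ... other work can run here while the search is in flight ...
results = pending.get(timeout=10)  # blocks until the response is available
pool.close()
pool.join()
print(results)
```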

