Quick Start

This guide explains how to set up the Hyperspace database in minutes.

To start using Hyperspace, follow these steps:

1. Install the Hyperspace API Client

Run the following command in your terminal or development environment:


pip install hyperspace-py

For more information, see here.
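To confirm that the installation worked, you can check whether the module can be imported from the same Python environment. This sketch assumes the package exposes a top-level module named hyperspace, which is the name the client code in step 2 imports; is_module_installed is a hypothetical helper.

```python
import importlib.util


def is_module_installed(module_name: str) -> bool:
    """Return True if `module_name` can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: hyperspace-py installs a module named "hyperspace".
if is_module_installed("hyperspace"):
    print("Hyperspace client is installed.")
else:
    print("Hyperspace client not found; run: pip install hyperspace-py")
```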

2. Create a local instance of the Hyperspace client

Once you receive your credentials and host address, use the following code to connect to the database through the Hyperspace API:

import hyperspace

# host_address, username and password are the values you received.
hyperspace_client = hyperspace.HyperspaceClientApi(host=host_address,
                                                   username=username,
                                                   password=password)
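Rather than hard-coding credentials, you might read them from the environment before creating the client. In this sketch, the environment variable names HYPERSPACE_HOST, HYPERSPACE_USERNAME and HYPERSPACE_PASSWORD and the helper load_connection_settings are hypothetical; only the host, username and password keyword arguments come from the call shown above.

```python
import os
from typing import Mapping


def load_connection_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect host, username and password from environment variables.

    The variable names are hypothetical; use whatever names fit your setup.
    """
    names = {
        "host": "HYPERSPACE_HOST",
        "username": "HYPERSPACE_USERNAME",
        "password": "HYPERSPACE_PASSWORD",
    }
    # Report every missing variable at once instead of failing on the first.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in names.items()}
```

You could then create the client with hyperspace.HyperspaceClientApi(**load_connection_settings()).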

3. Run Hyperspace queries

Create a schema file

The schema file outlines the data structure, index and metric types, and similar configuration settings. For more information, see the configuration file section.
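If you prefer to keep the schema in a file, you can write a Python dictionary to schema.json with the standard json module. The schema below reuses the example from this guide; the helper write_schema is hypothetical, and the full set of supported keys is described in the configuration file section.

```python
import json
from pathlib import Path

# Example schema from this guide: a keyword "name" field and a keyword
# "id" field flagged with "id": True.
schema = {
    "configuration": {
        "name": {"type": "keyword"},
        "id": {"type": "keyword", "id": True},
    }
}


def write_schema(schema: dict, path: str = "schema.json") -> Path:
    """Serialize a schema dictionary to a JSON file and return its path."""
    schema_path = Path(path)
    schema_path.write_text(json.dumps(schema, indent=4), encoding="utf-8")
    return schema_path


schema_path = write_schema(schema)
```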

Create a collection

Copy the following code snippet to create a collection:

collection_name = 'new_collection'
hyperspace_client.create_collection('schema.json', collection_name)

Where

  • 'schema.json' – Specifies the path to the configuration file that you created locally on your machine.

  • collection_name – Specifies the name of the collection to be created in the Hyperspace database.

Alternatively, you can define the database configuration schema as a local Python object:

schema = {
    "configuration": {
        "name": {
            "type": "keyword"
        },
        "id": {
            "type": "keyword",
            "id": True
        }
    }
}
hyperspace_client.create_collection(schema, 'collection_name')

Where

  • schema – Specifies the Python dictionary that outlines the configuration schema.

  • 'collection_name' – Specifies the name of the collection to be created in the Hyperspace database.
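Because the schema is an ordinary dictionary, you can also use it locally to spot mismatched documents before uploading them. This is a sketch: check_document is a hypothetical helper, and it only reports differences, since whether Hyperspace rejects documents with missing or extra fields is not covered in this guide.

```python
# Schema from the example above.
schema = {
    "configuration": {
        "name": {"type": "keyword"},
        "id": {"type": "keyword", "id": True},
    }
}


def check_document(document: dict, schema: dict) -> dict:
    """Compare a document's keys with the fields defined in the schema."""
    fields = schema["configuration"]
    return {
        # Schema fields that the document does not provide.
        "missing": [field for field in fields if field not in document],
        # Document keys that the schema does not define.
        "unknown": [key for key in document if key not in fields],
    }


report = check_document({"name": "John", "age": 42}, schema)
print(report)  # {'missing': ['id'], 'unknown': ['age']}
```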

Upload Data

Data can be uploaded in batches. Copy the following code snippet to upload data:

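batch_size = 250
batch = []

# documents is an iterable of dictionaries that follow your schema.
for i, data_point in enumerate(documents):
    batch.append(data_point)
    # Send each full batch, then start collecting a new one.
    if (i + 1) % batch_size == 0:
        response = hyperspace_client.add_batch(batch, collection_name)
        batch = []

# Send any remaining documents that did not fill a whole batch.
if batch:
    response = hyperspace_client.add_batch(batch, collection_name)

hyperspace_client.commit(collection_name)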

Where

  • data_point – Represents a document to upload. Each document must be a dictionary-like object whose keys follow the database schema configuration file.

  • batch_size – Specifies the number of documents in a batch.

  • commit – Required only for vector search.
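The batching loop above can also be written as a reusable generator. This sketch uses itertools.islice from the standard library; batched is a hypothetical helper name (Python 3.12 and later ship a similar itertools.batched that yields tuples instead of lists).

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(documents: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `batch_size` documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(documents)
    while True:
        # Take up to batch_size items; an empty list means the input is exhausted.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

With this helper, the upload becomes a loop that calls hyperspace_client.add_batch(batch, collection_name) for each batch, followed by a single hyperspace_client.commit(collection_name).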

Build and run a query (Python only)

Hyperspace queries can be one of the following types:

  • Lexical Search

  • Vector Search

  • Hybrid Search

Lexical search can be performed either in DSL syntax or with a score function of the following form:

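def score_function(params, doc):
    score = 0.0
    # match() is not defined here; it is expected to be provided when
    # Hyperspace runs the function. Replace the placeholder field names.
    if match('metadata field 1'):
        score = 1.0
        if match('metadata field 1'):
            score = 2.0
    return score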
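The score function above assigns tiered scores: 1.0 when the first condition matches and 2.0 when a nested condition also matches. To reason about that logic before registering it, you could mirror it in plain Python. This sketch assumes match(field) is true when the document's value equals the query parameter's value; the actual semantics of match in Hyperspace may differ, and local_score and the field names "name" and "city" are hypothetical.

```python
def local_score(params: dict, doc: dict) -> float:
    """Plain-Python stand-in for the tiered score function above."""

    def match(field: str) -> bool:
        # Assumed semantics: the field is present in the query and equal in the doc.
        return field in params and doc.get(field) == params[field]

    score = 0.0
    if match("name"):
        score = 1.0
        # The higher score applies only when the first condition also matched.
        if match("city"):
            score = 2.0
    return score


print(local_score({"name": "John", "city": "Paris"},
                  {"name": "John", "city": "Rome"}))  # 1.0
```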

To set up a hybrid or lexical search query, specify the score function file to use for the search:

hyperspace_client.set_function(score_function_name,
                               collection_name=collection_name,
                               function_name='score_function')

To run a hybrid or lexical search query, define the query parameters and run the search:

params = {
    "name": "John"
}
results = hyperspace_client.search(params,
                                   size=10,
                                   collection_name=collection_name,
                                   function_name='score_function')

params is the query. It must have a structure similar to the database documents, as defined in the query schema configuration file. If params includes fields of type

To run a lexical search query in DSL syntax, define the query and run:

results = hyperspace_client.dsl_search({'params': query_body},
                                       size=10,
                                       collection_name=collection_name)

query_body is the query in DSL syntax.
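The DSL query bodies in this guide follow a nested bool/must/term structure, and a small builder can keep that structure consistent. This sketch assumes that several term clauses can be combined in one must list, which is how Elasticsearch-style DSL behaves but is not confirmed in this guide; must_terms is a hypothetical helper.

```python
def must_terms(conditions: dict) -> dict:
    """Build a bool/must DSL query with one term clause per field/value pair."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {field: value}} for field, value in conditions.items()
                ]
            }
        }
    }


# Produces the same structure as the DSL example in this guide.
query_body = must_terms({"name": "John"})
```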

results is a dictionary with two keys: {'similarity': [...], 'took_ms': ...}

  • took_ms – A float that specifies how long the query took to run, in milliseconds (for example, 8.73).

  • similarity – A list in which each element represents a matching document. For each document, it includes the score and the vector_id, which you can use to retrieve the document from the collection.

The following example shows what the printed results might look like:

print(results['similarity'])

[{'score': 513.7000122070312, 'vector_id': '78254'}, {'score': 512.5500126784442, 'vector_id': '23091'}, {'score': 485.5471220787652, 'vector_id': '85432'}]
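Because similarity is a plain list of dictionaries, you can post-process it with ordinary Python. The sketch below reuses the sample output shown above, adds an illustrative took_ms value, and keeps only the vector IDs whose score meets a threshold; ids_above and the threshold of 500.0 are hypothetical.

```python
# Sample output from this guide, with an illustrative took_ms value.
results = {
    "similarity": [
        {"score": 513.7000122070312, "vector_id": "78254"},
        {"score": 512.5500126784442, "vector_id": "23091"},
        {"score": 485.5471220787652, "vector_id": "85432"},
    ],
    "took_ms": 8.73,
}


def ids_above(results: dict, min_score: float) -> list:
    """Return vector IDs of hits scoring at least `min_score`, best first."""
    hits = sorted(results["similarity"], key=lambda hit: hit["score"], reverse=True)
    return [hit["vector_id"] for hit in hits if hit["score"] >= min_score]


print(ids_above(results, 500.0))  # ['78254', '23091']
print(f"Query took {results['took_ms']} ms")
```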

You can retrieve additional document fields in the query by using the "fields" keyword argument.

For example, to run a lexical search query in DSL syntax that also returns the title and date fields, define the query and run:

query_body = {
    "query": {
        "bool": {
            "must": [
                {"term": {"name": "John"}}
            ]
        }
    }
}
results = hyperspace_client.search({'params': query_body},
                                   size=10,
                                   collection_name=collection_name,
                                   function_name='score_function',
                                   fields=["title", "date"])

query_body is the query in DSL syntax.

In this case, each entry in results['similarity'] includes a "fields" key that contains the "title" and "date" values of the retrieved document.
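To work with the returned fields, you can flatten each entry into a row. The results dictionary below is hypothetical sample data shaped as described above, and extract_fields is a hypothetical helper; a field that is absent from an entry comes back as None rather than raising an error.

```python
# Hypothetical results for a query that requested fields=["title", "date"].
results = {
    "similarity": [
        {"score": 2.0, "vector_id": "78254",
         "fields": {"title": "First post", "date": "2023-01-05"}},
        {"score": 1.0, "vector_id": "23091",
         "fields": {"title": "Second post", "date": "2023-02-11"}},
    ],
    "took_ms": 3.1,
}


def extract_fields(results: dict, names: list) -> list:
    """Flatten each hit into a dict of vector_id plus the requested fields."""
    rows = []
    for hit in results["similarity"]:
        fields = hit.get("fields", {})
        row = {"vector_id": hit["vector_id"]}
        # Use None for any requested field the hit does not include.
        row.update({name: fields.get(name) for name in names})
        rows.append(row)
    return rows


for row in extract_fields(results, ["title", "date"]):
    print(row)
```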

A more detailed guide is available here.
