Quick Start

This guide explains how to set up the Hyperspace database in minutes.

To start using Hyperspace, follow these steps:

1. Install the Hyperspace API Client

Run the following shell command in your code or your data terminal –

d host address, use the following code to connect to the database through the Hyperspace API.

pip install hyperspace-py

npm install https://github.com/hyper-space-io/hyperspace-js

for more information, see here.

2. Create a local instance of the Hyperspace client

Once you receive credentials and host address, use the following code to connect to the database through the Hyperspace API.

hyperspace_client = hyperspace.HyperspaceClientApi(host=host_address,
                                                      username=username,
                                                      password=password)

import io.hyperspace.client.HyperspaceClient;
HyperspaceClient client = new HyperspaceClient(host, username, password);

const hs = require('hyperspace-js')
const hyperspaceClient = new hs.HyperspaceClient(host, username, password)

3. Run Hyperspace queries

Create a schema file

The schema files outline the data structure, index and metric types, and similar configurations. More info can be found in the configuration file section.

Create a collection

Copy the following code snippet to create a collection

collection_name = 'new_collection'
hyperspace_client.create_collection('schema.json', collection_name)

JsonObject schema = (JsonObject) 
JsonParser.parseReader(new FileReader("schema.json"));
client.createCollection(collectionName, schema);

const collection_name = 'new_collection'
await hyperspaceClient.createCollection('schema.json', collection_name)

Where –

'schema.json' – Specifies the path to the configuration file that you created locally on your machine.
collection_name' – Specifies the name of the collection to be created in the Hyperspace database.

Alternatively, you can define the database config schema as a local python object

schema = {
        "configuration": {
            "name": {
                "type": "keyword"
            },
            "id": {
                "type": "keyword",
                "id": True,
            }
        }
    }
hyperspace_client.create_collection(schema, 'collection_name')

String schema = "{" +
                "  \"configuration\": {" +
                "  \"name\": {" +
                "               \"type\":\"keyword\"" +
                "            }" +
                "  \"id\":   {" +
                "               \"type\":\"keyword\"" +
                "               \"id\":\"true\"" +                
                "            }" +    
                "        }" +
                "      }";
                

hyperspaceClient.createCollection(collectionName, schema);

const schema = {
    "configuration": {
        "name": {
            "type": "keyword"
        },
        "id": {
            "type": "keyword",
            "id": true,
        }
    }
};

await hyperspaceClient.createCollection(collectionName, schema);

Where –

schema – Specifies the python dictionary that outlines the configuration schema.
'collection_name' – Specifies the name of the collection to be created in the Hyperspace database.

Upload Data

Data can be uploaded in batches. Copy the following code snippet to upload data

batch_size = 250
batch = []

for i, data_point in enumerate(documents):
   batch.append(data_point)
   if (i+1) % batch_size == 0:
      response = hyperspace_client.add_batch(batch, collection_name)
      batch.clear()
      
if batch:
  response = hyperspace_client.add_batch(batch, collection_name)
  
hyperspace_client.commit(collection_name)

import java.util.ArrayList;
final int batchSize = 250;

for (int i= 0; index < documents.size(); i++) {
    batch.add(documents.get(i));
    if ((i+ 1) % batchSize == 0) {
          List<DataPoint> batchCopy = new ArrayList<>(batch);
          futures.add(hyperspaceClient.addBatch(batchCopy, collectionName));
          batch.clear();
      }    
}

if (!batch.isEmpty()) {
    futures.add(hyperspaceClient.addBatch(new ArrayList<>(batch), collectionName));
}
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
hyperspaceClient.commit(collectionName).join();

let BATCH_SIZE = 250;
let batch: any[] = [];
let collectionName = "new_collection";

for (const [i, document] of documents.entries()) {
    batch.push(document);
    if ((i + 1) % BATCH_SIZE == 0) {
        await client.addBatch(collectionName, batch);
        batch = [];
    }
}

if (batch.length != 0) {
    await client.addBatch(collectionName, batch);
}
await client.commit(collectionName)

Where –

data_point – Represents the document to upload. Each document must have dictionary like structure with a keys according to the database schema configuration file.
batch_size – Specifies the number of documents in a batch.
commit is required for vector search only

Build and run a query (Python only)

Hyperspace queries can be of one of the following types –

Lexical Search
Vector Search
Hybrid Search

Lexical search can be performed in DSL syntax, or as using a score function of the following form:

 def score_function (params , doc) :
     score = 0.0
     if match ('metadata field 1'):
       score = 1.0
       if match ('metadata field 1'):
          score 2.0
 return score

To set a hybrid or lexical search query –

Specify that this score function file is to be used for the Search, as follows –

hyperspace_client.set_function(score_function_name,
                                collection_name=collection_name,
                                function_name='score_function')

String function = Files.readString(Paths.get("score_function.py"));
client.setFunction(collectionName, "score_function", function);

await hyperspaceClient.setFunction(score_function_name,
                                collection_name=collection_name,
                                function_name='score_function')

To run a hybrid or lexical search query –

define the query schema and run

params= {
         "name": "John"
        }
results = hyperspace_client.search(params,
                                   size=10,                 
                                   collection_name=collection_name
                                   function_name='score_function')

JsonObject params = new JsonObject();
params.add("name", new JsonPrimitive("John"));
JsonObject query = new JsonObject();
query.add("query", params);

Object response = client.search(collectionName, 10, query, "my_score_function");

const size = 10;
let params= {
    "name": "John"
}
let functionName = 'score_function';
await hyperspaceClient.search(collectionName, size, params, functionName)

query_body is the query in DSL syntax. query_body must have a similar structure to the database documents, according to the query schema config file. If query_body includes fields of type

To run a lexical search query in DSL syntax–

define the query schema and run

results = hyperspace_client.dsl_search({'params': query_body},
                                   size=10,                 
                                   collection_name=collection_name)

String queryJson =  "{" +
                    "  \"query\": {" +
                    "    \"bool\": {" +
                    "      \"must\": [" +
                    "        {" +
                    "          \"term\":{" +
                    "            \"name\":\"John\"" +
                    "           }" +
                    "        }" +
                    "      ]" +
                    "    }" +
                    "  }" +
                    "}";
JsonObject query = JsonParser.parseString(queryJson).getAsJsonObject();
Object response = hyperspaceClient.dslSearch(collectionName, 10, query));
JsonObject queryResponse = new Gson().toJsonTree(response).getAsJsonObject();
System.out.println(queryResponse);

const size = 10;
const query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"name": "John"}}
            ]
        }
    }
}
await hyperspaceClient.search(collectionName, size, query)

query_body is the query in DSL syntax.

results is a dictionary with two keys – {'similarity': {}, 'took_ms': ..}

took_ms – is a float value that specifies how long the query took to run, such as 8.73ms
similarity – Returns a list. Each element of the list represents a matching document. For each document, it specifies the score and the vector_id that you can use to retrieve the document from the Collection.

Here is an example of what results might look like if they were printed on the screen –

print(results['similarity'])

[{'score: 513.7000122070312, 'vector_id': '78254'}, {'score: 512.5500126784442, 'vector_id': '23091'}, {'score: 485.5471220787652, 'vector_id': '85432'}]

You can retrieve additional document fields in the query, using the "fields" keyword.

To run a lexical search query in DSL syntax–

define the query schema and run

query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"name": "John"}}
            ]
        }
    }
}
results = hyperspace_client.search({'params': query_body},
                                   size=10,                 
                                   collection_name=collection_name
                                   function_name='score_function',
                                   fields = ["title", "date"])

String queryJson =  "{" +
                    "  \"query\": {" +
                    "    \"bool\": {" +
                    "      \"must\": [" +
                    "        {" +
                    "          \"term\":{" +
                    "            \"name\":\"John\"" +
                    "           }" +
                    "        }" +
                    "      ]" +
                    "    }" +
                    "  }" +
                    "}";
JsonObject query = JsonParser.parseString(queryJson).getAsJsonObject();
Object response = client.dslSearch(collectionName, 10, query));

const size = 10;
const query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"name": "John"}}
            ]
        }
    }
}
await hyperspaceClient.dslSearch(collectionName, size, query,
                                    fields = ["title", "date"])

query_body is the query in DSL syntax.

In this scenario, each entry in results['similarity'] includes a key named "fields", that includes the fields "title" and "date" per retrieved document.

a more detailed guide is available here.

PreviousHyperspace Search NextSetting Up

Last updated 11 months ago