Quick Start
This guide explains how to set up the Hyperspace database in minutes.
To start using Hyperspace, follow these steps:
1. Install the Hyperspace API Client
Run the following command in your terminal:
Python:
pip install hyperspace-py
JavaScript:
npm install https://github.com/hyper-space-io/hyperspace-js
For more information, see here.
2. Create a local instance of the Hyperspace client
Once you receive credentials and host address, use the following code to connect to the database through the Hyperspace API.
Python:
import hyperspace

hyperspace_client = hyperspace.HyperspaceClientApi(host=host_address,
                                                   username=username,
                                                   password=password)
Java:
import io.hyperspace.client.HyperspaceClient;

HyperspaceClient client = new HyperspaceClient(host, username, password);
JavaScript:
const hs = require('hyperspace-js')
const hyperspaceClient = new hs.HyperspaceClient(host, username, password)
3. Run Hyperspace queries
Create a schema file
The schema file outlines the data structure, index and metric types, and similar configurations. More information can be found in the configuration file section.
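As a minimal sketch, a schema file can be generated from Python with the standard json module. The field names below reuse the "name" and "id" fields from the inline schema example in this guide, and the file name schema.json matches the path used in the snippets that follow. write_schema is a hypothetical helper name, not part of the Hyperspace client.

```python
import json
from pathlib import Path

# Example schema with two keyword fields; "id" is marked as the document id.
SCHEMA = {
    "configuration": {
        "name": {"type": "keyword"},
        "id": {"type": "keyword", "id": True},
    }
}


def write_schema(path, schema):
    """Write the schema dictionary to a JSON file and return its path.

    Hypothetical helper for illustration; not part of the Hyperspace client.
    """
    target = Path(path)
    target.write_text(json.dumps(schema, indent=2), encoding="utf-8")
    return target


if __name__ == "__main__":
    schema_path = write_schema("schema.json", SCHEMA)
    print(f"Wrote {schema_path}")
```

Note that json.dumps converts the Python value True to the JSON literal true, so the same dictionary can be used either inline or as a file.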
Create a collection
Copy the following code snippet to create a collection:
Python:
collection_name = 'new_collection'
hyperspace_client.create_collection('schema.json', collection_name)
Java:
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.FileReader;

JsonObject schema = (JsonObject) JsonParser.parseReader(new FileReader("schema.json"));
client.createCollection(collectionName, schema);
JavaScript:
const collection_name = 'new_collection'
await hyperspaceClient.createCollection('schema.json', collection_name)
Where –
'schema.json' – Specifies the path to the configuration file that you created locally on your machine.
collection_name – Specifies the name of the collection to be created in the Hyperspace database.
Alternatively, you can define the database configuration schema as a local object:
Python:
schema = {
    "configuration": {
        "name": {
            "type": "keyword"
        },
        "id": {
            "type": "keyword",
            "id": True,
        }
    }
}
hyperspace_client.create_collection(schema, 'collection_name')
Java:
String schema = "{" +
    "  \"configuration\": {" +
    "    \"name\": {" +
    "      \"type\":\"keyword\"" +
    "    }," +
    "    \"id\": {" +
    "      \"type\":\"keyword\"," +
    "      \"id\":\"true\"" +
    "    }" +
    "  }" +
    "}";
client.createCollection(collectionName, schema);
JavaScript:
const schema = {
    "configuration": {
        "name": {
            "type": "keyword"
        },
        "id": {
            "type": "keyword",
            "id": true,
        }
    }
};
await hyperspaceClient.createCollection(collectionName, schema);
Where –
schema – Specifies the object (a dictionary in Python) that outlines the configuration schema.
'collection_name' – Specifies the name of the collection to be created in the Hyperspace database.
Upload Data
Data can be uploaded in batches. Copy the following code snippet to upload data:
Python:
batch_size = 250
batch = []
for i, data_point in enumerate(documents):
    batch.append(data_point)
    if (i + 1) % batch_size == 0:
        response = hyperspace_client.add_batch(batch, collection_name)
        batch.clear()
# Upload the remaining documents that did not fill a whole batch
if batch:
    response = hyperspace_client.add_batch(batch, collection_name)
hyperspace_client.commit(collection_name)
Java:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final int batchSize = 250;
List<DataPoint> batch = new ArrayList<>();
// The element type of the futures returned by addBatch is left open here
List<CompletableFuture<?>> futures = new ArrayList<>();
for (int i = 0; i < documents.size(); i++) {
    batch.add(documents.get(i));
    if ((i + 1) % batchSize == 0) {
        List<DataPoint> batchCopy = new ArrayList<>(batch);
        futures.add(client.addBatch(batchCopy, collectionName));
        batch.clear();
    }
}
if (!batch.isEmpty()) {
    futures.add(client.addBatch(new ArrayList<>(batch), collectionName));
}
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
client.commit(collectionName).join();
JavaScript:
const BATCH_SIZE = 250;
let batch = [];
let collectionName = "new_collection";
for (const [i, document] of documents.entries()) {
    batch.push(document);
    if ((i + 1) % BATCH_SIZE == 0) {
        await hyperspaceClient.addBatch(collectionName, batch);
        batch = [];
    }
}
if (batch.length != 0) {
    await hyperspaceClient.addBatch(collectionName, batch);
}
await hyperspaceClient.commit(collectionName)
Where –
data_point – Represents the document to upload. Each document must have a dictionary-like structure, with keys that match the database schema configuration file.
batch_size – Specifies the number of documents in a batch.
commit is required for vector search only.
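The batching loop above can also be factored into a small generator. The sketch below is plain Python and does not call Hyperspace itself. iter_batches is a hypothetical helper name, and the commented usage assumes the add_batch and commit calls shown earlier in this section.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items.

    Hypothetical helper for illustration; not part of the Hyperspace client.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []  # start a fresh list so yielded batches are not mutated
    if batch:
        yield batch  # final, possibly smaller, batch


if __name__ == "__main__":
    documents = [{"id": str(i), "name": f"user-{i}"} for i in range(600)]
    sizes = [len(b) for b in iter_batches(documents, 250)]
    print(sizes)  # [250, 250, 100]
    # Assumed usage with the client calls shown in this section:
    # for batch in iter_batches(documents, 250):
    #     hyperspace_client.add_batch(batch, collection_name)
    # hyperspace_client.commit(collection_name)
```

Yielding a fresh list per batch avoids the copy-then-clear step and keeps the upload loop to two lines.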
Build and run a query (Python only)
Hyperspace queries can be of one of the following types –
Lexical Search
Vector Search
Hybrid Search
Lexical search can be performed in DSL syntax, or by using a score function of the following form:
def score_function(params, doc):
    score = 0.0
    if match('metadata field 1'):
        score = 1.0
    if match('metadata field 1'):
        score = 2.0
    return score
To set a hybrid or lexical search query –
Specify that this score function file is to be used for the search, as follows –
Python:
hyperspace_client.set_function(score_function_name,
                               collection_name=collection_name,
                               function_name='score_function')
Java:
String function = Files.readString(Paths.get("score_function.py"));
client.setFunction(collectionName, "score_function", function);
JavaScript:
await hyperspaceClient.setFunction(score_function_name,
    collection_name=collection_name,
    function_name='score_function')
To run a hybrid or lexical search query –
define the query schema and run:
Python:
params = {
    "name": "John"
}
results = hyperspace_client.search(params,
                                   size=10,
                                   collection_name=collection_name,
                                   function_name='score_function')
Java:
JsonObject params = new JsonObject();
params.add("name", new JsonPrimitive("John"));
JsonObject query = new JsonObject();
query.add("query", params);
Object response = client.search(collectionName, 10, query, "score_function");
JavaScript:
const size = 10;
let params = {
    "name": "John"
}
let functionName = 'score_function';
await hyperspaceClient.search(collectionName, size, params, functionName)
query_body is the query in DSL syntax. query_body must have a similar structure to the database documents, according to the query schema config file. If query_body includes fields of type
To run a lexical search query in DSL syntax –
define the query schema and run:
Python:
results = hyperspace_client.dsl_search({'params': query_body},
                                       size=10,
                                       collection_name=collection_name)
Java:
String queryJson = "{" +
    "  \"query\": {" +
    "    \"bool\": {" +
    "      \"must\": [" +
    "        {" +
    "          \"term\":{" +
    "            \"name\":\"John\"" +
    "          }" +
    "        }" +
    "      ]" +
    "    }" +
    "  }" +
    "}";
JsonObject query = JsonParser.parseString(queryJson).getAsJsonObject();
Object response = client.dslSearch(collectionName, 10, query);
JsonObject queryResponse = new Gson().toJsonTree(response).getAsJsonObject();
System.out.println(queryResponse);
JavaScript:
const size = 10;
const query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"name": "John"}}
            ]
        }
    }
}
await hyperspaceClient.dslSearch(collectionName, size, query)
query_body is the query in DSL syntax.
results is a dictionary with two keys – {'similarity': [...], 'took_ms': ...}
took_ms – A float value that specifies how long the query took to run, in milliseconds, such as 8.73.
similarity – A list in which each element represents a matching document. For each document, it specifies the score and the vector_id that you can use to retrieve the document from the collection.
Here is an example of what results might look like when printed –
print(results['similarity'])
[{'score': 513.7000122070312, 'vector_id': '78254'}, {'score': 512.5500126784442, 'vector_id': '23091'}, {'score': 485.5471220787652, 'vector_id': '85432'}]
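As a sketch of how a response with this shape might be consumed, the snippet below extracts the matching vector_id values and filters them by score. The similarity entries are the sample values printed above and took_ms is the 8.73 example. top_ids is a hypothetical helper, and the threshold of 500 is arbitrary.

```python
# Sample response with the shape described above (values from the example output).
results = {
    "similarity": [
        {"score": 513.7000122070312, "vector_id": "78254"},
        {"score": 512.5500126784442, "vector_id": "23091"},
        {"score": 485.5471220787652, "vector_id": "85432"},
    ],
    "took_ms": 8.73,
}


def top_ids(results, min_score=0.0):
    """Return vector_ids of matches scoring at least min_score, best first.

    Hypothetical helper for illustration; not part of the Hyperspace client.
    """
    matches = [m for m in results["similarity"] if m["score"] >= min_score]
    matches.sort(key=lambda m: m["score"], reverse=True)
    return [m["vector_id"] for m in matches]


if __name__ == "__main__":
    print(top_ids(results))                 # ['78254', '23091', '85432']
    print(top_ids(results, min_score=500))  # ['78254', '23091']
    print(f"Query took {results['took_ms']} ms")
```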
You can retrieve additional document fields in the query, using the "fields" keyword.
To run a lexical search query in DSL syntax –
define the query schema and run:
Python:
query_body = {
    "query": {
        "bool": {
            "must": [
                {"term": {"name": "John"}}
            ]
        }
    }
}
results = hyperspace_client.search({'params': query_body},
                                   size=10,
                                   collection_name=collection_name,
                                   function_name='score_function',
                                   fields=["title", "date"])
Java:
String queryJson = "{" +
    "  \"query\": {" +
    "    \"bool\": {" +
    "      \"must\": [" +
    "        {" +
    "          \"term\":{" +
    "            \"name\":\"John\"" +
    "          }" +
    "        }" +
    "      ]" +
    "    }" +
    "  }" +
    "}";
JsonObject query = JsonParser.parseString(queryJson).getAsJsonObject();
Object response = client.dslSearch(collectionName, 10, query);
JavaScript:
const size = 10;
const query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"name": "John"}}
            ]
        }
    }
}
const fields = ["title", "date"];
await hyperspaceClient.dslSearch(collectionName, size, query, fields)
query_body is the query in DSL syntax.
In this scenario, each entry in results['similarity'] includes a key named "fields", which contains the "title" and "date" fields of the retrieved document.
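A sketch of reading those extra fields follows. The response literal is made up for illustration: the vector_id and score reuse an earlier example value, while the title and date values are placeholders, and rows_with_fields is a hypothetical helper. It assumes "fields" maps each requested field name to its value, which this guide does not spell out.

```python
# Made-up response for illustration; the title and date values are placeholders.
results = {
    "similarity": [
        {
            "score": 513.7000122070312,
            "vector_id": "78254",
            "fields": {"title": "Example title", "date": "2024-01-01"},
        },
    ],
    "took_ms": 8.73,
}


def rows_with_fields(results, field_names):
    """Flatten each match into a dict with vector_id, score and requested fields.

    Assumes match["fields"] is a mapping from field name to value; fields that
    are missing from a match are returned as None.
    """
    rows = []
    for match in results["similarity"]:
        extra = match.get("fields", {})
        row = {"vector_id": match["vector_id"], "score": match["score"]}
        for name in field_names:
            row[name] = extra.get(name)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in rows_with_fields(results, ["title", "date"]):
        print(row)
```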
A more detailed guide is available here.