The schema files outline the data structure, index and metric types, and similar configurations. More info can be found in the configuration file section.
Create a collection
Copy the following code snippet to create a collection
schema – Specifies the python dictionary that outlines the configuration schema.
'collection_name' – Specifies the name of the collection to be created in the Hyperspace database.
Upload Data
Data can be uploaded in batches. Copy the following code snippet to upload data
# Python
batch_size = 250
batch = []
for i, data_point in enumerate(documents):
    batch.append(data_point)
    # Send the batch once it reaches batch_size documents
    if (i + 1) % batch_size == 0:
        response = hyperspace_client.add_batch(batch, collection_name)
        batch.clear()
# Send the remaining documents
if batch:
    response = hyperspace_client.add_batch(batch, collection_name)
hyperspace_client.commit(collection_name)
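// Java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final int batchSize = 250;
List<DataPoint> batch = new ArrayList<>();
List<CompletableFuture<?>> futures = new ArrayList<>();
for (int i = 0; i < documents.size(); i++) {
    batch.add(documents.get(i));
    if ((i + 1) % batchSize == 0) {
        // Copy the batch so that clearing it does not affect the pending request
        List<DataPoint> batchCopy = new ArrayList<>(batch);
        futures.add(hyperspaceClient.addBatch(batchCopy, collectionName));
        batch.clear();
    }
}
// Send the remaining documents
if (!batch.isEmpty()) {
    futures.add(hyperspaceClient.addBatch(new ArrayList<>(batch), collectionName));
}
// Wait for all uploads to finish, then commit
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
hyperspaceClient.commit(collectionName).join();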
// TypeScript
const BATCH_SIZE = 250;
let batch: any[] = [];
const collectionName = "new_collection";
for (const [i, document] of documents.entries()) {
    batch.push(document);
    if ((i + 1) % BATCH_SIZE === 0) {
        await client.addBatch(collectionName, batch);
        batch = [];
    }
}
if (batch.length !== 0) {
    await client.addBatch(collectionName, batch);
}
await client.commit(collectionName);
Where –
data_point – Represents the document to upload. Each document must have a dictionary-like structure, with keys that match the database schema configuration file.
batch_size – Specifies the number of documents in a batch.
commit – Required for vector search only.
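The batching loop above can also be factored into a reusable helper. The following is a minimal sketch, not part of the Hyperspace client: iter_batches and upload_in_batches are hypothetical names, and the only assumption about the client object is that it exposes add_batch(batch, collection_name) and commit(collection_name), as in the Python snippet above.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List


def iter_batches(documents: Iterable[Any], batch_size: int) -> Iterator[List[Any]]:
    """Yield consecutive lists of at most batch_size documents."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(documents)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def upload_in_batches(client: Any, documents: Iterable[Any],
                      collection_name: str, batch_size: int = 250) -> int:
    """Send documents batch by batch, then commit once.

    Assumes client has add_batch(batch, collection_name) and
    commit(collection_name), mirroring the snippet above.
    Returns the number of batches sent.
    """
    sent = 0
    for batch in iter_batches(documents, batch_size):
        client.add_batch(batch, collection_name)
        sent += 1
    client.commit(collection_name)
    return sent
```

With 7 documents and batch_size=3, this helper sends batches of 3, 3, and 1 documents and then commits once. Because each batch is a fresh list, there is no need to clear or copy it between calls.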
Build and run a query (Python only)
Hyperspace queries can be of one of the following types –
Lexical Search
Vector Search
Hybrid Search
Lexical search can be performed in DSL syntax, or by using a score function of the following form:
def score_function(params, doc):
    score = 0.0
    if match('metadata field 1'):
        score = 1.0
    if match('metadata field 2'):
        score = 2.0
    return score
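To make the control flow of such a score function concrete, the sketch below emulates it in plain Python. It is an illustration only: in Hyperspace, match is supplied by the engine, whereas here make_match is a hypothetical stand-in that assumes match means "the field has the same value in the query and in the document". The field names city and name are also made up.

```python
from typing import Any, Callable, Dict


def make_match(params: Dict[str, Any], doc: Dict[str, Any]) -> Callable[[str], bool]:
    """Hypothetical stand-in for the built-in match (assumed: field equality)."""
    def match(field: str) -> bool:
        return field in params and field in doc and params[field] == doc[field]
    return match


def score_function(params: Dict[str, Any], doc: Dict[str, Any]) -> float:
    # In Hyperspace, match is provided by the engine; it is built here
    # only so that the example runs on its own.
    match = make_match(params, doc)
    score = 0.0
    if match("city"):
        score = 1.0
    if match("name"):
        score = 2.0  # a later match overwrites the earlier score
    return score


query = {"city": "Paris", "name": "John"}
print(score_function(query, {"city": "Paris", "name": "Ann"}))
print(score_function(query, {"city": "Rome", "name": "John"}))
```

Because each condition assigns the score instead of adding to it, a document that matches both fields receives the score of the last matching condition.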
To set a hybrid or lexical search query –
Specify the score function to use for the search, as follows –
// Java
JsonObject params = new JsonObject();
params.add("name", new JsonPrimitive("John"));
JsonObject query = new JsonObject();
query.add("query", params);
Object response = client.search(collectionName, 10, query, "my_score_function");
// TypeScript
const size = 10;
const params = {
    "name": "John"
};
const functionName = 'score_function';
await hyperspaceClient.search(collectionName, size, params, functionName);
query_body is the query in DSL syntax. query_body must have a similar structure to the database documents, according to the query schema config file. If query_body includes fields of type
results is a dictionary with two keys – {'similarity': [...], 'took_ms': ...}
took_ms – A float value that specifies how long the query took to run, in milliseconds, such as 8.73.
similarity – A list in which each element represents a matching document. For each document, it specifies the score and the vector_id that you can use to retrieve the document from the collection.
Here is an example of what results might look like if they were printed on the screen –
In this scenario, each entry in results['similarity'] includes a key named "fields", which contains the "title" and "date" fields of the retrieved document.
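A response of this shape can be processed with ordinary dictionary and list operations. The following is a minimal sketch: the sample response is made up for illustration, the key names score and vector_id are assumed from the description above, and summarize is a hypothetical helper, not part of the Hyperspace client.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical response, shaped as described above; all values are made up.
results: Dict[str, Any] = {
    "similarity": [
        {"vector_id": "doc_17", "score": 0.92,
         "fields": {"title": "First example", "date": "2023-01-15"}},
        {"vector_id": "doc_4", "score": 0.85,
         "fields": {"title": "Second example", "date": "2022-11-02"}},
    ],
    "took_ms": 8.73,
}


def summarize(results: Dict[str, Any]) -> List[Tuple[Any, float, str]]:
    """Return (vector_id, score, title) for each matching document."""
    rows = []
    for entry in results["similarity"]:
        # "fields" is only present when fields were requested in the query
        fields = entry.get("fields", {})
        rows.append((entry["vector_id"], entry["score"], fields.get("title", "")))
    return rows


for vector_id, score, title in summarize(results):
    print(f"{vector_id}\t{score:.2f}\t{title}")
print(f"query took {results['took_ms']} ms")
```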