To start using Hyperspace, follow these steps:
1. Install the Hyperspace API Client
Run the following command in your terminal:
Python:
pip install hyperspace-py
JavaScript:
npm install https://github.com/hyper-space-io/hyperspace-js
For more information, see here.
2. Create a local instance of the Hyperspace client
Once you receive credentials and host address, use the following code to connect to the database through the Hyperspace API.
Python:
import hyperspace

# Connect using the host address and credentials you received.
hyperspace_client = hyperspace.HyperspaceClientApi(host=host_address,
                                                   username=username,
                                                   password=password)
Java:
import io.hyperspace.client.HyperspaceClient;

HyperspaceClient client = new HyperspaceClient(host, username, password);
JavaScript:
const hs = require('hyperspace-js')
const hyperspaceClient = new hs.HyperspaceClient(host, username, password)
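One way to avoid hard-coding the credentials is to read them from the environment before constructing the client. The sketch below assumes three hypothetical variable names (HYPERSPACE_HOST, HYPERSPACE_USERNAME, HYPERSPACE_PASSWORD); Hyperspace does not define them, so rename them to fit your setup.

```python
import os

# Hypothetical variable names; Hyperspace does not prescribe them.
REQUIRED_VARS = ("HYPERSPACE_HOST", "HYPERSPACE_USERNAME", "HYPERSPACE_PASSWORD")


def load_credentials(env=None):
    """Return (host, username, password) read from environment variables."""
    env = os.environ if env is None else env
    # Collect every missing or empty variable so the error names all of them.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in REQUIRED_VARS)
```

The returned values can then be passed as the host, username, and password arguments of the client constructor.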
3. Run Hyperspace queries
Create a schema file
The schema file outlines the data structure, index and metric types, and similar configuration. More information can be found in the configuration file section.
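As a minimal sketch, a schema file can be produced from Python with the standard json module. The two fields below are the ones used in the inline schema example in this guide; the helper names write_schema and read_schema are illustrative and not part of the Hyperspace client.

```python
import json

# Field definitions taken from the inline schema example in this guide.
schema = {
    "configuration": {
        "name": {"type": "keyword"},
        "id": {"type": "keyword", "id": True},
    }
}


def write_schema(path, schema_dict):
    """Serialize a schema dictionary to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(schema_dict, f, indent=2)


def read_schema(path):
    """Load a schema dictionary back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Calling write_schema("schema.json", schema) would create a file that can be passed by path when creating a collection.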
Create a collection
Use the following code snippet to create a collection:
Python:
collection_name = 'new_collection'
hyperspace_client.create_collection('schema.json', collection_name)
Java:
// Requires com.google.gson.JsonObject, com.google.gson.JsonParser, and java.io.FileReader.
String collectionName = "new_collection";
JsonObject schema = (JsonObject) JsonParser.parseReader(new FileReader("schema.json"));
client.createCollection(collectionName, schema);
JavaScript:
const collection_name = 'new_collection'
await hyperspaceClient.createCollection('schema.json', collection_name)
Where –
'schema.json' – Specifies the path to the configuration file that you created locally on your machine.
collection_name – Specifies the name of the collection to be created in the Hyperspace database.
Alternatively, you can define the schema in code, as a Python dictionary or the equivalent object in Java or JavaScript:
Python:
schema = {
    "configuration": {
        "name": {
            "type": "keyword"
        },
        "id": {
            "type": "keyword",
            "id": True,
        }
    }
}
hyperspace_client.create_collection(schema, 'collection_name')
Java:
// Commas between JSON members are required for the string to parse.
String schema = "{" +
    " \"configuration\": {" +
    " \"name\": {" +
    " \"type\": \"keyword\"" +
    " }," +
    " \"id\": {" +
    " \"type\": \"keyword\"," +
    " \"id\": true" +
    " }" +
    " }" +
    "}";
client.createCollection(collectionName, schema);
JavaScript:
const schema = {
    "configuration": {
        "name": {
            "type": "keyword"
        },
        "id": {
            "type": "keyword",
            "id": true,
        }
    }
};
await hyperspaceClient.createCollection(collectionName, schema);
Where –
schema – Specifies the object that outlines the configuration schema (a dictionary in the Python example).
'collection_name' – Specifies the name of the collection to be created in the Hyperspace database.
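Before sending an inline schema to the database, it can be handy to check locally which field it marks as the document identifier. The helper below is a hypothetical convenience, not part of the Hyperspace client, and it only inspects the dictionary layout shown in the example above.

```python
def id_fields(schema):
    """Return the names of fields flagged with "id": True in a schema dict."""
    fields = schema.get("configuration", {})
    return [name for name, spec in fields.items() if spec.get("id") is True]


schema = {
    "configuration": {
        "name": {"type": "keyword"},
        "id": {"type": "keyword", "id": True},
    }
}

print(id_fields(schema))  # ['id']
```

Whether Hyperspace requires exactly one such field is not stated in this guide, so treat the result as a sanity check only.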
Upload Data
Data can be uploaded in batches. Use the following code snippet to upload data:
Python:
batch_size = 250
batch = []
for i, data_point in enumerate(documents):
    batch.append(data_point)
    # Send a full batch, then start collecting the next one.
    if (i + 1) % batch_size == 0:
        response = hyperspace_client.add_batch(batch, collection_name)
        batch.clear()
# Send the remaining documents, if any.
if batch:
    response = hyperspace_client.add_batch(batch, collection_name)
hyperspace_client.commit(collection_name)
Java:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final int batchSize = 250;
List<DataPoint> batch = new ArrayList<>();
// The element type of the futures is not shown in the original example, so a wildcard is used.
List<CompletableFuture<?>> futures = new ArrayList<>();
for (int i = 0; i < documents.size(); i++) {
    batch.add(documents.get(i));
    if ((i + 1) % batchSize == 0) {
        // Copy the batch so that clearing it does not affect the pending request.
        List<DataPoint> batchCopy = new ArrayList<>(batch);
        futures.add(client.addBatch(batchCopy, collectionName));
        batch.clear();
    }
}
if (!batch.isEmpty()) {
    futures.add(client.addBatch(new ArrayList<>(batch), collectionName));
}
// Wait for all uploads to finish before committing.
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
client.commit(collectionName).join();
JavaScript:
const BATCH_SIZE = 250;
let batch = [];
const collectionName = "new_collection";
for (const [i, document] of documents.entries()) {
    batch.push(document);
    if ((i + 1) % BATCH_SIZE === 0) {
        await hyperspaceClient.addBatch(collectionName, batch);
        batch = [];
    }
}
if (batch.length !== 0) {
    await hyperspaceClient.addBatch(collectionName, batch);
}
await hyperspaceClient.commit(collectionName)
Where –
data_point – Represents the document to upload. Each document must have a dictionary-like structure with keys that match the database schema configuration file.
batch_size – Specifies the number of documents in a batch.
commit – Required for vector search only.
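The batching logic above can also be factored into a small generator, which keeps the upload loop short. This is a sketch under stated assumptions: RecordingClient is a hypothetical stand-in that only mimics the add_batch and commit calls used in the Python example, so the code runs without a database connection.

```python
def iter_batches(documents, batch_size):
    """Yield consecutive lists of at most batch_size documents."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch = []
    for doc in documents:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller, batch
        yield batch


class RecordingClient:
    """Hypothetical stand-in for the real client; it only records calls."""

    def __init__(self):
        self.batches = []
        self.committed = []

    def add_batch(self, batch, collection_name):
        self.batches.append((collection_name, list(batch)))

    def commit(self, collection_name):
        self.committed.append(collection_name)


def upload(client, documents, collection_name, batch_size=250):
    """Send all documents in batches, then commit once at the end."""
    for batch in iter_batches(documents, batch_size):
        client.add_batch(batch, collection_name)
    client.commit(collection_name)


documents = [{"id": str(i), "name": f"user{i}"} for i in range(600)]
client = RecordingClient()
upload(client, documents, "new_collection")
print([len(b) for _, b in client.batches])  # [250, 250, 100]
```

With the real client, the same upload function should work as long as it exposes add_batch and commit with the argument order used in the Python example.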
Build and run a query (Python only)
Hyperspace queries can be lexical, vector, or hybrid.
Lexical search can be performed in DSL syntax or by using a score function of the following form:
def score_function(params, doc):
    score = 0.0
    if match('metadata field 1'):
        score = 1.0
    if match('metadata field 2'):
        score = 2.0
    return score
To set a hybrid or lexical search query, specify the score function file to use for the search, as follows:
Python:
hyperspace_client.set_function(score_function_name,
                               collection_name=collection_name,
                               function_name='score_function')
Java:
// Requires java.nio.file.Files and java.nio.file.Paths.
String function = Files.readString(Paths.get("score_function.py"));
client.setFunction(collectionName, "score_function", function);
JavaScript:
// JavaScript has no keyword arguments, so the values are passed positionally.
await hyperspaceClient.setFunction(score_function_name,
                                   collection_name,
                                   'score_function')
To run a hybrid or lexical search query, define the query parameters and run the search:
Python:
params = {
    "name": "John"
}
results = hyperspace_client.search(params,
                                   size=10,
                                   collection_name=collection_name,
                                   function_name='score_function')
Java:
JsonObject params = new JsonObject();
params.add("name", new JsonPrimitive("John"));
JsonObject query = new JsonObject();
query.add("query", params);
Object response = client.search(collectionName, 10, query, "score_function");
JavaScript:
const size = 10;
let params = {
    "name": "John"
}
let functionName = 'score_function';
await hyperspaceClient.search(collectionName, size, params, functionName)
query_body is the query in DSL syntax. It must have a structure similar to the database documents, according to the query schema config file. If query_body includes fields of type
To run a lexical search query in DSL syntax, define the query and run the search:
Python:
results = hyperspace_client.dsl_search({'params': query_body},
                                       size=10,
                                       collection_name=collection_name)
Java:
// Requires com.google.gson.Gson, com.google.gson.JsonObject, and com.google.gson.JsonParser.
String queryJson = "{" +
    " \"query\": {" +
    " \"bool\": {" +
    " \"must\": [" +
    " {" +
    " \"term\": {" +
    " \"name\": \"John\"" +
    " }" +
    " }" +
    " ]" +
    " }" +
    " }" +
    "}";
JsonObject query = JsonParser.parseString(queryJson).getAsJsonObject();
Object response = client.dslSearch(collectionName, 10, query);
JsonObject queryResponse = new Gson().toJsonTree(response).getAsJsonObject();
System.out.println(queryResponse);
JavaScript:
const size = 10;
const query = {
    "query": {
        "bool": {
            "must": [
                { "term": { "name": "John" } }
            ]
        }
    }
}
await hyperspaceClient.dslSearch(collectionName, size, query)
query_body is the query in DSL syntax.
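The nested query structure used in the DSL examples above can be generated rather than typed by hand. The sketch below builds the same "query" / "bool" / "must" / "term" layout from a plain dictionary; build_term_query is an illustrative helper, not a Hyperspace function, and using more than one term clause is an assumption that this guide does not confirm.

```python
def build_term_query(terms):
    """Build a DSL query with one "term" clause per field/value pair."""
    if not terms:
        raise ValueError("at least one term is required")
    must = [{"term": {field: value}} for field, value in terms.items()]
    return {"query": {"bool": {"must": must}}}


query_body = build_term_query({"name": "John"})
print(query_body)
# {'query': {'bool': {'must': [{'term': {'name': 'John'}}]}}}
```

The resulting query_body has the same shape as the hand-written query in the examples above.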
results is a dictionary with two keys – {'similarity': [...], 'took_ms': ...}
took_ms – A float that specifies how long the query took to run, in milliseconds, such as 8.73.
similarity – A list in which each element represents a matching document. Each element specifies the score and the vector_id that you can use to retrieve the document from the collection.
Here is an example of what results might look like when printed –
print(results['similarity'])
[{'score': 513.7000122070312, 'vector_id': '78254'},
 {'score': 512.5500126784442, 'vector_id': '23091'},
 {'score': 485.5471220787652, 'vector_id': '85432'}]
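To illustrate how the results dictionary can be consumed, the sketch below ranks the matches and applies an optional score threshold. The sample data reuses the scores and IDs from the example output; top_matches is an illustrative helper, not part of the client.

```python
# Sample data shaped like the results described above.
results = {
    "similarity": [
        {"score": 513.7000122070312, "vector_id": "78254"},
        {"score": 512.5500126784442, "vector_id": "23091"},
        {"score": 485.5471220787652, "vector_id": "85432"},
    ],
    "took_ms": 8.73,
}


def top_matches(results, min_score=None):
    """Return (vector_id, score) pairs, highest score first."""
    hits = results.get("similarity", [])
    if min_score is not None:
        hits = [hit for hit in hits if hit["score"] >= min_score]
    ranked = sorted(hits, key=lambda hit: hit["score"], reverse=True)
    return [(hit["vector_id"], hit["score"]) for hit in ranked]


print([vid for vid, _ in top_matches(results, min_score=500.0)])
# ['78254', '23091']
```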
You can retrieve additional document fields in the query by using the "fields" keyword.
To do so, define the query and pass the list of fields to retrieve:
Python:
query_body = {
    "query": {
        "bool": {
            "must": [
                {"term": {"name": "John"}}
            ]
        }
    }
}
results = hyperspace_client.search({'params': query_body},
                                   size=10,
                                   collection_name=collection_name,
                                   function_name='score_function',
                                   fields=["title", "date"])
Java:
String queryJson = "{" +
    " \"query\": {" +
    " \"bool\": {" +
    " \"must\": [" +
    " {" +
    " \"term\": {" +
    " \"name\": \"John\"" +
    " }" +
    " }" +
    " ]" +
    " }" +
    " }" +
    "}";
JsonObject query = JsonParser.parseString(queryJson).getAsJsonObject();
Object response = client.dslSearch(collectionName, 10, query);
JavaScript:
const size = 10;
const query = {
    "query": {
        "bool": {
            "must": [
                { "term": { "name": "John" } }
            ]
        }
    }
}
// JavaScript has no keyword arguments, so the field list is passed positionally.
await hyperspaceClient.dslSearch(collectionName, size, query,
                                 ["title", "date"])
query_body is the query in DSL syntax.
In this scenario, each entry in results['similarity'] includes a key named "fields" that contains the "title" and "date" fields of the retrieved document.
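A result that carries extra fields can be flattened into simple rows for display or export. The sketch below assumes that "fields" is a dictionary mapping each requested field name to its value; this guide does not show the exact shape, and the title and date values are made up for illustration.

```python
# Hypothetical sample; the "fields" layout and its values are assumptions.
results = {
    "similarity": [
        {"score": 513.7, "vector_id": "78254",
         "fields": {"title": "First title", "date": "2024-01-01"}},
        {"score": 512.5, "vector_id": "23091",
         "fields": {"title": "Second title", "date": "2024-02-01"}},
    ],
    "took_ms": 8.73,
}


def to_rows(results, field_names):
    """Flatten each match into one dict with its score, ID, and requested fields."""
    rows = []
    for hit in results.get("similarity", []):
        fields = hit.get("fields", {})
        row = {"vector_id": hit["vector_id"], "score": hit["score"]}
        for name in field_names:
            row[name] = fields.get(name)  # None if the field was not returned
        rows.append(row)
    return rows


for row in to_rows(results, ["title", "date"]):
    print(row["vector_id"], row["title"], row["date"])
```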
A more detailed guide is available here.