Quick Start
Install the Hyperspace Python client by running the following shell command in your terminal –
pip install git+https://github.com/hyper-space-io/hyperspace-py
Once you receive credentials and host address, use the following code to connect to the database through the Hyperspace API.
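import hyperspace

# Connect with the host address and credentials you received
hyperspace_client = hyperspace.HyperspaceClientApi(host=host_address,
                                                   username=username,
                                                   password=password)
# Request collection information to confirm that the connection works
hyperspace_client.collections_info()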
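Credentials written directly into a script are easy to leak, so one option is to read them from the environment before connecting. The following is a minimal sketch under that assumption. The helper name load_credentials and the variable names HYPERSPACE_HOST, HYPERSPACE_USERNAME, and HYPERSPACE_PASSWORD are hypothetical and are not part of the Hyperspace API.

```python
import os


def load_credentials(env=None):
    """Collect connection settings from environment variables.

    The variable names are hypothetical; use whatever your deployment defines.
    """
    env = os.environ if env is None else env
    names = {
        "host": "HYPERSPACE_HOST",
        "username": "HYPERSPACE_USERNAME",
        "password": "HYPERSPACE_PASSWORD",
    }
    # Fail early with a clear message if any setting is absent or empty.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}
```

The returned dictionary uses the same keyword names as the connection call, so it could be passed as hyperspace.HyperspaceClientApi(**load_credentials()).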
Create a schema file
The schema files outline the data structure, index and metric types, and similar configurations. More info can be found in the configuration file section.
Create a collection
Copy the following code snippet to create a collection:
hyperspace_client.create_collection('schema.json', 'collection_name')
Where –
- 'schema.json' – Specifies the path to the configuration file that you created locally on your machine.
- 'collection_name' – Specifies the name of the collection to be created in the Hyperspace database.
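Because the first argument is a local file path, it can help to confirm that the file exists and parses before calling create_collection. The sketch below assumes the schema is a JSON file, as the 'schema.json' name suggests. The helper check_schema_file is hypothetical, and it checks only the JSON syntax, not whether the contents form a valid Hyperspace schema.

```python
import json
from pathlib import Path


def check_schema_file(path):
    """Return the parsed schema, or raise if the file is missing or not valid JSON."""
    schema_path = Path(path)
    if not schema_path.is_file():
        raise FileNotFoundError(f"Schema file not found: {schema_path}")
    with schema_path.open(encoding="utf-8") as handle:
        # json.load raises json.JSONDecodeError on malformed JSON.
        return json.load(handle)
```

If the check passes, the same path can then be given to create_collection.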
Upload Data
Data can be uploaded in batches. Copy the following code snippet to upload data:
BATCH_SIZE = 250
batch = []
for i, data_point in enumerate(documents):
    batch.append(hyperspace.document(str(i), data_point))
    # Send the batch once it holds BATCH_SIZE documents
    if (i + 1) % BATCH_SIZE == 0:
        response = hyperspace_client.add_batch(batch, collection_name)
        batch.clear()
# Upload any documents left over in the final, partial batch
if batch:
    response = hyperspace_client.add_batch(batch, collection_name)
hyperspace_client.commit(collection_name)
Where –
- data_point – Represents the document to upload. Each document must follow the structure defined in the database schema configuration file.
- i – Specifies the identifier that you assign to the document that you are uploading, which must be unique per Collection. You can assign any identifier as long as it's unique.
- BATCH_SIZE – Specifies the number of documents in a batch.
commit is required for vector search only.
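The batching logic can also be separated from the upload calls. The following is a minimal sketch of that idea using only the standard library. The generator iter_batches is a hypothetical helper, not part of the Hyperspace client, and plain integers stand in for documents.

```python
from itertools import islice


def iter_batches(items, batch_size):
    """Yield consecutive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # the input is exhausted
        yield batch


# Example with plain integers standing in for documents.
batches = list(iter_batches(range(5), 2))
print(batches)  # [[0, 1], [2, 3], [4]]
```

Each yielded batch could then be passed to hyperspace_client.add_batch(batch, collection_name), with a single commit after the loop. The final, partial batch needs no special handling because the generator yields it like any other.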
Build and run a query
A Hyperspace query can define one of the following types of search –
- Classic Search
- Vector Search
- Hybrid Search
For a Classic Search, define a score function and save it in a file, for example:
def score_function(Q, V):
    score = 0.0
    if match('metadata field 1'):
        score = 1.0
    if match('metadata field 1'):
        score = 2.0
    return score
Specify that this score function file is to be used for the Classic Search, as follows –
hyperspace_client.set_function(score_function_filename,
                               collection_name=collection_name,
                               function_name='score_function')
To run a hybrid search query, define the query schema and run the search:
hybrid_query_schema = {
    'params': data_point,
    'knn': {
        'query': {'boost': 1},
        'vector': {
            'top_k': 400,
            'boost': 1
        }
    }
}
results = hyperspace_client.search(hybrid_query_schema,
                                   size=5,
                                   collection_name=collection_name,
                                   function_name='score_function')
The knn key in the query schema contains two keys: the query key, which applies to the Classic Search, and the vector key, which applies to the Vector Search.
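When the same query shape is reused with different settings, a small builder function can reduce copy-and-paste errors in the nested dictionary. This is a sketch only. The function build_hybrid_query and its parameter names are hypothetical, and the values are passed through without interpreting what boost or top_k mean to the server.

```python
def build_hybrid_query(params, top_k=400, query_boost=1, vector_boost=1):
    """Return a query schema dict with the same shape as the hybrid example."""
    if top_k < 1:
        raise ValueError("top_k must be a positive integer")
    return {
        "params": params,
        "knn": {
            # Settings for the Classic Search part of the query.
            "query": {"boost": query_boost},
            # Settings for the Vector Search part of the query.
            "vector": {"top_k": top_k, "boost": vector_boost},
        },
    }
```

Calling build_hybrid_query(data_point) with the defaults produces the same dictionary as the hybrid query schema written out by hand.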
Retrieve Results
To retrieve results, use the following command:
results = hyperspace_client.search(vector_query_schema, size=5, collection_name=collection_name)
results is a dictionary with two keys, 'similarity' and 'took_ms' –
- took_ms – A float that specifies how long the query took to run, in milliseconds, such as 8.73.
- similarity – A list in which each element represents a matching document. For each document, it specifies the score and the vector_id that you can use to retrieve the document from the Collection.
Here is an example of what results might look like if they were printed on the screen –
print(results['similarity'])
[{'score': 513.7000122070312, 'vector_id': '78254'},
 {'score': 512.5500126784442, 'vector_id': '23091'},
 {'score': 485.5471220787652, 'vector_id': '85432'}]
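Since results is a plain dictionary, the matches can be processed with ordinary Python. The sketch below pulls out the document identifiers in descending score order. The helper top_matches is hypothetical, and the sample data reuses the scores and identifiers from the printed example together with the 8.73 timing quoted in the text.

```python
def top_matches(results, limit=None):
    """Return (vector_id, score) pairs sorted by descending score."""
    ranked = sorted(results["similarity"], key=lambda m: m["score"], reverse=True)
    pairs = [(m["vector_id"], m["score"]) for m in ranked]
    return pairs if limit is None else pairs[:limit]


# Sample response in the shape described in this section.
sample = {
    "similarity": [
        {"score": 513.7000122070312, "vector_id": "78254"},
        {"score": 512.5500126784442, "vector_id": "23091"},
        {"score": 485.5471220787652, "vector_id": "85432"},
    ],
    "took_ms": 8.73,
}
print([vector_id for vector_id, _ in top_matches(sample, limit=2)])  # ['78254', '23091']
```

The explicit sort is a precaution. If the list already arrives ordered by score, sorting leaves the order unchanged.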