Uploading Data

Data can be uploaded in batches or as single documents.

Uploading a Single Document

To upload a document into a Collection –

Each Hyperspace data point must be a Python dictionary.

Example –

data_point = {"title": "star wars",
              "genres": ["Action", "Adventure", "Sci-Fi"],
              "embedded_description": [0.01, 0.003, ...]}

Use the following to upload a single data point –

hyperspace_client.add_document(data_point, collection_name)

Where –

  • data_point – Contains the data to be uploaded in the structure specified in the data schema configuration file.

  • collection_name – Specifies the name of the Collection into which to load the document.
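As a minimal sketch of the call above, the following helper checks that a data point is a dictionary before passing it to add_document. The StubClient class, the upload_document helper, the "movies" collection name, and the response shape are all assumptions made for illustration so that the snippet runs without a Hyperspace connection. In practice you would pass your real hyperspace_client instead.

```python
class StubClient:
    """Hypothetical stand-in for hyperspace_client so the sketch runs offline."""

    def __init__(self):
        self.uploaded = []

    def add_document(self, data_point, collection_name):
        # Record the call instead of contacting a server.
        self.uploaded.append((collection_name, data_point))
        return {"status": "ok"}  # assumed response shape, for illustration only


def upload_document(client, data_point, collection_name):
    """Upload one data point after checking that it is a dictionary."""
    if not isinstance(data_point, dict):
        raise TypeError(
            f"data point must be a dict, got {type(data_point).__name__}"
        )
    return client.add_document(data_point, collection_name)


client = StubClient()
doc = {"title": "star wars", "genres": ["Action", "Adventure", "Sci-Fi"]}
upload_document(client, doc, "movies")  # "movies" is a made-up Collection name
```

Failing early with a TypeError keeps a malformed data point from reaching the Collection, where the mistake would be harder to trace.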

Uploading a Batch of Data Points

To upload data in batches, first convert each data point to a document object. The basic data point object in the Hyperspace database is a structure of Python dictionaries.

To upload a batch of documents into a Collection –

For verification purposes, we recommend uploading data to a Collection in batches of many documents, each with the structure specified in the data schema configuration file.

The following code snippet builds a list of documents in a temporary variable named batch and then uploads each batch using –

response = hyperspace_client.add_batch(batch, collection_name)

The following example uploads documents for Hybrid Search in batches of 250. It appends each document to the batch. Once the batch reaches 250 documents, it uploads the batch to the Hyperspace Collection and starts a new one. Any documents left over after the loop are uploaded as a final, smaller batch. Replace documents in the for loop below with code that retrieves the next document to be uploaded.

Copy the following code snippet –

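BATCH_SIZE = 250
batch = []
# documents is any iterable of data point dictionaries
for i, data_point in enumerate(documents):
    batch.append(data_point)
    # Upload once the batch is full, then start a new one
    if (i + 1) % BATCH_SIZE == 0:
        response = hyperspace_client.add_batch(batch, collection_name)
        print(i + 1, response)
        batch.clear()
# Upload any documents left over in a final, smaller batch
if batch:
    response = hyperspace_client.add_batch(batch, collection_name)
# Commit once, after all batches are uploaded
hyperspace_client.commit(collection_name)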

In the above example, Hyperspace assigns each document a random id. To assign ids manually, each document must include a field of type id, as explained in Database schema config file. The id must be of type keyword/string.
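One way to assign ids manually is to add a string id to each document before uploading. The sketch below is illustrative only: the with_ids helper is hypothetical, and the field name "id" is an assumption that must be replaced with the name of the id-type field declared in your own schema configuration file.

```python
import uuid


def with_ids(documents, id_field="id"):
    """Yield copies of the documents, each carrying a string id.

    id_field is an assumption: use the name of the id-type field
    declared in your own schema configuration file.
    """
    for doc in documents:
        doc = dict(doc)  # shallow copy so the caller's dictionaries stay untouched
        if id_field not in doc:
            doc[id_field] = str(uuid.uuid4())  # generate a random string id
        else:
            doc[id_field] = str(doc[id_field])  # ids must be keyword/string
        yield doc


docs = [{"title": "star wars"}, {"title": "alien", "id": 42}]
prepared = list(with_ids(docs))
```

The prepared documents can then be uploaded with add_document or add_batch in the usual way.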

Currently, documents cannot be uploaded to a Collection after a commit. This will change in future versions.
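Because no documents can be added after a commit, it can help to wrap the whole upload in one function that commits exactly once, after the last batch. The following sketch does this with a generator that splits any iterable into batches. The chunked and upload_all helpers, the StubClient class, the "movies" collection name, and the response shape are assumptions made for illustration. Only add_batch and commit come from this page.

```python
from itertools import islice


def chunked(documents, size):
    """Yield lists of at most `size` documents from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(documents)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def upload_all(client, documents, collection_name, batch_size=250):
    """Upload every document in batches, then commit once at the end."""
    responses = []
    for batch in chunked(documents, batch_size):
        responses.append(client.add_batch(batch, collection_name))
    client.commit(collection_name)
    return responses


class StubClient:
    """Hypothetical stand-in for hyperspace_client, used only to run the sketch."""

    def __init__(self):
        self.batches = []
        self.commits = 0

    def add_batch(self, batch, collection_name):
        self.batches.append(list(batch))
        return {"uploaded": len(batch)}  # assumed response shape

    def commit(self, collection_name):
        self.commits += 1


client = StubClient()
documents = ({"title": f"doc {i}"} for i in range(600))
responses = upload_all(client, documents, "movies")
```

Because chunked reads from an iterator, the documents can come from a generator or a file reader and never need to be held in memory all at once.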
