Hyperspace Docs

Uploading Data to a Collection

Data points of all types are uploaded into a Hyperspace Collection as documents and stored under the identifier you specify during upload, as described below. You can upload one document at a time or upload documents in batches, as follows.

Uploading a Single Document

Use the following command to upload a single document –

Python

document = {"category": "product",
            "vec1": [0, 1]}

hyperspace_client.add_document(document, collection_name)

Java

import java.util.List;

Document document = new Document();
document.putAdditionalProperty("category", "product");
// Java has no [0,0,1] array literal; List.of (Java 9+) builds the vector values
document.putAdditionalProperty("vec1", List.of(0, 0, 1));
document.putAdditionalProperty("vec2", List.of(0, 1, 0));
client.addDocument(collectionName, document, true, false);

JavaScript

const document = {"category": "product",
                  "vec1": [0, 0, 1],
                  "vec2": [0, 1, 0]};

await hyperspaceClient.index({
    index: collectionName,
    body: document
});

Where –

  • document – Represents the document to upload. Must be a dictionary whose structure matches the database schema configuration file.

  • collection_name – Specifies the name of the Collection into which to load the document.
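Because add_document expects a dictionary whose structure matches the database schema configuration file, it can help to check documents locally before uploading them. The sketch below is not part of the Hyperspace client. It assumes a hypothetical SCHEMA_FIELDS mapping (field name to expected Python type) and a hypothetical VECTOR_DIMS mapping (vector field to expected length) that you would fill in from your own schema file.

```python
from typing import Any

# Hypothetical stand-ins for what your schema configuration file defines.
SCHEMA_FIELDS = {"category": str, "vec1": list}  # field name -> expected type
VECTOR_DIMS = {"vec1": 2}  # vector field name -> expected length


def validate_document(document: Any) -> list:
    """Return a list of problems; an empty list means the document looks valid."""
    if not isinstance(document, dict):
        return ["document must be a dict"]
    problems = []
    for field, expected_type in SCHEMA_FIELDS.items():
        if field not in document:
            problems.append(f"missing field {field!r}")
        elif not isinstance(document[field], expected_type):
            problems.append(
                f"field {field!r} should be {expected_type.__name__}, "
                f"got {type(document[field]).__name__}"
            )
    for field, dim in VECTOR_DIMS.items():
        value = document.get(field)
        # Only check the length when the field is present and is a list
        if isinstance(value, list) and len(value) != dim:
            problems.append(f"vector {field!r} has length {len(value)}, expected {dim}")
    return problems
```

For example, validate_document({"category": "product", "vec1": [0, 1]}) returns an empty list, so that document can be passed to hyperspace_client.add_document(document, collection_name).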

Assigning Id to a Document

Each document must have a unique identifier in the "_id" field. To set the id of a document manually, add a field named "_id" to the document, as in the following example –

Python

document = {"_id": "1",
            "category": "product",
            "vec1": [0, 1]}

hyperspace_client.add_document(document, collection_name)

Java

import java.util.List;

Document document = new Document();
document.setId("1");
document.putAdditionalProperty("category", "product");
// Java has no [0,0,1] array literal; List.of (Java 9+) builds the vector values
document.putAdditionalProperty("vec1", List.of(0, 0, 1));
document.putAdditionalProperty("vec2", List.of(0, 1, 0));
client.addDocument(collectionName, document, true, false);

JavaScript

const document = {"category": "product",
                  "vec1": [0, 0, 1],
                  "vec2": [0, 1, 0]};

// The id is passed alongside the document rather than inside it
await hyperspaceClient.index({
    id: "1",
    index: collectionName,
    body: document
});

If a document has no "_id" field, an "_id" is assigned automatically.
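Because every "_id" must be unique, a quick local check before uploading can catch collisions early. The find_duplicate_ids helper below is a hypothetical sketch in plain Python, not part of the Hyperspace client. Documents without an "_id" are skipped, since the server assigns those ids automatically.

```python
from collections import Counter


def find_duplicate_ids(documents, id_field="_id"):
    """Return the sorted ids that appear on more than one document.

    Documents without id_field are ignored. Ids are compared as strings,
    matching the string ids used in the examples on this page.
    """
    counts = Counter(str(doc[id_field]) for doc in documents if id_field in doc)
    return sorted(doc_id for doc_id, n in counts.items() if n > 1)
```

For example, find_duplicate_ids([{"_id": "1"}, {"_id": "2"}, {"_id": "1"}]) returns ["1"], which signals that one of those documents needs a different id before upload.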

Uploading a Batch of Documents

Data can be uploaded in batches by first converting each data point to a document object. The basic data point object for the Hyperspace database is a document of type dictionary.

For verification purposes, we recommend uploading data to a Collection in batches of documents, each of which has the structure specified in the database schema configuration file.

To upload a batch of documents into a Collection, build a list of documents in a temporary variable named batch and then upload it using –

Python

hyperspace_client.add_batch(batch, collection_name)

Java

hyperspaceClient.addBatch(batch, collectionName);

JavaScript

await hyperspaceClient.addBatch(batch, collectionName);

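The following example uploads documents in batches of 250. Each document is added to the batch, and once the batch reaches 250 documents, it is uploaded to the Hyperspace Collection. Any remaining documents are uploaded as a final, smaller batch, and the Collection is then committed.

Copy the following code snippet –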

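Python

BATCH_SIZE = 250
batch = []
collection_name = "new_collection"
for i, document in enumerate(documents):
    batch.append(document)
    # Upload each time the batch reaches BATCH_SIZE documents
    if (i + 1) % BATCH_SIZE == 0:
        response = hyperspace_client.add_batch(batch, collection_name)
        batch.clear()

# Upload the remaining documents (fewer than BATCH_SIZE)
if batch:
    response = hyperspace_client.add_batch(batch, collection_name)
# Commit once, after all data has been uploaded
hyperspace_client.commit(collection_name)

Java

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final int batchSize = 250;
// documents is the List<Document> to upload
List<Document> batch = new ArrayList<>();
List<CompletableFuture<?>> futures = new ArrayList<>();

for (int i = 0; i < documents.size(); i++) {
    batch.add(documents.get(i));
    if ((i + 1) % batchSize == 0) {
        // addBatch returns a CompletableFuture, so pass a copy before clearing the batch
        futures.add(hyperspaceClient.addBatch(new ArrayList<>(batch), collectionName));
        batch.clear();
    }
}

if (!batch.isEmpty()) {
    futures.add(hyperspaceClient.addBatch(new ArrayList<>(batch), collectionName));
}
// Wait for all uploads to finish, then commit
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
hyperspaceClient.commit(collectionName).join();

JavaScript

const batchSize = 250;
let batch = [];

// A for...of loop (unlike forEach) allows each upload to be awaited
for (const [index, dataPoint] of documents.entries()) {
    batch.push(dataPoint);
    if ((index + 1) % batchSize === 0) {
        await hyperspaceClient.addBatch(batch, collectionName);
        batch = [];
    }
}

// Upload the remaining documents, not the whole documents array
if (batch.length > 0) {
    await hyperspaceClient.addBatch(batch, collectionName);
}
await hyperspaceClient.commit(collectionName);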

Where –

  • document – Represents the document to upload. Must be a dictionary whose structure matches the database schema configuration file.

  • BATCH_SIZE – Specifies the number of documents in a batch.

  • commit – Required for vector search only. Call commit once, after the data upload is complete.

With this method, each document is assigned an automatic identifier.
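The counter-and-modulo loops above can also be written with a small generator that slices the document list into fixed-size chunks, which removes the separate upload of leftover documents after the loop. This is a sketch in plain Python: iter_batches is a hypothetical helper, and the commented usage reuses the add_batch and commit calls shown above.

```python
from itertools import islice
from typing import Iterable, Iterator, List

BATCH_SIZE = 250


def iter_batches(documents: Iterable[dict], batch_size: int = BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most batch_size documents.

    The final list may be shorter; it holds the leftover documents that the
    loops above upload separately after the loop ends.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(documents)
    # islice takes up to batch_size items; an empty list means we are done
    while batch := list(islice(iterator, batch_size)):
        yield batch


# Hypothetical usage with the client calls shown on this page:
# for batch in iter_batches(documents):
#     hyperspace_client.add_batch(batch, collection_name)
# hyperspace_client.commit(collection_name)
```

On Python 3.12 and later, itertools.batched(documents, 250) yields tuples of the same sizes and could replace this helper.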

Tuning the batch size can improve upload speed. Larger batches upload faster, but if an upload fails (for example, because of a mismatch between a document and the data schema), the whole batch must be re-uploaded.
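If a batch fails because one document does not match the schema, re-uploading the same batch unchanged fails again. One way to isolate the offending documents is bisection: split the failing batch in half and retry each half recursively until the rejected documents are found on their own. The sketch below is not part of the Hyperspace client. It assumes upload is a callable such as lambda b: hyperspace_client.add_batch(b, collection_name) that raises an exception when a batch is rejected; confirm that behavior for your client version.

```python
from typing import Callable, List, Sequence


def upload_with_bisection(batch: Sequence[dict], upload: Callable[[list], object]) -> List[dict]:
    """Upload batch; on failure, split it in half and retry each half.

    Returns the documents that were still rejected when uploaded alone.
    Assumes upload() raises an exception when a batch is rejected.
    """
    if not batch:
        return []
    try:
        upload(list(batch))
        return []
    except Exception:  # broad on purpose: the client's exception type is not documented here
        if len(batch) == 1:
            return [batch[0]]  # this single document is the problem
        middle = len(batch) // 2
        return (upload_with_bisection(batch[:middle], upload)
                + upload_with_bisection(batch[middle:], upload))
```

The cost is a few extra upload calls for a failing batch, in exchange for uploading every valid document and returning only the ones that need fixing.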

To assign ids to documents manually, copy the following code snippet –

Python

BATCH_SIZE = 250
batch = []
for i, data_point in enumerate(documents):
    # "Id" stands for the id field set in the database schema configuration file
    data_point["Id"] = str(i)
    batch.append(data_point)
    if (i + 1) % BATCH_SIZE == 0:
        response = hyperspace_client.add_batch(batch, collection_name)
        batch.clear()

if batch:
    response = hyperspace_client.add_batch(batch, collection_name)
hyperspace_client.commit(collection_name)

Java

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final int batchSize = 250;
List<Document> batch = new ArrayList<>();
List<CompletableFuture<?>> futures = new ArrayList<>();

for (int i = 0; i < documents.size(); i++) {
    Document document = documents.get(i);
    // Use the loop index as the document id
    document.setId(String.valueOf(i));

    batch.add(document);
    if ((i + 1) % batchSize == 0) {
        futures.add(hyperspaceClient.addBatch(new ArrayList<>(batch), collectionName));
        batch.clear();
    }
}

if (!batch.isEmpty()) {
    futures.add(hyperspaceClient.addBatch(new ArrayList<>(batch), collectionName));
}
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
hyperspaceClient.commit(collectionName).join();

JavaScript

const batchSize = 250;
let batch = [];

for (const [index, dataPoint] of documents.entries()) {
    // Use the loop index as the document id
    dataPoint["Id"] = String(index);
    batch.push(dataPoint);
    if ((index + 1) % batchSize === 0) {
        await hyperspaceClient.addBatch(batch, collectionName);
        batch = [];
    }
}

if (batch.length > 0) {
    await hyperspaceClient.addBatch(batch, collectionName);
}
await hyperspaceClient.commit(collectionName);

Where –

  • i – The loop index, converted to a string and assigned as the identifier of the document being uploaded. The identifier must be unique per Collection; you can use any identifier as long as it is unique.

This step is optional. If no id field is defined in the database schema configuration file, an id is assigned automatically during upload.
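Using the loop index as the id works for a single upload into an empty Collection, but the ids restart at "0" every time the loop runs, so a second upload into the same Collection would produce ids that collide with those already stored. A minimal alternative, sketched below under the assumption that any unique string is acceptable as an id, is a random UUID per document, or a deterministic UUID derived from a stable source key so that re-uploading the same record yields the same id. The helper names, the key_field parameter, and the namespace value are hypothetical.

```python
import uuid

# Hypothetical namespace for deterministic ids; any fixed UUID you choose works.
ID_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def random_id() -> str:
    """Return a random id that is unique with overwhelming probability."""
    return str(uuid.uuid4())


def stable_id(source_key: str) -> str:
    """Return the same id every time for the same source key (UUID version 5)."""
    return str(uuid.uuid5(ID_NAMESPACE, source_key))


def assign_ids(documents, id_field="Id", key_field=None):
    """Set id_field on every document that does not already have one.

    If key_field is given, the id is derived from that field so re-runs are
    repeatable; otherwise a random UUID is used.
    """
    for doc in documents:
        if id_field not in doc:
            doc[id_field] = stable_id(str(doc[key_field])) if key_field else random_id()
    return documents
```

Deterministic ids are useful when the same source records may be uploaded again; random ids are simpler when each record is uploaded only once.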


Last updated 10 months ago

Id – Represents the id field of the documents. This field should be set in the Database Schema Configuration file.