
Uploading Data

Data can be uploaded in batches or as single documents.

Uploading a Single Document

To upload a document into a Collection –

Hyperspace documents must be of type dictionary.

Use the following to upload a single document –

hyperspace_client.add_document(document, collection_name)

# Python
document = {"title": "star wars",
            "genres": ["Action", "Adventure", "Sci-Fi"],
            "embedded_description": [0.01, 0.003, ...]}

hyperspace_client.add_document(document, collection_name)

// Java
import java.util.List;

Document document = new Document();
document.putAdditionalProperty("title", "star wars");
document.putAdditionalProperty("genres", List.of("Action", "Adventure", "Sci-Fi"));
document.putAdditionalProperty("embedded_description", List.of(0.01, 0.003, ...));

hyperspaceClient.updateDocument(collectionName, document, true, false);

// JavaScript
const document = {"title": "star wars",
                  "genres": ["Action", "Adventure", "Sci-Fi"],
                  "embedded_description": [0.01, 0.003, ...]};

hyperspaceClient.addDocument(document, collectionName);

Where –

  • document – Contains the data to be uploaded in the structure specified in the data schema configuration file.

  • collection_name – Specifies the name of the Collection into which to load the document.
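
Because documents must be dictionaries, it can help to check the type before calling add_document. The following is a minimal sketch, not part of the Hyperspace client: upload_document is a hypothetical helper, and StubClient is a stand-in that only mimics the add_document(document, collection_name) call shown above so that the example runs without a server. The collection name "movies" and the response shape are also assumptions.

```python
class StubClient:
    """Hypothetical stand-in for a Hyperspace client; records calls instead of sending them."""

    def __init__(self):
        self.uploaded = []

    def add_document(self, document, collection_name):
        self.uploaded.append((collection_name, document))
        return {"status": "ok"}  # assumed response shape, for illustration only


def upload_document(client, document, collection_name):
    """Upload one document after checking that it is a dictionary."""
    if not isinstance(document, dict):
        raise TypeError(f"document must be a dict, got {type(document).__name__}")
    return client.add_document(document, collection_name)


client = StubClient()
document = {"title": "star wars", "genres": ["Action", "Adventure", "Sci-Fi"]}
upload_document(client, document, "movies")
```

With a real client, the same helper could be called with hyperspace_client in place of the stub.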

Uploading a Batch of Documents

Data can be uploaded in batches by converting the documents to document objects before uploading. The basic data point object for the Hyperspace database is a structure of Python dictionaries.

To upload a batch of documents into a Collection –

We recommend that you upload data to a Collection in batches of many documents each, where every document has the structure specified in the data schema configuration file.

The following code snippet builds a list of documents in a temporary variable named batch and then uploads each batch using –

# Python
hyperspace_client.add_batch(batch, collection_name)

// Java
hyperspaceClient.addBatch(batch, collectionName);

// JavaScript
hyperspaceClient.addBatch(batch, collectionName);

The following example uploads documents in batches of 250 for Hybrid Search. Each document is appended to a batch. Once a batch reaches 250 documents, it is uploaded to the Hyperspace Collection and the batch is emptied. Any documents remaining after the loop are uploaded as a final, smaller batch. Replace the loop over documents with code that retrieves the next document to be uploaded.

Copy the following code snippet -

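# Python
BATCH_SIZE = 250
batch = []

for i, data_point in enumerate(documents):
    batch.append(data_point)
    # Upload whenever the batch is full, then start a new one.
    if (i + 1) % BATCH_SIZE == 0:
        response = hyperspace_client.add_batch(batch, collection_name)
        print(i + 1, response)
        batch.clear()

# Upload any documents left over from the last partial batch.
if batch:
    response = hyperspace_client.add_batch(batch, collection_name)

# Commit once, after all batches have been uploaded.
hyperspace_client.commit(collection_name)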
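
// Java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final int batchSize = 250;
List<DataPoint> batch = new ArrayList<>();
List<CompletableFuture<?>> futures = new ArrayList<>();

for (int i = 0; i < documents.size(); i++) {
    batch.add(documents.get(i));
    if ((i + 1) % batchSize == 0) {
        // Copy the batch so that clearing it does not affect the pending upload.
        List<DataPoint> batchCopy = new ArrayList<>(batch);
        futures.add(hyperspaceClient.addBatch(batchCopy, collectionName));
        batch.clear();
    }
}

// Upload any documents left over from the last partial batch.
if (!batch.isEmpty()) {
    futures.add(hyperspaceClient.addBatch(new ArrayList<>(batch), collectionName));
}
// Wait for all uploads to finish before committing.
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
hyperspaceClient.commit(collectionName).join();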
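
// JavaScript
const batchSize = 250;
let batch = [];

// A for...of loop keeps await valid (inside an async function or an ES module).
for (const [index, dataPoint] of documents.entries()) {
    batch.push(dataPoint);
    if ((index + 1) % batchSize === 0) {
        await hyperspaceClient.addBatch(batch, collectionName);
        batch = [];
    }
}

// Upload any documents left over from the last partial batch.
if (batch.length > 0) {
    await hyperspaceClient.addBatch(batch, collectionName);
}
await hyperspaceClient.commit(collectionName);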
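
The batching logic above can also be separated from the upload call. The following is a minimal sketch under that assumption: iter_batches is a hypothetical helper (not part of the Hyperspace client) that splits any iterable of documents into lists of at most batch_size items, and the sample documents are invented for illustration.

```python
from itertools import islice


def iter_batches(documents, batch_size=250):
    """Yield lists of at most batch_size documents from any iterable."""
    iterator = iter(documents)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical sample data: 600 small dictionaries.
documents = ({"title": f"doc {i}"} for i in range(600))
batch_sizes = [len(batch) for batch in iter_batches(documents, 250)]
print(batch_sizes)  # [250, 250, 100]
```

Each yielded batch could then be passed to hyperspace_client.add_batch(batch, collection_name), followed by a single commit after the loop. Because the helper accepts any iterable, the documents do not all need to be held in memory at once.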

Currently, it is not possible to upload additional documents after a commit. This is planned to change in a future version.
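
Given this restriction, client code may want to fail early instead of attempting an upload after a commit. The following is a minimal sketch of one way to do that, not a feature of the Hyperspace client: GuardedUploader is a hypothetical wrapper, and StubClient is a stand-in that only mimics the add_batch and commit calls used on this page so that the example runs without a server.

```python
class StubClient:
    """Hypothetical stand-in that mimics add_batch and commit without a server."""

    def __init__(self):
        self.batches = []
        self.committed = []

    def add_batch(self, batch, collection_name):
        self.batches.append((collection_name, list(batch)))

    def commit(self, collection_name):
        self.committed.append(collection_name)


class GuardedUploader:
    """Hypothetical wrapper that rejects uploads once its collection is committed."""

    def __init__(self, client, collection_name):
        self.client = client
        self.collection_name = collection_name
        self.is_committed = False

    def add_batch(self, batch):
        if self.is_committed:
            raise RuntimeError("collection already committed; uploads are not allowed")
        return self.client.add_batch(batch, self.collection_name)

    def commit(self):
        result = self.client.commit(self.collection_name)
        self.is_committed = True
        return result


uploader = GuardedUploader(StubClient(), "movies")
uploader.add_batch([{"title": "star wars"}])
uploader.commit()
```

Any further call to uploader.add_batch raises RuntimeError locally, before a request is made. The guard only tracks commits made through the wrapper itself.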


Last updated 11 months ago

In the above examples, Hyperspace assigns each document a random id. If you want to assign ids manually, each document must include an id type field, as explained in Database schema config file. The id must be of type keyword/string.
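
Manual id assignment could be sketched as follows. This is an illustration under assumptions, not the documented method: with_ids is a hypothetical helper, the field name "id" is assumed (the actual name is whatever the schema configuration declares as the id type field), and UUIDs are just one possible choice of string id.

```python
import uuid


def with_ids(documents, id_field="id"):
    """Return copies of the documents, each with a string id in id_field."""
    prepared = []
    for document in documents:
        prepared_doc = dict(document)  # shallow copy; the input is left unchanged
        if id_field in prepared_doc:
            # The id must be a keyword/string, so existing values are cast with str().
            prepared_doc[id_field] = str(prepared_doc[id_field])
        else:
            prepared_doc[id_field] = str(uuid.uuid4())
        prepared.append(prepared_doc)
    return prepared


documents = [{"title": "star wars"}, {"title": "alien", "id": 42}]
prepared = with_ids(documents)
```

The prepared list could then be uploaded with add_document or add_batch as shown earlier on this page.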