# Uploading Data to a Collection

Data points of all types are uploaded into a Hyperspace Collection as documents and stored according to the identifier you specify during upload, as described below. You can upload documents one at a time or in batches, as follows.

#### Uploading a Single Document

Use the following command to upload a single data point –

```python
hyperspace_client.add_document(data_point, collection_name)
```

**Where –**

* <mark style="color:purple;">data\_point</mark> – Represents the document to upload. Each document must follow the structure defined in the database schema configuration file and must be of **type dictionary**.
* <mark style="color:purple;">collection\_name</mark> – Specifies the name of the Collection into which to load the document.
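
As a minimal sketch, the call above might be used as follows. The field names (`title`, `price`, `embedding`), the collection name, the response shape, and the `_StubClient` class are all hypothetical. The stub only stands in for a real client so that the snippet runs on its own; in practice, `hyperspace_client` is your real client object and the fields must match your schema configuration file.

```python
# Hypothetical stand-in for a real Hyperspace client, used only so this
# sketch runs without a server. It mimics the add_document call shown above.
class _StubClient:
    def __init__(self):
        self.uploaded = []

    def add_document(self, data_point, collection_name):
        if not isinstance(data_point, dict):
            raise TypeError("data_point must be a dictionary")
        self.uploaded.append((collection_name, data_point))
        return {"status": "ok"}  # assumed response shape, for illustration only


hyperspace_client = _StubClient()
collection_name = "products"  # hypothetical collection name

# A document is a plain dictionary whose keys match the schema fields.
data_point = {
    "title": "example item",       # hypothetical field
    "price": 19.99,                # hypothetical field
    "embedding": [0.1, 0.2, 0.3],  # hypothetical vector field
}

response = hyperspace_client.add_document(data_point, collection_name)
```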

#### Uploading a Batch of Documents

To upload data in batches, first convert each data point to a document object. The basic data point object for the Hyperspace database is a document of type dictionary.

**To upload a batch of documents into a Collection –**

For verification purposes, we recommend that you upload data to a Collection in batches of documents, each of which has the structure specified in the data schema configuration file.

Build a list of documents in a temporary variable named `batch`, and then upload each batch using –

```python
response = hyperspace_client.add_batch(batch, collection_name)
```

The following example uploads documents for Hybrid Search in batches of 250. It iterates over a list named `documents` and appends each document to the current batch. Once the batch reaches 250 documents, it is uploaded to the Hyperspace Collection and cleared. Any documents remaining after the loop are uploaded as a final, smaller batch.

**Copy the following code snippet**

```python
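BATCH_SIZE = 250
batch = []
for i, data_point in enumerate(documents):
    batch.append(data_point)
    # Upload the batch once it holds BATCH_SIZE documents, then start a new one
    if (i + 1) % BATCH_SIZE == 0:
        response = hyperspace_client.add_batch(batch, collection_name)
        batch.clear()

# Upload any documents left over in the final, partial batch
if batch:
    response = hyperspace_client.add_batch(batch, collection_name)

# Commit once, after the entire upload is complete (required for vector search only)
hyperspace_client.commit(collection_name)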
```

**Where** –

* <mark style="color:purple;">data\_point</mark> – Represents the document to upload. Each document must follow the structure defined in the database schema configuration file and must be of **type dictionary**.
* <mark style="color:purple;">BATCH\_SIZE</mark> – Specifies the number of documents in a batch.
* <mark style="color:purple;">`commit`</mark> – Required for vector search only. Call `commit` only after the data upload is complete.

In this method, each <mark style="color:purple;">data\_point</mark> is assigned an automatic identifier.

{% hint style="info" %}
Optimizing the batch size can improve the data upload speed. Larger batches are uploaded faster, but if an upload fails (for example, because of a mismatch between a document and the data schema), the whole batch needs to be re-uploaded.
{% endhint %}
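
The trade-off described in the hint can be sketched with a small helper that splits the documents into batches and keeps track of the batches that failed, so that only those need to be uploaded again. The helper names `iter_batches` and `upload_in_batches` are hypothetical, and the sketch assumes that a failed `add_batch` call raises an exception. Check how your client actually reports errors before relying on this.

```python
from itertools import islice


def iter_batches(documents, batch_size):
    """Yield successive lists of at most batch_size documents."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(documents)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def upload_in_batches(client, documents, collection_name, batch_size=250):
    """Upload documents batch by batch and return the batches that failed.

    Assumption: a failed upload raises an exception. Adapt the except
    clause to the way your client reports errors.
    """
    failed = []
    for batch in iter_batches(documents, batch_size):
        try:
            client.add_batch(batch, collection_name)
        except Exception:
            # Keep the whole batch so it can be inspected and re-uploaded
            failed.append(batch)
    return failed
```

With a smaller `batch_size`, each failure affects fewer documents, at the cost of more upload calls. The helper does not call `commit`, so that the commit can be issued once after every batch, including any retried ones, has been uploaded.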

**To manually assign an Id to documents, copy the following code snippet**

```python
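BATCH_SIZE = 250
batch = []
for i, data_point in enumerate(documents):
    # Use the loop index as a unique identifier for this document
    data_point["Id"] = str(i)
    batch.append(data_point)
    # Upload the batch once it holds BATCH_SIZE documents, then start a new one
    if (i + 1) % BATCH_SIZE == 0:
        response = hyperspace_client.add_batch(batch, collection_name)
        batch.clear()

# Upload any documents left over in the final, partial batch
if batch:
    response = hyperspace_client.add_batch(batch, collection_name)

# Commit once, after the entire upload is complete (required for vector search only)
hyperspace_client.commit(collection_name)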
```

**Where** –

* <mark style="color:purple;">Id</mark> – Represents the id field of the documents. The field should be set in the [Database Schema Configuration file](https://docs.hyper-space.io/hyperspace-docs/projects/setting-up/creating-a-database-schema-configuration-file).
* <mark style="color:purple;">i</mark> – Specifies the identifier that you assign to the document that you are uploading, which must be unique per Collection. In this example, the loop index is converted to a string, but you can assign any identifier as long as it's unique.

This step is optional. If no id field is defined in the data schema configuration file, an automatic Id is assigned during the upload.
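
Because manually assigned identifiers must be unique per Collection, it can help to check for duplicates before uploading. The following is a minimal sketch of such a check; the `find_duplicate_ids` helper is hypothetical and only inspects the documents you pass to it, not documents that are already stored in the Collection.

```python
from collections import Counter


def find_duplicate_ids(documents, id_field="Id"):
    """Return the sorted list of id values that appear more than once."""
    counts = Counter(doc[id_field] for doc in documents if id_field in doc)
    return sorted(value for value, n in counts.items() if n > 1)


# Example input with one repeated identifier
documents = [{"Id": "0"}, {"Id": "1"}, {"Id": "1"}]
duplicates = find_duplicate_ids(documents)
```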

