Hyperspace Docs

Building a Lexical Search Query

Describes how to build and run a Lexical Search query.

Last updated 10 months ago

Lexical Search, or Classic Search, is a fundamental approach to retrieving information by matching keyword, integer, float, date, and similar values. It assesses the similarity of each document in a collection to a query document, using a score function that you define, which assigns a similarity score to each (document, query) pair. It then selects the documents with the top scores and returns their IDs to the user. (See the query flow for a detailed discussion of score functions.)
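
The idea described above can be sketched in plain Python. This is only an illustration of "score every (document, query) pair, then return the IDs of the top scorers"; it does not use the Hyperspace client, and the names count_matching_fields, top_k_ids, and the sample collection are hypothetical.

```python
import heapq


def count_matching_fields(params, doc):
    """Toy score function: one point for every query field the document matches."""
    return float(sum(1 for key, value in params.items() if doc.get(key) == value))


def top_k_ids(params, collection, k, score_function=count_matching_fields):
    """Score every (document, query) pair and return the IDs of the k best."""
    scored = ((score_function(params, doc), doc_id)
              for doc_id, doc in collection.items())
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


# Hypothetical in-memory collection, keyed by document ID.
collection = {
    "a": {"name": "John", "city": "Paris"},
    "b": {"name": "John", "city": "Rome"},
    "c": {"name": "Ann", "city": "Oslo"},
}
query_document = {"name": "John", "city": "Paris"}

print(top_k_ids(query_document, collection, k=2))  # ['a', 'b']
```

Document "a" matches two fields, "b" matches one, and "c" matches none, so the two IDs returned are "a" and "b".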

Steps to build a Lexical search query

  • Step 1 - Creating the Scoring Function (Optional)

  • Step 2 - Specifying the Document for Lexical Search

  • Step 3 - Defining the Lexical Query Schema

  • Step 4 - Running the Lexical Search Query

  • Step 5 - Viewing Results

Creating the Scoring Function (Optional)

Create a Python syntax query function that describes your own scoring logic, with higher scores indicating a stronger match. The structure can be similar to the following example.

The following Python scoring function initially sets the score to 0. If there is a match of the country (or any country in a list), the score is set to 5. If there is a match of both the country and the street, the score is set to 10.

def score_function(params, doc):
    score = 0.0                # default: no match
    if match('country'):
        score = 5.0            # the country matches
        if match('street'):
            score = 10.0       # both the country and the street match
    return score
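
Before uploading a score function, it can be useful to dry-run it locally. The sketch below assumes that match(field) is true when the field has the same value in the query and in the document; that is an assumption about the engine's behavior, and the match and run_locally helpers here are hypothetical stand-ins, not part of the Hyperspace API.

```python
# Holds the (query, document) pair that the stand-in match() looks at.
_context = {}


def match(field):
    """Local stand-in for the engine's match(): equal values on both sides."""
    params, doc = _context["params"], _context["doc"]
    return field in params and field in doc and params[field] == doc[field]


def score_function(params, doc):
    score = 0.0
    if match('country'):
        score = 5.0
        if match('street'):
            score = 10.0
    return score


def run_locally(params, doc):
    """Set the pair under test, then call the score function as the engine would."""
    _context["params"], _context["doc"] = params, doc
    return score_function(params, doc)


query_document = {"country": "US", "street": "Main St"}
print(run_locally(query_document, {"country": "US", "street": "Main St"}))  # 10.0
print(run_locally(query_document, {"country": "US", "street": "Oak Ave"}))  # 5.0
print(run_locally(query_document, {"country": "CA", "street": "Main St"}))  # 0.0
```

The last case shows that a street match alone does not score, because the street check is nested inside the country check.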

Here is another example that illustrates a more complex scoring mechanism that can be used to weight different attributes for recommendations.

def score_function_recommendation(params, doc):
    score = 0.0
    boost = 1.0
    if match('genres') and match('countries') and not match('title'):
        score = 1.0
        if doc['rating'] > 7.0:
            boost = 2.0
    return boost * score

This scoring function recommends a movie only if at least one element of each of the fields 'genres' and 'countries' matches and the title does not match. Movies with a rating above 7.0 receive a boost.
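
To make the rule concrete, here is a plain-Python rendering of the same logic applied to a few made-up candidates. It replaces match with explicit set intersections, which is an assumption about how list fields are compared; overlaps, recommendation_score, and the sample data are all hypothetical.

```python
def overlaps(params, doc, field):
    """Assumed list semantics: the two lists share at least one element."""
    return bool(set(params.get(field, [])) & set(doc.get(field, [])))


def recommendation_score(params, doc):
    score = 0.0
    boost = 1.0
    same_title = params.get("title") == doc.get("title")
    if (overlaps(params, doc, "genres")
            and overlaps(params, doc, "countries")
            and not same_title):
        score = 1.0
        if doc["rating"] > 7.0:
            boost = 2.0
    return boost * score


query_document = {"title": "Movie Q", "genres": ["sci-fi", "horror"], "countries": ["US"]}
candidates = [
    {"title": "Movie A", "genres": ["sci-fi"], "countries": ["US", "UK"], "rating": 8.4},
    {"title": "Movie B", "genres": ["horror"], "countries": ["US"], "rating": 7.0},
    {"title": "Movie Q", "genres": ["sci-fi"], "countries": ["US"], "rating": 8.5},
    {"title": "Movie D", "genres": ["romance"], "countries": ["FR"], "rating": 8.3},
]

scores = {doc["title"]: recommendation_score(query_document, doc) for doc in candidates}
print(scores)  # {'Movie A': 2.0, 'Movie B': 1.0, 'Movie Q': 0.0, 'Movie D': 0.0}
```

Movie A is boosted, Movie B qualifies without a boost because its rating is not above 7.0, Movie Q is excluded because its title matches the query, and Movie D shares no genre or country.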

You can specify the score function to be used for the Lexical Search as follows:

Python:
hyperspace_client.set_function(score_function_recommendation,
                               collection_name=collection_name,
                               function_name='score_function')

Java:
// Each line ends with \n so that the Python source keeps its line structure.
String function = "def score_function_recommendation(params, doc):\n" +
"    score = 0.0\n" +
"    boost = 1.0\n" +
"    if match('genres') and match('countries') and not match('title'):\n" +
"        score = 1.0\n" +
"        if doc['rating'] > 7.0:\n" +
"            boost = 2.0\n" +
"    return boost * score\n";
hyperspaceClient.setFunction(collectionName, "score_function", function);

JavaScript:
// A template literal keeps the Python source on separate lines.
const scoreFunction = `
def score_function_recommendation(params, doc):
    score = 0.0
    boost = 1.0
    if match('genres') and match('countries') and not match('title'):
        score = 1.0
        if doc['rating'] > 7.0:
            boost = 2.0
    return boost * score
`;
// Argument order follows the original snippet; check it against the client reference.
await hyperspaceClient.setFunction(scoreFunction, collectionName, 'score_function');
  • score_function_recommendation – Specifies the function containing the scoring logic to be used in the search query, as defined above.

  • collection_name – Specifies the name of the Collection that contains the data to be searched.

  • function_name – Assigns the score function a local object name to be used later when running the search query.

You can also load the score function from a file, using the following command:

Python:
hyperspace_client.set_function(score_function_filename,
                               collection_name=collection_name,
                               function_name='score_function')

Java:
String function = Files.readString(Paths.get(score_function_filename));
hyperspaceClient.setFunction(collectionName, "score_function", function);

JavaScript:
// Argument order follows the original snippet; check it against the client reference.
await hyperspaceClient.setFunction(scoreFunctionFilename, collectionName, 'score_function');
  • score_function_filename – Specifies the name and path of the file containing the logic to be used in the search query, which is described in step #1 above. This loads the contents of this file to a local object.

  • collection_name – Specifies the name of the Collection that contains the data to be searched.

  • function_name – Assigns the score function a local object name to be used later when running the search query.
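
Because the score function travels as a file, a quick local check before uploading can catch typos early. The following is a minimal sketch using only the Python standard library; check_score_function_file is a hypothetical helper, and the expectation that the function takes exactly (params, doc) is taken from the examples on this page rather than from a documented requirement.

```python
import ast
import tempfile
from pathlib import Path


def check_score_function_file(path, expected_name):
    """Return True if the file defines expected_name(params, doc).

    Raises SyntaxError if the file is not valid Python.
    """
    tree = ast.parse(Path(path).read_text(encoding="utf-8"))
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == expected_name:
            return [arg.arg for arg in node.args.args] == ["params", "doc"]
    return False


source = (
    "def score_function_recommendation(params, doc):\n"
    "    score = 0.0\n"
    "    boost = 1.0\n"
    "    if match('genres') and match('countries') and not match('title'):\n"
    "        score = 1.0\n"
    "        if doc['rating'] > 7.0:\n"
    "            boost = 2.0\n"
    "    return boost * score\n"
)

with tempfile.TemporaryDirectory() as tmp:
    score_function_filename = Path(tmp) / "score_function.py"
    score_function_filename.write_text(source, encoding="utf-8")
    found = check_score_function_file(score_function_filename, "score_function_recommendation")
    missing = check_score_function_file(score_function_filename, "some_other_name")

print(found, missing)  # True False
```

The check only parses the file; it never executes it, so the engine-provided match does not need to exist locally.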

Specifying the Document for Lexical Search

If you want to use a document from the database as the query, use the get_document function. Specify the collection name and the identifier of the document (for example, '47') whose data you want to find similarities to. The retrieved document is placed in a local object named document.

Python:
document = hyperspace_client.get_document(document_id='47',
                                          collection_name=collection_name)

Java:
String documentId = "47";
Object document = client.getDocument(collectionName, documentId, false);

JavaScript:
let documentId = "47";
const {document} = await hyperspaceClient.get(documentId, collectionName);
console.log(document);

Running the Lexical (Classic) Search Query

If you are using a Python score function, use the following code snippet to run the Lexical Search query:

Python:
params = {
    "name": "John"
}
results = hyperspace_client.search(params,
                                   size=10,
                                   collection_name=collection_name,
                                   function_name='score_function')

Python (passing a query body and a fields list):
query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"name": "John"}}
            ]
        }
    }
}
results = hyperspace_client.search({'params': query},
                                   size=10,
                                   collection_name=collection_name,
                                   function_name='score_function',
                                   fields=["title", "date"])

JavaScript:
let size = 10;
let params = {
    "name": "John"
};
const functionName = 'score_function';
await hyperspaceClient.search(collectionName, size, params, functionName);

Where:

  • document – Specifies the query document for the similarity search (params in the snippets above) and the multiplier of the return score, as described in Step 3, Defining the Lexical Query Schema.

  • size – Specifies the number of results to return.

  • function_name – Specifies the scoring function to be used in the Lexical Search query as described in Step 1, Creating the Scoring Function.

  • collection_name – Specifies the Collection in which to search.
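
The parameters listed above can be wrapped in a small helper that validates the inputs before forwarding them. This is a sketch only: run_lexical_search and RecordingClient are hypothetical names, the stand-in client merely records the call instead of contacting a server, and the keyword names are the ones used in the Python snippet on this page.

```python
def run_lexical_search(client, params, collection_name,
                       function_name="score_function", size=10):
    """Validate the inputs, then forward them with the keyword names shown above."""
    if not isinstance(params, dict) or not params:
        raise ValueError("params must be a non-empty dict of field values")
    if not isinstance(size, int) or size < 1:
        raise ValueError("size must be a positive integer")
    return client.search(params,
                         size=size,
                         collection_name=collection_name,
                         function_name=function_name)


class RecordingClient:
    """Hypothetical stand-in for the real client: it only records the call."""

    def __init__(self):
        self.last_call = None

    def search(self, params, size, collection_name, function_name):
        self.last_call = {
            "params": params,
            "size": size,
            "collection_name": collection_name,
            "function_name": function_name,
        }
        return self.last_call


client = RecordingClient()
call = run_lexical_search(client, {"name": "John"}, "my_collection")
print(call["size"], call["function_name"])  # 10 score_function
```

With the real client, the same helper could be called with hyperspace_client in place of the stand-in, provided its search method accepts these keyword arguments.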

Running the Lexical (Classic) Search Query in DSL Syntax

Alternatively, if you use DSL syntax, use the following code snippet:

Python:
query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"name": "John"}}
            ]
        }
    }
}
results = hyperspace_client.dsl_search(query,
                                       size=10,
                                       collection_name=collection_name)

Java:
String queryJson =
        "{" +
        "  \"query\": {" +
        "    \"bool\": {" +
        "      \"must\": [" +
        "        {" +
        "          \"term\": {" +
        "            \"name\": \"John\"" +
        "          }" +
        "        }" +
        "      ]" +
        "    }" +
        "  }" +
        "}";
JsonObject query = JsonParser.parseString(queryJson).getAsJsonObject();
Object response = client.dslSearch(collectionName, 10, query);

JavaScript:
let size = 10;
let query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"name": "John"}}
            ]
        }
    }
};
await hyperspaceClient.dslSearch(collectionName, size, query);

Where query is your query logic. See the example below.

Python:
query = {
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "genres": "genres_value"
              }
            },
            {
              "term": {
                "adult": "adult_value"
              }
            },
            {
              "bool": {
                "must_not": [
                  {
                    "term": {
                      "title": "title_value"
                    }
                  }
                ]
              }
            }
          ],
          "should": [
            {
              "range": {
                "rating": {
                  "gt": 7.0
                }
              }
            }
          ]
        }
      },
      "boost_mode": "multiply",
      "boost": 2.0
    }
  }
}

Java:
String queryJson = "{ " +
"  \"query\": { " +
"    \"function_score\": { " +
"      \"query\": { " +
"        \"bool\": { " +
"          \"must\": [ " +
"            { " +
"              \"term\": { " +
"                \"genres\": \"genres_value\" " +
"              } " +
"            }, " +
"            { " +
"              \"term\": { " +
"                \"adult\": \"adult_value\" " +
"              } " +
"            }, " +
"            { " +
"              \"bool\": { " +
"                \"must_not\": [ " +
"                  { " +
"                    \"term\": { " +
"                      \"title\": \"title_value\" " +
"                    } " +
"                  } " +
"                ] " +
"              } " +
"            } " +
"          ], " +
"          \"should\": [ " +
"            { " +
"              \"range\": { " +
"                \"rating\": { " +
"                  \"gt\": 7.0 " +
"                } " +
"              } " +
"            } " +
"          ] " +
"        } " +
"      }, " +
"      \"boost_mode\": \"multiply\", " +
"      \"boost\": 2.0 " +
"    } " +
"  } " +
"}";
JsonObject query = JsonParser.parseString(queryJson).getAsJsonObject();
Object response = client.dslSearch(collectionName, 10, query);

JavaScript:
let query = {
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "genres": "genres_value"
              }
            },
            {
              "term": {
                "adult": "adult_value"
              }
            },
            {
              "bool": {
                "must_not": [
                  {
                    "term": {
                      "title": "title_value"
                    }
                  }
                ]
              }
            }
          ],
          "should": [
            {
              "range": {
                "rating": {
                  "gt": 7.0
                }
              }
            }
          ]
        }
      },
      "boost_mode": "multiply",
      "boost": 2.0
    }
  }
};
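
Writing the nested query by hand is error-prone, so one option is to generate it from a few inputs. The helper below is a sketch that reproduces the structure of the example above; build_recommendation_query and its parameter names are hypothetical and are not part of the Hyperspace client.

```python
import json


def build_recommendation_query(must_terms, must_not_terms, range_field, greater_than, boost):
    """Build a function_score query with the same shape as the example above."""
    must = [{"term": {field: value}} for field, value in must_terms.items()]
    if must_not_terms:
        excluded = [{"term": {field: value}} for field, value in must_not_terms.items()]
        must.append({"bool": {"must_not": excluded}})
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "must": must,
                        "should": [{"range": {range_field: {"gt": greater_than}}}],
                    }
                },
                "boost_mode": "multiply",
                "boost": boost,
            }
        }
    }


query = build_recommendation_query(
    must_terms={"genres": "genres_value", "adult": "adult_value"},
    must_not_terms={"title": "title_value"},
    range_field="rating",
    greater_than=7.0,
    boost=2.0,
)
print(json.dumps(query, indent=2))
```

The resulting dictionary can be passed to dsl_search in the same way as the hand-written query.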
