Building a Lexical (Classic) Search Query

Describes how to build and run a Classic Search query.

Lexical Search, or Classic Search, is a fundamental approach to retrieving information by matching values such as keywords, integers, floats, and dates. It measures how similar each document in a collection is to a query document, using a score function that you define to assign a similarity score to each (document, query) pair. It then selects the documents with the top scores and returns their IDs to the user. (See the query flow for a detailed discussion of score functions.)
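To make this flow concrete, here is a minimal local sketch of what the search does conceptually: apply a score function to every (document, query) pair, then keep the IDs of the highest-scoring documents. The in-memory collection, the `exact_field_matches` scorer, and `top_k_ids` are hypothetical illustrations, not part of the Hyperspace API. In practice the engine performs this step for you.

```python
import heapq
from typing import Callable, Dict, List

Doc = Dict[str, object]

def exact_field_matches(query: Doc, doc: Doc) -> float:
    """Hypothetical score function: one point per query field whose value is equal in doc."""
    return float(sum(1 for field, value in query.items() if doc.get(field) == value))

def top_k_ids(collection: Dict[str, Doc], query: Doc,
              score_fn: Callable[[Doc, Doc], float], k: int) -> List[str]:
    """Score every (document, query) pair and return the IDs of the k best documents."""
    scored = ((score_fn(query, doc), doc_id) for doc_id, doc in collection.items())
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]

# Hypothetical sample collection keyed by document ID
collection = {
    "1": {"country": "France", "year": 2020},  # both fields match -> 2.0
    "2": {"country": "Italy", "year": 2020},   # year matches -> 1.0
    "3": {"country": "Spain", "year": 2018},   # nothing matches -> 0.0
}
print(top_k_ids(collection, {"country": "France", "year": 2020}, exact_field_matches, k=2))
# prints ['1', '2']
```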

Steps to build a Classic search query

  • Step 1 - Creating the Scoring Function (Optional)

  • Step 2 - Specifying the Document for Similarity Search

  • Step 3 - Defining the Classic Query Schema

  • Step 4 - Running the Classic Search Query

  • Step 5 - Viewing Results

Creating the Scoring Function (Optional)

Create a Python scoring function that describes your own scoring logic, with higher scores indicating a stronger match. It can be structured like the following example.

The following Python scoring function initially sets the score to 0. If the country matches (or any country in a list matches), the score is set to 5. If both the country and the street match, the score is set to 10.

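def score_function(params, doc):
    # match(field) is available when the function runs in Classic Search;
    # it reports whether the field matches between the query and the document.
    score = 0.0
    if match('country'):
        score = 5.0          # country matches
        if match('street'):
            score = 10.0     # country and street both match
    return score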
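Because match is supplied at search time rather than defined in your code, you cannot call score_function directly on your machine. One way to sanity-check the branching offline is to inject a stand-in for match. This sketch assumes match(field) is true when the query and the document share at least one value for that field (consistent with "any country in a list" above); make_match and local_score are hypothetical helpers, and the engine's real matching rules may differ.

```python
from typing import Callable, Dict, List, Union

Value = Union[str, List[str]]

def _as_set(value: Value) -> set:
    # Treat a scalar as a one-element list so "any country in a list" works.
    return set(value) if isinstance(value, list) else {value}

def make_match(params: Dict[str, Value], doc: Dict[str, Value]) -> Callable[[str], bool]:
    """Hypothetical stand-in for match(): True when params and doc share a value for field."""
    def match(field: str) -> bool:
        if field not in params or field not in doc:
            return False
        return bool(_as_set(params[field]) & _as_set(doc[field]))
    return match

def local_score(params: Dict[str, Value], doc: Dict[str, Value]) -> float:
    # Same branching as score_function, with match() injected explicitly.
    match = make_match(params, doc)
    score = 0.0
    if match('country'):
        score = 5.0
        if match('street'):
            score = 10.0
    return score

query = {'country': ['France', 'Belgium'], 'street': 'Rue de Rivoli'}
print(local_score(query, {'country': 'France', 'street': 'Rue de Rivoli'}))  # 10.0
print(local_score(query, {'country': 'Belgium', 'street': 'Main St'}))       # 5.0
print(local_score(query, {'country': 'Spain', 'street': 'Rue de Rivoli'}))   # 0.0
```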

Here is another example that illustrates a more complex scoring mechanism that can be used to weight different attributes for recommendations.

def score_function_recommendation(params, doc):
    score = 0.0
    boost = 1.0
    # Require shared genres and countries, but exclude the same title
    if match('genres') and match('countries') and not match('title'):
        score = 1.0
        # Highly rated movies get double weight
        if doc['rating'] > 7.0:
            boost = 2.0
    return boost * score

This scoring function produces recommendations: a movie receives a score only if at least one element of its "genres" field and at least one element of its "countries" field match the query, and its title does not. Movies with a rating above 7.0 receive a 2x boost.
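To see how the filter and the boost interact, the following sketch ranks a tiny hypothetical catalog with a local copy of the recommendation logic. As before, match is replaced by a stand-in (shares_value) that assumes a field matches when the query and the movie have at least one element in common; the movie data and helper names are invented for illustration only.

```python
from typing import Any, Dict

def _as_set(value: Any) -> set:
    # Scalars (e.g. a title string) become one-element sets; lists are used as-is.
    if value is None:
        return set()
    return set(value) if isinstance(value, (list, tuple, set)) else {value}

def shares_value(params: Dict[str, Any], doc: Dict[str, Any], field: str) -> bool:
    """Hypothetical stand-in for match(): True if the field has a common element."""
    return bool(_as_set(params.get(field)) & _as_set(doc.get(field)))

def local_recommendation_score(params: Dict[str, Any], doc: Dict[str, Any]) -> float:
    # Mirrors score_function_recommendation, with match() replaced by shares_value().
    def match(field: str) -> bool:
        return shares_value(params, doc, field)
    score = 0.0
    boost = 1.0
    if match('genres') and match('countries') and not match('title'):
        score = 1.0
        if doc['rating'] > 7.0:
            boost = 2.0
    return boost * score

query_movie = {'title': 'Movie A', 'genres': ['Comedy', 'Romance'], 'countries': ['France'], 'rating': 8.3}
catalog = {
    'a': {'title': 'Movie A', 'genres': ['Comedy'], 'countries': ['France'], 'rating': 8.3},  # same title
    'b': {'title': 'Movie B', 'genres': ['Comedy'], 'countries': ['France'], 'rating': 7.6},  # boosted
    'c': {'title': 'Movie C', 'genres': ['Romance'], 'countries': ['France'], 'rating': 6.1}, # no boost
    'd': {'title': 'Movie D', 'genres': ['Crime'], 'countries': ['USA'], 'rating': 8.3},      # no overlap
}
scores = {movie_id: local_recommendation_score(query_movie, movie) for movie_id, movie in catalog.items()}
print(scores)  # {'a': 0.0, 'b': 2.0, 'c': 1.0, 'd': 0.0}
```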

To register the score function for use in Classic Search, call set_function as follows:

hyperspace_client.set_function(score_function_recommendation,
                                 collection_name=collection_name,
                                 function_name='score_function')
  • score_function_recommendation – Specifies the function containing the scoring logic to be used in the search query, as described in Step 1 above.

  • collection_name – Specifies the name of the Collection that contains the data to be searched.

  • function_name – Assigns a name to the score function; pass the same name as function_name when running the search query.

You can also load the score function from a file, using the following command:

hyperspace_client.set_function(score_function_filename ,
                                 collection_name=collection_name,
                                 function_name='score_function')
  • score_function_filename – Specifies the name and path of the file containing the scoring logic to be used in the search query, as described in Step 1 above. The contents of this file are loaded into a local object.

  • collection_name – Specifies the name of the Collection that contains the data to be searched.

  • function_name – Assigns a name to the score function; pass the same name as function_name when running the search query.
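If you keep the scoring logic in a file, that file is assumed here to contain the Python source of the function, written the same way as in Step 1. The sketch below writes such a file and checks that it parses as Python before you pass its path as score_function_filename. The file name score_function.py is hypothetical, and a successful parse does not guarantee that Classic Search will accept the function.

```python
import ast
from pathlib import Path

# Hypothetical file name; pass this path as score_function_filename.
score_function_filename = "score_function.py"

SOURCE = '''\
def score_function(params, doc):
    score = 0.0
    if match('country'):
        score = 5.0
        if match('street'):
            score = 10.0
    return score
'''

Path(score_function_filename).write_text(SOURCE)

# Parse (but do not execute) the file: match() is only available at search time.
tree = ast.parse(Path(score_function_filename).read_text(), filename=score_function_filename)
function_names = [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]
print(function_names)  # ['score_function']
```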

Specifying the Document for Similarity Search

To use a document stored in the database as the query, call get_document. Specify the collection name and the identifier of the document (for example, '47') for which you want to find similar documents. The result is stored in a local object named document.

document = hyperspace_client.get_document(document_id='47',
                                            collection_name=collection_name)

Running the Lexical (Classic) Search Query

If you are using a Python score function, run the lexical search query with the following code snippet:

results = hyperspace_client.search({'params': document},
                                   size=5,
                                   function_name='score_function',
                                   collection_name=collection_name)

Where

  • document – Specifies the document for the similarity search and the multiplier of the returned score, as described in Step 3, Defining the Classic Query Schema.

  • size – Specifies the number of results to return.

  • function_name – Specifies the scoring function to be used in the Classic Search query as described in Step 1, Creating the Scoring Function.

  • collection_name – Specifies the Collection in which to search.

Alternatively, you can pass document directly instead of {'params': document} to obtain the same result:

results = hyperspace_client.search(document,
                                   size=5, 
                                   function_name='score_function',               
                                   collection_name=collection_name)                         

Running the Lexical (Classic) Search Query in DSL Syntax

Alternatively, if you use DSL syntax, use the following code snippet:

results = hyperspace_client.search( data_point,
                                    "query": {query_string},
                                    size=5,              
                                    collection_name=collection_name)                         

Here, query_string is your query logic. For example:

{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "genres": "genres_value"
              }
            },
            {
              "term": {
                "adult": "adult_value"
              }
            },
            {
              "bool": {
                "must_not": [
                  {
                    "term": {
                      "title": "title_value"
                    }
                  }
                ]
              }
            }
          ],
          "should": [
            {
              "range": {
                "rating": {
                  "gt": 7.0
                }
              }
            }
          ]
        }
      },
      "boost_mode": "multiply",
      "boost": 2.0
    }
  }
}
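Rather than editing the placeholder values by hand, you can fill them from a query document. The sketch below is a hypothetical helper, build_dsl_query, that returns the same structure as the example above with genres_value, adult_value, and title_value replaced by values from a Python dict. The field names come from the example; the helper only builds the dictionary and does not call the client.

```python
import json
from typing import Any, Dict

def build_dsl_query(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: fill the example DSL query from a query document."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "must": [
                            # Results must share the genre and the adult flag
                            {"term": {"genres": doc["genres"]}},
                            {"term": {"adult": doc["adult"]}},
                            # Exclude documents with the same title as the query document
                            {"bool": {"must_not": [{"term": {"title": doc["title"]}}]}},
                        ],
                        # Optional clause: highly rated documents can score higher
                        "should": [{"range": {"rating": {"gt": 7.0}}}],
                    }
                },
                "boost_mode": "multiply",
                "boost": 2.0,
            }
        }
    }

query_string = build_dsl_query({"genres": "Drama", "adult": False, "title": "Movie A"})
print(json.dumps(query_string, indent=2))
```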
