Building a Lexical Search Query

Describes how to build and run a Lexical Search query.

Lexical Search, or Classic Search, is a fundamental approach to retrieving information by matching values such as keywords, integers, floats, and dates. It assesses the similarity of the documents in a collection to a query document, using a score function that you define, which assigns a similarity score to each (document, query) pair. It then selects the documents with the top scores and returns their IDs to the user. See the query flow for a detailed discussion of score functions.
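The flow described above (score every pair, keep the top scores, return the IDs) can be sketched locally in plain Python. This is only an illustration of the idea, not the engine's implementation; top_k_ids, count_matching_fields, and the sample documents are hypothetical names and data introduced for this example.

```python
import heapq


def top_k_ids(query, documents, score_function, k):
    """Score every (document, query) pair and return the ids of the k best."""
    scored = ((score_function(query, doc), doc_id)
              for doc_id, doc in documents.items())
    best = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [doc_id for _score, doc_id in best]


def count_matching_fields(query, doc):
    """Hypothetical score function: one point per query field the document matches."""
    return float(sum(1 for field, value in query.items()
                     if doc.get(field) == value))


# Made-up collection, keyed by document id.
documents = {
    "1": {"name": "John", "country": "US"},
    "2": {"name": "John", "country": "FR"},
    "3": {"name": "Mary", "country": "DE"},
}
query = {"name": "John", "country": "FR"}

top_ids = top_k_ids(query, documents, count_matching_fields, k=2)
print(top_ids)  # ['2', '1']
```

Document "2" matches both query fields and document "1" matches one, so they are returned in that order.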

Steps to Build a Lexical Search Query

  • Step 1 - Creating the Scoring Function (Optional)

  • Step 2 - Specifying the Document for Similarity Search

  • Step 3 - Defining the Lexical Query Schema

  • Step 4 - Running the Lexical Search Query

  • Step 5 - Viewing Results

Creating the Scoring Function (Optional)

Create a score function in Python syntax that describes your own scoring logic, with higher scores indicating a stronger match. It can be structured like the following example.

The following Python scoring function initially sets the score to 0. If the country matches (or any country in a list matches), the score is set to 5. If both the country and the street match, the score is set to 10.

def score_function(params, doc):
    # Start from no match.
    score = 0.0
    if match('country'):
        # The country matches.
        score = 5.0
        if match('street'):
            # Both the country and the street match.
            score = 10.0
    return score
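To trace this logic without a running collection, the scoring function can be exercised locally with a stand-in for match. The sketch below assumes that match(field) is true when the query and the document share at least one value for that field; make_match is a hypothetical helper written for this example and is not part of the client library, and the sample addresses are made up.

```python
def make_match(params, doc):
    """Hypothetical stand-in for the engine's match(): true when the query
    and the document share at least one value for the given field."""
    def as_set(value):
        if value is None:
            return set()
        if isinstance(value, (list, tuple, set)):
            return set(value)
        return {value}

    def match(field):
        return bool(as_set(params.get(field)) & as_set(doc.get(field)))

    return match


def score_function(params, doc):
    match = make_match(params, doc)  # local stub; the engine supplies match() itself
    score = 0.0
    if match('country'):
        score = 5.0
        if match('street'):
            score = 10.0
    return score


query = {"country": ["US", "CA"], "street": "Main St"}
both = score_function(query, {"country": "US", "street": "Main St"})
country_only = score_function(query, {"country": "CA", "street": "Elm St"})
street_only = score_function(query, {"country": "FR", "street": "Main St"})
print(both, country_only, street_only)  # 10.0 5.0 0.0
```

A street match alone scores 0 because the street check is nested inside the country check.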

Here is another example that illustrates a more complex scoring mechanism that can be used to weight different attributes for recommendations.

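def score_function_recommendation(params, doc):
    score = 0.0
    boost = 1.0
    # Recommend only documents that share a genre and a country but are not the same title.
    if match('genres') and match('countries') and not match('title'):
        score = 1.0
        # Highly rated documents are boosted.
        if doc['rating'] > 7.0:
            boost = 2.0
    return boost * score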

This scoring function recommends a document only when at least one element of each of the fields 'genres' and 'countries' matches and the title does not match. Movies with a rating above 7.0 receive a boost.
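The effect of the three conditions and the boost can be checked on a few made-up records. As before, this is a local sketch: overlaps is a hypothetical stand-in for match that assumes a field matches when the two values share an element, and the catalog and titles are invented for illustration.

```python
def overlaps(a, b):
    """Hypothetical stand-in for match(): do the two values share an element?"""
    def to_set(value):
        return set(value) if isinstance(value, (list, tuple, set)) else {value}
    return bool(to_set(a) & to_set(b))


def score_function_recommendation(params, doc):
    def match(field):
        return overlaps(params[field], doc[field])

    score = 0.0
    boost = 1.0
    if match('genres') and match('countries') and not match('title'):
        score = 1.0
        if doc['rating'] > 7.0:
            boost = 2.0
    return boost * score


# The movie the recommendations are based on (made-up data).
seed = {"title": "Movie A", "genres": ["Horror", "Sci-Fi"], "countries": ["US", "UK"]}
catalog = {
    "a": {"title": "Movie A", "genres": ["Horror", "Sci-Fi"], "countries": ["US", "UK"], "rating": 8.5},
    "b": {"title": "Movie B", "genres": ["Action", "Sci-Fi"], "countries": ["US"], "rating": 8.4},
    "c": {"title": "Movie C", "genres": ["Horror"], "countries": ["UK"], "rating": 6.6},
    "d": {"title": "Movie D", "genres": ["Comedy"], "countries": ["FR"], "rating": 8.3},
}

scores = {doc_id: score_function_recommendation(seed, doc)
          for doc_id, doc in catalog.items()}
print(scores)  # {'a': 0.0, 'b': 2.0, 'c': 1.0, 'd': 0.0}
```

The seed movie itself ("a") is excluded by the title condition, "b" is boosted for its rating, "c" matches without a boost, and "d" shares neither a genre nor a country.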

You can specify the score function to be used for the Lexical Search as follows:

hyperspace_client.set_function(score_function_recommendation,
                                collection_name=collection_name,
                                function_name='score_function')
  • score_function_recommendation – Specifies the function containing the logic to be used in the search query, which is described in step #1 above.

  • collection_name – Specifies the name of the Collection that contains the data to be searched.

  • function_name – Assigns the score function a local object name to be used later when running the search query.

You can also load the score function from a file, using the following command:

hyperspace_client.set_function(score_function_filename,
                                collection_name=collection_name,
                                function_name='score_function')
  • score_function_filename – Specifies the name and path of the file containing the logic to be used in the search query, which is described in step #1 above. This loads the contents of this file to a local object.

  • collection_name – Specifies the name of the Collection that contains the data to be searched.

  • function_name – Assigns the score function a local object name to be used later when running the search query.
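When the score function lives in a file, it may be worth checking that the file parses before handing its path to set_function. The following sketch writes the earlier example to a file and uses Python's standard ast module to list the functions it defines. The file location and the defined_functions helper are assumptions made for this example; the check says nothing about how the engine itself validates the file.

```python
import ast
import tempfile
from pathlib import Path

SCORE_FUNCTION_SOURCE = '''
def score_function(params, doc):
    score = 0.0
    if match('country'):
        score = 5.0
        if match('street'):
            score = 10.0
    return score
'''


def defined_functions(path):
    """Parse the file without executing it (raises SyntaxError if it is not
    valid Python) and return the names of its top-level functions."""
    source = Path(path).read_text(encoding="utf-8")
    tree = ast.parse(source, filename=str(path))
    return [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]


# Hypothetical location; any readable path would do.
score_function_filename = Path(tempfile.gettempdir()) / "score_function.py"
score_function_filename.write_text(SCORE_FUNCTION_SOURCE.lstrip(), encoding="utf-8")

function_names = defined_functions(score_function_filename)
print(function_names)  # ['score_function']
```

Because the file is only parsed and never executed, the undefined name match does not cause an error here.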

Specifying the Document for Similarity Search

To use a document that is already in the database as the query, call get_document. Specify the collection name and the identifier of the document (for example, '47') whose data you want to find similarities to. The result is stored in a local object named document.

document = hyperspace_client.get_document(document_id='47',
                                          collection_name=collection_name)

Running the Lexical (Classic) Search Query

If you are using a Python score function, use the following code snippet to run the Lexical Search query:

params = {
    "name": "John"
}
results = hyperspace_client.search(params,
                                   size=10,
                                   collection_name=collection_name,
                                   function_name='score_function')

Where

  • params – Specifies the document for the similarity search (the params dictionary in the snippet above, or a document retrieved with get_document) and the multiplier of the return score, as described in Step 3, Defining the Lexical Query Schema.

  • size – Specifies the number of results to return.

  • function_name – Specifies the scoring function to be used in the Lexical Search query as described in Step 1, Creating the Scoring Function.

  • collection_name – Specifies the Collection in which to search.

Running the Lexical (Classic) Search Query in DSL Syntax

Alternatively, if you use DSL syntax, use the following code snippet:

query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"name": "John"}}
            ]
        }
    }
}
results = hyperspace_client.dsl_search(query,
                                       size=10,
                                       collection_name=collection_name)

Where query contains your query logic. A more complex example follows.

query = {
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "genres": "genres_value"
              }
            },
            {
              "term": {
                "adult": "adult_value"
              }
            },
            {
              "bool": {
                "must_not": [
                  {
                    "term": {
                      "title": "title_value"
                    }
                  }
                ]
              }
            }
          ],
          "should": [
            {
              "range": {
                "rating": {
                  "gt": 7.0
                }
              }
            }
          ]
        }
      },
      "boost_mode": "multiply",
      "boost": 2.0
    }
  }
}
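To make the structure of such a query easier to follow, here is a small local evaluator for term, range, and bool clauses. It assumes the usual bool-query convention: every must clause has to match, no must_not clause may match, and should clauses are optional. This convention, the helper names evaluate and should_hits, and the sample data are assumptions made for illustration; the sketch does not model the engine's scoring or the function_score boost.

```python
import operator

RANGE_OPS = {"gt": operator.gt, "gte": operator.ge,
             "lt": operator.lt, "lte": operator.le}


def evaluate(clause, doc):
    """Return True if doc satisfies a term, range, or bool clause."""
    kind, body = next(iter(clause.items()))
    if kind == "term":
        field, wanted = next(iter(body.items()))
        value = doc.get(field)
        # A list-valued field matches if any element equals the wanted value.
        return wanted in value if isinstance(value, list) else value == wanted
    if kind == "range":
        field, bounds = next(iter(body.items()))
        value = doc.get(field)
        return value is not None and all(
            RANGE_OPS[op](value, bound) for op, bound in bounds.items())
    if kind == "bool":
        must_ok = all(evaluate(c, doc) for c in body.get("must", []))
        must_not_ok = not any(evaluate(c, doc) for c in body.get("must_not", []))
        return must_ok and must_not_ok
    raise ValueError(f"unsupported clause: {kind}")


def should_hits(bool_clause, doc):
    """Count the optional 'should' clauses of a bool clause that doc satisfies."""
    return sum(evaluate(c, doc) for c in bool_clause["bool"].get("should", []))


# A reduced version of the query above, with concrete (made-up) values.
bool_query = {
    "bool": {
        "must": [
            {"term": {"genres": "Sci-Fi"}},
            {"bool": {"must_not": [{"term": {"title": "Movie A"}}]}},
        ],
        "should": [{"range": {"rating": {"gt": 7.0}}}],
    }
}
docs = {
    "a": {"title": "Movie A", "genres": ["Sci-Fi"], "rating": 8.5},
    "b": {"title": "Movie B", "genres": ["Sci-Fi", "Action"], "rating": 8.4},
    "c": {"title": "Movie C", "genres": ["Sci-Fi"], "rating": 6.6},
    "d": {"title": "Movie D", "genres": ["Comedy"], "rating": 8.3},
}

matched = {doc_id: should_hits(bool_query, doc)
           for doc_id, doc in docs.items() if evaluate(bool_query, doc)}
print(matched)  # {'b': 1, 'c': 0}
```

Documents "b" and "c" pass the must and must_not conditions, and only "b" also satisfies the optional rating range.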
