Building a Lexical (Classic) Search Query

Describes how to build and run a Classic Search query.

Lexical Search, or Classic Search, retrieves information based on keyword and value matching. It assesses the similarity of each document in a collection to a provided document, using a scoring function that you define to assign a similarity score to each document, and then returns the documents with the highest scores. Scoring often uses word statistics to determine relevance. A multiplier (weight) can be applied to the score values to refine the results further.
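The ranking idea described above can be sketched in plain Python, independently of Hyperspace. This is only an illustration of "score every document, then keep the top results": the names top_matches, shared_keywords, and weight are hypothetical, and the toy scoring function stands in for whatever logic you define.

```python
import heapq


def top_matches(query, documents, score_fn, size=5, weight=1.0):
    """Score every document against the query and return the `size` best."""
    scored = []
    for doc_id, doc in documents.items():
        # The weight acts as a simple multiplier on the raw score.
        scored.append((weight * score_fn(query, doc), doc_id))
    # nlargest orders by score first, then by document id on ties.
    return heapq.nlargest(size, scored)


def shared_keywords(query, doc):
    """Toy scoring function: one point per keyword the two documents share."""
    return float(len(set(query["keywords"]) & set(doc["keywords"])))


documents = {
    "a": {"keywords": ["red", "car", "fast"]},
    "b": {"keywords": ["blue", "car"]},
    "c": {"keywords": ["green", "bike"]},
}
query = {"keywords": ["red", "car"]}
ranked = top_matches(query, documents, shared_keywords, size=2, weight=2.0)
```

Here document "a" shares two keywords with the query and "b" shares one, so they are the two results returned, in that order.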

To build a Classic search query –

Step 1 - Creating the Scoring Function
Step 2 - Specifying the Document for Similarity Search
Step 3 - Defining the Classic Query Schema
Step 4 - Running the Classic Search Query
Step 5 - Viewing Results

Creating the Scoring Function

Create a Python file, such as score_function.py, that contains the scoring logic for the classic query. Within this file, you write your own scoring logic, with higher scores indicating a stronger match. The structure can be similar to the following example, which initially sets the score to 0. If the country matches (or any country in a list), the score is set to 5. If both the country and the street match, the score is set to 10.

def score_function(params, doc):
    score = 0.0
    # A country match alone is worth 5.0
    if match('country'):
        score = 5.0
        # A country match plus a street match is worth 10.0
        if match('street'):
            score = 10.0
    return score
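To try this scoring logic without a running collection, it can help to emulate it locally. The sketch below is an assumption-laden stand-in, not the Hyperspace implementation: local_match is a hypothetical helper that treats two fields as matching when they share at least one value, which is one reading of "a match of the country (or any country in a list)".

```python
def _as_set(value):
    """Wrap a scalar in a set; turn a list, tuple, or set into a set."""
    if isinstance(value, (list, tuple, set)):
        return set(value)
    return {value}


def local_match(params, doc, field):
    """Hypothetical stand-in for match(): True when both documents
    have the field and share at least one value for it."""
    left, right = params.get(field), doc.get(field)
    if left is None or right is None:
        return False
    return bool(_as_set(left) & _as_set(right))


def score_function_local(params, doc):
    """Same branching as the example above, using the local stand-in."""
    score = 0.0
    if local_match(params, doc, 'country'):
        score = 5.0
        if local_match(params, doc, 'street'):
            score = 10.0
    return score


query = {'country': 'US', 'street': 'Main'}
same_address = score_function_local(query, {'country': 'US', 'street': 'Main'})
same_country = score_function_local(query, {'country': 'US', 'street': 'Elm'})
other_country = score_function_local(query, {'country': 'FR', 'street': 'Main'})
```

Note that a street match only counts when the country also matches, because the street check is nested inside the country check.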

Here is another Python example that illustrates a more complex scoring mechanism that can be used to weight different attributes for recommendations. The final score combines two scores, score1 and score0, which are initially set to 0.0 and 1.0, respectively, and a boost factor that is initially set to 1.0. If there is a match for both 'genres' and 'adult', and no match for 'title', scoring continues; otherwise, the function returns 0.0. If the document's rating is greater than 7.0, the boost factor is set to 2.0. score1 is then incremented by the rarity sum of 'production_companies'. The sum of score1 and score0 is multiplied by the boost factor to produce the final score.

def score_function_recommendation(params, doc):
    score1 = 0.0
    score0 = 1.0
    boost = 1.0
    # Only score documents that share genres and the adult flag but not the title
    if match('genres') and match('adult') and not match('title'):
        # Highly rated documents get a doubled score
        if doc['rating'] > 7.0:
            boost = 2.0
        if match('genres'):
            score1 += rarity_sum('production_companies')
            sum_score = score1 + score0
            return boost * sum_score
    return 0.0
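The arithmetic of this function can be checked with a small worked example. In the sketch below, the rarity value is passed in as a plain number, because how rarity_sum computes it is not described here; recommendation_score and its parameters are hypothetical names used only for illustration.

```python
def recommendation_score(conditions_met, rating, rarity):
    """Reproduce the arithmetic of score_function_recommendation.

    conditions_met: whether 'genres' and 'adult' match and 'title' does not.
    rarity: an assumed value standing in for rarity_sum('production_companies').
    """
    if not conditions_met:
        return 0.0
    boost = 2.0 if rating > 7.0 else 1.0
    score1 = 0.0 + rarity
    score0 = 1.0
    return boost * (score1 + score0)


high_rated = recommendation_score(True, rating=8.0, rarity=1.5)   # 2.0 * (1.5 + 1.0)
low_rated = recommendation_score(True, rating=6.0, rarity=1.5)    # 1.0 * (1.5 + 1.0)
filtered_out = recommendation_score(False, rating=9.0, rarity=1.5)
```

Because score0 starts at 1.0, a document that passes the initial condition always receives a non-zero score, even when the rarity value is 0.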

Specify that this score function file is to be used for the Classic Search, as follows –

hyperspace_client.set_function(score_function_filename,
                               collection_name=collection_name,
                               function_name='score_function')

score_function_filename – Specifies the name and path of the file that contains the scoring logic for the search query, as created earlier in this step. The contents of this file are loaded to a local object.
collection_name – Specifies the name of the Collection that contains the data to be searched.
function_name – Assigns the score function a local object name to be used later when running the search query.
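Before registering the file, you may want to confirm that it is syntactically valid Python and defines at least one function. The following is a minimal sketch of such a check using the standard ast module; register_score_function and defined_functions are hypothetical helpers, and the only client call used is set_function with the arguments shown above.

```python
import ast
from pathlib import Path


def defined_functions(source):
    """Return the names of the top-level functions in Python source text."""
    tree = ast.parse(source)  # raises SyntaxError if the source is malformed
    return [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]


def register_score_function(client, filename, collection_name, function_name):
    """Check the score function file locally, then register it with the client."""
    names = defined_functions(Path(filename).read_text())
    if not names:
        raise ValueError(f"{filename} does not define any function")
    client.set_function(filename,
                        collection_name=collection_name,
                        function_name=function_name)
    return names


# Example usage, assuming hyperspace_client and collection_name already exist:
# register_score_function(hyperspace_client, 'score_function.py',
#                         collection_name, 'score_function')
```

The check is syntax-only, so names such as match that are available only at query time do not cause it to fail.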

Specifying the Document for Similarity Search

Retrieve the document that contains the data for which you want to find similar documents by specifying its collection name and document identifier (for example, 47), and place it in a local object named data_point.

data_point = hyperspace_client.get_document(document_id='47',
                                            collection_name=collection_name)

Running the Lexical (Classic) Search Query

Copy the following code snippet to run the lexical search query –

results = hyperspace_client.search({'params': data_point},
                                   size=5,
                                   function_name='score_function',
                                   collection_name=collection_name)

Where –

data_point – Specifies the document for similarity search and the multiplier of the return score, as described in Step 3, Defining the Classic Query Schema.
size – Specifies the number of results to return.
function_name – Specifies the scoring function to be used in the Classic Search query as described in Step 1, Creating the Scoring Function.
collection_name – Specifies the Collection in which to search.

The query is constructed according to data_point fields, where data_point is provided under the "params" key.

If data_point includes fields of type "dense_vector", the corresponding vector search is performed automatically.
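The document lookup and the search call can be combined into one helper. This is a sketch under the assumption that the client exposes get_document and search exactly as shown in the snippets above; find_similar is a hypothetical name, and the structure of the returned results is not assumed here.

```python
def find_similar(client, document_id, collection_name, function_name, size=5):
    """Fetch a document by id and run a classic search for similar documents."""
    # Retrieve the document that serves as the query
    data_point = client.get_document(document_id=document_id,
                                     collection_name=collection_name)
    # The query document is passed under the 'params' key
    return client.search({'params': data_point},
                         size=size,
                         function_name=function_name,
                         collection_name=collection_name)


# Example usage, assuming hyperspace_client and collection_name already exist:
# results = find_similar(hyperspace_client, '47', collection_name, 'score_function')
```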
