Building a Hybrid Search Query

The following describes how to build and run a Hybrid Search query. A Hybrid Search performs both a Classic Search and a Vector Search. It then applies a multiplier (weight) to the score of each resulting match and retrieves the documents with the highest overall scores.

To build a hybrid search query, define the Hybrid Search query schema as follows:

hybrid_query_schema = {
    'params': data_point
}
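
As a rough illustration, data_point is the query document whose fields drive both parts of the search. The sketch below assumes a movie-style collection that uses the field names appearing in the score function later on this page. All values, including the three-element vector, are placeholders and not taken from a real collection.

```python
# Hypothetical query document. Field names follow the score function example
# on this page; every value is a placeholder chosen for illustration.
data_point = {
    'title': 'Example Movie',
    'genres': ['Drama', 'Comedy'],
    'adult': False,
    'rating': 7.4,
    'production_companies': ['Example Studios'],
    # Vector field used by the vector part of the search. A real embedding
    # would have whatever dimension the collection defines for this field.
    'vector_field_1': [0.12, -0.48, 0.33],
}

# The schema wraps the query document under the 'params' key.
hybrid_query_schema = {'params': data_point}
```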

Running the Hybrid Search Query

Copy the following code snippet to run the hybrid search query:

results = hyperspace_client.search(hybrid_query_schema, 
                                   size=5,
                                   function_name='score_function',               
                                   collection_name=collection_name)

Where

  • hybrid_query_schema – Specifies the document for the similarity search and the multipliers applied to the returned scores, as described in Step 3, Defining the Classic Query Schema.

  • size – Specifies the number of results to return.

  • function_name – Specifies the scoring function to be used in the Classic Search query as described in Step 1, Creating the Scoring Function.

  • collection_name – Specifies the Collection in which to search.
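
The call above can be wrapped in a small helper so that the defaults live in one place. This is a minimal sketch: run_hybrid_search and FakeClient are hypothetical names introduced here, and the helper only forwards the four arguments shown in the snippet. The stand-in client exists solely so that the sketch can be dry-run without a Hyperspace connection; it says nothing about the shape of a real response.

```python
def run_hybrid_search(client, query_schema, collection_name,
                      size=5, function_name='score_function'):
    """Forward the arguments to client.search, mirroring the snippet above."""
    return client.search(query_schema,
                         size=size,
                         function_name=function_name,
                         collection_name=collection_name)


class FakeClient:
    """Stand-in for a dry run only; not part of Hyperspace."""

    def __init__(self):
        self.calls = []

    def search(self, query_schema, **kwargs):
        # Record the call and echo the keyword arguments back.
        self.calls.append((query_schema, kwargs))
        return {'echo': kwargs}


fake = FakeClient()
response = run_hybrid_search(fake, {'params': {}}, 'movies', size=3)
```

With a real client, hyperspace_client would be passed in place of fake.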

Assigning Score Weights

By default, all query components, vector and lexical, are assigned a weight of 1.0. To change this, add a key named "knn" to the query schema. Its value is a list of entries, each with a "field" and a "boost": use the field name "query" for the lexical search, and the vector field name ('vector_field_1' in the example) for the vector search.

hybrid_query_schema = {
    'params': data_point,
    'knn': [{'field': 'query', 'boost': 0.05},
            {'field': 'vector_field_1', 'boost': 0.6}]
}

In the above example, the vector_field_1 score will be multiplied by 0.6 in the overall score and the lexical query score by 0.05.

All fields of type dense_vector in data_point are included in the vector search. Unless a 'boost' value is specified for a field under the 'knn' key, its weight is assigned the default value of 1.0.
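
The weighting rules above can be sketched in plain Python. The helpers build_hybrid_query and effective_boosts are hypothetical, as is the second vector field 'vector_field_2'. They only assemble the schema dictionary and show which weight each component would receive under the default rule; they do not call Hyperspace.

```python
DEFAULT_BOOST = 1.0  # default weight stated above


def build_hybrid_query(data_point, boosts=None):
    """Return a query schema. `boosts` maps a field name to its weight;
    the name 'query' stands for the lexical part of the search."""
    schema = {'params': data_point}
    if boosts:
        schema['knn'] = [{'field': field, 'boost': boost}
                         for field, boost in boosts.items()]
    return schema


def effective_boosts(schema, vector_fields):
    """Weight per component: the explicit boost if given, else the default."""
    explicit = {entry['field']: entry['boost']
                for entry in schema.get('knn', [])}
    components = ['query'] + list(vector_fields)
    return {name: explicit.get(name, DEFAULT_BOOST) for name in components}


schema = build_hybrid_query(
    {'vector_field_1': [0.1], 'vector_field_2': [0.2]},
    boosts={'query': 0.05, 'vector_field_1': 0.6},
)
weights = effective_boosts(schema, ['vector_field_1', 'vector_field_2'])
# 'vector_field_2' has no explicit boost, so it keeps the default of 1.0.
```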

Building a Hybrid Score Function

The KNN score can be included in the lexical score function. To do that, use the function distance('vector_field_1') or knn_filter('vector_field_1', min_score=params['min_score']). The distance function returns the KNN distance, while the knn_filter function returns 1 if the KNN score is above min_score and 0 otherwise. Both functions can only be used in the last return statement.

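def score_function_hybrid(params, doc):
    score0 = 1.0
    boost = 1.0
    # Lexical conditions on the 'genres', 'adult' and 'title' fields
    if match('genres') and match('adult') and not match('title'):
        # Double the boost for documents rated above 7.0
        if doc['rating'] > 7.0:
            boost = 2.0
        if match('genres'):
            score0 += boost * rarity_sum('production_companies')

    # knn_filter may only be used in the final return statement
    return score0 * knn_filter('vector_field_1', min_score=0.3)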

In the above example, the score function will return score0 if the KNN score of 'vector_field_1' is above 0.3, and zero otherwise.
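
The gating behaviour described above can be modelled in plain Python. The functions knn_filter_model and final_score are hypothetical stand-ins written only to illustrate the arithmetic; they are not the Hyperspace implementation, and the sample scores are made up.

```python
def knn_filter_model(knn_score, min_score):
    """Model of the described behaviour: 1 above the threshold, 0 otherwise."""
    return 1 if knn_score > min_score else 0


def final_score(score0, knn_score, min_score=0.3):
    """Lexical score multiplied by the KNN gate, as in the return statement."""
    return score0 * knn_filter_model(knn_score, min_score)


# A candidate whose KNN score clears the threshold keeps its lexical score.
kept = final_score(3.5, knn_score=0.42)

# A candidate below the threshold is zeroed out.
dropped = final_score(3.5, knn_score=0.10)
```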
