Building a Hybrid Search Query

This section describes how to build and run a Hybrid Search query. A Hybrid Search performs both a Lexical Search and a Vector Search. It then applies a multiplier (weight) to the score of each resulting match and retrieves the documents with the highest combined scores.

Running the Hybrid Search Query

If you are using a score function, copy the following code snippet to run the hybrid search query:

results = hyperspace_client.search(query,
                                   size=5,
                                   function_name='score_function',
                                   collection_name=collection_name)

Where

  • query – Specifies the query document for the similarity search and the multiplier of the returned score, as described in Step 3, Defining the Lexical Query Schema.

  • size – Specifies the number of results to return.

  • function_name – Specifies the scoring function to be used in the Lexical Search query as described in Step 1, Creating the Scoring Function.

  • collection_name – Specifies the Collection in which to search.

Alternatively, if you use DSL syntax, copy the following code snippet:

results = hyperspace_client.search_dsl(query,
                                       size=5,
                                       collection_name=collection_name)

Where

  • query – Specifies your query logic; see the example below.
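The two ways of running the query can be wrapped in one helper. The following is a minimal sketch, not part of the client library: the helper name run_hybrid_search is hypothetical, and it assumes only the two calls shown above (search with function_name, and search_dsl without it).

```python
def run_hybrid_search(client, query, collection_name, function_name=None, size=5):
    """Run a hybrid search through a score function if one is named, else through DSL.

    `client` is expected to expose the search() and search_dsl() calls
    shown above; this wrapper itself is a hypothetical convenience.
    """
    if function_name is not None:
        # Score-function path: the named function computes the score.
        return client.search(query,
                             size=size,
                             function_name=function_name,
                             collection_name=collection_name)
    # DSL path: the query itself carries the query logic.
    return client.search_dsl(query,
                             size=size,
                             collection_name=collection_name)
```

For example, run_hybrid_search(hyperspace_client, query, collection_name, function_name='score_function') follows the score-function path, while omitting function_name follows the DSL path.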

Creating the Hybrid Search Query

You can create Hybrid Search queries using one of two methods:

  1. Linear combination of vector and lexical search (default)

  2. Hybrid score function

Linear Combination of Scores

The query is a hybrid search query whose score is a linear combination of the lexical and vector scores. By default, each query component is assigned a weight of 1.0.
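To make the linear combination concrete, the sketch below computes a combined score locally. It only illustrates the arithmetic and is not the engine's implementation: the function name linear_hybrid_score and the convention of keying the lexical weight by "query" are assumptions of this example.

```python
def linear_hybrid_score(lexical_score, vector_scores, weights=None):
    """Weighted sum of the lexical score and each vector field's score.

    vector_scores maps a vector field name to its score.
    weights maps "query" (lexical part) or a vector field name to a weight;
    any component without an entry uses the default weight of 1.0.
    """
    weights = weights or {}
    total = weights.get("query", 1.0) * lexical_score
    for field, score in vector_scores.items():
        total += weights.get(field, 1.0) * score
    return total
```

With no weights supplied, a lexical score of 3.0 and a vector score of 0.5 simply add up to 3.5.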

Assigning Weights

To change the weights, add a key named "knn" to the query. Its value is a list of entries, each with a 'field' and a 'boost' (the weight). Use the field "query" for the lexical search and the vector field name ('vector_field_1' in the example) for the vector search.

hybrid_query_schema = {
    'params': {'name': 'John', 'Age': 30},
    'knn': [{'field': 'query', 'boost': 0.05},
            {'field': 'vector_field_1', 'boost': 0.6}]
}

In the above example, the vector_field_1 score is multiplied by 0.6 and the lexical query score by 0.05 when the overall score is computed.

All fields of type dense_vector under 'params' are included in the vector search, unless the corresponding 'boost' key is set to 0. A field with no 'boost' entry is assigned the default weight of 1.0.
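The defaulting rules can be sketched as a small local helper that resolves the weight of every component from a "knn" list. This is an illustration of the rules as stated above, not the engine's code; the helper name effective_weights and the extra field names are hypothetical.

```python
def effective_weights(knn, vector_fields):
    """Resolve the weight used for the lexical query and each vector field.

    knn is the list stored under the "knn" key of the query.
    vector_fields lists the dense_vector field names present under "params".
    """
    boosts = {entry["field"]: entry["boost"] for entry in knn}
    # The lexical part is addressed by the field name "query".
    weights = {"query": boosts.get("query", 1.0)}
    for field in vector_fields:
        boost = boosts.get(field, 1.0)  # no entry: default weight of 1.0
        if boost != 0:                  # a boost of 0 excludes the field
            weights[field] = boost
    return weights
```

For instance, with boosts of 0.05 for "query", 0.6 for 'vector_field_1' and 0 for a second vector field, a third vector field with no entry keeps the weight 1.0 and the second one drops out of the vector search.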

Building a Hybrid Score Function

You can also perform Hybrid Search using a hybrid score function, by including the KNN search function in the lexical score function. In this case the score is determined by the score function logic, and any weights assigned in the query schema are ignored.

You can access the KNN score by using the function distance('vector_field', min_score), which returns the KNN distance, and the function knn_filter('vector_field', min_score), which filters according to distance. The default value of min_score is 1 for both functions.

Both functions can only be used in the last return statement. All other return statements must return 0, None, or False.

Example 1

def score_function_hybrid_1(params, doc):
    if match('production_companies'):
        # Score 2 only when the KNN score of vector_field_1 passes the 0.3 threshold
        return 2 * knn_filter('vector_field_1', min_score=0.3)
    return 0.0

In the above example, for documents that match 'production_companies', the score function returns 2 if the KNN score of 'vector_field_1' is above 0.3, and 0 otherwise. Documents that do not match receive a score of 0.

Example 2

def score_function_hybrid_2(params, doc):
    if match('production_companies'):
        # Base score of 2 plus half the KNN score; min_score is read from the query params
        return 2 + 0.5 * distance('vector_field_1', min_score=params['min_score'])
    return 0.0

In the above example, for documents that match 'production_companies', the score function returns 2 + 0.5 * distance('vector_field_1') if the KNN score of 'vector_field_1' is above params['min_score'], and 2 otherwise. You must provide the key "min_score" under the "params" key of the query.
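The return values described for the two examples can be checked locally with plain-Python stand-ins. This is only a sketch: fake_knn_filter and fake_distance are hypothetical replacements for the engine's knn_filter and distance, their behavior (1 or 0 for the filter; the score or 0 for the distance) is inferred from the example descriptions above, and the match result and KNN score are passed in by hand.

```python
def fake_knn_filter(knn_score, min_score=1.0):
    """Stand-in: 1.0 when the KNN score is above min_score, else 0.0 (assumed)."""
    return 1.0 if knn_score > min_score else 0.0


def fake_distance(knn_score, min_score=1.0):
    """Stand-in: the KNN score when it is above min_score, else 0.0 (assumed)."""
    return knn_score if knn_score > min_score else 0.0


def example_1(matched, knn_score):
    # Mirrors score_function_hybrid_1 with min_score=0.3.
    if matched:
        return 2 * fake_knn_filter(knn_score, min_score=0.3)
    return 0.0


def example_2(matched, knn_score, params):
    # Mirrors score_function_hybrid_2; min_score comes from the query params.
    if matched:
        return 2 + 0.5 * fake_distance(knn_score, min_score=params["min_score"])
    return 0.0
```

Under these assumptions, example_1(True, 0.5) gives 2.0 and example_1(True, 0.2) gives 0.0, while example_2 with a KNN score below params["min_score"] falls back to the base score of 2.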
