Building a Hybrid Search Query
The following describes how to build and run a Hybrid Search query. A Hybrid Search performs both a Classic (lexical) Search and a Vector Search, multiplies the score of each component by a weight, and returns the documents with the highest resulting scores.
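The ranking step can be pictured with a small, self-contained sketch. It assumes that each document already has a lexical score and a vector score, that the hybrid score is their weighted sum, and that the top `k` documents are returned. The function name `hybrid_top_k` and the sample scores are hypothetical, and Hyperspace's internal score fusion may differ.

```python
from typing import Dict, List, Tuple


def hybrid_top_k(
    lexical: Dict[str, float],
    vector: Dict[str, float],
    k: int,
    lexical_weight: float = 1.0,
    vector_weight: float = 1.0,
) -> List[Tuple[str, float]]:
    """Combine per-document lexical and vector scores and return the k best.

    Assumptions: the combination is a weighted sum, and a document missing
    from one component contributes 0 for that component.
    """
    doc_ids = set(lexical) | set(vector)
    combined = {
        doc_id: lexical_weight * lexical.get(doc_id, 0.0)
        + vector_weight * vector.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Sort by descending score; break ties by document id so the order is deterministic.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


if __name__ == "__main__":
    lexical_scores = {"doc1": 3.0, "doc2": 1.0, "doc3": 0.5}
    vector_scores = {"doc1": 0.25, "doc2": 1.5, "doc4": 0.75}
    print(hybrid_top_k(lexical_scores, vector_scores, k=2))  # [('doc1', 3.25), ('doc2', 2.5)]
```

With the default weights of 1.0, a document that scores moderately in both components (doc2) can outrank one that scores well in only one of them.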
To build a Hybrid Search query, define the query schema by specifying the query document (data_point) under the 'params' key –
hybrid_query_schema = {
    'params': data_point  # the query document to search with
}
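data_point is the query document. A hypothetical example for a movie collection might look like the following. The field names (genres, adult, title, vector_field_1) are borrowed from the score-function example on this page, and the vector length of 4 is an assumption; use the fields and dimensions defined in your own collection.

```python
# Hypothetical query document: lexical fields plus one vector field.
# Field names and the vector dimension are illustrative assumptions.
data_point = {
    "genres": ["Drama", "Crime"],
    "adult": False,
    "title": "The Example",
    "vector_field_1": [0.12, -0.48, 0.33, 0.91],
}

# The hybrid query schema wraps the query document under 'params'.
hybrid_query_schema = {
    "params": data_point,
}
```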
Running the Hybrid Search Query
Copy the following code snippet to run the Hybrid Search query –
results = hyperspace_client.search(hybrid_query_schema,
                                   size=5,
                                   function_name='score_function',
                                   collection_name=collection_name)
Where –
hybrid_query_schema – Specifies the document for similarity search and the multiplier (weight) of each returned score, as described in Step 3, Defining the Classic Query Schema.
size – Specifies the number of results to return.
function_name – Specifies the scoring function used by the Classic (lexical) part of the query, as described in Step 1, Creating the Scoring Function.
collection_name – Specifies the Collection in which to search.
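If the same search is issued from several places, the call can be wrapped in a small helper. This is a sketch only: run_hybrid_search and RecordingClient are hypothetical names, and the stand-in client merely records its arguments so the call shape can be checked offline. In practice you would pass your real hyperspace_client.

```python
from typing import Any, Dict


def run_hybrid_search(client: Any, query_schema: Dict[str, Any], collection_name: str,
                      size: int = 5, function_name: str = "score_function") -> Any:
    """Forward a hybrid query to client.search with the arguments described above."""
    return client.search(query_schema,
                         size=size,
                         function_name=function_name,
                         collection_name=collection_name)


class RecordingClient:
    """Hypothetical offline stand-in that records the last search call."""

    def __init__(self) -> None:
        self.last_call: Dict[str, Any] = {}

    def search(self, query_schema, size, function_name, collection_name):
        self.last_call = {
            "query_schema": query_schema,
            "size": size,
            "function_name": function_name,
            "collection_name": collection_name,
        }
        return []  # a real client would return the matching documents


if __name__ == "__main__":
    client = RecordingClient()
    run_hybrid_search(client, {"params": {"genres": ["Drama"]}}, "movies")
    print(client.last_call["size"], client.last_call["function_name"])  # 5 score_function
```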
Assigning Score Weights
By default, all query components, vector and lexical, are assigned a weight of 1.0. To change the weights, add a 'knn' key whose value is a list of entries, each with a 'field' and a 'boost'. Use the field name 'query' for the lexical search and the vector field name ('vector_field_1' in the example) for the vector search.
hybrid_query_schema = {
    'params': data_point,
    'knn': [{'field': 'query', 'boost': 0.05},
            {'field': 'vector_field_1', 'boost': 0.6}]
}
In the above example, the vector_field_1 score is multiplied by 0.6 and the lexical query score by 0.05 when they are combined into the overall score.
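The weighting can be made concrete with a small helper that builds the 'knn' list from a mapping of field names to boosts, plus a worked score. The helper names build_hybrid_query and weighted_score are hypothetical, and the arithmetic assumes that the weighted components are summed into the overall score; check the Hyperspace documentation for the exact combination rule.

```python
from typing import Any, Dict


def build_hybrid_query(data_point: Dict[str, Any], boosts: Dict[str, float]) -> Dict[str, Any]:
    """Build a hybrid query schema with per-component boosts.

    'query' is the lexical component; any other key is a vector field name.
    """
    return {
        "params": data_point,
        "knn": [{"field": field, "boost": boost} for field, boost in boosts.items()],
    }


def weighted_score(component_scores: Dict[str, float], boosts: Dict[str, float]) -> float:
    """Assumed combination: sum of each component score times its boost (default 1.0)."""
    return sum(score * boosts.get(field, 1.0) for field, score in component_scores.items())


if __name__ == "__main__":
    boosts = {"query": 0.05, "vector_field_1": 0.6}
    schema = build_hybrid_query({"genres": ["Drama"]}, boosts)
    print(schema["knn"])
    # Lexical score 10.0 and vector score 0.5: 0.05 * 10.0 + 0.6 * 0.5, i.e. about 0.8
    print(weighted_score({"query": 10.0, "vector_field_1": 0.5}, boosts))
```

Under this assumption, the 0.05 boost shrinks a lexical score of 10.0 to 0.5, so the vector component (0.3) carries almost as much weight in the final ranking.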
Building a Hybrid Score Function
The KNN score can be included in the lexical score function. To do so, use distance('vector_field_1') or knn_filter('vector_field_1', min_score=params['min_score']). The distance function returns the KNN distance, while the knn_filter function returns 1 if the KNN score is above min_score and 0 otherwise. Both functions can only be used in the last return statement.
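The gating behavior of knn_filter can be mimicked in plain Python to reason about it outside Hyperspace. This is a hypothetical local model, not the Hyperspace implementation: local_knn_filter is an illustrative name, the KNN score is assumed to be a precomputed input, and "above" is read as a strict comparison. The distance function is not modeled here.

```python
def local_knn_filter(knn_score: float, min_score: float) -> int:
    """Local stand-in for knn_filter: 1 if the KNN score is above min_score, else 0."""
    return 1 if knn_score > min_score else 0


if __name__ == "__main__":
    params = {"min_score": 0.3}  # threshold passed in through the query params
    lexical_score = 4.0
    # Multiplying by the filter keeps the lexical score only for sufficiently close vectors.
    print(lexical_score * local_knn_filter(0.45, min_score=params["min_score"]))  # 4.0
    print(lexical_score * local_knn_filter(0.10, min_score=params["min_score"]))  # 0.0
```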
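def score_function_hybrid(params, doc):
    score0 = 1.0
    boost = 1.0
    # Double the boost for highly rated documents that match 'genres' and 'adult' but not 'title'
    if match('genres') and match('adult') and not match('title'):
        if doc['rating'] > 7.0:
            boost = 2.0
    # When 'genres' matches, add the boosted rarity_sum of 'production_companies'
    if match('genres'):
        score0 += boost * rarity_sum('production_companies')
    # Keep score0 only if the KNN score of 'vector_field_1' is above 0.3; otherwise return 0
    return score0 * knn_filter('vector_field_1', min_score=0.3)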
In the above example, the score function will return score0 if the KNN score of 'vector_field_1' is above 0.3, and zero otherwise.
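To check the logic of this function before deploying it, its control flow can be replayed locally with stand-ins for the Hyperspace built-ins. In this sketch, matched_fields, rarity, and knn_score are hypothetical inputs that replace match(), rarity_sum('production_companies'), and the KNN score that knn_filter reads; it illustrates the arithmetic only.

```python
from typing import Set


def emulate_score_function_hybrid(matched_fields: Set[str], rating: float,
                                  rarity: float, knn_score: float,
                                  min_score: float = 0.3) -> float:
    """Replay score_function_hybrid with precomputed inputs (hypothetical stand-ins)."""
    score0 = 1.0
    boost = 1.0
    # Double the boost for highly rated documents that match genres and adult but not title.
    if {"genres", "adult"} <= matched_fields and "title" not in matched_fields:
        if rating > 7.0:
            boost = 2.0
    if "genres" in matched_fields:
        score0 += boost * rarity
    # knn_filter stand-in: keep score0 only if the KNN score is above min_score.
    return score0 * (1 if knn_score > min_score else 0)


if __name__ == "__main__":
    print(emulate_score_function_hybrid({"genres", "adult"}, 8.0, 1.5, 0.5))  # 4.0
    print(emulate_score_function_hybrid({"genres", "title"}, 8.0, 1.5, 0.5))  # 2.5
    print(emulate_score_function_hybrid({"genres", "adult"}, 8.0, 1.5, 0.2))  # 0.0
```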