Building a Hybrid Search Query
The following describes how to build and run a Hybrid Search query. A Hybrid Search performs both a Lexical Search and a Vector Search. It then applies a multiplier (weight) to the scores of the resulting matches and returns the documents with the highest combined scores.
Running the Hybrid Search Query
If you are using a score function, copy the following code snippet to run the hybrid search query:
Where:
document – Specifies the document for similarity search and the multiplier of the return score, as described in Step 3, Defining the Lexical Query Schema.
size – Specifies the number of results to return.
function_name – Specifies the scoring function to be used in the Lexical Search query as described in Step 1, Creating the Scoring Function.
collection_name – Specifies the Collection in which to search.
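The parameters above can be assembled as in the following minimal sketch. All values (the field names, 'score_function', 'products') are hypothetical placeholders. Placing the document fields under a 'params' key is assumed from the way this page refers to the query 'params' key, and the commented-out client call uses an assumed method name.

```python
# Hypothetical query document: the fields to match against the Collection.
document = {
    "params": {
        "title": "wireless headphones",          # hypothetical lexical field
        "vector_field_1": [0.12, -0.40, 0.33],   # hypothetical dense_vector field
    }
}

# The remaining parameters described above.
search_kwargs = {
    "size": 10,                          # number of results to return
    "function_name": "score_function",   # scoring function created in Step 1
    "collection_name": "products",       # Collection in which to search
}

# With a connected client, the call would presumably look like this.
# The method name `search` is an assumption, not confirmed by this page:
# results = client.search(document, **search_kwargs)
```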
Alternatively, if you use DSL syntax, copy the following code snippet:
Where:
query_string – Specifies your query logic; see the example below.
Creating the Hybrid Search Query
You can create Hybrid Search queries using either of two methods:
Linear combination of vector and lexical search (default)
Hybrid score function
Linear Combination of Scores
In this method, the overall score of each match is a linear combination of its lexical and vector scores. By default, each query component is assigned a weight of 1.0.
Assigning Weights
To change the weights, add a key named "knn" to the query. Under it, include a key "query" for the lexical search and one key per vector field, named after that field ('vector_field_1' in the example), for the vector search.
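A query with such weights might look like the following sketch. Nesting a 'boost' key under each entry is an assumption based on the description on this page, and the field values under 'params' are hypothetical.

```python
# Hypothetical hybrid query with explicit weights (assumed layout).
query = {
    "params": {
        "title": "wireless headphones",          # hypothetical lexical field
        "vector_field_1": [0.12, -0.40, 0.33],   # hypothetical dense_vector field
    },
    "knn": {
        "query": {"boost": 0.05},           # weight of the lexical query score
        "vector_field_1": {"boost": 0.6},   # weight of the vector_field_1 score
    },
}
```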
In the above example, the vector_field_1 score will be multiplied by 0.6 in the overall score and the lexical query score by 0.05.
All fields of type dense_vector under 'params' are included in the vector search unless the corresponding 'boost' key is set to 0. A field with no explicit 'boost' is assigned the default weight of 1.0.
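The weighting rules above can be illustrated with a small local calculation. This is only a sketch of the arithmetic, not the engine's implementation: the function name hybrid_score and the input scores are hypothetical, and the 'boost' layout is assumed as described.

```python
def hybrid_score(lexical_score, vector_scores, knn=None):
    """Combine one lexical score and per-field vector scores linearly.

    knn maps "query" and vector field names to {"boost": weight};
    any missing weight falls back to the default of 1.0.
    """
    knn = knn or {}
    total = knn.get("query", {}).get("boost", 1.0) * lexical_score
    for field, score in vector_scores.items():
        boost = knn.get(field, {}).get("boost", 1.0)
        if boost == 0:
            continue  # a boost of 0 excludes the field from the vector search
        total += boost * score
    return total


# Default weights: both components count with weight 1.0.
default_total = hybrid_score(2.0, {"vector_field_1": 0.5})

# Explicit weights: lexical score * 0.05 plus vector_field_1 score * 0.6.
weights = {"query": {"boost": 0.05}, "vector_field_1": {"boost": 0.6}}
weighted_total = hybrid_score(2.0, {"vector_field_1": 0.5}, weights)
```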
Building a Hybrid Score Function
You can also perform Hybrid Search using a hybrid score function, by including the KNN search function in the lexical score function. In this case, the score is determined by the score function logic, and any weights assigned in the query schema are ignored.
You can access the KNN score by using the function distance('vector_field', min_score), which returns the KNN distance, and the function knn_filter('vector_field', min_score), which filters according to distance. The default value of min_score is 1 for both functions.
Both functions can only be used in the last return statement. All other return statements must return 0, None, or False.
Example 1
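A score function with this behavior could be sketched as follows. The knn_filter defined here is a local stand-in that makes the sketch runnable outside the search engine. Its return type and the KNN_SCORES dictionary are assumptions, not the platform's implementation.

```python
# Local stand-in data: the KNN score per vector field for the candidate
# being scored. In the engine, this value comes from the vector search.
KNN_SCORES = {}


def knn_filter(vector_field, min_score=1):
    """Stand-in: True if the field's KNN score is above min_score."""
    return KNN_SCORES.get(vector_field, 0.0) > min_score


def score_function(params, doc):
    # knn_filter appears only in the last return statement.
    # 2 * True evaluates to 2 and 2 * False evaluates to 0.
    return 2 * knn_filter("vector_field_1", 0.3)
```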
In the above example, the score function will return 2 if the KNN score of 'vector_field_1' is above 0.3, and 0 otherwise.
Example 2
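A score function with this behavior could be sketched as follows. The distance function defined here is a local stand-in that makes the sketch runnable outside the search engine. Returning 0 when the score is not above min_score is an assumption inferred from the described result, and KNN_SCORES is hypothetical.

```python
# Local stand-in data: the KNN score per vector field for the candidate
# being scored. In the engine, this value comes from the vector search.
KNN_SCORES = {}


def distance(vector_field, min_score=1):
    """Stand-in: the field's KNN score if it is above min_score, else 0."""
    score = KNN_SCORES.get(vector_field, 0.0)
    return score if score > min_score else 0.0


def score_function(params, doc):
    # distance appears only in the last return statement.
    # min_score is read from the query "params" key.
    return 2 + 0.5 * distance("vector_field_1", params["min_score"])
```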
In the above example, the score function will return 2 + 0.5 * KNN('vector_field_1') if the KNN score of 'vector_field_1' is above params['min_score'], and 2 otherwise. You must provide the key "min_score" under the query "params" key.