distance

Combining vector distance with lexical score in a score function

The function distance(str vector_fieldname1, str vector_fieldname2, float min_score) calculates the vector distance (KNN) between a query field and a document field, according to the distance metric defined in the data schema config file.

If the distance score is below min_score, the function returns 0. Otherwise, it returns the KNN score.
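
The threshold rule can be sketched in plain Python. This is only an illustration: thresholded_score is a hypothetical helper, and cosine similarity is assumed as the metric, whereas the actual metric is whatever the data schema config file defines.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def thresholded_score(query_vec, doc_vec, min_score=0.0):
    """Hypothetical stand-in for distance(): 0 below min_score, else the score."""
    score = cosine_similarity(query_vec, doc_vec)
    return 0 if score < min_score else score
```

With min_score left at its default of 0, a score is only zeroed out if it is negative; raising min_score filters out weaker matches.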

Any arithmetic combination (+, -, *, /) of distance() and a variable or a constant is allowed.

min_score can be a dynamic value, included in the query parameters.
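
As a rough sketch of both points, the score function below scales and offsets distance() and reads the threshold from the query parameters. The field names and the "min_score" query key are hypothetical, and the stub distance exists only so the snippet runs outside the engine; it is not the built-in implementation.

```python
def distance(vector_fieldname1, vector_fieldname2=None, min_score=0.0):
    """Stub for the built-in distance(), so the sketch runs standalone.

    It pretends the KNN score is always 0.8 and applies the threshold rule.
    """
    knn_score = 0.8  # placeholder value, not a real computation
    return 0 if knn_score < min_score else knn_score


def score_function(params, doc):
    # Threshold supplied with the query (hypothetical key), defaulting to 0.
    threshold = params.get("min_score", 0.0)
    boost = 2.0  # variable weight applied to the KNN score
    # distance() combined arithmetically with a variable and a constant.
    return boost * distance("query_embedding", "doc_embedding", threshold) + 1
```

When the threshold in the query exceeds the stubbed score, distance() contributes 0 and only the constant offset remains.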

Input

  • vector_fieldname1 (str) - the name of the query field to use in the KNN calculation. params[vector_fieldname1] must be of type dense_vector.

  • vector_fieldname2 (str, default=vector_fieldname1) - the name of the document field to use in the KNN calculation. doc[vector_fieldname2] must be of type dense_vector.

  • min_score (float, default=0) - the score threshold. If the distance score is below this value, distance() will return 0.

Output

  • (int) - The KNN distance between vector_fieldname1 and vector_fieldname2 if it is not below min_score, and 0 otherwise.

Limitations

  • params[vector_fieldname1] and doc[vector_fieldname2] must be indexed using the same metric.

  • The distance() function can only be used as part of a return statement. In addition, all other return statements must return only 0, False, or None. For example, only return statements of the following types are allowed:

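def score_function(params, doc):
    if match("genre"):
        return  # a bare return (None) is allowed
    elif match("countries"):
        return False  # allowed
    score = 1
    if score < 1:
        return 0  # allowed
    # distance() appears only inside a return statement
    return score * distance("tagline_embedding", "overview_embedding")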

The distance() function allows you to combine the KNN score with the lexical score as part of the score function.
