Candidate Filtering

Hyperspace score functions generate candidates by filtering the database documents. Hyperspace supports several filtering methods, both range-match based and keyword-match based.

Filtering is performed at the outermost level of conditions; that is, only the outermost "if" conditions affect the candidate list. The final candidate list can then be created using additional filters and scores.

For example:

def score_function_recommendation(params, doc):
    if match('genres'):
        if match('countries'):
            return 1.0
    return 0.0

In the above example, the if match('genres') condition creates the candidate list, while the nested if match('countries') condition modifies the candidates' scores but does not change the overall candidate list.

A second example:

def score_function_recommendation(params, doc):
    if match('genres') and doc['budget'] >= 100000000:
        if match('countries'):
            return 1.0
    return 0.0

In the above example, only the outermost condition, if match('genres') and doc['budget'] >= 100000000, creates the candidate list.
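The difference between the outermost condition and a nested condition can be imitated in plain Python. The sketch below is not Hyperspace code: shares_keyword, outer_condition, and score are hypothetical helpers that receive the query and document explicitly, and the candidate list is built by hand on the assumption that it mirrors the behavior described above.

```python
# Plain-Python imitation of candidate filtering; not the Hyperspace engine.
# All helper names below are hypothetical and exist only for illustration.

def shares_keyword(params, doc, field):
    """True if the query and the document share at least one keyword in `field`."""
    return bool(set(params[field]) & set(doc[field]))

def outer_condition(params, doc):
    # Outermost "if": decides whether the document becomes a candidate.
    return shares_keyword(params, doc, "genres") and doc["budget"] >= 100_000_000

def score(params, doc):
    # Nested "if": changes the score of a candidate, not the candidate list.
    if outer_condition(params, doc):
        if shares_keyword(params, doc, "countries"):
            return 1.0
    return 0.0

query = {"genres": ["Action"], "countries": ["France"]}
docs = [
    {"id": 1, "genres": ["Action", "Drama"], "countries": ["France"], "budget": 150_000_000},
    {"id": 2, "genres": ["Action"], "countries": ["Japan"], "budget": 200_000_000},
    {"id": 3, "genres": ["Action"], "countries": ["France"], "budget": 5_000_000},
    {"id": 4, "genres": ["Comedy"], "countries": ["France"], "budget": 300_000_000},
]

candidates = [d["id"] for d in docs if outer_condition(query, d)]
scores = {d["id"]: score(query, d) for d in docs if outer_condition(query, d)}
print(candidates)  # [1, 2]
print(scores)      # {1: 1.0, 2: 0.0}
```

Document 2 fails the nested countries check, so it scores 0.0 but remains in the candidate list. Documents 3 and 4 fail the outermost condition and never become candidates.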

You can filter candidates using the following methods:

  • Exact Match Between Keywords

  • Window Match Between Dates

  • Match Between Geo Coordinates

Exact Match Between Keywords

You can match keywords using the function match(str fieldname). The function operates on either keywords or lists of keywords. For single keywords, it returns True when the two keywords are identical. For lists of keywords, it returns True when any keyword in one list exactly matches any keyword in the other. Hyperspace allows two forms of matching:

  • Match between a field in the query and the same field in the database documents

  • Match between a field in the query and a different field in the database documents

Example:

def score_function_recommendation(params, doc):
    if match("city", "shipping_city") and match("street"):
        return 1.0
    return 0.0

In the above example:

  • The field 'street' is compared between the query and each document. If the field includes a matching value, match("street") returns True.

  • The field 'city' in the query is compared with the field 'shipping_city' in the database documents. If there is a matching value, match("city", "shipping_city") returns True.
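The two forms of matching can be sketched in plain Python as follows. This is an illustration only: match_fields is a hypothetical helper, not the Hyperspace match function, and its handling of a missing field (returning False) is an assumption rather than documented behavior.

```python
# Hypothetical plain-Python stand-in for the matching rules described above.

def _keywords(value):
    """Normalize a single keyword or a list of keywords to a set."""
    if isinstance(value, str):
        return {value}
    return set(value)

def match_fields(params, doc, query_field, doc_field=None):
    """Exact keyword match between params[query_field] and doc[doc_field].

    When doc_field is omitted, the same field name is used on both sides.
    """
    if doc_field is None:
        doc_field = query_field
    # Assumption: a field missing on either side simply does not match.
    if query_field not in params or doc_field not in doc:
        return False
    return not _keywords(params[query_field]).isdisjoint(_keywords(doc[doc_field]))

params = {"city": "Paris", "street": ["Rue de Rivoli", "Rue Cler"]}
doc = {"shipping_city": ["Lyon", "Paris"], "street": "Rue Cler"}

same_field = match_fields(params, doc, "street")                 # list vs. keyword
cross_field = match_fields(params, doc, "city", "shipping_city")  # keyword vs. list
missing = match_fields(params, doc, "city")                      # doc has no "city"
print(same_field, cross_field, missing)
```

Normalizing both sides to sets lets one comparison cover keyword-to-keyword, keyword-to-list, and list-to-list matching.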

Window Match Between Dates

Window matching between dates can be performed using the function window_match(str fieldname, unsigned int dt0, unsigned int dt1).

The function compares the dates doc[fieldname] - dt0 and doc[fieldname] - dt1 to params[fieldname].

In other words, the function operates on date fields and returns True

if doc[fieldname] - dt0 < params[fieldname] < doc[fieldname] - dt1,

and False otherwise.

Where:

  • params is the query document

  • doc is the candidate vector

  • dt0 and dt1 define the window to match. Both must include a unit (s/m/h/d).

Example:

def score_function_recommendation(params, doc):
    if match("city", "shipping_city") and window_match("Arrival_times", "3d", "2d"):
        return 1.0
    return 0.0

The window_match condition will return True if

doc[fieldname] - 3d < params[fieldname] < doc[fieldname] - 2d

For example, if params[fieldname] = 1698225495 (equivalent to GMT October 25, 2023 9:18:15 AM) and doc[fieldname] = 1698311895 (GMT October 26, 2023 9:18:15 AM), then params[fieldname] > doc[fieldname] - 2d, so window_match returns False.
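The window arithmetic above can be checked with a short plain-Python sketch. It is not the Hyperspace implementation: window_match_sketch and to_seconds are hypothetical helpers, dates are assumed to be Unix timestamps in seconds (as in the worked example), and offsets are assumed to be strings made of digits followed by one unit letter.

```python
# Hypothetical re-implementation of the window inequality, for illustration only.

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def to_seconds(offset):
    """Convert an offset such as "3d" or "12h" to a number of seconds."""
    value, unit = offset[:-1], offset[-1:]
    if unit not in UNIT_SECONDS or not value.isdigit():
        raise ValueError(f"offset must be digits followed by s/m/h/d: {offset!r}")
    return int(value) * UNIT_SECONDS[unit]

def window_match_sketch(params, doc, fieldname, dt0, dt1):
    """True if doc[f] - dt0 < params[f] < doc[f] - dt1 (strict on both sides)."""
    lower = doc[fieldname] - to_seconds(dt0)
    upper = doc[fieldname] - to_seconds(dt1)
    return lower < params[fieldname] < upper

params = {"Arrival_times": 1698225495}      # October 25, 2023 9:18:15 AM GMT

# Document dated one day after the query: outside the 2-to-3-day window.
doc_outside = {"Arrival_times": 1698311895}  # October 26, 2023 9:18:15 AM GMT
# Document dated two and a half days after the query: inside the window.
doc_inside = {"Arrival_times": 1698225495 + 216000}

outside = window_match_sketch(params, doc_outside, "Arrival_times", "3d", "2d")
inside = window_match_sketch(params, doc_inside, "Arrival_times", "3d", "2d")
print(outside, inside)  # False True
```

With dt0 = 3d and dt1 = 2d, the query date must fall strictly between three and two days before the document date.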

Match Between Geo Coordinates

Geographical coordinates can be compared using the function geo_dist_match(str fieldname, float thresh).

The function returns True if the distance between the coordinates is below the threshold, and False otherwise.

Example:

def score_function_recommendation(params, doc):
    if match("city", "shipping_city") and geo_dist_match("geolocation", 45.02):
        return 1.0
    return 0.0
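The text above does not state which distance metric or unit geo_dist_match uses. Purely as an illustration, the sketch below assumes great-circle (haversine) distance in kilometers and coordinates stored as (latitude, longitude) pairs in degrees. geo_dist_match_sketch and haversine_km are hypothetical helpers, not Hyperspace functions.

```python
import math

# Assumptions (not confirmed by the documentation): haversine distance,
# kilometers, and coordinates given as (latitude, longitude) in degrees.
EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def geo_dist_match_sketch(params, doc, fieldname, thresh):
    """True if the distance between the two coordinates is below thresh."""
    return haversine_km(params[fieldname], doc[fieldname]) < thresh

params = {"geolocation": (48.8566, 2.3522)}    # Paris
doc_near = {"geolocation": (48.8049, 2.1204)}  # Versailles
doc_far = {"geolocation": (45.7640, 4.8357)}   # Lyon

near = geo_dist_match_sketch(params, doc_near, "geolocation", 45.02)
far = geo_dist_match_sketch(params, doc_far, "geolocation", 45.02)
print(near, far)
```

Under these assumptions, the nearby document passes the 45.02 threshold and the distant one does not.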

Comparison of Document Field Values

You can directly access the input query values and database document values using the syntax params[fieldname] or doc[fieldname], respectively. You can then use the retrieved values in the score function.

Example:

def score_function_recommendation(params, doc):
    score = 0.0
    if match('genres') and (doc['budget'] > 10000000 or doc['year'] == 2003):
        score = 1.0
    return score

Filtering Based on KNN Score

You can filter based on the vector search score using the function knn_filter(str vector_fieldname1, str vector_fieldname2, float min_score).

knn_filter() filters based on the KNN score, according to the metric defined in the data configuration schema file. The function returns 1 if the score is above min_score, or 0 otherwise. min_score can be a dynamic value, defined in the query params.

The function operates on params[vector_fieldname1] and doc[vector_fieldname2].

By default, vector_fieldname2 = vector_fieldname1 and min_score = 0.

Limitations

  • knn_filter() can only be used as part of the last return statement.

  • All other return statements must return 0, False, or None.

Example 1:

def score_function(params, doc):
    if match("genre"):
        return
    elif match("countries"):
        return False
    score = rarity_max("tags")
    if score < 1:
        return 0
    return score + 0.3 * knn_filter("tagline_embedding", 0.2)

In the above example, knn_filter calculates the KNN score between params["tagline_embedding"] and doc["tagline_embedding"]. If the KNN score is above 0.2, the score function returns score + 0.3. Otherwise, it returns score.

Example 2:

def score_function(params, doc):
    score = rarity_max("tags")
    return score * knn_filter("tagline_embedding", "overview_embedding", params["min_score"])

In the above example, knn_filter calculates the KNN score between the fields params["tagline_embedding"] and doc["overview_embedding"]. If the KNN score exceeds params["min_score"], the score function returns score. Otherwise, it returns 0.
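The 1-or-0 gating behavior of knn_filter in the two examples can be imitated in plain Python. This sketch is not the Hyperspace function. The real metric comes from the data configuration schema file, so cosine similarity is used here only as an assumed stand-in. knn_filter_sketch is a hypothetical helper that takes min_score as a keyword argument, and the value 2.0 is a made-up placeholder for the result of rarity_max("tags").

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity; an assumed metric chosen only for this illustration."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_filter_sketch(params, doc, vector_fieldname1, vector_fieldname2=None, min_score=0.0):
    """Return 1 if the similarity is above min_score, otherwise 0."""
    if vector_fieldname2 is None:
        # Default described above: compare the same field on both sides.
        vector_fieldname2 = vector_fieldname1
    similarity = cosine_similarity(params[vector_fieldname1], doc[vector_fieldname2])
    return 1 if similarity > min_score else 0

params = {"tagline_embedding": [1.0, 0.0], "min_score": 0.5}
doc = {"tagline_embedding": [0.6, 0.8], "overview_embedding": [0.0, 1.0]}

score = 2.0  # placeholder for rarity_max("tags")

# Mirrors Example 1: add 0.3 when the similarity is above 0.2.
boosted = score + 0.3 * knn_filter_sketch(params, doc, "tagline_embedding", min_score=0.2)

# Mirrors Example 2: keep the score only when the similarity exceeds params["min_score"].
gated = score * knn_filter_sketch(
    params, doc, "tagline_embedding", "overview_embedding", params["min_score"]
)
print(boosted, gated)
```

Because the filter returns 1 or 0, adding a weighted knn_filter term boosts the score, while multiplying by it acts as a hard filter.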
