Candidate Filtering
The Hyperspace score function generates candidates by filtering the database. Hyperspace supports several filtering methods, based on either range matching or keyword matching.
Filtering is performed only at the outermost level of the conditions stack, that is, only the outermost "if" conditions affect the candidate list. The final candidate list can then be created using additional filters and scores.
As an example, only the if match('genres') condition in the following query creates candidates. The if match('countries') condition can modify a candidate's score but does not change the overall candidate list.
def score_function_recommendation(params, doc):
    if match('genres'):
        if match('countries'):
            return 1.0
    return 0.0
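The effect of nesting can be illustrated with a small stand-alone simulation. This is only a sketch of the behavior described above, not Hyperspace's implementation: simple_match, select_candidates, and the sample documents are hypothetical, and a plain Python list stands in for the database.

```python
def simple_match(params, doc, field):
    """Hypothetical stand-in for match(): True if the two fields share a keyword."""
    return bool(set(params.get(field, [])) & set(doc.get(field, [])))


def select_candidates(params, docs):
    """Return (doc id, score) pairs for documents passing the outermost condition."""
    results = []
    for doc in docs:
        # Outermost condition: decides whether the document is a candidate at all.
        if not simple_match(params, doc, "genres"):
            continue
        # Nested condition: only changes the score of an existing candidate.
        score = 1.0 if simple_match(params, doc, "countries") else 0.0
        results.append((doc["id"], score))
    return results


query = {"genres": ["drama"], "countries": ["FR"]}
docs = [
    {"id": 1, "genres": ["drama", "crime"], "countries": ["FR"]},
    {"id": 2, "genres": ["drama"], "countries": ["US"]},
    {"id": 3, "genres": ["comedy"], "countries": ["FR"]},
]
candidates = select_candidates(query, docs)
```

Document 2 stays in the candidate list with a score of 0.0 because it passes the outer genres condition, while document 3 is excluded even though its country matches.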
Hyperspace candidate filtering can be performed using several methods:
- Exact match between keywords
- Window match between dates
- Match between geo coordinates
Exact keyword matching can be performed using the function match(str fieldname). The function operates on either keywords or lists of keywords. For keywords, it returns True on an exact match between the two keywords. For lists of keywords, it returns True if any keyword in one list exactly matches any keyword in the other list. Hyperspace allows two forms of matching:
- Match between a field in the query and the same field in the database documents
- Match between a field in the query and a different field in the database documents
Example:
if match("city", "shipping_city") and match("street"):
    pass
In the above example:
- The field 'street' is compared between the query and each document. If the field includes a matching value, the corresponding match function will return true.
- The field 'city' in the query is compared with the field 'shipping_city' in the database documents. If there is a matching value, the corresponding match function will return true.
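The matching rules above can be modeled in plain Python. The sketch below is an illustration only: match_sketch and as_keywords are hypothetical helpers, not part of Hyperspace, and the handling of missing fields is an assumption.

```python
def as_keywords(value):
    """Normalize a single keyword or a list of keywords to a set."""
    if isinstance(value, (list, tuple, set)):
        return set(value)
    return {value}


def match_sketch(params, doc, query_field, doc_field=None):
    """Hypothetical model of match(): compare a query field with a document field."""
    doc_field = doc_field or query_field  # default: same field on both sides
    if query_field not in params or doc_field not in doc:
        return False  # assumption: a missing field never matches
    # True if at least one keyword appears on both sides.
    return bool(as_keywords(params[query_field]) & as_keywords(doc[doc_field]))


params = {"city": "Paris", "street": ["Rue A", "Rue B"]}
doc = {"shipping_city": "Paris", "street": ["Rue B", "Rue C"]}

same_field = match_sketch(params, doc, "street")                  # lists share "Rue B"
cross_field = match_sketch(params, doc, "city", "shipping_city")  # "Paris" on both sides
no_match = match_sketch(params, {"shipping_city": "Lyon"}, "city", "shipping_city")
```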
Window matching between dates can be performed using the function window_match(str fieldname, unsigned int dt0, unsigned int dt1). The function compares the dates doc[fieldname] - dt0 and doc[fieldname] - dt1 to params[fieldname].
In other words, the function operates on date fields and returns True if
doc[fieldname] - dt0 < params[fieldname] < doc[fieldname] - dt1
and False otherwise, where:
- params is the query document
- doc is the candidate document
- dt0 and dt1 define the range of the window to match. Both must include units (s/m/h/d).
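The window condition can be sketched in plain Python as follows. This is a hedged illustration, not the Hyperspace implementation: window_match_sketch and parse_duration are hypothetical names, and the sketch assumes date fields are stored as Unix timestamps in seconds and that offsets are strings such as "3d".

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text):
    """Convert a string such as '3d' or '90m' to a number of seconds."""
    value, unit = text[:-1], text[-1]
    if unit not in UNIT_SECONDS:
        raise ValueError(f"unknown unit in {text!r}")
    return int(value) * UNIT_SECONDS[unit]


def window_match_sketch(params, doc, fieldname, dt0, dt1):
    """True if doc[fieldname] - dt0 < params[fieldname] < doc[fieldname] - dt1."""
    lower = doc[fieldname] - parse_duration(dt0)
    upper = doc[fieldname] - parse_duration(dt1)
    return lower < params[fieldname] < upper


doc = {"Arrival_times": 1698311895}           # GMT October 26, 2023 9:18:15 AM
one_day_before = {"Arrival_times": 1698225495}   # exactly 1 day earlier
inside_window = {"Arrival_times": 1698095895}    # 2.5 days earlier

outside = window_match_sketch(one_day_before, doc, "Arrival_times", "3d", "2d")
inside = window_match_sketch(inside_window, doc, "Arrival_times", "3d", "2d")
```

With a window of 3 to 2 days before the document date, the query value that is only 1 day earlier falls outside the window, while the value 2.5 days earlier falls inside it.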
Example:
if window_match("Arrival_times", "3d", "2d"):
    pass
The window_match condition will return True if
doc[fieldname] - 3d < params[fieldname] < doc[fieldname] - 2d
For example, if params[fieldname] = 1698225495 (equivalent to GMT October 25, 2023 9:18:15 AM) and doc[fieldname] = 1698311895 (GMT October 26, 2023 9:18:15 AM), then params[fieldname] > doc[fieldname] - 2d and window_match will return False.
Geographical coordinates can be compared using the function geo_dist_match(str fieldname, float thresh). The function returns True if the distance between the coordinates is below the threshold, and False otherwise.
Example:
if geo_dist_match("geolocation", 45.02):
    pass
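As a rough illustration of a distance threshold check, the sketch below uses the haversine great-circle distance in kilometers. Both the distance formula and the unit are assumptions made for this example, since neither is stated here, and geo_dist_match_sketch and haversine_km are hypothetical names. Coordinates are assumed to be (latitude, longitude) pairs in degrees.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed for this sketch


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def geo_dist_match_sketch(params, doc, fieldname, thresh):
    """True if the query and document coordinates are closer than thresh."""
    return haversine_km(params[fieldname], doc[fieldname]) < thresh


query = {"geolocation": (48.8566, 2.3522)}      # Paris
nearby = {"geolocation": (48.8049, 2.1204)}     # Versailles
far_away = {"geolocation": (51.5074, -0.1278)}  # London

near_result = geo_dist_match_sketch(query, nearby, "geolocation", 45.02)
far_result = geo_dist_match_sketch(query, far_away, "geolocation", 45.02)
```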
The input query values and database document values can be accessed using the syntax params[fieldname] or doc[fieldname], respectively. The retrieved values can then be used as part of the score function. Example:
def score_function_recommendation(params, doc):
    score = 0.0
    if match('genres') and (doc['budget'] > 10000000 or doc['year'] == 2003):
        score = 1.0
    return score
Filtering based on the distance between vectors can be performed using the function knn_filter(str vector_fieldname1, str vector_fieldname2, r32 min_score).
knn_filter() operates on one or two vector fields and calculates the KNN score based on the metric defined in the data configuration schema file. It returns 1 if the score is above min_score, or 0 otherwise. min_score can be a dynamic value included in the query params. By default, vector_fieldname2 = vector_fieldname1 and min_score = 0.
knn_filter() can only be used in the last return statement. All other return statements must return 0, False, or None. For example:
Example 1:
def score_function(params, doc):
    if match("genre"):
        return
    elif match("countries"):
        return False
    score1 = rarity_max("tags")
    if score1 < 1:
        return 0
    return score1 + 0.3 * knn_filter("tagline_embedding", 0.2)
In the above example, knn_filter calculates the KNN score between params["tagline_embedding"] and doc["tagline_embedding"]. If the score is above 0.2, the function will return score1 + 0.3. Otherwise it will return score1.
Example 2:
def score_function(params, doc):
    score1 = rarity_max("tags")
    return score1 * knn_filter("tagline_embedding", "overview_embedding", params["min_score"])
In the above example, knn_filter calculates the KNN score between params["tagline_embedding"] and doc["overview_embedding"]. If the score is above params["min_score"], it will return score1. Otherwise it will return 0.
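The thresholding behavior of knn_filter can be sketched in plain Python. This is an illustration under stated assumptions, not the Hyperspace implementation: cosine similarity is assumed as the metric (the actual metric comes from the data configuration schema file), and knn_filter_sketch and cosine_similarity are hypothetical names.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_filter_sketch(params, doc, field1, field2=None, min_score=0.0):
    """Return 1 if the similarity score is above min_score, 0 otherwise."""
    field2 = field2 or field1  # default: same vector field on both sides
    score = cosine_similarity(params[field1], doc[field2])
    return 1 if score > min_score else 0


params = {"tagline_embedding": [1.0, 0.0]}
close_doc = {"tagline_embedding": [0.9, 0.1]}
distant_doc = {"tagline_embedding": [0.0, 1.0]}

score1 = 2.0  # stands in for a previously computed score such as rarity_max("tags")
boosted = score1 + 0.3 * knn_filter_sketch(params, close_doc, "tagline_embedding", min_score=0.2)
unboosted = score1 + 0.3 * knn_filter_sketch(params, distant_doc, "tagline_embedding", min_score=0.2)
```

Because the filter yields only 1 or 0, adding it with a weight gives a fixed bonus, as in Example 1, while multiplying by it zeroes the score for documents below the threshold, as in Example 2.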