Candidate Score
Hyperspace supports various methods of scoring and score arithmetic, based on the rarity of keywords in the collection.
The rarity score can be calculated for matched keywords. Hyperspace calculates this score over keywords or lists of keywords, using the TF-IDF formula. Two usages are currently supported:
- rarity_max(str fieldname) returns the maximum rarity out of all the keywords in the list,
- rarity_sum(str fieldname) returns the sum of rarities of all the keywords in the list.
For keyword fields (non-lists), the two functions return the same result.
Example:
score = rarity_max("cities") + rarity_sum("streets")
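The exact TF-IDF variant Hyperspace uses is not spelled out here. As a rough illustration only, the sketch below assumes rarity is an inverse document frequency, log(N / df), computed over a toy collection. The helpers `idf`, `rarity_max_sketch` and `rarity_sum_sketch` are hypothetical and are not part of the Hyperspace API. It shows why the two functions agree on a single keyword and differ on a list.

```python
import math

# Toy collection: each document has a list-of-keywords field "cities".
# Data and helper names are illustrative assumptions, not Hyperspace internals.
DOCS = [
    {"cities": ["paris", "london"]},
    {"cities": ["paris"]},
    {"cities": ["paris", "oslo"]},
    {"cities": ["london"]},
]

def idf(keyword, field, docs):
    """Assumed rarity: log(N / df), so rarer keywords score higher."""
    df = sum(1 for d in docs if keyword in d[field])
    return math.log(len(docs) / df) if df else 0.0

def matched_rarities(query_keywords, doc, field, docs):
    """Rarity of each query keyword that also appears in the document."""
    return [idf(k, field, docs) for k in query_keywords if k in doc[field]]

def rarity_max_sketch(query_keywords, doc, field, docs):
    return max(matched_rarities(query_keywords, doc, field, docs), default=0.0)

def rarity_sum_sketch(query_keywords, doc, field, docs):
    return sum(matched_rarities(query_keywords, doc, field, docs))

query = ["paris", "oslo"]
doc = DOCS[2]  # this document matches both query keywords

print(rarity_max_sketch(query, doc, "cities", DOCS))  # rarity of "oslo" only
print(rarity_sum_sketch(query, doc, "cities", DOCS))  # "paris" + "oslo"
```

With a single matched keyword, the list of rarities has one element, so the maximum and the sum coincide.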
Hyperspace allows multiple methods for score arithmetic, as explained below:
- Sum
- Max
- Arithmetic operations
The sum function receives n scores (results of score functions) and returns their sum.
Syntax
sum(float score1, float score2, ...)
Example
score1 = rarity_max("city")
score2 = rarity_max("Country")
score3 = rarity_max("Continent")
score4 = .....
score_sum = sum(score1, score2, score3...)
Where -
- score1, score2, score3 are the results of a score function.
- score_sum is the sum of score1, score2, score3...
The max function receives n scores (results of score functions) and returns the maximum of their values.
Syntax
max(float score1, float score2, ...)
Example
score1 = rarity_max("city")
score2 = rarity_max("Country")
score3 = rarity_max("Continent")
score4 = .....
score_max = max(score1, score2, score3...)
Where -
- score1, score2, score3 are the results of a score function.
- score_max is the maximum of score1, score2, score3...
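The sum and max functions both accept any number of scores. A minimal Python stand-in is sketched below. The names `sum_scores` and `max_scores` are hypothetical, chosen to avoid shadowing Python's built-ins, and the score values are made up rather than produced by real rarity functions.

```python
def sum_scores(*scores: float) -> float:
    """Stand-in for Hyperspace's sum(score1, score2, ...)."""
    return float(sum(scores))

def max_scores(*scores: float) -> float:
    """Stand-in for Hyperspace's max(score1, score2, ...)."""
    return float(max(scores))

# Made-up values standing in for rarity_max("city"), rarity_max("Country"), ...
score1, score2, score3 = 1.5, 0.25, 2.0

score_sum = sum_scores(score1, score2, score3)
score_max = max_scores(score1, score2, score3)
print(score_sum, score_max)
```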
rarity_sum and rarity_max can return different scores only for fields of type list[keyword]. In particular, when used for matching fields of type keyword, they always return the same score.
Hyperspace allows arithmetic operations between scores, using the operators +, *, -, /. These operators can be used in combination with the operator =
Example
score0 = 0.0
if (match("field 1") or match("field2") or match("Expiration date")):
    score0 += rarity_max("visit_times_in_personal care")
    score0 -= rarity_sum("Credit card")
score1 = 2 * score0
Where -
- score0 is the result of a score function.
Hyperspace allows including the KNN vector score in the lexical score function, by using the function distance(str vector_fieldname1, str vector_fieldname2, r32 min_score). The distance() function calculates the KNN score based on the metric defined in the data configuration schema file. It then returns the score if it is above min_score, or 0 otherwise. min_score can be a dynamic value, provided as part of the query params. By default, vector_fieldname2 = vector_fieldname1 and min_score = 0.
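The thresholding rule can be sketched in plain Python. This snippet assumes cosine similarity as the metric (the real metric is whatever the data configuration schema defines) and uses a hypothetical helper `distance_sketch` that takes the two vectors directly instead of field names. The vectors are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid division by zero for empty or all-zero vectors
    return dot / (norm_a * norm_b)

def distance_sketch(query_vec, doc_vec, min_score=0.0):
    """Return the KNN score if it is above min_score, otherwise 0."""
    knn_score = cosine_similarity(query_vec, doc_vec)
    return knn_score if knn_score > min_score else 0.0

# Hypothetical query and document vectors under the same field name.
params = {"tagline_embedding": [1.0, 0.0]}
doc = {"tagline_embedding": [3.0, 4.0]}

above = distance_sketch(params["tagline_embedding"], doc["tagline_embedding"], 0.2)
below = distance_sketch(params["tagline_embedding"], doc["tagline_embedding"], 0.9)
print(above, below)
```

Here the similarity is 0.6, so it passes a threshold of 0.2 and is zeroed out by a threshold of 0.9.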
The distance function can only be used as part of the last return statement. In addition, all other return statements must return 0, False or None. For example:
Example 1:
def score_function(params, doc):
    if match("genre"):
        return
    elif match("countries"):
        return False
    score1 = rarity_max("tags")
    if score1 < 1:
        return 0
    return score1 + 0.3 * distance("tagline_embedding", 0.2)
In the above example, distance calculates the KNN score between params["tagline_embedding"] and doc["tagline_embedding"]. If the score is above 0.2, the function will return score1 + 0.3 * knn_score. Otherwise it will return score1.
Example 2:
def score_function(params, doc):
    score1 = rarity_max("tags")
    return score1 + distance("tagline_embedding", "overview_embedding", params["min_score"])
In the above example, distance calculates the KNN score between params["tagline_embedding"] and doc["overview_embedding"]. If the score is above params["min_score"], it will return score1 + distance. Otherwise it will return score1.