Candidate Score

Hyperspace supports several methods of scoring and score arithmetic, based on the rarity of keywords in the collection.

Rarity Score (TF-IDF)

The rarity score can be calculated for matched keywords. Hyperspace calculates this score over keywords or lists of keywords, using the TF-IDF formula.

Two usages are currently supported:

  • rarity_max(str fieldname) returns the maximum rarity out of all the keywords in the list,

  • rarity_sum(str fieldname) returns the sum of rarities of all the keywords in the list.

For keyword fields (non-lists), the two functions return the same result.

Example:

score = rarity_max("cities") + rarity_sum("streets")
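To make the difference between the two functions concrete, here is a minimal plain-Python sketch. It is not Hyperspace code: the exact TF-IDF variant Hyperspace uses is not stated here, so the sketch assumes the simple inverse document frequency log(N / df), and the helper names toy_idf, toy_rarity_max and toy_rarity_sum are hypothetical stand-ins that take explicit keyword lists instead of a field name.

```python
import math

def toy_idf(keyword, documents):
    """Assumed rarity: log(N / df), where df counts documents containing the keyword."""
    containing = sum(1 for doc_keywords in documents if keyword in doc_keywords)
    if containing == 0:
        return 0.0
    return math.log(len(documents) / containing)

def toy_rarity_max(matched_keywords, documents):
    """Largest rarity among the matched keywords (0.0 if nothing matched)."""
    return max((toy_idf(k, documents) for k in matched_keywords), default=0.0)

def toy_rarity_sum(matched_keywords, documents):
    """Sum of the rarities of all matched keywords."""
    return sum(toy_idf(k, documents) for k in matched_keywords)

# Toy collection: each document holds a list of city keywords.
collection = [
    ["paris", "london"],
    ["paris", "berlin"],
    ["paris", "london", "oslo"],
    ["paris"],
]

matched = ["london", "oslo"]  # "oslo" is rarer than "london"
max_rarity = toy_rarity_max(matched, collection)  # rarity of "oslo" only
sum_rarity = toy_rarity_sum(matched, collection)  # rarity of both keywords added up
```

With a single matched keyword the two helpers return the same value, which mirrors the behaviour described for non-list keyword fields.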

Score Operations

Hyperspace supports the following methods of score arithmetic:

  • Sum

  • Max

  • Arithmetic operations

Sum of Scores

The function receives n scores (results of score functions) and returns their sum.

Syntax

sum(float score1, float score2, ...)

Example

score_sum = sum(score1, score2, score3)

Where -

  • score1, score2, score3 are the results of a score function.

  • score_sum is the sum of score1, score2, score3...

Max of Scores

The function receives n scores (results of score functions) and returns the maximum of their values.

Syntax

max(float score1, float score2, ...)

Example

score_max = max(score1, score2, score3)

Where -

  • score1, score2, score3 are the results of a score function.

  • score_max is the maximum of score1, score2, score3...
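The sum and max semantics described above can be illustrated with a small plain-Python sketch. The helpers score_sum_of and score_max_of are hypothetical stand-ins written for this illustration, since Python's built-in sum takes a single iterable rather than separate score arguments. The score values are made up.

```python
def score_sum_of(*scores: float) -> float:
    """Stand-in for the sum of n scores passed as separate arguments."""
    return float(sum(scores))

def score_max_of(*scores: float) -> float:
    """Stand-in for the maximum of n scores passed as separate arguments."""
    return float(max(scores))

# Made-up results of three score functions.
score1, score2, score3 = 1.5, 0.25, 3.0

score_sum = score_sum_of(score1, score2, score3)  # adds all three scores
score_max = score_max_of(score1, score2, score3)  # keeps only the largest score
```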

rarity_sum and rarity_max can return different scores only for fields of type list[keyword]. When used to match fields of type keyword, they always return the same score.

Arithmetic Operators

Hyperspace allows arithmetic operations between scores, using the operators +, -, * and /. Each of these operators can also be combined with the = operator (+=, -=, *=, /=).

Example

Where -

score0 is the result of a score function.
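As a rough illustration of score arithmetic, the following plain-Python sketch applies each operator to a score. It assumes that "in combination with =" refers to the augmented assignment forms, and the value of score0 is made up for the example.

```python
score0 = 2.0  # stand-in for the result of a score function

score = score0 + 1.0  # binary operator: 3.0
score *= 2            # augmented assignment: 6.0
score -= 0.5          # 5.5
score /= 2            # 2.75
```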

Vector Distance

Hyperspace allows the KNN vector score to be included in the lexical score function, using the function distance(str vector_fieldname1, str vector_fieldname2, r32 min_score).

The distance() function calculates the KNN score based on the metric defined in the data configuration schema file. It returns the score if it is above the min_score threshold, or 0 otherwise. min_score can be a dynamic value, provided as part of the query params.

By default, vector_fieldname2 = vector_fieldname1 and min_score = 0.
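The thresholding rule can be sketched in plain Python as follows. The helper apply_min_score is a hypothetical name used only for this illustration. It takes an already computed KNN score, so the metric itself (which comes from the schema) is out of scope here.

```python
def apply_min_score(knn_score: float, min_score: float = 0.0) -> float:
    """Return the KNN score if it is above min_score, otherwise 0."""
    return knn_score if knn_score > min_score else 0.0

kept = apply_min_score(0.35, min_score=0.2)     # above the threshold: score is returned
dropped = apply_min_score(0.15, min_score=0.2)  # not above the threshold: 0
default = apply_min_score(0.15)                 # default threshold of 0
```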

Limitations

The distance function can only be used as part of the last return statement.

In addition, all other return statements must return 0, False or None. For example:

Example 1:

In the above example, distance calculates the KNN score between params["tagline_embedding"] and doc["tagline_embedding"]. If the score is above 0.2, the function will return score1 + 0.3 * knn_score. Otherwise it will return score1.

Example 2:

In the above example, distance calculates the KNN score between params["tagline_embedding"] and doc["overview_embedding"]. If the score is above params["min_score"], it will return score1 + distance. Otherwise it will return score1.
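The behaviour described for these examples, together with the limitation on return statements, can be emulated in plain Python. This is only a sketch under stated assumptions: distance_stub is a hypothetical stand-in that takes the vectors directly and assumes cosine similarity as the metric, the "genre" field is invented to show an early exit, and score1 is a made-up lexical score.

```python
import math

def distance_stub(query_vec, doc_vec, min_score=0.0):
    """Hypothetical stand-in: cosine similarity, zeroed unless above min_score."""
    dot = sum(q * d for q, d in zip(query_vec, doc_vec))
    norm = math.sqrt(sum(q * q for q in query_vec)) * math.sqrt(sum(d * d for d in doc_vec))
    knn_score = dot / norm if norm else 0.0
    return knn_score if knn_score > min_score else 0.0

def score_function(params, doc):
    # Early exits return only 0, False or None, as the limitation requires.
    if doc["genre"] != params["genre"]:  # "genre" is an invented field
        return 0
    score1 = 1.0  # made-up lexical score
    # The vector score appears only in the last return statement.
    return score1 + 0.3 * distance_stub(
        params["tagline_embedding"], doc["tagline_embedding"], 0.2
    )

params = {"genre": "drama", "tagline_embedding": [1.0, 0.0]}
similar_doc = {"genre": "drama", "tagline_embedding": [1.0, 0.0]}
unrelated_doc = {"genre": "drama", "tagline_embedding": [0.0, 1.0]}
other_genre_doc = {"genre": "comedy", "tagline_embedding": [1.0, 0.0]}

boosted = score_function(params, similar_doc)         # KNN score above 0.2 is added
lexical_only = score_function(params, unrelated_doc)  # KNN score zeroed, score1 only
rejected = score_function(params, other_genre_doc)    # early exit
```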
