Scoring and Ranking
Hyperspace supports several scoring methods, as well as arithmetic on scores, based on the rarity of keywords in the collection.
Rarity Score (TF-IDF)
Term Frequency-Inverse Document Frequency (TF-IDF) is a numerical statistic that measures the importance of a term to a document within a corpus. It is used as the default score for matched terms.
Example
In the following example, documents that match are scored using the TF-IDF formula and ranked accordingly.
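As a rough illustration of TF-IDF scoring and ranking, the sketch below scores a handful of documents for a single term. It uses a textbook variant (term frequency normalized by document length, idf = log(N / df)); the exact formula Hyperspace applies is not stated here, so the function name, the whitespace tokenization, and the sample documents are all assumptions.

```python
import math
from collections import Counter


def tf_idf_scores(query_term, docs):
    """Rank documents for one term using a textbook TF-IDF (assumed variant)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain the term at all.
    df = sum(1 for tokens in tokenized if query_term in tokens)
    if df == 0:
        return []
    idf = math.log(n_docs / df)  # rarer terms get a larger idf
    scored = []
    for idx, tokens in enumerate(tokenized):
        count = Counter(tokens)[query_term]
        if count:  # only matching documents are scored
            tf = count / len(tokens)
            scored.append((idx, tf * idf))
    # Highest score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample collection.
docs = [
    "red shoes and red hats",
    "blue shoes",
    "red car",
    "green hats",
]
ranking = tf_idf_scores("red", docs)
for doc_index, score in ranking:
    print(doc_index, round(score, 4))
```

Only the two documents containing "red" are returned. The shorter document ranks first because the term makes up a larger share of its tokens.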
'dis_max' clause
The dis_max query selects the highest score from a list of subqueries.
Example
In the following example, each matching document is scored by every subquery using the TF-IDF formula, and the highest of those subquery scores is returned as the document's score.
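The selection step can be sketched in a few lines of Python. This is only an illustration of "take the highest subquery score"; the helper name `dis_max`, the use of `None` for a subquery that did not match, and the sample scores are assumptions, not Hyperspace's implementation.

```python
def dis_max(subquery_scores):
    """Return the highest score among the subqueries that matched, or None."""
    matched = [score for score in subquery_scores if score is not None]
    return max(matched) if matched else None


# Hypothetical per-subquery scores for three documents (None = no match).
doc_scores = {
    "doc_a": [0.35, 1.2, None],
    "doc_b": [0.9, None, None],
    "doc_c": [None, None, None],
}
final = {doc: dis_max(scores) for doc, scores in doc_scores.items()}
print(final)
```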
Function Score
The function_score query modifies the relevance score of documents returned by a query. It is particularly useful for introducing custom scoring logic, boosting certain documents, or applying mathematical functions to influence the relevance of search results. The function_score query wraps around an existing query (such as a match query) and modifies the scores produced by that query.
Scoring functions are defined within the functions array. Each function applies specific logic to modify the relevance score of documents. Common types of functions include:
weight – Assigns a static weight to the documents.
field_value_factor – Scales scores based on the values of a numeric field.
script_score – Enables you to define custom scoring logic using a script.
random_score – Introduces randomness to the scores.
Combining Functions
Multiple scoring functions can be defined within the functions array. The results of these functions are combined to produce the final relevance score. You can control how the scores are combined using parameters like score_mode and boost_mode.
Boost Mode
The boost_mode parameter specifies how the scores from different functions are combined. Common options include:
multiply – Multiply the scores from different functions.
sum – Add the scores from different functions.
replace – Use the score of the first function that produces a non-zero score.
Score Mode
The score_mode parameter determines how the scores of individual functions are combined. Common options include:
multiply – Multiply the scores.
sum – Add the scores.
avg – Calculate the average of the scores.
In the following example, the function_score query is applied to a match query. It includes two functions: one that assigns a static weight of 2, and another that scales the scores based on the square root of a numeric field.
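A query of this shape might be assembled as a Python dictionary along the following lines. The layout assumes an Elasticsearch-style request body; the field names `title` and `popularity`, the search text, the `modifier` key, and the choice of `sum` for both modes are hypothetical and should be checked against the Hyperspace query reference.

```python
import json

# Sketch of a function_score query wrapping a match query (assumed layout).
query = {
    "query": {
        "function_score": {
            # Base query whose scores are modified.
            "query": {"match": {"title": "running shoes"}},
            "functions": [
                # Function 1: static weight of 2.
                {"weight": 2},
                # Function 2: square root of a numeric field (hypothetical field).
                {"field_value_factor": {"field": "popularity", "modifier": "sqrt"}},
            ],
            "score_mode": "sum",
            "boost_mode": "sum",
        }
    }
}

print(json.dumps(query, indent=2))
```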
The first function (weight) multiplies the score by 2.0 (weight * base score).
The second function (field_value_factor) uses the square root of the numeric field.
The final score for this document would be the sum of these scores (base score + first function + second function).
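To make the arithmetic concrete, here is a small worked sketch of the three terms described above. The function name `final_score` and the input values (a base score of 1.5 and a field value of 16) are made up for illustration.

```python
import math


def final_score(base_score, field_value, weight=2.0):
    """Sum the base score and both function results, as described in the text."""
    first = weight * base_score      # weight function: weight * base score
    second = math.sqrt(field_value)  # field_value_factor: square root of the field
    return base_score + first + second


# 1.5 (base) + 3.0 (weight) + 4.0 (square root of 16) = 8.5
print(final_score(1.5, 16.0))
```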
Boost
The "boost" clause controls the relevance or importance of specific conditions within a search query by manipulating scores. It is often employed when certain criteria or attributes should carry more weight in the search results, allowing for fine-tuned control over the relevance scoring.
In this section's example, the boost clause is used to specify a constant score. If a document has a field named "City" with a value of "Washington", the score is 1.5. Otherwise, it is 0, regardless of rarity.
Example
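The constant-score behavior described for the "City" field can be mimicked in plain Python. This sketch only mirrors the scoring rule; it is not Hyperspace query syntax, and the helper name `boosted_score` and the sample documents are hypothetical.

```python
def boosted_score(document, field="City", value="Washington", boost=1.5):
    """Return the constant boost on an exact match, otherwise 0 (rarity is ignored)."""
    return boost if document.get(field) == value else 0.0


documents = [
    {"City": "Washington"},
    {"City": "Boston"},
    {"Name": "no city field"},
]
scores = [boosted_score(doc) for doc in documents]
print(scores)
```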