Scoring and Ranking

Hyperspace supports several methods for scoring matches and for applying arithmetic to those scores, based on the rarity of keywords in the collection.

Rarity Score (TF-IDF)

Term Frequency-Inverse Document Frequency (TF-IDF) is a numerical statistic that measures how important a term is to a document within a corpus. Hyperspace uses it as the default score for matched terms.
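To make the idea of rarity concrete, the following is a minimal sketch of a textbook TF-IDF calculation. The exact variant that Hyperspace applies is not stated here, so the function tf_idf, the logarithmic IDF, and the sample corpus are assumptions used only for illustration.

```python
import math
from collections import Counter

def tf_idf(term, doc, corpus):
    """Textbook TF-IDF: term frequency in doc times log inverse document frequency."""
    if not doc:
        return 0.0
    # Term frequency: share of the document's tokens that equal the term.
    tf = Counter(doc)[term] / len(doc)
    # Document frequency: number of documents containing the term at least once.
    docs_with_term = sum(1 for d in corpus if term in d)
    if docs_with_term == 0:
        return 0.0
    idf = math.log(len(corpus) / docs_with_term)
    return tf * idf

# Hypothetical tokenized collection: "black" is common, "red" is rare.
corpus = [
    ["black", "shoe"],
    ["black", "hat"],
    ["red", "scarf"],
    ["black", "coat"],
]
rare = tf_idf("red", corpus[2], corpus)
common = tf_idf("black", corpus[0], corpus)
```

In this sketch the rare term scores higher than the common one, which is the behavior the rarity score is meant to capture.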

Example

In the following example, documents that match are scored using the TF-IDF formula and ranked accordingly.

{
  "query": {
    "bool": {
      "must": {
        "term": {
          "Color": "Black"
        }
      }
    }
  }
}

'dis_max' clause

The dis_max query selects the highest score from a list of subqueries.

Example

In the following example, each subquery is scored using the TF-IDF formula, and a matching document receives the highest of its subquery scores.

{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "State": "MA"
          }
        },
        {
          "match": {
            "City": "Boston"
          }
        }
      ]
    }
  }
}
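The selection rule can be sketched in a few lines. This is an illustration of taking the maximum over subquery scores, not Hyperspace's implementation; the helper dis_max_score and the sample scores are hypothetical, and None is used here to mark a subquery that did not match.

```python
def dis_max_score(subquery_scores):
    """Return the highest score among the subqueries that matched.

    A score of None marks a subquery that did not match the document.
    Returns None when no subquery matched at all.
    """
    matched = [s for s in subquery_scores if s is not None]
    return max(matched) if matched else None

# Hypothetical scores in the order [State == "MA", City == "Boston"].
doc_a = dis_max_score([0.4, 1.3])    # both subqueries matched
doc_b = dis_max_score([0.4, None])   # only the State subquery matched
doc_c = dis_max_score([None, None])  # nothing matched
```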

Function Score

The function_score query modifies the relevance score of documents returned by a query. It is particularly useful for introducing custom scoring logic, boosting certain documents, or applying mathematical functions to influence the relevance of search results.

The function_score query wraps an existing query (such as a match query) and modifies the scores produced by that query. Scoring functions are defined within the functions array, and each function applies its own logic to modify the relevance score of documents. Common types of functions include –

weight – Assigns a static weight to the documents.
field_value_factor – Scales scores based on the values of a numeric field.
script_score – Enables you to define custom scoring logic using a script.
random_score – Introduces randomness to the scores.

Combining Functions

Multiple scoring functions can be defined within the functions array. The results of these functions are combined to produce the final relevance score. You can control how the scores are combined using parameters like score_mode and boost_mode.

Boost Mode

The boost_mode parameter specifies how the scores from different functions are combined. Common options include –

multiply – Multiply the scores from different functions.
sum – Add the scores from different functions.
replace – Use the score of the first function that produces a non-zero score.

Score Mode

The score_mode parameter determines how the scores of individual functions are combined. Common options include –

multiply – Multiply the scores.
sum – Add the scores.
avg – Calculate the average of the scores.
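The three score_mode options listed above amount to simple arithmetic over the per-function scores. The sketch below shows that arithmetic only; the helper name combine and the sample scores are hypothetical, and it is not a description of how Hyperspace evaluates the query internally.

```python
import math

def combine(function_scores, score_mode):
    """Combine per-function scores using one of the listed score_mode options."""
    if score_mode == "multiply":
        return math.prod(function_scores)
    if score_mode == "sum":
        return sum(function_scores)
    if score_mode == "avg":
        return sum(function_scores) / len(function_scores)
    raise ValueError(f"unsupported score_mode: {score_mode}")

# Hypothetical outputs of two scoring functions for one document.
scores = [2.0, 3.0]
multiplied = combine(scores, "multiply")
summed = combine(scores, "sum")
averaged = combine(scores, "avg")
```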

In the following example, the function_score query is applied to a match query. It includes two functions – one that assigns a static weight of 2, and another that scales the scores based on the square root of a numeric field.

The first function (weight) multiplies the score by 2.0 (weight * base score).
The second function (field_value_factor) uses the square root of the numeric field.
The final score for this document would be the sum of these scores (base score + first function + second function).

{
  "query": {
    "function_score": {
      "query": {
        "match": { "field": "value" }
      },
      "functions": [
        {
          "weight": 2
        },
        {
          "field_value_factor": {
            "field": "numeric_field",
            "factor": 1.5,
            "modifier": "sqrt"
          }
        }
      ],
      "score_mode": "sum",
      "boost_mode": "multiply"
    }
  }
}
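The arithmetic described for this example can be traced by hand. The sketch below follows the three steps exactly as they are worded above and does not model boost_mode. The base score of 1.0 and the field value of 6.0 are made-up inputs, and applying the sqrt modifier to factor * field value is an assumption about the order of operations.

```python
import math

def weight_fn(base_score, weight):
    # As worded in the steps above: the weight multiplies the base score.
    return weight * base_score

def field_value_factor(field_value, factor, modifier):
    # Assumption: the modifier is applied to factor * field_value.
    scaled = factor * field_value
    if modifier == "sqrt":
        return math.sqrt(scaled)
    raise ValueError(f"unsupported modifier: {modifier}")

base_score = 1.0     # hypothetical score of the wrapped match query
numeric_field = 6.0  # hypothetical value stored in numeric_field

first = weight_fn(base_score, 2)
second = field_value_factor(numeric_field, 1.5, "sqrt")
# Sum of the base score and both function results, as described above.
final_score = base_score + first + second
```

With these inputs the first function contributes 2.0, the second contributes 3.0, and the summed score is 6.0.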

Boost

The "boost" clause controls the relevance or importance of specific conditions within a search query by manipulating scores. It is often employed when certain criteria or attributes should carry more weight in the search results, allowing for fine-tuned control over the relevance scoring.

In the following example, the boost clause is used to specify a constant score. If a document has a field named "City" with a value of "Washington", the score is 1.5. Otherwise, it is 0, regardless of rarity.

Example

{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "City": "Washington"
        }
      },
      "boost": 1.5
    }
  }
}
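The rule this query expresses is small enough to state as a function. The sketch below is only an illustration of the scoring outcome described above; the helper constant_score and the dictionary-based documents are hypothetical and say nothing about how Hyperspace stores or filters data.

```python
def constant_score(doc, field, value, boost):
    """Return the boost when the filter matches, otherwise 0, ignoring rarity."""
    return boost if doc.get(field) == value else 0.0

# Hypothetical documents represented as plain dictionaries.
in_washington = constant_score({"City": "Washington"}, "City", "Washington", 1.5)
in_boston = constant_score({"City": "Boston"}, "City", "Washington", 1.5)
no_city = constant_score({}, "City", "Washington", 1.5)
```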

