Scoring and Ranking

Hyperspace supports several methods of scoring and score arithmetic, based on the rarity of keywords in the collection.

Rarity Score (TF-IDF)

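Term Frequency-Inverse Document Frequency (TF-IDF) is a numerical statistic that reflects how important a term is to a document within a corpus. It grows with the number of times the term appears in the document and shrinks as more documents in the corpus contain the term, so rare terms count for more. It is the default score for matched terms.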

Example:

{
  "query": {
    "bool": {
      "must": {
        "term": {
          "Color": "Black"
        }
      }
    }
  }
}

In the above example, each matched document is assigned a score based on the TF-IDF formula, and the results are ranked by that score.
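To make the rarity effect concrete, here is a minimal sketch of the textbook TF-IDF variant (raw term count times log(N / df)). This is an assumption for illustration: the page does not state the exact formula Hyperspace uses, and the sketch treats each document as a plain list of lowercase tokens rather than as a field value.

```python
import math
from collections import Counter


def tf_idf(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    """Textbook TF-IDF: term count in `doc` times log(N / df).

    Assumed variant for illustration only; Hyperspace's internal
    formula may differ (e.g. smoothing or length normalization).
    """
    tf = Counter(doc)[term]                   # occurrences of the term in this document
    df = sum(1 for d in corpus if term in d)  # number of documents containing the term
    if tf == 0 or df == 0:
        return 0.0
    idf = math.log(len(corpus) / df)          # rarer terms get a larger idf
    return tf * idf


# Hypothetical corpus: "black" appears in one document, "white" in three.
corpus = [
    ["black", "shirt"],
    ["white", "shirt"],
    ["white", "shoes"],
    ["white", "hat"],
]
scores = [tf_idf("black", d, corpus) for d in corpus]
print(scores)
```

Only the first document matches "black", and because "black" is rarer than "white" it scores higher than a match on "white" would.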

'dis_max' clause

The dis_max query runs a list of subqueries and scores each document with the highest score produced by any single subquery.

Example:

{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "State": "MA"
          }
        },
        {
          "match": {
            "City": "Boston"
          }
        },
      ]
    }
  }
}

In the above example, each subquery scores the documents it matches using the TF-IDF formula, and a document's final score is the maximum of its subquery scores.
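A minimal sketch of the dis_max rule, using hypothetical TF-IDF scores for a single document that matches both subqueries. The `dis_max` helper is illustrative, not part of Hyperspace's API.

```python
def dis_max(subquery_scores: list[float]) -> float:
    """Score a document by its best-matching subquery.

    `subquery_scores` holds the scores of the subqueries that matched
    this document; an empty list (no match) is represented as 0.0.
    """
    return max(subquery_scores, default=0.0)


# Hypothetical TF-IDF scores for one document:
state_score = 0.8  # from {"match": {"State": "MA"}}
city_score = 2.1   # from {"match": {"City": "Boston"}}
print(dis_max([state_score, city_score]))
```

The document scores 2.1, the better of the two subquery scores, rather than the sum of both.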

Function Score

The function_score query lets you modify the relevance scores of the documents returned by a query. It is particularly useful when you want to introduce custom scoring logic, boost certain documents, or apply mathematical functions to influence the relevance of search results.

The function_score query wraps an existing query (e.g., a match query) and modifies the scores that query produces. Scoring functions are defined in the functions array, and each function applies its own logic to adjust document scores. Common function types include:

  • weight: Assigns a static weight to the documents.

  • field_value_factor: Scales scores based on the values of a numeric field.

  • script_score: Allows you to define custom scoring logic using a script.

  • random_score: Introduces randomness to the scores.
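The two simplest function types can be sketched as plain Python functions. This assumes Elasticsearch-style behavior: a standalone weight function contributes its constant value, and field_value_factor multiplies the field value by factor before applying the modifier. The helper names and the modifier table (only "sqrt" appears on this page) are hypothetical.

```python
import math

# Hypothetical modifier table; only "sqrt" is documented on this page.
MODIFIERS = {
    "none": lambda x: x,
    "sqrt": math.sqrt,
    "square": lambda x: x * x,
}


def weight_function(weight: float) -> float:
    """A standalone weight function contributes a constant value."""
    return weight


def field_value_factor(doc: dict, field: str, factor: float = 1.0,
                       modifier: str = "none") -> float:
    """Multiply the field value by `factor`, then apply the modifier."""
    value = doc[field] * factor
    return MODIFIERS[modifier](value)


doc = {"numeric_field": 6.0}  # hypothetical document
print(weight_function(2))
print(field_value_factor(doc, "numeric_field", factor=1.5, modifier="sqrt"))
```

With a field value of 6, the field_value_factor result is sqrt(1.5 * 6) = 3.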

Combining Functions:

Multiple scoring functions can be defined within the functions array. The results of these functions are combined to produce the final relevance score. You can control how the scores are combined using parameters like score_mode and boost_mode.

Boost Mode:

The boost_mode parameter specifies how the combined function score (see score_mode below) is merged with the score of the wrapped query. Common options include:

  • multiply: Multiply the query score by the function score.

  • sum: Add the function score to the query score.

  • replace: Ignore the query score and use the function score only.
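A minimal sketch of these three options, assuming Elasticsearch-style semantics in which boost_mode merges the already-combined function score with the wrapped query's score. `apply_boost_mode` is a hypothetical helper, not part of Hyperspace's API.

```python
def apply_boost_mode(query_score: float, function_score: float,
                     boost_mode: str) -> float:
    """Merge the combined function score with the wrapped query's score.

    Assumes Elasticsearch-style boost_mode semantics.
    """
    if boost_mode == "multiply":
        return query_score * function_score
    if boost_mode == "sum":
        return query_score + function_score
    if boost_mode == "replace":
        return function_score  # the query score is ignored
    raise ValueError(f"unsupported boost_mode: {boost_mode}")


for mode in ("multiply", "sum", "replace"):
    print(mode, apply_boost_mode(2.0, 5.0, mode))
```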

Score Mode:

The score_mode parameter determines how the scores of individual functions are combined. Common options include:

  • multiply: Multiply the scores.

  • sum: Add the scores.

  • avg: Calculate the average of the scores.
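The score_mode options reduce the list of individual function results to a single function score. A minimal sketch, assuming a plain arithmetic mean for avg (the engine may weight the average differently); `apply_score_mode` is a hypothetical helper.

```python
import math


def apply_score_mode(function_scores: list[float], score_mode: str) -> float:
    """Combine individual function results into one function score.

    Expects a non-empty list of scores.
    """
    if score_mode == "multiply":
        return math.prod(function_scores)
    if score_mode == "sum":
        return sum(function_scores)
    if score_mode == "avg":
        return sum(function_scores) / len(function_scores)  # plain mean (assumption)
    raise ValueError(f"unsupported score_mode: {score_mode}")


scores = [2.0, 3.0]  # e.g. a weight of 2 and a field_value_factor result of 3
for mode in ("multiply", "sum", "avg"):
    print(mode, apply_score_mode(scores, mode))
```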

{
  "query": {
    "function_score": {
      "query": {
        "match": { "field": "value" }
      },
      "functions": [
        {
          "weight": 2
        },
        {
          "field_value_factor": {
            "field": "numeric_field",
            "factor": 1.5,
            "modifier": "sqrt"
          }
        }
      ],
      "score_mode": "sum",
      "boost_mode": "multiply"
    }
  }
}

In the above example, the function_score query wraps a match query and defines two functions: one that assigns a static weight of 2, and one that scales scores by the square root of 1.5 times the value of numeric_field.

  • The first function (weight) contributes a constant 2.

  • The second function (field_value_factor) contributes sqrt(1.5 * numeric_field); the factor is applied to the field value before the sqrt modifier.

  • Because score_mode is "sum", the two function results are added: 2 + sqrt(1.5 * numeric_field).

  • Because boost_mode is "multiply", the final score is the match query's score multiplied by that sum: base_score * (2 + sqrt(1.5 * numeric_field)).
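Written out as a worked equation, assuming Elasticsearch-style score_mode and boost_mode semantics, with q standing for the match query's score and v for the document's numeric_field value (both symbols introduced here for illustration):

```latex
\begin{aligned}
f_1 &= 2 \\
f_2 &= \sqrt{1.5\,v} \\
s_{\text{func}} &= f_1 + f_2 = 2 + \sqrt{1.5\,v} && (\texttt{score\_mode: sum}) \\
s_{\text{final}} &= q \cdot s_{\text{func}} = q\,\bigl(2 + \sqrt{1.5\,v}\bigr) && (\texttt{boost\_mode: multiply})
\end{aligned}
```

For example, a hypothetical document with v = 6 and q = 1.2 gets s_func = 2 + sqrt(9) = 5 and a final score of 1.2 * 5 = 6.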

Boost

The "boost" clause lets you control the relevance of specific conditions within a search query by adjusting their scores. It is typically used when certain criteria or attributes should carry more weight in the search results, giving fine-grained control over relevance scoring.

Example:

{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "City": "Washington"
        }
      },
      "boost": 1.5
    }
  }
}

In the above example, the boost clause sets a constant score. If a document's "City" field has the value "Washington", its score is 1.5; otherwise its score is zero. The score does not depend on how rare the term is in the collection.
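A minimal sketch of constant scoring, to contrast with TF-IDF: every matching document receives exactly the boost value, however common or rare the term is. The `constant_score` helper is hypothetical, and exact field equality stands in for the term filter as a simplification.

```python
def constant_score(doc: dict, field: str, value: str, boost: float = 1.0) -> float:
    """Return `boost` if the document's field equals `value`, else 0.0.

    Term rarity plays no part: every match gets the same score.
    """
    return boost if doc.get(field) == value else 0.0


docs = [
    {"City": "Washington"},
    {"City": "Boston"},
]
results = [constant_score(d, "City", "Washington", boost=1.5) for d in docs]
print(results)
```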
