Candidate Generation and Metadata Filtering

Candidate filtering narrows the set of documents that might be relevant to a search query before the scoring phase. You can filter candidates using the following methods:

  • Exact Match Between Keywords

  • Range Match Between Dates or Numbers

  • Match Between Geo Coordinates

Term Match

A Term query searches for documents that contain a specific, exact value in a particular field. It is designed for exact matches and is commonly used for fields that are not analyzed, such as keyword fields.

You can match either a single keyword or a list of keywords. For single keywords, the candidate's value must be identical to the query value. For lists of keywords, a match requires at least one keyword that appears in both lists.
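The matching rule described above can be sketched in a few lines of Python. This is an illustration only, not the engine's implementation: the function name term_matches, the representation of a candidate as a dictionary, and the assumption that comparison is case-sensitive are all hypothetical.

```python
def _as_set(value):
    """Treat a single keyword as a one-element list of keywords."""
    if isinstance(value, str):
        return {value}
    return set(value)


def term_matches(document: dict, field: str, query_value) -> bool:
    """Return True if the candidate passes a term filter (hypothetical helper).

    Assumptions: the candidate is a plain dict, keywords are strings,
    and comparison is exact and case-sensitive.
    """
    if field not in document:
        # A candidate that lacks the field can never match.
        return False
    # Keyword vs. keyword: the two values must be identical.
    # List vs. list: at least one keyword must appear in both lists.
    return bool(_as_set(document[field]) & _as_set(query_value))


# Usage: a single keyword, then a list of keywords.
print(term_matches({"Continent": "Asia"}, "Continent", "Asia"))
print(term_matches({"Continent": ["Asia", "Oceania"]}, "Continent",
                   ["Asia", "Europe", "Africa"]))
```

Converting both sides to sets keeps the single-keyword and list cases in one code path: a single keyword is simply a list of length one.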

Example 1

In the following example, candidates must include the field 'Continent', and that field must contain the value "Asia".

{
  "query": {
    "term": {
      "Continent": "Asia"
    }
  }
}

Example 2

In the following example, candidates must include the field 'Continent' with any of the following values: "Asia", "Europe", or "Africa".

{
  "query": {
    "term": {
      "Continent": ["Asia", "Europe", "Africa"]
    }
  }
}

Range Match

The range query filters documents based on a specified range of values within a given field. It can be used for numeric and date fields. The Range query uses the following terms –

  • "gte" – The document's value must be greater than or equal to the provided value.

  • "gt" – The document's value must be greater than the provided value.

  • "lte" – The document's value must be smaller than or equal to the provided value.

  • "lt" – The document's value must be smaller than the provided value.
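The four terms above can be read as ordinary comparison operators that must all hold at once. The following Python sketch illustrates that reading; it is not the engine's implementation. The function name range_matches and the dictionary representation of a candidate are hypothetical, and the sketch assumes dates are stored as ISO 8601 strings in one consistent format, so that string comparison follows chronological order.

```python
import operator

# Map each range term to the comparison it stands for.
_OPERATORS = {
    "gte": operator.ge,  # value >= bound
    "gt": operator.gt,   # value >  bound
    "lte": operator.le,  # value <= bound
    "lt": operator.lt,   # value <  bound
}


def range_matches(document: dict, field: str, bounds: dict) -> bool:
    """Return True if the candidate passes a range filter (hypothetical helper).

    Assumption: the field value and the bounds are directly comparable,
    e.g. numbers with numbers, or ISO 8601 strings in the same format.
    """
    if field not in document:
        return False
    value = document[field]
    # Every bound that is present must be satisfied.
    return all(_OPERATORS[term](value, bound) for term, bound in bounds.items())


# Usage: an exclusive lower bound rejects the boundary value itself.
print(range_matches({"price": 10}, "price", {"gt": 10, "lte": 30}))
print(range_matches({"price": 30}, "price", {"gt": 10, "lte": 30}))
```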

Example 1

The following example requires candidates to have a field named "date" with values that are greater than or equal to "2023-01-01" and smaller than or equal to "2023-12-31".

{
  "query": {
    "range": {
      "date": {
        "gte": "2023-01-01",
        "lte": "2023-12-31"
      }
    }
  }
}

Example 2

The following example requires candidates to have a field named "datetime" with values greater than or equal to "2023-01-01T08:00:00" and smaller than "2023-01-01T17:30:00".

{
  "query": {
    "range": {
      "datetime": {
        "gte": "2023-01-01T08:00:00",
        "lt": "2023-01-01T17:30:00"
      }
    }
  }
}

Example 3

The following example requires candidates to have a field named "price" with values greater than 10 and smaller than or equal to 30.

{
  "query": {
    "range": {
      "price": {
        "gt": 10,
        "lte": 30
      }
    }
  }
}

Geo Coordinates Match

The geo_distance query performs proximity searches based on geographic coordinates to find documents that are within a specified distance from a given geographical point. The query uses the following terms –

  • "distance" – Specifies the distance within which to search. This can be expressed in units like "km" (kilometers), "mi" (miles), "m" (meters), "yd" (yards) and "ft" (feet).

  • "point" – Specifies the point around which to center the search, given as the geo coordinates "lat" and "lon".
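To make the proximity test concrete, the following Python sketch checks whether a candidate lies within a given distance of a center point. It is an illustration under stated assumptions, not the engine's implementation: it uses the haversine great-circle formula on a spherical Earth with a mean radius of 6,371 km, treats the boundary distance as a match, and accepts distances written as a number immediately followed by a unit, as in "10km". The names parse_distance, haversine_m and geo_distance_matches are hypothetical.

```python
import math
import re

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (spherical approximation)

# Meters per unit, for the unit suffixes listed above.
_UNIT_TO_METERS = {"km": 1000.0, "mi": 1609.344, "m": 1.0, "yd": 0.9144, "ft": 0.3048}
_DISTANCE_RE = re.compile(r"^(\d+(?:\.\d+)?)(km|mi|m|yd|ft)$")


def parse_distance(text: str) -> float:
    """Convert a string such as "10km" to meters (assumed format: number + unit)."""
    match = _DISTANCE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized distance: {text!r}")
    amount, unit = match.groups()
    return float(amount) * _UNIT_TO_METERS[unit]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def geo_distance_matches(document: dict, field: str, distance: str, center: dict) -> bool:
    """Return True if the candidate's location is within `distance` of `center`."""
    location = document.get(field)
    if location is None:
        return False
    meters = haversine_m(location["lat"], location["lon"], center["lat"], center["lon"])
    return meters <= parse_distance(distance)


# Usage: a candidate roughly 5.6 km north of the center, tested against 10 km.
center = {"lat": 31.19, "lon": -44.41}
print(geo_distance_matches({"point": {"lat": 31.24, "lon": -44.41}}, "point", "10km", center))
```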

Example

In the following example, the query searches for documents with a field named "point" whose value is within a 10-kilometer radius of the coordinates (31.19, -44.41).

{
  "query": {
    "geo_distance": {
      "distance": "10km",
      "point": {
        "lat": 31.19,
        "lon": -44.41
      }
    }
  }
}
