Candidate Generation and Metadata Filtering

Candidate filtering narrows the set of documents that might be relevant to a search query before the scoring phase. You can filter candidates using the following methods:

  • Exact match between keywords (term match)

  • Window match between dates or numeric values (range match)

  • Distance match between geo coordinates (geo coordinates match)

Term Match

The term query is used to search for documents that contain a specific exact value in a particular field. It is designed for exact matches and is commonly used for fields that are not analyzed, such as keyword fields.

You can match either single keywords or lists of keywords. For single keywords, the two keywords must be identical. For lists of keywords, at least one keyword in the query list must be identical to a keyword in the candidate's list.

Example 1:

{
  "query": {
    "term": {
      "Continent": "Asia"
    }
  }
}

In the above example, candidates that include the field 'Continent' with the value "Asia" are returned.

Example 2:

{
  "query": {
    "term": {
      "Continent": ["Asia", "Europe", "Africa"]
    }
  }
}

In the above example, candidates that include the field 'Continent' with any of the values "Asia", "Europe", or "Africa" are returned.
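The matching rule described in this section can be sketched in Python. This is only an illustration of the semantics, not the engine's implementation: the helper `term_match` and the sample documents are hypothetical, and the sketch assumes that "exact" means case-sensitive equality.

```python
from typing import Any, Mapping


def term_match(candidate: Mapping[str, Any], field: str, wanted: Any) -> bool:
    """Return True if the candidate's field shares at least one exact value with `wanted`.

    Hypothetical helper that mirrors the term-match rule described above.
    """
    if field not in candidate:
        return False  # candidates without the field are never returned
    have = candidate[field]
    # Normalise single keywords and lists to sets so both cases compare the same way.
    have_set = set(have) if isinstance(have, (list, tuple, set)) else {have}
    wanted_set = set(wanted) if isinstance(wanted, (list, tuple, set)) else {wanted}
    return not have_set.isdisjoint(wanted_set)


# Made-up sample documents, for illustration only.
docs = [
    {"id": 1, "Continent": "Asia"},
    {"id": 2, "Continent": ["Europe", "Oceania"]},
    {"id": 3, "Continent": "asia"},  # differs in case (assumed not to match)
    {"id": 4, "Country": "Japan"},   # field is missing
]
matched = [d["id"] for d in docs if term_match(d, "Continent", ["Asia", "Europe", "Africa"])]
print(matched)
```

Document 2 is kept because its list shares "Europe" with the query list. Documents 3 and 4 are dropped under the sketch's assumptions.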

Range Match

The range query allows you to filter documents based on a specified range of values within a given field. It can be used for numeric and date fields. The range query uses the following terms:

  • "gte": the field value must be greater than or equal to the provided value

  • "gt": the field value must be greater than the provided value

  • "lte": the field value must be smaller than or equal to the provided value

  • "lt": the field value must be smaller than the provided value

Example 1:

{
  "query": {
    "range": {
      "date": {
        "gte": "2023-01-01",
        "lte": "2023-12-31"
      }
    }
  }
}

The above example requires candidates to have a field named "date" with values that are greater than or equal to "2023-01-01" and smaller than or equal to "2023-12-31".

Example 2:

{
  "query": {
    "range": {
      "datetime": {
        "gte": "2023-01-01T08:00:00",
        "lt": "2023-01-01T17:30:00"
      }
    }
  }
}

The above example requires candidates to have a field named "datetime" with values that are greater than or equal to "2023-01-01T08:00:00" and smaller than "2023-01-01T17:30:00".

Example 3:

{
  "query": {
    "range": {
      "price": {
        "gt": 10,
        "lte": 30
      }
    }
  }
}

The above example requires candidates to have a field named "price" with values that are greater than 10 and smaller than or equal to 30.
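The inclusive and exclusive bounds are easy to mix up, so here is a minimal Python sketch of how the four terms combine. The helper `range_match` is hypothetical and only mirrors the semantics described above. It assumes date and datetime values are ISO 8601 strings, and that a date-only string is read as midnight of that day; the engine's own handling may differ.

```python
import operator
from datetime import datetime

# Map each range term to the comparison it stands for.
_OPS = {"gte": operator.ge, "gt": operator.gt, "lte": operator.le, "lt": operator.lt}


def _coerce(value):
    """Parse ISO 8601 strings into datetimes; leave numbers unchanged."""
    if isinstance(value, str):
        return datetime.fromisoformat(value)
    return value


def range_match(candidate, field, bounds):
    """Return True if candidate[field] satisfies every bound in `bounds` (hypothetical helper)."""
    if field not in candidate:
        return False  # candidates without the field are never returned
    value = _coerce(candidate[field])
    return all(_OPS[name](value, _coerce(bound)) for name, bound in bounds.items())


price_bounds = {"gt": 10, "lte": 30}
print(range_match({"price": 10}, "price", price_bounds))  # "gt" excludes the lower bound itself
print(range_match({"price": 30}, "price", price_bounds))  # "lte" includes the upper bound

shift = {"gte": "2023-01-01T08:00:00", "lt": "2023-01-01T17:30:00"}
print(range_match({"datetime": "2023-01-01T17:30:00"}, "datetime", shift))  # "lt" excludes 17:30:00
```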

Geo Coordinates Match

The geo_distance query performs proximity searches on geographic coordinates: it finds documents that are within a specified distance of a given geographical point. The query uses the following terms:

  • "distance": the distance within which you want to search. It can be expressed in units such as "km" (kilometers), "mi" (miles), "m" (meters), "yd" (yards), and "ft" (feet).

  • "point": the point around which the search is centered, given as the geo coordinates "lat" and "lon".

Example:

{
  "query": {
    "geo_distance": {
      "distance": "10km",
      "point": {
        "lat": 31.19,
        "lon": -44.41
      }
    }
  }
}

In the above example, the query looks for documents with a field named "point" whose value is within a 10-kilometer radius of the coordinates (lat 31.19, lon -44.41).
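To make "within a 10-kilometer radius" concrete, the sketch below checks candidates with the haversine great-circle formula. This is an assumption made for illustration: the source does not state which distance formula the engine uses, and the helper names and the sample points are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; the engine may use a different model


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_distance_match(candidate, field, center, distance_km):
    """Return True if candidate[field] lies within distance_km of center (hypothetical helper)."""
    point = candidate.get(field)
    if point is None:
        return False  # candidates without the field are never returned
    return haversine_km(point["lat"], point["lon"], center["lat"], center["lon"]) <= distance_km


center = {"lat": 31.19, "lon": -44.41}
near = {"point": {"lat": 31.25, "lon": -44.41}}  # about 6.7 km north of the center
far = {"point": {"lat": 31.30, "lon": -44.41}}   # about 12.2 km north of the center
print(geo_distance_match(near, "point", center, 10.0))
print(geo_distance_match(far, "point", center, 10.0))
```

One degree of latitude is roughly 111 km, so a 10 km radius corresponds to a latitude difference of about 0.09 degrees.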
