Aggregations

Hyperspace supports aggregations of numerical fields over candidate lists. Aggregations are called inside the score function, and each aggregation is computed over the candidates that passed all the filters preceding its position in the code.

The aggregation result is returned in the query results object, under a separate key. Example:

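def score_function_recommendation(params, doc):
    score1 = 0.0
    score2 = 1.0
    sum_score = 0.0
    if match('genres'):
        # Aggregated over all candidates that match on genres
        aggregate_max("max_rating", "rating")
        if match('languages'):
            # Aggregated over candidates that match on genres and languages
            aggregate_sum("sum_budget", "budget")
        score1 += rarity_sum('genres')
        sum_score = score1 + score2
        if match('collection'):
            sum_score += 10
        # boost is not defined in this snippet; it is assumed to be available in scope
        return boost * sum_score
    return 0.0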

In the above example, the query results will include:

  • a key named "max_rating", whose value is the maximum of the field "rating" over all candidates that passed the filter on genres.

  • a key named "sum_budget", whose value is the sum of the field "budget" over all candidates that passed the filters on both genres and languages.
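The scoping rule can be illustrated outside Hyperspace with a plain-Python simulation. This is only a sketch: the candidate dictionaries, the "matches" sets, and the simulate_aggregations helper are hypothetical stand-ins and are not part of the Hyperspace API.

```python
# Plain-Python simulation of aggregation scoping; not Hyperspace code.
# Each candidate records which filters it passes (hypothetical data).
candidates = [
    {"rating": 7.5, "budget": 100, "matches": {"genres", "languages"}},
    {"rating": 9.0, "budget": 250, "matches": {"genres"}},
    {"rating": 6.0, "budget": 400, "matches": {"languages"}},
    {"rating": 8.0, "budget": 50, "matches": {"genres", "languages"}},
]


def simulate_aggregations(candidates):
    """Mimic the nesting of the example score function."""
    ratings = []  # values seen by aggregate_max("max_rating", "rating")
    budgets = []  # values seen by aggregate_sum("sum_budget", "budget")
    for doc in candidates:
        if "genres" in doc["matches"]:
            ratings.append(doc["rating"])
            if "languages" in doc["matches"]:
                budgets.append(doc["budget"])
    return {
        "max_rating": max(ratings) if ratings else None,
        "sum_budget": sum(budgets),
    }


result = simulate_aggregations(candidates)
print(result)
```

The third candidate matches on languages but not on genres, so it contributes to neither aggregation. The second candidate matches on genres only, so it counts toward "max_rating" but not toward "sum_budget".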

The following aggregation types are supported:

  • aggregate_sum(str agg_name, str fieldname) - Returns the sum of the field over the relevant candidates

  • aggregate_min(str agg_name, str fieldname) - Returns the minimum of the field over the relevant candidates

  • aggregate_max(str agg_name, str fieldname) - Returns the maximum of the field over the relevant candidates

  • aggregate_avg(str agg_name, str fieldname) - Returns the average of the field over the relevant candidates

  • aggregate_median(str agg_name, str fieldname) - Returns the median of the field over the relevant candidates

  • aggregate_count(str agg_name) - Returns the total number of valid field entries in the relevant candidates

  • aggregate_cardinality(str agg_name, str fieldname) - Returns the total number of valid field values in the relevant candidates
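As a reference for what the numerical aggregations compute, the sketch below reproduces sum, min, max, average, and median with the Python standard library. The list of values is hypothetical, and the code runs outside Hyperspace; it only shows the arithmetic, not how Hyperspace implements it.

```python
import statistics

# Hypothetical "rating" values of the candidates that reached the aggregation call.
ratings = [7.5, 9.0, 8.0, 6.5]

reference = {
    "sum": sum(ratings),                   # counterpart of aggregate_sum
    "min": min(ratings),                   # counterpart of aggregate_min
    "max": max(ratings),                   # counterpart of aggregate_max
    "avg": statistics.mean(ratings),       # counterpart of aggregate_avg
    "median": statistics.median(ratings),  # counterpart of aggregate_median
}
print(reference)
```

With an even number of values, statistics.median returns the mean of the two middle values. Whether Hyperspace uses the same convention is not stated here.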

Date Histogram

Hyperspace can create date histograms of aggregation results, using the function date_histogram(str agg_name, str fieldname, str time_interval).

The aggregation result is saved under the same key as a standard aggregation. However, the results are segmented into a histogram whose resolution is determined by time_interval. The available units for time_interval are s, m, h, and d (seconds, minutes, hours, and days).

Example:

with date_histogram("agg_0", "fieldname1", "1d") as obj_0:
    obj_0.aggregate_max("agg_max","fieldname1")

In this example, the aggregation results will be binned into a histogram where the width of each bin is 1d.
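The binning can be sketched in plain Python. The code below is an illustration under stated assumptions, not the Hyperspace implementation: it assumes the date field holds epoch seconds, that bins are aligned to the epoch, and that time_interval is an integer followed by one unit letter. The helper names interval_seconds and max_per_bin are hypothetical.

```python
from collections import defaultdict

# Seconds per unit for the s/m/h/d suffixes.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def interval_seconds(time_interval):
    """Convert a string such as "1d" or "30m" to a number of seconds."""
    count, unit = int(time_interval[:-1]), time_interval[-1]
    return count * UNIT_SECONDS[unit]


def max_per_bin(records, time_interval):
    """Group (timestamp, value) pairs into bins and take the max of each bin."""
    width = interval_seconds(time_interval)
    bins = defaultdict(list)
    for timestamp, value in records:
        bin_start = (timestamp // width) * width  # assumed epoch-aligned bins
        bins[bin_start].append(value)
    return {start: max(values) for start, values in sorted(bins.items())}


# Hypothetical records: (epoch seconds, value).
records = [(0, 3.0), (3600, 5.0), (86400, 2.0), (90000, 4.0), (172800, 1.0)]
histogram = max_per_bin(records, "1d")
print(histogram)
```

Each key of the resulting dictionary is the start of a one-day bin, and each value is the maximum within that bin, which mirrors nesting aggregate_max inside date_histogram.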

In score functions, only the outer "if" condition generates candidates (if match('genres') in the first example), while the inner "if" conditions only change the candidates' scores. By contrast, for aggregations all "if" conditions have the same effect: each one narrows the set of candidates over which the aggregation is computed.
