Aggregations

Hyperspace supports aggregations over numerical fields of selected documents. Aggregation functions are called inside the score function, and each aggregation is computed over the candidates that passed the filtering up to that point in the code.

The aggregation result is stored as a key under the query results object.

For example:

def score_function_recommendation(params, doc):
    score1 = 0.0
    score2 = 1.0
    sum_score = 0.0
    if match('genres'):
        # Aggregated over candidates that matched on genres
        aggregate_max("max_rating", "rating")
        if match('languages'):
            # Aggregated over candidates that matched on genres and languages
            aggregate_sum("sum_budget", "budget")
            aggregate_percentile("percentile_budget", "budget", [10, 15, 32, 75])
        score1 += rarity_sum('genres')
        sum_score = score1 + score2
        if match('collection'):
            sum_score += 10
        return boost * sum_score
    return 0.0

In the above example, the query results will include a key named "aggregations", with the following sub keys:

a key named "max_rating", whose value is the maximum of the field "rating" over all candidates that passed the filter on genres.
a key named "sum_budget", whose value is the sum of the field "budget" over all candidates that passed the filters on genres and languages.
a key named "percentile_budget", whose value holds the 10th, 15th, 32nd and 75th percentiles of the field "budget" over all candidates that passed the filters on genres and languages.
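The scoping described above can be illustrated with a plain-Python emulation. This is only a sketch and does not use the Hyperspace API: the sample documents, the set-based matches() helper, and the run_aggregations() function are hypothetical stand-ins, and only the "aggregations" key and its sub key names come from the example.

```python
# Plain-Python emulation of the example above; NOT the Hyperspace API.
# The documents and the matches() helper are hypothetical.

def matches(doc, params, field):
    """Hypothetical stand-in for match(): True if doc and params share a value in `field`."""
    return bool(set(doc.get(field, [])) & set(params.get(field, [])))

def run_aggregations(params, docs):
    ratings = []  # values reaching aggregate_max("max_rating", "rating")
    budgets = []  # values reaching aggregate_sum("sum_budget", "budget")
    for doc in docs:
        if matches(doc, params, "genres"):
            ratings.append(doc["rating"])
            if matches(doc, params, "languages"):
                budgets.append(doc["budget"])
    return {
        "aggregations": {
            "max_rating": max(ratings) if ratings else None,
            "sum_budget": sum(budgets),
        }
    }

params = {"genres": ["drama"], "languages": ["en"]}
docs = [
    {"genres": ["drama"], "languages": ["en"], "rating": 7.5, "budget": 100},
    {"genres": ["drama"], "languages": ["fr"], "rating": 9.0, "budget": 250},
    {"genres": ["comedy"], "languages": ["en"], "rating": 9.9, "budget": 400},
]
results = run_aggregations(params, docs)
print(results)
```

Here the second document contributes to "max_rating" but not to "sum_budget", because it passes the genres filter only. The third document contributes to neither.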

The following aggregation types are supported:

aggregate_sum(str agg_name, str fieldname) - Returns the sum of the field over the relevant candidates.
aggregate_min(str agg_name, str fieldname) - Returns the min of the field over the relevant candidates.
aggregate_max(str agg_name, str fieldname) - Returns the max of the field over the relevant candidates.
aggregate_avg(str agg_name, str fieldname) - Returns the average of the field over the relevant candidates.
aggregate_count(str agg_name) - Returns the total number of valid field entries in the relevant candidates.
aggregate_cardinality(str agg_name, str fieldname) - Returns the total number of valid field values in the relevant candidates.
aggregate_percentile(str agg_name, str fieldname, list[float] percentiles) - Returns the percentiles of the field over the relevant candidates.
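As a rough illustration of what these aggregations compute, the sketch below reproduces the sum, min, max, average, count and percentile in plain Python over a small hypothetical list of field values. It is not Hyperspace code. Treating None as an invalid entry and using linear interpolation for percentiles are both assumptions; the source does not state how Hyperspace handles either.

```python
import statistics

# Hypothetical field values; None marks an entry assumed to be invalid.
values = [3.0, 8.0, 5.0, 8.0, None]
valid = sorted(v for v in values if v is not None)

def percentile(sorted_vals, p):
    """Percentile by linear interpolation (an assumed method, not confirmed for Hyperspace)."""
    rank = (len(sorted_vals) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (rank - lo)

summary = {
    "sum": sum(valid),
    "min": min(valid),
    "max": max(valid),
    "avg": statistics.fmean(valid),
    "count": len(valid),
    "percentiles": {p: percentile(valid, p) for p in (0, 50, 100)},
}
print(summary)
```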

Date Histogram

You can store the aggregation results as a histogram by date, using the function date_histogram(str agg_name, str fieldname, str time_interval).

The aggregation result will be stored under the key given by agg_name. Results will be binned into a histogram with a resolution determined by time_interval. The available units for time_interval are s/m/h/d.

Example:

with date_histogram("agg_0", "fieldname1", "1d") as obj_0:
    obj_0.aggregate_max("agg_max","fieldname1")

In this example, the aggregation results are binned into a histogram where the width of each bin is 1d.
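The binning idea can be sketched in plain Python as follows. This is not the Hyperspace implementation: it assumes the date field holds epoch seconds, that time_interval is an integer followed by one of the units s/m/h/d, and that bins are aligned to multiples of the interval. The helper names parse_interval() and date_histogram_max() are hypothetical.

```python
from collections import defaultdict

# Seconds per unit for the s/m/h/d units listed above.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_interval(time_interval):
    """Convert a string such as "1d" or "15m" into a bin width in seconds."""
    return int(time_interval[:-1]) * UNIT_SECONDS[time_interval[-1]]

def date_histogram_max(timestamps, time_interval):
    """Group epoch-second timestamps into bins and take the max inside each bin."""
    width = parse_interval(time_interval)
    bins = defaultdict(list)
    for ts in timestamps:
        bins[ts - ts % width].append(ts)  # key is the start of the bin
    return {start: max(vals) for start, vals in sorted(bins.items())}

# Hypothetical values of the date field, in epoch seconds.
timestamps = [0, 3600, 86400, 90000, 200000]
histogram = date_histogram_max(timestamps, "1d")
print(histogram)
```

Each key in the result is the start of a one-day bin, and each value is the maximum of the field within that bin, mirroring the aggregate_max call nested inside date_histogram in the example.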

Aggregation functions include all data points that reached the relevant part of the code, regardless of whether they are included in the candidate list.
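A small plain-Python sketch may help make this distinction concrete. It assumes, purely for illustration, that the candidate list is a top-K selection by score. The scores, prices and TOP_K value are hypothetical, and none of this is Hyperspace code.

```python
import heapq

# Hypothetical per-document scores and field values.
scores = [4.0, 0.0, 9.0, 7.0, 1.0]
prices = [10, 20, 30, 40, 50]
TOP_K = 2  # assumed size of the candidate list

reached = []  # values seen by the aggregation call
scored = []
for score, price in zip(scores, prices):
    reached.append(price)  # every document reaches the aggregation
    scored.append((score, price))

# Only the TOP_K highest-scoring documents end up in the candidate list.
candidates = heapq.nlargest(TOP_K, scored)

agg_sum = sum(reached)                                  # over all documents
candidate_sum = sum(price for _, price in candidates)   # over candidates only
print(agg_sum, candidate_sum)
```

Under these assumptions the aggregated sum covers all five documents, while a sum restricted to the candidate list would cover only two of them.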

