Write SQL Aggregations

In this article, we provide guidance to help you make the most of the Fusion SQL aggregation engine.

Project fields into the signals_aggr collection

For legacy reasons, the <collection>_signals_aggr collection relies on dynamic field names, such as doc_id_s and query_s instead of doc_id and query. Consequently, when you project fields to be written to the <collection>_signals_aggr collection, you should use dynamic field suffixes as shown in the SQL snippet below:

SELECT SUM(typed_aggr_count_i) AS aggr_count_i,
        query AS query_s,
        query AS query_t,
        doc_id AS doc_id_s,
        filters AS filters_s,
        SPLIT(filters, ' \\$ ') AS filters_ss,
        weighted_sum(...) AS weight_d
      FROM signal_type_groups
  GROUP BY query, doc_id, filters

You’re not required to use this approach, but if you don’t use dynamic field suffixes as shown above, you’ll need to change the boosting stages in Fusion to work with different field names.

Use WITH to organize complex queries

A common pattern in SQL aggregation queries is the use of subqueries to break up the logic into comprehensible units. For more information about the WITH clause, see https://modern-sql.com/feature/with. Let’s work through an example to illustrate the key points:

 1:  WITH signal_type_groups AS (
 2:      SELECT SUM(count_i) AS typed_aggr_count_i,
 3:             doc_id,
 4:             user_id,
 5:             type,
 6:             time_decay(count_i, timestamp_tdt) AS typed_weight_d
 7:        FROM product_signals
 8:       WHERE type IN ('click','cart','purchase')
 9:    GROUP BY user_id, doc_id, type
10: ) SELECT SUM(typed_aggr_count_i) AS aggr_count_i,
11:          doc_id AS doc_id_s,
12:          user_id AS user_id_s,
13:      weighted_sum(...) AS weight_d
14:     FROM signal_type_groups
15: GROUP BY doc_id, user_id
  • At line 1, we declare a statement scoped view named signal_type_groups using the WITH keyword.

  • Lines 2-9 define the subquery for the signal_type_groups view.

  • At line 7, we read from the product_signals collection in Fusion.

  • Line 8 filters the input to only include click, cart, and purchase signals. Behind the scenes, Fusion translates this WHERE IN clause to a Solr filter query, e.g. fq=type:(click OR cart OR purchase). The signal_type_groups view produces rows grouped by user_id, doc_id, and type (line 9).

  • Starting at line 10, we define a subquery that performs a rollup over the rows in the signal_type_groups view by grouping on doc_id and user_id (line 15). Notice how the WITH statement helps break this complex query up into two units that help make aggregation queries easier to comprehend. You are encouraged to adopt this pattern in your own SQL aggregation queries.