Looking for the old docs site? You can still view it for a limited time here.

Built-in SQL Functions

In addition to the SQL functions provided by Spark, Fusion provides several additional functions to simplify common aggregation tasks. To recap, a UDAF aggregates multiple rows for the same group by key and a UDF performs some operation on a single row.


The weighted_sum UDAF takes a weight, type, and type-weight mapping to produce an aggregated weight. For example, consider the following SQL snippet:

   SELECT query,
          weighted_sum(typed_weight_d, type, 'click:1.0,cart:10.0') AS weight_d
     FROM signal_type_groups
 GROUP BY query, doc_id, filters

When applied to the rows in the table below, the weighted_sum function produces a final weight_d of 12.0 (2*1.0 + 1*10.0). The UDAF is passed rows grouped by query, doc_id, and filters.

query type doc_id filters typed_weight_d











significantTerms The significantTerms UDF allows the user to query data for significant terms within the results. This identifies terms that appear an unusaully frequently in a specific result set relative to the index. It can be used to identify anomalies in data.

This UDF can be used on tokenized text and multi-value and single-value string fields.


select significantTerms(fieldX), score from collectionName where fieldy = 'my-query' limit 10


The timediff UDF computes the difference, in milliseconds, between to timestamps in the same row. From the session_rollup job, the timediff function computes the difference between the current time and the last activity in a session.


The click_pos UDF computes either a reciprocal rank or a raw click position (using a 0-based index) of a document in a page of results. This UDF is used to compute the mean reciprocal rank (MRR) for experiments. For example, given the following list of documents and a doc ID, the click_pos UDF will return 2:

docs: a,b,c,d
doc ID: c


The concat_text UDF combines multivalued text fields coming from Solr into a field with a single value delimited by spaces. This UDF is useful when a field returned from Solr uses the _txt suffix, which indicates a multivalued text field.