        Create and Run a SQL Aggregation Job

        You can perform a SQL aggregation on a signals collection for a datasource (or on some other collection), either through the Fusion UI or through the Fusion API.

        Preliminaries

        Before you can create and run a SQL aggregation job, you must create an app, create a collection (or use the default collection for the app), and add a datasource. Before you run a SQL aggregation job, you need signal data. Otherwise, there is nothing to aggregate.
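
        To confirm that the signals collection contains data, you can query it directly before running the job. This is a minimal sanity check, assuming Fusion's Solr passthrough endpoint (/api/solr/{collection}/select) and placeholder credentials (user:pass); adjust for your deployment:

        curl -u user:pass "http://localhost:{api-port}/api/solr/experiment_signals/select?q=*:*&rows=0"

        If numFound in the response is 0, there is nothing to aggregate yet.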

        Use the Fusion UI

        You can use the Fusion UI to perform a SQL aggregation.

        Set up a SQL aggregation job
        1. With the app open, navigate to Collections > Jobs.

        2. Click Add and select Aggregation from the dropdown list.

        3. Specify an arbitrary Spark Job ID.

        4. For the Source Collection, select the collection that contains the data to aggregate.

          This is not the base collection. For example, to aggregate the signals in experiment_signals, you would select experiment_signals, not experiment.
        5. Under Aggregation Settings, expand SQL Aggregation.

        6. Enter or paste SQL in the SQL text box (see the example SQL after these steps). Optionally, click Open Editor to open a dialog box with a larger editor. Click Close to close the dialog box.

        7. Click Save to save the job.
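
        For example, the following SQL (the same statement used in the API example below, reformatted for readability) aggregates click signals by query and document ID, using the time_decay function to weight recent signals more heavily:

        SELECT SUM(count_i) AS aggr_count_i, query AS query_s, doc_id AS doc_id_s,
               time_decay(count_i, timestamp_tdt) AS weight_d
        FROM experiment_click_signals
        WHERE type_s='click'
        GROUP BY query, doc_id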

        Run a SQL aggregation job
        1. With the app open, navigate to Collections > Jobs.

        2. In the list of jobs, click the job you want to run, and then click Run.

        3. Click Start.

        4. Click Close to close the job management part of the Fusion UI.

        Use the Fusion API

        You can use the Fusion API to perform a SQL aggregation. For example:

        Configure a SQL aggregation job (note that the single quotes inside the SQL statement are escaped as '\'' for the shell):
        curl -u user:pass -X POST -H "Content-Type: application/json" http://localhost:{api-port}/api/spark/configurations -d '{
          "type" : "spark",
          "id" : "experiment_click_signals_aggregation",
          "definition" : {
            "id" : "experiment_click_signals_aggregation",
            "sql" : "SELECT SUM(count_i) AS aggr_count_i, query AS query_s, doc_id AS doc_id_s, time_decay(count_i, timestamp_tdt) AS weight_d FROM experiment_click_signals WHERE type_s='\''click'\'' GROUP BY query, doc_id",
            "selectQuery" : "*:*",
            "outputPipeline" : "_system",
            "rollupAggregator" : "SQL",
            "sourceRemove" : false,
            "sourceCatchup" : true,
            "outputRollup" : true,
            "parameters" : [ {
              "key" : "optimizeOutput",
              "value" : "4"
            } ]
          },
          "inputCollection" : "experiment_click_signals",
          "rows" : 10000
        }'
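
        To verify that the configuration was saved, retrieve it by ID:

        curl -u user:pass http://localhost:{api-port}/api/spark/configurations/experiment_click_signals_aggregation
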
        Run a SQL aggregation job:
        curl -u user:pass -X POST -H "Content-Type: application/json" http://localhost:{api-port}/api/jobs/spark:experiment_click_signals_aggregation/actions -d '{"action": "start"}'
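
        To check the status of the job, you can send a GET request to the same jobs endpoint. This is a sketch assuming the endpoint supports GET for status; the exact response fields vary by Fusion version:

        curl -u user:pass http://localhost:{api-port}/api/jobs/spark:experiment_click_signals_aggregation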