Schedules

Schedules in Fusion allow you to execute any Fusion service, Solr request, or other HTTP request on a defined timetable.

For example, you could schedule a Solr query to run at a specified time every day, or you could define a datasource to be re-crawled once a week. The schedules service does not execute any business logic; the service at the specified endpoint must provide this.

The Fusion scheduler is fault-tolerant and distributed across the nodes of your cluster. Several instances of the scheduler service can run on different nodes, but only one of them at a time executes and modifies schedules. That instance is elected as the schedule "leader" through ZooKeeper, in much the same way that SolrCloud elects node leaders. The other instances remain on standby in case the leader goes down. The schedule job definitions are also kept in ZooKeeper, which allows them to be restored to any node whenever needed.

Scheduler job definitions

When defining a job with the scheduler service, there are two main aspects to configure: time properties and call properties.

Time properties

These properties define the start time, end time, and repeat interval, if any.

  • startTime defines when the job should first run.

  • endTime defines when the job should no longer run.

    The endTime does not stop a running job; instead it has the same effect as setting the entire schedule to "inactive" at a certain date.

  • interval is an integer that sets how often the job repeats, counted in the unit given by repeatUnit.

    The interval can be "0", in which case the scheduled job only runs once. When the interval is higher than "0", then repeatUnit must also be defined.

  • repeatUnit defines the unit of time to use in conjunction with the interval. The allowed values are:

    • millisecond or ms

    • second or sec

    • minute or min

    • hour or hr

    • day

    • week

    • month

    These values are case-insensitive, meaning they can be entered in upper or lower case as you prefer.
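The timing rules above can be checked on the client side before a job definition is submitted. The following is a minimal sketch, not part of Fusion: the helper names normalize_repeat_unit and validate_timing are hypothetical, and two choices are assumptions made for illustration: emitting the long upper-case unit name (the form the API examples in this document use) and rejecting negative intervals.

```python
from typing import Optional

# Hypothetical client-side helpers; Fusion does not ship these functions.

# Long and short spellings listed in this document, mapped to one canonical form.
_UNIT_ALIASES = {
    "millisecond": "MILLISECOND", "ms": "MILLISECOND",
    "second": "SECOND", "sec": "SECOND",
    "minute": "MINUTE", "min": "MINUTE",
    "hour": "HOUR", "hr": "HOUR",
    "day": "DAY",
    "week": "WEEK",
    "month": "MONTH",
}


def normalize_repeat_unit(unit: str) -> str:
    """Return the canonical upper-case unit, accepting any letter case."""
    key = unit.strip().lower()
    if key not in _UNIT_ALIASES:
        raise ValueError(f"unsupported repeatUnit: {unit!r}")
    return _UNIT_ALIASES[key]


def validate_timing(interval: int, repeat_unit: Optional[str] = None) -> dict:
    """Build the interval/repeatUnit part of a job definition."""
    # Assumption: negative intervals are treated as invalid.
    if isinstance(interval, bool) or not isinstance(interval, int) or interval < 0:
        raise ValueError("interval must be a non-negative integer")
    if interval == 0:
        # An interval of 0 means the job runs only once, so no unit is needed.
        return {"interval": 0}
    if repeat_unit is None:
        raise ValueError("repeatUnit is required when interval is greater than 0")
    return {"interval": interval, "repeatUnit": normalize_repeat_unit(repeat_unit)}
```

For instance, validate_timing(10, "sec") yields the same interval and repeatUnit pair used by the 10-second commit example later in this document, while validate_timing(5) raises an error because the unit is missing.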

Call properties

The call properties are where the actual task of the schedule is defined.

  • uri

    This can take several forms:

    • An HTTP or HTTPS request: <protocol>://<path>

    • A Solr request: solr://<collection>/…​

      For example, you could periodically issue a commit request to Solr, or periodically run a query against a specific collection.

    • A Fusion service request: service://<serviceName>/<path>

      The services available are stored in ZooKeeper. You can find them in the Admin UI under the "System" tab, or with a REST API call to the /introspect endpoint.

  • method is the HTTP method to use.

  • header contains any additional required headers.

  • queryParams contains any additional query parameters.

    For Solr requests, queryParams may be any valid query parameter for the specified URI.

  • entity is the request body, if any.
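To make the three URI forms concrete, here is a small sketch that splits a schedule URI into its parts and assembles a call-properties object. The function names describe_call_uri and build_call_params are hypothetical and exist only for illustration; the property names (uri, method, queryParams, entity) are the ones listed above. It relies only on Python's standard urllib.parse module.

```python
from urllib.parse import urlsplit


def describe_call_uri(uri: str) -> dict:
    """Split a schedule URI into the parts named in the list above."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    if scheme in ("http", "https"):
        return {"kind": "http", "host": parts.netloc, "path": parts.path}
    if scheme == "solr":
        # solr://<collection>/... : the first component names the collection.
        return {"kind": "solr", "collection": parts.netloc, "path": parts.path}
    if scheme == "service":
        # service://<serviceName>/<path>
        return {"kind": "service", "serviceName": parts.netloc, "path": parts.path}
    raise ValueError(f"unsupported URI scheme: {uri!r}")


def build_call_params(uri, method="GET", query_params=None, entity=None):
    """Assemble the call properties, leaving out optional ones that are unset."""
    describe_call_uri(uri)  # raises ValueError for an unknown scheme
    call = {"uri": uri, "method": method}
    if query_params is not None:
        call["queryParams"] = dict(query_params)
    if entity is not None:
        call["entity"] = entity
    return call
```

Calling describe_call_uri("service://connectors/jobs/TwitterSearch") separates the service name (connectors) from the path (/jobs/TwitterSearch), which is the same split the service:// form describes.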

How to define a scheduler job

There are two ways to define a scheduler job: in the Fusion UI or through the Scheduler API.

Defining a scheduler job in the Fusion UI

  1. Navigate to DevOps > Scheduler.

  2. Click Add a Schedule.

  3. Enter the parameters for the scheduler job:

    • Schedule Name – Any arbitrary string (required)

    • Service – The endpoint and method for the service to run (required)

      Select the protocol:

      • http:// or https://

      • solr://{collection}/…​

        A SolrCloud request.

      • service://{serviceName}/{path}

        A load-balanced Fusion service request.

    • Start Time – The date and time at which to begin running the first instance of this job

    • End Time – The date and time after which this job will be disabled

    • Run Once – To run the job at regular intervals, uncheck this option.

    • Interval – The interval at which to repeat this job

    • Active? – To disable the job, uncheck this option.

    Now your configuration should look something like this:

    (Screenshot: Adding a schedule via the UI)

    In this example, the scheduler job crawls MyDataSource every hour. If you want this to happen every hour on the hour, you can set the start time to 11:00:00 (or any other hour). In the case of a crawl job like this one, you can check the job history by navigating to Applications > Collections > CollectionName > Datasources > DatasourceName > Job History.

  4. Click Save.
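If you want an hourly job to fire on the hour, the start time has to be a whole hour. The sketch below shows one way to compute the next whole hour from the current time using Python's standard datetime module. The function name next_top_of_hour is hypothetical, and printing the result as an ISO 8601 string is an assumption; this document does not specify the format that the Start Time field or the startTime property expects.

```python
from datetime import datetime, timedelta, timezone


def next_top_of_hour(now: datetime) -> datetime:
    """Return the first whole hour strictly after `now`."""
    floored = now.replace(minute=0, second=0, microsecond=0)
    return floored + timedelta(hours=1)


# Example with a fixed timestamp so the result is reproducible.
now = datetime(2024, 1, 15, 10, 42, 7, tzinfo=timezone.utc)
start = next_top_of_hour(now)
print(start.isoformat())  # 2024-01-15T11:00:00+00:00
```

Because the calculation adds a timedelta rather than incrementing the hour field, it also rolls over correctly at midnight.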

API Examples

Each of these examples shows setting a schedule for an action in the system, using the Scheduler API. To see the results of a job, you will likely need to query the History API.

Issue a commit every 10 seconds
{"creatorType":"human", "creatorId":"me", "repeatUnit":"SECOND", "interval":10, "active":true, "callParams":{"uri":"solr://myCollection/update", "method":"GET", "queryParams":{"stream.body":"<commit/>"}}}

In this example, we’ve defined the callParams with a URI for Solr that calls a collection named 'myCollection' and the 'update' updateHandler. The method is GET. The queryParams define the commit call for Solr. For timing, we’ve defined the job to run every 10 seconds.

Run a datasource every 20 minutes
{"creatorType":"human", "creatorId":"me", "repeatUnit":"MINUTE", "interval":20, "active":true, "callParams":{"uri":"service://connectors/jobs/TwitterSearch", "method":"POST"}}

In this example, we’ve defined the callParams with a URI for Fusion that calls the TwitterSearch datasource job. The method is POST, which is the method to use when starting a crawl. There aren’t any other properties needed to define the task. For timing, we’ve defined the job to run every 20 minutes.

Remove signals older than 1 month
{"creatorType":"human", "creatorId":"me", "repeatUnit":"MONTH", "interval":1, "active":true, "callParams":{"uri":"solr://myCollection_signals/update", "method":"GET", "queryParams":{"stream.body":"<delete><query>timestamp_dt:[* TO NOW-1MONTH]</query></delete>"}}}

In this example, we’re again calling Solr’s 'update' updateHandler, this time on a collection named 'myCollection_signals', which is the default location for signals. We’ve also defined queryParams to delete every document matched by a date query for timestamps more than 1 month old. For timing, we’ve set this to run once a month.
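Job definitions like the ones above can also be assembled in code rather than written by hand, which avoids quoting mistakes when the request carries XML. The following is a sketch only: build_job is a hypothetical helper, the field names are copied from the examples above, and the Scheduler API endpoint to send the result to is not shown because it is not given in this document.

```python
import json


def build_job(repeat_unit, interval, call_params, creator_id="me"):
    """Assemble a job definition with the fields used in the examples above."""
    return {
        "creatorType": "human",
        "creatorId": creator_id,
        "repeatUnit": repeat_unit,
        "interval": interval,
        "active": True,
        "callParams": call_params,
    }


# Delete signals older than one month, once a month.
purge_signals = build_job(
    "MONTH",
    1,
    {
        "uri": "solr://myCollection_signals/update",
        "method": "GET",
        # queryParams written as an object, matching the commit example.
        "queryParams": {
            "stream.body": "<delete><query>timestamp_dt:[* TO NOW-1MONTH]</query></delete>"
        },
    },
)

# Start the TwitterSearch datasource job every 20 minutes.
crawl_twitter = build_job(
    "MINUTE",
    20,
    {"uri": "service://connectors/jobs/TwitterSearch", "method": "POST"},
)

# json.dumps handles quoting, so the XML in stream.body needs no manual escaping.
payload = json.dumps(purge_signals)
print(payload)
```

The printed payload is the request body for the "remove signals" job; the same helper covers the other two examples by changing only the unit, the interval, and the call properties.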