Schedules

Schedules in Fusion allow you to execute any Fusion service, Solr request, or other HTTP request on a defined timetable. A scheduler job definition consists of these parameters:

  • The start time

  • The repeat interval

  • The address of an endpoint that performs the requested actions

For example, you could schedule a Solr query to run at a specified time every day, or you could define a datasource to be re-crawled once a week. The schedules service does not execute any business logic; the service at the specified endpoint must provide this.

The Fusion scheduler is fault-tolerant and distributed across the nodes of your cluster. Several instances of the scheduler service can run on different nodes, but only one of them at a time executes and modifies schedules. That instance is the schedule "leader", and it is elected through ZooKeeper in much the same way that SolrCloud elects its leaders. The other instances stay on standby and take over if the leader goes down. The schedule job definitions are also stored in ZooKeeper, so they can be restored to any node whenever needed.
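To illustrate the idea of leader election, here is a toy, in-memory simulation of the standard ZooKeeper leader-election recipe: each candidate registers a sequentially numbered entry, the candidate with the lowest number is the leader, and when the leader's entry disappears the next lowest takes over. This is only a sketch under assumptions; the class and instance names are hypothetical, it does not use the ZooKeeper client API, and it is not how Fusion's scheduler is implemented internally.

```python
import itertools


class ToyElection:
    """In-memory stand-in for a ZooKeeper election path (illustration only)."""

    def __init__(self):
        self._seq = itertools.count()
        self._members = {}  # sequence number -> instance id

    def join(self, instance_id):
        # ZooKeeper would create an ephemeral sequential znode; we mimic its number.
        seq = next(self._seq)
        self._members[seq] = instance_id
        return seq

    def leave(self, seq):
        # When an instance's session ends, its ephemeral entry disappears.
        self._members.pop(seq, None)

    def leader(self):
        # The candidate holding the lowest sequence number is the leader.
        if not self._members:
            return None
        return self._members[min(self._members)]


election = ToyElection()
first = election.join("scheduler-node-1")
election.join("scheduler-node-2")
election.join("scheduler-node-3")
print(election.leader())  # scheduler-node-1
election.leave(first)     # the leader goes down
print(election.leader())  # scheduler-node-2
```

The standby instances never run schedules themselves; they only wait until they hold the lowest remaining number.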

Defining a Scheduler Job

When defining a job with the scheduler service, there are two main aspects to configuration:

  • Define some time parameters such as when the job will run and how often it will repeat.

  • Define the call that will be executed.

You can do this through the Fusion UI or the Scheduler API.

Defining a Scheduler Job in the Fusion UI

  1. Navigate to Applications > Scheduler.

  2. Click Add a Schedule.

  3. Enter the parameters for the scheduler job:

    • Schedule Name - Any arbitrary string (required)

    • Service - The endpoint and method for the service to run (required)

      Select the protocol:

      • "http://" or "https://": An HTTP or HTTPS request.

      • "solr://{collection}/…": A SolrCloud request.

      • "service://{serviceName}/{path}": A load-balanced Fusion service request.

    • Start Time - The date and time at which to begin running the first instance of this job

    • End Time - The date and time after which this job will be disabled

    • Run Once - To run the job at regular intervals, uncheck this option.

    • Interval - The interval at which to repeat this job

    • Active? - To disable the job, uncheck this option.

      Now your configuration should look something like this:

      (Figure: Adding a schedule via the UI)

      In this example, the scheduler job crawls MyDataSource every hour. If you want it to run every hour on the hour, set the start time to a whole hour such as 11:00:00. For a crawl job like this one, you can check the job history by navigating to Applications > Collections > CollectionName > Datasources > DatasourceName > Job History.

  4. Click Save.
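The "on the hour" advice follows from the schedule repeating at fixed offsets from its start time. The Python sketch below assumes exactly that behavior; run_times is a hypothetical helper and the date is arbitrary.

```python
from datetime import datetime, timedelta


def run_times(start, interval, count):
    """Return the first `count` run times for a job that starts at `start`
    and repeats every `interval` (a fixed-length timedelta)."""
    return [start + i * interval for i in range(count)]


# Hypothetical values: start at 11:00:00 and repeat every hour.
start = datetime(2024, 1, 15, 11, 0, 0)
for t in run_times(start, timedelta(hours=1), 3):
    print(t.time())
# 11:00:00
# 12:00:00
# 13:00:00
```

With a start time of 11:17:00, the same schedule would run at 11:17, 12:17, and so on, which is why the start time should fall exactly on the hour.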

Time Properties

The time properties let you define an initial startTime and an endTime after which the scheduled job should no longer occur. The endTime does not stop a job; instead, it has the same effect as setting the entire schedule to "inactive" at a certain date.
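A minimal sketch of how startTime, endTime, and the active flag combine, assuming inclusive boundaries (the exact boundary handling is not specified here). is_schedule_live is a hypothetical helper, not part of the Scheduler API.

```python
from datetime import datetime
from typing import Optional


def is_schedule_live(now: datetime, start_time: datetime,
                     end_time: Optional[datetime], active: bool) -> bool:
    """Return True if a run that falls due at `now` should be triggered.

    Assumption: both boundaries are inclusive, and end_time=None means no end.
    Reaching end_time acts like setting active to False: it only gates
    whether new runs are triggered.
    """
    if not active:
        return False
    if now < start_time:
        return False
    if end_time is not None and now > end_time:
        return False
    return True
```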

Two properties define the repeat interval. The first is "interval" itself, an integer. If the interval is 0, the scheduled job runs only once. If the interval is greater than 0, the job repeats, and a second property, "repeatUnit", defines the unit of time in which the interval is measured. The allowed values are:

  • "millisecond" or "ms"

  • "second" or "sec"

  • "minute" or "min"

  • "hour" or "hr"

  • "day"

  • "week"

  • "month"

These values are case-insensitive: for example, "MINUTE", "Minute", and "minute" are equivalent.
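The interval and repeatUnit rules can be sketched in Python as follows. The alias table comes from the list above, and the case-insensitive lookup mirrors the note on upper and lower case. Month lengths vary, so this sketch assumes calendar-month arithmetic that clamps to the last day of shorter months; how Fusion itself handles that case is not stated in this section. All function names are hypothetical.

```python
import calendar
from datetime import datetime, timedelta

# Aliases from the list of allowed values, mapped to one canonical name.
_UNIT_ALIASES = {
    "millisecond": "millisecond", "ms": "millisecond",
    "second": "second", "sec": "second",
    "minute": "minute", "min": "minute",
    "hour": "hour", "hr": "hour",
    "day": "day",
    "week": "week",
    "month": "month",
}

# Units with a fixed length; "month" is handled separately.
_FIXED = {
    "millisecond": timedelta(milliseconds=1),
    "second": timedelta(seconds=1),
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}


def normalize_unit(repeat_unit):
    """Case-insensitive lookup: "SECOND", "Sec", and "sec" all map to "second"."""
    try:
        return _UNIT_ALIASES[repeat_unit.lower()]
    except KeyError:
        raise ValueError(f"unsupported repeatUnit: {repeat_unit!r}") from None


def add_months(t, n):
    """Add n calendar months, clamping the day (Jan 31 + 1 month -> Feb 28/29)."""
    month_index = t.month - 1 + n
    year, month = t.year + month_index // 12, month_index % 12 + 1
    day = min(t.day, calendar.monthrange(year, month)[1])
    return t.replace(year=year, month=month, day=day)


def next_run(previous, interval, repeat_unit):
    """Return the next run time, or None when interval is 0 (run once)."""
    if interval == 0:
        return None
    unit = normalize_unit(repeat_unit)
    if unit == "month":
        return add_months(previous, interval)
    return previous + interval * _FIXED[unit]


print(next_run(datetime(2024, 1, 31), 1, "Month"))  # 2024-02-29 00:00:00
```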

Call Properties

The call properties are where the actual task of the schedule is defined.

First and most importantly, you need to define the "uri". It can take one of three forms:

  • An HTTP or HTTPS request: <protocol>://<path>

  • A Solr request, in the form of "solr://<collection>/…". For example, you could periodically issue a commit request to Solr, or periodically run a query against a specific collection.

  • A Fusion service request, in the form of "service://<serviceName>/<path>". The available services are stored in ZooKeeper. You can find them in the Admin UI under the "System" tab, or with a REST API call to the /introspect endpoint.
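The three uri forms can be told apart by their scheme. This Python sketch uses the standard library's urlsplit; classify_uri and its return shape are hypothetical, chosen only to show which part of each uri names the host, the collection, or the service.

```python
from urllib.parse import urlsplit


def classify_uri(uri):
    """Split a schedule "uri" into (kind, target, path).

    kind is "http", "solr", or "service"; target is the host,
    collection, or service name; path is the remainder.
    """
    parts = urlsplit(uri)  # urlsplit lowercases the scheme for us
    if parts.scheme in ("http", "https"):
        kind = "http"
    elif parts.scheme in ("solr", "service"):
        kind = parts.scheme
    else:
        raise ValueError(f"unsupported uri scheme: {parts.scheme!r}")
    return kind, parts.netloc, parts.path


print(classify_uri("solr://myCollection/update"))
# ('solr', 'myCollection', '/update')
```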

Once the uri is defined, you also need to tell the schedule which "method" to use for the call (GET, POST, PUT, or DELETE) and any additional "headers" that might be required (such as "Content-Type" or "If-Modified-Since").

Depending on the request, you may also want to define "queryParams" or an "entity" (the request body). For Solr requests, queryParams can be any query parameter that is valid for the Solr endpoint named in the URI.
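The call properties described above can be assembled into a callParams object like this. It is a sketch under assumptions: build_call_params is a hypothetical helper, representing headers and queryParams as JSON objects follows the queryParams shape used in the API examples, and treating entity as a plain string is a guess.

```python
import json

# Methods named in the text above.
ALLOWED_METHODS = {"GET", "POST", "PUT", "DELETE"}


def build_call_params(uri, method, headers=None, query_params=None, entity=None):
    """Assemble a callParams dict, leaving out optional keys that are unset."""
    method = method.upper()
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method: {method}")
    call = {"uri": uri, "method": method}
    if headers:
        call["headers"] = dict(headers)
    if query_params:
        call["queryParams"] = dict(query_params)
    if entity is not None:
        call["entity"] = entity  # assumed: the request body as a string
    return call


# Example: run a match-all query against a collection.
call = build_call_params("solr://myCollection/select", "get", query_params={"q": "*:*"})
print(json.dumps(call))
# {"uri": "solr://myCollection/select", "method": "GET", "queryParams": {"q": "*:*"}}
```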

API Examples

Each of the following examples sets a schedule for an action in the system using the Scheduler API. To see the results of a job, you will usually need to query the History API.

Issue a commit every 10 seconds
{"creatorType":"human", "creatorId":"me", "repeatUnit":"SECOND", "interval":10, "active":true, "callParams":{"uri":"solr://myCollection/update", "method":"GET", "queryParams":{"stream.body":"<commit/>"}}}

In this example, the callParams define a Solr URI that targets the collection named 'myCollection' and its 'update' request handler. The method is GET, and the queryParams carry the commit command for Solr. For timing, the job is set to run every 10 seconds.
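To check a payload before sending it, you can parse it with the Python standard library. The sketch below loads the commit example and URL-encodes its queryParams, which shows what the parameters would look like if they were sent as a standard GET query string; whether the scheduler encodes them exactly this way is an assumption.

```python
import json
from urllib.parse import urlencode

# The commit example from above.
example = (
    '{"creatorType":"human", "creatorId":"me", "repeatUnit":"SECOND", "interval":10, '
    '"active":true, "callParams":{"uri":"solr://myCollection/update", "method":"GET", '
    '"queryParams":{"stream.body":"<commit/>"}}}'
)
schedule = json.loads(example)
call = schedule["callParams"]

print(call["uri"], call["method"])     # solr://myCollection/update GET
print(urlencode(call["queryParams"]))  # stream.body=%3Ccommit%2F%3E
```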

Run a datasource every 20 minutes
{"creatorType":"human", "creatorId":"me", "repeatUnit":"MINUTE", "interval":20, "active":true, "callParams":{"uri":"service://connectors/jobs/TwitterSearch", "method":"POST"}}

In this example, we’ve defined the callParams with a URI for Fusion that calls the TwitterSearch datasource job. The method is POST, which is the method to use when starting a crawl. There aren’t any other properties needed to define the task. For timing, we’ve defined the job to run every 20 minutes.
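If you schedule several datasources, a small helper keeps the payloads consistent. crawl_schedule is hypothetical; it simply reproduces the JSON above with the datasource ID and interval as parameters.

```python
import json


def crawl_schedule(datasource_id, every_minutes, creator_id="me"):
    """Build a schedule that starts a datasource crawl every `every_minutes` minutes."""
    return {
        "creatorType": "human",
        "creatorId": creator_id,
        "repeatUnit": "MINUTE",
        "interval": every_minutes,
        "active": True,
        "callParams": {
            "uri": f"service://connectors/jobs/{datasource_id}",
            "method": "POST",  # POST starts the crawl
        },
    }


body = crawl_schedule("TwitterSearch", 20)
print(json.dumps(body))
```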

Remove signals older than 1 month
{"creatorType":"human", "creatorId":"me", "repeatUnit":"MONTH", "interval":1, "active":true, "callParams":{"uri":"solr://myCollection_signals/update", "method":"GET", "queryParams":{"stream.body":"<delete><query>timestamp_dt:[* TO NOW-1MONTH]</query></delete>"}}}

In this example, we’re again calling Solr’s 'update' request handler, this time on a collection named 'myCollection_signals', which is the default location for signals. The queryParams define a delete-by-query that removes all documents whose timestamp_dt is more than one month old. For timing, we’ve set this to run once a month.
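The delete-by-query body can be generated rather than typed by hand. In this sketch, delete_older_than is a hypothetical helper; it XML-escapes the query so that characters such as & or < in other queries do not break the body, and it places the result in queryParams using the same object form as the commit example.

```python
import json
from xml.sax.saxutils import escape


def delete_older_than(field, age):
    """Return a Solr XML delete-by-query body for docs where `field` is older than NOW-<age>."""
    query = f"{field}:[* TO NOW-{age}]"
    # escape() replaces &, <, and > so the query cannot break the XML.
    return f"<delete><query>{escape(query)}</query></delete>"


schedule = {
    "creatorType": "human",
    "creatorId": "me",
    "repeatUnit": "MONTH",
    "interval": 1,
    "active": True,
    "callParams": {
        "uri": "solr://myCollection_signals/update",
        "method": "GET",
        "queryParams": {"stream.body": delete_older_than("timestamp_dt", "1MONTH")},
    },
}
print(json.dumps(schedule))
```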