A datasource job specifies the origin of the data imported and indexed into Fusion. Datasources include databases, files, and data obtained from websites or applications. You can upload source data to Fusion in System > Blobs. In addition, you can create and configure a datasource job using any of the following methods:
  • The Index Workbench. Navigate to Indexing > Index Workbench > Configure Datasource. You can upload a ZIP file or select a datasource type from the list of supported Fusion connectors. The list is divided into datasources already installed and datasources not yet installed. When you select an uninstalled connector from the list, the system installs the connector.
  • Datasources. Navigate to Indexing > Datasources. If you click Add, you can select from a list of supported Fusion connectors. The list is divided into connectors already installed and connectors not yet installed. When you select an uninstalled connector from the list, the system installs the connector.
  • The Connector Datasources API. You can also use these endpoints to manage the crawl database and view connector schema.
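As a sketch of the API route, a request like the following lists the configured datasources. The `/api/connectors/datasources` path is an assumption; the exact path can vary by Fusion version, so check the API reference for your release.

```shell
# List configured datasources via the Connector Datasources API.
# FUSION_HOST, FUSION_PORT, USERNAME, and PASSWORD are placeholders,
# and the endpoint path is an assumption for illustration.
curl -u USERNAME:PASSWORD \
  -X GET \
  https://FUSION_HOST:FUSION_PORT/api/connectors/datasources
```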
Running the datasource job obtains and indexes the data in Fusion based on configuration parameters including the ability to:
  • Generate diagnostic logs
  • Limit document crawl levels or the number of documents obtained and indexed
  • Exclude or include files based on file extension, text or patterns in the document, and authentication methods
  • Set recrawl rules
  • Enter links where you want the crawl to begin
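To make these parameters concrete, a hypothetical creation request might look like the following. The endpoint path and the property names (`startLinks`, `depth`, `excludeExtensions`) are illustrative assumptions only; the real schema depends on the connector type and Fusion version.

```shell
# Hypothetical sketch: create a web datasource with a start link,
# a crawl depth limit, and an extension exclusion. Property names
# and the endpoint path are assumptions, not a confirmed schema.
curl -u USERNAME:PASSWORD \
  -X POST \
  -H "Content-Type: application/json" \
  https://FUSION_HOST:FUSION_PORT/api/connectors/datasources \
  -d '{
    "id": "DATASOURCE_ID",
    "type": "web",
    "properties": {
      "startLinks": ["https://example.com/"],
      "depth": 2,
      "excludeExtensions": [".pdf"]
    }
  }'
```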

To schedule a datasource job, complete the following:

  1. Create a datasource.
  2. Navigate to System > Scheduler.
  3. Select the datasource from the job list.
  4. Click New Schedule.
  5. Select and configure a trigger:
    • After Another Job Completes
      Enter the job ID and job result that trigger this one.
    • Cron String
      Enter a Quartz cron expression. See the Quartz documentation for details.
    • Start + Interval
      Enter a start date/time, an interval, and the interval units.
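For reference, Quartz cron expressions use six required fields (seconds, minutes, hours, day-of-month, month, day-of-week) plus an optional year field, which differs from standard Unix cron. For example:

```text
0 0 2 * * ?     # every day at 2:00 AM
0 30 4 ? * MON  # every Monday at 4:30 AM
```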
To start a datasource job, complete the following:
  1. Navigate to Collections > Jobs.
  2. Select the job you want to run and click Run > Start.
  3. To view the job status information and its result, click Job History.
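A datasource job can also be started over the API. Mirroring the abort request shown later in this topic (the same Jobs API actions endpoint, with the action swapped to `start`), one likely form is:

```shell
# Start the datasource job over the Jobs API; placeholders are the
# same as in the other curl examples in this topic. The "start"
# action mirrors the Start/pause/abort endpoint.
curl -u USERNAME:PASSWORD \
  -X POST \
  -H "Content-Type: application/json" \
  https://FUSION_HOST:FUSION_PORT/api/jobs/datasource:DATASOURCE_ID/actions \
  -d '{"action": "start"}'
```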
To stop a datasource job, complete the following:
  1. After you start the job, click Stop in the Run window to stop the job.
If you want to stop the job using the API, execute the following command in the Jobs API:
curl -u USERNAME:PASSWORD \
  -X POST \
  -H "Content-Type: application/json" \
  https://FUSION_HOST:FUSION_PORT/api/jobs/datasource:DATASOURCE_ID/actions \
  -d '{"action": "abort"}'
For more information, see Start/pause/abort a job.
Parsing and indexing of documents emitted before you click Stop are not halted, so if you need to stop all ingestion activity, you must also complete the remaining steps in this procedure.
  2. To stop parsing, execute the following command to delete the applicable parser in the Parsers CRUD API:
curl -u USERNAME:PASSWORD \
  -X DELETE \
  https://FUSION_HOST:FUSION_PORT/api/parser/datasource/DATASOURCE_ID
For more information, see Delete a parser.
  3. To cancel Lucidworks AI requests, execute the following command in the Index Pipelines API:
curl -u USERNAME:PASSWORD \
  -X POST \
  https://FUSION_HOST:FUSION_PORT/api/index-pipelines/INDEX_PROFILE_ID/async-enrichment/skip-pending
For more information, see Skip pending async requests.
  4. To reset the index subscription, execute the following command in the Subscriptions API:
curl -u USERNAME:PASSWORD \
  -X POST \
  https://FUSION_HOST:FUSION_PORT/api/subscriptions/SUBSCRIPTION_ID/refresh?action=reset
For more information, see Refresh a subscription.