Connector Jobs API

The Connector Jobs API provides methods to start, monitor, and stop crawl jobs for a datasource. A datasource is a specific connector instance that connects to a defined repository, collects content, and sends it through an index pipeline.

The Connector Jobs API can also provide detailed information about the indexing job and each stage of the index pipeline.

To define and launch a job, send a POST request that includes the datasource ID (see the Examples section below).

Start, Stop, or Check the Status of a Job

The request path is:

/api/apollo/connectors/jobs/<id>

where <id> is the name of the datasource. A GET request returns the status of a job. A POST request creates and starts a job. A PUT request starts an existing job. A DELETE request stops a running job.
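For example, checking the status of a job for a datasource named MyDocs, and then restarting it, might look like this (host, port, and credentials are placeholders matching the Examples below):

curl -u user:pass http://localhost:8764/api/apollo/connectors/jobs/MyDocs

curl -u user:pass -X PUT http://localhost:8764/api/apollo/connectors/jobs/MyDocs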

DELETE requests accept the following parameters, which control how running jobs are stopped:

Parameter   Description

abort       When false (the default), the job is allowed to finish processing before stopping. Set this to true if you need the job to stop immediately.

wait_time   The number of milliseconds to wait for a job to stop before it is aborted.
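For example, to stop the MyDocs job immediately instead of letting it finish, a sketch that assumes these are passed as query parameters (the wait_time value here is illustrative):

curl -u user:pass -X DELETE "http://localhost:8764/api/apollo/connectors/jobs/MyDocs?abort=true&wait_time=5000"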

If the job id is not specified, the DELETE request accepts the following parameters instead:

Parameter   Description

connector   The name of a connector. Use this to stop all jobs that use a single connector; for example, stopping all jobs that use the 'lucid.dih' connector stops all database crawls.

collection  The name of a collection. Use this to stop all jobs for a single named collection.
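For example, to stop all jobs for a collection (the collection name MyCollection is hypothetical, and the query-parameter form of the request without a job id is an assumption):

curl -u user:pass -X DELETE "http://localhost:8764/api/apollo/connectors/jobs?collection=MyCollection"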

Input

None.

Output

The output includes the state of the job (RUNNING, FINISHED, etc.), the start time, and the number of documents retrieved so far (in the counters section).

Examples

Note
Use port 8765 in local development environments only. In production, use port 8764.

Start crawling a datasource named "MyDocs":

REQUEST

curl -u user:pass -X POST http://localhost:8764/api/apollo/connectors/jobs/MyDocs

RESPONSE

{
  "id" : "MyDocs",
  "dataSourceId" : "MyDocs",
  "state" : "RUNNING",
  "message" : null,
  "startTime" : 1397840639000,
  "endTime" : -1,
  "finished" : false,
  "counters" : { },
  "exception" : null,
  "running" : true
}
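
To stop the same job while it is running (a DELETE response is not reproduced here):

REQUEST

curl -u user:pass -X DELETE http://localhost:8764/api/apollo/connectors/jobs/MyDocs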

Get Detailed Job Statistics

To get detailed job statistics, send a GET request to:

/api/apollo/connectors/jobs/<id>/pipeline

where <id> is the name of the datasource.

Input

None.

Output

The output shows the results of each stage of the index pipeline, including the number of documents processed at each stage.

For example, the field-mapping stage shows only the number of documents that passed through the stage. The tika-parser stage, however, shows both the number of documents input from the connector and the number output to the next stage. In a solr-index stage, the number of documents processed indicates the number of documents that were added to the index.

If a stage encounters errors for any documents, the error is shown for each affected document. If all documents fail, for example due to a badly formed index pipeline, the output of this endpoint can be quite lengthy.

Examples

REQUEST

curl -u user:pass http://localhost:8764/api/apollo/connectors/jobs/TwitterSearch/pipeline

RESPONSE

{
  "stats" : {
    "counters" : {
      "stage.field-mapping::twitter-mapping" : {
        "processed" : 101
      },
      "stage.logging::conn_logging" : {
        "info" : 202,
        "processed" : 202
      },
      "stage.solr-index::solr-default" : {
        "command.ok.commit" : 1,
        "processed" : 201
      },
      "stage.tika-parser::tika" : {
        "input" : 101,
        "processed" : 202
      }
    },
    "gauges" : { },
    "histograms" : { },
    "meters" : { },
    "timers" : { }
  },
  "history" : {
    "events" : [ ]
  }
}