A datasource is the configured connection between Fusion’s Connectors and your data sources.
To connect a data source to Fusion, you must create and configure a datasource, either in the Fusion UI or with the Connector Datasources API:
Using the Fusion UI
Click Admin > Collections.
Select a Collection, or click Add a Collection to create a new one.
Click Datasources > Add a Datasource.
Select a datasource type; these correspond to Fusion’s Connectors.
Enter configuration information in the datasource panel that appears.
Using the Connector Datasources API
A datasource must be configured with a set of required properties. Depending on the application, data repository, and connector type, the configuration may also include optional properties such as:
authentication information, such as username, group names or IDs, passwords, or credential files.
rules for crawling a website or filesystem:
which nodes to crawl
which files to retrieve or exclude
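As a sketch, a web-crawl datasource configuration might look like the following JSON. The exact property names vary by connector type, and the values here (the datasource ID, pipeline name, start links, and crawl rules) are purely illustrative:

```json
{
  "id": "myDatasource",
  "connector": "lucid.web",
  "type": "web",
  "pipeline": "default",
  "properties": {
    "startLinks": ["http://example.com/"],
    "depth": 2,
    "excludePaths": [".*\\.pdf$"]
  }
}
```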
The Connectors and Datasources Reference provides complete details about datasource configuration.
In the Fusion UI, the datasource configuration panel is accessed via the "Datasource" link on the home panel for a collection.
Once configured, this panel provides a control to run the datasource; while a run is in progress, it provides controls to stop or abort the run. Once a run has started, a "job history" control provides information on current and completed jobs.
Job status information is stored in ZooKeeper.
When crawling or spidering a website or filesystem, a record of the crawl is stored in a directory.
Datasources are defined via the Connector Datasources API. Once configured, the connector job is initiated using the Connector Jobs API, and this service is used to request job history information as well.
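The sequence above can be sketched with the standard library alone. This example only builds the authenticated requests without sending them; the host, port, credentials, and the exact endpoint paths beyond the jobs endpoint shown later on this page are assumptions to adapt for your deployment:

```python
# Sketch: define a datasource (Connector Datasources API), then start a
# crawl of it (Connector Jobs API). Requests are constructed but not
# sent, since sending requires a running Fusion instance.
import base64
import json
import urllib.request

FUSION = "http://localhost:8764/api/apollo"  # assumed host/port
AUTH = "Basic " + base64.b64encode(b"user:pass").decode("ascii")

def make_request(method, path, body=None):
    """Build (but do not send) an authenticated request to Fusion."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(FUSION + path, data=data, method=method)
    req.add_header("Authorization", AUTH)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req

# 1. Define the datasource; the body shown is a minimal placeholder.
create = make_request("POST", "/connectors/datasources",
                      body={"id": "myDatasource"})

# 2. Start a job for that datasource.
start = make_request("POST", "/connectors/jobs/myDatasource")

# To actually send a request: urllib.request.urlopen(create)
```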
Information on all runs of a particular datasource is retrieved by sending a GET request to the Fusion API services endpoint:
For example, to see the status of all jobs run for a datasource named 'myDatasource', send the following GET request from the command line:
curl -u user:pass http://localhost:8764/api/apollo/connectors/jobs/myDatasource
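The response is JSON, with one entry per run. As a sketch, assuming a hypothetical response shape with a `state` field per entry (the actual field names returned by your Fusion version may differ), the history could be summarized like this:

```python
import json

# Hypothetical job-history payload; field names are illustrative only.
sample = json.loads("""
[
  {"id": "myDatasource", "state": "FINISHED"},
  {"id": "myDatasource", "state": "RUNNING"}
]
""")

def summarize(jobs):
    """Count job-history entries by state."""
    counts = {}
    for job in jobs:
        counts[job["state"]] = counts.get(job["state"], 0) + 1
    return counts

print(summarize(sample))  # {'FINISHED': 1, 'RUNNING': 1}
```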
For a limited range of document formats, documents can be added to a collection by pushing them to an index pipeline directly, without using a connector and datasource. Use cases for this include loading a massive dataset, or application development, testing, and troubleshooting. See Pushing Documents to a Pipeline for details.