Connector Datasources API

Use the Connector Datasources API to create and configure datasources, inspect the crawl database, and clear items or tables from it.

See Connectors and Datasources for related information.

Working with the crawl database

Some connectors use a crawl database to track the documents seen by prior crawls. With this information, a connector can determine which documents are new, updated, or removed, and take the appropriate action in the index. The /api/connectors/datasources/<id>/db endpoints let you inspect the crawl database, drop tables, or clear the database.

The connectors that currently support the crawl database are lucid.fs and lucid.solrxml. The lucid.anda connector also uses a crawl database, but it is a different database and has no REST API or other interface for accessing it.

Examining a crawlDB

The output from a GET request to /api/connectors/datasources/<id>/db includes several sections that describe the database:

  • counters: The counters section reports the document counts of database activities, such as table inserts.

  • ops: The ops section reports on database operations that have occurred, such as initiating tables, retrieving items, processing items and table drops.

  • tables: The tables section lists the tables in the database with a count of the number of items in each table. Inspecting the items is described in the next section.
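As a minimal sketch of the request described above, the following shell helpers build the crawl database URL and fetch it with curl. The function names and the FUSION_URL and FUSION_AUTH variables are hypothetical conveniences, not part of the API; the host, port, and credentials mirror the placeholders used in the examples on this page.

```shell
# Assumed base URL; override FUSION_URL for your deployment.
FUSION_URL="${FUSION_URL:-http://localhost:8764}"

# Build the crawl database URL for a datasource ID.
crawldb_url() {
  printf '%s/api/connectors/datasources/%s/db\n' "$FUSION_URL" "$1"
}

# Fetch the crawl database summary (counters, ops, tables).
# FUSION_AUTH is a hypothetical variable holding "user:password".
get_crawldb() {
  curl -sS -u "${FUSION_AUTH:-user:pass}" "$(crawldb_url "$1")"
}

# Example (requires a running Fusion instance):
#   get_crawldb SolrXML
```

Keeping the URL construction in its own function makes it easy to check the path before sending any request.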

Dropping tables from a crawlDB

A DELETE request to /api/connectors/datasources/<id>/db/<table> returns an empty response. Dropping the database does not remove any documents from the index. However, the crawl database will then be empty, so on the next datasource run all documents are treated as though the connector had never seen them.

Dropping the items table likewise does not delete documents from the index; instead, the crawl database will consider them new documents. Dropping other tables, such as the errors table, merely clears out old error messages.
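A table drop could be scripted roughly as follows. This is a sketch under assumptions: the function names and the FUSION_URL and FUSION_AUTH variables are hypothetical, and the host and credentials are the placeholders used elsewhere on this page. The table names items and errors are the ones mentioned above.

```shell
# Assumed base URL; override FUSION_URL for your deployment.
FUSION_URL="${FUSION_URL:-http://localhost:8764}"

# Build the URL for one crawl database table of a datasource.
crawldb_table_url() {
  printf '%s/api/connectors/datasources/%s/db/%s\n' "$FUSION_URL" "$1" "$2"
}

# Drop a crawl database table; documents in the index are not touched.
# FUSION_AUTH is a hypothetical variable holding "user:password".
drop_crawldb_table() {
  curl -sS -u "${FUSION_AUTH:-user:pass}" -X DELETE "$(crawldb_table_url "$1" "$2")"
}

# Example (requires a running Fusion instance): clear old error messages.
#   drop_crawldb_table SolrXML errors
```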

Getting or deleting items from a crawlDB

A GET request to /api/connectors/datasources/<id>/db/items/<item> retrieves information about an item or items.

A DELETE request to the same endpoint removes the item's information from the crawl database only. It does not affect the Solr index.
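The two item requests differ only in the HTTP method, so one helper can cover both. The sketch below assumes hypothetical function names and FUSION_URL and FUSION_AUTH variables, and uses doc1 as a placeholder item identifier; the actual identifier format is not specified here.

```shell
# Assumed base URL; override FUSION_URL for your deployment.
FUSION_URL="${FUSION_URL:-http://localhost:8764}"

# Build the URL for a single crawl database item.
crawldb_item_url() {
  printf '%s/api/connectors/datasources/%s/db/items/%s\n' "$FUSION_URL" "$1" "$2"
}

# Send GET or DELETE for one item.
# Usage: crawldb_item METHOD DATASOURCE_ID ITEM
crawldb_item() {
  case "$1" in
    GET|DELETE) ;;  # the only methods described for this endpoint
    *) echo "unsupported method: $1" >&2; return 2 ;;
  esac
  curl -sS -u "${FUSION_AUTH:-user:pass}" -X "$1" "$(crawldb_item_url "$2" "$3")"
}

# Examples (require a running Fusion instance; "doc1" is a placeholder):
#   crawldb_item GET SolrXML doc1
#   crawldb_item DELETE SolrXML doc1
```

Rejecting other methods up front avoids sending an unintended request to the crawl database.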

Examples

Note
Use port 8765 in local development environments only. In production, use port 8764.

Get datasources assigned to the "demo" collection:

REQUEST

curl -u user:pass 'http://localhost:8764/api/connectors/datasources?collection=demo'

RESPONSE

[ {
  "id" : "database",
  "created" : "2014-05-04T19:47:22.867Z",
  "modified" : "2014-05-04T19:47:22.867Z",
  "connector" : "lucid.jdbc",
  "type" : "jdbc",
  "description" : null,
  "pipeline" : "conn_solr",
  "properties" : {
    "db" : null,
    "commit_on_finish" : true,
    "verify_access" : true,
    "sql_select_statement" : "select CONTENTID as id from CONTENT;",
    "debug" : false,
    "collection" : "demo",
    "password" : "password",
    "url" : "jdbc:postgresql://localhost:5432/db",
    "nested_queries" : null,
    "clean_in_full_import_mode" : true,
    "username" : "user",
    "delta_sql_query" : null,
    "commit_within" : 900000,
    "primary_key" : null,
    "driver" : "org.postgresql.Driver",
    "max_docs" : -1
  }
} ]

Create a datasource to index Solr-formatted XML files:

Note
In order to see the datasource object within the Fusion UI, it must be associated with an app. To do this, create the object using the /apps endpoint.

REQUEST

curl -u user:pass -X POST -H 'Content-type: application/json' -d '{"id":"SolrXML", "connector":"lucid.solrxml", "type":"solrxml", "properties":{"path":"/Applications/solr-4.10.2/example/exampledocs", "generate_unique_key":false, "collection":"MyCollection"}}' http://localhost:8764/api/connectors/datasources

RESPONSE

{
  "id" : "SolrXML",
  "created" : "2015-05-18T15:47:51.199Z",
  "modified" : "2015-05-18T15:47:51.199Z",
  "connector" : "lucid.solrxml",
  "type" : "solrxml",
  "properties" : {
    "commit_on_finish" : true,
    "verify_access" : true,
    "generate_unique_key" : false,
    "collection" : "MyCollection",
    "include_datasource_metadata" : true,
    "include_paths" : [ ".*\\.xml" ],
    "initial_mapping" : {
      "id" : "a35c9ff3-dbb6-434b-af40-597722c2986a",
      "skip" : false,
      "label" : "field-mapping",
      "type" : "field-mapping"
    },
    "path" : "/Applications/apache-repos/solr-4.10.2/example/exampledocs",
    "exclude_paths" : [ ],
    "url" : "file:/Applications/apache-repos/solr-4.10.2/example/exampledocs/",
    "max_docs" : -1
  }
}

Change the max_docs value for the above datasource:

REQUEST

curl -u user:pass -X PUT -H 'Content-type: application/json' -d '{"id":"SolrXML", "connector":"lucid.solrxml", "type":"solrxml", "properties":{"path":"/Applications/solr-4.10.2/example/exampledocs", "max_docs":10}}' http://localhost:8764/api/connectors/datasources/SolrXML

RESPONSE

true

Delete the datasource named 'database':

REQUEST

curl -u user:pass -X DELETE http://localhost:8764/api/connectors/datasources/database

RESPONSE

If successful, the response is empty.