Working with the crawl database
Some connectors use a crawl database to track documents found during prior crawls. This information is used to take appropriate action in the index for documents that are new, modified, or removed. The /api/connectors/configs/DATASOURCE_ID/db endpoints allow dropping tables or clearing the database.
The crawl database is supported by all V2 connectors and by some V1 connectors, such as the Lucid.fs connector. The Lucid.anda connector also uses a crawl database, but it is not the same database as the Lucid.fs connector's and has no REST API or other interface to access it.
Drop tables from a crawlDB
The output from a DELETE request to /api/connectors/configs/DATASOURCE_ID/db/<table> will be empty. When dropping the database, note that no documents will be removed from the index. However, the crawl database will be empty, so on the next datasource run, all documents will be treated as though they had never been seen by the connector.
When dropping tables, be aware that dropping the items table does not delete documents from the index; instead, it changes the database so that the database considers those documents new. Dropping other tables, such as the errors table, merely clears out old error messages.
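For example, a request along these lines drops the items table for a datasource. This is a sketch only: the host, port, and credentials are placeholders, and DATASOURCE_ID stands in for a real datasource ID.

# Drop the 'items' table; on the next run, all documents are treated as new.
curl -u USERNAME:PASSWORD -X DELETE http://localhost:8764/api/connectors/configs/DATASOURCE_ID/db/items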
Delete items from a crawlDB
A DELETE request to /api/connectors/configs/DATASOURCE_ID/db removes the information from the crawl database only. Note that this does not affect the Solr index.
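For instance, a request along these lines clears the entire crawl database for a datasource (a sketch; the host, port, and credentials are placeholders):

# Clear the whole crawl database for this datasource; the Solr index is untouched.
curl -u USERNAME:PASSWORD -X DELETE http://localhost:8764/api/connectors/configs/DATASOURCE_ID/db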
Examples
Get datasources assigned to the “demo” collection:
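The original request is not preserved in this copy, so the following is a hedged reconstruction: the /api/connectors/datasources path and the collection query parameter are assumptions, and the host, port, and credentials are placeholders.

REQUEST

# List datasources, filtered (by assumption) to the 'demo' collection.
curl -u USERNAME:PASSWORD 'http://localhost:8764/api/connectors/datasources?collection=demo'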
Change the max_docs value for the above datasource:
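Again a hedged sketch rather than the original example: it assumes the datasource is updated with a PUT that re-sends its configuration with a modified max_docs property; the path, the field names, and the abbreviated body are assumptions.

REQUEST

# Update the datasource, changing max_docs (body abbreviated and hypothetical).
curl -u USERNAME:PASSWORD -X PUT -H 'Content-Type: application/json' \
  -d '{"id": "DATASOURCE_ID", "properties": {"max_docs": 1000}}' \
  http://localhost:8764/api/connectors/datasources/DATASOURCE_ID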
When using the POST method with the /configs endpoint, the validateOnly=true parameter only tests whether the configuration is valid. It does not automatically save the datasource.
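As a sketch, such a validation-only request might look like the following; the request body fields are assumptions, and the host, port, and credentials are placeholders.

# Validate the datasource configuration without saving it.
curl -u USERNAME:PASSWORD -X POST -H 'Content-Type: application/json' \
  -d '{"id": "DATASOURCE_ID", "connector": "lucid.fs", "properties": {"path": "/data"}}' \
  'http://localhost:8764/api/connectors/configs?validateOnly=true'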