The JDBC connector fetches documents from a relational database via SQL queries. Under the hood, this connector implements the Solr DataImportHandler (DIH) plugin.
The JDBC connector in Fusion does not automatically discover and index binary data stored in your database (such as PDF files). However, you can configure Fusion to recognize and extract binary data correctly by modifying the datasource configuration file. This file is created the first time the datasource runs, in VAR-FUSIONPATH/data/connectors/lucid.jdbc/datasources/datasourceID/conf. The file name includes the name of the datasource, as in dataconfig_datasourceName.xml. If you are familiar with Solr’s DIH, you will recognize this as a standard DIH configuration file.
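The configuration file sits at a fixed location relative to the Fusion home directory, so a small helper can build its path before you edit it. This is a sketch only: fusion_home stands in for the VAR-FUSIONPATH placeholder, and the function name and example values are hypothetical.

```python
from pathlib import Path


def dataconfig_path(fusion_home, datasource_id, datasource_name):
    """Build the expected location of a JDBC datasource's DIH config file.

    fusion_home stands in for VAR-FUSIONPATH (hypothetical helper). Nothing
    here queries Fusion; the function only assembles the path.
    """
    return (Path(fusion_home) / "data" / "connectors" / "lucid.jdbc" / "datasources"
            / datasource_id / "conf" / f"dataconfig_{datasource_name}.xml")


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    path = dataconfig_path("/opt/fusion", "abc123", "products")
    print(path)
    # The file only appears after the datasource has run at least once.
    print("exists:", path.exists())
```

Checking path.exists() first is a cheap way to confirm the datasource has already run and generated the file.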
Follow these steps to modify the configuration file:
1. Add a name attribute to the dataSource element for the database that contains your binary data.
2. Set the convertType attribute for that dataSource to false. This prevents Fusion from treating binary data as strings.
3. Add a FieldStreamDataSource to stream the binary data to the Tika entity processor.
4. Specify the dataSource name in the root entity.
5. Add an entity for your FieldStreamDataSource that uses the TikaEntityProcessor to take the binary data from the FieldStreamDataSource, parse it, and specify a field for storing the processed data.
6. Reload the Solr core to apply your configuration changes.
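The configuration changes above can be sketched as a short Python script that applies them to a DIH configuration with the standard xml.etree.ElementTree module. This is a minimal sketch under stated assumptions, not a Fusion tool: the starting configuration, the JDBC driver and URL, the table and column names (documents, content), the data source names db and fieldStream, and the target field body are all hypothetical placeholders. The Tika entity follows the usual DIH pattern, in which dataField names the parent entity's binary column and TikaEntityProcessor emits the extracted text in a column called text.

```python
import xml.etree.ElementTree as ET

# Hypothetical starting configuration: one JDBC data source and one root entity.
# Driver, URL, table, and column names are placeholders, not Fusion defaults.
ORIGINAL = """<dataConfig>
  <dataSource type="JdbcDataSource" driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost/docs"/>
  <document>
    <entity name="doc" query="SELECT id, title, content FROM documents">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>"""


def enable_binary_extraction(xml_text, db_name="db", blob_column="content",
                             text_field="body"):
    """Return a copy of a DIH config edited to extract binary column data with Tika."""
    root = ET.fromstring(xml_text)

    # Name the JDBC data source and set convertType="false" so binary
    # values are not treated as strings.
    jdbc = root.find("dataSource[@type='JdbcDataSource']")
    if jdbc is None:
        raise ValueError("no JdbcDataSource found")
    jdbc.set("name", db_name)
    jdbc.set("convertType", "false")

    # Add a FieldStreamDataSource right after it to stream a column's bytes.
    stream_ds = ET.Element("dataSource",
                           {"name": "fieldStream", "type": "FieldStreamDataSource"})
    root.insert(list(root).index(jdbc) + 1, stream_ds)

    # Point the root entity (the one running the SQL query) at the named JDBC source.
    root_entity = root.find("document/entity")
    if root_entity is None:
        raise ValueError("no root entity found")
    root_entity.set("dataSource", db_name)

    # Nested entity: TikaEntityProcessor parses the parent row's binary column.
    tika = ET.SubElement(root_entity, "entity", {
        "name": "tika",
        "processor": "TikaEntityProcessor",
        "dataSource": "fieldStream",
        "dataField": f"{root_entity.get('name')}.{blob_column}",
        "format": "text",
    })
    # Tika exposes the extracted text as the "text" column; store it in text_field.
    ET.SubElement(tika, "field", {"column": "text", "name": text_field})

    ET.indent(root, space="  ")  # Python 3.9+: re-indent so the output stays readable
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(enable_binary_extraction(ORIGINAL))
```

In practice you would read your own dataconfig_datasourceName.xml, make the same edits (with a script like this or by hand), write the file back, and then reload the Solr core. The script only makes each attribute change explicit; editing the XML directly is equally valid.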