Set up JDBC drivers
Running JDBC SQL V2 in a normal (non-remote) deployment is supported in Fusion 5.4 and later.
For Fusion 5.3 and earlier, JDBC SQL V2 is only available as a remote connector.
Use blob store for JDBC drivers in Fusion 5.4 and later
When you add the JDBC drivers to the blob store, the connector pulls the drivers and adds the information to the Java classpath.Since the connector pulls all the drivers in blobStore, you must ensure there are no conflicting
.jar
files.Load the JDBC driver in Fusion 5.3 and earlier
Specify the.jar
file for the JDBC driver on the classpath of the runner. The following is an example of the Java command:
Supported JDBC drivers
This section contains a list of supported JDBC drivers that are compatible with any driver/database that implements a SQL standard.Authentication parameters may be provided as part of the connection string. It is not necessary to include a username and password in the datasource configuration.
MySQL
- Download driver
- Default driver class name:
com.mysql.cj.jdbc.Driver
- Connection URL specification
- Example:
jdbc:mysql://mysql:3306/testdb
- Example:
Postgresql
- Download driver
- Default driver class name:
org.postgresql.Driver
- Connection URL specification
- Example:
jdbc:postgresql://postgres:5432/testdb
- Example:
Microsoft SQL Server and Azure SQL Service
- Download driver
- Default driver class name:
com.microsoft.sqlserver.jdbc.SQLServerDriver
- Connection URL specification
- Example (SQL Server):
jdbc:sqlserver://mssql:1433
- Example (cloud-based Azure SQL Service):
jdbc:sqlserver://azure-test.database.windows.net:1433;database=testdbencrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;
- Example (SQL Server):
Oracle database
- Download driver
- Default driver class name:
oracle.jdbc.OracleDriver
- Connection URL specification
- Example:
jdbc:oracle:thin:@oracledb:1521:orc1
- Example:
IBM DB2
- Download driver
- Default driver class name:
com.ibm.db2.jcc.DB2Driver
- Connection URL specification
- Example:
jdbc:db2://db2:50000/testdb
- Example:
Crawl behavior
The JDBC V2 connector retrieves data based on the user-supplied SQL query. The SQL implementation allows full use of all features, but for best results, structure the SQL query to utilize pagination.Pagination
Automatic pagination
Automatic pagination is enabled by default. You can disable it by settingenableAutomaticPagination
to false
.
The batchSize
field sets the number of documents displayed per page of query results.
When a SQL query includes pagination terms such as OFFSET
, LIMIT
, or ROWS ONLY
, the SQL query terms override automatic pagination.
DB2 notes
When crawling an IBM DB2 database, the pagination method depends on your database version:- IBM DB2 version < 11.5 - Pagination is performed using subqueries.
- IBM DB2 version >= 11.5 - Pagination is performed using rows and offsets.
Appending--subquery
to the SQL query statement forces the connector to paginate using subqueries regardless of the IBM DB2 database version.
Special variables for pagination
The connector uses placeholders for pagination that are updated based onbatchSize
:
${limit}
: used as theLIMIT
paramater in native SQL${offset}
: used as theOFFSET
parameter in native SQL
- MySQL, postgresql, and DB2:
- Microsoft, Azure, and Oracle:
Nested queries
The connector supports nested queries. Nested queries are a set of SQL queries performed on each row returned by the main query. Because they are executed on every row returned, nested queries can significantly degrade performance. Use the${id}
variable to retrieve items associated with a specific primary key:
Incremental crawl
In addition to the main SQL query, users can also specify an optional delta SQL query that returns only new and modified items to increase performance during recrawls. The special${limit}
and ${offset}
pagination variables are the same as in normal crawls.
Special incremental crawl variable
Use the${last_index_time}
for an incremental crawl as a placeholder variable that contains the time the last crawl completed.
This variable is used to filter results for items added or modified since the last time the datasource job was run and is stored as a timestamp in the format of yyyy-MM-dd HH:mm:ss.SSS
The format may not be compatible with all driver date math implementations without additional conversion steps.
${last_index_time}
in single quotes. For example, '${last_index_time}'
.
Stray content deletion
Stray content deletion is a plugin feature that deletes documents from a content collection that are not rediscovered on subsequent crawls after the first crawl. This process removes stale documents from the content collection and is enabled by default. Stray content deletion must be configured by the user because native SQL does not provide the ability to retrieve rows that have been deleted from a table.Using a delta query to perform incremental crawling only returns new and modified items. If stray content deletion is enabled when the delta query is run, unmodified items that are still valid are deleted.
Processor flow diagram
The following diagram represents the flow for crawls. Incremental crawls follow the same basic flow as full crawls, except that they emit a page candidate from a delta SQL query if provided. If no delta query is provided, then incremental crawls use the original query.