Configure Remote V2 Connectors
- Prerequisites
- Connector compatibility
- System requirements
- Enable backend ingress
- Connector configuration example
- Run the remote connector
- Test communication
- Encryption
- Egress and proxy server configuration
- Password encryption
- Connector restart (5.7 and earlier)
- Recoverable bridge (5.8 and later)
- Job expiration duration (5.9.5 only)
- Use the remote connector
- Enable asynchronous parsing (5.9 and later)
If you need to index data from behind a firewall, you can configure a V2 connector to run remotely on-premises using TLS-enabled gRPC.
Prerequisites
Before you can set up an on-prem V2 connector, you must configure the egress from your network to allow HTTP/2 communication into the Fusion cloud. You can use a forward proxy server to act as an intermediary between the connector and Fusion.
The following is required to run V2 connectors remotely:
- The plugin zip file and the connector-plugin-standalone JAR.
- A configured connector backend gRPC endpoint.
- The username and password of a user with a remote-connectors or admin role. This step is performed by Lucidworks.
- If the host where the remote connector is running is not configured to trust the server’s TLS certificate, Lucidworks must help configure the file path of the trust certificate collection.
If your version of Fusion doesn’t have the remote-connectors role by default, Lucidworks can create one. No API or UI permissions are required for the role.
Connector compatibility
Only V2 connectors are able to run remotely on-premises.
The gRPC connector backend is not supported in Fusion environments deployed on AWS.
System requirements
The following is required for the on-prem host of the remote connector:
- JVM version 11 or later
- A minimum of 2 CPUs
- 4 GB of memory
Note that memory requirements depend on the number and size of ingested documents.
Enable backend ingress
Contact Lucidworks support to complete this step.
- Change enabled to true in rpc-service/values.yaml.
- Configure the host name in rpc-service/values.yaml.
- Configure TLS and certificates according to the applicable procedures and policies.
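As a sketch of the first two steps, the relevant keys in rpc-service/values.yaml might look like the fragment below. The exact key layout can vary by Fusion release, so treat these keys as illustrative and confirm them with Lucidworks support.

```yaml
# rpc-service/values.yaml - illustrative fragment only; key names may differ
enabled: true                                             # step 1: enable the backend ingress
host: mynamespace-connectors-backend.lucidworkstest.com   # step 2: the ingress host name
```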
Connector configuration example
kafka-bridge:
  target: mynamespace-connectors-backend.lucidworkstest.com:443 # mandatory
  plain-text: false # optional, false by default
  proxy-server: # optional - needed when a forward proxy server provides outbound access for the standalone connector
    host: host
    port: some-port
    user: user # optional
    password: password # optional
  trust: # optional - needed when the client's system doesn't trust the server's certificate
    cert-collection-filepath: path1
proxy: # mandatory - the Fusion proxy
  user: admin
  password: password123
  url: https://fusiontest.com/ # needed only when the connector plugin requires blob store access
plugin: # mandatory
  path: ./fs.zip
  type: # optional - the suffix is added to the connector ID
    suffix: remote
Minimal example
kafka-bridge:
  target: mynamespace-connectors-backend.lucidworkstest.com:443
proxy:
  user: admin
  password: "password123"
plugin:
  path: ./testplugin.zip
Logback XML configuration file example
<configuration>
  <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="com.lucidworks.logging.logback.classic.LucidworksPatternLayoutEncoder">
      <pattern>%d - %-5p [%t:%C{3.}@%L] - %m{nolookups}%n</pattern>
      <charset>utf8</charset>
    </encoder>
  </appender>
  <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${LOGDIR:-.}/connector.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
      <!-- rollover daily -->
      <fileNamePattern>${LOGDIR:-.}/connector-%d{yyyy-MM-dd}.%i.log.gz</fileNamePattern>
      <maxFileSize>50MB</maxFileSize>
      <totalSizeCap>10GB</totalSizeCap>
    </rollingPolicy>
    <encoder class="com.lucidworks.logging.logback.classic.LucidworksPatternLayoutEncoder">
      <pattern>%d - %-5p [%t:%C{3.}@%L] - %m{nolookups}%n</pattern>
      <charset>utf8</charset>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="CONSOLE"/>
    <appender-ref ref="FILE"/>
  </root>
</configuration>
Run the remote connector
java [-Dlogging.config=[LOGBACK_XML_FILE]] \
-jar connector-plugin-client-standalone.jar [YAML_CONFIG_FILE]
The logging.config property is optional. If it is not set, logging messages are sent to the console.
Test communication
You can run the connector in communication testing mode. This mode tests the communication with the backend without running the plugin, reports the result, and exits.
java -Dstandalone.connector.connectivity.test=true -jar connector-plugin-client-standalone.jar [YAML_CONFIG_FILE]
Encryption
In a deployment, communication to the connector’s backend server is encrypted using TLS. Run this configuration without TLS only in a testing scenario. To disable TLS, set plain-text to true.
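For example, a test-only configuration that disables TLS sets the flag in the kafka-bridge section of the connector YAML:

```yaml
# Testing only - never run without TLS in production
kafka-bridge:
  target: mynamespace-connectors-backend.lucidworkstest.com:443
  plain-text: true   # disables TLS encryption
```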
Egress and proxy server configuration
One of the methods you can use to allow outbound communication from behind a firewall is a proxy server. You can configure a proxy server to allow certain communication traffic while blocking unauthorized communication. If you use a proxy server at the site where the connector is running, you must configure the following properties:
- Host: the host where the proxy server is running.
- Port: the port the proxy server listens on for communication requests.
- Credentials: an optional proxy server user and password.
When you configure egress, disable any connection or activity timeouts, because the connector uses long-running gRPC calls.
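Tying these properties back to the connector configuration format above, a proxy-server block with placeholder host, port, and credentials looks like this (all values are illustrative):

```yaml
kafka-bridge:
  target: mynamespace-connectors-backend.lucidworkstest.com:443
  proxy-server:
    host: proxy.example.com   # placeholder proxy host
    port: 3128                # placeholder proxy port
    user: proxyuser           # optional
    password: proxypass       # optional
```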
Password encryption
If you use a login name and password in your configuration, encrypt the password as follows:
- Enter the user name and password in the connector configuration YAML.
- Run the standalone JAR with this property: -Dstandalone.connector.encrypt.password=true
- Retrieve the encrypted passwords from the log that is created.
- Replace the clear-text passwords in the configuration YAML with the encrypted passwords.
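The resulting edit to the configuration YAML might look like the following sketch; the encrypted value shown is a placeholder, not real output from the utility:

```yaml
# Before: clear-text password in the configuration YAML
proxy:
  user: admin
  password: password123

# After: replace the clear-text value with the encrypted string
# copied from the connector log (placeholder shown here)
proxy:
  user: admin
  password: "<encrypted-value-from-log>"
```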
Connector restart (5.7 and earlier)
The connector shuts down automatically whenever the connection to the server is disrupted, to prevent it from getting into a bad state. Communication disruption can happen, for example, when the server running in the connectors-backend pod shuts down and is replaced by a new pod. Once the connector shuts down, connector configuration and job execution are disabled. To prevent that from happening, restart the connector as soon as possible.
You can use Linux scripts and utilities to restart the connector automatically, such as Monit.
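As one alternative to Monit on systemd-based Linux hosts, a service unit with Restart=always keeps the standalone connector running. This is a sketch, not part of the Fusion documentation; the paths, file names, and user are placeholders for your environment:

```ini
# /etc/systemd/system/remote-connector.service - illustrative sketch
[Unit]
Description=Fusion remote V2 connector
After=network-online.target
Wants=network-online.target

[Service]
User=connector
WorkingDirectory=/opt/connector
ExecStart=/usr/bin/java -jar connector-plugin-client-standalone.jar connector-config.yaml
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

After placing the file, enable it with systemctl enable --now remote-connector.service so the connector restarts automatically after a disruption.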
Recoverable bridge (5.8 and later)
If communication to the remote connector is disrupted, the connector tries to recover the communication channel and its gRPC calls. By default, six attempts are made to recover each gRPC call. The number of attempts can be configured with the max-grpc-retries bridge parameter.
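Since max-grpc-retries is described as a bridge parameter, it presumably sits alongside the other settings in the kafka-bridge section; assuming that placement, raising the retry limit might look like:

```yaml
kafka-bridge:
  target: mynamespace-connectors-backend.lucidworkstest.com:443
  max-grpc-retries: 10   # default: 6 attempts per gRPC call
```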
Job expiration duration (5.9.5 only)
The timeout value for unresponsive backend jobs can be configured with the job-expiration-duration-second parameter. The default value is 120 seconds.
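Assuming the parameter belongs in the kafka-bridge section of the connector configuration (the document does not state its exact placement), extending the timeout might look like:

```yaml
kafka-bridge:
  target: mynamespace-connectors-backend.lucidworkstest.com:443
  job-expiration-duration-second: 300   # default: 120 seconds
```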
Use the remote connector
Once the connector is running, it is available in the Datasources dropdown. If the standalone connector terminates, it disappears from the list of available connectors. Once it is re-run, it becomes available again, and previously configured connector instances are not lost.
Enable asynchronous parsing (5.9 and later)
To separate document crawling from document parsing, enable Tika Asynchronous Parsing on remote V2 connectors.