  • Latest version: v2.2.0
  • Compatible with Fusion version: 5.9.4 and later
To fetch content from multiple Box users, you must configure a Box.com datasource. For limited testing using a single user account, you can configure Box.com tokens.

Prerequisites

Complete these prerequisites to ensure the connector can reliably access, crawl, and index your data. Proper setup helps you avoid configuration and permission errors and keeps your content available for discovery and search in Fusion. Configure the Box app and authentication:
  • To crawl multiple users’ content, create a Box app configured for OAuth 2.0 with JWT.
  • For single users, you may use a Box app with standard OAuth 2.0 credentials.
Confirm network connectivity:
  • Fusion must be able to reach the Box APIs over HTTPS. If running the connector remotely, you also need to allow HTTP/2 (gRPC) egress from your network into your Fusion cluster.
If you run the connector remotely as an on-prem process, you also need:
  • A Fusion user with the remote-connectors or admin role for gRPC authentication.
  • The connector-plugin-standalone.jar alongside the plugin ZIP on the remote host.
  • A configured connector backend gRPC endpoint (hostname:port) in your YAML.
  • If the remote host doesn’t trust Fusion’s TLS cert, point to a truststore file path in your config.
Configure Remote V2 Connectors provides complete instructions for remote connector setup.
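If you plan to run the connector remotely, a minimal sketch of the standalone configuration YAML covering the items above might look like the following. The hostname, credentials, and file paths are placeholders; see the Remote connectors section later in this topic for the full reference.
kafka-bridge:
  target: mynamespace-connectors-backend.example.com:443 # connector backend gRPC endpoint (placeholder host)
  trust:
    cert-collection-filepath: /path/to/fusion-ca.pem # only needed if the host does not trust Fusion's TLS certificate

proxy:
  user: remote-connector-user # Fusion user with the remote-connectors or admin role
  password: "<password>"

plugin:
  path: ./box.zip # the Box plugin zip placed alongside the standalone JAR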

Authentication

Setting up the correct authentication according to your organization’s data governance policies helps keep sensitive data secure while allowing authorized indexing. The Box.com V2 connector supports two OAuth-based schemes:
  • OAuth 2.0 for single-user access
  • JWT service account for enterprise-wide crawling
After deciding which authentication type you require, read below to learn more about implementing it.

Standard OAuth 2.0 user authentication for single-user testing

For limited or single account crawls, you can create a Box app with the standard OAuth 2.0 “User Authentication” flow. See the Box documentation for additional guidance. In Fusion, you then add the:
  • API key using the Box client ID.
  • API secret using the Box client secret.
  • Refresh token as obtained through the OAuth consent flow.
Fusion will use these credentials to fetch and refresh an access token for the Box user.

JWT server authentication for enterprise-wide crawls

To fetch content from multiple Box users, you must register a Box app and enable JWT authentication. See the Box documentation for additional guidance. In Fusion, you then add the following:
  • App Entity ID using the Box app entity ID
  • Public Key ID using the Box public key ID
  • Private Key (Base64) using the Box private key in Base64
  • Private Key Password
  • Encryption Algorithm, such as RSA_SHA_256
  • Account Type as either USER or ENTERPRISE
Fusion will use the JWT to obtain “As-User” tokens, allowing it to crawl data while respecting each user’s permissions and access rights.
The Box connector retrieves data from a Box.com cloud-based data repository. This topic applies to the Box.com V1 and Box.com V2 connectors. The Box.com V1 connector is available in Fusion 5.2 and earlier. The Box.com V2 connector is available in Fusion 5.3 and later.

Configuration overview

These steps are for a multi-user Box.com data repository. For limited testing using a single user account, you can create a Box app that uses Standard OAuth 2.0 authentication.
Following is an overview of the steps required to set up Box and Fusion, and to crawl a Box data repository.
  1. Sign up for a Box developer account.
  2. Enable 2-step verification.
  3. Create a Box app that Fusion can use to crawl the Box files.
  4. Configure your app to use a Box service account.
  5. Install Fusion’s Box Connector.
  6. Create datasources in Fusion that use the Box connector.
  7. Crawl the Fusion datasources.

Set Up Box

Set up Box so that Fusion can crawl Box data repositories.

Step 1: Sign Up for a Box Developer Account

If you already have an account, proceed to Step 2: Enable 2-Step Verification.
  1. Open the Box Developers Console.
  2. In the top right corner, click Sign Up.
  3. Select an appropriate Platform Developer plan.
  4. Enter the requested information and click Submit.
  5. Open the confirmation email and click Verify Email.
  6. Log in to your Box account.

Step 2: Enable 2-Step Verification

  1. Log in to your Box developer account.
    1. Open the Box Developers Console.
    2. Log in as the admin.
  2. Create the Box account that you want to use for crawling.
    1. Open the Users page in the Box admin console.
    2. Click +Users to create a new user account.
    3. Enter the Name and Email for the user, and then click Add user.
    4. Click the user you just created to enter its user settings.
    5. Make this user a Co-Admin by selecting the Co-Admin checkbox. Once selected, a pane titled “User is granted the following administrative privileges” appears. Select all of the following:
      • Manage users
      • Manage groups
      • View users’ content
      • Log in to users’ accounts
      • Run new reports and access existing reports
    6. Click Save.
    7. Close the Admin Console browser tab.
  3. Enable 2-step verification for unrecognized logins:
    1. Open the Account Settings page. (You can reach this page from the drop-down menu under your initials.)
    2. On the Account tab, under Authentication, select Require 2-step verification for unrecognized logins.
    3. Choose your Country and enter a Mobile Phone Number, and then click Continue.
    4. Enter the verification code you receive, and then click Continue.
    5. If you are using a new mobile device, Box will send you a second code. Enter it, and then click Submit.
    6. Click Save Changes.

Step 3: Create a Box App that Fusion Can Use to Crawl the Box files

Create a Box app that uses OAuth 2.0 with JWT server authentication. If you already have an app, proceed to Step 4: Configure Your App to Use a Box Service Account.
  1. Open the Box Developers Console.
  2. Click Create New App.
  3. Select Custom App, and then click Next.
  4. Click OAuth 2.0 with JWT (Server Authentication), and then click Next.
  5. Name your app, and then click Create App. The name must be globally unique across all apps created by all Box users.
  6. Click View Your App.

Step 4: Configure Your App to Use a Box Service Account

  1. Use OpenSSL to create a private/public key pair:
    1. Install OpenSSL if it is not already installed. On Windows, install a Windows build of OpenSSL first.
    2. Open a Command Prompt window and run these commands to generate a private/public key pair:
      openssl genrsa -aes128 -out private_key.pem 2048
      
      Enter a password for the private key when prompted.
      openssl rsa -pubout -in private_key.pem -out public_key.pem
      
      In the current directory of the Command Prompt, you now have private and public key files, private_key.pem and public_key.pem respectively.
  2. Open the Box Developers Console, log in as Admin if you are asked to log in, and click your app.
  3. In the left navigation menu, click Configuration.
  4. Configure scopes and advanced features:
    1. Under Application Access, select Enterprise.
    2. Under Application Scopes, deselect Manage groups.
    3. Under Advanced Features, enable Generate User Access Tokens and Perform Actions as Users.
    4. Click Save Changes.
  5. In the Add and Manage Public Keys area, click Add a Public Key and paste the contents of the public_key.pem file (generated earlier in this step) into the text box.
    1. Make a note of the new Public Key ID that you just created.
  6. Under OAuth 2.0 Credentials, click COPY for the Client ID.
  7. Authorize your app:
    1. Open the Box Admin Console.
    2. In the left navigation menu, click Settings > Enterprise Settings (or Business Settings) > Apps.
    3. Under Custom Applications, click Authorize New App.
    4. In the API Key box, paste the Client ID credential you copied in Step 6, and then click Next.
    5. Read the App Authorization dialog and click Authorize.
    6. Close the Admin Console browser tab.
    If you change your app’s configuration later, you must repeat this step to re-authorize your app.
  8. Close the Dev Console browser tab.

Set Up Fusion

Set up Fusion to crawl Box data repositories.

Step 5: Install Fusion’s Box Connector

  1. Navigate to the Fusion 4.x Connector Downloads page.
  2. Select the Box connector link for the release to download. The .zip file is downloaded.
  3. Do not unzip the file.
  4. Open the Fusion UI and click System > Blobs.
  5. Click Add.
  6. Select Connector Plugin.
  7. Click Choose File, select the file, and then click Open.
  8. Click Upload.

Step 6: Create Datasources

Create datasources that use the Box connector to access the Box data repository.For each datasource:
  1. In the Fusion UI, navigate to Indexing > Datasources.
  2. Click Add.
  3. Select Box.com.
  4. Fill in the form. Note the following about the configuration settings:
    • Start Links: Each start link defined for the datasource must consist of a numeric Box file ID or directory ID. The root directory of any Box account has ID 0 (zero). To crawl your entire Box repository, enter 0. To find an ID, select a folder or file at Box.com and read the numeric ID from its URL; for example, a folder ID such as 34192617287 or a file ID such as 204871656422 can be used as a start link.
    • API Key: In the Box Developers Console, select the app. On the Configuration tab, under OAuth 2.0 Credentials, use the Client ID.
    • API Secret: In the Box Developers Console, select the app. On the Configuration tab, under OAuth 2.0 Credentials, use the Client Secret.
    • JWT App User ID: Email address that you use to sign in to your Box co-admin account. Use the Co-Admin account you created in Step 2.
    • JWT Public Key Id: In the Box Developers Console, select the app. On the Configuration tab, under Add and Manage Public Keys, use the ID for a public key.
    • JWT Private Key: Base64-encoded contents of the private-key file that matches your JWT Public Key Id. Base64 encode the entire contents of the file, including the first and last lines. (Fusion 5.0+ only.)
    • JWT Private Key File: Full path to the private-key file you created that matches your JWT Public Key Id. (Prior to Fusion 5.0 only.)
    • JWT Private Key Password: Passphrase for the private key (from the private-key file you created in Step 4).
    • Distributed crawl collection name: Collection that contains the pre-fetch index.
    • Box.com children responses per page: Use the default value of 1000.
    • Nested folder depth limit: Generally, you want a number that will crawl all documents, so keep the default value. For testing, you can reduce the number substantially to speed up the crawl.
    • Number of partition buckets: Divide the number of files by 5000. Use that number or 10000, whichever is smaller.
    • Number of distributed crawl datasources: Use 1 to 27.
    • Number of pre-fetch index creator threads: A number between 2 and 5. Use 2 for small datasources and 5 for huge datasources (over 10 million files).
  5. Click Save.

Crawl a Box Data Repository

Crawl a Box data repository.

Step 7: Crawl the Fusion Datasources

Crawl the datasources, which use Fusion’s Box connector to access the Box data repository. Fusion’s Box connector uses the pre-fetch index to fetch the contents of each file from Box.com, get metadata from both the distributed index and Box.com, and index the documents through Fusion’s index pipeline. You can:
  • Run the crawl now.
    1. From the Fusion launcher, click Search > Home > Datasources.
    2. Click the datasource.
    3. Click Start Crawl.
  • Schedule the crawl (for Fusion 4.2 through 5.5):
    1. From the Fusion launcher, click Devops > Home > Scheduler.
    2. Click the row for the job that corresponds to the datasource.
    3. Specify schedule information, and then click Save.
This topic explains how to configure Box.com authorization, access, and refresh tokens. The information applies to the Box.com V1 and Box.com V2 connectors. The Box.com V1 connector is available in Fusion 5.2 and earlier. The Box.com V2 connector is available in Fusion 5.3 and later. Fusion supports two methods of authentication with the Box API:
  • JSON Web Token (JWT)
  • OAuth2

Box App Users Using JWT

Box.com offers a Box Developer Edition, which lets a user access an application without having to create their own Box account. App Auth uses the JSON Web Token (JWT) authentication architecture to establish a trusted connection with Box, allowing an application to provision and manage a Box account while minimizing the number of logins for a user and the authentication services to manage. For this option, Fusion needs the inputs below to crawl your Box data.
  • JWT App User ID (f.fs.appUserId): The Developer Edition API App User ID that you want to crawl as.
  • JWT Public Key ID (f.fs.publicKeyId): The public key prefix registered in Box Auth that you want to use to authenticate with.
  • JWT Private Key (f.fs.privateKeyBase64): Base64-encoded JWT private key for the app user you want to authenticate as. (Fusion 5.0+ only.)
  • JWT Private Key File Path (f.fs.privateKeyFile): Path to the JWT private key file for the app user you want to authenticate as. (Prior to Fusion 5.0 only.)
  • JWT Private Key File Password (f.fs.privateKeyPassword): The password that secures the private key.
The biggest advantage to using the JWT App Auth Users approach is that you do not have to generate new refresh tokens. The public/private key pair remains valid indefinitely.
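As an illustration, these properties might appear in a datasource configuration roughly as follows. This is a sketch using the API names above; the values are placeholders, and the surrounding datasource structure depends on your Fusion version.
f.fs.appUserId: "crawl-admin@example.com" # placeholder app user ID
f.fs.publicKeyId: "abcd1234" # placeholder public key ID registered in Box
f.fs.privateKeyBase64: "<base64-encoded contents of private_key.pem>" # Fusion 5.0+ only
f.fs.privateKeyPassword: "<private key passphrase>"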

Authentication Using OAuth 2.0

For limited testing using a single user account, you can create a Box app that uses Standard OAuth 2.0 authentication.
  1. Log in to your Box developer account as the Admin.
    1. Open the Box Developers web portal.
    2. In the top right corner, click Log In.
  2. Open the page for creating a new app and click Create New App.
  3. Click Custom App, and then click Next.
  4. Click Standard OAuth 2.0 (User Authentication), and then click Next.
  5. Name your app, and then click Create App. The name must be globally unique across all apps created by all Box users.
  6. Click View Your App.
  7. On the Configuration page:
    1. Under Authentication Method, select Standard OAuth 2.0 (User Authentication).
    2. Set the Redirect URI to http://localhost or http://0.0.0.0. This address is not used by Fusion, but cannot be left blank.
    3. Click Save Changes.
Important: The v2.2.0 version of this connector is only compatible with Fusion 5.9.4 and later when using security trimming. The v2.2.0 connector uses Graph Security Trimming rather than regular security trimming. Treat v2.2.0 as a new connector: configurations do not transfer from previous versions, and a full crawl is required.

Remote connectors

V2 connectors support running remotely in Fusion versions 5.7.1 and later.
If you need to index data from behind a firewall, you can configure a V2 connector to run remotely on-premises using TLS-enabled gRPC.

Prerequisites

Before you can set up an on-prem V2 connector, you must configure the egress from your network to allow HTTP/2 communication into the Fusion cloud. You can use a forward proxy server to act as an intermediary between the connector and Fusion. The following is required to run V2 connectors remotely:
  • The plugin zip file and the connector-plugin-standalone JAR.
  • A configured connector backend gRPC endpoint.
  • Username and password of a user with a remote-connectors or admin role.
  • If the host where the remote connector is running is not configured to trust the server’s TLS certificate, you must configure the file path of the trust certificate collection.
If your version of Fusion doesn’t have the remote-connectors role by default, you can create one. No API or UI permissions are required for the role.

Connector compatibility

Only V2 connectors are able to run remotely on-premises. You also need the remote connector client JAR file that matches your Fusion version. You can download the latest files at V2 Connectors Downloads.
Whenever you upgrade Fusion, you must also update your remote connectors to match the new version of Fusion.
The gRPC connector backend is not supported in Fusion environments deployed on AWS.

System requirements

The following is required for the on-prem host of the remote connector:
  • (Fusion 5.9.0-5.9.10) JVM version 11
  • (Fusion 5.9.11) JVM version 17
  • Minimum of 2 CPUs
  • 4GB Memory
Note that memory requirements depend on the number and size of ingested documents.

Enable backend ingress

In your values.yaml file, configure this section as needed:
ingress:
  enabled: false
  pathtype: "Prefix"
  path: "/"
  #host: "ingress.example.com"
  ingressClassName: "nginx"   # Fusion 5.9.6 only
  tls:
    enabled: false
    certificateArn: ""
    # Enable the annotations field to override the default annotations
    #annotations: ""
  • Set enabled to true to enable the backend ingress.
  • Set pathtype to Prefix or Exact.
  • Set path to the path where the backend will be available.
  • Set host to the host where the backend will be available.
  • In Fusion 5.9.6 only, you can set ingressClassName to one of the following:
    • nginx for Nginx Ingress Controller
    • alb for AWS Application Load Balancer (ALB)
  • Configure TLS and certificates according to your CA’s procedures and policies.
    TLS must be enabled in order to use AWS ALB for ingress.
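For reference, the same section with the backend ingress enabled might look like the following sketch; the host value is a placeholder.
ingress:
  enabled: true
  pathtype: "Prefix"
  path: "/"
  host: "connectors-backend.example.com" # placeholder host
  ingressClassName: "nginx"   # Fusion 5.9.6 only
  tls:
    enabled: true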

Connector configuration example

kafka-bridge:
  target: mynamespace-connectors-backend.lucidworkstest.com:443 # mandatory
  plain-text: false # optional, false by default
  proxy-server: # optional - needed when a forward proxy server is used to provide outbound access to the standalone connector
    host: host
    port: some-port
    user: user # optional
    password: password # optional
  trust: # optional - needed when the client's system doesn't trust the server's certificate
    cert-collection-filepath: path1

proxy: # mandatory fusion-proxy
  user: admin
  password: password123
  url: https://fusiontest.com/ # needed only when the connector plugin requires blob store access

plugin: # mandatory
  path: ./fs.zip
  type: #optional - the suffix is added to the connector id
    suffix: remote

Minimal example

kafka-bridge:
  target: mynamespace-connectors-backend.lucidworkstest.com:443

proxy:
  user: admin
  password: "password123"

plugin:
  path: ./testplugin.zip

Logback XML configuration file example

<configuration>
    <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder class="com.lucidworks.logging.logback.classic.LucidworksPatternLayoutEncoder">
            <pattern>%d - %-5p [%t:%C{3.}@%L] - %m{nolookups}%n</pattern>
            <charset>utf8</charset>
        </encoder>
    </appender>

    <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>${LOGDIR:-.}/connector.log</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
            <!-- rollover daily -->
            <fileNamePattern>${LOGDIR:-.}/connector-%d{yyyy-MM-dd}.%i.log.gz</fileNamePattern>
            <maxFileSize>50MB</maxFileSize>
            <totalSizeCap>10GB</totalSizeCap>
        </rollingPolicy>
        <encoder class="com.lucidworks.logging.logback.classic.LucidworksPatternLayoutEncoder">
            <pattern>%d - %-5p [%t:%C{3.}@%L] - %m{nolookups}%n</pattern>
            <charset>utf8</charset>
        </encoder>
    </appender>

    <root level="INFO">
        <appender-ref ref="CONSOLE"/>
        <appender-ref ref="FILE"/>
    </root>
</configuration>

Run the remote connector

java [-Dlogging.config=[LOGBACK_XML_FILE]] \
  -jar connector-plugin-client-standalone.jar [YAML_CONFIG_FILE]
The logging.config property is optional. If not set, logging messages are sent to the console.

Test communication

You can run the connector in communication testing mode. This mode tests the communication with the backend without running the plugin, reports the result, and exits.
java -Dstandalone.connector.connectivity.test=true -jar connector-plugin-client-standalone.jar [YAML_CONFIG_FILE]

Encryption

In a deployment, communication to the connector’s backend server is encrypted using TLS. You should only run this configuration without TLS in a testing scenario. To disable TLS, set plain-text to true.
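For example, a test-only configuration that disables TLS might look like this sketch, reusing the kafka-bridge settings from the example above; adjust the target to the port your backend exposes for plain-text traffic.
kafka-bridge:
  target: mynamespace-connectors-backend.lucidworkstest.com:443 # use the port your backend exposes for plain-text traffic
  plain-text: true # testing only; keep the default (false) in production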

Egress and proxy server configuration

One of the methods you can use to allow outbound communication from behind a firewall is a proxy server. You can configure a proxy server to allow certain communication traffic while blocking unauthorized communication. If you use a proxy server at the site where the connector is running, you must configure the following properties:
  • Host. The host where the proxy server is running.
  • Port. The port the proxy server is listening to for communication requests.
  • Credentials. Optional proxy server user and password.
When you configure egress, it is important to disable any connection or activity timeouts because the connector uses long running gRPC calls.
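As a sketch, the corresponding proxy-server section of the standalone YAML, using the keys from the configuration example above with placeholder values, might look like this. Also make sure the proxy itself does not enforce idle or activity timeouts.
kafka-bridge:
  target: mynamespace-connectors-backend.lucidworkstest.com:443
  proxy-server:
    host: proxy.example.com # placeholder forward proxy host
    port: 3128 # placeholder port
    user: proxyuser # optional
    password: proxypass # optional; can be encrypted as described below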

Password encryption

If you use a login name and password in your configuration, encrypt the password as follows:
  1. Enter a user name and password in the connector configuration YAML.
  2. Run the standalone JAR with this property:
    -Dstandalone.connector.encrypt.password=true
    
  3. Retrieve the encrypted passwords from the log that is created.
  4. Replace the clear password in the configuration YAML with the encrypted password.
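For example, the proxy section might change as follows; the encrypted string shown is a placeholder, not real output from the utility.
proxy:
  user: admin
  password: "<encrypted-password-from-log>" # replace the clear-text value with the encrypted string printed in the log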

Connector restart (5.7 and earlier)

The connector shuts down automatically whenever the connection to the server is disrupted, to prevent it from getting into a bad state. Communication disruption can happen, for example, when the server running in the connectors-backend pod shuts down and is replaced by a new pod. Once the connector shuts down, connector configuration and job execution are disabled. To prevent that from happening, restart the connector as soon as possible. You can use Linux scripts and utilities, such as Monit, to restart the connector automatically.

Recoverable bridge (5.8 and later)

If communication to the remote connector is disrupted, the connector tries to recover communication and gRPC calls. By default, six attempts are made to recover each gRPC call. The number of attempts can be configured with the max-grpc-retries bridge parameter.

Job expiration duration (5.9.5 only)

The timeout value for unresponsive backend jobs can be configured with the job-expiration-duration-seconds parameter. The default value is 120 seconds.
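The two preceding sections mention the max-grpc-retries and job-expiration-duration-seconds parameters but do not show exactly where they are set. The following sketch assumes they are placed in the standalone YAML alongside the other bridge settings; verify the placement for your Fusion version and deployment.
kafka-bridge:
  target: mynamespace-connectors-backend.lucidworkstest.com:443
  max-grpc-retries: 6 # assumed placement; number of recovery attempts per gRPC call (5.8 and later)
  job-expiration-duration-seconds: 120 # assumed placement; timeout for unresponsive backend jobs (5.9.5 only)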

Use the remote connector

Once the connector is running, it is available in the Datasources dropdown. If the standalone connector terminates, it disappears from the list of available connectors. Once it is re-run, it becomes available again, and configured connector instances are not lost.

Enable asynchronous parsing (5.9 and later)

To separate document crawling from document parsing, enable Tika Asynchronous Parsing on remote V2 connectors.
Below is an example configuration showing how to specify the file system to index under the connector-plugins entry in your values.yaml file:
additionalVolumes:
- name: fusion-data1-pvc
  persistentVolumeClaim:
    claimName: fusion-data1-pvc
- name: fusion-data2-pvc
  persistentVolumeClaim:
    claimName: fusion-data2-pvc
additionalVolumeMounts:
- name: fusion-data1-pvc
  mountPath: "/connector/data1"
- name: fusion-data2-pvc
  mountPath: "/connector/data2"
You may also need to specify the user that is authorized to access the file system, as in this example:
securityContext:
    fsGroup: 1002100000
    runAsUser: 1002100000

Configuration

When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.