- Latest version: v3.1.0
- Compatible with Fusion version: 4.2.0 and later
Remote connectors
V2 connectors support running remotely in Fusion versions 5.7.1 and later.
Configure remote V2 connectors
If you need to index data from behind a firewall, you can configure a V2 connector to run remotely on-premises using TLS-enabled gRPC.
The gRPC connector backend is not supported in Fusion environments deployed on AWS.
Prerequisites
Before you can set up an on-prem V2 connector, you must configure the egress from your network to allow HTTP/2 communication into the Fusion cloud. You can use a forward proxy server to act as an intermediary between the connector and Fusion.
The following is required to run V2 connectors remotely:
- The plugin zip file and the connector-plugin-standalone JAR.
- A configured connector backend gRPC endpoint.
- Username and password of a user with a remote-connectors or admin role.
- If the host where the remote connector is running is not configured to trust the server’s TLS certificate, you must configure the file path of the trust certificate collection.
If your version of Fusion doesn’t have the remote-connectors role by default, you can create one. No API or UI permissions are required for the role.
Connector compatibility
Only V2 connectors are able to run remotely on-premises. You also need the remote connector client JAR file that matches your Fusion version. You can download the latest files at V2 Connectors Downloads.
Whenever you upgrade Fusion, you must also update your remote connectors to match the new version of Fusion.
System requirements
The following is required for the on-prem host of the remote connector:
- (Fusion 5.9.0-5.9.10) JVM version 11
- (Fusion 5.9.11) JVM version 17
- Minimum of 2 CPUs
- 4 GB of memory
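The JVM requirement above depends on the Fusion version, so it can help to encode the mapping once. The following is a minimal sketch, not part of Fusion: the function names `parse_version` and `required_jvm_major` are hypothetical, and versions outside the ranges listed above return `None` because this page does not state a requirement for them.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '5.9.11' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def required_jvm_major(fusion_version: str) -> Optional[int]:
    """Return the JVM major version listed above, or None if the version is not listed."""
    v = parse_version(fusion_version)
    if (5, 9, 0) <= v <= (5, 9, 10):
        return 11
    if v == (5, 9, 11):
        return 17
    # Not covered by the list on this page; no requirement is assumed here.
    return None
```

For example, `required_jvm_major("5.9.4")` returns `11` and `required_jvm_major("5.9.11")` returns `17`.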
Enable backend ingress
In your values.yaml file, configure this section as needed:
- Set enabled to true to enable the backend ingress.
- Set pathtype to Prefix or Exact.
- Set path to the path where the backend will be available.
- Set host to the host where the backend will be available.
- In Fusion 5.9.6 only, you can set ingressClassName to one of the following:
  - nginx for Nginx Ingress Controller
  - alb for AWS Application Load Balancer (ALB)
- Configure TLS and certificates according to your CA’s procedures and policies.
TLS must be enabled in order to use AWS ALB for ingress.
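The rules above can be checked before a deployment. The following is a minimal sketch that assumes the ingress settings have already been loaded into a Python dictionary using the key names listed above. The function name `check_ingress` and the `tls_enabled` argument are hypothetical, the example host and path are placeholders, and the actual nesting of these keys in values.yaml is not shown here.

```python
ALLOWED_PATH_TYPES = {"Prefix", "Exact"}
ALLOWED_INGRESS_CLASSES = {"nginx", "alb"}


def check_ingress(settings: dict, tls_enabled: bool) -> list:
    """Return a list of problems found in the ingress settings (empty if none)."""
    problems = []
    if settings.get("enabled") is not True:
        problems.append("enabled must be true to expose the backend")
    if settings.get("pathtype") not in ALLOWED_PATH_TYPES:
        problems.append("pathtype must be Prefix or Exact")
    for key in ("path", "host"):
        if not settings.get(key):
            problems.append(f"{key} must be set")
    # ingressClassName is optional (Fusion 5.9.6 only), so only check it when present.
    ingress_class = settings.get("ingressClassName")
    if ingress_class is not None and ingress_class not in ALLOWED_INGRESS_CLASSES:
        problems.append("ingressClassName must be nginx or alb")
    if ingress_class == "alb" and not tls_enabled:
        problems.append("TLS must be enabled to use AWS ALB")
    return problems


# Placeholder values for illustration only.
example = {
    "enabled": True,
    "pathtype": "Prefix",
    "path": "/",
    "host": "connectors.example.com",
    "ingressClassName": "alb",
}
print(check_ingress(example, tls_enabled=False))
```

Returning a list of problems instead of raising on the first one makes it easier to fix every issue in a single pass.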
Connector configuration example
Minimal example
Logback XML configuration file example
Run the remote connector
The logging.config property is optional. If not set, logging messages are sent to the console.
Test communication
You can run the connector in communication testing mode. This mode tests the communication with the backend without running the plugin, reports the result, and exits.
Encryption
In a deployment, communication to the connector’s backend server is encrypted using TLS. You should only run this configuration without TLS in a testing scenario. To disable TLS, set plain-text to true.
Egress and proxy server configuration
One of the methods you can use to allow outbound communication from behind a firewall is a proxy server. You can configure a proxy server to allow certain communication traffic while blocking unauthorized communication. If you use a proxy server at the site where the connector is running, you must configure the following properties:
- Host. The host where the proxy server is running.
- Port. The port the proxy server listens on for communication requests.
- Credentials. Optional proxy server user and password.
Password encryption
If you use a login name and password in your configuration, run the following utility to encrypt the password:
- Enter a user name and password in the connector configuration YAML.
- Run the standalone JAR with this property:
- Retrieve the encrypted passwords from the log that is created.
- Replace the clear password in the configuration YAML with the encrypted password.
Connector restart (5.7 and earlier)
The connector will shut down automatically whenever the connection to the server is disrupted, to prevent it from getting into a bad state. Communication disruption can happen, for example, when the server running in the connectors-backend pod shuts down and is replaced by a new pod. Once the connector shuts down, connector configuration and job execution are disabled. To prevent that from happening, you should restart the connector as soon as possible.
You can use Linux scripts and utilities to restart the connector automatically, such as Monit.
Recoverable bridge (5.8 and later)
If communication to the remote connector is disrupted, the connector will try to recover communication and gRPC calls. By default, six attempts will be made to recover each gRPC call. The number of attempts can be configured with the max-grpc-retries bridge parameter.
Job expiration duration (5.9.5 only)
The timeout value for unresponsive backend jobs can be configured with the job-expiration-duration-seconds parameter. The default value is 120 seconds.
Use the remote connector
Once the connector is running, it is available in the Datasources dropdown. If the standalone connector terminates, it disappears from the list of available connectors. Once it is re-run, it is available again and configured connector instances are not lost.
Enable asynchronous parsing (5.9 and later)
To separate document crawling from document parsing, enable Tika Asynchronous Parsing on remote V2 connectors.
connector-plugins entry in your values.yaml file:
Learn more
Configure OneDrive Authentication
OneDrive is a file hosting service that is part of the Microsoft Office Online services.
The Fusion OneDrive connector crawls a OneDrive for Business instance and retrieves data from it for indexing within Fusion.
To authenticate the Fusion OneDrive connector with a OneDrive application, first configure OneDrive with the correct permissions, then authenticate the connector.
Configure OneDrive for use with the Fusion connector
Create and register a Microsoft OneDrive App for use with the Fusion connector.
- Navigate to the O365 login page.
- Log in using an O365 admin account, or create a new one.
- Use an existing application or create a new one. To create a new application:
- From the My Applications page, click Add an app.
- Give your app a name and click Create.
- Take note of the Application ID. You will need this later.
- Click Generate new password and take note of the password.
- Under Platforms, click Add platform and then Web.
- Fill in a redirect URL to your web site, ending in port 8090. For example: http://localhost:8090
- Add the following permissions to the application:
  - Delegated permissions: Files.Read.All, Sites.Read.All, User.Read, Directory.Read.All (Admin Only), People.Read.All (Admin Only), User.Read.All (Admin Only)
  - Application permissions: Directory.Read.All (Admin Only), Files.Read.All (Admin Only), People.Read.All (Admin Only), Sites.Read.All (Admin Only), User.Read.All (Admin Only)
- Click Save. The application is now ready to be authorized by an O365 account administrator for use with the connector.
Authenticate the connector
To authenticate the Fusion connector:
- Open a web browser and enter the following URL:
  - Replace <ACCOUNT_NAME> with the prefix name of your account.
  - Replace <APPLICATION_ID> in the client_id parameter with your Application ID from above.
  - Add your site URL for the uri parameter.
- Optional: Enter ADFS issuer URI.
- Access the URL. A list of permissions displays.
- Click Accept. You can now use this application to crawl your OneDrive for Business accounts.
Crawl a Subset of OneDrive
OneDrive is a file hosting service that is part of the Microsoft Office Online services.
The Fusion OneDrive connector crawls a OneDrive for Business instance and retrieves data from it for indexing within Fusion.
You can optionally configure the OneDrive connector to crawl only the drives of certain users that you specify.
- Install the OneDrive connector. In the Fusion UI, navigate to the OneDrive connector configuration at Indexing > Datasources > Add > OneDrive.
- In the User principal name (UPN) filter field, specify a list of users (user principal names, or UPNs) to retrieve documents from.
  - A user’s UPN is the one they use for logging into OneDrive.
  - The User principal name (UPN) filter field is an array, so you can set multiple UPNs.
  - All UPNs are validated by retrieving the user drives. If the request fails, it is logged in $fusion_home/var/log/connectors/connectors-rpc/connectors-rpc.log.
  - If validation fails for all UPNs set, then the crawl job does not start.
  - If at least one UPN is valid, then the validation succeeds and the job starts. While the job is running, requests to invalid UPNs fail and are logged.
  - If more than 10 UPNs are set, then the validation is skipped for performance reasons.
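The validation rules above can be summarized as a small decision function. The following is a minimal sketch of that logic only, not the connector's actual code: `plan_crawl` and `can_fetch_drive` are hypothetical names, where `can_fetch_drive` stands in for the request that retrieves a user's drive. Treating an empty filter as "nothing to validate" is an assumption, since the filter is described as optional.

```python
from typing import Callable, Iterable, List, Tuple

MAX_VALIDATED_UPNS = 10  # with more UPNs than this, validation is skipped


def plan_crawl(
    upns: Iterable[str],
    can_fetch_drive: Callable[[str], bool],
) -> Tuple[bool, List[str]]:
    """Return (job_starts, upns_that_failed_validation)."""
    upn_list = list(upns)
    if not upn_list:
        # Assumption: no filter set, so there is nothing to validate.
        return True, []
    if len(upn_list) > MAX_VALIDATED_UPNS:
        # Validation is skipped for performance reasons.
        return True, []
    failed = [upn for upn in upn_list if not can_fetch_drive(upn)]
    # The job starts as long as at least one UPN validated successfully.
    job_starts = len(failed) < len(upn_list)
    return job_starts, failed


# Illustration with a fake lookup in which only one user has a reachable drive.
known = {"alice@example.com"}
print(plan_crawl(["alice@example.com", "bob@example.com"], lambda upn: upn in known))
```

In this sketch the failed UPNs are returned to the caller so they can be logged, mirroring how failed requests are written to the connector log.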