Connector Configuration Reference
Configure remote V2 connectors
remote-connectors or admin role.
If the remote-connectors role does not exist by default, you can create one. No API or UI permissions are required for the role.
In your values.yaml file, configure this section as needed:
Set enabled to true to enable the backend ingress.
Set pathtype to Prefix or Exact.
Set path to the path where the backend will be available.
Set host to the host where the backend will be available.
Set ingressClassName to one of the following:
nginx for Nginx Ingress Controller
alb for AWS Application Load Balancer (ALB)
The logging.config property is optional. If not set, logging messages are sent to the console.
plain-text to true.
connectors-backend pod shuts down and is replaced by a new pod. Once the connector shuts down, connector configuration and job execution are disabled. To prevent that from happening, you should restart the connector as soon as possible.
You can use Linux scripts and utilities, such as Monit, to restart the connector automatically.
max-grpc-retries bridge parameters.
job-expiration-duration-seconds parameter. The default value is 120 seconds.
connector-plugins entry in your values.yaml file:
k8s directory of the Web V2 remote support repository.
To set up Selenium Grid:
Replace NAMESPACE with your namespace.
Use the bin/conf/connector-config.yaml file to configure the Kafka bridge settings, the proxy settings, and the plugin path. The following snippet shows an example configuration. Quotation marks are required around the password.
Open http://localhost:4444/ui. Verify that the Selenium Hub is running and the Chrome nodes are connected.
The Lucidworks connector is available on port 8764. Run docker-compose logs lucidworks-connector in a terminal to verify that the service is up.
To check the container status, run docker-compose ps in a terminal and verify that all containers are up.
Press Ctrl-C to stop the services when viewing logs in real time.
To stop all services, run docker-compose down in a terminal. If you want to remove all volumes when stopping all services, run docker-compose down -v.
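As a complement to the docker-compose checks above, the following is a minimal sketch of a scripted reachability check. It assumes both services are published on localhost (the text gives localhost only for the Selenium Hub UI on port 4444; using it for the connector on port 8764 is an assumption). The helper name port_open is hypothetical. A successful TCP connection only shows that something is listening on the port, not that the service is healthy.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # Ports from the text above; "localhost" for the connector is an assumption.
    for name, port in (("selenium-hub", 4444), ("lucidworks-connector", 8764)):
        state = "reachable" if port_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

If a port is reported as not reachable, the docker-compose ps and docker-compose logs commands described above are the next place to look.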
javascriptEvaluationConfig under Properties in the configuration specifications.
Configure Web Site Authentication
https://FUSION_HOST:FUSION_PORT/data/connectors/container/lucid.web/datasourceName, where datasourceName is the name of the datasource.
After you create a datasource, Fusion creates this directory for you. The file should be a JSON-formatted file ending with the .json file extension.
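Before uploading a credentials file, it can help to confirm the two constraints stated above: the name ends with .json and the contents parse as JSON. The following is a minimal sketch of such a check. The function name check_credentials_file is hypothetical, and the sketch deliberately does not validate the properties inside the file, because their layout is not assumed here.

```python
import json
from pathlib import Path


def check_credentials_file(path: str):
    """Check the file extension and JSON syntax; return the parsed content."""
    p = Path(path)
    if p.suffix != ".json":
        raise ValueError(f"{p.name}: file name must end with .json")
    with p.open(encoding="utf-8") as fh:
        # json.load raises json.JSONDecodeError if the file is not valid JSON.
        return json.load(fh)
```

A call such as check_credentials_file("my-credentials.json") either returns the parsed content or raises an error that names the problem. The file name in this call is a placeholder.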
When defining the datasource, use the name of the file in the Authentication credentials filename field in the UI (or for the f.credentialsFile property if using the REST API).
All authentication types require the credentials file to include a property called type that defines the type of authentication to use. The other required properties vary depending on the type of authentication chosen.
Use form for the type. The other properties are:
os_username and os_password, which are expected by the system we crawl.
Use smartForm for the type. The other properties are:
os_username and os_password, which are expected by the system we crawl.
Additionally, we expect that once that login has happened, a new form is presented to the user, which then posts back to where we came from. No data needs to be entered in this form, which is why we include an empty { } in the params list.
params. If no user input is required, simply include an empty { }.
For the type property, use basic or digest. The other properties are:
For the type property, use ntlm. The other properties available are:
kerbuser.keytab in our examples.
login.conf.
krb5.ini.
If you do not specify kerberosPrincipalName and kerberosKeytabFilePath or kerberosKeytabBase64 when creating the Fusion datasource, Fusion uses the default login principal and ticket cache.
You can see the default values by logging into the Fusion server as the user who runs Fusion and running klist.
If you do not want to use the default account and credentials, specify these configuration properties when creating a keytab as well as in the Web datasource setup.
Use the Kerberos user principal name (UPN), not the service principal name (SPN, which is used with the Kerberos security realm).
In some cases the UPN can be a service.
In our examples, the Fusion Web crawler authenticates to the Web sites using the user kerbuser@win.lab.lucidworks.com.
We create a keytab file kerbuser.keytab for the user principal kerbuser@WIN.LAB.LUCIDWORKS.COM.
Install the krb5-user package: sudo apt-get install krb5-user
Example:
Run curl --version and make sure SPNEGO is in the output.
Run the following curl command (replace the keytab path and site):
login.conf and krb5.ini files as follows.
C:\\kerb\\kerbuser.keytab
/home/lucidworks/kb.keytab
/etc/krb5.conf. You can optionally create a custom one instead.
Creating a krb5.conf is the same for Linux and Windows. On Windows, the file is named krb5.ini.
In this example, the domain is WIN.LAB.LUCIDWORKS.COM, the Kerberos KDC host is my.kdc-dns.com, and the Kerberos admin server is my-admin-server-dns.com.
Example:
krb5.ini file is described in the MIT Kerberos documentation.
You can change the encryption algorithms by changing the properties default_tkt_enctypes, default_tgs_enctypes, and permitted_enctypes as needed. For example:
After you create the keytab, login.conf, and krb5.ini files, configure Fusion to use Kerberos. You must set a property in a Fusion configuration file in addition to defining the datasource in the Fusion UI.
At the command line on every machine in your Fusion cluster:
In $FUSION_HOME/conf/fusion.cors (fusion.properties in Fusion 4.x), add the following property to the connectors-classic.jvmOptions setting: -Djavax.security.auth.useSubjectCredsOnly=false
Restart the connectors-classic service using ./bin/connectors-classic restart on Linux or bin\connectors-classic.cmd restart on Windows.
klist.
/tmp/krb* cache file got corrupted or is not compatible after you went through other troubleshooting steps.
To rule that out, remove the /tmp/krb* cache file on all hosts, restart the connectors-classic service, and try the crawl again. That is, on each host:
connectors-classic jvmOptions on all nodes:
Restart connectors-classic after making that change.
If that doesn't work, make sure the user you are authenticating with from curl matches the user you are trying to authenticate with from the Web connector.
To see your Kerberos principal user name, run klist.
Web V2 connector OAuth access token configuration
Client Secret Post.
testScope.
https://auth.pingone.com/ENV_ID/as/token
grant_type | client_credentials
client_id | CLIENT_ID
scope | testScope
https://login.microsoftonline.com/TENANT_ID/oauth2/v2.0/token
grant_type | client_credentials
client_id | CLIENT_ID
scope | https://graph.microsoft.com/.default
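To illustrate what the parameters above amount to on the wire, here is a minimal sketch of a generic OAuth 2.0 client-credentials token request. It is not the connector's implementation. It assumes that "Client Secret Post" means the secret travels in the form-encoded POST body as client_secret, which is the standard OAuth 2.0 behavior. The function name build_token_request is hypothetical, and ENV_ID, CLIENT_ID, and the secret are placeholders.

```python
from urllib.parse import urlencode


def build_token_request(token_url: str, client_id: str,
                        client_secret: str, scope: str):
    """Return (url, body, headers) for a client-credentials token request."""
    fields = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        # Assumption: with Client Secret Post, the secret is a body field.
        "client_secret": client_secret,
        "scope": scope,
    }
    body = urlencode(fields).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return token_url, body, headers


url, body, headers = build_token_request(
    "https://auth.pingone.com/ENV_ID/as/token",
    "CLIENT_ID", "CLIENT_SECRET", "testScope",
)
print(body.decode("ascii"))
```

The returned values can be passed to an HTTP client of your choice, for example urllib.request.Request(url, data=body, headers=headers, method="POST"). The sketch stops short of sending the request so that it runs without network access.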
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
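One way to read the note about escaped characters in the API: if the value is the two characters backslash and t, a JSON encoder writes it as \\t in the request body. The sketch below shows this with Python's json module. The property name delimiter is hypothetical and used only for illustration.

```python
import json

# The value as typed in the UI: two characters, a backslash followed by "t".
delimiter = "\\t"

# JSON encoding escapes the backslash, producing \\t in the payload text.
payload = json.dumps({"delimiter": delimiter})
print(payload)  # {"delimiter": "\\t"}

# Decoding the payload recovers the original two-character value.
assert json.loads(payload)["delimiter"] == delimiter
```

Building API payloads with a JSON library instead of by hand avoids having to apply this escaping manually.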