Latest version: v2.1.1
Compatible with Fusion version: 5.9.0 and later
The Web V2 connector retrieves data from a website over HTTP, starting from a specified URL.
Starting with Fusion 5.9.11, users of the Web V2 connector must upgrade to version 2.0.0 or later. Previous versions are incompatible due to changes introduced by the upgraded JDK in Fusion 5.9.11.
For Web V2 v2.1.0 and later, up to three Web V2 connectors can run simultaneously in a single cluster. This prevents reaching a max concurrency limit per Web V2 connector, which affects how much data can be sent to Selenium Grid at one time.
Fusion 5.6 and later uses the Open Graph Protocol as the default configuration for fields. Deviation from that standard configuration may exclude information from indexing during the crawl.
If crawls fail with a corrupted CrawlDB error, reinstall the connector.

Remote connectors

V2 connectors support running remotely in Fusion versions 5.7.1 and later.
Below is an example configuration showing how to specify the file system to index under the connector-plugins entry in your values.yaml file:
additionalVolumes:
- name: fusion-data1-pvc
  persistentVolumeClaim:
    claimName: fusion-data1-pvc
- name: fusion-data2-pvc
  persistentVolumeClaim:
    claimName: fusion-data2-pvc
additionalVolumeMounts:
- name: fusion-data1-pvc
  mountPath: "/connector/data1"
- name: fusion-data2-pvc
  mountPath: "/connector/data2"
You may also need to specify the user that is authorized to access the file system, as in this example:
securityContext:
    fsGroup: 1002100000
    runAsUser: 1002100000

Selenium Grid setup

If you are using Web V2 2.1.0 or later, you must also use Selenium Grid as part of your Web V2 connector setup. For hosted connectors, Selenium Grid support is available through Kubernetes. For remote connectors, Selenium Grid support is available through Docker Compose. See the Web V2 remote support repository for full setup instructions and YAML files. Before you set up Selenium Grid, install the connector standalone plugin file for the version of Fusion that you are using and the most recent version of the Web V2 connector. Verify that you have the correct files at the Lucidworks plugins site.
The Selenium services require an x86 architecture to run properly. Running the Selenium services on an ARM-based system such as Apple Silicon is not supported.

Set up Selenium Grid in Kubernetes

If you are using a hosted connector, use Kubernetes to set up Selenium Grid. These steps explain how to deploy the Selenium Hub component and the two Chrome browser nodes that connect to Selenium Hub. The referenced YAML files are available in the k8s directory of the Web V2 remote support repository. To set up Selenium Grid:
  1. In a terminal, apply the Kubernetes YAML configurations. Replace NAMESPACE with your namespace.
    kubectl apply -f deployment.yaml -n NAMESPACE
    kubectl apply -f chrome-deployment.yaml -n NAMESPACE
    kubectl apply -f service.yaml -n NAMESPACE
    
  2. Verify that the deployments are successful:
    kubectl get pods -n NAMESPACE
    kubectl get services -n NAMESPACE
    
  3. Adjust the network policy to allow port 4444. Enter the following command in a terminal:
    kubectl edit networkpolicy NAMESPACE-connector-plugin -n NAMESPACE
    
  4. Add the following snippet to the file:
    - ports:
      - port: 4444
        protocol: TCP
      - port: 4444
        protocol: UDP
    
  5. Save the file.
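The verification in step 2 can be partly automated. The following is a minimal sketch, not part of the official setup: it assumes the default column layout of kubectl get pods (NAME, READY, STATUS, ...), and the helper name not_ready_pods is hypothetical. It prints the name of every pod that is not in the Running state or whose containers are not all ready.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and prints the name of each pod that is not fully up.
# Assumes the default columns: NAME READY STATUS RESTARTS AGE.
not_ready_pods() {
  awk '{
    split($2, ready, "/")                     # READY column, for example "1/1"
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}
```

To use it, run kubectl get pods -n NAMESPACE --no-headers | not_ready_pods. Empty output suggests that every pod in the namespace is running and ready. Pods that finished normally (status Completed) are also listed, so read the output with that in mind.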

Set up Selenium Grid in Docker Compose

If you are using a remote connector, use Docker Compose to set up Selenium Grid. The referenced YAML files are available in the Web V2 remote support repository. Before setting up Selenium Grid in Docker Compose, you must know what version of JDK your Fusion connectors are using. If you are using Fusion 5.9.10 or earlier, you are using JDK 11. If you are using Fusion 5.9.11 or later, you are using JDK 17. To set up Selenium Grid in Docker Compose:
  1. Visit the Web V2 remote support repository and select the folder corresponding to your JDK version. Download the contents of that folder.
  2. Edit the bin/conf/connector-config.yaml file to configure the Kafka bridge settings, the proxy settings, and the plugin path. The following snippet shows an example configuration. Quotation marks are required around the password.
    kafka-bridge:
      target: EXAMPLE_CONNECTORS_BACKEND.example.com:443
      # Uncomment proxy-server section if needed
    proxy:
      user: EXAMPLE_USERNAME
      password: "EXAMPLE_PASSWORD"
      url: https://FUSION_HOST:FUSION_PORT/
    plugin:
      path: /app/connector-plugin.zip
      type:
        suffix: remote-
    
  3. Save the configuration file.
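The JDK rule stated before these steps (Fusion 5.9.10 or earlier uses JDK 11, Fusion 5.9.11 or later uses JDK 17) can be sketched as a small shell function. This is an illustration only: the function name jdk_for_fusion is hypothetical, and it assumes a purely numeric MAJOR.MINOR.PATCH version string.

```shell
# Hypothetical helper: prints the JDK version used by a given Fusion version,
# following the rule described above. Expects numeric MAJOR.MINOR.PATCH input.
jdk_for_fusion() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the version string on dots
  IFS=$old_ifs
  major=${1:-0}
  minor=${2:-0}
  patch=${3:-0}
  if [ "$major" -gt 5 ] ||
     { [ "$major" -eq 5 ] && [ "$minor" -gt 9 ]; } ||
     { [ "$major" -eq 5 ] && [ "$minor" -eq 9 ] && [ "$patch" -ge 11 ]; }; then
    echo 17
  else
    echo 11
  fi
}
```

For example, jdk_for_fusion 5.9.10 prints 11 and jdk_for_fusion 5.9.11 prints 17, which tells you which folder of the repository to download.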
Now you can start the Docker Compose environment, which uses standard Docker Compose commands. You can start the environment in background mode or with live logs. To start Docker Compose in background mode, navigate to the directory for your environment and enter the following command in a terminal:
docker-compose up -d
To start Docker Compose in live mode, navigate to the directory for your environment and enter the following command in a terminal:
docker-compose up
After you start Docker Compose, verify that the services are running:
  1. Open the Selenium Grid console at http://localhost:4444/ui and verify that the Selenium Hub is running and the Chrome nodes are connected.
  2. Run docker-compose logs lucidworks-connector in a terminal to verify that the service is up. The Lucidworks connector is available on port 8764.
  3. Run docker-compose ps in a terminal and verify that all containers are up.
If you are viewing logs in real time, press Ctrl-C to stop the services. To stop all services, run docker-compose down in a terminal. To also remove all volumes when stopping all services, run docker-compose down -v.
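The Selenium Grid check can also be scripted instead of opening the console. The sketch below assumes Selenium Grid 4, which exposes a status endpoint at /status whose JSON response contains a "ready" field. The function name grid_ready is hypothetical, and the grep pattern is a deliberately crude match rather than a full JSON parse.

```shell
# Hypothetical helper: reads a Selenium Grid /status response on stdin and
# succeeds (exit code 0) only if the response contains "ready": true.
# A JSON-aware tool would be more robust than this pattern match.
grid_ready() {
  grep -Eq '"ready"[[:space:]]*:[[:space:]]*true'
}
```

To use it against the local environment described above, run curl -s http://localhost:4444/status | grid_ready && echo "Selenium Grid is ready".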

Enable JavaScript evaluation in Fusion

JavaScript evaluation allows the Web V2 connector to extract content from a website that is only available after JavaScript has rendered the document. It is available for Web V2 v2.1.0 and later on hosted and remote connectors. To enable JavaScript evaluation in the Web V2 connector:
  1. Navigate to your Web V2 datasource in Fusion.
  2. Select Javascript Evaluation Properties. This section displays the settings you can use to customize JavaScript evaluation.
  3. Select Evaluate JavaScript. This is required for using JavaScript evaluation.
  4. If you specified a SmartForms or SAML element in the Crawl Authentication Properties area, select Evaluate JavaScript during SmartForms/SAML Login.
  5. Headless browser mode is selected by default, which runs the browser performing the website crawl in the background without being visible. If your website renders pages on the server side, the Headless browser field must be unchecked for the crawl to work correctly and retrieve links. If your website renders pages on the client side, the Headless browser field should be checked.
  6. Click Apply.
For the full JavaScript evaluation settings, see javascriptEvaluationConfig under Properties in the configuration specifications.

Configuration

When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
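As an illustration of the difference, the following sketch converts a value as you would type it in the UI into the form expected inside an API request body. It assumes that the only difference is that each backslash must be doubled, as in JSON string escaping. The function name ui_to_api is hypothetical, and other characters that JSON requires you to escape, such as double quotes, are not handled.

```shell
# Hypothetical helper: doubles every backslash in its argument, turning a
# UI-style value such as \t into the API-style value \\t.
ui_to_api() {
  printf '%s' "$1" | sed 's/\\/\\\\/g'
}
```

For example, ui_to_api '\t' prints \\t, which is the form to place between the quotation marks of the value in a request body.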