Released on August 27, 2024, this maintenance release includes the new Neural Hybrid Search capability, upgrades to Solr, Kubernetes, and ZooKeeper, and several bug fixes. To learn more, skip to the release notes.

Platform Support and Component Versions

Kubernetes platform support

Lucidworks has tested and validated support for the following Kubernetes platform and versions:
  • Google Kubernetes Engine (GKE): 1.28, 1.29, 1.30
For more information on Kubernetes version support, see the Kubernetes support policy.

Component versions

The following table details the versions of key components that may be critical to deployments and upgrades.
Component            Version
-------------------  -----------------------------------------
Solr                 fusion-solr 5.9.5 (based on Solr 9.6.1)
ZooKeeper            3.9.1
Spark                3.2.2
Ingress Controllers  Nginx, Ambassador (Envoy), GKE Ingress Controller. Istio is not supported.
More information about support dates can be found at Lucidworks Fusion Product Lifecycle.

New Features

Managed Fusion 5.9.5 introduces Neural Hybrid Search, a capability that combines lexical and semantic vector search. This feature includes:
  • A new index pipeline to vectorize fields with Lucidworks AI. See Configure the LWAI Vectorize pipeline.
  • A new query pipeline to set up Neural Hybrid Search with Lucidworks AI. See Configure the LWAI Neural Hybrid Search pipeline.
  • Query and index stages for vectorizing text using Lucidworks AI. See LWAI Vectorize Query stage and LWAI Vectorize Field stage.
  • Query and index stages for vectorizing text with Seldon. See Seldon Vectorize Query stage and Seldon Vectorize Field stage.
  • A new query stage for hybrid search that works with Lucidworks AI or Seldon. See Hybrid Query stage.
  • A new service, lwai-gateway, provides a secure, authenticated connection between Managed Fusion and your Lucidworks AI-hosted models.
    See Lucidworks AI Gateway for details.
  • Solr config changes to support dense vector dynamic fields.
  • A custom Solr plugin containing a new vectorSimilarity QParser that will not be available in Apache Solr until 9.7.
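For illustration, here is a hedged sketch of what a query using the new vectorSimilarity parser might look like. The syntax follows the parser planned for Apache Solr 9.7; the field name and vector values are placeholders:

q={!vectorSimilarity f=body_vector_384v minReturn=0.7}[0.012, -0.031, 0.097, ...]

The minReturn parameter sets the minimum similarity score a document must reach to be returned, which appears to correspond to the Min Return Vector Similarity setting in the Hybrid Query stage described below.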
LucidAcademy

Lucidworks offers free training to help you get started. The Neural Hybrid Search course focuses on how neural hybrid search combines lexical and semantic search to improve the relevance and accuracy of results. Visit the LucidAcademy to see the full training catalog.

Configure use case for embedding

In the LWAI Vectorize Field stage, you can specify the use case for your embedding model. To learn how to configure your embedding use case, watch the demonstration video for the LWAI Vectorize Field stage.

Fine-tune lexical and semantic settings

The Hybrid Query stage is highly customizable. You can lower the Min Return Vector Similarity threshold for vector results to include more semantic results. For example, a lower threshold would return “From Dusk Till Dawn” when querying “night” against a movie dataset, while a higher threshold prioritizes high-scoring results and, in this case, returns only movie titles containing “night”. To learn how to configure the Hybrid Query stage, watch the demonstration video for that stage.

Vector dimension size

There is no limitation on vector dimension sizes. If you’re setting up vector search and Neural Hybrid Search with an embedding model that uses large dimensions, configure your managed-schema to support the appropriate dimension. See Configure Neural Hybrid Search.
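For example, here is a minimal managed-schema sketch for a 768-dimension embedding model, using Solr’s standard DenseVectorField; the type and field names are placeholders:

<fieldType name="knn_vector_768" class="solr.DenseVectorField" vectorDimension="768" similarityFunction="cosine"/>
<dynamicField name="*_768v" type="knn_vector_768" indexed="true" stored="true"/>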

Improvements

  • Managed Fusion now supports Kubernetes 1.30 for GKE. Refer to Kubernetes documentation at Kubernetes v1.30 for more information.
  • Solr has been upgraded to 9.6.1.
  • ZooKeeper has been upgraded to 3.9.1.
  • The default value for kafka.logRetentionBytes is increased to 5 GB. This improvement helps prevent failed datasource jobs due to full disk space. Refer to Troubleshoot failed datasource jobs.
When indexing large files, or large quantities of files, you may encounter issues such as datasource jobs failing or documents not making it into Fusion.

Overview

When data flows into Fusion, it passes through a Kafka topic first. When the number of documents being created by a connector is large, or when the connector is pulling data into the Kafka topic faster than it can be indexed, the topic fills up and the datasource job fails. For example, if your connector is ingesting a large CSV file where every row is imported as a separate Solr document, the indexing process can time out before the file is fully ingested.

Identify the cause

If you experience failed datasource jobs or notice your connector isn’t grabbing all the documents it should, check the logs for the Kafka pod. Look for a message containing the phrases resetting offset and is out of range, which indicate data has been dropped.
2024-05-28T11:49:40.812Z - INFO  [pool-140-thread-3:org.apache.kafka.clients.consumer.internals.Fetcher@1413] - [Consumer clientId=example_Products-irdcsn, groupId=index-pipeline--example_Products--fusion.connectors.datasource-products_S3_Load] Fetch position FetchPosition{offset=6963199, offsetEpoch=Optional[0], currentLeader=LeaderAndEpoch{leader=Optional[fusion5-kafka-0.fusion5-kafka-headless.fusion5.svc.cluster.local:9092 (id: 0 rack: null)], epoch=0}} is out of range for partition fusion.connectors.datasource-products_S3_Load-2, resetting offset
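To locate such messages, assuming the pod and namespace names shown in the example log line above (substitute the names from your own deployment), you can scan the Kafka pod logs with kubectl:

kubectl logs fusion5-kafka-0 -n fusion5 | grep -E "resetting offset|is out of range"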

Adjust indexing settings

If you determine that your datasource job is failing due to an issue in Kafka, there are a few options to try.

Adjust retention parameters

One solution is to increase the Kafka data retention parameters to allow for larger documents. You can configure these settings in your values.yaml file in the Helm chart.
  1. The default value for kafka.logRetentionBytes is 1073741824 bytes (1 GB). Try increasing this value to 2147483648 bytes (2 GB) or 3221225472 bytes (3 GB), or larger, depending on the size of your documents, as shown in the sketch after this list.
    In Fusion 5.9.5, the default value is increased to 5 GB.
    You can also set this to -1 to remove the size limit. If you do this, be sure to set an appropriate limit for logRetentionHours instead.
  2. The default value for kafka.logRetentionHours is 168 (7 days). If you increase kafka.logRetentionBytes by a significant amount (for example, 20 GB), you might need to decrease this setting to prevent running out of disk space. However, because older log entries are deleted when either limit is reached, you should set it high enough to ensure the data remains available until it’s no longer needed.
  3. In Fusion, go to Indexing > Datasources and create a new datasource to trigger a new Kafka topic that incorporates these settings.
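For reference, here is a minimal values.yaml sketch with both retention settings, assuming your Helm chart exposes them under the kafka key as the parameter names above suggest:

kafka:
  logRetentionBytes: "3221225472"   # 3 GB; set to -1 to remove the size limit
  logRetentionHours: 168            # 7 days; lower this if the byte limit is very large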

Adjust fetch settings

Another option is to decrease the values for number of fetch threads and request page size in your datasource settings.
  1. In Fusion, go to Indexing > Datasources and click your datasource.
  2. Click the Advanced slider to show more settings.
  3. Reduce the number of Fetch Threads.
  4. Reduce the Request Page Size.
    This setting might not be available in every connector.
  • There is a new AI category in the Add a new pipeline stage dropdown for Query and Index Pipelines. This category contains the new stages for Neural Hybrid Search, as well as existing machine learning and AI stages.
  • The Managed Fusion migration script is updated to align with changes from the Solr upgrade. The migration script:
    • Removes the unused configuration, <circuitBreaker>, from solrconfig.xml. Solr no longer supports this configuration.
    • Removes the query response writer of class solr.XSLTResponseWriter.
    • Comments out processors of type solr.StatelessScriptUpdateProcessorFactory.
    • Removes the <bool name="preferLocalShards"/> element from request handlers.
    • Changes the class attribute of the filterCache, cache, documentCache, and queryResultCache elements to solr.search.CaffeineCache.
    • Removes the keepShortTerm attribute from filters of class solr.NGramFilterFactory.
  • Added the job-expiration-duration-seconds parameter for remote connectors, which lets you configure the timeout value for backend jobs. Refer to Configure Remote V2 Connectors.
If you need to index data from behind a firewall, you can configure a V2 connector to run remotely on-premises using TLS-enabled gRPC.
Remote V2 Connectors are not available by default. Contact your Lucidworks representative for more information about enabling them in your Managed Fusion deployment.

Prerequisites

Before you can set up an on-prem V2 connector, you must configure the egress from your network to allow HTTP/2 communication into the Fusion cloud. You can use a forward proxy server to act as an intermediary between the connector and Fusion. The following is required to run V2 connectors remotely:
  • The plugin zip file and the connector-plugin-standalone JAR.
  • A configured connector backend gRPC endpoint.
  • Username and password of a user with a remote-connectors or admin role. This step is performed by Lucidworks.
  • If the host where the remote connector is running is not configured to trust the server’s TLS certificate, Lucidworks must help configure the file path of the trust certificate collection.
If your version of Fusion doesn’t have the remote-connectors role by default, Lucidworks can create one. No API or UI permissions are required for the role.

Connector compatibility

Only V2 connectors are able to run remotely on-premises. The gRPC connector backend is not supported in Fusion environments deployed on AWS.

System requirements

The following is required for the on-prem host of the remote connector:
  • (Managed Fusion 5.9.0-5.9.10) JVM version 11
  • (Managed Fusion 5.9.11) JVM version 17
  • Minimum of 2 CPUs
  • 4 GB of memory
Note that memory requirements depend on the number and size of ingested documents.

Enable backend ingress

NOTE: Contact Lucidworks support to complete this step. In your rpc-service/values.yaml file, configure this section as needed:
ingress:
  enabled: false
  pathtype: "Prefix"
  path: "/"
  #host: "ingress.example.com"
  ingressClassName: "nginx"   # Fusion 5.9.6 only
  tls:
    enabled: false
    certificateArn: ""
    # Enable the annotations field to override the default annotations
    #annotations: ""
  • Set enabled to true to enable the backend ingress.
  • Set pathtype to Prefix or Exact.
  • Set path to the path where the backend will be available.
  • Set host to the host where the backend will be available.
  • In Fusion 5.9.6 only, you can set ingressClassName to one of the following:
    • nginx for Nginx Ingress Controller
    • alb for AWS Application Load Balancer (ALB)
  • Configure TLS and certificates according to your CA’s procedures and policies.
    TLS must be enabled in order to use AWS ALB for ingress.
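For example, here is a filled-in sketch that enables the backend ingress behind Nginx with TLS; the host name is a placeholder:

ingress:
  enabled: true
  pathtype: "Prefix"
  path: "/"
  host: "connectors.example.com"
  ingressClassName: "nginx"   # Fusion 5.9.6 only
  tls:
    enabled: true
    certificateArn: ""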

Connector configuration example

kafka-bridge:
  target: mynamespace-connectors-backend.lucidworkstest.com:443 # mandatory
  plain-text: false # optional, false by default
  proxy-server: # optional - needed when a forward proxy server is used to provide outbound access to the standalone connector
    host: host
    port: some-port
    user: user # optional
    password: password # optional
  trust: # optional - needed when the client's system doesn't trust the server's certificate
    cert-collection-filepath: path1

proxy: # mandatory fusion-proxy
  user: admin
  password: password123
  url: https://fusiontest.com/ # needed only when the connector plugin requires blob store access

plugin: # mandatory
  path: ./fs.zip
  type: # optional - the suffix is added to the connector id
    suffix: remote

Minimal example

kafka-bridge:
  target: mynamespace-connectors-backend.lucidworkstest.com:443

proxy:
  user: admin
  password: "password123"

plugin:
  path: ./testplugin.zip

Logback XML configuration file example

<configuration>
    <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{50} [%file:%line] {%mdc} %msg%n</pattern>
        </encoder>
    </appender>
    <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>./connector.log</file>
        <append>true</append>
        <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
            <!-- rollover daily -->
            <fileNamePattern>./connector-%d{yyyy-MM-dd}.%i.log.gz</fileNamePattern>
            <maxFileSize>50MB</maxFileSize>
            <totalSizeCap>10GB</totalSizeCap>
        </rollingPolicy>
        <encoder>
            <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{50} [%file:%line] {%mdc} %msg%n</pattern>
        </encoder>
    </appender>
    <root level="INFO">
        <appender-ref ref="CONSOLE"/>
        <appender-ref ref="FILE"/>
    </root>
</configuration>

Run the remote connector

java [-Dlogging.config=[LOGBACK_XML_FILE]] \
  -jar connector-plugin-client-standalone.jar [YAML_CONFIG_FILE]
The logging.config property is optional. If not set, logging messages are sent to the console.
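For example, assuming the logback file and YAML configuration from the sections above are saved in the working directory (both file names are placeholders):

java -Dlogging.config=./logback.xml \
  -jar connector-plugin-client-standalone.jar ./connector-config.yaml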

Test communication

You can run the connector in communication testing mode. This mode tests the communication with the backend without running the plugin, reports the result, and exits.
java -Dstandalone.connector.connectivity.test=true -jar connector-plugin-client-standalone.jar [YAML_CONFIG_FILE]

Encryption

In a deployment, communication to the connector’s backend server is encrypted using TLS. You should only run this configuration without TLS in a testing scenario. To disable TLS, set plain-text to true.
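For example, in the connector configuration YAML shown earlier, disabling TLS for a test run looks like this:

kafka-bridge:
  target: mynamespace-connectors-backend.lucidworkstest.com:443
  plain-text: true # testing only; disables TLS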

Egress and proxy server configuration

One of the methods you can use to allow outbound communication from behind a firewall is a proxy server. You can configure a proxy server to allow certain communication traffic while blocking unauthorized communication. If you use a proxy server at the site where the connector is running, you must configure the following properties:
  • Host. The host where the proxy server is running.
  • Port. The port the proxy server is listening to for communication requests.
  • Credentials. Optional proxy server user and password.
When you configure egress, it is important to disable any connection or activity timeouts, because the connector uses long-running gRPC calls.

Password encryption

If you use a login name and password in your configuration, run the following utility to encrypt the password (a complete command sketch follows these steps):
  1. Enter a user name and password in the connector configuration YAML.
  2. Run the standalone JAR with this property:
    -Dstandalone.connector.encrypt.password=true
    
  3. Retrieve the encrypted passwords from the log that is created.
  4. Replace the clear password in the configuration YAML with the encrypted password.
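Putting the steps together, the encryption run looks like this; the configuration file name is a placeholder:

java -Dstandalone.connector.encrypt.password=true \
  -jar connector-plugin-client-standalone.jar ./connector-config.yaml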

Connector restart (5.7 and earlier)

The connector shuts down automatically whenever its connection to the server is disrupted, to prevent it from entering a bad state. Communication disruption can happen, for example, when the server running in the connectors-backend pod shuts down and is replaced by a new pod. Once the connector shuts down, connector configuration and job execution are disabled, so you should restart the connector as soon as possible. You can use Linux scripts and utilities, such as Monit, to restart the connector automatically.

Recoverable bridge (5.8 and later)

If communication to the remote connector is disrupted, the connector will try to recover communication and gRPC calls. By default, six attempts will be made to recover each gRPC call. The number of attempts can be configured with the max-grpc-retries bridge parameter.

Job expiration duration (5.9.5 only)

The timeout value for unresponsive backend jobs can be configured with the job-expiration-duration-seconds parameter. The default value is 120 seconds.
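As a sketch, these two bridge settings might be placed in the standalone configuration YAML like this; the exact key placement under kafka-bridge is an assumption based on the parameter names, not confirmed syntax:

kafka-bridge:
  target: mynamespace-connectors-backend.lucidworkstest.com:443
  max-grpc-retries: 6 # assumed placement; default number of recovery attempts
  job-expiration-duration-seconds: 120 # assumed placement; default timeout in seconds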

Use the remote connector

Once the connector is running, it is available in the Datasources dropdown. If the standalone connector terminates, it disappears from the list of available connectors. Once it is re-run, it becomes available again, and configured connector instances are not lost.

Enable asynchronous parsing (5.9 and later)

To separate document crawling from document parsing, enable Tika Asynchronous Parsing on remote V2 connectors.
  • Added additional diagnostics between the connectors-backend and fusion-indexing services.
  • Added more detail to the messages that appear in the Managed Fusion UI when a connector job fails.
  • Added the reset action parameter to the subscriptions/{id}/refresh?action=some-action POST API endpoint. Calling reset clears pending documents from the subscription’s indexing topic. See Indexing APIs.
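For illustration, here is a hedged curl sketch of the reset call, assuming the standard Fusion API base path; the host, credentials, and subscription ID are placeholders:

curl -u admin:password123 -X POST \
  "https://FUSION_HOST/api/subscriptions/my-subscription/refresh?action=reset"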

Bug fixes

  • Fixed an issue that prevented successful configuration of new Kerberos security realms for authentication of external applications.

Deprecations

For full details on deprecations, see Deprecations and Removals. With the release of Solr-supported embeddings and Solr Semantic Vector Search, Lucidworks is deprecating Milvus. The following Milvus query stages are deprecated and will be removed in a future release:
  • Milvus Ensemble Query Stage
  • Milvus Query Stage
  • Milvus Response Update Query Stage
Use Seldon or Lucidworks AI vector query stages instead.

Removals

For more information, see Deprecations and Removals.

Bitnami removal

By August 28, 2025, Fusion’s Helm chart will reference internally built open-source images instead of Bitnami images, due to changes in how Bitnami hosts images.