Component | Version |
---|---|
Solr | fusion-solr 5.9.5 (based on Solr 9.6.1) |
ZooKeeper | 3.9.1 |
Spark | 3.2.2 |
Ingress Controllers | Nginx, Ambassador (Envoy), GKE Ingress Controller. Istio is not supported. |
Fusion 5 Upgrade from 5.9.x
Update the `values.yaml` file to avoid a known issue that prevents the `kuberay-operator` pod from launching successfully.

`fm-upgrade-apps-to-solr-9`, also known as the Fusion migration script, is included in the Fusion 5.9.5 release. This utility performs the following tasks:

- Removes the `<circuitBreaker>` element from `solrconfig.xml`. Solr no longer supports this configuration.
- Removes `solr.XSLTResponseWriter`.
- Removes `solr.StatelessScriptUpdateProcessorFactory`.
- Removes the `<bool name="preferLocalShards"/>` element from request handlers.
- Changes the `"filterCache"`, `"cache"`, `"documentCache"`, and `"queryResultCache"` implementations to `solr.search.CaffeineCache`.
- Removes the `keepShortTerm` attribute from filters of class `solr.NGramFilterFactory`.

A change in the Lucene postings format (from `Lucene90PostingsWriterDoc` to `Lucene99PostingsWriterDoc`) is incompatible with Solr in Fusion. As a result, Solr will not open a new searcher for collections using the `FST50` postings format.

To identify collections potentially affected by the Lucene codec change, examine the field definitions within your Solr schema. Look for fields that specify the `postingsFormat` attribute with a value of `FST50`. Collections containing such fields may experience compatibility issues. For example: `postingsFormat="FST50"`
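For illustration only, a schema entry carrying this attribute might look like the following sketch. The type and field names here are hypothetical; only the `postingsFormat="FST50"` attribute is what you are searching for.

```xml
<!-- Hypothetical example: any field whose definition sets postingsFormat="FST50" is affected. -->
<fieldType name="string_fst" class="solr.StrField" postingsFormat="FST50"/>
<field name="suggestion" type="string_fst" indexed="true" stored="true"/>
```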
To address the `FST50` codec issue, a Docker utility called `fm-upgrade-query-rewrite` is provided alongside the Fusion 5.9.5 release. You can pull this image from Docker using `docker pull lucidworks/fm-upgrade-query-rewrite:2.x`.

This utility performs two actions: `prepare` and `restore`.

Use the `prepare` action before the Fusion 5.9.5 upgrade begins. At a high level, the `prepare` action does the following:

- Removes `postingsFormat="FST50"` from all collections in the environment.

Use the `restore` action after the Fusion 5.9.5 upgrade finishes, which must include the Solr 9.6.1 upgrade. The `restore` action does the following:

- Restores `postingsFormat="FST50"` to all collections in the environment that were changed with the `prepare` action.
- Restores data from the temporary collections created by the `prepare` action.

The overall upgrade sequence is:

1. Run the `fm-upgrade-apps-to-solr-9` Docker utility. This updates the Solr configuration and collections for compatibility with Solr 9.6.1.
2. Run the `fm-upgrade-query-rewrite` Docker utility with the `prepare` action to address potential collection compatibility issues with Lucene 9.10.0 codecs.
3. Upgrade Fusion to 5.9.5.
4. Run the `fm-upgrade-query-rewrite` Docker utility with the `restore` action to restore collections to their original state.
fm-upgrade-apps-to-solr-9 Docker utility

Run the `fm-upgrade-apps-to-solr-9` Docker utility to mitigate issues related to the change from Solr 9.1.1 and earlier to Solr 9.6.1. To begin, run the `fm-upgrade-apps-to-solr-9` Docker utility with the `DRY_RUN` environment variable set.

The `DRY_RUN` variable prints the changes that would occur to the console without performing those actions. Review the changes thoroughly.

If the changes look correct, run the `fm-upgrade-apps-to-solr-9` Docker utility again without the `DRY_RUN` environment variable. The updated config files are saved to the `/upgrade-work/updated-configs` directory. The utility also creates backups of all configs in the `/upgrade-work/backup-configs` directory.
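A dry-run invocation might look like the following sketch. The image tag and any environment variables beyond `DRY_RUN` are assumptions; check the utility's documentation for the authoritative command.

```shell
# Hypothetical sketch: preview changes without applying them.
docker run --rm \
  -e DRY_RUN=true \
  lucidworks/fm-upgrade-apps-to-solr-9:latest

# When the output looks correct, run the same command again without DRY_RUN
# to apply the changes.
```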
The `fm-upgrade-apps-to-solr-9` Docker utility has another environment variable, `REVERT`, that allows you to revert any changes you made. To revert your changes, run the utility with `REVERT` set.

fm-upgrade-query-rewrite Docker utility
Run the `fm-upgrade-query-rewrite` Docker utility `prepare` action.

Set `RELEASE_NAME` to the release name of your Fusion installation, not the version. You can find this value by running `helm list` against your Fusion namespace. Locate the release that uses the `fusion` chart, and find the value in the `name` column. Typically, the release name is the same as your namespace name.

The `prepare` action removes `postingsFormat="FST50"` from all collections in the environment before re-indexing data to temporary collections. When the `prepare-595-upgrade` pod shows the status `Completed`, the process is finished.

For a complete summary of what this action does, refer to Upgrade Utility.
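The `prepare` invocation might be sketched as follows. The image tag and `RELEASE_NAME` come from this section; the `prepare` argument form and everything else are assumptions.

```shell
# Hypothetical sketch: run the prepare action against your Fusion release.
docker run --rm \
  -e RELEASE_NAME=my-fusion-namespace \
  lucidworks/fm-upgrade-query-rewrite:2.x \
  prepare

# Watch for the prepare-595-upgrade pod to reach status Completed.
kubectl get pods -n my-fusion-namespace --watch
```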
Do not modify the `signals` collection with the Rules Editor during the upgrade process. For production clusters, upgrade during a maintenance window.

fm-upgrade-query-rewrite Docker utility
Use the `fm-upgrade-query-rewrite` utility's `restore` action to restore the data from the temporary collections created by the `prepare` action. Before you begin, verify that all collections appended with `_temp_fix` are online and healthy.

Then, run the `fm-upgrade-query-rewrite` Docker utility `restore` action.

Set the `RELEASE_NAME` value to the release name of your Fusion installation.

When the `restore-595-upgrade` pod shows the status `Completed`, the process is finished.

For a complete summary of what this action does, refer to Upgrade Utility.

To verify that Solr is healthy, open the Solr Admin UI at `https://FUSION_HOST:FUSION_PORT/api/solrAdmin/default/#/`. To check the state of the cluster, view the cloud graph at `https://FUSION_HOST:FUSION_PORT/api/solrAdmin/default/#/~cloud?view=graph`.
Finally, remove the `prepare` and `restore` pods that were created by the upgrade utility.

Deployment type | Platform |
---|---|
Azure Kubernetes Service (AKS) | aks |
Amazon Elastic Kubernetes Service (EKS) | eks |
Google Kubernetes Engine (GKE) | gke |
Open the `<platform>_<cluster>_<release>_upgrade_fusion.sh` upgrade script file for editing. Set `CHART_VERSION` to your target Fusion version, and save your changes.

Run the `<platform>_<cluster>_<release>_upgrade_fusion.sh` script. The `<release>` value is the same as your namespace, unless you overrode the default value using the `-r` option.

Use `kubectl get pods` to see the changes applied to your cluster. It may take several minutes to perform the upgrade, as new Docker images are pulled from DockerHub. To see the versions of running pods, do the following.
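One way to list pod image versions is a standard `kubectl` jsonpath query; this is a generic Kubernetes command, not something specific to Fusion, and the namespace is a placeholder.

```shell
# List each pod together with the image (and therefore version tag) of its first container.
kubectl get pods -n my-fusion-namespace \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'
```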
A new service, `lwai-gateway`, provides a secure, authenticated connection between Fusion and your Lucidworks AI-hosted models.

This release also includes the `vectorSimilarity` QParser, which will not be available in Apache Solr until 9.7.

For example, consider a query for `night` against a movie dataset. A higher threshold prioritizes high-scoring results and, in this case, only returns movie titles containing `night`.
To learn how to configure the Hybrid Query stage, see the following demonstration:
Configure Neural Hybrid Search
PUT,POST,GET:/LWAI-ACCOUNT-NAME/**
Your Fusion account name must match the name of the account that you selected in the Account Name dropdown.
For more information about models, see:
`{Destination Field}` is the vector field. `{Destination Field}_b` is the boolean value that indicates whether the vector has been indexed.

The `useCaseConfig` parameter that is common to embedding use cases is `dataType`, but each use case may have other parameters. The value for the query stage is `query`.

The `modelConfig` parameters are common to generative AI use cases.

For more information, see Prediction API.

This does not apply to collections such as `signals` or `access_control`.
Locate `<copyField dest="_text_" source="*"/>` and add `<copyField dest="text" source="*_t"/>` below it. This will concatenate and index all `*_t` fields.
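After the edit, the relevant section of the schema contains both copy rules:

```xml
<copyField dest="_text_" source="*"/>
<copyField dest="text" source="*_t"/>
```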
For example, a vector field may use the suffix `_1024v`. There is no limitation on supported vector dimensions.

`<ctx.vector>` evaluates the context variable resulting from a previous stage, such as the LWAI Vectorize Query stage.

When it is set in the context object (`ctx`), the `preFilterKey` object becomes available. The `preFilter` object adds both the top-level `fq` and `preFilter` to the parameters for the vector query. You do not need to manually add the top-level `fq` in the JavaScript stage. See the example below:
Add the following to `solrconfig.xml` within the `<config>` tag:

`<ctx.vector>` evaluates the context variable resulting from a previous stage, such as the LWAI Vectorize Query stage.

Use the `knn` query parser as you would with Solr. Specify the search vector and include it in the query. For example, change the `q` parameter to a `knn` query parser string.

You can also preview the results in the Query Workbench.
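A minimal `knn` query string might look like this; the field name, vector values, and `topK` are placeholders, and a real query vector would have as many dimensions as the field:

```
q={!knn f=vector_field topK=10}[0.12, -0.31, 0.44, 0.08]
```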
Try a few different queries, and adjust the weights and parameters in the Hybrid Query stage to find the best balance between lexical and semantic vector search for your use case.
You can also disable and re-enable the Neural Hybrid Query stage to compare results with and without it.

`XDenseVectorField` is not supported in Fusion 5.9.5 and above. Instead, use `DenseVectorField`.

To reduce score variation between replicas, increase the `topK` parameter. Variation will still occur, but it should be lower in the documents. Another way to mitigate shifts is to use Neural Hybrid Search with a vector similarity cutoff. For more information, refer to Solr Types of Replicas.

In the case of Neural Hybrid Search, lexical BM25 and TF-IDF score differences, which can occur with NRT replicas because of index differences for deleted documents, can also affect the combined hybrid score. If you choose to use NRT replicas, any lexical and/or semantic vector variations may be exacerbated.

To find orphaned documents, run a query that includes `--form-string 'fq=VECTOR_FIELD_b:true'`.
The `ids` you see are the orphans. Proceed to Resolving orphans. If no documents are returned, there are likely no orphans. You can try a few varying vectors to be certain.
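The orphan check described above might be sketched as the following query. The host, credentials, collection, field names, and vector values are placeholders; only the `fq` flag comes from this section.

```shell
# Hypothetical sketch: query with a test vector while filtering on the
# "vector indexed" flag; any ids returned are orphan candidates.
curl -u USERNAME:PASSWORD \
  "https://FUSION_HOST:FUSION_PORT/api/solr/COLLECTION_NAME/select" \
  --form-string 'q={!knn f=VECTOR_FIELD topK=10}[0.01, 0.02, 0.03, 0.04]' \
  --form-string 'fq=VECTOR_FIELD_b:true' \
  --form-string 'fl=id'
```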
Increase `hnswBeamWidth` and `hnswMaxConnections` per the suggested values below.

Orphaning rate | hnswBeamWidth | hnswMaxConnections |
---|---|---|
5% or less | 300 | 64 |
5% - 25% | 500 | 100 |
25% or more | 3200 | 512 |
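These values are set on the vector field type in the Solr schema. A sketch, assuming a 512-dimension field and the 5%-25% orphaning row above (the type name and dimension are placeholders; `hnswBeamWidth` and `hnswMaxConnections` are standard `DenseVectorField` attributes):

```xml
<fieldType name="knn_vector" class="solr.DenseVectorField"
           vectorDimension="512" similarityFunction="cosine"
           hnswBeamWidth="500" hnswMaxConnections="100"/>
```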
In this release, `kafka.logRetentionBytes` is increased to 5 GB. This improvement helps prevent failed datasource jobs due to full disk space. Refer to Troubleshoot failed datasource jobs.

Troubleshoot failed datasource jobs

Look for log messages that contain `resetting offset` and `is out of range`, which indicate data has been dropped. To resolve this, adjust the `values.yaml` file in the Helm chart.

The default value of `kafka.logRetentionBytes` is `1073741824` bytes (1 GB).
Try increasing this value to `2147483648` bytes (2 GB) or `3221225472` bytes (3 GB), or larger depending on the size of your documents. You can also set it to `-1` to remove the size limit. If you do this, be sure to set an appropriate limit for `logRetentionHours` instead.

The default value of `kafka.logRetentionHours` is `168` (7 days). If you increase `kafka.logRetentionBytes` by a significant amount (for example, 20 GB), you might need to decrease this setting to prevent running out of disk space. However, because older log entries are deleted when either limit is reached, you should set it high enough to ensure the data remains available until it is no longer needed.
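Applied to the Helm chart's `values.yaml`, the settings above might look like this; the nesting of the `kafka` block is assumed to match your chart:

```yaml
kafka:
  logRetentionBytes: 3221225472   # 3 GB; use -1 to remove the size limit
  logRetentionHours: 168          # 7 days (the default)
```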
A new parameter for remote connectors, `job-expiration-duration-seconds`, lets you configure the timeout value. Refer to Configure Remote V2 Connectors.
Configure Remote V2 Connectors
You must have the `remote-connectors` or `admin` role. If the `remote-connectors` role does not exist by default, you can create one. No API or UI permissions are required for the role.

In the `values.yaml` file, configure this section as needed:

- Set `enabled` to `true` to enable the backend ingress.
- Set `pathtype` to `Prefix` or `Exact`.
- Set `path` to the path where the backend will be available.
- Set `host` to the host where the backend will be available.
- Set `ingressClassName` to one of the following:
  - `nginx` for Nginx Ingress Controller
  - `alb` for AWS Application Load Balancer (ALB)

The `logging.config` property is optional. If it is not set, logging messages are sent to the console. To log in plain text, set `plain-text` to `true`.

In some cases, the `connectors-backend` pod shuts down and is replaced by a new pod. Once the connector shuts down, connector configuration and job execution are disabled. To prevent that from happening, restart the connector as soon as possible. You can use Linux scripts and utilities, such as Monit, to restart the connector automatically.
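The backend ingress settings listed above might be sketched in `values.yaml` like this; the key names come from the list, but the surrounding nesting is an assumption:

```yaml
# Hypothetical sketch of the backend ingress section.
ingress:
  enabled: true
  pathtype: Prefix
  path: /connectors-backend
  host: fusion.example.com
  ingressClassName: nginx   # or "alb" for AWS Application Load Balancer
```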
This release also adds the `max-grpc-retries` bridge parameter and the `job-expiration-duration-seconds` parameter, whose default value is `120` seconds. These settings apply to the `connectors-backend` and `fusion-indexing` services.
Adds the `reset` action parameter to the `subscriptions/{id}/refresh?action=some-action` POST API endpoint. Calling `reset` clears pending documents from the subscription indexing topic. See Indexing APIs.
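A sketch of the call, assuming the endpoint sits directly under the Fusion API base path; the host, credentials, and subscription id are placeholders, and the full route is documented in Indexing APIs.

```shell
# Hypothetical sketch: clear pending documents for one subscription.
curl -u USERNAME:PASSWORD -X POST \
  "https://FUSION_HOST:FUSION_PORT/api/subscriptions/SUBSCRIPTION_ID/refresh?action=reset"
```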