Component | Version |
---|---|
Solr | fusion-solr 5.9.5 (based on Solr 9.6.1) |
ZooKeeper | 3.9.1 |
Spark | 3.2.2 |
Ingress Controllers | Nginx, Ambassador (Envoy), GKE Ingress Controller. Istio is not supported. |
`lwai-gateway`, provides a secure, authenticated connection between Managed Fusion and your Lucidworks AI-hosted models.
`vectorSimilarity` QParser that will not be available in Apache Solr until 9.7.
`night` against a movie dataset. A higher threshold prioritizes high scoring results and in this case only returns movie names with `night` in the title.
To learn how to configure the Hybrid Query stage, see the following demonstration:
Configure Neural Hybrid Search
`{Destination Field}` is the vector field.
`{Destination Field}_b` is a boolean value that indicates whether the vector has been indexed.
`useCaseConfig` parameter that is common to embedding use cases is `dataType`, but each use case may have other parameters. The value for the query stage is `query`.
`modelConfig` parameters are common to generative AI use cases.
For more information, see Prediction API.
`signals` or `access_control`.
PUT,POST,GET:/LWAI-ACCOUNT-NAME/**
For more information about models, see:
`<copyField dest="_text_" source="*"/>` and add `<copyField dest="text" source="*_t"/>` below it. This will concatenate and index all `*_t` fields.
`_1024v`. There is no limitation on supported vector dimensions.
`<ctx.vector>` evaluates the context variable resulting from a previous stage, such as the LWAI Vectorize Query stage.
`ctx`), the `preFilterKey` object becomes available.
`preFilter` object adds both the top-level `fq` and `preFilter` to the parameters for the vector query.
You do not need to manually add the top-level `fq` in the JavaScript stage.
See the example below:
`solrconfig.xml` within the `<config>` tag:
`<ctx.vector>` evaluates the context variable resulting from a previous stage, such as the LWAI Vectorize Query stage.
`knn` query parser as you would with Solr. Specify the search vector and include it in the query. For example, change the `q` parameter to a `knn` query parser string.
You can also preview the results in the Query Workbench.
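As a rough illustration of the `knn` query parser string mentioned above, the following sketch builds a value for the `q` parameter from a list of floats. The helper name `build_knn_query`, the field name `vector_field`, and the sample vector are hypothetical. The `{!knn f=... topK=...}` local-parameter form follows the Apache Solr dense vector search syntax.

```python
from typing import Sequence


def build_knn_query(field: str, vector: Sequence[float], top_k: int = 10) -> str:
    """Return a Solr knn query parser string suitable for the q parameter."""
    if not vector:
        raise ValueError("vector must contain at least one value")
    if top_k < 1:
        raise ValueError("top_k must be a positive integer")
    # Solr expects the vector as a bracketed, comma-separated list of floats.
    values = ", ".join(repr(float(v)) for v in vector)
    return f"{{!knn f={field} topK={top_k}}}[{values}]"


# Hypothetical field name and a toy 4-dimensional vector, for illustration only.
q = build_knn_query("vector_field", [0.1, 0.25, -0.3, 0.9], top_k=5)
print(q)
```

In practice the vector must have the same number of dimensions as the target vector field.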
Try a few different queries, and adjust the weights and parameters in the Hybrid Query stage to find the best balance between lexical and semantic vector search for your use case.
You can also disable and re-enable the Neural Hybrid Query stage to compare results with and without it.
`XDenseVectorField` is not supported in Managed Fusion 5.9.5. Instead, use `DenseVectorField`.
`topK` parameter. Variation will still occur, but should be lower in the documents. Another way to mitigate shifts is to use Neural Hybrid Search with a vector similarity cutoff.
For more information, refer to Solr Types of Replicas.
In the case of Neural Hybrid Search, lexical BM25 and TF-IDF score differences that can occur with NRT replicas because of index differences for deleted documents can also affect the combined Hybrid score.
If you choose to use NRT replicas, any lexical and semantic vector variations can be made worse.
--form-string 'fq=VECTOR_FIELD_b:true' \
`ids` you see are the orphans.
Proceed to Resolving orphans.
If no documents are returned, there are likely no orphans.
You can try a few varying vectors to be certain.
`hnswBeamWidth` and `hnswMaxConnections` per the suggested values below.

Orphaning rate | hnswBeamWidth | hnswMaxConnections |
---|---|---|
5% or less | 300 | 64 |
5% - 25% | 500 | 100 |
25% or more | 3200 | 512 |
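
The suggested values in the table above can be expressed as a small lookup. This is a minimal sketch, not part of the product: the function name `suggested_hnsw_settings` is hypothetical, and because the table rows share the boundary values 5% and 25%, the sketch assumes that exactly 5% falls in the first row and exactly 25% falls in the last row.

```python
def suggested_hnsw_settings(orphaning_rate: float) -> dict:
    """Map an orphaning rate (a fraction between 0 and 1) to the suggested values."""
    if not 0.0 <= orphaning_rate <= 1.0:
        raise ValueError("orphaning_rate must be a fraction between 0 and 1")
    # Assumption: exactly 5% uses the first row, exactly 25% uses the last row.
    if orphaning_rate <= 0.05:
        return {"hnswBeamWidth": 300, "hnswMaxConnections": 64}
    if orphaning_rate < 0.25:
        return {"hnswBeamWidth": 500, "hnswMaxConnections": 100}
    return {"hnswBeamWidth": 3200, "hnswMaxConnections": 512}


# Example: an orphaning rate of 12% falls in the middle row of the table.
print(suggested_hnsw_settings(0.12))
```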
`kafka.logRetentionBytes` is increased to 5 GB. This improvement helps prevent failed datasource jobs due to full disk space. Refer to Troubleshoot failed datasource jobs.
Troubleshoot failed datasource jobs
`resetting offset` and `is out of range`, which indicate data has been dropped.
`values.yaml` file in the Helm chart.
`kafka.logRetentionBytes` is `1073741824` bytes (1 GB).
Try increasing this value to `2147483648` bytes (2 GB) or `3221225472` bytes (3 GB), or larger depending on the size of your documents.
`-1` to remove the size limit.
If you do this, be sure to set an appropriate limit for `logRetentionHours` instead.
`kafka.logRetentionHours` is `168` (7 days).
If you increase `kafka.logRetentionBytes` by a significant amount (for example, 20 GB), you might need to decrease this setting to prevent running out of disk space.
However, because older log entries are deleted when either limit is reached, you should set it high enough to ensure the data remains available until it’s no longer needed.
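
The byte values quoted above are multiples of 1024³, so a value for `kafka.logRetentionBytes` can be derived from a size in GB with simple arithmetic. The following sketch shows that conversion. The helper name `gb_to_retention_bytes` is hypothetical, and the 5 GB and 20 GB results assume the same 1024-based convention as the 1 GB, 2 GB, and 3 GB values in the text.

```python
GIB = 1024 ** 3  # the byte values quoted in the text are multiples of 1024**3


def gb_to_retention_bytes(gigabytes: int) -> int:
    """Convert a whole number of GB (1024**3 bytes each) to a byte count."""
    if gigabytes < 1:
        raise ValueError("gigabytes must be at least 1")
    return gigabytes * GIB


# Print candidate values for kafka.logRetentionBytes.
for size in (1, 2, 3, 5, 20):
    print(f"{size} GB -> {gb_to_retention_bytes(size)}")
```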
`<circuitBreaker>`, from `solrconfig.xml`. Solr no longer supports this configuration.
`solr.XSLTResponseWriter`.
`solr.StatelessScriptUpdateProcessorFactory`.
`<bool name="preferLocalShards"/>` element from request handler.
`"filterCache"`, `"cache"`, `"documentCache"`, `"queryResultCache"` to `solr.search.CaffeineCache`.
`keepShortTerm` attribute from filter of class `solr.NGramFilterFactory`.
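
To locate the Solr configuration items listed above in an existing configuration, a small scan along the following lines may help. This is a sketch under stated assumptions: it only reports where each listed element, class, or attribute appears and does not change anything. The function name `find_legacy_settings` and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

CACHE_TAGS = {"filterCache", "cache", "documentCache", "queryResultCache"}
LEGACY_CLASSES = {
    "solr.XSLTResponseWriter",
    "solr.StatelessScriptUpdateProcessorFactory",
}


def find_legacy_settings(xml_text: str) -> List[str]:
    """Return a description of each listed item found in the given XML text."""
    findings = []
    for elem in ET.fromstring(xml_text).iter():
        cls = elem.get("class")
        if elem.tag == "circuitBreaker":
            findings.append("<circuitBreaker> element")
        if cls in LEGACY_CLASSES:
            findings.append(f"<{elem.tag}> uses class {cls}")
        if elem.tag == "bool" and elem.get("name") == "preferLocalShards":
            findings.append('<bool name="preferLocalShards"/> element')
        # Report caches whose class is anything other than CaffeineCache.
        if elem.tag in CACHE_TAGS and cls != "solr.search.CaffeineCache":
            findings.append(f"<{elem.tag}> uses class {cls}")
        if cls == "solr.NGramFilterFactory" and elem.get("keepShortTerm") is not None:
            findings.append("keepShortTerm attribute on solr.NGramFilterFactory filter")
    return findings


# Made-up sample configuration, for illustration only.
sample = """
<config>
  <circuitBreaker class="solr.CircuitBreakerManager" enabled="true"/>
  <query>
    <filterCache class="solr.FastLRUCache" size="512"/>
    <documentCache class="solr.search.CaffeineCache" size="512"/>
  </query>
  <queryResponseWriter name="xslt" class="solr.XSLTResponseWriter"/>
</config>
"""

for finding in find_legacy_settings(sample):
    print(finding)
```

The same function can be run against a schema file to look for the `keepShortTerm` attribute.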
`job-expiration-duration-seconds` for remote connectors that lets you configure the timeout value. Refer to Configure Remote V2 Connectors.
Configure Remote V2 Connectors
`remote-connectors` or `admin` role. This step is performed by Lucidworks.
`remote-connectors` role by default, Lucidworks can create one. No API or UI permissions are required for the role.
`rpc-service/values.yaml` file, configure this section as needed:
`enabled` to `true` to enable the backend ingress.
`pathtype` to `Prefix` or `Exact`.
`path` to the path where the backend will be available.
`host` to the host where the backend will be available.
`ingressClassName` to one of the following:
`nginx` for Nginx Ingress Controller
`alb` for AWS Application Load Balancer (ALB)
`logging.config` property is optional. If not set, logging messages are sent to the console.
`plain-text` to `true`.
`connectors-backend` pod shuts down and is replaced by a new pod. Once the connector shuts down, connector configuration and job execution are disabled. To prevent that from happening, you should restart the connector as soon as possible.
You can use Linux scripts and utilities to restart the connector automatically, such as Monit.
`max-grpc-retries` bridge parameters.
`job-expiration-duration-seconds` parameter. The default value is `120` seconds.
`connectors-backend` and `fusion-indexing` services.
`reset` action parameter to the `subscriptions/{id}/refresh?action=some-action` POST API endpoint. Calling `reset` clears pending documents from the subscription indexing topic. See Indexing APIs.
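
As an illustration of the `reset` action, the sketch below builds, but does not send, a POST request for the `subscriptions/{id}/refresh` endpoint with `action=reset`. The base URL `https://fusion.example.com/INDEXING-API-BASE`, the subscription ID, and the helper name `build_refresh_request` are placeholders, not values from the product. Authentication is omitted. Consult the Indexing APIs reference for the actual base path.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_refresh_request(base_url: str, subscription_id: str, action: str = "reset") -> Request:
    """Build (but do not send) a POST request for subscriptions/{id}/refresh."""
    path = f"subscriptions/{quote(subscription_id, safe='')}/refresh"
    url = f"{base_url.rstrip('/')}/{path}?{urlencode({'action': action})}"
    return Request(url, method="POST")


# Placeholder base URL and subscription ID, for illustration only.
req = build_refresh_request("https://fusion.example.com/INDEXING-API-BASE", "my-subscription")
print(req.get_method(), req.full_url)
```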