To verify that a pod is running the re-released image, compare its image digest against fusion/lucidworks/job-launcher@sha256:02624cfbc3eb536e4a29353b7bfe0846380a09f98ae3c78da859793686782092. Check the imagePullPolicy of your Deployments: if it is IfNotPresent, patch it to Always.
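The original commands are not reproduced in this excerpt. As a minimal sketch, assuming the Deployment is named job-launcher and its first container is the one to patch (both assumptions), the check and patch could look like this:

```bash
# Inspect the current imagePullPolicy of every Deployment in the namespace
kubectl get deployments -n <namespace> \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.template.spec.containers[0].imagePullPolicy}{"\n"}{end}'

# Patch a Deployment (job-launcher shown as an example) to always pull the image
kubectl patch deployment job-launcher -n <namespace> --type=json \
  -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/imagePullPolicy", "value": "Always"}]'
```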
Check the imagePullPolicy for StatefulSets specifically to ensure they are set correctly; the same patch applies with statefulset in place of deployment. Once the pods restart, if the reported digest matches sha256:02624cfbc3eb536e4a29353b7bfe0846380a09f98ae3c78da859793686782092, then the pod is running the latest re-release version.

Develop and deploy a machine learning model with Ray
Before you begin, install Ray Serve locally with pip install ray[serve]. To test the model before deploying it, start the container with docker run and a specified port, like 9000, which you can then curl to confirm functionality in Fusion. See the testing example below.
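A sketch of that local test, assuming the image tag e5-small-v2-ray:0.1 built later in this tutorial, that Ray Serve listens on its default HTTP port 8000 inside the container, and a JSON payload with a text field matching the deployment sketch below (all assumptions):

```bash
# Map local port 9000 to Ray Serve's default HTTP port inside the container
docker run -p 9000:8000 e5-small-v2-ray:0.1

# In a second terminal, send a test query to the model
curl -X POST http://localhost:9000/ \
  -H 'Content-Type: application/json' \
  -d '{"text": "example query"}'
```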
To adjust the memory available to the Ray head node, run kubectl edit configmap argo-deploy-ray-model-workflow -n <namespace>, then find the ray-head container in the escaped YAML and change the memory limit. Exercise caution when editing, because a stray character can break the escaped YAML; delete and replace a single character at a time without changing any formatting.
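For orientation, the escaped workflow YAML embeds a standard Kubernetes resources block for the ray-head container, roughly like this (the surrounding structure and the 4Gi value are illustrative only):

```yaml
# Inside the workflow template embedded in the configmap
- name: ray-head
  resources:
    limits:
      memory: 4Gi  # edit this value in place without disturbing the escaping
```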
The MODEL_DEPLOYMENT value in the command below can be found with kubectl get svc -n NAMESPACE. It has the same name as the model name set in the Create Ray Model Deployment job.
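The command itself is not reproduced in this excerpt; a sketch of the kind of request it resembles, assuming port-forwarding to the service and the same JSON payload shape as the local test (both assumptions):

```bash
# Forward a local port to the model's service (runs in the foreground)
kubectl port-forward svc/MODEL_DEPLOYMENT -n NAMESPACE 8000:8000

# In a second terminal, send a test query to the deployed model
curl -X POST http://localhost:8000/ \
  -H 'Content-Type: application/json' \
  -d '{"text": "example query"}'
```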
This tutorial uses the e5-small-v2 model from Hugging Face, but any pre-trained model from https://huggingface.co will work with this tutorial. If you want to use your own model instead, you can do so, but your model must have been trained and then saved through a function similar to PyTorch's torch.save(model, PATH) function. See Saving and Loading Models in the PyTorch documentation.

The deployment file for the e5-small-v2 model defines three key functions (a sketch of the full file follows this list):
- __call__: This function is non-negotiable; Ray Serve routes every request through it.
- init: The init function is where models, tokenizers, vectorizers, and the like should be set to self for invoking. It is recommended that you include your model's trained parameters directly in the Docker container rather than reaching out to external storage inside init.
- encode: The encode function is where the field or query that is passed to the model from Fusion is processed. Alternatively, you can process it all in the __call__ function, but it is cleaner not to. The encode function can handle any text processing needed for the model to accept input, invoking its model.predict() or equivalent function to get the expected model result.

In this example, the file name is deployment.py and the class name is Deployment().
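The tutorial's actual file is not reproduced in this excerpt. As a rough sketch of the shape such a file takes, assuming weights saved with torch.save(model, "model.pt") and baked into the image, and a request payload with a text field (all assumptions):

```python
import torch
from ray import serve
from starlette.requests import Request


@serve.deployment
class Deployment:
    def __init__(self):
        # Models, tokenizers, vectorizers, and the like are set on self here.
        # The weights are baked into the Docker image, so startup needs no
        # call to external storage.
        self.model = torch.load("model.pt")
        self.model.eval()

    def encode(self, text: str) -> list:
        # Text processing needed before prediction goes here; a real
        # e5-small-v2 deployment would also tokenize the input first.
        with torch.no_grad():
            embedding = self.model(text)
        return embedding.tolist()

    async def __call__(self, request: Request) -> dict:
        # Ray Serve routes every HTTP request through __call__.
        body = await request.json()
        return {"result": self.encode(body["text"])}


# Per the Ray Deployment Import Path parameter below, this file would be
# referenced as deployment:app.
app = Deployment.bind()
```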
The requirements.txt file is a list of installs for the Dockerfile to run, ensuring the Docker container has the right resources to run the model. For the e5-small-v2 model, as for any model, the rule is: if there is an import statement in your Python file, the corresponding package should be included in the requirements file. To populate the requirements, use the following command in the terminal, inside the directory that contains your code.
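The original command is not reproduced in this excerpt; one common approach, assuming your virtual environment contains only this project's dependencies, is:

```bash
# Write every package installed in the active environment to requirements.txt
pip freeze > requirements.txt
```

Note that pip freeze lists everything installed, so prune any entries your deployment file does not actually import.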
Once you have the MODEL_NAME.py, Dockerfile, and requirements.txt files, you need to run a few Docker commands to build and publish the image. Run them in order; the general build, tag, and push sequence is sketched below.
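A sketch of that sequence, assuming Docker Hub and the image name e5-small-v2-ray:0.1 from the parameter table below (your repository and tag will differ):

```bash
# Build the image from the Dockerfile in the current directory
docker build -t e5-small-v2-ray:0.1 .

# Tag the image for your repository (Docker Hub username is a placeholder)
docker tag e5-small-v2-ray:0.1 <docker-hub-username>/e5-small-v2-ray:0.1

# Push the image so the Fusion cluster can pull it
docker push <docker-hub-username>/e5-small-v2-ray:0.1
```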
With the image pushed, configure the Create Ray Model Deployment job in Fusion using the following parameters:

Parameter | Description |
---|---|
Job ID | A string used by the Fusion API to reference the job after its creation. |
Model name | A name for the deployed model. This is used to generate the deployment name in Ray. It is also the name that you reference as a model-id when making predictions with the ML Service. |
Model min replicas | The minimum number of load-balanced replicas of the model to deploy. |
Model max replicas | The maximum number of load-balanced replicas of the model to deploy. Specify multiple replicas for a higher-volume intake. |
Model CPU limit | The number of CPUs to allocate to a single model replica. |
Model memory limit | The maximum amount of memory to allocate to a single model replica. |
Ray Deployment Import Path | The path to your top-level Ray Serve deployment (or the same path passed to serve run). For example, deployment:app. |
Docker Repository | The public or private repository where the Docker image is located. If you’re using Docker Hub, fill in the Docker Hub username here. |
Image name | The name of the image. For example, e5-small-v2-ray:0.1. |
Kubernetes secret | If you’re using a private repository, supply the name of the Kubernetes secret used for access. |
Parameter | Description |
---|---|
Additional parameters. | This section lets you enter parameter name:parameter value options to be injected into the training JSON map at runtime. The values are inserted as they are entered, so you must surround string values with ". This is the sparkConfig field in the configuration file. |
Write Options. | This section lets you enter parameter name:parameter value options to use when writing output to Solr or other sources. This is the writeOptions field in the configuration file. |
Read Options. | This section lets you enter parameter name:parameter value options to use when reading input from Solr or other sources. This is the readOptions field in the configuration file. |
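As a sketch of where these sections land in the job's JSON configuration, assuming each section is a flat JSON map keyed by parameter name (the field shapes and values here are assumptions; only the field names come from the table above):

```json
{
  "sparkConfig": { "spark.executor.memory": "2g" },
  "readOptions": { "collection": "input_collection" },
  "writeOptions": { "collection": "output_collection" }
}
```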
Previously, TLS-enabled Fusion deployments failed under ArgoCD because the Helm chart templates relied on Helm's lookup function, which ArgoCD does not execute when rendering manifests. Fusion 5.9.12 resolves this issue by modifying the chart templates for compatibility with ArgoCD, restoring support for TLS-enabled deployments in GitOps workflows.
The Fusion Solr image is now published as lucidworks/fusion-solr, aligning with other components and simplifying deployment for external environments.
If you need to run a different Spark version, change the fusion-spark image in the configmap of the job-launcher, like this:
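The original snippet is not reproduced in this excerpt. A sketch of the edit, assuming the configmap is named job-launcher and the image is set through the standard spark.kubernetes.container.image property (both assumptions):

```bash
# Open the job-launcher configmap for editing
kubectl edit configmap job-launcher -n <namespace>

# Inside the embedded Spark configuration, point the image at the desired
# version, for example:
#   spark.kubernetes.container.image: lucidworks/fusion-spark:3.2.2
```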
Spark 3.4.1 does not support the wasbs:// protocol due to a Jetty version conflict; run Spark 3.2.2 instead if your jobs rely on wasbs://. Alternatively, use the abfs:// protocol for Azure Blob Storage access, which is fully supported in Spark 3.4.1.
Fixed incorrect started-by values for datasource jobs in the job history. Previously, the job history showed default-subject instead of the actual user. Fusion now correctly records and displays the initiating user in the job history, restoring accurate audit information for datasource operations.
Fusion now checks for both managed-schema and managed-schema.xml files when reading Solr config sets, ensuring backward compatibility with apps created before the move to template-based config sets. This prevents Schema API failures caused by unhandled exceptions during schema file lookup.
This release also improves the job-config service for better reliability and user experience, including fixes for cases where job-config pods lose connection to ZooKeeper. Rules are now filtered correctly by the lw.rules.target_segment parameter, ensuring only matching rules are triggered and improving rule targeting and safety.
Fusion previously logged warnings about missing primary-port-name labels, even though this did not impact functionality. This fix removes those warnings, reducing unnecessary log noise and improving the clarity of your logs.
In Fusion versions 5.9.12 through 5.9.13, the job-config service may falsely report as “down” in the Fusion UI, particularly during startup or in TLS-enabled deployments.
This issue is fixed in Fusion 5.9.14.
In Fusion versions 5.9.12 through 5.9.13, strict validation in the job-config service causes "Collection not found" errors when jobs or V2 datasources target Fusion collections that point to differently named Solr collections.
This issue is fixed in Fusion 5.9.14.
As a workaround, use V1 datasources or avoid using REST call jobs on remapped collections.
In Fusion versions 5.9.12 through 5.9.13, saving a large query pipeline during high query volume can result in thread lock, excessive memory use, and eventual OOM errors in the Query service.
This issue is fixed in Fusion 5.9.14.
Clicking the Stop button has no effect in some cases where the backend job is no longer being tracked. This causes the job-config service to ignore the job and prevents the system from updating the job status. As a workaround, issue a POST {"action": "start"} to the appropriate job actions endpoint, which aborts the stuck job.
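A sketch of that call, assuming the standard Fusion jobs API path and basic authentication (host, credentials, and job ID are placeholders):

```bash
# Abort the stuck job by issuing a start action against the jobs API
curl -u USERNAME:PASSWORD -X POST \
  -H 'Content-Type: application/json' \
  -d '{"action": "start"}' \
  https://FUSION_HOST/api/jobs/JOB_ID/actions
```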
This issue is fixed in Fusion 5.9.14.
In Fusion 5.9.12 and 5.9.13, Spark jobs may vanish from the job list if a Spark driver pod is deleted. This behavior can cause confusion and require a job-launcher restart to restore job visibility.
This issue is fixed in Fusion 5.9.14.
A bug in the connectors-backend service can prevent jobs from running if a previous crawl attempt was interrupted, for example if the connector pod was scaled down mid-job. A subsequent crawl attempt may fail with the error The state should never be null, even after clearing the datasource configuration.
This issue is fixed in Fusion 5.9.13.
The fusion-spark-3.2.2 image included in Fusion 5.9.12 contains a Fabric8 token refresh bug.
This issue affects Spark jobs running in Kubernetes environments that rely on token-based authentication. Due to an outdated Fabric8 client library, the image fails to refresh Kubernetes tokens correctly, which can cause authentication errors in long-running or distributed Spark jobs.
This issue is fixed in Fusion 5.9.13.
The job-config service may report a DOWN status at /actuator/health even when it is fully operational. This issue can occur after a prolonged ZooKeeper outage when TLS is enabled. Despite the service being healthy and showing UP status on readiness and liveness checks, the /job-config/actuator/health endpoint may still report DOWN, potentially triggering false alarms or unnecessary restarts.
This issue is fixed in Fusion 5.9.13.
Update your values.yaml file to avoid a known issue that prevents the kuberay-operator pod from launching successfully.
Fusion 5.9.12 may fail to index with the Webv2 connector (v2.0.1) due to a corrupted job state in the connectors-backend service. Affected jobs log the error The state should never be null, and common remediation steps, like deleting the datasource or reinstalling the connector plugin, may not resolve the issue.
The issue is fixed in Fusion 5.9.13.
In some Fusion 5.9.12 environments, clicking Save when adding a schedule from the datasource “Run” dialog does not persist the schedule or show an error message, particularly in apps created before the upgrade.
As a workaround, use a new app or manually verify that the job configuration was saved.
This issue is fixed in Fusion 5.9.13.
MLeap has been removed from the ml-model service. MLeap was deprecated in Fusion 5.2.0 and was no longer used by Fusion.
Component | Version |
---|---|
Solr | fusion-solr 5.9.12 (based on Solr 9.6.1) |
ZooKeeper | 3.9.1 |
Spark | 3.4.1 |
Ingress Controllers | Nginx, Ambassador (Envoy), GKE Ingress Controller |
Ray | ray[serve] 2.42.1 |