Fusion CrossDC replication uses a Solr producer module (`fusion-crossdc-producer`) and a dedicated consumer application (`fusion-crossdc-consumer`) for replaying updates on the target cluster, helping to ensure high availability and business continuity in distributed or hybrid cloud environments.
This feature is supported in Fusion 5.9.13 and later.

Be sure to review the Limitations section to understand what to expect before you enable this feature.
CrossDC and ConfigSync
Fusion supports ConfigSync in addition to CrossDC. While CrossDC is designed for data replication, ConfigSync is designed for configuration synchronization. When you use CrossDC, ConfigSync must also be enabled in order to mirror Solr configset data that is modified by Fusion directly in ZooKeeper. CrossDC only mirrors changes applied using Solr APIs.
Feature | CrossDC | ConfigSync |
---|---|---|
Data replication | ✓ | |
Solr collection synchronization | ✓ | |
Rules synchronization | ✓ | |
Configuration synchronization | | ✓ |
Blob synchronization | | ✓ |
Version control (Git) | | ✓ |
ZooKeeper data | | ✓ |
Disaster recovery | For search data | For configuration only |
Latency reduction | Across geo-distributed users | Not applicable |
Typical use case | Global failover, data center redundancy | DevOps config promotion, disaster recovery for Fusion config |
CrossDC mirrors the following:
- Solr collections, which can optionally include creating and deleting collections
- Rules stored in `*_query_rewriter` and `*_query_rewrite_staging` collections
- Any other new Solr collection data that you configure to be synchronized, as explained below
Before you begin
Before you enable Solr CrossDC, your Solr collections must already be in a synchronized state. After you enable CrossDC, synchronization happens automatically. The instructions below explain how to synchronize your collections before enabling this feature.
Prepare for enabling Solr CrossDC
- Schedule a maintenance window. During this window, ensure that Fusion will not perform any operations that could alter the contents of its Solr collections. Schedule sufficient time to perform Solr collection backup and restore operations, followed by Solr CrossDC enablement.
- Back up your Solr collections from the source Fusion cluster, using your cloud storage provider’s repository. The Solr documentation has provider-specific instructions for configuring the backup. An example configuration is shown below:
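The Solr documentation describes several backup repository implementations. As one hedged example, an S3 repository might be defined in `solr.xml` like this; the bucket name and region are placeholders, and the exact repository class and parameters depend on your storage provider:

```xml
<!-- solr.xml: example S3 backup repository.
     Bucket name and region are placeholders; adapt to your provider. -->
<backup>
  <repository name="s3" class="org.apache.solr.s3.S3BackupRepository" default="false">
    <str name="s3.bucket.name">my-solr-backups</str>
    <str name="s3.region">us-east-1</str>
  </repository>
</backup>
```

With a repository defined, the Collections API backup command can reference it by its `repository` name.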
- Restore your Solr collections on the Fusion clusters you want to synchronize, using your newly created backup. The Solr documentation has complete instructions for this as well.
Configure CrossDC
CrossDC configuration is done in two data centers: the source and the target. Follow the detailed steps below.
- In the source data center, configure Solr and Kafka.
Configure the source Solr instance
To enable CrossDC for Solr in the source data center, you need to configure the following components. The steps and examples below show you how to configure your source Solr instance.

Solr class | Where configured | Role in CrossDC |
---|---|---|
`fusion-crossdc-producer` | `solr.xml` | This module contains the necessary classes. It is included in the `fusion-solr-managed` Docker image. |
`MirroringUpdateRequestProcessorFactory` | `solrconfig.xml` | This processor mirrors Solr indexing updates (such as document additions, updates, or deletions) to the source Kafka instance. |
`MirroringConfigSetsHandler` | `solr.xml` | This handler mirrors configset changes to the source Kafka instance. |
`MirroringCollectionsHandler` | `solr.xml` | This handler mirrors Solr collection admin commands (such as collection creation or deletion) to the source Kafka instance. |
`FusionCollectionsHandler` | `solr.xml` | This extended version of `MirroringCollectionsHandler` mirrors Solr collection admin commands to the source Kafka instance and adds the ability to filter (whitelist) the commands you want to mirror. |
- Pull and deploy the `fusion-solr-managed` Docker image.
- In `solrconfig.xml` for the configset used by each collection, configure the `MirroringUpdateRequestProcessorFactory` handler. Only collections with `MirroringUpdateRequestProcessorFactory` configured in the `updateRequestProcessorChain` are mirrored; other collections are ignored. Both `bootstrapServers` and `topicName` are required:
Parameter | Type | Description |
---|---|---|
`bootstrapServers` | string | A comma-separated list of servers used to connect to the source Kafka cluster. |
`topicName` | string | The name of the Kafka topic to which Solr updates will be pushed. This topic must already exist. |

These parameters are optional for the `MirroringUpdateRequestProcessorFactory`:

Parameter | Type | Description |
---|---|---|
`batchSizeBytes` | integer | Maximum batch size in bytes for the Kafka queue. |
`bufferMemoryBytes` | integer | Memory allocated by the Producer in total for buffering. |
`lingerMs` | integer | Amount of time that the Producer will wait to add to a batch. |
`requestTimeout` | integer | Request timeout for the Producer. |
`enableDataCompression` | enum | The compression to use for data sent over the Kafka queue: `none` (default), `gzip`, `snappy`, `lz4`, or `zstd`. |
`numRetries` | integer | Setting a value greater than zero causes the Producer to resend any record whose send fails with a potentially transient error. |
`retryBackoffMs` | integer | The amount of time to wait before attempting to retry a failed request to a given topic partition. |
`deliveryTimeoutMS` | integer | Updates sent to the Kafka queue are failed before the number of retries has been exhausted if the timeout configured by `delivery.timeout.ms` expires first. |
`maxRequestSizeBytes` | integer | The maximum size of a Kafka queue request in bytes; limits the number of requests sent over the queue in a single batch. |
`dlqTopicName` | string | If not empty, requests that fail processing `maxAttempts` times are sent to a “dead letter queue” topic in Kafka (the topic must exist if configured). |
`indexUnmirrorableDocs` | boolean | If set to `true`, updates that are too large for the Kafka queue are still indexed locally into the source collection. |
`mirrorCommits` | boolean | If `true`, standalone commit requests are mirrored as separate requests; otherwise they are processed only locally. |
`expandDbq` | enum | If set to `expand` (default), Delete-By-Query is expanded before mirroring into a series of Delete-By-Id requests, which may help with correct processing of out-of-order requests on the consumer side. If set to `none`, Delete-By-Query requests are mirrored as-is. |
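A minimal sketch of what this chain might look like in `solrconfig.xml`; the fully qualified processor class name is assumed from the open-source Solr CrossDC module, and the host names and topic name are placeholders:

```xml
<!-- solrconfig.xml: mirroring update chain (sketch).
     Class name assumed from the Solr CrossDC module; servers and topic are placeholders. -->
<updateRequestProcessorChain name="mirrorUpdateChain" default="true">
  <processor class="org.apache.solr.update.processor.MirroringUpdateRequestProcessorFactory">
    <str name="bootstrapServers">kafka-source-1:9092,kafka-source-2:9092</str>
    <str name="topicName">solr-crossdc</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```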
- If you are not using ConfigSync: Configure configset mirroring in `solr.xml`:
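A sketch of the relevant `solr.xml` entry; the fully qualified class name is assumed from the open-source Solr CrossDC module, so verify it against your release:

```xml
<!-- solr.xml (inside the <solr> element): mirror configset changes.
     Class name assumed from the Solr CrossDC module. -->
<str name="configSetsHandler">org.apache.solr.crossdc.handler.MirroringConfigSetsHandler</str>
```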
- If you are not using ConfigSync: Configure collection admin request mirroring in `solr.xml`. You can choose one of these handlers:
  - `MirroringCollectionsHandler` is the native Solr handler. It mirrors all admin actions for all collections, or you can select specific collections to mirror.
  - `FusionCollectionsHandler` has all the same capabilities and configuration options, plus action whitelisting so you can mirror only selected actions and ignore others.

  By default, admin commands are mirrored for all collections. To mirror admin commands for specific collections only, you can set a system property containing a comma-separated list of collections for which admin commands will be mirrored. If this list is empty or the property is not set, then admin commands for all collections are mirrored. This property is supported by both `MirroringCollectionsHandler` and `FusionCollectionsHandler`.

  If you are using `FusionCollectionsHandler`, you can also configure action whitelisting by setting the `collectionActionsWhitelist` system property: a comma-separated list of actions to mirror. If it is not empty, then only the listed actions are mirrored; all others are ignored. See the Solr documentation for the list of actions.
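For example, to mirror only collection creation and deletion, the whitelist could be passed as a JVM system property when Solr starts. How you pass system properties depends on your deployment; appending to `SOLR_OPTS` in `solr.in.sh` is one common mechanism:

```shell
# solr.in.sh: mirror only the CREATE and DELETE collection admin actions
SOLR_OPTS="$SOLR_OPTS -DcollectionActionsWhitelist=CREATE,DELETE"
```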
Configure the source Kafka
In the source data center’s Kafka instance, the Kafka topic must already exist and be configured to accept messages from the Solr instance. Make sure that the `bootstrapServers` you configured in Solr are reachable from Solr and that the configured `topicName` exists. If you configured Solr to use a Dead-Letter Queue (DLQ) topic (`dlqTopicName`), you must also create that topic in the source Kafka instance. See the Kafka documentation for configuration details.
- In the target data center, configure Solr, Kafka, and the Consumer.
Configure the target Solr
Because CrossDC only mirrors new commands, the existing collections, documents, and configsets from your source Solr must already exist on the target Solr before mirroring begins. Create them if needed. The target collections must have the same names as the source collections, and they must use the same configsets as the source collections. New collections created on the source Solr are not automatically created on the target Solr unless ConfigSync is enabled or you have enabled either `MirroringCollectionsHandler` or `FusionCollectionsHandler` on the source Solr.
Configure the target Kafka
In the target data center’s Kafka instance, you must create the same topic that you configured in MirrorMaker and the Consumer. If you configured a Dead-Letter Queue (DLQ) topic in the source Solr instance, you must also create that topic in the target Kafka instance. See the Kafka documentation for configuration details.
Configure the Consumer
- Pull and deploy the `fusion-crossdc-consumer` Docker image for your Fusion release, such as 5.9.14.
- Configure the required system properties listed below, and any optional ones that apply to your use case.

Parameter | Required? | Description |
---|---|---|
`bootstrapServers` | required | A list of Kafka bootstrap servers. |
`topicName` | required | The name of the Kafka topic from which the Solr updates will be read. This can be a comma-separated list to consume multiple topics. |
`zkConnectString` | required | The ZooKeeper connection string used for connecting to the target Solr instance. |
`consumerProcessingThreads` | optional | The number of threads used by the Consumer to concurrently process updates from the Kafka queue. |
`port` | optional | The local port for the API endpoints. Default is `8090`. |
`collapseUpdates` | optional | (enum) When set to `all`, all incoming update requests are collapsed into a single `UpdateRequest`, as long as their parameters are identical. When set to `partial` (default), only requests without deletions are collapsed; requests with any delete ops are sent individually in order to preserve the ordering of updates. When set to `none`, the incoming update requests are sent individually without any collapsing. Requests of types other than `UPDATE` are never collapsed. |

These additional optional configuration properties are used when the Consumer must retry by putting updates back in the Kafka queue:

Parameter | Description |
---|---|
`batchSizeBytes` | The maximum batch size in bytes for the Kafka queue. |
`bufferMemoryBytes` | The memory allocated by the Producer in total for buffering. |
`lingerMs` | The amount of time that the Producer will wait to add to a batch. |
`requestTimeout` | The request timeout for the Producer. |
`maxPollIntervalMs` | The maximum delay between invocations of `poll()` when using Consumer group management. |
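As a hypothetical sketch only: the exact image name, registry path, and the mechanism for passing system properties to the Consumer depend on your deployment. Assuming the properties can be supplied through a `JAVA_OPTS` environment variable, a deployment might look like:

```shell
# Sketch: image name, hosts, and the JAVA_OPTS mechanism are assumptions; adapt to your environment.
docker run -d -p 8090:8090 \
  -e JAVA_OPTS="-DbootstrapServers=kafka-target-1:9092 \
                -DtopicName=solr-crossdc \
                -DzkConnectString=zk-target-1:2181,zk-target-2:2181/solr \
                -DcollapseUpdates=partial" \
  fusion-crossdc-consumer:5.9.14
```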
- Configure Kafka MirrorMaker to connect to Kafka in both data centers.
Configure MirrorMaker
You can deploy MirrorMaker in either of your data centers, or somewhere else. Ensure that it can access both the source Kafka and target Kafka instances. Configure the source and target topic names to correspond with the names configured in `MirroringUpdateRequestProcessorFactory` and the Consumer application. See the MirrorMaker documentation for configuration details.
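With MirrorMaker 2, for example, the source-to-target flow might be configured in a properties file like this; the cluster aliases, host names, and topic name are placeholders:

```properties
# connect-mirror-maker.properties (MirrorMaker 2) — placeholders throughout
clusters = source, target
source.bootstrap.servers = kafka-source-1:9092
target.bootstrap.servers = kafka-target-1:9092
source->target.enabled = true
source->target.topics = solr-crossdc
# Disable the reverse flow
target->source.enabled = false
```

Note that MirrorMaker 2’s default replication policy prefixes replicated topic names with the source cluster alias (for example, `source.solr-crossdc`); if the Consumer expects the unprefixed topic name, you may need to set `replication.policy.class` to Kafka’s `IdentityReplicationPolicy` (available in Kafka 3.0 and later).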
Metrics and monitoring
Both `fusion-crossdc-producer` and `fusion-crossdc-consumer` expose metrics that can be monitored.
Producer metrics
The `fusion-crossdc-producer` module exposes the following metrics for each source replica in a collection, under the Solr `/metrics` API endpoint:
Metric name | Description |
---|---|
crossdc.producer.local | Counter representing the number of local documents processed successfully. |
crossdc.producer.submitted | Counter representing the number of documents submitted to the Kafka topic. |
crossdc.producer.documentSize | Histogram of the processed document size. |
crossdc.producer.errors.local | Counter representing the number of local documents processed with errors. |
crossdc.producer.errors.submit | Counter representing the number of documents that were not submitted to the Kafka topic because of an exception during execution. |
crossdc.producer.errors.documentTooLarge | Counter representing the number of documents that were too large to send to the Kafka topic. |
Consumer metrics
The `fusion-crossdc-consumer` application exposes the following metrics under its `/metrics` API endpoint, in JSON format with the following hierarchical keys, where `<TYPE>` can be one of `UPDATE`, `ADMIN`, or `CONFIGSET`:
Counters
Metric name | Description |
---|---|
counters.<TYPE>.input | Number of input messages retrieved from Kafka |
counters.<TYPE>.add | Number of input Add documents (one input message may contain multiple Add documents) |
counters.<TYPE>.dbi | Number of input Delete-By-Id commands (one input message may contain multiple DBI commands) |
counters.<TYPE>.dbq | Number of input Delete-By-Query commands (one input message may contain multiple DBQ commands) |
counters.<TYPE>.collapsed | Number of input requests that were added to other requests to minimize the number of requests sent to Solr |
counters.<TYPE>.handled | Total number of successfully processed output requests sent to Solr |
counters.<TYPE>.failed-resubmit | Number of requests resubmitted to the input queue for re-trying (on intermittent failures) |
counters.<TYPE>.failed-dlq | Number of requests submitted to the Dead-Letter queue due to failures on multiple re-tries |
counters.<TYPE>.failed-no-retry | Number of requests dropped due to persistent failures (including inability to send to DLQ) |
counters.<TYPE>.output-errors | Number of errors when sending requests to target Solr |
counters.<TYPE>.backoff | Number of times when the consumer had to back off from processing due to errors |
counters.<TYPE>.invalid-collection | Number of requests sent to an invalid (e.g. non-existent) collection |
Timers
Metric name | Description |
---|---|
timers.<TYPE>.outputLatency | Dropwizard Timer (meter + histogram) for the latency between the request creation timestamp and the output timestamp. This assumes that the clocks are synchronized between the Producer and Consumer. |
timers.<TYPE>.outputTime | Dropwizard Timer for the time to send the processed request to the target Solr. |

The Consumer application also exposes a `/threads` API endpoint that returns a plain-text thread dump of the JVM running the Consumer application.
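For example, assuming the Consumer is reachable on its default port (the host name is a placeholder):

```shell
# Fetch Consumer metrics (JSON) and a JVM thread dump
curl http://crossdc-consumer.example.com:8090/metrics
curl http://crossdc-consumer.example.com:8090/threads
```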
Limitations
The CrossDC feature has some known limitations:
- Only updates are mirrored, not existing indexes. You should create a copy of each existing collection, with its documents and configset, on the target Solr before you turn on CrossDC.
- Data loss can lead to divergence. If any of the components in your CrossDC configuration experience an event that causes data loss, the source collections and target collections can potentially diverge. Diverged indexes are not automatically detected or re-synchronized.
- Document size is limited. The CrossDC Producer module tries to estimate the size of each message and avoid sending messages that are too large to Kafka. It does not split messages that are too large; instead, it rejects them. If this happens after the update has already been processed locally, then the contents of the mirrored collections can diverge. Kafka’s maximum message size is 1 MB by default, configured with `message.max.bytes`.
- Retries can lead to divergence. The CrossDC Producer module first applies updates locally, and attempts mirroring only if they succeed. If sending a mirrored request fails, the request is retried; if it still fails, it is logged and the message is discarded (or sent to a dead-letter queue). Since the update was already applied locally, this can cause divergence of the local and mirrored collections.
- Commands can be re-ordered when collapsed. The CrossDC Consumer application can optionally preserve the exact ordering of updates and deletes sent in a single request (`collapseUpdates=none`), but this negatively affects performance. If you do not need strict ordering of multiple commands in a single request, use `collapseUpdates=partial` or `collapseUpdates=all`.
- Delete-By-Query expansion can lead to divergence. The CrossDC Producer can either mirror Delete-By-Query requests as-is or expand them into individual Delete-By-Id requests (except for `*:*`, which is always sent as-is). In extreme cases this expansion may produce a request that is too large to be mirrored. Delete-By-Query expansion helps to ensure the strict ordering of deletes and updates in the target Solr collection, but it may also lead to divergence of the local and mirrored collections if the expansion fails or the resulting request is too large to mirror.
- Collection creation and deletion require an existing configset. The CrossDC Producer module can optionally mirror collection creation and deletion requests. However, the target Solr instance must already have the corresponding configset available in ZooKeeper. If it does not, this causes an error when the target collection is created or deleted. The Consumer application may also experience significant slow-downs when it receives update requests for non-existent target collections. These slow-downs affect the processing of requests for other collections, too.
- ConfigSet creation and deletion behavior depends on the handler. If you are using `MirroringConfigSetsHandler`, then new configsets created on the source Solr are mirrored automatically to the target Solr. If you are not using `MirroringConfigSetsHandler`, then new configsets are not mirrored; you must use ConfigSync or create them manually on the target Solr to avoid an error when the target collection is created or deleted. These errors also impact the performance of the Consumer application.
- In some cases, admin request whitelisting is needed. If you are using `MirroringCollectionsHandler`, then all collection admin requests are mirrored. This may not always be desirable if the target Solr cluster is expected to differ or is managed externally (such as by an autoscaling operator). In this case, you should use `FusionCollectionsHandler` instead, and configure the `collectionActionsWhitelist` property to restrict the mirrored collection admin requests to only those that are needed.