Solr CrossDC uses a producer module (`fusion-crossdc-producer`) to mirror updates from the source cluster, and a dedicated consumer application (`fusion-crossdc-consumer`) to replay those updates on the target cluster, helping to ensure high availability and business continuity in distributed or hybrid cloud environments.
The following table compares Solr CrossDC with ConfigSync:

Feature | CrossDC | ConfigSync |
---|---|---|
Data replication | Yes | No |
Solr collection synchronization | Yes | No |
Rules synchronization | No | Yes |
Configuration synchronization | No | Yes |
Blob synchronization | No | Yes |
Version control (Git) | No | Yes |
ZooKeeper data | No | Yes |
Disaster recovery | For search data | For configuration only |
Latency reduction | Across geo-distributed users | Not applicable |
Typical use case | Global failover, data center redundancy | DevOps config promotion, disaster recovery for Fusion config |
(Rules are stored in the `*_query_rewriter` and `*_query_rewrite_staging` collections.)

Prepare for enabling Solr CrossDC
Configure the source Solr instance
Solr class | Where configured | Role in CrossDC |
---|---|---|
fusion-crossdc-producer | solr.xml | This module contains the necessary classes. It is included in the fusion-solr-managed Docker image. |
MirroringUpdateRequestProcessorFactory | solrconfig.xml | This processor mirrors Solr indexing updates (such as document additions, updates, or deletions) to the source Kafka instance. |
MirroringConfigSetsHandler | solr.xml | This handler mirrors configset changes to the source Kafka instance. |
MirroringCollectionsHandler | solr.xml | This handler mirrors Solr collection admin commands (such as collection creation or deletion) to the source Kafka instance. |
FusionCollectionsHandler | solr.xml | This extended version of the MirroringCollectionsHandler mirrors Solr collection admin commands to the source Kafka instance and adds the ability to filter (whitelist) the commands you want to mirror. |
All of these classes are included in the `fusion-solr-managed` Docker image.
In the `solrconfig.xml` for the configset used by each collection, configure the `MirroringUpdateRequestProcessorFactory` handler:
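A minimal sketch of the update chain, with placeholder Kafka hosts and topic name; the fully qualified class name shown here follows the Apache Solr CrossDC module and may differ in your Fusion release:

```xml
<updateRequestProcessorChain name="mirrorUpdateChain" default="true">
  <processor class="org.apache.solr.update.processor.MirroringUpdateRequestProcessorFactory">
    <!-- Required: source Kafka connection and a pre-existing topic. -->
    <str name="bootstrapServers">source-kafka1:9092,source-kafka2:9092</str>
    <str name="topicName">crossdc-topic</str>
    <!-- Optional tuning, for example compression (see the tables below). -->
    <str name="enableDataCompression">lz4</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```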
Only updates to collections that have `MirroringUpdateRequestProcessorFactory` configured in the `updateRequestProcessorChain` are mirrored; other collections are ignored. The `bootstrapServers` and `topicName` parameters are required:
Parameter | Type | Description |
---|---|---|
bootstrapServers | string | A comma-separated list of servers used to connect to the source Kafka cluster. |
topicName | string | The name of the Kafka topic to which Solr updates will be pushed. This topic must already exist. |
The following parameters are optional for `MirroringUpdateRequestProcessorFactory`:
Parameter | Type | Description |
---|---|---|
batchSizeBytes | integer | Maximum batch size in bytes for the Kafka queue. |
bufferMemoryBytes | integer | Memory allocated by the Producer in total for buffering. |
lingerMs | integer | Amount of time that the Producer will wait to add to a batch. |
requestTimeout | integer | Request timeout for the Producer. |
enableDataCompression | enum | Whether to use compression for data sent over the Kafka queue, one of the following: none (default), gzip, snappy, lz4, or zstd. |
numRetries | integer | Setting a value greater than zero will cause the Producer to resend any record whose send fails with a potentially transient error. |
retryBackoffMs | integer | The amount of time to wait before attempting to retry a failed request to a given topic partition. |
deliveryTimeoutMS | integer | Updates sent to the Kafka queue will be failed before the number of retries has been exhausted if the timeout configured by delivery.timeout.ms expires first. |
maxRequestSizeBytes | integer | The maximum size of a Kafka queue request in bytes – limits the number of requests that will be sent over the queue in a single batch. |
dlqTopicName | string | If not empty, then requests that failed processing maxAttempts times will be sent to a “dead letter queue” topic in Kafka (must exist if configured). |
indexUnmirrorableDocs | boolean | If set to true, updates that are too large for the Kafka queue will still be indexed locally into the source collection. |
mirrorCommits | boolean | If true, then standalone commit requests will be mirrored as separate requests; otherwise they will be processed only locally. |
expandDbq | enum | If set to expand (default), then Delete-By-Query is expanded before mirroring into a series of Delete-By-Id, which may help with correct processing of out-of-order requests on the consumer side. If set to none, then Delete-By-Query requests are mirrored as-is. |
To mirror configset changes, configure `MirroringConfigSetsHandler` in `solr.xml`:
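A minimal sketch; the fully qualified class name is an assumption based on the Apache Solr CrossDC module, so verify the class name shipped with your Fusion release:

```xml
<solr>
  <!-- Mirror configset changes to the source Kafka instance. -->
  <str name="configSetsHandler">org.apache.solr.crossdc.handler.MirroringConfigSetsHandler</str>
</solr>
```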
To mirror collection admin commands, configure a mirroring collections handler in `solr.xml`. You can choose one of these handlers to use for this:
- `MirroringCollectionsHandler` is the native Solr handler. It mirrors all admin actions for all collections, or you can select specific collections to mirror.
- `FusionCollectionsHandler` has all the same capabilities and configuration options, plus action whitelisting so you can mirror only selected actions and ignore others.

The `solr.xml` configuration is the same for `MirroringCollectionsHandler` and `FusionCollectionsHandler`.
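A minimal sketch; both fully qualified class names are assumptions (the Fusion package for `FusionCollectionsHandler` in particular), so verify the class names shipped with your release:

```xml
<solr>
  <!-- Mirror collection admin commands to the source Kafka instance.
       Substitute the FusionCollectionsHandler class to enable action whitelisting. -->
  <str name="collectionsHandler">org.apache.solr.crossdc.handler.MirroringCollectionsHandler</str>
</solr>
```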
With `FusionCollectionsHandler`, you can also enable action whitelisting by setting the following system property:
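A sketch, assuming the whitelist accepts a comma-separated list of collection admin actions (verify the accepted format for your Fusion release):

```bash
# Add to the Solr JVM options on the source cluster; hypothetical action list.
SOLR_OPTS="$SOLR_OPTS -DcollectionActionsWhitelist=CREATE,DELETE,RELOAD"
```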
Configure the source Kafka
Ensure that the `bootstrapServers` you configured in Solr are reachable by Solr and that the configured `topicName` exists. If you configured Solr to use a Dead-Letter Queue (DLQ) topic (`dlqTopicName`), you must also create that topic in the source Kafka instance.
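A sketch using the standard Kafka CLI, with placeholder broker address and topic names:

```bash
# Create the mirroring topic, and the DLQ topic if dlqTopicName is configured.
kafka-topics.sh --bootstrap-server source-kafka:9092 --create --topic crossdc-topic
kafka-topics.sh --bootstrap-server source-kafka:9092 --create --topic crossdc-dlq
```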
See the Kafka documentation for configuration details.

Configure the target Solr
Collections are created automatically on the target Solr only if you configured `MirroringCollectionsHandler` or `FusionCollectionsHandler` on the source Solr; otherwise, create the target collections manually before replaying updates.

Configure the target Kafka

Ensure that the topics you configured exist on the target Kafka instance, which the Consumer application reads from.
Configure the Consumer
Run the `fusion-crossdc-consumer` Docker image that matches your Fusion release, such as 5.9.14. The Consumer is configured with the following parameters (an example invocation follows the tables below):
Parameter | Required? | Description |
---|---|---|
bootstrapServers | required | A list of Kafka bootstrap servers. |
topicName | required | The name of the Kafka topic from which Solr updates will be read. This can be a comma-separated list to consume multiple topics. |
zkConnectString | required | The ZooKeeper connection string used for connecting to the target Solr instance. |
consumerProcessingThreads | optional | The number of threads used by the consumer to concurrently process updates from the Kafka queue. |
port | optional | The local port for the API endpoints. Default is 8090. |
collapseUpdates | optional (enum) | Whether the Consumer collapses multiple update requests read from the Kafka queue into fewer requests sent to Solr; one of none, partial, or all. Requests of other types than UPDATE are never collapsed. |
The following optional parameters fine-tune the Kafka clients used by the Consumer:

Parameter | Description |
---|---|
batchSizeBytes | The maximum batch size in bytes for the Kafka queue. |
bufferMemoryBytes | The memory allocated by the Producer in total for buffering. |
lingerMs | The amount of time that the Producer will wait to add to a batch. |
requestTimeout | The request timeout for the Producer. |
maxPollIntervalMs | The maximum delay between invocations of poll() when using Consumer group management. |
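A hypothetical invocation; the image name, tag, and configuration mechanism are assumptions (here the parameters are passed as Java system properties), so check the deployment instructions for your Fusion release:

```bash
docker run -d --name fusion-crossdc-consumer \
  -p 8090:8090 \
  -e JAVA_OPTS="-DbootstrapServers=target-kafka:9092 -DtopicName=crossdc-topic -DzkConnectString=target-zk:2181" \
  lucidworks/fusion-crossdc-consumer:5.9.14
```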
Configure MirrorMaker
Use Kafka MirrorMaker to replicate the mirroring topics from the source Kafka instance, which is written to by `MirroringUpdateRequestProcessorFactory`, to the target Kafka instance, which is read by the Consumer application. See the MirrorMaker documentation for configuration details.
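A minimal MirrorMaker 2 sketch (`connect-mirror-maker.properties`), with placeholder cluster aliases, hosts, and topic name:

```properties
# Placeholder values; tune replication factors and flows for production.
clusters = source, target
source.bootstrap.servers = source-kafka:9092
target.bootstrap.servers = target-kafka:9092
source->target.enabled = true
source->target.topics = crossdc-topic
```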
Both `fusion-crossdc-producer` and `fusion-crossdc-consumer` expose metrics that can be monitored.
The `fusion-crossdc-producer` module exposes the following metrics for each source replica in a collection, under the Solr `/metrics` API endpoint:
Metric name | Description |
---|---|
crossdc.producer.local | Counter representing the number of local documents processed successfully. |
crossdc.producer.submitted | Counter representing the number of documents submitted to the Kafka topic. |
crossdc.producer.documentSize | Histogram of the processed document size. |
crossdc.producer.errors.local | Counter representing the number of local documents processed with errors. |
crossdc.producer.errors.submit | Counter representing the number of documents that were not submitted to the Kafka topic because of an exception during execution. |
crossdc.producer.errors.documentTooLarge | Counter representing the number of documents that were too large to send to the Kafka topic. |
The `fusion-crossdc-consumer` application exposes the following metrics under its `/metrics` API endpoint, in JSON format with the following hierarchical keys, where `<TYPE>` can be one of `UPDATE`, `ADMIN`, or `CONFIGSET`:
Metric name | Description |
---|---|
counters.<TYPE>.input | Number of input messages retrieved from Kafka |
counters.<TYPE>.add | Number of input Add documents (one input message may contain multiple Add documents) |
counters.<TYPE>.dbi | Number of input Delete-By-Id commands (one input message may contain multiple DBI commands) |
counters.<TYPE>.dbq | Number of input Delete-By-Query commands (one input message may contain multiple DBQ commands) |
counters.<TYPE>.collapsed | Number of input requests that were added to other requests to minimize the number of requests sent to Solr |
counters.<TYPE>.handled | Total number of successfully processed output requests sent to Solr |
counters.<TYPE>.failed-resubmit | Number of requests resubmitted to the input queue for re-trying (on intermittent failures) |
counters.<TYPE>.failed-dlq | Number of requests submitted to the Dead-Letter queue due to failures on multiple re-tries |
counters.<TYPE>.failed-no-retry | Number of requests dropped due to persistent failures (including inability to send to DLQ) |
counters.<TYPE>.output-errors | Number of errors when sending requests to target Solr |
counters.<TYPE>.backoff | Number of times when the consumer had to back off from processing due to errors |
counters.<TYPE>.invalid-collection | Number of requests sent to an invalid (e.g. non-existent) collection |
The Consumer also exposes the following timer metrics:

Metric name | Description |
---|---|
timers.<TYPE>.outputLatency | Dropwizard Timer (meter + histogram) for latency between request creation timestamp and the output timestamp. This assumes that the clocks are synchronized between the Producer and Consumer. |
timers.<TYPE>.outputTime | Dropwizard Timer for the time to send the processed request to the target Solr. |

The Consumer application also exposes a `/threads` API endpoint that returns a plain-text thread dump of the JVM running the Consumer application.
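For example, assuming the Consumer runs locally on its default port:

```bash
# Default API port is 8090 (see the port parameter above).
curl http://localhost:8090/metrics   # JSON metrics
curl http://localhost:8090/threads   # plain-text JVM thread dump
```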
Updates that exceed the Kafka broker's maximum message size (`message.max.bytes`) cannot be mirrored.
The Consumer can reduce the number of requests sent to the target Solr by combining updates when you configure `collapseUpdates=partial` or `collapseUpdates=all`.
By default, Delete-By-Query requests are expanded before mirroring into a series of Delete-By-Id requests (except for the match-all query `*:*`, which is always sent as-is). In extreme cases this expansion may produce a request that is too large to be mirrored. Delete-By-Query expansion helps to ensure the strict ordering of deletes and updates in the target Solr collection, but it may also lead to divergence of the local and mirrored collections if the expansion fails or the resulting request is too large to mirror.
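If that risk outweighs the ordering guarantee, a sketch of mirroring Delete-By-Query requests as-is (same assumed class name as above):

```xml
<processor class="org.apache.solr.update.processor.MirroringUpdateRequestProcessorFactory">
  <str name="bootstrapServers">source-kafka:9092</str>
  <str name="topicName">crossdc-topic</str>
  <!-- Mirror Delete-By-Query requests unchanged instead of expanding them. -->
  <str name="expandDbq">none</str>
</processor>
```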
If you are using `MirroringConfigSetsHandler`, then new configsets created on the source Solr are mirrored automatically to the target Solr. If you are not using `MirroringConfigSetsHandler`, then new configsets are not mirrored; you must use ConfigSync or create them manually on the target Solr to avoid an error when the target collection is created or deleted. These errors also impact the performance of the Consumer application.
If you are using `MirroringCollectionsHandler`, then all collection admin requests are mirrored. This may not always be desirable if the target Solr cluster is expected to differ or is managed externally (such as by an autoscaling operator). In this case, you should use `FusionCollectionsHandler` instead, and configure the `collectionActionsWhitelist` property to restrict the mirrored collection admin requests to only those that are needed.