SolrCloud Autoscaling Triggers

Triggers are used in autoscaling to watch for cluster events such as nodes joining, leaving, search rate or any other metric breaching a threshold.

In the future other cluster, node, and replica events that are important from the point of view of cluster performance will also have available triggers.

Trigger implementations verify the state of resources that they monitor. When they detect a change that merits attention they generate events, which are then queued and processed by configured TriggerAction implementations. This usually involves computing and executing a plan to manage the new cluster resources (e.g., move replicas). Solr provides predefined implementations of triggers for specific event types.

Triggers execute on the node that runs Overseer. They are scheduled to run periodically, at a default interval of 1 second between each execution (not every execution produces events).

Event Types

Currently the following event types (and corresponding trigger implementations) are defined:

  • nodeAdded: generated when a new node joins the cluster

  • nodeLost: generated when a node leaves the cluster

  • metric: generated when the configured metric crosses a configured lower or upper threshold value

  • indexSize: generated when a shard size (defined as index size in bytes or number of documents) exceeds upper or lower threshold values

  • searchRate: generated when the search rate exceeds configured upper or lower thresholds

  • scheduled: generated according to a scheduled time period such as every 24 hours, etc

Events are not necessarily generated immediately after the corresponding state change occurred - the maximum rate of events is controlled by the waitFor configuration parameter (see below).

The following properties are common to all event types:

id

(string) A unique time-based event id.

eventType

(string) The type of event.

source

(string) The name of the trigger that produced this event.

eventTime

(long) Unix time when the condition that caused this event occurred. For example, for a nodeAdded event this will be the time when the node was added and not when the event was actually generated, which may significantly differ due to the rate limits set by waitFor.

properties

(map, optional) Any additional properties. Currently includes e.g., nodeNames property that indicates the nodes that were lost or added.

Auto Add Replicas Trigger

When a collection has the parameter autoAddReplicas set to true then a trigger configuration named .auto_add_replicas is automatically created to watch for nodes going away. This trigger produces nodeLost events, which are then processed by configured actions (usually resulting in computing and executing a plan to add replicas on the live nodes to maintain the expected replication factor).

Refer to the section Autoscaling Automatically Adding Replicas to learn more about how the .autoAddReplicas trigger works.

This trigger supports one parameter, which is defined in the <solrcloud> section of solr.xml:

autoReplicaFailoverWaitAfterExpiration

The minimum time in milliseconds to wait for initiating replacement of a replica after first noticing it not being live. This is important to prevent false positives while stopping or starting the cluster. The default is 120000 (2 minutes). The value provided for this parameter is used as the value for the waitFor parameter in the .auto_add_replicas trigger.

Tip
See The <solrcloud> Element for more details about how to work with solr.xml.

Metric Trigger

The metric trigger can be used to monitor any metric exposed by the Metrics API. It supports lower and upper threshold configurations as well as optional filters to limit operation to specific collection, shards, and nodes.

This trigger supports the following configuration:

metric

(string, required) The metric property name to be watched in the format metric:group:prefix, e.g., metric:solr.node:CONTAINER.fs.coreRoot.usableSpace.

below

(double, optional) The lower threshold for the metric value. The trigger produces a metric breached event if the metric’s value falls below this value.

above

(double, optional) The upper threshold for the metric value. The trigger produces a metric breached event if the metric’s value crosses above this value.

collection

(string, optional) The collection used to limit the nodes on which the given metric is watched. When the metric is breached, trigger actions will limit operations to this collection only.

shard

(string, optional) The shard used to limit the nodes on which the given metric is watched. When the metric is breached, trigger actions will limit operations to this shard only.

node

(string, optional) The node on which the given metric is watched. Trigger actions will operate on this node only.

preferredOperation

(string, optional, defaults to MOVEREPLICA) The operation to be performed in response to an event generated by this trigger. By default, replicas will be moved from the hot node to others. The only other supported value is ADDREPLICA which adds more replicas if the metric is breached.

Example: a metric trigger that fires when total usable space on a node having replicas of "mycollection" falls below 100GB
{
  "set-trigger": {
    "name": "metric_trigger",
    "event": "metric",
    "waitFor": "5s",
    "metric": "metric:solr.node:CONTAINER.fs.coreRoot.usableSpace",
    "below": 107374182400,
    "collection": "mycollection"
  }
}

Index Size Trigger

This trigger can be used for monitoring the size of collection shards, measured either by the number of documents in a shard or the physical size of the shard’s index in bytes.

When either of the upper thresholds is exceeded the trigger will generate an event with a (configurable) requested operation to perform on the offending shards - by default this is a SPLITSHARD operation.

Similarly, when either of the lower thresholds is exceeded the trigger will generate an event with a (configurable) requested operation to perform on two of the smallest shards. By default this is a MERGESHARDS operation, and is currently ignored because that operation is not yet implemented (see SOLR-9407).

Additionally, monitoring can be restricted to a list of collections; by default all collections are monitored.

This trigger supports the following configuration parameters (all thresholds are exclusive):

aboveBytes

upper threshold in bytes. This value is compared to the INDEX.sizeInBytes metric.

belowBytes

lower threshold in bytes. Note that this value should be at least 2x smaller than aboveBytes

aboveDocs

upper threshold expressed as the number of documents. This value is compared with SEARCHER.searcher.numDocs metric.

Note
Due to the way Lucene indexes work, a shard may exceed the aboveBytes threshold even if the number of documents is relatively small, because replaced and deleted documents keep occupying disk space until they are actually removed during Lucene index merging.
belowDocs

lower threshold expressed as the number of documents.

aboveOp

operation to request when an upper threshold is exceeded. If not specified the default value is SPLITSHARD.

belowOp

operation to request when a lower threshold is exceeded. If not specified the default value is MERGESHARDS (but see the note above).

collections

comma-separated list of collection names that this trigger should monitor. If not specified or empty all collections are monitored.

Events generated by this trigger contain additional details about the shards that exceeded thresholds and the types of violations (upper / lower bounds, bytes / docs metrics).

Example:

This configuration specifies an index size trigger that monitors collections "test1" and "test2", with both bytes (1GB) and number of docs (1 million) upper limits, and a custom belowOp operation NONE (which still can be monitored and acted upon by an appropriate trigger listener):

{
 "set-trigger": {
  "name" : "index_size_trigger",
  "event" : "indexSize",
  "collections" : "test1,test2",
  "aboveBytes" : 1000000000,
  "aboveDocs" : 1000000000,
  "belowBytes" : 200000,
  "belowDocs" : 200000,
  "belopOp" : "NONE",
  "waitFor" : "1m",
  "enabled" : true,
  "actions" : [
   {
    "name" : "compute_plan",
    "class": "solr.ComputePlanAction"
   },
   {
    "name" : "execute_plan",
    "class": "solr.ExecutePlanAction"
   }
  ]
 }
}

Search Rate Trigger

The search rate trigger can be used for monitoring search rates in a selected collection (1-min average rate by default), and request that either replicas be moved from "hot nodes" to different nodes, or new replicas be added to "hot shards" to reduce the per-replica search rate for a collection or shard with hot spots.

Similarly, if the search rate falls below a threshold then the trigger may request that some replicas are deleted from "cold" shards. It can also optionally issue node-level action requests when a cumulative node-level rate falls below a threshold.

Note: this trigger calculates node-level cumulative rates using per-replica rates reported by replicas that are part of monitored collections / shards. This means that it may report some nodes as "cold" (underutilized) because it ignores other, perhaps more active, replicas belonging to other collections. Also, nodes that don’t host any of the monitored replicas or those that are explicitly excluded by node config property won’t be reported at all.

This trigger supports the following configuration:

collections

(string, optional) comma-separated list of collection names to monitor, or any collection if empty / not set.

shard

(string, optional) shard name within the collection (requires collections to be set to exactly one name), or any shard if empty.

node

(string, optional) node name to monitor, or any if empty.

metric

(string, optional) metric name that represents the search rate (default is QUERY./select.requestTimes:1minRate). This name has to identify a single numeric metric value, and it may use the colon syntax for selecting one property of a complex metric.

maxOps

(integer, optional) maximum number of add replica / delete replica operations requested in a single autoscaling event. The default value is 3 and it helps to smooth out the changes to the number of replicas during periods of large search rate fluctuations.

minReplicas

(integer, optional) minimum acceptable number of searchable replicas (i.e., replicas other than PULL type). The trigger will not generate any DELETEREPLICA requests when the number of searchable replicas in a shard reaches this threshold. When this value is not set (the default) the replicationFactor property of the collection is used, and if that property is not set then the value is set to 1. Note also that shard leaders are never deleted.

aboveRate

(float) the upper bound for the request rate metric value. At least one of aboveRate or belowRate must be set.

belowRate

(float) the lower bound for the request rate metric value. At least one of aboveRate or belowRate must be set.

aboveOp

(string, optional) collection action to request when the upper threshold for a shard or replica is exceeded. Default action is ADDREPLICA and the trigger will request from 1 up to maxOps operations per shard per event, proportionally to how much the rate is exceeded. This property can be set to 'NONE' to effectively disable the action but still report it to the listeners.

aboveNodeOp

(string, optional) collection action to request when the upper threshold for a node is exceeded. Default action is MOVEREPLICA, and the trigger will request 1 replica operation per hot node per event. If both aboveOp and aboveNodeOp operations are requested then aboveNodeOp operations are always requested first. This property can be set to 'NONE' to effectively disable the action but still report it to the listeners.

belowOp

(string, optional) collection action to request when the lower threshold for a shard or replica is exceeded. Default action is DELETEREPLICA, and the trigger will request at most maxOps replicas to be deleted from eligible cold shards. This property can be set to 'NONE' to effectively disable the action but still report it to the listeners.

belowNodeOp

action to request when the lower threshold for a node is exceeded. Default action is null (not set) and the condition is ignored, because in many cases the trigger will monitor only some selected resources (replicas from selected collections / shards) so setting this by default to e.g., DELETENODE could interfere with these non-monitored resources. The trigger will request 1 operation per cold node per event. If both belowOp and belowNodeOp operations are requested then belowOp operations are always requested first.

Example:

A search rate trigger that monitors collection "test" and adds new replicas if 5-minute average request rate of "/select" handler exceeds 100 requests/sec, and the condition persists for over 20 minutes. If the rate falls below 0.01 and persists for 20 min the trigger will request not only replica deletions (leaving at most 1 replica per shard) but also it may request node deletion.

{
 "set-trigger": {
  "name" : "search_rate_trigger",
  "event" : "searchRate",
  "collections" : "test",
  "metric" : "QUERY./select.requestTimes:5minRate",
  "aboveRate" : 100.0,
  "belowRate" : 0.01,
  "belowNodeOp" : "DELETENODE",
  "minReplicas" : 1,
  "waitFor" : "20m",
  "enabled" : true,
  "actions" : [
   {
    "name" : "compute_plan",
    "class": "solr.ComputePlanAction"
   },
   {
    "name" : "execute_plan",
    "class": "solr.ExecutePlanAction"
   }
  ]
 }
}

Scheduled Trigger

The Scheduled trigger generates events according to a fixed rate schedule.

The trigger supports the following configuration:

startTime

(string, required) The start date/time of the schedule. This should either be a DateMath string e.g., 'NOW', or be an ISO-8601 date time string (the same standard used during search and indexing in Solr, which defaults to UTC), or be specified without the trailing 'Z' accompanied with the timeZone parameter. For example, each of the following values are acceptable:

  • 2018-01-31T15:30:00Z: ISO-8601 date time string. The trailing Z signals that the time is in UTC

  • NOW+5MINUTES: Solr’s date math string

  • 2018-01-31T15:30:00: No trailing 'Z' signals that the timeZone parameter must be specified to avoid ambiguity

every

(string, required) A positive Solr date math string which is added to the startTime or the last run time to arrive at the next scheduled time.

graceTime

(string, optional) A positive Solr date math string. This is the additional grace time over the scheduled time within which the trigger is allowed to generate an event.

timeZone

(string, optional) A time zone string which is used for calculating the scheduled times.

preferredOp

(string, optional, defaults to MOVEREPLICA) The preferred operation to perform in response to an event generated by this trigger. The only supported values are MOVEREPLICA or ADDREPLICA.

This trigger applies the every date math expression on the startTime or the last event time to derive the next scheduled time and if current time is greater than next scheduled time but within graceTime then an event is generated.

Apart from the common event properties described in the Event Types section, the trigger adds an additional actualEventTime event property which has the actual event time as opposed to the scheduled time.

For example, if the scheduled time was 2018-01-31T15:30:00Z and grace time was +15MINUTES then an event may be fired at 2018-01-31T15:45:00Z. Such an event will have eventTime as 2018-01-31T15:30:00Z, the scheduled time, but the actualEventTime property will have a value of 2018-01-31T15:45:00Z, the actual time.

Trigger Configuration

Trigger configurations are managed using the Autoscaling Write API and the commands set-trigger, remove-trigger, suspend-trigger, and resume-trigger.

Trigger configuration consists of the following properties:

name

(string, required) A unique trigger configuration name.

event

(string, required) One of the predefined event types (nodeAdded or nodeLost).

actions

(list of action configs, optional) An ordered list of actions to execute when event is fired.

waitFor

(string, optional) The time to wait between generating new events, as an integer number immediately followed by unit symbol, one of s (seconds), m (minutes), or h (hours). Default is 0s. A condition must persist at least for the waitFor period to generate an event.

enabled

(boolean, optional) When true the trigger is enabled. Default is true.

Additional implementation-specific properties may be provided.

Action configuration consists of the following properties:

name

(string, required) A unique name of the action configuration.

class

(string, required) The action implementation class.

Additional implementation-specific properties may be provided

If the actions configuration is omitted, then by default, the ComputePlanAction and the ExecutePlanAction are automatically added to the trigger configuration.

Example: adding or updating a trigger for nodeAdded events
{
 "set-trigger": {
  "name" : "node_added_trigger",
  "event" : "nodeAdded",
  "waitFor" : "1s",
  "enabled" : true,
  "actions" : [
   {
    "name" : "compute_plan",
    "class": "solr.ComputePlanAction"
   },
   {
    "name" : "custom_action",
    "class": "com.example.CustomAction"
   },
   {
    "name" : "execute_plan",
    "class": "solr.ExecutePlanAction"
   }
  ]
 }
}

This trigger configuration will compute and execute a plan to allocate the resources available on the new node. A custom action is also used to possibly modify the plan.