Fusion 5.12

    Solr Partial Update Indexer

    Index pipeline stage configuration specifications

    The Solr Partial Update Indexer stage updates one or more fields of an existing Solr document in a collection managed by Managed Fusion. It provides an alternative to the Solr Indexer stage.

    When a data feed consists of an ongoing flow of messages about known documents in a collection, such as item price, inventory counts, or weather conditions at a location, this stage provides fast indexing throughput and can be configured to enforce data atomicity to guarantee that the index always reflects the most recent update.

    This stage is configured with a set of update directives based on Solr’s atomic updates. At run time, it creates a Solr update by applying these directives to the data from a Managed Fusion PipelineDocument object and then submits this update to Solr’s update handler.

    Solr’s atomic update functionality requires that the collection’s schema defines every field with the attribute stored="true", except for fields that are <copyField/> destinations, which must be configured as stored="false".
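
    As a rough illustration of this schema requirement, the Python sketch below checks a list of field definitions against the rule. The dictionary layout and the function name find_atomic_update_violations are hypothetical and are not part of Fusion or Solr; real field definitions would come from the collection's schema.

```python
def find_atomic_update_violations(fields, copy_field_dests):
    """Return names of fields whose 'stored' flag breaks the rule above.

    fields: list of dicts such as {"name": "price", "stored": True}
            (an assumed layout, used only for this sketch).
    copy_field_dests: set of field names that are <copyField/> destinations.
    """
    violations = []
    for field in fields:
        is_dest = field["name"] in copy_field_dests
        # copyField destinations must not be stored; every other field must be.
        if field["stored"] == is_dest:
            violations.append(field["name"])
    return violations
```

    A field is flagged either when it is a copyField destination that is stored, or when it is an ordinary field that is not stored.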

    Example Stage Specification

    Configuration for a Partial Updater Stage in JSON:

    { "type" : "solr-partial-update-index",
      "enforceSchema" : false,
      "solrDocIdFieldName" : "id",
      "solrDocIdFieldValue" : "<doc.id>",
      "updatedFields" : [
        { "updateType" : "set", "fieldName" : "statusValue", "values" : "<doc.statusValue>" },
        { "updateType" : "set", "fieldName" : "lastCommunicationTime", "values" : "<doc.lastCommunicationTime>" }
      ],
      "concurrencyControlEnabled" : true,
      "skip" : false,
      "label" : "solr-partial-update-index"
    }

    The expression <doc.X> will evaluate to the contents of the current PipelineDocument’s field named "X".
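
    To make the substitution concrete, here is a minimal Python sketch of how a <doc.X> expression could be resolved against a document's fields. The stage's actual evaluator is not described on this page, so the function name resolve_template, the regular expression, and the handling of missing fields are all assumptions made for illustration.

```python
import re

# Matches expressions of the form <doc.fieldName>.
_DOC_EXPR = re.compile(r"<doc\.([^<>]+)>")

def resolve_template(template, doc_fields):
    """Replace each <doc.X> with the value of field X from doc_fields."""
    def substitute(match):
        # Missing fields resolve to an empty string in this sketch (an assumption).
        return str(doc_fields.get(match.group(1), ""))
    return _DOC_EXPR.sub(substitute, template)
```

    With a document whose "id" field holds "abc", resolve_template("<doc.id>", {"id": "abc"}) returns "abc".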

    Types of Update Operations

    The set of update operations is based on the operations supported by Solr. They are:

    • 'add' - add a new value or values to an existing Solr document field, or add a new field and value(s).

    • 'set' - change the value or values in an existing Solr document field.

    • 'remove' - remove all occurrences of the value or values from an existing Solr document field.

    • 'removeregex' - remove all occurrences of the values which match the regex or list of regexes from an existing Solr document field.

    • 'increment' - increment the numeric value of an existing Solr document field by a specific amount.

    • 'decrement' - decrement the numeric value of an existing Solr document field by a specific amount.
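
    The operations above correspond to the modifier maps used in Solr's atomic update JSON syntax. The Python sketch below builds such a document from a list of directives. The function build_atomic_update and the tuple layout of the directives are hypothetical. Solr's own modifier for numeric changes is "inc"; expressing 'decrement' as a negative "inc" is an assumption about how such a directive could be sent, not a statement of what the stage does internally.

```python
import json

def build_atomic_update(doc_id, directives, id_field="id"):
    """Build one Solr atomic update document from (type, field, value) tuples."""
    update = {id_field: doc_id}
    for update_type, field_name, value in directives:
        if update_type in ("add", "set", "remove", "removeregex"):
            # These names are also Solr's atomic update modifiers.
            update[field_name] = {update_type: value}
        elif update_type == "increment":
            update[field_name] = {"inc": value}
        elif update_type == "decrement":
            # Assumption: a decrement is sent as a negative increment.
            update[field_name] = {"inc": -value}
        else:
            raise ValueError(f"unsupported update type: {update_type}")
    return update

# Example request body for Solr's JSON update handler: a list of documents.
payload = json.dumps([build_atomic_update("item-1", [
    ("set", "price", 9.99),
    ("add", "tags", ["sale"]),
    ("decrement", "inventory", 1),
])])
```

    The field names price, tags, and inventory are made up for the example.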

    In addition, this stage introduces experimental "Positional" operations which can be used to add, set or remove exactly one element of a field which takes a list of values (i.e., a multi-valued field).

    • 'positionalUpdates' - used to add or set the value at a specific position.

    • 'positionalRemovals' - used to delete an element at a specific position.

    When a collection contains two or more multi-value fields which are maintained in parallel so that taken together, they act like a table stored column by column, a positional update operation updates several data cells across one row of the table. To maintain this kind of column-oriented table, the positional delete directive must specify all the fields in the document which logically comprise the table.
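
    The column-oriented table idea can be sketched in Python with plain lists standing in for multi-valued fields. The helper names positional_remove and positional_set are hypothetical, and treating the position as a zero-based integer is an assumption; the page does not state how the stage numbers positions.

```python
def positional_remove(doc, fields, position):
    """Remove the element at `position` from every listed parallel field."""
    lengths = {len(doc[name]) for name in fields}
    if len(lengths) != 1:
        raise ValueError("parallel fields must have the same length")
    if not 0 <= position < lengths.pop():
        raise IndexError("position out of range")
    # Work on a copy so the caller's document is left untouched.
    updated = dict(doc)
    for name in fields:
        values = list(doc[name])
        del values[position]
        updated[name] = values
    return updated

def positional_set(doc, fields_and_values, position):
    """Set the cell at `position` in each given field (one table row)."""
    updated = dict(doc)
    for name, value in fields_and_values.items():
        values = list(doc[name])
        values[position] = value
        updated[name] = values
    return updated
```

    If a removal named only one of the parallel fields, the remaining fields would keep their old length and the rows would no longer line up, which is why the removal must list every field that makes up the table.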

    Document Identifier Field

    A Managed Fusion collection is a Solr collection managed by Managed Fusion. Internally, a Solr document is a list of named, typed fields. The Solr UniqueKey field stores a string which is the unique identifier for that document. There is at most one UniqueKey field per document, and it is defined in the Solr schema. The UniqueKey field value is required. For collections created via Managed Fusion, the UniqueKey field is named "id". Other document fields may also store string values which can be used as unique identifiers.

    Solr uses the UniqueKey field to find the document to be updated. If the data feed contains a document identifier that differs from the value stored in the UniqueKey field, then this stage must do a Solr lookup to find the UniqueKey value.
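
    As an illustration of what such a lookup might involve, the Python sketch below builds query parameters that ask Solr for only the UniqueKey of the document matching another identifier field. How the stage itself performs the lookup is not documented here; the function name unique_key_lookup_query and the example field name "sku" are hypothetical. The parameters q, fl, and rows are standard Solr query parameters.

```python
from urllib.parse import urlencode

def unique_key_lookup_query(id_field, id_value, unique_key="id"):
    """Build a query string that fetches only the UniqueKey of one document."""
    # Escape backslashes and quotes so the value is a valid quoted Solr term.
    escaped = id_value.replace("\\", "\\\\").replace('"', '\\"')
    return urlencode({
        "q": f'{id_field}:"{escaped}"',  # exact match on the alternate identifier
        "fl": unique_key,                # return only the UniqueKey field
        "rows": 1,                       # one match is enough
    })
```

    This extra round trip is the cost that is avoided when the feed's identifier is already the UniqueKey value.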

    Optimistic Concurrency

    Solr’s Optimistic Concurrency is a mechanism which checks whether or not a document has changed between the point at which an update request was submitted and the point at which the request is processed. Solr documents have an internal field named "_version_" which is updated whenever any of the other fields in that document changes. When optimistic concurrency control is on, an update request is discarded if the current version of the document has changed since that request was made. This guarantees that the document will always reflect the most recent update. However, this requires an additional Solr lookup to get the current document version number, which is submitted as part of the update request.
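
    The version check can be illustrated with a toy in-memory index in Python. This is a simulation only, not Solr or Fusion code; the class names TinyIndex and VersionConflict are hypothetical. The rules it applies follow Solr's documented _version_ semantics: a value greater than 1 must match the stored version exactly, 1 requires that the document exists, a negative value requires that it does not exist, and 0 disables the check.

```python
class VersionConflict(Exception):
    """Raised when the supplied _version_ does not satisfy the check."""

class TinyIndex:
    def __init__(self):
        self._docs = {}
        self._clock = 1  # versions issued by this toy index are always > 1

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def update(self, doc_id, fields, version=0):
        current = self._docs.get(doc_id)
        if version > 1 and (current is None or current["_version_"] != version):
            raise VersionConflict("document changed since it was read")
        if version == 1 and current is None:
            raise VersionConflict("document does not exist")
        if version < 0 and current is not None:
            raise VersionConflict("document already exists")
        # The check passed: merge the fields and issue a new version.
        self._clock += 1
        merged = dict(current or {}, **fields)
        merged["_version_"] = self._clock
        self._docs[doc_id] = merged
        return self._clock
```

    An update that carries a stale version raises VersionConflict instead of overwriting the newer data, which mirrors the discard behavior described above.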

    Performance Considerations

    In order to send a single update request to Solr, without preliminary lookup requests:

    • The document identifier field should match the Solr collection’s UniqueKey identifier field.

    • Optimistic Concurrency should be turned off.

    • Positional updates should not be used. They are experimental and potentially expensive, since all the values for all fields being updated must be fetched into memory in order to perform positional operations.

    Solr Date Formats

    "yyyy-MM-dd'T'HH:mm:ss'Z'", // Solr format without milliseconds
    "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", // standard Solr format, with literal "Z" at the end
    "yyyy-MM-dd'T'HH:mm:ss.SS'Z'", // standard Solr format, with literal "Z" at the end
    "yyyy-MM-dd'T'HH:mm:ss.S'Z'" // standard Solr format, with literal "Z" at the end

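    The four patterns above use Java date-format letters. For readers who want to check sample values, the following Python sketch parses the same shapes with strptime. The function name parse_solr_date is hypothetical. Note one difference: Python's "%f" accepts one to six fractional digits, so this sketch is slightly more permissive than the listed patterns.

```python
from datetime import datetime, timezone

# strptime equivalents of the patterns above. "%f" covers .S, .SS and .SSS
# (and, unlike the listed patterns, up to six fractional digits).
_PATTERNS = ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%dT%H:%M:%S.%fZ")

def parse_solr_date(text):
    """Parse a Solr-style UTC timestamp into a timezone-aware datetime."""
    for pattern in _PATTERNS:
        try:
            # The trailing "Z" is matched literally, so attach UTC explicitly.
            return datetime.strptime(text, pattern).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"not a recognized Solr date: {text!r}")
```
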
    Configuration

    When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.

    This stage indexes partial (atomic) updates to Solr documents.

    skip - boolean

    Set to true to skip this stage.

    Default: false

    label - string

    A unique label for this stage.

    <= 255 characters

    condition - string

    Define a conditional script that must result in true or false. This can be used to determine if the stage should process or not.

    enforceSchema - boolean

    Default: true

    concurrencyControlEnabled - boolean

    Select to enable Optimistic Concurrency Control in Solr, guaranteeing that the document update will not be overridden by another Partial Update to the same document. If disabled, in the case of an edit collision, the last committed update to the document will win.

    Default: true

    rejectUpdatesIfDocNotPresent - boolean

    Whether to reject the update attempt if the document with the given ID is not present in Solr. This is not a typical situation, since updates are usually performed on existing documents; however, you can disable this option to attempt the update even if the document is not present. If concurrency control is disabled, enabling this flag forces the _version_ field to 1; otherwise it is set to 0.

    Default: true
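
    The interaction between this flag and concurrency control can be summarized in a small Python sketch. The function name version_for_update is hypothetical, and the sketch reads "otherwise" as "when the flag is disabled". What the stage does when concurrency control is enabled but no version could be looked up is not stated on this page, so raising an error in that case is purely an assumption.

```python
def version_for_update(concurrency_enabled, reject_if_missing, looked_up_version=None):
    """Pick the _version_ value to send with a partial update."""
    if concurrency_enabled:
        # With optimistic concurrency on, the version comes from a Solr lookup.
        if looked_up_version is None:
            # Assumption: this sketch treats a missing version as an error.
            raise ValueError("a looked-up _version_ is required")
        return looked_up_version
    # Concurrency control off: in Solr, 1 means "the document must exist"
    # and 0 means "no version check".
    return 1 if reject_if_missing else 0
```
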

    updateAllDocFields - boolean

    If this option is set, the Partial Update stage processes pipeline document fields even if they are not set by the Updates and Deletions instructions configured here. In this case, those fields are included in the partial update document and are processed by Solr according to atomic update rules: non-Map field values are treated as a 'set' update for the field, and Map field values are processed as the atomic update defined in the Map. The Map structure must comply with Solr atomic update rules. Note that the Partial Update stage does NOT validate the consistency of fields that are not covered by the Updates or Deletions configured here; it sends them to Solr as is. Requires at least one entry in updatedFields.

    Default: false
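
    The rule described for updateAllDocFields (plain values become a 'set', Map values are passed through) can be sketched in Python as follows. The function name fields_to_atomic_updates is hypothetical, and sending the identifier field as a plain value is an assumption based on how Solr atomic update documents are normally shaped.

```python
def fields_to_atomic_updates(pipeline_fields, id_field="id"):
    """Turn pipeline document fields into a Solr atomic update document."""
    update = {}
    for name, value in pipeline_fields.items():
        if name == id_field:
            update[name] = value           # identifier is sent as a plain value
        elif isinstance(value, dict):
            update[name] = value           # already an atomic update map; sent as is
        else:
            update[name] = {"set": value}  # plain value(s) become a 'set'
    return update
```

    As with the stage itself, nothing here validates the Map values; a malformed map would only be rejected once it reaches Solr.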

    solrDocIdFieldValue - string, required

    Default: <doc.id>

    dateFormats - array[string]

    params - array[object]

    object attributes:{key required : {
     display name: Parameter Name
     type: string
    }
    value : {
     display name: Parameter Value
     type: string
    }
    }

    updatedFields - array[object]

    Fields to update (set, add or remove field values) in the Solr Document.

    object attributes:{updateType : {
     display name: Update Type
     type: string
    }
    fieldName required : {
     display name: Field Name
     type: string
    }
    values required : {
     display name: Value
     type: string
    }
    }

    deletedFields - array[object]

    Fields to Delete from Solr Document.

    object attributes:{fields required : {
     display name: Field
     type: string
    }
    }

    positionalUpdates - array[object]

    A field or group of fields whose value is updated (added or set) at a specific position. See documentation for additional information.

    object attributes:{positionalUpdateType : {
     display name: Update Type
     type: string
    }
    position required : {
     display name: Position
     type: string
    }
    fieldsAndValues required : {
     display name: Fields and Values
     type: string
    }
    }

    positionalRemovals - array[object]

    A field or group of fields whose value is removed at a specific position. See documentation for additional information.

    object attributes:{position required : {
     display name: Position
     type: string
    }
    fields required : {
     display name: Fields List
     type: string
    }
    }

    customRouteFieldName - string

    This option is used when custom shard routing is configured in Solr, so that the document route is determined by the value of a field in the Solr document (defined as 'router.field' when the collection was created). If set here, the field with this name is copied from the pipeline document to the partial update Solr document.

    allowReservedFields - boolean

    When enabled, the Partial Update stage processes the pipeline document's reserved fields even if they are not set in the stage configuration.

    Default: false