Configure Field Mapping for Couchbase

The Couchbase V1 connector uses the Cross-Datacenter Replication (XDCR) feature of Couchbase to continuously retrieve data stored in Couchbase in real time.

The Couchbase connector has built-in field mapping that maps Couchbase fields to fields in your schema. Each mapping entry defines a field from your schema (field_name) and an XPath-style path (field_path) to the field in the Couchbase JSON document.

Field mappings accept single wildcards (*) and double wildcards (**) to map fields automatically. A wildcard can appear only at the end of the path definition. For example:

  • field_name="" and field_path=/docs/* - maps every field directly under /docs to a field with the same name as in the JSON document.

  • field_name="" and field_path=/docs/** - maps every field under /docs, including child fields at any depth, to a field with the same name as in the JSON document.

  • field_name=searchField and field_path=/docs/* - maps every field directly under /docs to a single field named 'searchField'.

  • field_name=searchField and field_path=/docs/** - maps every field under /docs, including child fields at any depth, to a single field named 'searchField'.

If no mapping is defined, a default mapping is assigned in the format of the second example above: field_name="" and field_path=/docs/**.
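
The wildcard rules above can be illustrated with a short Python sketch. It is not the connector's implementation. The helper names (flatten, matches, select) are hypothetical, and the sketch assumes that only scalar leaf fields are matched and that array elements share their parent's path, as /exams/marks suggests in the example further down.

```python
def flatten(node, prefix=""):
    """Yield (path, value) for every scalar leaf of a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}/{key}")
    elif isinstance(node, list):
        # Assumption: array elements share the parent's path.
        for item in node:
            yield from flatten(item, prefix)
    else:
        yield prefix, node


def matches(field_path, path):
    """Return True if the leaf `path` is selected by a mapping's field_path."""
    if field_path.endswith("/**"):
        # Double wildcard: any depth below the base path.
        return path.startswith(field_path[:-2])
    if field_path.endswith("/*"):
        # Single wildcard: direct children of the base path only.
        base = field_path[:-1]
        return path.startswith(base) and "/" not in path[len(base):]
    # No wildcard: the path must match exactly.
    return path == field_path


def select(field_path, document):
    """List the leaf paths in `document` that `field_path` selects."""
    return [path for path, _ in flatten(document) if matches(field_path, path)]


doc = {"docs": {"title": "Guide", "meta": {"author": "Ann"}}, "id": 7}
print(select("/docs/*", doc))   # ['/docs/title']
print(select("/docs/**", doc))  # ['/docs/title', '/docs/meta/author']
```

In this sketch, /docs/* skips /docs/meta/author because it sits one level deeper, while /docs/** selects it.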

The following example shows a simple field mapping applied to a single document such as this:

{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test": "term1",
      "marks": 90
    },
    {
      "subject": "Biology",
      "test": "term1",
      "marks": 86
    }
  ]
}

When we configure the datasource, we can define our field mapping as follows:

"field_mapping": [
  {
    "field_name": "points_i",
    "field_path": "/exams/marks"
  },
  {
    "field_name": "",
    "field_path": "/**"
  }
]

Two mappings are defined. The first maps the /exams/marks field from Couchbase to the points_i field in Solr. The second maps all top-level and child fields from Couchbase either to the same field name in Solr or to a field that matches a dynamic field rule.
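
Because wildcards are allowed only at the end of field_path, it can be useful to check a mapping list before saving it. The following Python sketch is one way to do that. The function name validate_field_mapping is hypothetical and is not part of the connector, and the check covers only the wildcard placement rule stated above.

```python
import json


def validate_field_mapping(mappings):
    """Return a list of problems found in a field_mapping list (empty if none)."""
    problems = []
    for index, entry in enumerate(mappings):
        stem = entry.get("field_path", "")
        # Strip one trailing wildcard segment, if present.
        for suffix in ("/**", "/*"):
            if stem.endswith(suffix):
                stem = stem[: -len(suffix)]
                break
        # Any wildcard left in the remainder is in a disallowed position.
        if "*" in stem:
            problems.append(
                f"entry {index}: wildcard is only allowed at the end of field_path"
            )
    return problems


config = json.loads("""
[
  {"field_name": "points_i", "field_path": "/exams/marks"},
  {"field_name": "", "field_path": "/**"}
]
""")
print(validate_field_mapping(config))  # []

bad = [{"field_name": "title_s", "field_path": "/docs/*/title"}]
print(validate_field_mapping(bad))
```

The first call reports no problems for the mapping shown above. The second reports one problem, because the wildcard sits in the middle of the path.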

After the document is retrieved, it should look like this:

{
  "first_s": "John",
  "last_s": "Doe",
  "grade_i": 8,
  "exams": [
    {
      "subject_s": "Maths",
      "test_s": "term1",
      "points_i": 90
    },
    {
      "subject_s": "Biology",
      "test_s": "term1",
      "points_i": 86
    }
  ]
}

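The marks field from the original document has been mapped to the points_i field. The other scalar fields have been mapped to names that match dynamic field rules (for example, first became first_s and grade became grade_i), while the exams array keeps its name.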
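
The transformation in this example can be reproduced with a small Python sketch. It is an illustration only, not the connector's code. The function names are hypothetical, and the suffix rule (_s for strings, _i for integers) is an assumption inferred from the example output rather than a statement of how dynamic field rules are chosen.

```python
def dynamic_name(name, value):
    """Assumed naming rule, inferred from the example output only."""
    if isinstance(value, bool):
        return name  # not covered by the example; left unchanged here
    if isinstance(value, int):
        return f"{name}_i"
    if isinstance(value, str):
        return f"{name}_s"
    return name


def apply_mapping(node, explicit, prefix=""):
    """Rename scalar fields: explicit paths first, then the catch-all rule."""
    if isinstance(node, list):
        # Array elements share the parent's path, as in /exams/marks.
        return [apply_mapping(item, explicit, prefix) for item in node]
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        path = f"{prefix}/{key}"
        if isinstance(value, (dict, list)):
            result[key] = apply_mapping(value, explicit, path)
        elif path in explicit:
            result[explicit[path]] = value
        else:
            result[dynamic_name(key, value)] = value
    return result


source = {
    "first": "John",
    "last": "Doe",
    "grade": 8,
    "exams": [
        {"subject": "Maths", "test": "term1", "marks": 90},
        {"subject": "Biology", "test": "term1", "marks": 86},
    ],
}
mapped = apply_mapping(source, {"/exams/marks": "points_i"})
print(mapped["exams"][0])
# {'subject_s': 'Maths', 'test_s': 'term1', 'points_i': 90}
```

The explicit /exams/marks entry is checked before the catch-all rule, which matches the example output, where marks appears only as points_i.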

Note that the representation above shows the document after it has been retrieved from Couchbase but before it has been processed by the index pipelines. Because the index pipelines contain several stage types that can further transform the document, such as the Apache Tika Parser stage and the Field Mapping stage, the document that is ultimately indexed to Solr may differ from this representation. Running a few small crawls first is recommended to confirm that documents are indexed as required.