JSON Parsing Index Stage

A JSON Parsing Index stage (previously called the JSON Parser stage) parses JSON content from a document field into one or more new documents.

This stage uses Solr’s JsonRecordReader to split JSON into sub-documents. For details on how JsonRecordReader is used in Solr, see the Lucidworks blog post "Indexing Custom JSON Data".

Example Specification, Data, Results

Stage Specification

{
  "type": "json-parsing",
  "skip": false,
  "id": "json-parsing",
  "sourceField": "data",
  "splitPath": "/exams",
  "mappingRules": [
    {"path": "/first", "field": "first"},
    {"path": "/last", "field": "last"},
    {"path": "/grade", "field": "grade"},
    {"path": "/exams/subject", "field": "subject"},
    {"path": "/exams/test", "field": "test"},
    {"path": "/exams/marks", "field": "marks"}
  ]
}

Data

{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test": "term1",
      "marks": 90
    },
    {
      "subject": "Biology",
      "test": "term1",
      "marks": 86
    }
  ]
}

Results

Parsing this data, using the splitPath "/exams" and the six mapping rules above, produces two documents, one for each object in the list of exams.

The first document has the following field-value pairs:

* first : John
* last : Doe
* grade : 8
* test : term1
* subject: Maths
* marks : 90

The second document has the following field-value pairs:

* first : John
* last : Doe
* grade : 8
* test : term1
* subject: Biology
* marks : 86
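
The splitting behavior described above can be imitated in a few lines of Python. This is only a simplified sketch for illustration, not Solr's JsonRecordReader or the stage's actual implementation: the function name split_records is hypothetical, and it assumes a single-level splitPath and mapping paths that are at most two levels deep.

```python
import json

def split_records(data, split_path, mapping_rules):
    """Simplified imitation of splitting on an array and mapping paths to fields."""
    split_key = split_path.strip("/")
    parent_rules = {}  # keys above the split point -> output field
    child_rules = {}   # keys inside each split object -> output field
    for rule in mapping_rules:
        parts = rule["path"].strip("/").split("/")
        if len(parts) == 2 and parts[0] == split_key:
            child_rules[parts[1]] = rule["field"]
        elif len(parts) == 1:
            parent_rules[parts[0]] = rule["field"]

    # Values above the split point are copied into every sub-document.
    shared = {field: data[key] for key, field in parent_rules.items() if key in data}

    docs = []
    for child in data.get(split_key, []):
        doc = dict(shared)
        for key, field in child_rules.items():
            if key in child:
                doc[field] = child[key]
        docs.append(doc)
    return docs

mapping_rules = [
    {"path": "/first", "field": "first"},
    {"path": "/last", "field": "last"},
    {"path": "/grade", "field": "grade"},
    {"path": "/exams/subject", "field": "subject"},
    {"path": "/exams/test", "field": "test"},
    {"path": "/exams/marks", "field": "marks"},
]

record = json.loads("""
{
  "first": "John", "last": "Doe", "grade": 8,
  "exams": [
    {"subject": "Maths", "test": "term1", "marks": 90},
    {"subject": "Biology", "test": "term1", "marks": 86}
  ]
}
""")

docs = split_records(record, "/exams", mapping_rules)
for doc in docs:
    print(doc)
```

Each object under "exams" becomes its own document, and the fields mapped from above the split point (first, last, grade) are repeated in both.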

Configuration

Tip
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
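
One plausible reading of this tip is that API requests carry configuration as JSON, where a literal backslash must itself be escaped. The sketch below illustrates that reading under stated assumptions: the property name "delimiter" is hypothetical and is not a documented property of this stage.

```python
import json

# The two-character text backslash + t, exactly as it would be typed into a UI field.
ui_value = r"\t"

# "delimiter" is a hypothetical property name used only for illustration.
payload = json.dumps({"delimiter": ui_value})
print(payload)  # {"delimiter": "\\t"}

# Decoding the JSON body gives back the same two-character text.
decoded = json.loads(payload)["delimiter"]
```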