Index Stages API

The Index Stages API provides endpoints to:

  • manage index stage instances

  • list index stage configuration properties

  • test processing on a set of documents

An index pipeline consists of index stages. Each index stage has a name and a type. The name identifies the stage instance, and the type identifies its class. Every stage type defines a set of properties, which are configured separately for each index stage instance. See the section Index Pipeline Stages for a taxonomy of index stage types.

List Configuration Properties of Index Stage Types

The path for this request is one of:

api/apollo/index-stages/schema

api/apollo/index-stages/schema/<type>

When no type name is specified, a GET request returns a listing of all configuration properties for all index pipeline stages. When a type name is specified, a GET request returns the properties for that type.

The response serves as a template for the stage type: it lists the name and data type of every property, whether the property is required, and its default value, if any.
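The choice between the two schema paths can be sketched as a small helper. This is only an illustration: the base URL is copied from the curl examples in this section, and the function name `schema_url` is hypothetical, not part of the API.

```python
from typing import Optional
from urllib.parse import quote

# Base URL as used by the curl examples in this section; adjust for your host.
BASE_URL = "http://localhost:8764/api/apollo"


def schema_url(stage_type: Optional[str] = None) -> str:
    """Return the schema endpoint for all stage types, or for one type."""
    url = f"{BASE_URL}/index-stages/schema"
    if stage_type:
        # Percent-encode the type name so it cannot alter the path.
        url += "/" + quote(stage_type, safe="")
    return url


print(schema_url())                   # properties of every stage type
print(schema_url("regex-extractor"))  # properties of a single stage type
```

A GET request to either URL, with the credentials shown in the curl examples, would then return the schema JSON.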

List an Index Stage

The path for this request is one of:

api/apollo/index-stages/instances/

api/apollo/index-stages/instances/<stageId>

When a stage instance is specified, a GET request returns the properties of that stage. When no stage instance is specified, it returns the currently configured properties of all index pipeline stage instances.

If no index stage instances have been created, this request returns the empty list "[ ]".
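Because the response is a list when all instances are requested and a single object when one stage is requested, client code may want to normalize the two shapes. The following is a minimal sketch under that assumption; `index_by_id` is a hypothetical helper name, and it relies on each stage object carrying an "id" property as in the examples in this section.

```python
import json


def index_by_id(body: str) -> dict:
    """Map stage id -> stage properties for either response shape."""
    parsed = json.loads(body)
    # All instances come back as a list; one instance comes back as an object.
    stages = parsed if isinstance(parsed, list) else [parsed]
    return {stage["id"]: stage for stage in stages}


# The empty list returned when no instances exist yields an empty mapping.
print(index_by_id("[ ]"))
print(index_by_id('{"type": "tika-parser", "id": "conn_tika", "skip": false}'))
```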

Create, Update or Delete an Index Stage

The path for this request is:

api/apollo/index-stages/instances/<stageId>

where <stageId> is the name of an index stage instance.

  • POST - create a new stage. Returns a listing of the specified properties. No <stageId> path parameter is necessary; the stageId is determined from the information in the POST body.

  • PUT - update an existing stage. Returns a listing of the specified properties.

  • DELETE - remove an index pipeline stage.
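The three operations above differ only in HTTP method, path, and whether a JSON body is sent. A rough sketch of that mapping using Python's standard library follows; the helper names are hypothetical, the base URL is taken from the curl examples, and authentication is left out for brevity. The functions only build request objects and do not contact a server.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

# Base URL as used by the curl examples in this section (an assumption).
INSTANCES_URL = "http://localhost:8764/api/apollo/index-stages/instances"


def _build(method: str, url: str, payload: Optional[dict] = None) -> Request:
    """Build a request, attaching a JSON body only when a payload is given."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {} if data is None else {"Content-Type": "application/json"}
    return Request(url, data=data, headers=headers, method=method)


def create_stage(stage: dict) -> Request:
    # POST: no <stageId> in the path; the body identifies the stage.
    return _build("POST", INSTANCES_URL, stage)


def update_stage(stage_id: str, stage: dict) -> Request:
    # PUT: the existing stage is addressed by <stageId>.
    return _build("PUT", f"{INSTANCES_URL}/{quote(stage_id, safe='')}", stage)


def delete_stage(stage_id: str) -> Request:
    # DELETE: no body is sent.
    return _build("DELETE", f"{INSTANCES_URL}/{quote(stage_id, safe='')}")
```

Each returned object could be passed to `urllib.request.urlopen` once credentials have been added.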

Send a Test Document through an Index Stage

A POST request with a payload containing a list of PipelineDocument objects to the endpoint:

api/apollo/index-stages/instances/<stageId>/<collectionName>/test

sends all documents through the index stage and returns a list of the resulting PipelineDocument objects after processing.
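As an illustration of the request shape, the sketch below builds the test URL and a payload in the form used by the curl example later in this section (an "id" plus a list of name/value fields). The helper names `stage_test_url` and `pipeline_document` are hypothetical, and the base URL is copied from the curl examples.

```python
import json
from urllib.parse import quote

# Base URL as used by the curl examples in this section (an assumption).
BASE_URL = "http://localhost:8764/api/apollo"


def stage_test_url(stage_id: str, collection_name: str) -> str:
    """Return the test endpoint for one stage and one collection."""
    return (
        f"{BASE_URL}/index-stages/instances/"
        f"{quote(stage_id, safe='')}/{quote(collection_name, safe='')}/test"
    )


def pipeline_document(doc_id: str, fields: dict) -> dict:
    """Build one document in the id-plus-fields shape shown in the examples."""
    return {
        "id": doc_id,
        "fields": [{"name": name, "value": value} for name, value in fields.items()],
    }


# The POST body is a JSON list of documents.
payload = json.dumps([pipeline_document("myDoc4", {"body": "This is a simple document."})])
print(stage_test_url("conn_tika", "docs"))
print(payload)
```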

Examples

View the configuration properties for index stage type "regex-extractor":

REQUEST

curl -u user:pass http://localhost:8764/api/apollo/index-stages/schema/regex-extractor

RESPONSE

{
  "type" : "object",
  "title" : "Regex Field Extraction",
  "description" : "This stage allows you to extract entities using regular expressions",
  "properties" : {
    "rules" : {
      "type" : "array",
      "title" : "Regex Rules",
      "items" : {
        "type" : "object",
        "required" : [ "pattern" ],
        "properties" : {
          "source" : {
            "type" : "array",
            "title" : "Source Fields",
            "items" : {
              "type" : "string"
            }
          },
          "target" : {
            "type" : "string",
            "title" : "Target Field"
          },
          "pattern" : {
            "type" : "string",
            "title" : "Regex Pattern",
            "format" : "regex"
          },
          "annotateAs" : {
            "type" : "string",
            "title" : "Annotation Name"
          }
        }
      }
    }
  }
}
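The "required" list in this schema can be used for a quick client-side check before creating a stage. The sketch below is a hand-rolled check, not a JSON Schema validator: it reads only the "required" list for rule items, uses an abbreviated copy of the response above, and the function name is hypothetical.

```python
# Abbreviated copy of the schema response shown above.
REGEX_EXTRACTOR_SCHEMA = {
    "type": "object",
    "properties": {
        "rules": {
            "type": "array",
            "items": {"type": "object", "required": ["pattern"]},
        }
    },
}


def missing_rule_properties(schema: dict, stage: dict) -> list:
    """Return (rule index, property name) pairs for absent required properties."""
    required = schema["properties"]["rules"]["items"].get("required", [])
    problems = []
    for index, rule in enumerate(stage.get("rules", [])):
        for name in required:
            if name not in rule:
                problems.append((index, name))
    return problems


stage = {
    "id": "storagesize-regex-extractor",
    "type": "regex-extractor",
    "rules": [{"source": ["name"], "target": "storage_size_ss"}],  # no "pattern"
}
print(missing_rule_properties(REGEX_EXTRACTOR_SCHEMA, stage))
```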

See all defined index pipeline stages, regardless of type:

REQUEST

curl -u user:pass http://localhost:8764/api/apollo/index-stages/instances

RESPONSE

[{
  "type" : "tika-parser",
  "id" : "conn_tika",
  "includeImages" : true,
  "flattenCompound" : false,
  "addFailedDocs" : true,
  "addOriginalContent" : true,
  "skip" : false
},
{
  "type" : "index-logging",
  "id" : "detailed-logging",
  "detailed" : true,
  "skip" : false,
  "label" : "detailed-index-logging"
}]

See details of the index stage named 'conn_tika':

REQUEST

curl -u user:pass http://localhost:8764/api/apollo/index-stages/instances/conn_tika

RESPONSE

{
  "type" : "tika-parser",
  "id" : "conn_tika",
  "includeImages" : true,
  "flattenCompound" : false,
  "addFailedDocs" : true,
  "addOriginalContent" : true,
  "skip" : false
}

Create an index stage:

REQUEST

curl -u user:pass -X POST -H 'Content-type: application/json' -d '{"id": "storagesize-regex-extractor", "type":"regex-extractor", "rules": [{"source":["name"], "target":"storage_size_ss", "pattern":"(\\d{1,20}\\s{0,3}(GB|MB|TB|KB|mb|gb|tb|kb))", "annotateAs":"storage_size"}]}' http://localhost:8764/api/apollo/index-stages/instances

RESPONSE

{
  "type" : "regex-extractor",
  "id" : "storagesize-regex-extractor",
  "rules" : [ {
    "source" : [ "name" ],
    "target" : "storage_size_ss",
    "pattern" : "(\\d{1,20}\\s{0,3}(GB|MB|TB|KB|mb|gb|tb|kb))",
    "annotateAs" : "storage_size"
  } ],
  "skip" : false
}
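To see what the rule's pattern matches, it can be tried outside the pipeline. The sketch below decodes the JSON-escaped pattern and applies it with Python's `re` module. It assumes that this particular pattern behaves the same in Python as in the regular expression engine the stage uses, and the helper name is hypothetical.

```python
import json
import re

# The "pattern" value exactly as it appears in the JSON request body.
RULE_JSON = r'{"pattern": "(\\d{1,20}\\s{0,3}(GB|MB|TB|KB|mb|gb|tb|kb))"}'

# JSON decoding turns each doubled backslash into a single one.
pattern_text = json.loads(RULE_JSON)["pattern"]
storage_size = re.compile(pattern_text)


def first_storage_size(text: str):
    """Return the first storage-size match in text, or None."""
    match = storage_size.search(text)
    return match.group(1) if match else None


print(first_storage_size("Seagate 500 GB drive"))  # 500 GB
print(first_storage_size("16gb memory stick"))     # 16gb
print(first_storage_size("2 Tb disk"))             # None: mixed case is not listed
```

The pattern lists only all-uppercase and all-lowercase units, so mixed-case forms such as "Tb" are not matched.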

Delete an index stage:

REQUEST

curl -u user:pass -X DELETE http://localhost:8764/api/apollo/index-stages/instances/storagesize-regex-extractor

No response is returned. To check that the stage is no longer defined, list all index stage instances.

Send a document through the index stage named 'conn_tika':

REQUEST

curl -u user:pass -X POST -H "Content-Type: application/json" -d '[{"id": "myDoc4","fields": [{"name":"title", "value": "Another little document document"},{"name":"body", "value": "This is a simple document."}]}]' http://localhost:8764/api/apollo/index-stages/instances/conn_tika/docs/test

RESPONSE

[ {
  "id" : "7b8a1d5b-9e42-40eb-8059-5804c4b4fc6b",
  "fields" : [ {
    "name" : "id",
    "value" : "myDoc4",
    "metadata" : { },
    "annotations" : [ ]
  }, {
    "name" : "parsing_time",
    "value" : [ "java.lang.Long", 0 ],
    "metadata" : { },
    "annotations" : [ ]
  }, {
    "name" : "parsing",
    "value" : "no_raw_data",
    "metadata" : {
      "creator" : "tika-parser"
    },
    "annotations" : [ ]
  }, {
    "name" : "fields",
    "value" : [ "java.util.ArrayList", [ {
      "name" : "title",
      "value" : "Another little document document"
    }, {
      "name" : "body",
      "value" : "This is a simple document."
    } ] ],
    "metadata" : { },
    "annotations" : [ ]
  } ],
  "metadata" : { },
  "commands" : [ ]
} ]
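A response like the one above can be reduced to a simple name-to-value mapping for inspection. The following is a minimal sketch using an abbreviated copy of that response; `field_values` is a hypothetical helper, and if a field name occurred more than once the last value would win.

```python
import json

# Abbreviated copy of the response shown above.
SAMPLE_RESPONSE = """
[ {
  "id" : "7b8a1d5b-9e42-40eb-8059-5804c4b4fc6b",
  "fields" : [
    { "name" : "id", "value" : "myDoc4", "metadata" : { }, "annotations" : [ ] },
    { "name" : "parsing", "value" : "no_raw_data",
      "metadata" : { "creator" : "tika-parser" }, "annotations" : [ ] }
  ],
  "metadata" : { },
  "commands" : [ ]
} ]
"""


def field_values(document: dict) -> dict:
    """Map each field name in a returned document to its value."""
    return {field["name"]: field["value"] for field in document["fields"]}


documents = json.loads(SAMPLE_RESPONSE)
for document in documents:
    print(document["id"], field_values(document))
```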