Logstash Connector and Datasource Configuration

Note
This connector is deprecated in Fusion 2.4, and removed in Fusion 3.0.

Logstash is an open-source log management tool that takes input from one or more log files, parses and filters it according to a set of configurations, and produces a stream of JSON objects as output. The Logstash connector uses Logstash 1.4.2 to send documents to a Fusion pipeline.

The Fusion archive includes a Logstash deployment, located in the directory fusion/3.1.x/apps/connectors/resources/lucid.logstash/logstash-1.4.2. This deployment includes a custom Ruby class, lucidworks_pipeline_output.rb, which collects Logstash outputs and sends them to a Fusion pipeline.

Configuration

The connector is "lucid.logstash" and the plugin type is "logstash".

The connector takes a required property, "Logstash Configuration", which contains the Logstash configuration script. The Fusion UI Admin Tool provides a syntax-aware input box so that you can create and edit your Logstash configuration directly from Fusion.

The Logstash configuration script has three clauses:

  • input

  • filter

  • output

The connector takes an optional property "buffer_size" which is the number of documents to buffer. When the buffer limit is reached, all documents in the buffer are sent to Solr for batch indexing. The default buffer size is 10. In the Fusion UI Admin Tool, choosing the "Advanced" option on the Datasource configuration panel exposes an input box labeled "Buffer size".

Running a Datasource Job

Once started, a Logstash connector continues to run indefinitely. The Fusion UI Admin Tool Datasource panel provides controls to start, stop, abort, and clear the datasource. For file-based inputs, Logstash tracks the last line read in each file in a "sincedb" file. Clearing the datasource requires removing these files.
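If you want these files in a known location so they are easy to find and remove, the file input's sincedb_path option can set where Logstash writes them. The following is only an illustrative sketch; the sincedb_path value shown is a placeholder, not a Fusion default.

input {
    file {
        path => "/var/log/mylogfile.log"
        # write the position-tracking (sincedb) file to a known location
        # so it can be deleted when the datasource is cleared
        sincedb_path => "/tmp/fusion-logstash-sincedb"
    }
}
filter {
}
output {
}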

Examples

The Logstash configuration is written in the Logstash configuration language. It consists of the following three clauses:

  • input - specifies how to listen for incoming data

  • filter - defines data-specific Logstash filters

  • output - defines additional outputs, as needed

All three clauses must be present, but the filter and output clauses can be empty.

Simple Logstash configuration for reading an input file:

input {
    file {
        path => "/var/log/mylogfile.log"
    }
}
filter {
}
output {
}

Simple Logstash configuration that listens for incoming data on a specific port:

input {
    tcp {
        port => 10234
    }
}
filter {
}
output {
}

Filter clause for pure JSON data:

filter {
    json { source => "message" }
}
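The filter clause is not limited to JSON input. As a sketch for plain-text log lines, assuming your logs are in the Apache combined format, a grok filter can use the built-in COMBINEDAPACHELOG pattern to break each message into fields:

filter {
    grok {
        # parse each line as an Apache combined-format access log entry
        match => [ "message", "%{COMBINEDAPACHELOG}" ]
    }
}

The output clause is for additional outputs only; the connector's custom output already sends documents to the Fusion pipeline. For example, the stdout output plugin that ships with Logstash 1.4.2 can echo each event to the console for debugging:

output {
    # also print each event to the console for debugging
    stdout { codec => rubydebug }
}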

Tutorial

The Lucidworks blog post Data Analytics Using Fusion and Logstash walks through the process of developing a Logstash script and configuring and running a datasource.