Elasticsearch Logstash Kibana

September 24, 2016

What is ELK for?

When you have finally produced your awesome code and deployed it on your favorite server, you can celebrate! Keep in mind, though, that your work is not completely done. Now that your website is up and running, you have to maintain it, keep it available and avoid downtime. Depending on your error forecast, your traffic estimates or the bugs you left behind, you will probably have to fix a few things. What are the best practices to avoid pitfalls?

To make the right decision, you have to know what is happening under the hood. In a car, there are various indicators on the dashboard: battery charging alert, transmission temperature, oil pressure warning, door ajar. And when you get your car fixed, a professional may have another scan tool which pinpoints exactly where the malfunction is. It is the same for a website or a server: you need to know exactly where errors, slowdowns and intrusions occur. If you do not know where and when your server crashes, you will probably have to search exhaustively through your code to find a bug, and this is not what you want. A pretty neat stack that deals very well with this is ELK, which stands for Elasticsearch, Logstash, Kibana.


Logstash

[image: logstash]

This is where your input enters first. Logstash is a data collection engine that can deal with all kinds of input. While collecting, data can be filtered, parsed and forwarded. It is very useful for dealing with logs from nginx, syslog and so on. Logstash can be seen as a pipeline processing system, equivalent to:

tail -f foo.log | grep bar | awk '{print $2 $5}' >> foo-pretty.log
  • tail - continuously read the logs
  • grep - filter the incoming data
  • awk - select only relevant information
  • >> - redirect the output stream into another file
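
As a rough sketch, the same pipeline could be written as a Logstash configuration using the file input, the grok filter and the file output. The paths and the pattern below are made up purely for illustration:

input {
  file {
    path => "/var/log/foo.log"            # tail -f foo.log
  }
}
filter {
  grok {
    # the grep/awk step: extract named fields from each matching line
    match => { "message" => "%{WORD:first} bar %{WORD:second}" }
  }
}
output {
  file {
    path => "/var/log/foo-pretty.log"     # >> foo-pretty.log
  }
}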

Obviously, Logstash lets you establish more advanced and complex rules than that. It also improves searching by associating each part of the input with the appropriate semantics. Behind the scenes, this is the work of Grok.

Grok

to understand intuitively or by empathy, to establish rapport with

Robert A. Heinlein

Grok is a tool to match a string to fields. It lets you build or use existing sets of named regular expressions and then helps you use them to match strings.

The purpose is to bring more semantics to an input. In other words, give a meaning to a string. Log and event data can be turned into structured data.

For example, if you look at this log line:

Nov 21 17:54:13 scorn kernel: pid 541 (expect), uid 3020: exited on signal 3

As a human, you can read a timestamp, a hostname, a process, a number, a program name, a uid, and an exit message. You might represent this in words as:

TIMESTAMP HOST PROGRAM: pid NUMBER (PROGRAM), uid NUMBER: exited on signal NUMBER

The syntax for a grok pattern is %{SYNTAX:semantic}: SYNTAX is the name of the pattern to match against and semantic is the name of the field that will store the matched value. For the log line above, the match pattern is:

%{SYSLOGBASE} pid %{NUMBER:pid} \(%{WORD:program}\), uid %{NUMBER:uid}: exited on signal %{NUMBER:signal}

Now suppose you have the following input:

[2016-06-12 18:26:51.247] [TRACE] My message

The corresponding match pattern is:

\[%{TIMESTAMP_ISO8601:logdate}\] \[%{LOGLEVEL:level}\] %{GREEDYDATA:message}

Each string in %{} is evaluated and replaced with the regular expression it represents.
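
In a Logstash configuration, such a pattern typically lives in a grok filter applied to the raw message field. Here is a minimal sketch for the bracketed format above (the trailing text is captured into a new msg field so the original message field is left untouched):

filter {
  grok {
    # split "[timestamp] [level] text" into logdate, level and msg
    match => { "message" => "\[%{TIMESTAMP_ISO8601:logdate}\] \[%{LOGLEVEL:level}\] %{GREEDYDATA:msg}" }
  }
}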

Here are some predefined patterns.

String                                  | Pattern
----------------------------------------|---------------------
Nov 21 17:54:13 scorn kernel:           | %{SYSLOGBASE}
541                                     | %{NUMBER}
2016-06-12 18:26:51.247                 | %{TIMESTAMP_ISO8601}
randomword                              | %{WORD}
ALERT, TRACE                            | %{LOGLEVEL}
ca328c09-f6d5-4d99-8e30-28be8c18c1b1    | %{UUID}
Thursday, Thu                           | %{DAY}
43:5b:39:c9:3f:cd                       | %{MAC}
Error on line 10: undefined variable c  | %{GREEDYDATA}

You can find the list of all classic patterns here.

For testing purposes, try http://grokdebug.herokuapp.com/.


Once we have filtered and parsed our input stream, we have to store all the data somewhere that makes it simple to run fast searches against various criteria. This is where Elasticsearch comes on stage.


Elasticsearch

Elasticsearch is an open-source full-text search and analytics engine. Through a RESTful API, it allows you to store, search and analyse large volumes of data. It is more than simple text search: queries can be enriched, intelligent and fast. In our case, we want to be able to quickly find a particular kind of error or message in the logs.

[image: elasticsearch]

Notice that Elasticsearch is a near real time (NRT) search platform. This means there is a slight latency, normally about one second, between the time you index a document and the time it becomes searchable. Also, Elasticsearch is not a primary datastore and does not support ACID transactions. That is more than acceptable for our purpose.
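
As a small illustration of the RESTful API, here is how you could index a document and search for it by hand, assuming Elasticsearch listens on localhost:9200 (the index and type names below are made up):

# index a document; the index is created on the fly
curl -XPUT 'http://localhost:9200/myindex/logs/1' -d '{"level": "error", "msg": "Something broke"}'

# search it back; because of the NRT refresh, it can take about a second to appear
curl 'http://localhost:9200/myindex/_search?q=level:error'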

If you want to know more about it, please refer to the official documentation.


Kibana

Kibana is a browser-based analytics and search interface for Elasticsearch. It offers the possibility to create custom dashboards with pie charts, bar charts, line charts and scatter plots. It provides powerful and beautiful data visualization for all your event-based data, and makes it easy to filter, customize and look for a specific event.


[image: kibana]


Let's start

The server

First, we'll set up a server with restify and the bunyan logger. Bunyan supports several types of streams: you can redirect logs to a file, send them over TCP or UDP, or write them directly to stdout. For our tests, we will use the UDP protocol on port 5005 together with stdout.

index.js

'use strict';

const restify = require('restify');
const bunyan = require('bunyan');
const log = bunyan.createLogger({
  name: 'website1',
  streams: [
    {
      stream: process.stdout // log into the console
    },
    {
      type: 'raw',
      stream: require('bunyan-logstash').createStream({
        host: '127.0.0.1',
        port: 5005
      })
    }
  ]
});

function welcome(req, res, next) {
  log.info('This is an informational message');
  // Deliberately reference an undefined variable to trigger
  // the 'uncaughtException' handler below.
  error;
}

const server = restify.createServer();
server.get('/', welcome);

server.on('uncaughtException', function (req, res, route, error) {
  log.error(error, 'Error message');
  res.send('An error occurred');
});

server.listen(8080, function () {
  console.log('%s listening at %s', server.name, server.url);
});

Don't forget to install all the necessary packages and launch the server.

npm i --save restify bunyan bunyan-logstash
node index.js

The Stack

Now that the server has been configured, let's install the ELK stack. In order to make the installation smoother, we will use the Docker engine. Clone the ELK stack for Docker from deviantony:

git clone git@github.com:deviantony/docker-elk.git

To run it, you will need Docker Compose. On Ubuntu:

sudo apt-get install docker-compose

This repository contains the three Docker images: Elasticsearch, Logstash and Kibana. Alternatively, you can install these three manually.

I had to pin the version of Logstash in the Dockerfile because of compatibility problems: FROM logstash:2.4.0

In order to accept the UDP protocol on port 5005, you have to declare it in the Compose configuration file.

docker-compose.yml

elasticsearch:
  image: elasticsearch:latest
  command: elasticsearch -Des.network.host=0.0.0.0
  ports:
    - "9200:9200"
    - "9300:9300"
logstash:
  image: logstash:latest
  command: logstash -f /etc/logstash/conf.d/logstash.conf
  volumes:
    - ./logstash/config:/etc/logstash/conf.d
  ports:
    - "5000:5000"
    - "5005:5005/udp" # <- here
  links:
    - elasticsearch
kibana:
  build: kibana/
  volumes:
    - ./kibana/config/:/opt/kibana/config/
  ports:
    - "5601:5601"
  links:
    - elasticsearch

Logstash configuration

The configuration file is organized in three parts: the input, the filters and the output. For the input, we add UDP on port 5005 and increase the buffer_size to accept larger messages. Then, we have Logstash parse the JSON message in order to add three more fields:

  • name - the application name
  • msg - the actual log message
  • level - the error level

Finally, the output is redirected both to the console and to Elasticsearch on port 9200.

logstash/config/logstash.conf

input {
  tcp {
    port => 5000
  }
  udp {
    port => 5005
    buffer_size => 8192
  }
}

filter {
  json {
    source => "message"
    target => "json"
    add_field => {
      "name"  => "%{[json][name]}"
      "msg"   => "%{[json][message]}"
      "level" => "%{[json][level]}"
    }
  }
}

output {
  stdout { }
  elasticsearch {
    hosts => "elasticsearch:9200"
  }
}

Start the containers:

docker-compose up -d

Check that your containers are running:

$ docker ps -a
CONTAINER ID   IMAGE                 COMMAND                  CREATED          STATUS         PORTS                                            NAMES
8b3e108d2606   logstash:latest       "/docker-entrypoint.s"   15 seconds ago   Up 14 seconds  0.0.0.0:5000->5000/tcp, 0.0.0.0:5005->5005/udp   dockerelk_logstash_1
df6d5c5676c9   dockerelk_kibana      "/docker-entrypoint.s"   15 seconds ago   Up 14 seconds  0.0.0.0:5601->5601/tcp                           dockerelk_kibana_1
6d9015a4109f   elasticsearch:latest  "/docker-entrypoint.s"   15 seconds ago   Up 14 seconds  0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp   dockerelk_elasticsearch_1

You can try sending your first UDP message to the stack by redirecting a JSON document to /dev/udp/localhost/5005:

echo -n "{\
  \"@timestamp\":\"$(date +"%Y-%m-%dT%T.%3NZ")\",\
  \"message\":\"The message\",\
  \"tags\":[\"bunyan\"],\
  \"source\":\"console\",\
  \"level\":\"info\",\
  \"name\":\"back-to-the-basics\",\
  \"hostname\":\"my-machine\",\
  \"pid\":20784\
}" > /dev/udp/localhost/5005
  • -n - Do not print the trailing newline character
  • %3N - the first three digits of the nanoseconds field, i.e. millisecond precision
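
Since the Logstash configuration above also writes to stdout, one way to check that the message went through is to look at the Logstash container's logs:

docker-compose logs logstash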

You can now create new logs by hitting localhost:8080.
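
For instance, from a shell:

curl -i http://localhost:8080/

Each request runs the welcome handler, which logs an info message and then deliberately triggers the error path, so both an info and an error entry should end up in Elasticsearch.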


Elasticsearch

We won't dive too deep into Elasticsearch here, because all the actions we need can be performed through Kibana.

After creating the index in Kibana, you can ask Elasticsearch for the index configuration:

curl -XGET 'http://localhost:9200/logstash-2016.09.24'

If you want to run a search directly against the API, here is an example (piped through a JSON pretty-printer):

curl -XGET 'http://localhost:9200/logstash-2016.09.24/_search?q=level:error' | json

To list the indices:

curl 'localhost:9200/_cat/indices?v'

Kibana

In Kibana, you first need to configure an index pattern; choose the index containing time-based events. You should then normally see three messages:


[screenshot: the three messages in Kibana]


From here on out, you can save any custom filter you like. For instance, create a filter for level:error or name:website1 and save it with the "save search" button.
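
The search bar accepts Lucene query syntax, so a saved search combining both criteria might look like this (the field names are the ones added in the Logstash filter):

name:website1 AND level:error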



Conclusion

With such a setup, it becomes very easy to skim through all your logs for a specific website, a type of error or even a status code. The common pitfall is to think it will be easy to look for an error without this kind of search engine, especially when you have plenty of applications.



References:

  1. Logstash - elastic.co
  2. Elasticsearch Reference
  3. Docker ELK stack - github.com
  4. Logstash TCP stream for Bunyan - github.com
  5. ELK: Elasticsearch, Logstash and Kibana at Wikimedia - youtube.com
  6. Wikimedia kibana - logstash-beta.wmflabs.org
  7. Elasticsearch - github.com
  8. Grok debugger - grokdebug.herokuapp.com
  9. Basic concepts - elastic.co
  10. Create a Grok Pattern - grok.nflabs.com
  11. Oniguruma - github.com
  12. Error Handling in Node.js - joyent.com