My software development journey

Elastic stack setup for Docker

December 28, 2019

Configuring Elastic stack to monitor Docker containers

In this post, I will focus on setting up and configuring the Elastic stack, along with the considerations behind my configuration choices. For an overview of what the Elastic stack is, take a look at my previous posts in this series.

Installing Elastic stack in Docker

Since I already had a Docker environment available, I was looking for a solution that could be deployed there. I ended up using Docker-elk as a starting template for setting up an Elastic stack in Docker. The project is well documented and makes installation and initial setup of the Elastic stack rather easy.
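
To give an idea of the layout, here is a heavily trimmed sketch of a Compose file along the lines of what docker-elk provides. The image tags and the single-node setting are my own simplifications for illustration; the actual project builds its own images and layers additional configuration files on top.

version: '3.2'

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.5.1  # example tag
    environment:
      discovery.type: single-node
    ports:
      - "9200:9200"

  logstash:
    image: docker.elastic.co/logstash/logstash:7.5.1
    ports:
      - "5044:5044"  # Beats input, used by Filebeat later in this post
    depends_on:
      - elasticsearch

  kibana:
    image: docker.elastic.co/kibana/kibana:7.5.1
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch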

Getting data to Logstash

After installation, the first hurdle I faced was how to get the container logs into Elasticsearch. My first attempt was to configure the Docker GELF logging driver to write to Logstash. At that time my application was still writing plain text to the console, which meant that a multiline log entry would show up in Elasticsearch as a series of log events, not necessarily in the order they were written. After researching the problem, I found out that Docker logging drivers in general do not support multiline logging, so stitching log events back together would have to be done in the Elastic stack.
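
For reference, wiring an application container to the GELF driver looks roughly like the snippet below in a Compose file. The UDP port and the matching gelf input on the Logstash side are assumptions for illustration; this is not part of my final setup.

  my-app:  # hypothetical application service
    image: my-app:latest
    logging:
      driver: gelf
      options:
        # the address is resolved by the Docker daemon on the host,
        # so it must point to where the Logstash gelf input is published
        gelf-address: "udp://localhost:12201"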

For my second attempt, I switched the Docker logging driver back to the default json-file driver and installed Filebeat. Filebeat was configured to use the container input and to handle multiline events. While configuring Logstash to parse information out of the log events, I realized it would be much easier to reconfigure my application to output each log event as a single line in JSON format, so that is what I settled on for my final solution. Logging events in JSON format allows for easy parsing of event data in Logstash and removes the need for multiline handling.
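
To illustrate, a single-line JSON event produced by Serilog's rendered compact JSON formatter looks roughly like this; the exact properties depend on the formatter and enrichers in use, and the values below are made up.

{"@t":"2019-12-28T09:15:30.0000000Z","@m":"Failed to reach cache, retrying","@l":"Warning","SourceContext":"MyApp.CacheClient"}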

Another bonus of using Filebeat was the easy inclusion of container information into log events by adding Docker metadata to the processing pipeline. As noted in the documentation, Filebeat requires access to the Docker Unix socket, which in my case had to be mapped into the container running Filebeat. Note that Filebeat has two inputs available for Docker, the Docker input and the container input. As stated in the documentation, the Docker input is deprecated and replaced by the container input. See my Filebeat configuration below, followed by the Compose wiring that gives Filebeat access to the log files and the Docker socket.

filebeat.config:
  modules:
    path: ${path.config}/modules.d/*.yml
    reload.enabled: false

filebeat.inputs:
    #Configure container input to monitor container logs
  - type: container
    paths: 
      - '/var/lib/docker/containers/*/*.log'

processors:
- add_docker_metadata:

  # Whitelist the fields of interest
- include_fields:
    fields: ["container.id", "container.image.name", "container.name",
             "host.name", "log.file.path", "message"]

output.logstash:
  hosts: ["logstash:5044"]

setup.kibana:
  host: 'kibana:5601'
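
Because the container input reads Docker's JSON log files directly and add_docker_metadata talks to the Docker daemon, both need to be mounted into the Filebeat container. In my docker-compose file the relevant part looks roughly like this; the image tag and service name are illustrative.

  filebeat:
    image: docker.elastic.co/beats/filebeat:7.5.1  # example tag
    user: root
    volumes:
      - ./filebeat.yml:/usr/share/filebeat/filebeat.yml:ro
      # log files read by the container input
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      # Docker socket used by add_docker_metadata
      - /var/run/docker.sock:/var/run/docker.sock:ro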

Processing log data in Logstash

My processing pipeline had to handle two distinct use cases. The first was data from applications running in the same Docker environment that did not arrive in JSON format. The second was applications configured to write log entries in JSON format; these logs included additional data from ASP.NET, and I wanted to limit the number of different fields stored in the database. My goal was to store an entry as-is when it was not in JSON format, and to allow only whitelisted fields when it was. To avoid losing information from JSON entries, the complete entry is stored as well. For log data in JSON format, I also wanted to use the timestamp written in the log entry rather than the default behavior of using the time of arrival into the Logstash processing pipeline. This timestamp is later used in Kibana to sort the entries chronologically.

filter {
  json {
    id => "json"
    source => "message"
    target => "[@metadata][log_entry]"
    skip_on_invalid_json => true
  }

  if [@metadata][log_entry] {
    date {
      id => "apply_log_timestamp"
      match => ["[@metadata][log_entry][@t]", "ISO8601"]
    }

    mutate {
      id => "copy_allowed_fields"
      copy => {
        "[@metadata][log_entry][@l]" => "log_level"
        "message" => "[log_entry][source_json]"
        #Other whitelisted properties
      }
    }

    mutate {
      id => "replace_message"
      replace => { 
        "message" => "%{[@metadata][log_entry][@m]}"
      }
    }
  }
}

As the first step of processing, the json filter tries to parse the received log entry from the message field. On success, the result is stored in the [@metadata][log_entry] field. I decided to store it under @metadata since anything stored there is not forwarded to the output, which in turn allowed me to implement whitelisting of fields.

The next step is a conditional check whether a parsed entry exists. In case of a parsing error, it will be missing and the if block is skipped. With the date filter plugin, the timestamp from the application log entry is used to overwrite the event's timestamp. In my case Serilog is configured to store it in the field @t.

After setting the date, whitelisted fields are copied to target fields using the mutate filter plugin. In my sample above, I overwrite the log_level field with the level from the parsed application log entry and store the full original message in the source_json field. In this way, no logging data is lost, yet only the desired fields are exposed.

As a final step, the message field is replaced with the rendered text message from the application. Although the mutate filter allows both copy and replace operations to be defined in a single block, that did not work for me, presumably because mutate applies its operations in a fixed internal order (with copy executed after replace) rather than in the order they appear in the configuration, so I had to split them into two consecutive mutate filters.

Configuring Index Lifecycle Management in Elasticsearch

Since we are dealing with logging data, I also wanted to have an automatic data removal strategy. In my case, I was aiming to keep 90 days' worth of logging data and delete it afterward. In Elasticsearch this can be configured using Index Lifecycle Management.

Elasticsearch defines four stages for an index, based on how its data is used:

  • Hot - index is actively updated and queried
  • Warm - index is no longer updated, but still being queried
  • Cold - index is no longer updated and seldom queried
  • Delete - index is no longer needed and can be deleted

To transition an index between the different stages automatically, a policy needs to be set up for the index. The policy defines the conditions under which a rollover to a new index occurs, e.g. when a certain index size is reached. Afterward, the policy needs to be applied to an index template. Both steps can be done via the Dev Tools console in Kibana.

In the end, I created a policy which rolls an index over daily or once it reaches 1 GB in size. Afterward, the index goes into the warm phase and is deleted after 90 days. In practice this means that a new index is created every day; my idea was that deletion would be more granular this way. Unfortunately, after running this setup for a while, I noticed that querying the logging data across several indexes - that is, across several days - would often fail. I have not fully researched the problem, but my current suspicion is that the failures are related to having so many indexes. So with that warning out of the way, here is the command to create the policy I described.

PUT _ilm/policy/hot-warm-delete-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "1gb",
            "max_age" : "1d"
          },
          "set_priority": {
            "priority": 50
          }
        }
      },
      "warm": {
        "min_age": "0ms",
        "actions": {
          "readonly" : { },
          "set_priority": {
            "priority": 25
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

And to apply the policy to an index template, use the following command. Note that I am targeting the default index name used by Logstash.

PUT _template/logging_template
{
  "index_patterns": ["Logstash-*"],
  "order" : 1,
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "index.lifecycle.name": "hot-warm-delete-policy"
  }
}
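
To check which lifecycle phase an index is currently in, the ILM explain API can be queried from the same Dev Tools console:

GET logstash-*/_ilm/explain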

Conclusion

The Elastic stack can be composed of many different components, so configuring it to get the desired result does take some effort. While working on my setup, I noticed that there were usually several approaches to achieve a given goal, which means a good deal of time is spent trying the different approaches to see which one gets you closest to what you want. Another pitfall to consider when configuring your pipeline is that some approaches you find on the internet are already obsolete in the latest Elastic stack version - e.g. index lifecycle management seems to be a rather recent development and used to be handled differently.

As already mentioned, I did encounter some issues with my current Elasticsearch index setup, so keep that in mind if you plan to implement index lifecycle management in a similar fashion.