My software development journey

Elastic stack at a glance

December 14, 2019

A short tour of Elastic stack

Another problem in the world of containers, and especially microservices, is aggregating all the log files so that potential issues in your applications are easier to find. While investigating the available tools, I ended up using Elastic stack. One reason I chose it is that it can be easily deployed on our internal infrastructure and, at least in its basic version, doesn't require any license fees. So let me give you a short overview of what Elastic stack is and what it can do for you.

Elastic stack (previously also known as ELK stack) was initially built to process and aggregate logs from different sources. Since then it has expanded to general monitoring and data collection, creating dashboards based on collected data, etc. It is composed of several components, which play nicely with each other, but can also work independently from one another. I list them here, with a more detailed view of each to follow:

  • Elasticsearch - a NoSQL document database where collected and processed data is stored
  • Kibana - a web frontend used to display and search logging data stored in the Elasticsearch database. It is also used to manage the database.
  • Beats - a collection of different data collection services. For my purposes I used Filebeat, which is a file monitoring service that forwards changes in log files.
  • Logstash - a service that processes incoming data - in my case logs - before forwarding it into storage. Note that there is a good deal of overlap in functionality between Beats and Logstash. From my observations, Beats specialize in collecting the relevant data, while Logstash offers more options for processing that data.

Elasticsearch

As mentioned previously, Elasticsearch is a distributed NoSQL document database. It can run on a single server as well as scale out over several servers in a cluster. Scaling allows data and its replicas to be spread over the cluster, which distributes the load over several servers and adds fault tolerance in case of hardware failure.

Data is serialized and stored as JSON documents within an index. Each document is a collection of fields, the key-value pairs that hold your data. By default, every field is indexed in a dedicated data structure optimized for its type. For those familiar with SQL databases, a rough comparison of terminology in Elasticsearch:

  • Index would be a rough equivalent to a table
  • The document is equivalent to a data row
  • The field is an equivalent to a column

Unlike a SQL database, Elasticsearch can optionally be schema-less, which means that new fields in a document will be automatically detected and added to the index mapping.
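As a concrete illustration (the index and field names here are made up), a log entry stored in an index named `app-logs` could look like the following document, where each key-value pair is a field:

```json
{
  "@timestamp": "2019-12-14T10:15:00Z",
  "container": "billing-service",
  "level": "ERROR",
  "message": "Timed out waiting for a database connection"
}
```

With schema-less (dynamic) mapping enabled, indexing this document into a fresh index would automatically create mappings for all four fields.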

Elasticsearch also implements index lifecycle management, which allows a stream of data to be rolled over into a series of indices. This partitions data based on when it was added and makes it possible to shrink, archive, or delete older indices automatically.
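A minimal lifecycle policy sketch, written in the console syntax Kibana uses (the policy name and thresholds are illustrative, not recommendations), might roll an index over daily and delete its data after thirty days:

```
PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "1d", "max_size": "50gb" }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}
```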

Searching the database is done via a REST API, using either the JSON-style query language or SQL-style queries. Data can be searched with structured queries, where exact matches are returned much like in a SQL database. Alternatively, a full-text query can be performed on the documents, where results are ranked by relevance, i.e. how closely they match the query. Both approaches can also be combined within a single query.
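For example, a query combining both approaches (the index and field names are hypothetical) could filter structurally on a log level while ranking results by the full-text relevance of the message:

```json
GET app-logs/_search
{
  "query": {
    "bool": {
      "must":   { "match": { "message": "connection timeout" } },
      "filter": { "term":  { "level": "ERROR" } }
    }
  }
}
```

The `filter` clause behaves much like a SQL `WHERE` condition, while the `match` clause scores documents by relevance.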

Logstash

Logstash is a data collection engine used to aggregate data from several sources, processing and normalizing it before writing it to destinations of your choice. The processing pipeline is configured with different plugins, such as the JSON parser for structured data arriving in JSON format or the Grok parser for generating structured data from plain input text.
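To give a feel for the configuration format, here is a minimal pipeline sketch that assumes logs arrive from Beats on port 5044, parses each line with Grok, and writes the result to a local Elasticsearch instance (the pattern, port, and hosts are examples, not my actual setup):

```
input {
  beats { port => 5044 }
}

filter {
  grok {
    # Split a line like "2019-12-14T10:15:00Z ERROR something broke"
    # into timestamp, level, and message fields.
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
}

output {
  elasticsearch { hosts => ["localhost:9200"] }
}
```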

Input data can come from:

  • Logs and metrics. This was the initial use case for Logstash.
  • The web in the form of HTTP requests and HTTP endpoint polling.
  • Datastores and streams, like various databases and message queues.
  • Sensors and IoT devices.

Processed data can be routed to other services for analysis, archiving, monitoring or alerting.

Kibana

Kibana is an open-source analytics and visualization platform designed to work with Elasticsearch. It is used to search and view the data in Elasticsearch, as well as to analyze and visualize it in charts, tables, and maps.

It also allows for the management of the Elasticsearch database and has built-in user management.

Filebeat

Filebeat is a lightweight file monitor that can do some minor processing of incoming data before forwarding it to different outputs. Typically it is used with Elasticsearch or Logstash, but other outputs are also supported.
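A minimal Filebeat configuration sketch, assuming container logs live under `/var/log/containers/` and a Logstash instance is listening on port 5044 (both are illustrative, not my actual paths):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/containers/*.log

output.logstash:
  hosts: ["localhost:5044"]
```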

Conclusion

In this blog post, I described the different components that make up Elastic stack and how they relate to one another. In an upcoming post, I plan to describe how I configured Elastic stack to store and display logging information from containers.