5 Data Storage Solutions for Better Log Management
Servers generate a large number of events (logs) that contain important information, including operation errors, warnings, and user behavior. By default, most servers store this data in plain-text log files on their local file systems. While plain-text logs are accessible and human-readable, they are difficult to use, reference, and analyze without a holistic system for aggregating and storing the data.
Storing big data is a huge technical challenge. Of course, there are different options such as RDBMSs, NoSQL stores, time-series databases, etc. LogPacker gives users the freedom to choose among 15 supported data providers. In this article, we analyze five storage solutions for saving large amounts of logs (a few TB of data) generated on multiple instances.
Let’s consider five data storage engines that are efficient for storing time-based events:
ClickHouse.
PrestoDB.
InfluxDB.
Apache HBase.
Tarantool.
These providers are created for different purposes, but all of them can be used for saving time-based events.
Data Storage Requirements
It’s not enough to just save events into a database. Each data storage must have an interface to search logs with good performance in real time.
It should be able to store at least 40 GB of data per day.
The total data size will be around 20 TB of data.
Searching for log messages should be done in real time. The acceptable response time for a search query is less than 10 seconds.
For example, in LogPacker, we have a common API wrapper that allows you to search logs, get the latest data, manipulate tags, etc. A user has only one interface to send and receive data; it doesn't depend on the data provider, as LogPacker handles the various connectors internally.
Log/event entry has the following attributes:
ID.
Message.
Source.
Time.
AgentID.
TagName.
LogLevel.
Count.
FileSize.
Platform.
In our case, LogPacker Agent sends formatted logs to the server, so each record from different sources has the same structure and can be easily analyzed.
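For illustration, this structure maps naturally onto a Go struct. The field types below are assumptions made for this article and may differ from LogPacker's actual definition:
// Event models a single log entry; field types are assumed for illustration.
type Event struct {
    ID       string // unique identifier of the record
    Message  string // raw log message
    Source   string // file or stream the message came from
    Time     int64  // Unix timestamp of the event
    AgentID  string // server name, device name, or website domain that sent the event
    TagName  string // log producer (Nginx, MySQL, HAProxy, ...)
    LogLevel int32  // numeric severity level
    Count    int32  // number of occurrences of identical messages
    FileSize int32  // size of the source log file in bytes
    Platform string // platform the agent runs on
}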
Now, let's describe in more detail the responsibilities of each data storage connector. We simplified the list of functions for our example; a minimal Go sketch of such a connector interface follows the list below.
Save the Event object into the database with proper indexes. The Insert operation doesn't have to be immediate, as it takes more time.
Get a list of unique tags in the system. In our case, Tag is the name of a log/event producer such as Nginx, MySQL, HAProxy, etc.
Get a list of unique Agent IDs in the system. The Agent ID can be the server name from where the Event has been sent, an Android device name, or even a website domain.
Get a sorted list of logs for the specified filter. The main filter option is a time range, which is mandatory. The list of tags, list of agents, etc. are optional.
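Put together, these responsibilities can be expressed as a small Go interface. This is only a sketch based on the list above; the type and method names are illustrative rather than LogPacker's real connector API:
// SearchFilter carries the query options described above: the time range is
// mandatory, while tags and agents are optional.
type SearchFilter struct {
    From, To int64    // mandatory time range (Unix timestamps)
    Tags     []string // optional list of tag names
    Agents   []string // optional list of agent IDs
}

// Connector is a hypothetical provider-agnostic storage interface.
type Connector interface {
    Save(e Event) error                     // persist one event with proper indexes
    Tags() ([]string, error)                // unique tag names in the system
    Agents() ([]string, error)              // unique agent IDs in the system
    Search(f SearchFilter) ([]Event, error) // sorted list of logs matching the filter
}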
Note: The LogPacker API is written in Go, so all code examples will be illustrated in this programming language.
1. ClickHouse
Key features:
Highly reliable.
Cross-data-center replication.
SQL support.
Performance exceeds comparable column-oriented DBMS.
ClickHouse (CH in this article) is an open-source column-oriented database management system that allows generating analytical data reports in real time. Created by Yandex developers for internal purposes, it has evolved into an open-source tool. It currently powers Yandex.Metrica, the world’s second largest web analytics platform with over 13 trillion database records and over 20 billion events a day. It generates customized reports on the fly, directly from non-aggregated data, so it is really fast.
Here, you can find the installation instructions.
CH supports SQL syntax, so it’s very easy to use. Our requirements call for only one table ("Events") in the LogPacker database. CH provides many table engines; we will use MergeTree. The MergeTree engine supports an index by primary key and by date and provides the possibility to update data in real time. This is the most advanced table engine in ClickHouse. A MergeTree table must have a separate column containing the date, and the type of that column must be "Date" (not "DateTime"). We don’t have this type in our Event structure, so we will add it as an additional column.
The "CREATE TABLE" statement looks like this:
CREATE TABLE IF NOT EXISTS logpacker.events (
    `id` String,
    `message` String,
    `source` String,
    `time` Int64,
    `server_time` Int64,
    `date` Date,
    `agent_id` String,
    `count` Int32,
    `tag_name` String,
    `log_level` Int32,
    `file_size` Int32,
    `platform` String
) ENGINE = MergeTree(date, (id), 8192);
Multiple inserts work the same as in MySQL. In our case, data insertion is a very frequent operation; it can be thousands of requests per second. We recommend inserting data in packets of at least 1,000 rows or making no more than a single request per second. To improve performance, you can make multiple INSERT queries in parallel, and performance will increase linearly:
INSERT INTO logpacker.events VALUES
    ('id1', '[fatal] some error from LB', 'lb.log', 1469695592, 1469695592, toDate(now()), 'server1.aws', 1, 'lb', 1, 1024, 'winserver'), (...), (...);
Now, let's SELECT some data:
SELECT `id`, `message`, `source`, `time`, `server_time`, `agent_id`, `tag_name`, `log_level`, `file_size`, `count`, `platform`
FROM logpacker.events
WHERE `tag_name` IN ('nginx', 'lb')
  AND `agent_id` IN ('winserver')
  AND `time` >= 1469695592;
Note: ClickHouse has a Golang connector.
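To make this concrete, here is a minimal sketch of a batched insert through a database/sql-compatible ClickHouse driver. The driver import path and DSN below are assumptions, so adjust them to the connector you actually use:
package main

import (
    "database/sql"
    "log"
    "time"

    _ "github.com/ClickHouse/clickhouse-go" // assumed driver; registers "clickhouse" for database/sql
)

func main() {
    db, err := sql.Open("clickhouse", "tcp://127.0.0.1:9000")
    if err != nil {
        log.Fatal(err)
    }

    // One transaction per packet of rows, following the batching advice above.
    tx, err := db.Begin()
    if err != nil {
        log.Fatal(err)
    }
    stmt, err := tx.Prepare(`INSERT INTO logpacker.events
        (id, message, source, time, server_time, date, agent_id, count, tag_name, log_level, file_size, platform)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`)
    if err != nil {
        log.Fatal(err)
    }
    now := time.Now()
    if _, err := stmt.Exec("id1", "[fatal] some error from LB", "lb.log",
        now.Unix(), now.Unix(), now, "server1.aws", 1, "lb", 1, 1024, "winserver"); err != nil {
        log.Fatal(err)
    }
    if err := tx.Commit(); err != nil {
        log.Fatal(err)
    }
}
With this kind of driver, the rows accumulated through Exec are typically sent as one block on Commit, which matches the batching advice above.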
Additionally, CH supports replication, which is really necessary for large amounts of data and log management. Replication is only supported for tables in the MergeTree family. Replication is asynchronous and multi-master: INSERT and ALTER queries can be sent to any available server. Data is inserted on that server and then sent to the other servers. Because replication is asynchronous, recently inserted data appears on the other replicas with some latency. If some of the replicas are unavailable, the data is written to them when they become available. If a replica is available, the latency is the amount of time it takes to transfer the block of compressed data over the network.
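To make a table replication-aware, it can be declared with the ReplicatedMergeTree engine instead of plain MergeTree. The snippet below is a sketch that reuses the assumed database/sql connection from the previous example; the ZooKeeper path and replica macro are placeholders that depend on your cluster configuration:
// Old-style ReplicatedMergeTree parameters: ZooKeeper path, replica macro,
// then the same MergeTree arguments used earlier (date column, key, index granularity).
_, err = db.Exec(`CREATE TABLE IF NOT EXISTS logpacker.events_replicated (
    id String, message String, source String, time Int64, server_time Int64,
    date Date, agent_id String, count Int32, tag_name String, log_level Int32,
    file_size Int32, platform String
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}', date, (id), 8192)`)
if err != nil {
    log.Fatal(err)
}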
2. PrestoDB
Key features:
Connector architecture that is Hadoop friendly.
Supports ANSI SQL.
ODBC and JDBC support.
Map data type support.
Presto is an open-source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook.
If you need help, check out this documentation on how to install PrestoDB.
Presto is often called a database by members of the community. It provides different connectors, with the Apache Hive connector available by default. PrestoDB works with SQL syntax.
The "CREATE TABLE" statement looks like this:
CREATE TABLE IF NOT EXISTS events (
    "id" varchar,
    "message" varchar,
    "source" varchar,
    "time" bigint,
    "server_time" bigint,
    "agent_id" varchar,
    "count" bigint,
    "tag_name" varchar,
    "log_level" bigint,
    "file_size" bigint,
    "platform" varchar
);
PrestoDB has clusterization and can be deployed on different machines. The node properties file, etc/node.properties, contains configuration specific to each node. A node is a single installed instance of Presto on a machine. This file is typically created by the deployment system when Presto is first installed. INSERT and SELECT queries work the same as for ClickHouse or MySQL.
PrestoDB has a good Golang library provided by Facebook.
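The sketch below shows how querying through such a library could look. It assumes the presto-go-client database/sql driver and its HTTP-style DSN, so treat the import path and connection string as assumptions to adapt:
package main

import (
    "database/sql"
    "log"

    _ "github.com/prestodb/presto-go-client/presto" // assumed import path; registers the "presto" driver
)

func main() {
    // Assumed DSN format: scheme://user@coordinator:port?catalog=...&schema=...
    db, err := sql.Open("presto", "http://logpacker@localhost:8080?catalog=hive&schema=default")
    if err != nil {
        log.Fatal(err)
    }

    rows, err := db.Query(`SELECT id, message FROM events WHERE "time" >= 1469695592`)
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()

    for rows.Next() {
        var id, message string
        if err := rows.Scan(&id, &message); err != nil {
            log.Fatal(err)
        }
        log.Println(id, message)
    }
}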
3. InfluxDB
Key features:
Support for hundreds of thousands of writes per second.
Time-centric functions and an easy-to-use SQL-like query language.
Data can be tagged, allowing for very flexible querying.
Support for a variety of high-availability and clustering schemes.
InfluxDB is an open-source database written in Go specifically to handle time series data with high availability and high-performance requirements. InfluxDB can be installed in minutes without external dependencies, yet it is flexible and scalable enough for complex deployments.
A fresh installation of InfluxDB has no databases (apart from the system _internal database), so creating one is our first task. You can create a database with the "CREATE DATABASE" statement.
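For example, the statement can be issued from Go through the InfluxDB HTTP client; the import path below is an assumption based on the official client package:
package main

import (
    "log"

    client "github.com/influxdata/influxdb/client/v2" // assumed import path of the official Go client
)

func main() {
    c, err := client.NewHTTPClient(client.HTTPConfig{Addr: "http://localhost:8086"})
    if err != nil {
        log.Fatal(err)
    }
    // CREATE DATABASE is an InfluxQL statement, so it is sent through Query.
    q := client.NewQuery("CREATE DATABASE logpacker", "", "")
    if resp, err := c.Query(q); err != nil {
        log.Fatal(err)
    } else if resp.Error() != nil {
        log.Fatal(resp.Error())
    }
}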
InfluxDB is a time series database, so it makes sense to start with what is at the root of everything: time. All data in InfluxDB have a time column, which stores timestamps showing the date and time (in RFC3339 UTC) associated with particular data.
The measurement acts as a container for tags, fields, and the time column. The measurement name is the description of the data that are stored in the associated fields. Measurement names are strings, and, for any SQL users out there, a measurement is conceptually similar to a table; it is created automatically when data is first written to it.
InfluxDB uses points for writing data. Post multiple points to multiple series at the same time by separating each point with a new line. Batching points in this manner results in much higher performance.
The Go client has functions to post points or batches:
c, err := client.NewHTTPClient(client.HTTPConfig{
    Addr: "http://localhost:8086",
})

// Full event map; only the ID field is shown here.
fields := map[string]interface{}{
    "ID": "id1",
}

// Create a point in the "events" measurement and add it to a batch.
point, err := client.NewPoint("events", nil, fields, time.Now())
bps, err := client.NewBatchPoints(client.BatchPointsConfig{
    Database: "logpacker",
})
bps.AddPoint(point)
c.Write(bps)
All InfluxDB data can be queried from measurements with simple SQL-like statements. InfluxDB can handle hundreds of thousands of data points per second. Working with that much data over a long period of time can create storage concerns. A natural solution is to downsample the data: keep the high-precision raw data for only a limited time and store the lower-precision, summarized data for much longer or forever.
4. Apache HBase
Key features:
Near real-time speed.
Hadoop scalable.
Automatic, tunable replication.
Store data of any type.
Apache HBase is the Hadoop database. It is also a distributed, scalable, big data store. It’s very convenient for storing logs, and it provides real-time read/write access to your data. This project's goal is the hosting of very large tables: billions of rows by millions of columns.
The quickstart guide will get you up and running on a single-node, standalone instance of HBase, followed by a pseudo-distributed single-machine instance, and finally a fully distributed cluster.
HBase uses a data model very similar to that of Bigtable. Users store data rows in labeled tables. A data row has a sortable key and an arbitrary number of columns. The table is stored sparsely, so rows in the same table can have wildly varying columns if the user likes.
Here's how to create a table automatically with a Golang client:
c, err := goh.NewTcpClient("localhost:9090", goh.TBinaryProtocol, false)
cols := []*goh.ColumnDescriptor{
    goh.NewColumnDescriptorDefault("ID"),
    goh.NewColumnDescriptorDefault("Message"),
    // ...
}
c.CreateTable("events", cols)
And the following code will Mutate one row:
mutations := []*Hbase.Mutation{
    goh.NewMutation("ID", []byte("id1")),
    // ...
}
c.MutateRow("events", []byte("row1"), mutations, nil)
HBase provides a cluster replication mechanism that allows you to keep one cluster’s state synchronized with that of another cluster, using the write-ahead log (WAL) of the source cluster to propagate the changes. Some use cases for cluster replication include:
Backup and disaster recovery.
Data aggregation.
Geographic data distribution.
Online data ingestion combined with offline data analytics.
HBase provides Scan to get data from tables. Scan settings in MapReduce jobs deserve special attention. Timeouts can result (i.e., UnknownScannerException) in Map tasks if it takes longer to process a batch of records before the client goes back to the RegionServer for the next set of data.
5. Tarantool
Note: Tarantool is a different kind of tool. It’s a memory-based database, but it’s really fast and has clusterization features, so in the cloud, we use it as front-end storage for logs that are not older than one hour.
Key features:
ACID transactions.
Asynchronous master-slave and master-master replication.
Server-side scripting and stored procedures.
It’s easy to install Tarantool, but its architecture needs some clarification. When Tarantool is used to store data, there is always at least one space, and there can be many. Each space has a unique name specified by the user, as well as a unique numeric identifier that can be specified by the user but is usually assigned automatically by Tarantool. Spaces always contain one tuple set and one or more indexes; for a tuple set to be useful, there must always be at least one index in a space.
For our log storage, we will have one space called LogPacker with multiple indexes. For example, we will have indexes for TagName, AgentID, etc., so the data can be sorted by specific fields.
The Tarantool Go connector allows you to send Lua code to Tarantool, so spaces and indexes can be created automatically:
opts := tarantool.Opts{}
c, err := tarantool.Connect("localhost:3013", opts)
c.Eval("box.schema.space.create('logpacker')", []interface{}{})
c.Eval("box.space.logpacker:create_index('primary', {unique = true, parts = {1, 'STR'}})", []interface{}{})
For writing and querying data, Tarantool has insert/select functions. Tarantool maintains a set of write-ahead log (WAL) files. There is a separate thread, the WAL writer, that catches all requests that can change a database, such as box.schema.create or box.space.insert.
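A minimal sketch of writing and reading one tuple with the same Go connector follows; the tuple layout and values are illustrative:
opts := tarantool.Opts{}
c, err := tarantool.Connect("localhost:3013", opts)
if err != nil {
    log.Fatal(err)
}

// Insert one tuple into the logpacker space.
_, err = c.Insert("logpacker", []interface{}{"id1", "[fatal] some error from LB", "lb"})
if err != nil {
    log.Fatal(err)
}

// Select by the primary index with an exact match on the first field.
resp, err := c.Select("logpacker", "primary", 0, 1, tarantool.IterEq, []interface{}{"id1"})
if err != nil {
    log.Fatal(err)
}
log.Println(resp.Data)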
Replication allows multiple Tarantool servers to work on copies of the same databases. The databases are kept in sync because each server can communicate its changes to all the other servers. Servers that share the same databases are a cluster. Each server in a cluster also has a numeric identifier that is unique within the cluster, known as the server ID. A replica gets all updates from the master by continuously fetching and applying its write-ahead log (WAL). Each record in the WAL represents a single Tarantool data change request, such as INSERT, UPDATE, or DELETE, and is assigned a monotonically growing log sequence number (LSN). In essence, Tarantool replication is row-based; each data change command is fully deterministic and operates on a single tuple.
Conclusion
You may disagree that some of the listed storages are efficient for log storage. If your infrastructure is limited, you can use other, more common storage solutions such as MySQL or ElasticSearch; LogPacker supports other data storages as well. You should analyze your data size, architecture, replication requirements, etc., and choose the best one for your case.
How you store data depends on a few criteria.
Storing Period
If the logs are for long-term archival purposes and do not require immediate analysis, then S3, AWS Glacier, or tape backup might be suitable options since they provide services for large volumes of data at a relatively low cost. If you need only a few days' or months' worth of logs, then storing them on distributed storage systems such as HDFS, Cassandra, MongoDB, or ElasticSearch works well. If you only need a few hours' worth of retention for real-time analysis, Redis might work, as well.
Data Volume Environment
A day's worth of logs for Google is different from a day's worth of logs for ACME Fishing Supplies. The storage system you choose should allow you to scale out horizontally if your data volume is large.
Log Access
Some storage solutions are not suitable for real-time or even batch analysis. AWS Glacier or tape backup can take hours to load a file. This doesn’t work if you need log access for troubleshooting production. If you plan to do more interactive data analysis, storing log data in ElasticSearch or HDFS may allow you to work with raw data more effectively. Sometimes, log data is so large that it can only be analyzed in more batch-oriented frameworks. The de facto standard in this case is Apache Hadoop along with HDFS.
Hopefully, all of the information provided in this article will help you choose the best tool and become a log hunter!