Sysadmins know that monitoring, metrics, logging, and alerting are deceptively complicated problems. And complicated problems have developed complicated tools.
I've used a lot of them. Individually, they all suck in different ways. At least in the open source world, no single tool will handle everything. The combination of multiple tools needed to cover all the territory sucks even more. Multiple services to run on every machine, multiple config formats, multiple open ports and data directories that have to be kept track of. An exponential explosion of things that can break or be misconfigured.
Things are looking up though. Over the last few years, a few key pieces have been falling into place. Graphite has emerged as a pretty good backbone to hang a lot of the other pieces off. If you can figure out how to turn whatever you're interested into a single number, you can feed it into Graphite. Poll Graphite for that metric and you can send an alert if it passes a threshold. There are any number of tools out there to take your graphite metrics and turn them into beautiful dashboards that you can put on a big screen monitor in your office so everyone can see the health of your whole system at a glance. It's not perfect, but it handles a lot of use-cases and makes life simpler.
That's only one piece of the puzzle though. Funneling everything down to a single numerical metric works for quite a few things, but there are still plenty of cases where you want actual log entries, preferably even parsed out into fields and tagged with timestamps. If you can feed it into elasticsearch, Kibana will give you a pretty reasonable interface for looking through it. Even for the data making its way to Graphite, you still have to collect it from a variety of sources and grind it up and turn it into the right format.
What's left is basically a plumbing problem. Collecting data from logs various logs and applications in different formats and routing it around, parsing some, filtering out noise at the source, aggregating and rolling up some, turning some of it into counters and rate variables and getting the right bits out to the right endpoints.
Recently, Mozilla announced Heka, which is another player in that space. Heka is clearly heavily inspired by Logstash, but has its own flavor. It's a very young project, but so far I'm really liking what I'm seeing. It's written in Go and compiles to a small, fast binary that's painfully easy to deploy and is very light on resource usage. I tend to run small virtual servers without a lot of memory to spare, so I've been paying close attention.
Where Heka gets difficult is in configuring it. This is a problem that it shares with the other similar "plumbing" tools. The problem is that since they mostly just connect other components together, when looked at in isolation, it's hard to know exactly what you should do with it at first. You can easily download the right Heka binary for your system and run it. And you'll see... nothing. In order to see it do anything, you'll need to at the very least figure out how to get some kind of data into it and tell it how to output data to some other system that you can watch. To do that, you'll have to read quite a bit of the documentation. I've found that Heka's current documentation particularly rough for this initial "but how do I actually use it?" use case. The example config file that you first encounter does a good job of giving you a sense of what Heka is capable of, but isn't likely to work for your setup and you'll have to read a lot more to make sense of what it's doing.
So I figured it would be helpful to come up with a couple very simple config files that might be more helpful for someone just starting out with Heka.
Of course, first, you'll want to download the appropriate Heka binary for your system. Personally, for testing purposes, I just grabbed the right tarball and unpacked it to get the
hekad binary. You can run it very straightforwardly with:
$ hekad -config=/path/to/config.toml
It will either complain about errors in your config file or it will run in the foreground.
Now let's make some config files. The format is TOML which is super easy to get right, so at least there's that.
First up is pretty much a "hello world" equivalent config. We just want Heka to watch one log file and, when a line is appended to it, to print that line out to another config file. This is the simplest Heka config I've been able to make that actually does something. Here it is:
Save that to
echo.toml and point hekad at it.
Then open two more terminals. In one, append a line or two to
$ echo one >> /tmp/input $ echo two >> /tmp/input $ echo three >> /tmp/input
In the other terminal, run a
tail -f /tmp/output. You should see something like:
one two three
If you append more text to the input, you should see it show up the output. There's an extra newline on each of them, but at least we're seeing something. We've successfully recreated a unix pipe.
There are a few things to note in the config. Obviously, there's a "LogfileInput" section that points it at a log file to watch and a "FileOutput" section that tells it to write output to a particular file. The "seekjournal" line tells Heka where it can store its own data for basic internal bookkeeping. That's only necessary because I'm running it as my regular user and Heka's default,
var/run/hekad/seekjournals/ isn't writeable by my user, so I just point it off at a temp file for now. Then, in the output section, this line:
message_matcher = "Type =~ /.*/"
Is telling Heka to apply this to any and all messages that it encounters. Remember that Heka's job is to handle lots of different inputs, apply various filters to them, and route them to the right outputs. So everything that isn't an input needs to be specific about which messages it would like to handle. Our simple output section just matches anything whatsoever. As you get more complicated Heka configs, this output section is a handy one to be able to paste in when you're debugging so you can see everything that Heka is doing.
In the next post, I'll show you a slightly more complicated config that will let Heka run as a replacement for statsd.