A Gentle Introduction to Heka, Part 2: Replacing Statsd

By anders pearson 27 Jul 2013

This is a continuation of A Gentle Introduction To Heka, where I showed the simplest config possible to get Heka to do anything at all.

Now let’s up the ante a little bit and get Heka to actually do something useful.

If you’re collecting metrics with graphite, there’s a good chance that you are running statsd or one of its clones as well. Statsd has a simple, but important job of collecting simple counter and timer data from applications, rolling it up every 10 seconds or minute or whatever, and submitting it in an aggregated form to Graphite. This makes it easier to instrument your applications by sprinkling very simple “increment this counter” calls throughout your code wherever you think you might want to track something. The calls to statsd send UDP packets which don’t require a response, so your application doesn’t have to block during the request, and if statsd is down, it doesn’t take your application down with it. On the other end, Graphite gets metrics in a nicer form that’s easier to turn into a useful graph, and because statsd is only periodically flushing metrics to it, it evens out the load on Graphite. Your application might get a huge traffic spike, generating lots of events in a short time, but statsd takes the brunt of it and Graphite still just sees one submission every cycle, just with larger numbers on the counters.

Statsd, useful as it is, is very simple. It turns out that Heka can be made to replace its basic functionality just by wiring together a couple of the components that it comes with. Here’s a config:

Save that into statsd.conf, change the output section to point it at your Graphite/Carbon server and run Heka with it: hekad -config=statsd.conf

Assuming you have the python statsd client library installed, you should be able to do something like:

$ python
>>> import statsd
>>> c = statsd.StatsClient('localhost', 8125)
>>> c.incr('stats.heka.test')

Wait ten seconds or so for it to flush it out, and in your Graphite interface, you should now see that a ‘stats.heka.test’ metric has been created. Increment the counter a few times and it should do what you expect. Timers should also work:

>>> c.timing('stats.heka.timed', 320)

Heka’s carbon output is still pretty rudimentary (currently not supporting prefixes other than ‘stats’, eg), but I expect that it will only improve.

Once again, the config is pretty simple but has a couple little things that took some figuring out. It strings together three different components, the StatsInput, which listens on a UDP port specified for messages in the format that statsd takes, a StatAccumInput, which does the aggregation and rollup of those messages, and the CarbonOutput, which collects the rolled up stats and sends them to your Carbon server in the format that it expects.

StatsInput and StatAccumInput have default output/input message types that happen to match up, so they work together automatically. Getting the output of StatAccumInput to CarbonOutput took a little more work, telling CarbonOutput to watch for messages with the heka.statsmetric type, and telling StatAccumInput to put its data into the payload. I’m still not perfectly clear on why that had to be done, but I’m hoping it will make more sense as I get more familiar with wiring up more complicated setups.

The nice thing about how these pieces are broken up is that you can see how you could extend it to a much more powerful system. Imagine you had a few different kinds of stats that you wanted to collect that needed to be rolled up with different periods and maybe sent to different Carbon servers. With statsd, this could quickly lead to a proliferation of multiple statsd servers running. With Heka, you could just add additional CarbonOutputs and StatAccumInputs. The StatAccumInputs could do pattern matching to handle different metrics and route them to the appropriate outputs.

This is just the beginning of how Heka’s pluggable pipeline architecture starts to pay off.

I’m going to take a little break and spend more time exploring Heka, but perhaps next time I’ll show how Heka can also replace at least some of Logstash’s functionality.

Tags: sysadmin heka