scaling tag clouds
By anders pearson 13 Dec 2005
While we’re on the subject of tagging, let’s talk a little bit about tag clouds and their display.
Tag clouds are nice visual representations of a user’s tags, with the more common tags displayed in a larger font and perhaps in a different color. Canonical examples are delicious’ tag cloud and flickr’s cloud.
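Once each tag has a font size, the display side is the easy part. Here is a minimal sketch, assuming the sizes are already computed and stored in a plain dict of tag to pixel size; `render_cloud` is a hypothetical helper, and the alphabetical ordering is an assumption rather than anything delicious or flickr is confirmed to do here.

```python
from html import escape

def render_cloud(sizes):
    """Render a tag cloud as one HTML span per tag.

    sizes: dict of tag -> font size in pixels (hypothetical input;
    working out good sizes is the tricky part).
    Tags are emitted in alphabetical order.
    """
    spans = [
        '<span style="font-size: {}px">{}</span>'.format(px, escape(tag))
        for tag, px in sorted(sizes.items())
    ]
    return "\n".join(spans)

print(render_cloud({"python": 30, "web": 18, "misc": 12}))
```

The tag text is passed through `html.escape` so that a tag like `a&b` can't break the markup.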
It’s a clever and simple way to display the information.
Doing it right can be a little tricky, though. A tagging backend (like Tasty) will generally give you a list of tags, each with a count of how many times it appears.
The naive approach is to divide the range of counts (e.g., if the least common tag has a count of 1 and the most common has 100) into a few discrete levels of equal width and assign each level a particular font size (e.g., between 12px and 30px).
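For comparison, the naive linear version might look something like this. It's a sketch, assuming counts arrive as a dict of tag to count; `linear_font_sizes` and its defaults (5 levels, 12px to 30px) are illustrative choices, not code from Tasty.

```python
def linear_font_sizes(counts, levels=5, min_px=12, max_px=30):
    """Map each tag's count to a font size using equal-width buckets.

    counts: dict of tag -> number of times the tag is used.
    Returns a dict of tag -> font size in pixels.
    """
    lo, hi = min(counts.values()), max(counts.values())
    span = max(hi - lo, 1)  # avoid dividing by zero when all counts are equal
    step_px = (max_px - min_px) / (levels - 1)
    sizes = {}
    for tag, count in counts.items():
        # Where this count falls in [lo, hi], cut into `levels` equal slices;
        # the clamp keeps the maximum count in the top level.
        level = min(int((count - lo) / span * levels), levels - 1)
        sizes[tag] = round(min_px + level * step_px, 1)
    return sizes

print(linear_font_sizes({"a": 1, "b": 20, "c": 21, "d": 100}))
```

With counts from 1 to 100, each level covers roughly 20 counts, so a tag used 20 times still gets the smallest size.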
The problem with the naive approach (which I’m now noticing all over the place after spending so much time lately thinking about tag clouds) is that tags, like many real-world phenomena, typically follow a power law. The vast majority of tags in a cloud will appear very infrequently, while a small number will appear very frequently.
Here’s an example of a cloud made with this sort of linear scaling.
It’s better than nothing, but as you can see, most tags are at the lowest level and there are just a couple of really big ones. Since it’s dividing a power-law curve into equal chunks, much of the middle ground is wasted.
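To see the wasted middle without a screenshot, here is a small sketch on made-up data: 50 hypothetical tags where the tag at rank r is used about 100/r times, a rough power law. It buckets them into five equal-width levels and counts how many tags land in each.

```python
from collections import Counter

def linear_level(count, lo, hi, levels=5):
    """Equal-width bucket (0..levels-1) for a count in [lo, hi]."""
    span = max(hi - lo, 1)
    return min(int((count - lo) / span * levels), levels - 1)

# Hypothetical power-law-ish data: the tag at rank r is used about 100/r times.
counts = [int(100 / rank) for rank in range(1, 51)]
lo, hi = min(counts), max(counts)

histogram = Counter(linear_level(c, lo, hi) for c in counts)
for level in range(5):
    print(level, histogram[level])
```

This prints 46 tags at level 0, then 2, 1, 0 and 1 at the higher levels: one level sits completely empty and nearly everything shares the smallest size.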
To make a cloud more useful, it needs to be divided into levels with a more even distribution of tags. The approach I found easiest to implement was to change the cutoff points for the levels. Conceptually, it’s sort of like graphing the distribution curve logarithmically. So instead of dividing 1-100 up as (1-20, 21-40, 41-60, 61-80, 81-100), it becomes something like (0-1, 2-6, 7-15, 16-39, 40-100).
That turns the same cloud into this, which I think makes better use of the size spectrum.
The actual scaling algorithm takes a bit of tuning, but this is the prototype code I wrote for testing that produced that better-scaled cloud: