thraxil.org:

scaling tag clouds

by anders pearson Tue 13 Dec 2005 16:08:26

While we're on the subject of tagging, let's talk a little bit about tag clouds and their display.

Tag clouds are nice visual representations of a user's tags, with the more common tags displayed in a larger font, perhaps in a different color. Canonical examples are delicious' tag cloud and flickr's cloud.

It's a clever and simple way to display the information.

Doing it right can be a little tricky, though. A tagging backend (like Tasty) will generally give you a list of tags along with a count of how many times each one appears.

The naive approach is to divide up the range (e.g., if the least common tag has a count of 1 and the most common 100) into a handful of equal-width levels and assign each to a particular font-size (e.g., between 12px and 30px).
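As a rough illustration of that naive approach, here is a minimal sketch in Python. The function names (`linear_levels`, `font_size`), the default of five levels, and the input shape (a dict of tag to count) are assumptions made for the example, not anything Tasty itself provides.

```python
def linear_levels(counts, levels=5):
    """Assign each tag a level 0..levels-1 using equal-width bins.

    `counts` is assumed to be a dict mapping tag name -> usage count.
    """
    lo, hi = min(counts.values()), max(counts.values())
    # Width of each bin when the integer range lo..hi is split evenly.
    width = (hi - lo + 1) / levels
    return {
        tag: min(int((c - lo) / width), levels - 1)
        for tag, c in counts.items()
    }


def font_size(level, levels=5, min_px=12, max_px=30):
    """Map a level to a font size, spreading levels evenly from min_px to max_px."""
    if levels == 1:
        return min_px
    return round(min_px + level * (max_px - min_px) / (levels - 1))
```

With counts running from 1 to 100 and five levels, this puts counts 1-20 in the lowest level, 21-40 in the next, and so on up to 81-100 in the highest.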

The problem with the naive approach (which I'm now noticing all over the place after spending so much time lately thinking about tag clouds) is that tags, like many real-world phenomena, typically follow a power law. The vast majority of tags in a cloud will appear very infrequently. A small number will appear very frequently.

Here's an example of a cloud made with this sort of linear scaling.

It's better than nothing, but as you can see, most tags are at the lowest level and there are just a couple of really big ones. Since it's dividing up a power-law curve into equal chunks, much of the middle ground is wasted.

To make a cloud more useful, it needs to be divided up into levels with a more equal distribution. The approach I found easiest to implement was to change the cutoff points for the levels. Conceptually, it's sort of like graphing the distribution curve logarithmically. So instead of dividing 1-100 up as (1-20, 21-40, 41-60, 61-80, 81-100), it becomes something like (0-1, 2-6, 7-15, 16-39, 40-100).
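One way to get cutoffs of that general shape is to space the thresholds geometrically instead of evenly. The sketch below is an illustration under that assumption; the name `log_levels`, the dict-of-counts input, and the exact threshold formula are choices made for the example, so the cutoffs it produces are close to, but not identical to, the ones listed above.

```python
def log_levels(counts, levels=5):
    """Assign each tag a level 0..levels-1 using geometrically spaced cutoffs.

    `counts` is assumed to be a dict mapping tag name -> usage count.
    """
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo + 1
    # Upper cutoff for level i is span ** ((i + 1) / levels), so each
    # level covers a progressively wider slice of the count range.
    thresholds = [span ** (i / levels) for i in range(1, levels + 1)]
    result = {}
    for tag, c in counts.items():
        offset = c - lo + 1  # shift so the rarest tag sits at 1
        result[tag] = next(
            (i for i, t in enumerate(thresholds) if offset <= t),
            levels - 1,  # guard against floating-point shortfall at the top
        )
    return result
```

For counts running from 1 to 100 with five levels, the thresholds come out to roughly 2.5, 6.3, 15.8, 39.8, and 100, which gives bins of 1-2, 3-6, 7-15, 16-39, and 40-100.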

That turns that same cloud into this, which I think makes better use of the size spectrum.

The actual algorithm for doing the scaling requires a bit of tuning but this is the prototype code I wrote for testing that produced that nicer scaled cloud:

TAGS: math visualization tags tasty tagging clouds

comments

good stuff...I love math...

one thing I noticed today about thraxil's cloud is that it makes clicking some of the links difficult. This happens when higher level tags display directly below lower level tags (right now I'm looking at "Politics", "plugins" & "plush"). The clickable areas for "plugins" and "plush" are almost completely obscured by the clickable area for "politics"...

I enjoy the delicious tag cloud personally but found that web 2.0 has a great cloud feature as well. Have you tried any of these?

Looks like this algorithm got picked up and credited in this Product:

http://plone.org/products/tagcloud

Nice!

