more fun with RSS

i am an information junkie. every day i read a lot of websites. there are dozens of news sites, weblogs, and comics that i read on a regular basis. i also like fresh information; i like to know about something as soon as possible. i also have a job that i’d like to keep and every once in a while i’d like to at least pretend that i have a life. so i don’t really have time to visit each of these sites one (or more) times a day checking to see if they’ve been updated with any new content.

luckily, there's this great standard called RSS which has really taken off lately, and has made my life much easier. RSS is simply a way of making your content available in a standardized machine-readable format. it becomes useful when you use software that can aggregate multiple RSS sources in a single location.
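to make "machine-readable" a bit more concrete, here's a minimal sketch in Python of what reading a feed looks like. the sample feed, its urls, and the `parse_items` helper are all made up for illustration, and it only handles plain RSS 2.0 (the RDF-flavored RSS 1.0 feeds use XML namespaces and would need a little more work).

```python
import xml.etree.ElementTree as ET

# a made-up RSS 2.0 document, just for illustration
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>example weblog</title>
    <link>http://example.com/</link>
    <description>a made-up feed</description>
    <item>
      <title>first post</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>second post</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """return a list of (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items


if __name__ == "__main__":
    for title, link in parse_items(SAMPLE_RSS):
        print(title, link)
```

an aggregator is basically this loop run over many feeds, plus some bookkeeping about what you've already seen.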

there are dozens of RSS aggregators out there. just do a google search and see for yourself. most of them, such as NetNewsWire, run as regular desktop applications. that model's no good for me since i do some of my surfing from home and some from work. if i've read an item at work, it would still show up as new on a desktop aggregator at home. the kind that i really wanted was server-side, with a central database and a browser-based interface. there are plenty of those out there too, but none of the ones i looked at felt quite right to me; either they were too slow, they didn't have features i wanted, their interface didn't mesh with how i surf, or they had way too many features that i didn't need and that only got in the way.

so, because i'm a perfectionist and a control freak, and because i know Perl, i spent a couple of hours writing my own. at the time i was also looking for any excuse to try my hand at writing an application as a full-fledged mod_perl handler rather than a collection of CGI or Apache::Registry scripts.

the result is a simple web-based aggregator that i call Corral. you just tell it to subscribe to a bunch of RSS feeds, and it fetches them hourly, picks out any new items from the feeds, and presents them to you in an easy-to-read format. when you've read the items, you mark them as read and they won't show up again. once it's set up, it's very simple to use.
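the fetch / pick out new items / mark as read cycle can be sketched in a few lines of Python. this is not Corral's code (Corral is Perl with a real database behind it); `MiniAggregator` is a hypothetical in-memory stand-in, and using an item's link as its identity is an assumption made to keep the sketch short.

```python
class MiniAggregator:
    """toy, in-memory version of the fetch / show new / mark read cycle."""

    def __init__(self):
        self.items = {}    # link -> title, for every item seen so far
        self.read = set()  # links that have been marked as read

    def ingest(self, fetched):
        """store (title, link) pairs from a fetch; return the ones not seen before."""
        new = []
        for title, link in fetched:
            if link not in self.items:
                self.items[link] = title
                new.append((title, link))
        return new

    def inbox(self):
        """everything that has been fetched but not yet marked as read."""
        return [(t, l) for l, t in self.items.items() if l not in self.read]

    def mark_read(self, links):
        self.read.update(links)

    def mark_all_read(self):
        self.read.update(self.items)


if __name__ == "__main__":
    agg = MiniAggregator()
    agg.ingest([("a", "http://example.com/a"), ("b", "http://example.com/b")])
    agg.mark_read(["http://example.com/a"])
    print(agg.inbox())
```

because the read state lives in one place rather than on each desktop, something read at work stays read at home, which is the whole point of going server-side.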

i’ve been using it myself for the last month or so and now i don’t know how i functioned without it. anyway, i figured the least i could do would be to give other people access to it so they can experience the wonders of RSS first-hand as well.

so, if you want to beta-test Corral, just do the following: go create a new account, then log in to that account. at that point you won't be subscribed to any feeds, so you won't see much. follow the link for 'subscribe to existing feeds'. it will take you to a list of feeds that i (or my alpha testers) have already added to the system. select the ones you want (control-click to select more than one) and hit the 'subscribe to selected feeds' button.

you will probably now see a bunch of items listed on your main page. once you've read them, just hit the 'mark all items as read' button at the bottom (or select individual items and mark just those as read). this main page functions as a sort of INBOX, like the one you are probably used to for your email. once you've marked items as read, they won't show up in your INBOX anymore.

adding a new RSS feed is a two-step process: first add it via the 'add new feed' link, then make sure to subscribe to it. Corral also has the notion of categories for feeds. if you find them useful and can figure them out on your own, go ahead and use them; otherwise, just ignore them.

Corral runs on my workstation at work. it should be pretty fast but be warned that since it isn’t a production server, it could occasionally be offline or just really slow (like if i’m compiling any big projects or playing UT or something).

once you start to enjoy Corral or any other RSS aggregator, you will soon find yourself frustrated that some site you like to read doesn't provide an RSS feed (and hence can't be aggregated). first of all, make sure it really doesn't have an RSS feed; sometimes they're just hidden. if it's a weblog hosted on livejournal, it has a feed. all you have to do is append '/rss' to the end of the url. e.g., lani's journal http://www.livejournal.com/users/kpilo would have its RSS feed at http://www.livejournal.com/users/kpilo/rss. if it isn't a livejournal site, there's still hope. look at the source for the web page and check the <head> for a tag like <link rel="alternate" type="application/rss+xml" title="RSS" href="http://www.miromi.org/mt/blog/index.rdf" />. if it has one of those, you're in business.
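both of those checks are easy to automate. the following is a rough sketch in Python using only the standard library; `FeedLinkFinder`, `find_feeds`, and `livejournal_feed` are names invented here, and the sketch only looks for the `application/rss+xml` type mentioned above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FeedLinkFinder(HTMLParser):
    """collect the href of every <link rel="alternate" type="application/rss+xml">."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # the parser also routes self-closing tags like <link ... /> through here
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        is_rss = a.get("type", "").lower() == "application/rss+xml"
        if "alternate" in rels and is_rss and a.get("href"):
            # resolve relative hrefs against the page's own url
            self.feeds.append(urljoin(self.base_url, a["href"]))


def find_feeds(html_text, base_url):
    """return the feed urls advertised in a page's html."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html_text)
    return finder.feeds


def livejournal_feed(journal_url):
    """apply the livejournal convention: append '/rss' to the journal url."""
    return journal_url.rstrip("/") + "/rss"


if __name__ == "__main__":
    print(livejournal_feed("http://www.livejournal.com/users/kpilo"))
```

to use it on a live site you'd first fetch the page (for example with `urllib.request.urlopen`) and pass the decoded html and the page's url to `find_feeds`.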

if you still don't have any luck, you have about three options left: 1) harass the owner of the site until they add an RSS feed (in many weblog authoring tools or CMSs, this is a simple addition), or 2) learn Perl or Python and write your own script to scrape the site and generate your own feed (if you do this in Perl, i highly recommend that you look into the XML::RSS, HTML::TokeParser, and LWP::Simple modules).
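for a feel of what the scraping route involves, here's a small Python sketch using only the standard library. it assumes, purely for illustration, that the site wraps each headline in an `<h2>` containing a link; real sites will need their own rules. `HeadlineScraper`, `scrape`, and `build_rss` are hypothetical names, and the sample page is made up.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class HeadlineScraper(HTMLParser):
    """collect (title, href) for every <a href=...> found inside an <h2>."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.href = None   # href of the headline link currently being read
        self.text = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
        elif tag == "a" and self.in_h2:
            self.href = dict(attrs).get("href")
            self.text = []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            self.items.append(("".join(self.text).strip(), self.href))
            self.href = None
        elif tag == "h2":
            self.in_h2 = False


def scrape(html_text):
    scraper = HeadlineScraper()
    scraper.feed(html_text)
    return scraper.items


def build_rss(site_title, site_url, items):
    """turn (title, href) pairs into a minimal RSS 2.0 document (as a string)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "scraped feed for " + site_url
    for title, href in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = href
    return ET.tostring(rss, encoding="unicode")


# a made-up page, standing in for whatever you would actually fetch
SAMPLE_PAGE = (
    '<html><body><h2><a href="http://example.com/1">first post</a></h2>'
    '<p>text <a href="http://example.com/about">about</a></p>'
    '<h2><a href="http://example.com/2">second post</a></h2></body></html>'
)

if __name__ == "__main__":
    print(build_rss("example site", "http://example.com/", scrape(SAMPLE_PAGE)))
```

the fragile part is always the scraping rule, not the feed generation; expect to revisit it whenever the site changes its layout.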

a third option is to use another little tool that i set up. fenris just watches a given url and creates a very simple RSS feed that has a new item any time the site changes at all. it's very stupid and can easily be fooled by things like ad banners, which occasionally make the page appear to be changed even when the content is really the same. but it's very fast and doesn't require any programming. using fenris should be pretty much self-explanatory by now.
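the general idea of "new item whenever the page changes at all" can be sketched as follows. this is not how fenris is actually implemented; comparing a hash of the whole page is an assumption made here, and `PageWatcher` is a hypothetical name. it also shows why a rotating ad banner is enough to fool this kind of watcher.

```python
import hashlib


class PageWatcher:
    """remember a digest of the last page seen; record an item when it differs."""

    def __init__(self, url):
        self.url = url
        self.last_digest = None
        self.items = []  # one dict per detected change, oldest first

    def check(self, page_bytes, when):
        """return True (and record a feed item) if the page differs from last time.

        the very first check always counts as a change, since there is
        nothing to compare against yet.
        """
        digest = hashlib.sha1(page_bytes).hexdigest()
        if digest == self.last_digest:
            return False
        self.last_digest = digest
        self.items.append({"title": self.url + " changed", "link": self.url, "date": when})
        return True


if __name__ == "__main__":
    watcher = PageWatcher("http://example.com/")
    print(watcher.check(b"<p>news</p><img src='ad1.gif'>", "monday"))   # first look
    print(watcher.check(b"<p>news</p><img src='ad1.gif'>", "tuesday"))  # identical page
    # same content, different banner: the digest changes anyway
    print(watcher.check(b"<p>news</p><img src='ad2.gif'>", "wednesday"))
```

the recorded items could then be written out as an RSS feed, one item per detected change.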

so if you have too much time on your hands (or potentially too little) and have managed to make it this far, past all those paragraphs of technical talk, feel free to give Corral a spin. if you find it useful, or have any ideas for how it could be improved, let me know. source code is forthcoming once i get around to packaging it up.