My History with News Feeds

By anders pearson 31 Mar 2013

Back when RSS first came out and a couple major sites started supporting it, I cobbled together an “aggregator” that was just a Perl script. It pulled down the dozen or so feeds that I knew of (Slashdot, memepool, and a couple others), did a tiny bit of munging to merge them into one XML file that sat in my web root, then used an XSL transformation (via Apache Cocoon) to turn that into a single HTML page with all the links from all those sites in one place.

I thought it was the coolest thing ever.

It was broken about 70% of the time because one feed or another in the group would have a stray ampersand or angle bracket in a headline, making the entire merged file invalid and breaking the transformation.
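That failure mode is easy to reproduce: a strict XML parser rejects the whole document over a single unescaped ampersand. A minimal sketch using Python's standard library (the feed snippets here are made up for illustration):

```python
import xml.etree.ElementTree as ET

good = "<rss><channel><item><title>OK</title></item></channel></rss>"
# One bare "&" in a headline and the whole merged file is not well-formed XML.
bad = "<rss><channel><item><title>Ben & Jerry</title></item></channel></rss>"

def parses(doc):
    """Return True if doc is well-formed XML, False otherwise."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

print(parses(good))  # True
print(parses(bad))   # False
```

One bad headline in one feed was enough to poison the entire combined file, which is why the whole page broke so often.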

More feeds kept coming out and it got too cumbersome to do things that way, so I eventually built a database-backed reader with a mod_perl frontend. Its interface was a little more like the Reader style: a stream of news and a notion of read vs. unread entries. 70% of the feeds were still invalid at any given time and couldn’t be parsed, but at least they didn’t take down my whole aggregator. I think I spent a lot of time sending email to sites letting them know that their feeds were invalid.

Then Mark Pilgrim released his Universal Feed Parser Python script along with his whole Postel’s Law rant. I was way more into Perl at that point, but I knew a little Python and the promise of being able to parse even invalid feeds was quite enticing, so I rewrote the import side of my aggregator as a Python script using feedparser that then did a POST to the mod_perl app with the data. It worked great. Probably only 10% of the feeds were broken at any given time now.

The hourly cronjob that ran on my desktop pulling down and parsing all the feeds was a beast though. I would pretty much have to stop working for five minutes every hour while my machine chugged away. I put up with that for probably a year or two. It sucked, but I was hooked on having fresh news feeds all pulled into one place where I could quickly see only the new items. It was like having an IV drip of information heroin straight into my veins.

Then Bloglines came out. It had a snappier feeling interface than mine, seemed to do an even better job handling broken feeds, and most importantly, didn’t drag my workstation down every hour. So I switched there without hesitation and was happy for years.

Reader came out and I stuck with Bloglines for quite a while, but the Plumber kept showing up more and more frequently (Bloglines users know what I’m talking about), and there was an inevitable feeling that Google was just going to crush them sooner or later, so I eventually switched away. I liked most of the Reader interface much less than Bloglines’, but it was fast, and the keyboard commands for navigation were really nice. I’ve been there ever since.

I looked at my feed count the other week when Google announced the shutdown, and it was well over 600. I occasionally try out standalone reader clients and mobile clients, but none of them work for my intense feed reading workflow.

A typical session for me: I open up Reader in my browser in the morning. There will be a few hundred unread entries. 95% of them, I know, will be complete noise, so the goal is to get through them all and find the entries worth actually reading in as little time as possible. In Reader, hitting g, then a, puts you in a stream of all the entries from all the categories merged together. Then j, over and over again, advances, marking each entry as read along the way.

Each entry that comes up gets about half a second for me to decide whether it’s noise or something that might potentially be interesting. Noise just gets a j and I’m on to the next one. Potentially interesting, I open it in a background tab and continue going through my feeds. I can make it through hundreds of entries in just a few minutes that way. Once I’ve made it through all my feeds and have no more unread items, I have about a dozen to two dozen browser tabs going in the background.

I work through each of those with browser keyboard commands (Ctrl-PgDn and Ctrl-w, mostly). One pass through to sift out the ones that, on second look, really aren’t worth spending time on. Then a pass where I actually read any of them that look digestible in a couple of minutes. Then a final pass where long articles I want to read later on my tablet get sent to Pocket, and what’s left is generally programming-centric stuff that I want to actually do something with immediately. E.g., a new Python module that looks useful: I’ll dig through the code and changelog, or even install it and write a small script to play with.

The whole process takes about an hour, and the end result is that I’ve got a half dozen articles sent to my tablet for subway reading, I’ve read about a dozen short to medium length posts, I’ve probably played with a bit of code, and I’ve also seen a few hundred headlines go past so I have a sense of what’s going on even in areas that I didn’t bother investigating deeply.

I have a smartphone and an Android tablet like a good little techie, but I’ve had zero interest in ever reading news feeds on them, even using the Google Reader Android app. Without my full browser tab setup, I could never figure out a way to get through my feeds at even a tenth the speed of my regular workflow. I’m much happier using the devices for reading the longer articles that I’d previously saved off.

When Google changed the behavior of the “share” button on Reader last year to integrate it with Google+ and everyone got upset, I didn’t really share in the outrage. But then, I also never understood why an aggregator would have a “share” button in the first place. I certainly didn’t use it.

The other week, Google announced they were shutting down Reader and I said, “well, guess I’m going to have to write my own again”. So the next weekend, that’s what I did. I’ve been on a bit of a kick for the last year, freeing myself from walled-garden services. I shut down my Facebook account, put up my own photo gallery site, and let my Flickr account lapse. I just no longer want to be dependent on services that are out of my control.

At this point, my aggregator is unpolished, but it fully supports my workflow for reading feeds. Actually, I already prefer it to Reader since everything non-essential to my workflow is stripped out. It’s running at http://feeds.thraxil.org/ and is open to anyone to sign up, but I can make no guarantees about longevity. Honestly, at this point, you should only think about using it if you want to help me debug it.

If you’ve got any programming chops, I recommend writing your own. It’s a fun problem that will expose you to more of the development landscape than a typical web app (e.g., a task queue to take the heavy lifting out of the request/response cycle is pretty much a necessity), and you’ll be able to reduce your own level of dependency on services controlled by other people with priorities that don’t match your own.
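The task-queue point can be sketched with nothing but the standard library. This is a toy illustration, not how my aggregator is actually built (the function names and the “fetched” result are made up): the web request only enqueues work and returns immediately, while a background worker does the slow fetching and parsing.

```python
import queue
import threading

jobs = queue.Queue()
results = {}

def worker():
    """Background worker: pulls feed URLs off the queue and processes them."""
    while True:
        feed_url = jobs.get()
        if feed_url is None:  # shutdown sentinel
            break
        # Real code would fetch and parse the feed here, which can take
        # seconds per feed; we just record that the work happened.
        results[feed_url] = "fetched"
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_refresh_request(url):
    """The 'web request handler': enqueue and return right away."""
    jobs.put(url)
    return "202 Accepted"

print(handle_refresh_request("http://example.com/feed.xml"))  # 202 Accepted
jobs.join()  # in a real app the request would never block on this
print(results)
```

Keeping the slow work off the request/response path is what stops a slow or dead feed from hanging your whole UI — the same lesson as my hourly cronjob dragging down my workstation, just solved properly.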