thraxil.org:

Behind the Music

by anders pearson Tue 02 Jan 2018 11:55:14

I posted my yearly music roundup yesterday, which I've done for the last three years. Today I thought I'd just take a moment to explain how I go about creating those posts. Eg, there are 165 albums in the last post and I link both the artist and album pages on each. Do you really think I manually typed out the code for 330 links? Hell no! I'm a programmer, I automate stuff like that.

First of all, I find music via a ton of different sources. I follow people on Twitter, I subscribe to various blogs' RSS feeds, and I hang out in a bunch of music-related forums online. So I constantly have new music showing up. I usually end up opening each link in a new tab until I get a chance to actually listen to it. Once I've listened to an album and decided to save it to my list, my automation process begins.

I'm a longtime emacs user, so I have a capture template set up for emacs org-mode. When I want to save a music link, I copy the URL in the browser, hit one keyboard shortcut in emacs (I always have an emacs instance running), paste the link there, and type the name of the artist. That appends it to a list in a text file. The whole process takes a few seconds. Not a big deal.

At the end of the year, I have this text file full of links. The first few lines of this past year's file look something like this:

** Woe - https://woeunholy.bandcamp.com/album/hope-attrition
** Nidingr - https://nidingrsom.bandcamp.com/album/the-high-heat-licks-against-heaven
** Mesarthim - https://mesarthim.bandcamp.com/album/type-iii-e-p
** Hawkbill - https://hawkbill.bandcamp.com/track/fever
** Black Anvil - https://blackanvil.bandcamp.com/album/as-was
** Without - https://withoutdoom.bandcamp.com/
** Wiegedood - https://wiegedood.bandcamp.com/releases

For the first two years, I took a fairly crude approach and just recorded an ad-hoc emacs macro that would transform, eg, the first line into some markdown like:

* [Woe](https://woeunholy.bandcamp.com/album/hope-attrition)

Which the blog engine eventually renders as:

<li><a href="https://woeunholy.bandcamp.com/album/hope-attrition">Woe</a></li>

A little text manipulation like that is a really basic thing to do in emacs. Once the macro is recorded, I can just hit one key over and over to repeat it for every line.
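For anyone who doesn't live in emacs, the same line-by-line transformation can be sketched in Go. This is not the macro I actually recorded, just a rough equivalent: the function name `orgLineToMarkdown` is made up for illustration, and it assumes every captured line has the `** Artist - URL` shape shown above.

```go
package main

import (
	"fmt"
	"strings"
)

// orgLineToMarkdown is a hypothetical helper that turns a captured line like
// "** Woe - https://woeunholy.bandcamp.com/album/hope-attrition"
// into a markdown list item. It reports false for lines that don't match.
func orgLineToMarkdown(line string) (string, bool) {
	if !strings.HasPrefix(line, "** ") {
		return "", false
	}
	rest := strings.TrimPrefix(line, "** ")
	// Split on the last " - " so an artist name containing " - " survives.
	i := strings.LastIndex(rest, " - ")
	if i < 0 {
		return "", false
	}
	artist := strings.TrimSpace(rest[:i])
	link := strings.TrimSpace(rest[i+len(" - "):])
	if artist == "" || link == "" {
		return "", false
	}
	return fmt.Sprintf("* [%s](%s)", artist, link), true
}

func main() {
	lines := []string{
		"** Woe - https://woeunholy.bandcamp.com/album/hope-attrition",
		"** Black Anvil - https://blackanvil.bandcamp.com/album/as-was",
	}
	for _, l := range lines {
		if md, ok := orgLineToMarkdown(l); ok {
			fmt.Println(md)
		}
	}
}
```

Splitting on the last ` - ` rather than the first is a guess at what would be most forgiving, since the URL never contains spaces but an artist name might contain a dash.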

Having done this for three years now though, I've noticed a few problems, and wanted to do a little more as well.

First, you'll notice that the newest post links both the artist and the album. This, despite the fact that I only captured the album link originally.

Second, if you look closely, you'll notice that not all of the bandcamp links are quite the same format. Most of them are <artist>.bandcamp.com/album/<album-name>, but there are a few anomalies like https://hawkbill.bandcamp.com/track/fever or https://withoutdoom.bandcamp.com/ or https://wiegedood.bandcamp.com/releases. The first of those was a link to a specific track on an album, the latter two both link to the "artist page", but if an artist on bandcamp only has one album, that page displays the data for that album. Unfortunately, that's a bad link to use. If the artist adds another album later, it changes. Some of the links on my old posts were like those and now just point at the generic artist page.
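To make those link shapes concrete, here's a rough sketch in Go of how they could be told apart. It's only an illustration, not code from my actual scraper: the `classify` function and its string labels are hypothetical, and it assumes the artist page is always the root of the `<artist>.bandcamp.com` host.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// classify is a hypothetical helper that labels a bandcamp link as
// "album", "track", "artist", or "unknown", and also returns the root
// of the host, which this sketch assumes is the artist page.
func classify(raw string) (kind, artistPage string) {
	u, err := url.Parse(raw)
	if err != nil || !strings.HasSuffix(u.Hostname(), ".bandcamp.com") {
		return "unknown", ""
	}
	artistPage = u.Scheme + "://" + u.Host + "/"
	path := strings.Trim(u.Path, "/")
	switch {
	case strings.HasPrefix(path, "album/"):
		return "album", artistPage
	case strings.HasPrefix(path, "track/"):
		return "track", artistPage
	case path == "" || path == "releases":
		// Both of these show the artist page, which may change later.
		return "artist", artistPage
	default:
		return "unknown", artistPage
	}
}

func main() {
	links := []string{
		"https://woeunholy.bandcamp.com/album/hope-attrition",
		"https://hawkbill.bandcamp.com/track/fever",
		"https://withoutdoom.bandcamp.com/",
		"https://wiegedood.bandcamp.com/releases",
	}
	for _, l := range links {
		kind, artist := classify(l)
		fmt.Printf("%-7s %s (artist page: %s)\n", kind, l, artist)
	}
}
```

Anything labelled "track" or "artist" is a link that still needs to be resolved to a stable album URL, which is the part that requires actually fetching the page.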

So, going from just the original link that I'd saved off, whatever type it happened to be, I wanted to be able to get the artist name, album name, and a proper, longterm link for each.

I've written some emacs lisp over the years and I have no doubt that if I really wanted to, I could do it all in emacs. But writing a web-scraper in emacs is a little masochistic, even for me.

The path of least resistance for me probably would've been to do it in Python. Python has a lot of handy libraries for that kind of thing and it wouldn't have taken very long.

I've been on a Go kick lately though, and I ran across colly, which looked like a pretty solid scraping framework for Go, so I decided to implement it with that.

First, using colly, I wrote a very basic scraper for bandcamp to give me a nice layer of abstraction. Then I threw together a real simple program using it to go through my list of links, scrape the data for each, and generate the markdown syntax.
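The overall shape of that program can be sketched roughly as below. To be clear, this is not the real `yearly.go` and it doesn't use colly at all: the `Album` struct, the `Scraper` interface, `markdownLine`, and the `fakeScraper` stand-in are all hypothetical names, the output format is a guess at an "artist link plus album link" line, and the fake scraper returns canned data so the sketch runs without touching the network.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// Album is a hypothetical record of what a scraper returns for one link.
type Album struct {
	Artist    string
	Title     string
	ArtistURL string
	AlbumURL  string
}

// Scraper is a hypothetical abstraction over the real scraping layer.
type Scraper interface {
	Fetch(link string) (Album, error)
}

// markdownLine renders one list item linking both the artist and the album.
// The exact layout is an assumption made for this sketch.
func markdownLine(a Album) string {
	return fmt.Sprintf("* [%s](%s) - [%s](%s)", a.Artist, a.ArtistURL, a.Title, a.AlbumURL)
}

// run fetches every link and writes one markdown line per album to out.
// Failures go to stderr so they don't end up in the piped output.
func run(s Scraper, links []string, out io.Writer) {
	for _, link := range links {
		a, err := s.Fetch(link)
		if err != nil {
			fmt.Fprintln(os.Stderr, "skipping:", err)
			continue
		}
		fmt.Fprintln(out, markdownLine(a))
	}
}

// fakeScraper stands in for the network with canned sample data.
type fakeScraper map[string]Album

func (f fakeScraper) Fetch(link string) (Album, error) {
	a, ok := f[link]
	if !ok {
		return Album{}, fmt.Errorf("no data for %s", link)
	}
	return a, nil
}

func main() {
	link := "https://woeunholy.bandcamp.com/album/hope-attrition"
	s := fakeScraper{link: {
		Artist:    "Woe",
		Title:     "Hope Attrition",
		ArtistURL: "https://woeunholy.bandcamp.com/",
		AlbumURL:  link,
	}}
	run(s, []string{link, "https://withoutdoom.bandcamp.com/"}, os.Stdout)
}
```

Keeping the scraper behind an interface is the "layer of abstraction" idea: the loop that generates markdown doesn't need to know how the data was fetched, and writing only the finished lines to stdout is what makes piping the result through `sort` work cleanly.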

I just run that like:

 go run yearly.go | sort --ignore-case > output.txt

And a minute or two later, I end up with a nice sorted list that I can paste into my blog software and I'm done.

TAGS: music emacs golang
