I’m a programmer, artist, writer, and more. I wrote the code that powers this site. I work for CCNMTL as a programmer. I was a WaSP member (emeritus now). I paint and draw quite a lot. I code quite a bit.

Recipe for Reliability

By anders pearson 30 Jan 2023

I thought I’d share my “simple” recipe for building reliable applications. Of course “simple” doesn’t mean “easy”, but I think this is the core philosophy behind how I approach developing reliable software.

Here is the process:

1. A developer writes some code to make one improvement to the application.
2. That developer deploys that code to production.
3. That developer observes that new code in production to verify that it did what they expected and didn’t cause other problems.
4. That developer takes what they’ve learned in step 3 and goes back to step 1.

That’s it.

Anything that deviates from that feedback loop or slows it down will ultimately make a system less reliable and more prone to incidents and outages.

One obvious caveat on this process is that the developer involved has to actually care about the reliability of the system. I’m kind of taking that as a given though. I guess it’s possible for a developer to not care about reliability, but every developer I’ve ever met has at least not actually wanted to cause an outage.

Outages and incidents usually come from more subtle deviations from that process.

One very common mistake that we, as a profession, often make is not having the same developer who wrote the code do the deployment and the observation. That often also includes a related mistake where more than one change is developed or deployed at once.

A commit or PR should have a single purpose. The more different things you try to do at once, the harder the code/PR will be to understand, the more likely it is that errors will be introduced, the harder it will be to thoroughly observe the behavior of all the different changes once it’s in production to verify that it’s actually correct, and the harder it will be to connect something learned in that process back to the actual cause and use that to inform the next improvement. If you’ve spent any time debugging code, this should be intuitive. Writing a thousand lines of code implementing a dozen different features before running anything pretty much guarantees a painful debugging session and missed bugs.

Implementing a single feature or even just a single branch or logical part of a feature and testing it before moving on to the next makes it much faster to locate the source of a problem and to have some confidence that each line of code does what you think it does. As experienced developers, I think most of us recognize that having a single clear purpose is important not just for the unit of work (PR) but for the unit of code as well (function, class, module, service, etc). The more different responsibilities a single piece of code has, the more likely it is to have bugs and the harder it is to understand and to work on.

Google’s CL author’s guide agrees on that point.

The CL makes a minimal change that addresses just one thing. This is usually just one part of a feature, rather than a whole feature at once. In general it’s better to err on the side of writing CLs that are too small vs. CLs that are too large.

The other part of that is probably a more common mistake, but is related. That’s when it’s not the same developer doing all of the steps. If Developer A writes some code, then Developer B (or worse, Ops Person B), deploys it, B is usually not in nearly as good a position as A to properly check that it’s working as expected. It’s not always the case, but usually A, having spent the time writing the code, has a better idea of what it’s supposed to do, what it should look like if it’s working correctly, and what edge cases are most likely to cause problems.

These two are commonly connected. Multiple developers each working on separate improvements get their code bundled together and all deployed in a batch. That pretty much always means that there are multiple changes deployed at the same time, which again makes it harder to reason about problems and interactions, creates more surface area that needs to be checked, and when a problem is found, makes it harder to trace to one change and makes it more complicated to revert just that change.

I’ve occasionally mentioned that I’m not really a fan of “staging” environments as they are often used. There are advantages to having a staging site, but the downside is that they often become a chokepoint where multiple changes get bundled together and then are later deployed to production together, invoking the two above problems. I’ve seen many production incidents that started when there were a bunch of different changes that had been deployed to staging, that had been verified there to different degrees. Those then all got merged together and deployed to production. The developer merging and deploying (probably separate developers) didn’t have a full understanding of all the different changes or how to verify them after the deploy. Unfortunately, this is a very common problem with staging environments. There are legitimate uses for a staging environment, but I think that they are often overused and their downside needs to be considered, especially if they are forming this kind of chokepoint.

You may have noticed that “testing” isn’t one of the steps in the process before deploying to production. There are a couple reasons for that.

First, I consider automated tests to be part of both the “write the code” and “deploy the code” steps. The process of writing code should almost always involve running tests locally. Really, a good test driven development workflow is just a mini version of the whole process above, except without a deploy step. You implement a single small piece of functionality at a time and verify its behavior the best you can and then repeat. Step 2 of the process, “That developer deploys that code to production.” doesn’t mean that the developer manually copies code out to the production servers; it means that they initiate an automated deployment pipeline, either by clicking a “deploy” button somewhere or merging a PR to trigger the deploy. The deployment pipeline should be running tests, possibly at multiple levels (unit tests, integration tests, post-deployment smoke tests) and fail as soon as any of them fail.

A more controversial reason I didn’t explicitly include a testing step is that while I love tests, I actually don’t think they’re as directly important to the reliability of a site. Good automated tests allow a developer who cares about the reliability of the site to make changes to the code and verify those changes more rapidly. They allow the deploy step to run more safely (so less time spent debugging or fixing broken deploys). My experience is that if a hypothetical Team A writes no tests at all but otherwise cares a lot about site reliability and follows the process above, the result will be a more reliable site than Team B, who write a lot of tests but implement large changes, don’t deploy those changes individually, deploy infrequently, and don’t focus on observing the changes in production. Team A might start off worse, but they’ll learn a lot more, have a deeper understanding of how their code actually runs in production, and be able to build something that’s more reliable in the long run.

Steps 3 and 4, where the developer who implemented a change closely observes that code in production and learns from it, are perhaps the key to the whole approach. This is why I tend to put so much emphasis on metrics, monitoring, logging, and observability tools. You usually can’t just see inside a running process in production, so you have to have tools in place to collect and process useful data. This is also why, while I can put a lot of those tools in place, at some point, the developers writing the code need to pick them up and use them. They are the ones who will be in the best position to know what data will help them verify that a feature is working like they expect or to understand what’s happening when it behaves differently in production than they expected. The developers will have assumptions about how often a given function is called, how long a database call should take to execute, which calls might fail, what sorts of values will be passed as parameters to a function, etc. Production traffic and end-user behavior often prove our assumptions wrong in startling ways. Uncovering those wrong assumptions quickly and correcting them is key to making a reliable site. One of the best things a developer can do is to cultivate a deep curiosity about what their code is really doing in production.
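To make that concrete, here is a minimal sketch of the kind of instrumentation that lets a developer check assumptions about call frequency, failures, and duration. Everything in it is hypothetical: `CALL_STATS` is an in-memory stand-in for whatever metrics backend you actually use, and `parse_feed_size` is an invented function whose assumption ("the input is never empty") turns out to be wrong.

```python
import time
from collections import defaultdict
from functools import wraps

# In-memory stand-in for a real metrics backend (an assumption for this sketch;
# a real app would send these numbers to its metrics or tracing service).
CALL_STATS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})


def observed(func):
    """Record call count, error count, and cumulative duration for func."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        stats = CALL_STATS[func.__name__]
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            stats["errors"] += 1
            raise
        finally:
            # Runs on both success and failure, so every call is counted.
            stats["calls"] += 1
            stats["total_seconds"] += time.perf_counter() - start

    return wrapper


@observed
def parse_feed_size(raw):
    # Hypothetical function; the developer assumed raw is never empty.
    if not raw:
        raise ValueError("empty feed")
    return len(raw)
```

After some production traffic, comparing `CALL_STATS["parse_feed_size"]` against what you expected (how many calls, how many errors, how long they took) is the observation step in miniature.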

It’s also important to keep in mind that the success is greatly affected by the speed of the whole process. Each time you go through it, you should learn something. The more times you go through it, the more you learn and the more you can improve.

If the process, or steps in the process, are slow or difficult, that limits how many times you can go through the cycle. A large codebase that takes a long time to navigate and slow tests make the development step slower. Slow deployment pipelines obviously make the deploy step take longer, but that’s also slowed by not having zero downtime deployments, forcing you to only be able to deploy during certain, infrequent windows (this also again makes it much more likely that you’ll end up deploying multiple changes at the same time). Not having good observability tooling makes it slower to verify the change in production. In addition to allowing fewer iterations, any slowness in the process also reduces reliability because the more time that passes between writing code and observing it in production, the more difficult it will be for the developer to really remember the full context of the change and properly verify it.

We often have additional steps in our workflow that serve purposes other than reliability and must be there. But we need to minimize their impact on the overall process. Eg, if you have compliance requirements (eg, SOC 2, PCI, ISO, etc) you probably have to have code reviews for security. Even without compliance requirements, code reviews are good (though I would argue that their importance has always been less about catching bugs or improving reliability and more about ensuring that other developers are aware of or understand the changes and maintaining common standards for our codebase). But it’s very important that turnaround time on code reviews is kept short to avoid slowing down the entire process (of course, it’s equally important that we keep PRs small and single-purpose so the reviewers can do meaningful reviews quickly). It’s also important to lean on automation as much as possible to keep that part of the process fast and efficient.

Finally, it’s also worth mentioning that the importance of this process isn’t limited to just application code. When I’m working on infrastructure, my goal is to go through this whole cycle with Terraform config, ansible roles, etc.

This post has contained a lot of my opinions. I expect that not everyone will agree with me on many of these. I will note though that what I advocate for is pretty close to the recommendations that you will see in, eg, Charity Majors’ twitter feed, Dave Farley’s Modern Software Engineering or the DORA Report, which studies software development at a large scale. In particular, DORA are interested in what makes a “high functioning organization” and the four key metrics that they’ve found to be the most reliable predictors are 1) lead time for a change (how long it takes to get from an idea to that change running in production and available to customers; shorter is better) 2) deploy frequency (how many times per day/month/etc. you deploy or release to end users; more often is better) 3) time to restore (if there’s an outage, how long does it typically take to fix; shorter is better) and 4) change fail percentage (what percentage of your changes/releases resulted in an incident; lower is better). The first two of those are obviously directly related to the approach I describe and I think an argument could be made that it helps with the latter two as well.
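As a rough illustration of how those four numbers could be computed from your own deploy history, here is a small sketch. The `Deploy` record and its fields are assumptions made up for this example, not anything DORA prescribes. It also simplifies in two ways: lead time is measured from commit rather than from the original idea, and time to restore is measured from the deploy that caused the incident.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deploy:
    # Hypothetical record of one production deploy.
    committed_at: datetime                   # when the change was written
    deployed_at: datetime                    # when it reached production
    caused_incident: bool = False
    restored_at: Optional[datetime] = None   # when the incident was resolved


def four_key_metrics(deploys: List[Deploy], period_days: int) -> dict:
    """Summarize a non-empty list of deploys that spans period_days days."""
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.caused_incident]
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "median_lead_time": median(lead_times),
        "deploys_per_day": len(deploys) / period_days,
        "median_time_to_restore": median(restore_times) if restore_times else None,
        "change_fail_percentage": 100.0 * len(failures) / len(deploys),
    }
```

Small, frequent, individually observed changes push the first two numbers in the right direction almost by definition; the sketch just makes it easy to watch whether the last two follow.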

Deploying Django on Kubernetes

By anders pearson 03 Dec 2022

I have a couple old related blog posts here: Continuously Deploying Django with Docker (2015) and Continuously Deploying Django with GitHub Actions (2019). They describe my approach (at the time) to deploying Django apps on a simple cluster of servers, achieving continuous deployment with zero downtime and some basic failover and scalability capabilities.

At the end of the first one, I say:

Nevertheless, what I’ve put together is basically a low rent, probably buggy version of a PaaS. I knew this going in. I did it anyway because I wanted to get a handle on all of this myself. (I’m also weird and enjoy this kind of thing). Now that I feel like I really understand the challenges here, when I get time, I’ll probably convert it all to run on Kubernetes or Mesos or something similar.

And, several years later, I feel like I should probably mention that yes, I did actually just convert it all to run on Kubernetes. That was actually quite a while ago, but I’ve been lazy about blogging.

Since I got a lot of positive feedback on the previous posts, I figure I ought to write up my Kubernetes setup as well in the hopes that people find that useful.

Let’s talk about Kubernetes really quickly first. I should stress that I am, by no means, a Kubernetes expert. I’ve gone through the exercise of setting up and running a cluster manually. As a learning exercise, I highly recommend doing that. If you’re actually going to run a Kubernetes cluster with production workloads I only recommend going that route if you plan to invest a significant amount of time in becoming a Kubernetes expert. Kubernetes has a reputation for complexity and it’s deserved when it comes to building, operating, and maintaining a cluster. Just deploying an application to a Kubernetes cluster that someone else is responsible for operating is actually quite simple though. These days, there are a number of options for managed Kubernetes clusters that you can just pay for. I have experience with GKE and DigitalOcean Managed Kubernetes and can recommend either of those. I’m sure the equivalent offerings from AWS, etc. are also fine; I just don’t have experience with them.

The setup I’ll describe for my personal apps here uses DigitalOcean Kubernetes. I’ll point out a few things that are specific to that, but most of it will be pretty generic.

For the most part, the underlying Django applications are still structured the same and packaged individually as Docker images pretty much exactly as described in the previous blog posts. The actual Dockerfiles, python dependencies, etc. have all been updated over the years to more modern approaches, but the end result is still an image that is more or less a black box that takes its configuration from environment variables (à la 12 factor apps). Kubernetes just runs docker containers, so I really didn’t have to change the apps to run them there.
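For readers who haven’t seen the 12 factor style of configuration, a minimal sketch of what “takes its configuration from environment variables” can look like on the Python side is below. It is not the actual settings module for these apps: `load_settings` is a hypothetical helper, the variable names mirror the ConfigMap shown later in this post, and `DEBUG` and the comma-separated `ALLOWED_HOSTS` convention are assumptions for illustration.

```python
import os


def load_settings(env=None):
    """Build a settings dict from environment variables (12-factor style)."""
    env = os.environ if env is None else env
    return {
        # Required values: a missing key raises KeyError at startup,
        # which is easier to notice than a half-configured app.
        "DB_HOST": env["DB_HOST"],
        "DB_USER": env["DB_USER"],
        # Environment variables are always strings, so convert explicitly.
        "DB_PORT": int(env.get("DB_PORT", "5432")),
        # Assumed convention: a comma-separated list of hosts.
        "ALLOWED_HOSTS": [
            host.strip()
            for host in env.get("ALLOWED_HOSTS", "").split(",")
            if host.strip()
        ],
        # Hypothetical flag, off unless explicitly enabled.
        "DEBUG": env.get("DEBUG", "false").lower() in ("1", "true", "yes"),
    }
```

Because the image only reads from its environment, the same image can run unchanged on a laptop, on the old cluster, or on Kubernetes.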

Again, I’ll use my RSS feed reader app, “antisocial”, as the example and go through its configuration. That’s because it has both a web component as well as Celery workers and Celery Beat, which also need to run. That’s the most complicated one I have. Other apps that don’t need Celery or Celery Beat are basically the same, but even simpler than what I’ll show here.

My DigitalOcean Kubernetes cluster is pretty small, just three nodes with enough RAM to handle all the apps I want to run on them. Again, I aim to have at least two instances of the web component running behind a load balancer for basic failover and to allow for zero downtime deployments. With three nodes in my cluster, I want it to be able to keep serving traffic even if it loses one node. If this were for production workloads and not just my personal apps that don’t really need high availability, I’d set up a larger cluster with more redundancy. For my personal apps though, I need to keep the costs reasonable.

The apps all use a shared managed PostgreSQL instance (again, run by DigitalOcean) and serve static files via AWS S3 and CloudFront. So the Kubernetes cluster is just handling the actual Django web apps and Celery workers as well as a RabbitMQ instance that connects them.

Kubernetes has a number of abstractions that we’re going to have to look at. I won’t go into great detail on each, since there are better resources out there for learning Kubernetes. Instead, we’ll just look at how they are set up for this Django app and how they fit together.

Another quick note: Kubernetes has a bunch of features like namespaces and RBAC that let you secure everything and prevent applications running on the same cluster from accessing each other (accidentally or maliciously). Since this is all just my personal side project stuff, I’ve skipped as much of that as I can. If you are going to use Kubernetes for production, you really need to go learn that stuff and understand it first.

Pretty much everything in Kubernetes is just defined in a YAML file. You then (after setting up some auth stuff that I won’t cover) interact with your cluster by running kubectl apply -f some_file.yaml. Kubernetes reads the definition of whatever you are creating or updating in that file and updates the cluster to match it. Yes, you quickly get a bit overwhelmed with the amount of YAML involved and how verbose it can be, but you get used to it and the consistency of the interface is pretty nice.

The first thing we need for the app is a Service, which is just an abstraction for an application that will be running and accessible on the internal network. So, we have service.yaml like:

---
apiVersion: v1
kind: Service
metadata:
  name: antisocial
  labels:
    app: antisocial
spec:
  type: NodePort
  selector:
    app: antisocial
  ports:
    - port: 8000
      targetPort: 8000
      name: gunicorn

All that’s really doing when we run kubectl apply -f service.yaml is setting up some routing information to let the cluster know that there’s some “antisocial” application that will be exposing “gunicorn” on port 8000. We’re not actually running anything yet.
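The selector in that Service is what will eventually tie it to running containers: a Service routes to whichever pods carry labels matching every key/value pair in its selector. A toy model of that matching rule is sketched below. This is not Kubernetes code, and the pod dictionaries are made up for the example.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """True if every key/value pair in the selector appears in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select_pods(selector: dict, pods: list) -> list:
    """Return the names of pods whose labels satisfy the selector."""
    return [pod["name"] for pod in pods if selector_matches(selector, pod["labels"])]
```

With a selector of `{"app": "antisocial"}`, a pod labeled `app: antisocial` is selected (extra labels on the pod don’t matter), while one labeled `app: antisocial-beat` is not.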

The next thing we need is configuration for our applications. In my old setup, I could deploy config files and put environment variables into systemd/upstart configs as needed. Since you don’t typically have access to the servers in a Kubernetes cluster, you need a different approach. Kubernetes provides a ConfigMap abstraction which is just a bundle of key/value pairs. We make one for the app in configmap.yaml:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: antisocial-config
data:
  DB_HOST: "....db.ondigitalocean.com"
  DB_USER: "antisocial"
  DB_PORT: "25060"
  AWS_S3_CUSTOM_DOMAIN: "....cloudfront.net"
  AWS_STORAGE_BUCKET_NAME: "thraxil-antisocial-static-prod"
  ALLOWED_HOSTS: ".thraxil.org"
  HONEYCOMB_DATASET: "antisocial"

Any other settings that we need could be added there. Again, a simple kubectl apply -f configmap.yaml and it’s built in the cluster.

A bad approach would be to put sensitive data like passwords or the Django secret key into the ConfigMap. It would work, but isn’t recommended. Instead, Kubernetes also has Secrets which are very similar to ConfigMaps but, as the name implies, intended for secret and sensitive values. There are a bunch of different ways to set up and manage secrets. The approach I took was to have essentially an .env file called secrets.txt with just key/value pairs for the secrets like:

SECRET_KEY=....
AWS_ACCESS_KEY=...
AWS_SECRET_KEY=...

And so on. Then I do kubectl create secret generic antisocial-secret --from-env-file=./secrets.txt to load them into the cluster. The name antisocial-secret is what the Deployments below refer to in their secretRef entries.
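It helps to know what ends up stored: in a Secret’s data field each value is base64 encoded, which is an encoding and not encryption. The sketch below shows the idea with a deliberately simplified KEY=VALUE parser. It is an illustration only; kubectl’s real env-file parsing rules may differ in details, and both function names are invented here.

```python
import base64


def parse_env_file(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        pairs[key.strip()] = value
    return pairs


def encode_secret_data(pairs: dict) -> dict:
    """Base64-encode each value, as in the data field of a Secret."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in pairs.items()
    }
```

Anyone who can read the Secret object can decode it with `base64.b64decode`, so access to Secrets still needs to be restricted like any other credential store.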

So, we have a Service, and configuration split into the secret and non-secret parts. Now we are ready to actually connect those up and run our application.

Kubernetes does that, along with some more information about the steps involved in spinning up your services, in a Deployment. It’s a bit of a weird abstraction at first, but quickly became one of my favorite aspects of Kubernetes once I understood it. The Deployment defines the complete desired state of the application along with enough information for the Kubernetes cluster to figure out how to achieve that desired state no matter what state it starts out in.

Let’s start by just looking at the Deployment for the web app part (ie, gunicorn) without any of the Celery stuff:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: antisocial-app
  labels:
    app: antisocial
spec:
  replicas: 2
  selector:
    matchLabels:
      app: antisocial
  template:
    metadata:
      labels:
        app: antisocial
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - antisocial
                topologyKey: kubernetes.io/hostname
      containers:
        - image: <IMAGE>
          name: antisocial
          envFrom:
            - secretRef:
                name: antisocial-secret
            - configMapRef:
                name: antisocial-config
          ports:
            - containerPort: 8000
              name: gunicorn

The last part of that is actually a good place to start. That specifies the container that’s going to run, gives it the name antisocial, sets up an environment from the ConfigMap and Secrets that we defined, and tells Kubernetes that those containers will be exposing port 8000 with the name gunicorn, which lets it associate those container/ports with the abstract Service that was defined way back at the beginning. The - image: <IMAGE> line we’ll come back to later.

The replicas: 2 line tells it to run two instances of this container. Then, the whole podAntiAffinity: block basically tells Kubernetes to do its absolute best to run those two instances on different physical nodes (the underlying servers). Having two instances running doesn’t help us much for failover if they’re running on the same node and that node goes down. “Anti-affinity” is Kubernetes’ way of letting you avoid that, while also letting Kubernetes otherwise have complete control over which containers run on which nodes without you having to micro-manage it.

The other wonderful thing about that config is that by default Kubernetes does a rolling deploy with zero downtime. When a new image gets deployed, it spawns instances of the new version and waits until they’re running before moving traffic over to them and only then shutting down the old ones. If you define health check endpoints in your containers, it will wait until the new ones are actually able to handle traffic. If you want to get really fancy, you can replace the health checks with more complicated checks that, eg, look at external metrics like error rates or latency and you can configure it to roll out the new version in small steps, only continuing if those metrics look good (ie, canary deploys).
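A health check endpoint doesn’t need to be elaborate. In Kubernetes terms it is what a readinessProbe in the container spec would poll. Here is a minimal sketch as a plain WSGI app using only the standard library; the `/healthz` path and the `check_ready` hook are assumptions for illustration, and in a Django app this would more naturally be an ordinary view.

```python
def health_app(environ, start_response, check_ready=lambda: True):
    """Minimal WSGI app that answers a health check on /healthz.

    check_ready is a hypothetical hook; a real one might verify that the
    database is reachable before reporting the instance as ready.
    """
    headers = [("Content-Type", "text/plain")]
    if environ.get("PATH_INFO") != "/healthz":
        start_response("404 Not Found", headers)
        return [b"not found"]
    if check_ready():
        start_response("200 OK", headers)
        return [b"ok"]
    # A non-2xx answer tells the prober not to send traffic here yet.
    start_response("503 Service Unavailable", headers)
    return [b"not ready"]
```

The design point is that the endpoint should only say “ok” when the instance can genuinely serve requests; otherwise the rolling deploy will happily shift traffic onto containers that aren’t ready.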

The Celery Beat Deployment is very similar:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: antisocial-beat
  labels:
    app: antisocial-beat
spec:
  replicas: 1
  selector:
    matchLabels:
      app: antisocial-beat
  template:
    metadata:
      labels:
        app: antisocial-beat
    spec:
      containers:
        - image: <IMAGE>
          name: antisocial
          command: [ "/run.sh", "beat" ]
          envFrom:
            - secretRef:
                name: antisocial-secret
            - configMapRef:
                name: antisocial-config

The differences, aside from it being labeled “antisocial-beat” instead of just “antisocial”, are that it only has one replica (you don’t want more than one Celery Beat instance running at once), no ports are exposed, and it adds the command: [ "/run.sh", "beat" ] parameter which tells the docker container to run the beat service instead of gunicorn.

It’s slightly more complicated with the Celery Workers:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: antisocial-worker
  labels:
    app: antisocial-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: antisocial-worker
  template:
    metadata:
      labels:
        app: antisocial-worker
    spec:
      initContainers:
        - image: <IMAGE>
          name: migrate
          command: [ "/run.sh", "migrate" ]
          envFrom:
            - secretRef:
                name: antisocial-secret
            - configMapRef:
                name: antisocial-config
        - image: <IMAGE>
          name: collectstatic
          command: [ "/run.sh", "collectstatic" ]
          envFrom:
            - secretRef:
                name: antisocial-secret
            - configMapRef:
                name: antisocial-config
        - image: <IMAGE>
          name: compress
          command: [ "/run.sh", "compress" ]
          envFrom:
            - secretRef:
                name: antisocial-secret
            - configMapRef:
                name: antisocial-config
      containers:
        - image: <IMAGE>
          name: antisocial
          command: [ "/run.sh", "worker" ]
          envFrom:
            - secretRef:
                name: antisocial-secret
            - configMapRef:
                name: antisocial-config

It’s basically the same approach as Celery Beat, but adds the whole initContainers block. That defines containers that should be executed once at initialization. In this case, it runs “migrate”, “collectstatic”, and “compress” commands in sequence before starting the Celery worker, which then stays running.
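The ordering guarantee is the important part: each init step has to finish successfully before the next one starts, and the main container only starts after all of them. A toy model of that behavior is sketched below. It is a simplification with invented names; in particular, real Kubernetes will normally retry a failed init container according to the pod’s restart policy, whereas this sketch just stops.

```python
def start_pod(init_steps, main):
    """Run init steps in order; start main only if every step succeeds.

    init_steps is a list of (name, callable) pairs, where each callable
    returns an exit code (0 means success). main is called with no arguments.
    """
    completed = []
    for name, step in init_steps:
        if step() != 0:
            # A failed step blocks everything after it, including main.
            return {"completed": completed, "failed": name, "main_started": False}
        completed.append(name)
    main()
    return {"completed": completed, "failed": None, "main_started": True}
```

In this model a failing “migrate” step means “collectstatic”, “compress”, and the worker never run, which is the behavior you want when later steps depend on the database schema being up to date.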

YAML files can be concatenated together into a single file, so all three of those parts go into deployment.yaml. Running kubectl apply -f deployment.yaml will then actually bring everything together and give us a setup with the Celery worker and beat processes running, and two gunicorn processes running on different nodes in the cluster. If a node in the cluster goes down, Kubernetes knows what was running on it and will do its best to replicate those containers to other nodes and update the internal network routing to send traffic to them. If containers crash, it will restart them to ensure that the desired number of replicas are always available. If the cluster is expanded, Kubernetes will spread the load out across the nodes as best it can.

Deploying to the cluster is ultimately done by running kubectl apply -f deployment.yaml after updating the - image: <IMAGE> lines in the config to point to a new version of the docker image.

A simple GitHub actions workflow can do that:

on:
  push:
    branches: master
name: deploy
jobs:
  buildDockerImage:
    name: Build docker image
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@master

      - name: Build docker image
        run: docker build -t thraxil/antisocial:${{ github.sha }} .

      - name: docker login
        run: docker login -u $DOCKER_USERNAME -p $DOCKER_PASSWORD
        env:
          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
          DOCKER_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}

      - name: docker push
        run: docker push thraxil/antisocial:${{ github.sha }}

      - name: Update deployment file
        run: TAG=$(echo $GITHUB_SHA) && sed -i 's|<IMAGE>|thraxil/antisocial:'${TAG}'|' $GITHUB_WORKSPACE/deploy/deployment.yaml

      - name: Install doctl
        uses: digitalocean/action-doctl@v2
        with:
          token: ${{ secrets.DIGITALOCEAN_ACCESS_TOKEN }}

      - name: Save DigitalOcean kubeconfig with short-lived credentials
        run: doctl kubernetes cluster kubeconfig save --expiry-seconds 600 k8s-1-20-2-do-0-nyc1-....

      - name: Deploy to DigitalOcean Kubernetes
        run: kubectl apply -f $GITHUB_WORKSPACE/deploy/deployment.yaml

      - name: Verify deployment
        run: kubectl rollout status deployment/antisocial-app

The beginning of that is the same as before: whenever we merge to master, it builds the docker image (which runs the unit tests), tags it with the git SHA1, and pushes that up to Docker Hub. A small extra new step is that it then also does a quick sed to replace all occurrences of <IMAGE> in the deployment.yaml file with the newly created and pushed docker image + tag.
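If sed isn’t your thing, the same substitution is a one-liner in most languages. Here is a small Python sketch of it. The `render_manifest` name is made up, and the extra check for a missing placeholder is an addition that the sed command doesn’t have.

```python
PLACEHOLDER = "<IMAGE>"


def render_manifest(template: str, repository: str, tag: str) -> str:
    """Replace every <IMAGE> placeholder with repository:tag."""
    if PLACEHOLDER not in template:
        # Without a placeholder the manifest would go out unchanged,
        # so the deploy would silently keep running the old image.
        raise ValueError("no <IMAGE> placeholder found in manifest")
    return template.replace(PLACEHOLDER, f"{repository}:{tag}")
```

For example, `render_manifest(text, "thraxil/antisocial", sha)` turns every `- image: <IMAGE>` line into `- image: thraxil/antisocial:<sha>`, so the web, beat, and worker Deployments all move to the same image version together.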

Then it’s a little bit DigitalOcean specific where it authenticates to my cluster.

The actual deploy happens with kubectl apply -f ... deployment.yaml. The next step, kubectl rollout status deployment/antisocial-app, isn’t strictly necessary but is a nice feature. That just waits until the cluster reports back that the deployment has succeeded. If the deployment fails for some reason, the GitHub Actions workflow will then be marked as a failure and it’s much more noticeable to me.

There’s a final piece that I do need to mention. What’s been covered above gets the application running, but at some point, you need to actually expose it to the rest of the internet on a public interface. Kubernetes refers to this as an Ingress. There are a ton of different ways to do Ingress. It’s common to use Nginx, Traefik, Caddy, HAProxy, or other common reverse proxies or load balancers for ingress, especially if you are running your own cluster. For managed clusters, it’s also common for providers to make their own managed load balancer services available as Kubernetes Ingress. Eg, AWS ELB, GCP HTTPS Load Balancers, etc. DigitalOcean does the same with their load balancer. If you’re using a managed Kubernetes cluster, you probably also want to use the managed load balancer. I’m not going to cover my ingress setup here because it’s specific to the DigitalOcean setup and I will instead just recommend that you follow the instructions for your chosen provider. One nice aspect of pretty much all of them is that they make SSL certificates pretty seamless to manage. If the provider’s ingress doesn’t handle certificates itself, you can use cert-manager with your Ingress. I’ve found it much easier to deal with for Let’s Encrypt certificates than the standard certbot approach.

Finally, if you’ve read this far, I have a secret for you: I already don’t use this setup. Yeah, by the time I got around to writing up my Kubernetes setup, I’d already moved on to a different approach that I like even better for my personal side projects. I’ll try to be faster about writing about that setup, but no guarantees. In the meantime, I do still recommend a similar approach to Kubernetes, especially if you want that failover and scalability. I’ve barely scratched the surface here of what Kubernetes is capable of as a platform that you can build on. I know it’s an intimidating amount of abstract concepts and YAML when you first encounter it. But compared to a similarly (honestly, much less) capable setup like the previous one I had with VMs running docker and systemd, custom configuration management tools, a consul/etcd cluster, registrator, consul-template, and a bunch of shell scripts for deployment, it really is simpler to deal with. It’s pretty amazing to go into the cluster management console, delete a node, and watch Kubernetes automatically move things around to the remaining servers without dropping any traffic.

Ratchet

By anders pearson 26 Nov 2022

One of my favorite all-purpose engineering tools (maybe it’s more accurate to call it a “technique”, but I’m going to stick with “tool” here) is the ratchet.

In my career, I’ve had the benefit of some very long tenures at different organizations. I and teams I’ve worked on have launched new greenfield projects, I’ve maintained some codebases for more than a decade, I’ve done big bang rewrites and piecemeal migrations. I’ve worked with experienced and talented developers and complete newbies. I’ve also inherited a lot of code and systems. Some of those have been well designed, well tested, and well documented. Others have been… not so much those things.

I’m not dogmatically against rewrites. Sometimes that’s the appropriate solution. Often though, it’s not practical or feasible to rewrite a large codebase or existing system even if it’s in terrible shape. It needs to be improved in place. The thing with systems that are already in bad shape is that making changes is risky. The larger the change, the riskier. It’s often clear that the current state of things is bad, but you don’t know exactly what “good” would look like or how to get there from where you are.

This is where the ratchet comes in.

A ratchet is two parts:

1. any small change that improves the codebase or the system in some way.
2. some safeguard that locks that change in place.

Fix a bug? Add a regression test to make sure the bug stays fixed. No automated tests at all? Add a “dummy” test suite that runs zero tests. Obviously that won’t catch any bugs by itself, but it should be low risk to introduce (you’re just adding test harnesses) and when you do start adding tests, you’ll have the scaffolding there to fit them into. Set up a commit hook or GitHub Action to run the dummy test suite. Again, it shouldn’t introduce any risk but will get everyone accustomed to seeing tests pass as part of the development cycle. I’ve seen dummy test suites like this catch syntax errors or broken imports in code just by virtue of ensuring that the code is at least parsed and compiled before getting pushed out to production (we’ve all seen developers make “just a tiny change” and push without even running it locally; if we’re honest, most of us have done that ourselves).
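As one possible shape for that first click of the ratchet in a Python codebase, here is a sketch of a “dummy suite” that tests no behavior at all and only checks that every source file parses. The function names are invented for this example. Note that it only catches syntax errors; catching broken imports would require actually importing the modules.

```python
import pathlib


def syntax_errors(sources):
    """Return {filename: message} for each source text that fails to compile."""
    errors = {}
    for filename, text in sources.items():
        try:
            compile(text, filename, "exec")
        except SyntaxError as exc:
            errors[filename] = exc.msg
    return errors


def load_sources(root):
    """Read every .py file under root into a {path: text} dict."""
    return {
        str(path): path.read_text(encoding="utf-8")
        for path in pathlib.Path(root).rglob("*.py")
    }


def dummy_suite(root="."):
    """Exit code for a commit hook or CI step: 0 if everything parses."""
    return 1 if syntax_errors(load_sources(root)) else 0
```

Once this runs on every commit, adding the first real test is a small step rather than a new piece of infrastructure, which is the whole point of the ratchet.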

Is the code all over the place in terms of conventions? Add a simple linter tool (eg, flake8 or eslint). Most of them will let you enable/disable different rules. If you need to, start out by disabling every single rule so that it’s not actually checking anything, but add it to the commit hooks or CI setup. Then you can enable one rule at a time later on as you gain confidence that they aren’t breaking anything. Each rule that gets enabled prevents the codebase from ever having that problem again. Eventually, you might make enough progress that you’re comfortable switching to an automatic formatter like black or go fmt or similar.

Is a deploy process manual, slow, and error prone? Write a Runbook entry documenting it as well as you currently understand it. Then start automating parts of the runbook. Add a simple end to end “smoketest” at the end of the deploy to verify that the deploy was successful. Before you know it, you’ll have a completely automated deployment process.
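A smoketest like that can start very small. The following is a sketch only, assuming that “successful” means a handful of URLs each return HTTP 200; the function names and the injectable `fetch` parameter are illustrative choices, not part of any particular tool.

```python
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for a GET request to url."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def smoketest(urls, fetch=fetch_status):
    """Return (url, problem) pairs; an empty list means every URL answered 200."""
    problems = []
    for url in urls:
        try:
            status = fetch(url)
        except Exception as exc:
            # urlopen raises for connection failures and for 4xx/5xx responses.
            problems.append((url, f"request failed: {exc!r}"))
            continue
        if status != 200:
            problems.append((url, f"unexpected status {status}"))
    return problems
```

A deploy script could call `smoketest([...])` as its last step and exit non-zero if the returned list is not empty. Passing `fetch` in as a parameter keeps the check easy to exercise without touching the network.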

None of these are revolutionary ideas. I just find it useful to think in terms of this “ratchet” mechanism when I’m improving a codebase, a system, or even a team’s process. Make lots of small steps and make it easier for the system to naturally move towards a “better” state than a worse one. At some point the system dynamics take over and become self-reinforcing.

By anders pearson 30 Dec 2019

For the last few years, at the end of the year, I’ve been posting my roundup of new music for the year. I’m a little bored with that for now and thought that this time, I’d list the books I read this year and maybe some brief thoughts on them.

According to my Amazon order history, the first book I bought in 2019 was Hacking: The Art of Exploitation, 2nd Edition. I’d generally avoid anything with the word “hacking” in the title, but somehow this was recommended to me and I’m glad it was. It does a surprisingly good job of explaining basic exploitation techniques like buffer overflows, format string vulnerabilities, and shell code. The kind of stuff I used to read about in text files on usenet but never really got too far into. It reminded me that I really haven’t thought much about assembly and machine code since university. Back in those days, my exposure was mostly writing MIPS and Z-80 assembly and I never really bothered with x86. That led me to pick up Assembly Language Step-by-Step which isn’t terribly modern, but does a good job covering x86 assembly. Wanting to shore up the connections from assembly all the way to higher level languages, I also read Low-Level Programming: C, Assembly, and Program Execution on Intel® 64 Architecture, which was excellent. Its coverage of assembly isn’t as thorough, but it gets much more into those weird bits of compiler and linker magic that I’ve long avoided having to deal with. Later in the year, I ended up getting The Apollo Guidance Computer: Architecture and Operation and I absolutely love it. I’m an old space nerd (aren’t we all) and this is a fascinating look at the computer guidance system on the Apollo, including the hardware, the software, and some math and physics tricks they used to pull everything together.

I stumbled on Introduction to the Theory of Complex Systems at a bookstore and couldn’t resist it. It goes into network theory, evolutionary algorithms, statistical mechanics, and ties together ideas from mathematics, physics, biology, and social sciences. If you know me, you know that I’d be all over that. Afterwards, I wanted more and I’m pretty sure that The Road to Reality: A Complete Guide to the Laws of the Universe came up as a recommendation somehow. If you didn’t know, I was a physics major for most of my undergrad before switching to electrical and then computer engineering. This is the book that I wish I’d had when I was struggling through my physics classes. It doesn’t go deep into any of it, but Penrose explains so many of the concepts with a clarity that was definitely missing from my classes and textbooks. I’m not sure it would be that understandable without at least some basic college physics and math background, but if you ever took a general relativity or quantum mechanics class and felt like you didn’t really get it, this is definitely worth checking out.

That kind of got me thinking about those concepts in those physics classes that I’d learned to use but never really felt like I understood. As a physics major, you don’t necessarily have to take a lot of math classes, and there are a lot of things that get introduced in your physics classes as tools but aren’t really explained that well. Back when I was in school, we didn’t have Wikipedia or Youtube or Amazon (and I certainly didn’t have disposable income to spend on expensive math books that weren’t required for my classes) so unless you took a class on a topic or knew someone with that expertise, it was hard to fill in those gaps. By the end of my physics classes, I remember feeling like I was solving problems by pattern matching and mechanically applying memorized solutions and could get the right results but I no longer really knew what was going on. A few of the important ones that I definitely missed out on were tensors and topology. We used tensors plenty in physics classes, but I remember mostly just thinking of them as multi-dimensional matrices and a set of mechanical rules that you used to manipulate subscripts and superscripts in a way that let you do higher dimensional calculations without having to write out a million terms. Every once in a while a professor would do something weird with them and we’d just have to take it on faith that it was valid, but it always left me feeling like there was a lot more that I didn’t understand.

This realization that I could now fill in those gaps led me down another rabbit hole of math books that occupied quite a bit of my year. I started with some youtube videos and various online resources that helped a lot. Then I picked up Tensor Calculus because some of those Dover books are inexpensive hidden gems. This one wasn’t. It might be fine if you are more of a mathematician, but I’m still fundamentally coming at things from a physics/engineering perspective and not that interested in proofs and derivations. The one that was much better for me was An Introduction to Tensors and Group Theory for Physicists. The preface basically described my exact situation of “lingering unease” around the concept and does a great job of filling in the blanks. It also made connections to group theory that I hadn’t really thought about. Group Theory comes up more often in computer science and is something I feel more comfortable with but it did remind me that I never actually took an Abstract Algebra class. I was also vaguely aware that despite seeming like it should be unrelated, Topology is pretty heavily based on Abstract Algebra and that was another topic that I wanted to fill in. So I got A Book of Abstract Algebra. That one I think does fall into the “hidden gems” category of Dover books. Highly recommended. I followed it up with Introduction to Topology which was… ok. It was clear enough and I could follow it, but it was pretty dry and I don’t feel like I gained any real insights from it. I didn’t come away from it thinking that Topology was amazing and having a different perspective on things. I just kind of feel like now I know a bunch of definitions and theorems. I also got Counterexamples in Topology which was highly recommended and I can see being a valuable reference, but it’s really not one that you just sit down and read cover to cover. 
Somewhere in the middle of my Dover math books spree, I also read Geometry, Relativity and the Fourth Dimension which was pretty shallow but a very quick and fun read.

The last math book I picked up that’s worth mentioning is Mathematical Methods for Physics and Engineering: A Comprehensive Guide. This is another book that I wish had existed when I was an undergrad. It’s big and extremely thorough. It basically covers all of the math you would need for an entire physics or engineering undergrad education. I really can’t think of a single mathematical concept, tool, or technique that I encountered in my undergrad physics and engineering that isn’t covered in it. For any given topic, there’s probably a better introduction or a more thorough treatment in some other book, but I’ve never seen any other book with the same breadth. It’s become my go-to math reference and lives on a shelf by my desk now.

As a “palate cleanser” during my math refresher, I read The Poetics of Space which has been on my recommendations list for years and come up in numerous conversations. I don’t feel like I got as much out of it as I could have because I’m just not that familiar with the French poetry that he bases his discussions off of. I’m sure it’s one of those books that really needs to be read in its original French, but even in translation, it’s beautifully written and evocative.

I’ve been vegan for a while but only recently thought to actually read anything on the topic. Why We Love Dogs, Eat Pigs, and Wear Cows: An Introduction to Carnism is probably the best introduction to the ethical and moral philosophy behind veganism and animal rights. The Sexual Politics of Meat: A Feminist-Vegetarian Critical Theory makes a strong case for veganism as an important part of intersectional feminism. It was written in the 90’s though and both veganism (eg, at the time “vegan” wasn’t a commonly used term, so it only talks about “vegetarianism”) and feminism have changed quite a bit since then, so it needs to be read with that in mind. One chapter jumps right into fairly graphic descriptions of sexual violence without any warning, so that’s something to be aware of. On a completely different axis, How Not To Die: Discover the foods scientifically proven to prevent and reverse disease, while not explicitly “vegan”, is a great scientific look at nutrition from the doctor behind nutritionfacts.org and supports a vegan diet as at least a good starting point for avoiding many of the diseases and causes of mortality that plague the modern world.

I’ve been living in London for a few years now and in 2020, Phoenix and I become eligible for Indefinite Leave to Remain. Our main obstacle is passing a test that involves knowledge of British history, culture, and government. There’s an official guide, Life in the United Kingdom that covers everything that could be on the test along with practice tests and I’m pretty sure I could pass the test by cramming those for a bit. But I’ve got plenty of time and if I’m going to be living here long term, I figure I might as well know a bit more. Also, if you hadn’t noticed, the last few years have been pretty eventful in British politics and have exposed much of the world to the idiosyncratic and confusing way that very important decisions are made over here. In an effort to understand how these things work and the historical context behind them, I have been going through some “very short introduction” books to build up some background knowledge. So far I’ve gone through The British Empire, Nineteenth Century Britain, Twentieth Century Britain, The British Constitution, and British Politics. I’ll probably read a bunch more before I’m done.

Those books all fit into various themes or categories. In between them were a bunch that were pretty random:

Finally, the one piece of fiction I managed to read this year (to be fair, I read a lot of fiction in 2018, so I was taking a bit of a break on purpose) was Into the Dark (Dark Devices Book 1), which happens to have been written by my coworker. He wrote it for NaNoWriMo and self-published, which I strongly support. It’s not very long and leaves the story set up for future continuation, but it’s a nice fantasy read and reminds me of a slightly sci-fi’d up take on old D&D Underdark books.

Continuously Deploying Django with GitHub Actions

By anders pearson 12 May 2019

[Edit 2019-09-21: Updated with the new YAML syntax]

A lot has changed since then so I thought it was about time I updated. Especially since this weekend I switched from Jenkins to GitHub Actions.

So, a quick disclaimer before I go on: GitHub Actions is currently in public beta. They have been clear that it is not yet considered stable for production and comes with no warranty. It could change and break my setup at any time. They also haven’t released any information about what it will cost when it is fully out. These are my personal apps so I’m fine with all of that. I’ll try to remember to update this post when the beta is over, but in the meantime, use it at your own risk.

My old post covered the basic setup and, honestly, the general approach hasn’t changed much; I’ve just been swapping out the pieces. It’s worth reading the old post for more details, but I’ll recap the basic approach here so the rest of this post can just cover the new GitHub Actions stuff.

• I’m still packaging Django apps up in Docker images. My Dockerfile has changed, but these days there are a million articles on how to run Django in Docker, so I won’t cover those changes.
• I have three application servers where the Django apps run. Each app gets deployed to at least two of them so there is redundancy. This means that a server can fail or go down for maintenance without the site going down. For some of them, there are also Celery Worker and Celery Beat services running to handle offline tasks. There’s an nginx proxy in front of that setup proxying back to the individual Django apps. Consul, consul-template, and registrator are used to dynamically adjust the proxying setup so everything is handled smoothly when a server goes down or when an application instance is added or removed. This part still works basically the same so see the old post if you want more details on how that works.
• My servers are all managed with Salt, including the production settings and secrets. The Docker containers get those settings in environment variables. Those variables are put in place by Salt and handed to the containers via a couple shell scripts. Again, that part is all unchanged and you can get more details in the old post. What’s relevant here is that there is a docker-runner script on each server that will run an application’s container with all of the production settings injected and let me basically do manage.py commands in those containers. So docker-runner myapp migrate is equivalent to starting up the thraxil/myapp:$TAG container (where $TAG is specified in a designated file and is generally a git hash) with production settings and running manage.py migrate in the container.
• Static files are hosted on Amazon S3 and CloudFront. Again, this is pretty typical for Django deployments and is better covered elsewhere. All you need to know for my setup is that manage.py collectstatic and manage.py compress need to be called during the deployment to push the latest static files up.

The old approach to deployment was that I ran a Jenkins server alongside my app servers. When there was a push to the master branch on GitHub, it would trigger a build there, which would do the following:

• Build a new docker image for the app. I like to include running unit tests as part of the Dockerfile. That ensures that the tests pass in the exact environment with all the same dependencies as the code will have in production.
• Tag that image and push it to the Docker Hub.
• Do a docker pull of that exact tag on all of the app servers.
• Write the tag out to the right place on those servers so docker-runner will use that version of the image.
• On one of the servers, run docker-runner myapp migrate, docker-runner myapp collectstatic, and docker-runner myapp compress to handle database migrations and static files.
• One by one, restart the processes on the app servers.
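To make the ordering of the server-side steps concrete, here is a small Python sketch that builds the list of commands in the order described above (everything after the image has been built and pushed). It is not the script from the old post: the function name, the tag file path, and the restart command are all hypothetical parameters, and the sketch only produces the plan rather than running anything over SSH.

```python
def deploy_plan(app, tag, hosts, tag_path, restart_cmd):
    """Return an ordered list of (host, command) pairs for one deploy.

    tag_path and restart_cmd are hypothetical placeholders; the real
    locations and service commands depend on how the servers are set up.
    """
    image = f"thraxil/{app}:{tag}"
    plan = []
    # 1. Pull the exact image tag on every app server.
    for host in hosts:
        plan.append((host, f"docker pull {image}"))
    # 2. Record the tag so docker-runner picks up the new version.
    for host in hosts:
        plan.append((host, f"echo {tag} > {tag_path}"))
    # 3. Run migrations and static file tasks once, on a single server.
    first = hosts[0]
    for task in ("migrate", "collectstatic", "compress"):
        plan.append((first, f"docker-runner {app} {task}"))
    # 4. Restart the app servers one at a time.
    for host in hosts:
        plan.append((host, restart_cmd))
    return plan
```

Keeping the plan as plain data makes the ordering easy to inspect or print before anything touches a server.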

Jenkins used to run those steps with a fairly small shell script described in the old post. At some point, I converted that to a “proper” Jenkins pipeline specified in a Jenkinsfile. I never wrote about that, and I’m replacing it now, but it’s up here if you are curious. It did some things better than the shell script version and made for a much nicer overall experience in the Jenkins web interface, but mostly it made me glad that I don’t often have to code in Groovy.

Jenkins worked OK for me but it’s always been a pretty awkward part of the stack and not a lot of fun to run and keep updated.

So when GitHub Actions came out and I got access to the public beta, I decided to see if I could replace my Jenkins setup.

What I came up with seems to work pretty well.

Actions for a project are stored as code in a .github directory in your project. There’s a nice web UI for editing actions, but it’s worth looking at the code. My old blog post covered the deployment of my antisocial feed reader app, so for consistency, let’s look at how the GitHub Actions setup looks for it.

I have two workflows. The first just sets up and runs Jessie Frazelle’s branch-cleanup-action GitHub action which keeps things tidy by deleting merged branches. She has a great blog post on how it works that helped me start to get my head around Actions.

The main deploy workflow, in .github/workflows/deploy.yml starts with:

on:
  push:
    branches: master
name: deploy
jobs:
  buildDockerImage:
    name: Build docker image
    runs-on: ubuntu-latest
    steps:

The first step (actually technically two steps) that runs is:

    - uses: actions/checkout@master
    - name: Build docker image
      uses: actions/docker/cli@master
      with:
        args: build -t thraxil/antisocial:${{ github.sha }} .

That builds the docker image, tagged with the git SHA, which GitHub Actions conveniently exposes in the github.sha variable. As I mentioned before, with my Dockerfiles, the unit tests run during the build, so this step also serves as a good check on PRs.

The next two are fairly self-explanatory:

    - name: docker login
      uses: actions/docker/login@master
      env:
        DOCKER_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}
        DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
    - name: docker push
      uses: actions/docker/cli@master
      with:
        args: push thraxil/antisocial:${{ github.sha }}

They log us into the Docker Hub and push the docker image there. The only new bit is that the login step uses the secrets field to grant itself access to some secret settings that are stored in the GitHub project settings.

The bulk of the deploy work (that I had to do, at least) is in the next stanza:

    - name: deploy
      uses: thraxil/django-deploy-action@master
      env:
        APP: antisocial
        KNOWN_HOSTS: ${{ secrets.KNOWN_HOSTS }}
        PRIVATE_KEY: ${{ secrets.PRIVATE_KEY }}
        SSH_USER: anders
        WEB_HOSTS: ${{ secrets.WEB_HOSTS }}
        CELERY_HOSTS: ${{ secrets.CELERY_HOSTS }}
        BEAT_HOSTS: ${{ secrets.BEAT_HOSTS }}

That uses a custom action that I placed in its own repo so I could easily re-use it across my projects.

The great thing about GitHub Actions is that they are just Docker containers that get some specific environment variables and shared directories set up and run how you need them. GitHub Actions will find the Dockerfile in there, build it (if it hasn’t already cached), and run it in the appropriate environment. That means that it’s easy to package up pretty much any common deployment tool you can think of as a GitHub Action (if someone else hasn’t already done it) or just put your own together if you know how to make a Docker image and write a little shell script.

If I were doing this from scratch or if it were a more complicated deployment process, I’d probably grab or build an Ansible action and do the rest of the deployment that way. In my case though, it’s only a couple steps and I pretty much already had a shell script written (see the old post). So I just made an action that pretty much runs that script in a container, with some tweaks to make it work in the new environment.

The Dockerfile is pretty minimal. Just builds off debian:stable-slim (which GitHub highly recommends to keep images small and using a common base to maximize caching), installs openssh-client (the only package we need that isn’t already there) and drops in a small script.

That script should look familiar from the old blog post. It just does a few additional things to deal with the GitHub environment variables and setting up a valid SSH config. All of the variables that it needs are either set via env in the workflow config above, or they are stored in the project settings as secrets (SSH private key, etc.)

Finally, since I like to use Sentry to track exceptions and Sentry does a better job if you tell it when you deploy new code, I use a community published action to publish a new sentry release for the project:

    - name: sentry release
      uses: juankaram/sentry-release@master
      env:
        ENVIRONMENT: production
        SENTRY_AUTH_TOKEN: ${{ secrets.SENTRY_AUTH_TOKEN }}
        SENTRY_ORG: thraxil
        SENTRY_PROJECT: antisocial

That’s basically it. I’m mostly pleased with the setup. It took me a few hours to figure out all the pieces and work through some stupid bugs (my own) to get it working how I wanted, but now it’s pretty solid.

Like I said earlier, I really like that it works by just stringing together Docker containers. That means there’s never any question about when GitHub Actions will support some tool. If you can stick it in a Docker image, you can use it. Configuration is straightforward once you’ve spent some time with it (and the Web UI is surprisingly usable and powerful).

I feel like I’m only scratching the surface of what it’s capable of (I’m not even running anything in parallel). The more interesting uses will be less of this “traditional” kind of deployment pipeline and will take better advantage of the Actions’ direct access to the rest of GitHub’s APIs. Right now it feels like the community is still figuring out what those possibilities are and I’m excited to see what patterns emerge.

Shelfie

By anders pearson 24 Feb 2019

The #shelfie hashtag has been coming up lately. I’m bad at social media, but I thought I’d post mine here.

I moved from the US to Europe a few years ago and I had to massively cut down my book collection. In the last few years, I’ve started accumulating books again, but I’ve been a bit more purposeful this time around. So most of the books on my shelves are ones that I decided were important enough to bring across the ocean with me or that I’ve wanted to have for reference since the move. Fiction and “lighter” non-fiction that I’m only going to read through once I try to buy in electronic formats so it doesn’t take up precious space in my flat.

My books are vaguely organized though not totally consistently. The top shelf here is the really random stuff, including the few bits of fiction I have in physical form, Dutch language books (that’s “Charlie and the Chocolate Factory” in Dutch), and of course, one of my small paintings in the front.

The middle shelf is sort of C++, Linux, Networking, and low level systems stuff.

Bottom is mostly Cloud, DevOps, and SRE related.

These shelves are right below a window so I had a hard time getting a clearer photo without it washing out a bit.

Top shelf here is focused more on CS and distributed systems. Some classics here like SICP, CLRS, the Dragon Book, CTM, etc.

Bottom shelf is the fun one with my weird art books. The foil wrapped one is the NASA Graphics Standards Manual. The old looking one next to it is an 1800’s edition of Elihu Vedder’s illustrated version of the Rubaiyat of Omar Khayyam.

Top here is math and random programming books that didn’t fit on the other shelves.

The bottom shelf has a few electronics books and random large ones that won’t fit elsewhere. The filing box is mostly full of printed out CS papers.

2018 Music

By anders pearson 14 Jan 2019

It took me a little longer than I’d planned, but here is my yearly music round up for 2018 just as I’ve done for 2017, 2016, and 2015.

As usual, this isn’t exhaustive and only includes bandcamp links. I have no commentary other than that I enjoyed these. My tastes run towards weird, dark, loud, and atmospheric. Enjoy.

Behind the Music

By anders pearson 02 Jan 2018

I posted my yearly music roundup yesterday, which I’ve done for the last three years. Today I thought I’d just take a moment to explain how I go about creating those posts. Eg, there are 165 albums in the last post and I link both the artist and album pages on each. Do you really think I manually typed out the code for 330 links? Hell no! I’m a programmer, I automate stuff like that.

First of all, I find music via a ton of different sources. I follow people on Twitter, I subscribe to various blogs’ RSS feeds, and I hang out in a bunch of music related forums online. So I’m constantly having new music show up. I usually end up opening them in a new tab until I get a chance to actually listen to them. Once I’ve listened to an album and decided to save it to my list, my automation process begins.

I’m a longtime emacs user, so I have a capture template set up for emacs org-mode. When I want to save a music link, I copy the URL in the browser, then hit one keyboard shortcut in emacs (I always have an emacs instance running), I paste the link there and type the name of the artist. That appends it to a list in a text file. The whole process takes a few seconds. Not a big deal.

At the end of the year, I have this text file full of links. The first few lines of this last year’s looks something like this:

** Woe - https://woeunholy.bandcamp.com/album/hope-attrition
** Nidingr - https://nidingrsom.bandcamp.com/album/the-high-heat-licks-against-heaven
** Mesarthim - https://mesarthim.bandcamp.com/album/type-iii-e-p
** Hawkbill - https://hawkbill.bandcamp.com/track/fever
** Black Anvil - https://blackanvil.bandcamp.com/album/as-was
** Without - https://withoutdoom.bandcamp.com/
** Wiegedood - https://wiegedood.bandcamp.com/releases

For the first two years, I took a fairly crude approach and just recorded an ad-hoc emacs macro that would transform, eg, the first line into some markdown like:

* [Woe](https://woeunholy.bandcamp.com/album/hope-attrition)

Which the blog engine eventually renders as:

<li><a href="https://woeunholy.bandcamp.com/album/hope-attrition">Woe</a></li>

A little text manipulation like that’s a really basic thing to do in emacs. Once the macro is recorded, I can just hit one key over and over to repeat it for every line.
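For readers who don’t live in emacs, the same transformation can be sketched in a few lines of Python. This is an illustrative equivalent of the macro, not what was actually used; the regular expression assumes every line looks like the `** Artist - URL` samples above.

```python
import re

# Matches lines like "** Woe - https://woeunholy.bandcamp.com/album/hope-attrition".
LINE_RE = re.compile(r"^\*\*\s+(?P<artist>.+?)\s+-\s+(?P<url>https?://\S+)\s*$")


def org_line_to_markdown(line):
    """Convert one captured org-mode line into a markdown list item.

    Returns None for lines that do not match the expected shape.
    """
    match = LINE_RE.match(line)
    if match is None:
        return None
    return f"* [{match.group('artist')}]({match.group('url')})"
```

Feeding each line of the text file through `org_line_to_markdown` gives the same one-link-per-line markdown that the macro produced.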

Having done this for three years now though, I’ve noticed a few problems, and wanted to do a little more as well.

First, you’ll notice that the newest post links both the artist and the album. This, despite the fact that I only captured the album link originally.

Second, if you look closely, you’ll notice that not all of the bandcamp links are quite the same format. Most of them are <artist>.bandcamp.com/album/<album-name>, but there are a few anomalies like https://hawkbill.bandcamp.com/track/fever or https://withoutdoom.bandcamp.com/ or https://wiegedood.bandcamp.com/releases. The first of those was a link to a specific track on an album, the latter two both link to the “artist page”, but if an artist on bandcamp only has one album, that page displays the data for that album. Unfortunately, that’s a bad link to use. If the artist adds another album later, it changes. Some of the links on my old posts were like those and now just point at the generic artist page.
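The different link shapes are easy to tell apart from the URL alone. Here is a rough Python sketch (separate from the Go scraper described below) that classifies a link using only the standard library. The function name and return shape are illustrative, and the “artist” it returns is just the subdomain, not the display name, which still has to come from scraping the page.

```python
from urllib.parse import urlparse


def classify_bandcamp_url(url):
    """Return (subdomain, kind, slug) for a bandcamp link.

    kind is "album", "track", or "artist"; slug is None for artist pages.
    """
    parsed = urlparse(url)
    subdomain = parsed.hostname.split(".")[0]
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) == 2 and parts[0] in ("album", "track"):
        return subdomain, parts[0], parts[1]
    # Bare domains and pages like /releases are both treated as artist
    # pages, which are the unstable links that need to be resolved.
    return subdomain, "artist", None
```

Anything classified as "artist" or "track" is a candidate for being replaced with a proper album link.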

So, going from just the original link that I’d saved off, whatever type it happened to be, I wanted to be able to get the artist name, album name, and a proper, longterm link for each.

I’ve written some emacs lisp over the years and I have no doubt that if I really wanted to, I could do it all in emacs. But writing a web-scraper in emacs is a little masochistic, even for me.

The path of least resistance for me probably would’ve been to do it in Python. Python has a lot of handy libraries for that kind of thing and it wouldn’t have taken very long.

I’ve been on a Go kick lately though, and I ran across colly, which looked like a pretty solid scraping framework for Go, so I decided to implement it with that.

First, using colly, I wrote a very basic scraper for bandcamp to give me a nice layer of abstraction. Then I threw together a real simple program using it to go through my list of links, scrape the data for each, and generate the markdown syntax:

I just run that like:

 go run yearly.go | sort --ignore-case > output.txt

And a minute or two later, I end up with a nice sorted list that I can paste into my blog software and I’m done.

2017 Music

By anders pearson 01 Jan 2018

For the third year in a row, here is my roundup of music released in 2017 that I enjoyed.

One of the reasons that I’ve been making these lists is to counteract a sentiment that I encounter a lot, especially with people my age or older. I often hear people say something to the effect of “The music nowadays just isn’t as good as [insert time period when they were in their teens and twenties]”. Sometimes this also comes with arguments about how the internet/filesharing/etc. have killed creativity because artists can’t make money anymore so all that’s left is the corporate friendly mainstream stuff. I’m not going to get into the argument about filesharing and whether musicians are better or worse off than in the past (hot take: musicians have always been screwed over by the music industry, the details of exactly how are the only thing that technology is changing). But I think the general feeling that music now isn’t like the “good old days” is bullshit and the result of mental laziness and stagnation. We naturally fall into habits of just listening to the music that we know we like instead of going out looking for new stuff and exploring with an open mind. My tastes run towards weird dark heavy metal, so that’s what you’ll see here, but I guarantee that for whatever other genres you are into, if you put the effort into looking just a little off the beaten path, you could find just as much great new music coming out every year. I certainly love many of the albums and bands of my youth, but I also feel like the sixteen year old me would be really into any one of these as well.

OK, I know I’ve said that I don’t do “top 10” lists or anything like that, but if you’ve made it all the way to the bottom of this post, I do want to highlight a few that were particularly notable: Bell Witch, Boris, Chelsea Wolfe, Goatwhore, Godflesh, King Woman, Lingua Ignota, Myrkur, Pallbearer, Portal, The Bug vs Earth, Woe, and Wolves in the Throne Room.

Plus special mention to Tyrannosorceress for having my favorite band name of the year.

In the Wild

By anders pearson 21 Dec 2017

Last night, I was scanning the /r/guitarpedals subreddit. Something I have been known to do… occasionally.

I see this post:

OK, someone’s trying to identify a pedal they came across in a studio. I’m not really an expert on boutique pedals, but I have spent a little time on guitar forums over the years so who knows, maybe I can help?

Clicking the link, there’s a better shot of the pedal:

Hmm… nope, don’t recognize it. Close the tab…

OK, yeah, the pedal doesn’t look familiar, but the artwork on it sure does…

That’s one of my drawings from about 2008.

So at some point, someone out there built a custom guitar pedal, used one of my drawings for it, it ended up in a recording studio somewhere, someone else found the pedal in the studio, took a picture, posted it on reddit, and I stumbled on it.

Anyone who’s known me for very long knows that I post all of my artwork online under a Creative Commons Public Domain license. I’m not a career artist and it’s not worth the hassle for me to try to restrict access on my stuff and I’d rather just let anyone use it for whatever they want. So this obviously makes me very happy.

I’ve had plenty of people contacting me over the years asking to use them. That’s unnecessary but appreciated. My paintings and drawings have appeared on dozens of websites and articles. There are a couple books out there that include them (besides the Abstract Comics Anthology that I was actively involved in). I know that there’s at least one obscure death metal album out there that uses one of my paintings for the cover. I’ve had a few people say they were going to get tattoos, but I’ve never seen a photo of the results, so I can’t say for certain whether anyone followed through on that.

This is the first time that I’ve run into my own work like this randomly in a place that I wasn’t looking.

BTW, no one has yet identified the pedal, so obviously, if you know anything about who built it, let me know.