A few years back, I got really into Cellular Automata. I had a big project at work that involved implementing a bunch of simulations via CAs and generally found it to be an interesting topic.
I had the good fortune to spend some time on that project with Don Hopkins, who showed me a bunch of tricks and pointed me in some very fruitful directions (including recommending the book Cellular Automata Machines by Toffoli and Margolus as a primer).
The result is that I've got a bunch of tricks in and around Cellular Automata in my head. I don't think I invented any of these. I don't think Don even did (maybe he was just being modest). But they are things that I've never seen written down and probably should be. It recently occurred to me that I have a blog so I should, you know, do that.
I've got enough that I plan to do it as a small series of blog posts (if I can keep up the writing momentum over the next week or two).
Starting with a trick that's pretty simple but is easy to overlook.
It deals with standard two-dimensional CAs (think Game of Life) and is really just a clever way to avoid a common naïve implementation problem.
To review, there is a rectangular, tiled Grid made up of Cells, each of which has a particular State (such as "on" or "off" or a numerical value). At each generation, the new state of each cell is determined by applying a fixed set of Rules or a mathematical function using the states of the cells in its Neighborhood.
A typical neighborhood for CA is the Von Neumann Neighborhood which includes the cell itself (which we'll call "center" or just
C) and the four cells that share sides with it, which we'll call "north", "south", "east", and "west" (or
W). There is also a Moore Neighborhood which expands the Von Neumann Neighborhood with Northwest, Southwest, Northeast, and Southeast.
The obvious representation for a CA grid in most programming languages will be a two-dimensional array of whatever type the cell's state needs to be stored as.
The naïve programmer then writes something like the following (pseudocode):
/* NOTE: square grid in all examples to simplify */ grid[SIZE][SIZE] for j from 0 to SIZE - 1: for i from 0 to SIZE - 1: c = grid[i][j] /* get the neighborhood for the cell */ n = grid[i - 1][j] s = grid[i + 1][j] e = grid[i][j + 1] w = grid[i][j - 1] /* and calculate the next state... */
And they run into a problem right away. For cells on the edges of the grid, one or more of the neighbors in their neighborhood doesn't exist. They probably get some kind of runtime error with an array out of bounds exception.
The standard solution here is not hard to imagine. Typically, you decide if you want the edges of the grid to "wrap" around like a torus or just have some default value. Then insert some bounds checking in the code to handle it:
grid[SIZE][SIZE] for j from 0 to SIZE - 1: for i from 0 to SIZE - 1: c = grid[i][j] /* get the neighborhood for the cell */ if i != 0: n = grid[i - 1][j] else: n = grid[SIZE - 1][j] if i != SIZE - 1: s = grid[i + 1][j] else: s = grid[j] if j != SIZE - 1: e = grid[i][j + 1] else: e = grid[i] if j != 0: w = grid[i][j - 1] else: w = grid[i][SIZE - 1] /* and calculate the next state... */
Not pretty, but it works and lets the programmer get on to exploring the fun world of Cellular Automata.
Now, the problem is that there is a lot of conditional logic happening in an inner loop. Each of those tests gets carried out on every cell in the grid even though only a small percentage of them is really necessary (on a sufficiently large grid there will be way more inner cells than cells on the edges).
So here's the first trick.
What it boils down to is putting a "gutter" around the grid to ensure that every cell becomes one of the inner cells and doesn't require any testing.
That translates to expanding the two-dimensional array holding the grid so it has an extra row of cells around the entire border. Then, before running each generation, prepopulating those gutters with the wraparound data (or just defaults). The code then loops over only the inside part of the expanded grid and can always safely assume that every cell has a full set of neighbors.
The pseudocode becomes something like:
grid[SIZE + 2][SIZE + 2] /* wrap gutters */ for i from 0 to SIZE + 1: grid[i] = grid[i][SIZE + 1] grid[i][SIZE + 1] = grid[i] grid[i] = grid[SIZE + 1][i] grid[SIZE + 1][i] = grid[i] for j from 1 to SIZE + 1: for i from 1 to SIZE + 1: c = grid[i][j] /* get the neighborhood for the cell */ n = grid[i - 1][j] s = grid[i + 1][j] e = grid[i][j + 1] w = grid[i][j - 1] /* and calculate the next state... */
This isn't the most mind-blowing thing ever, but I've seen quite a few programmers starting out with CAs miss it.
I also like it as an example of a general heuristic for optimizing code by making a pass over the data to clean or prepare it and avoid expensive boundary checking altogether at a later stage when it will be more expensive.
Let's take a moment to think about the complexity analysis though. In a strict, theoretical sense, none of this really matters. If you're dealing with an N by N grid, both pieces of code are O(N^2) because real computer scientists don't care about constants. If anything, from a theoretical perspective, the second approach is worse if you are interested in storage costs.
But in the real world, I can pretty much guarantee that the second approach will perform noticeably better. Sometimes constants matter.
The next part of this series, when I get around to writing it, will be a bit more implementation specific (Python, in my case) but will reap some additional rewards from the addition of gutters to the grid representation.