By
anders pearson
06 Apr 2006
About once a year or so, I find myself spending some time immersed in the world of Lisp (and Scheme). Lisp is sort of a proto-programming language. Lisp fans make a strong argument that it’s on a different level entirely from all other languages. I pretty much buy into that argument, but I’ve never quite been able to get it to click for me to the point where I would feel comfortable writing a real, production application in Lisp. I always start out with the intention of writing something useful and large enough to get myself over that hump, but I either get distracted with other stuff or hit some wall (usually involving just not being able to get some library I want to use installed on whatever Lisp interpreter I’ve chosen).
Nevertheless, my little excursions into the land of parentheses have always been worthwhile. It always seems that I pick up some new or deeper understanding of some aspect of programming. When I first went through the exercises in SICP it was a revelation for me. Suddenly all kinds of things like recursion, first-class functions, data-driven programming, and constraint systems made much more sense. Later attempts taught me the power of code as data, of having full, unrestrained access to the abstract syntax tree of your running program, and of a true macro system (I still don’t feel comfortable with Lisp’s macro syntax but I’ve at least decided that it’s worth learning at some point). Scheme was also my first real exposure to a proper, interactive REPL environment and that style of development impressed me enough that when python came along, I was probably more willing to give it a look because it had a nice interactive interpreter. Common Lisp and SLIME are still the gold standard in that department though.
Lisp certainly isn’t the only language that I’ve learned valuable lessons from though.
- Haskell gave me a lot of respect for pattern matching, list comprehensions, lazy evaluation and (along with SML) powerful type inference. It’s another language that I intend to keep coming back to because I’m sure it has more to offer me. Haskell also changed my mind from “significant whitespace is evil!” to “significant whitespace is a bloody fantastic idea!” and again, probably helped butter me up for python.
- Since C was my first real programming language, I won’t even attempt to list all that it taught me.
- Assembly is invaluable for peeling back the curtain and knowing what really goes on in the black box. Plus, spending time down there will free you of all fear of pointers or hex math.
- I can’t think of anything too specific that I learned from FORTH, but it definitely warped my mind and showed me just how much you can take out of a language and still have something powerful.
- Perl taught me that if you have lists, hashes, and references, you can pretty much get away without any other data structures for 99% of real world applications. It was also my first real exposure to the power of closures.
- Java taught me how easy and dangerous it is to over-engineer.
- Smalltalk taught me about code blocks and how to think about OOP as messages instead of function invocations.
- What little Ruby I’ve played with has taught me that I’m not really sure why anyone would bother writing in Ruby when Smalltalk exists, but I still plan on spending more time exploring there before I discount it entirely.
- Erlang has taught me how to think about distributed architectures (forming much of the basis for my REST obsession and the microapp stuff like Tasty and Pita that I’ve been pushing lately). It also taught me that most “enterprise” approaches to building high-reliability, scalable systems are misguided.
- Python is a constant reminder of the power of simplicity and elegance and, for me, has come to best embody the “make the easy problems easy to solve and the hard problems possible” principle.
Still, Lisp remains the wellspring of mindbending “AHA!” moments for me. My most recent delving into Lisp was with Peter Seibel’s book “Practical Common Lisp”. On that last pass through, generic functions finally clicked for me and I was able to see them as the conceptual underpinnings of object oriented programming. Suddenly, things like aspect oriented programming and python’s decorators were just trivial applications of this one root idea.
Now, I see that Guido’s thinking about adding generic functions to python 3000. Clearly, I think this is a good idea and I’m really curious to see how the syntax comes out. PJE’s implementation in PEAK is interesting, but having it in the syntax of the language should allow it to be even smoother.
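To make the idea a little more concrete, here’s a toy sketch of a generic function in python, dispatching only on the type of the first argument. This is just an illustration of the concept, not how CLOS or PJE’s implementation actually works:

:::python
# a toy generic function: implementations get registered per-type and the
# dispatcher picks one based on the type of the first argument
def generic(func):
    registry = {}

    def dispatch(arg, *args, **kw):
        impl = registry.get(type(arg), func)  # fall back to the default
        return impl(arg, *args, **kw)

    def when(typ):
        def register(impl):
            registry[typ] = impl
            return impl
        return register

    dispatch.when = when
    return dispatch

@generic
def render(obj):
    return str(obj)  # default implementation

@render.when(list)
def render_list(obj):
    return ", ".join(render(item) for item in obj)

@render.when(dict)
def render_dict(obj):
    return "; ".join("%s=%s" % (k, render(v)) for k, v in obj.items())

print(render([1, {"a": 2}, "three"]))   # -> 1, a=2, three

CLOS-style generic functions generalize this to dispatch on all of the arguments and layer method combination on top, which is where the aspect-oriented and decorator-ish tricks start to fall out naturally.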
By
anders pearson
27 Mar 2006
The problem: You need to move a web application to a different server. Eg, an application accessible as app.example.com is currently running on a machine with the IP address 1.2.3.4 and it needs to be moved to 1.2.3.5. This will involve a DNS change. DNS updates, unfortunately, aren’t very predictable. Unless you administer your own DNS servers, you probably don’t have much control over exactly when a change goes out. Even if you can control DNS, it can take up to a day for the new DNS entry to propagate out to users (because it may be cached at several points in between).
If you admin your own DNS, you can lower the TTL (time-to-live) and do the update exactly when you want, and that will probably be good enough. Most of us don’t have that level of control over our DNS though. At work, we don’t have direct control over DNS; the network admins take our requests and push everything out in a nightly batch, with a relatively high TTL. For thraxil.org and other sites that I run on my own, I rely on free DNS services which I also have no real control over. Pretty much anyone who isn’t an ISP is probably in a similar situation.
At work we’ve come up with a way to do the move with basically zero downtime and complete control over the exact time that the transition happens. It doesn’t require any admin access above and beyond control over the two web servers involved. IMO, it’s just about as simple as messing with DNS TTL stuff.
We’re probably not the first to have come up with this technique. It’s fairly simple and obvious if you’re familiar with apache configuration. I haven’t seen it mentioned anywhere before though and we had to figure it out ourselves, so clearly it’s not as well-known as it ought to be.
The technique basically revolves around apache’s mod_rewrite module and its ability to proxy requests. Make sure it’s installed and enabled on your servers; with some setups you’ll also need mod_proxy installed for mod_rewrite to be able to proxy, so it’s worth understanding how your apache has been compiled and configured. Technically, you only need it set up on one of the two servers, but it’s useful in general so you’ll probably want it on both.
The procedure starts with getting the application running on both servers. On the new one, it will have to run on some other hostname temporarily so you can test and make sure it’s all working. Then, at a point in time that’s convenient for you (late night, early morning, etc.) you replace the configuration on the old server with a proxying directive that proxies all requests coming in to that server over to the new one. Now you can do the DNS change (or request the change from the DNS admins). Once the proxy is running, any request to either the old IP address or the new one will end up going to the new server. Once the DNS change has gone through and has had time to propagate out to everyone (checking the logs on the old server to make sure there are no more requests to it is a good way to be fairly sure), you can safely turn off the proxy and decommission the old server.
Here’s a more detailed example. Say we have an application running with a hostname of app.example.com, which is an alias for the IP address 1.2.3.4. We want to move it onto 1.2.3.5 with as little downtime as possible.
First, we register app-new.example.com to point to 1.2.3.5 (or just set it up in /etc/hosts since it doesn’t really have to be public). We get the application installed and running, probably against a test database, on 1.2.3.5 and make sure that everything works like it should.
On 1.2.3.4, apache’s configuration probably has a virtual host section like:
:::apacheconf
<VirtualHost *>
ServerName app.example.com
# ... etc.
</VirtualHost>
We duplicate the same virtual host section in the apache config on the new server, 1.2.3.5. It’s a good idea to then test it by overriding app.example.com in your /etc/hosts (or your OS’s equivalent) to point to the new server.
Then, on 1.2.3.4 we change it to:
:::apacheconf
<VirtualHost *>
RewriteEngine On
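# P proxies the request to the new host via mod_proxy; L makes this the last rule applied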
RewriteRule (.*) http://app-new.example.com/$1 [L,P]
</VirtualHost>
If there’s data to migrate, we then take apache down on 1.2.3.4, migrate the data to the new server, and bring apache back up. There isn’t really a way to avoid some downtime here without running the risk of losing data during the move. Having a script or two pre-written to handle the data migration is usually a good idea to ensure that it goes quickly and smoothly. There are other tricks to minimize this downtime, but for most of us a few seconds or a few minutes of downtime isn’t the end of the world (and is highly preferable to lost or inconsistent data).
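As an example of the kind of pre-written migration script I mean, for a MySQL-backed app something as simple as the following sketch is often enough (the database name and target host are placeholders for your own setup):

:::python
# dump the old server's database and load it into the new one over ssh;
# the database name and target host are placeholders
import os
import sys

OLD_DB = "appdb"
NEW_HOST = "1.2.3.5"

cmd = "mysqldump %s | ssh %s 'mysql %s'" % (OLD_DB, NEW_HOST, OLD_DB)
if os.system(cmd) != 0:
    sys.exit("migration failed -- leave apache down and investigate")
print("data migrated; safe to bring apache back up")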
At this point, requests to app.example.com will be proxied over to 1.2.3.5 without users noticing anything except perhaps a little extra latency.
Then the request is put in to the DNS admins to change the app.example.com alias to resolve to 1.2.3.5 instead of 1.2.3.4.
Eventually, that will go through and users will start hitting 1.2.3.5 directly when they go to app.example.com. For a while, there will still be a trickle of hits to 1.2.3.4, but those should fade away once the DNS change propagates.
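One easy way to check whether that trickle has stopped is to count recent requests in the old server’s access log. Here’s a rough sketch (the log path is a placeholder and it assumes the standard common/combined log timestamp format):

:::python
# count how many requests the old server has seen in the last hour
import time

LOG = "/var/log/apache2/app.example.com-access.log"  # adjust to your setup
WINDOW = 60 * 60  # seconds

def recent_hits(path, window):
    fmt = "%d/%b/%Y:%H:%M:%S"
    cutoff = time.time() - window
    count = 0
    for line in open(path):
        try:
            stamp = line.split("[", 1)[1].split()[0]  # e.g. 27/Mar/2006:03:14:15
            if time.mktime(time.strptime(stamp, fmt)) > cutoff:
                count += 1
        except (IndexError, ValueError):
            continue  # skip malformed lines
    return count

print(recent_hits(LOG, WINDOW))

If that number stays at zero for a while after the DNS change has propagated, it’s safe to turn off the proxy.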
That’s pretty much what we do. It’s worked well for us for quite a few server moves. Obviously, since this involves messing with the configuration on production servers, you shouldn’t attempt it without fully understanding what’s involved. You should also test it out on a test setup before trying it with real applications. Plus, you should always have a fallback plan in case any step of it doesn’t work like you expect it to for some reason.
By
anders pearson
08 Mar 2006
at pycon the other week, Jonah and i gave a lightning talk on tasty and some of the ideas we’ve been thinking about in the area of “mini apps”. a lightning talk is a super fast 5 minute presentation. the idea is that a lot of people can get an idea or two each out in front of a whole conference without much stress or prep work. in particular, people tend to give lightning talks to show off a cool little trick or to just put out an idea that might not be fully formed yet and get some feedback early on. that was pretty much why we gave one.
it occurred to me though that i should probably write up the basics of the talk and post it here partly to disseminate the ideas a little more and partly because writing it forces me to clarify some of my own thinking on the subject.
so this post will basically cover the material that Jonah and i raced through in our five minutes at pycon with a bit more elaboration and refinement.
our talk was titled “Using the Web as Your Application Platform” but could also really be thought of as “Taking REST to Extremes”. none of the ideas we’ve been exploring are particularly novel or cutting edge on their own, but i think we’ve been pushing things a bit further than most.
everything i’ve been building lately has been very strongly guided by the REST architectural principles. i drank as much of that particular kool-aid as i could find. the main concept behind REST is basically designing your application to take full advantage of HTTP and the inherent architecture of the web, which is much more thoughtfully designed than many people realize.
like most developers, i’ve been chasing the pipe dream of “reusable components” for a long time. every programming shop out there at some point dreams of creating a library of reusable components specific to their problem domain that they can just put together in different ways to quickly tackle any new projects that come along and never have to write the same thing more than once. there are a million different ways that we attempt to build our little programming utopia: libraries, frameworks, component architectures, fancy content management systems, etc. every hot new language promises to make this easier and every vendor bases their sales pitch around it. most of the time though, every solution comes up a little short in one way or another. with deadlines and annoying real world requirements and restraints, the utopia never quite seems to materialize.
eg, where i work, we develop in python, perl, java, and php and we use a number of frameworks. some of those because they’re productive and nice to work with, some because of legacy reasons, some because of the particular environment that an application has to run in, and some because they are local customizations of open source applications and we can’t exactly change the language they were implemented in to suit our whims. i don’t think this situation is all that uncommon for web shops. right away, this is clearly a huge limitation on how much reusability we can get out of a library or framework. if i write something in python it isn’t going to be much use in java or php. so for a common task, like authenticating against the university’s central system, we’ve got to implement it at least four times, once for each language.
REST offers a solution for this particular predicament though. every language out there and every framework that we might reasonably use comes with built in support for HTTP. if you build a small self-contained application where you might have written a library or a component, it’s immediately usable from all the other languages and frameworks.
this was the reasoning behind tasty’s implementation as an application with a REST API instead of just as a library. tasty was written in python with turbogears which is my current high productivity environment but it was written as part of a project which we have to eventually deploy in a java course management system and which we’re also doing UI prototyping for in Plone. to be blunt, i just can’t stand java. implementing a full featured tagging library in java on a j2ee stack would have taken me months and probably driven me nuts. i could have built it on Plone; that would have been reasonable, but it still would have taken me a bit longer and i would have had to wrestle with converting the relational model that tasty uses to the object database that Plone uses and it probably would have ended up buggy or slow and my head might have exploded.
so instead i built it with the REST API and now it can be used by any application written in any language with just a little bit of glue. we already use it with python and perl “client” applications, Sky has written a javascript client, and i’m sure the java and php clients aren’t far off.
tasty isn’t particularly unique either. i’ve been writing applications that are designed to be tied together for a while now and i’ve got a small stable of them that i can and do mash together to build more complex applications.
this weblog is a good example. the main application is python and turbogears but it ties together at least four mini applications behind the scenes. full text search is handled by a separate application. when you type something into the searchbox here it makes an HTTP request to the search application behind the scenes. when you view any post, it makes an HTTP request to a tasty server to get the tags for that post. if you log in to the admin interface, it uses a separate application that does nothing but manage sessions. if you upload an image, it passes the image off via an HTTP POST request to a separate application that replies with a thumbnailed version of the image. and i’ve only started converting to this strategy. i plan on extracting a lot more of the functionality out into separate applications before i’m through. eg, i think that threaded comments can be their own application, i have an application that does just markdown/textile/reStructuredText conversion that i plan on plugging in, templating and skinning can be a separate application, and i’ve got a nifty event dispatching application that opens up a number of potential doors.
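as a rough illustration of what one of those backchannel calls looks like, here’s approximately how the searchbox glue could be wired up with the restclient library shown further down. the port and url layout are invented for the example rather than being the actual search app’s API:

:::python
# hypothetical glue between the weblog and a local full-text search mini app;
# the port and url structure here are made up for illustration
import urllib
import simplejson
from restclient import GET

SEARCH_BASE = "http://localhost:9020/search"  # placeholder, not the real service

def search_posts(query):
    # assume the search app returns a JSON list of matching post ids
    response = GET(SEARCH_BASE + "?q=" + urllib.quote(query))
    return simplejson.loads(response)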
one good example i can bring out to demonstrate just how far we’re taking this concept is a tiny little internal application that we call “pita”. pita implements “data pockets”. essentially, it’s an HTTP-accessible hashtable. you can stick some data in at a particular ‘key’ using an HTTP POST request and then you can pull that data out with a ‘GET’ request to the same url.
using curl on the commandline, a sample session would look like this (assuming that pita.example.com was running a pita server):
% curl -X POST -F "value=this is some data to store" http://pita.example.com/service/example/item/foo
% curl -X POST -F "value=this is some more data" http://pita.example.com/service/example/item/bar
% curl http://pita.example.com/service/example/item/foo
this is some data to store
% curl http://pita.example.com/service/example/item/bar
this is some more data
% curl -X DELETE http://pita.example.com/service/example/item/foo
% curl -X DELETE http://pita.example.com/service/example/item/bar
it’s a fairly trivial application, but it comes in quite useful, particularly if you want to share a little data between multiple applications. the applications sharing data don’t have to all have access to the same database, they just have to all point at the same pita server and agree on a convention for naming keys.
doing the above sequence of pita operations from within a python session is fairly trivial too using my little restclient library:
:::pycon
>>> from restclient import POST, DELETE, GET
>>> base = "http://pita.example.com/service/example"
>>> POST(base + "/item/foo", params=dict(value="this is some data to store"))
'ok'
>>> POST(base + "/item/bar", params=dict(value="this is some more data"))
>>> print GET(base + "/item/foo")
this is some data to store
>>> print GET(base + "/item/bar")
this is some more data
>>> DELETE(base + "/item/foo")
>>> DELETE(base + "/item/bar")
using JSON as the format for the payload (or occasionally XML for some of the really tough stuff) makes exchanging more complex structured data very easy.
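for instance, a structured value can go into pita as a JSON string and come back out the same way. a quick sketch using simplejson (the choice of library and the key name here are just for illustration):

:::python
# storing a structured value in pita as JSON, then getting it back out
import simplejson
from restclient import POST, GET

base = "http://pita.example.com/service/example"
data = {"title": "a post", "tags": ["python", "rest"], "comments": 3}

POST(base + "/item/post-42", params=dict(value=simplejson.dumps(data)))
roundtripped = simplejson.loads(GET(base + "/item/post-42"))
assert roundtripped["tags"] == ["python", "rest"]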
hopefully by now it’s clear that we’re really following through on the promise of the title of the talk. the web itself is our application platform, framework, or component framework (depending on how you look at it). it doesn’t matter much what each of the components are built out of, as long as they speak the same language to each other (which in our case is the language of the web: HTTP).
aside from the language independence win (which is certainly not insignificant for us, although it might not impress developers who stick to a single platform as much), this approach has a number of other nice qualities.
first, it positively forces you to design components that are loosely coupled and functionally cohesive; two primary goals of software design. it’s one thing to hide an implementation behind a library API, it’s quite another when the abstraction is so tight that a component could be swapped out with one written in a completely different language and running on a different physical computer and the other components would never be any the wiser.
second, it actually scales out really well. one of the first responses i usually get when i start explaining what we’re doing to other programmers is “but isn’t doing a bunch of HTTP requests all over the place really slow?”. that question sort of annoys me on the grounds that they’re implicitly optimizing prematurely (and many of you know by now how strongly i’m against premature optimization). yes, making an HTTP request is slower than a plain function invocation. in practice, however, it isn’t so much slower that it ever comes close to being a bottleneck. there are some good reasons for this. first, the nature of how these things tie together usually means only a couple of backchannel HTTP requests to each mini app involved on a page view; not hundreds or thousands. an HTTP request might be slower than calling a function from a library, but only by a few milliseconds. so using this approach will add a few milliseconds to the time on each page view, but that’s pretty negligible. remember, most significant differences in performance are due to big O algorithm stuff, not differences of milliseconds on operations that aren’t performed very often. a second argument is that these sorts of components are almost always wrapping a database call or two. while an HTTP request might be much slower than a function call, it’s not much more overhead at all compared to a database hit. i’ve done some informal benchmarks against mini apps running locally (i’m certainly not advocating that you spread your components out all over the web and introduce huge latencies between them; there should be a very fast connection between all your components) and the overhead of an HTTP request to local cherrypy or mod_python servers is pretty comparable to the overhead of a database hit. so if you’re comfortable making an extra database request or two on a page, hitting an external app shouldn’t be much different.
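a rough version of that kind of informal benchmark might look like the following; the url, the trivial /ping endpoint, and the database connection details are all placeholders, and the absolute numbers will obviously depend on your setup:

:::python
# rough comparison of local HTTP overhead vs. a plain database round trip
import time
import urllib2
import MySQLdb  # assuming a MySQL-backed setup; any DB-API module would do

N = 100

start = time.time()
for i in range(N):
    urllib2.urlopen("http://localhost:8080/ping").read()  # trivial mini app endpoint
http_ms = (time.time() - start) / N * 1000

conn = MySQLdb.connect(db="appdb")  # placeholder connection parameters
cursor = conn.cursor()
start = time.time()
for i in range(N):
    cursor.execute("SELECT 1")
    cursor.fetchall()
db_ms = (time.time() - start) / N * 1000

print("http: %.2f ms per call, db: %.2f ms per call" % (http_ms, db_ms))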
one of the most powerful arguments against the performance naysayers though is the scalability argument. yes, there’s a small performance hit doing this, but the tradeoff is that you can scale it out ridiculously easily. eg, if thraxil.org all of a sudden started getting massive amounts of traffic and couldn’t handle the load, i could easily set up four more boxes and move the tasty server, the full-text search server, the image resizer, and the session manager out to their own machines. the only thing that would change in the main application would be a couple of urls in the config file (or, with the right level of DNS control, that wouldn’t even be necessary). the hardware for each application could be optimized for each (eg, some applications would get more out of lots of memory while others are CPU or disk bound). plus, there are a myriad of solutions and techniques for load balancing, caching, and fail-over for HTTP servers, so you pretty much get all that for free. instead of implementing caching in an application, you can just stick a caching proxy in front of it.
scaling out to lots of small, cheap boxes like that is a proven technique. it’s how google builds their stuff (they use proprietary RPC protocols instead of HTTP, but the idea is the same). large parts of the J2EE stack exist to allow J2EE applications to scale out transparently (IMO, it takes an overly complex approach to scaling, but again, the core idea is basically the same). you can think of the web itself as a really large, distributed application. REST is its architecture and it’s how it scales. so, arguably, this approach is the most proven, solid, tested, and well understood approach to scaling that’s ever been invented.
i really only cover the scalability thing though as a counter to the knee-jerk premature optimization reflex. i mostly view it as a pleasant side effect of an architecture that i’m mostly interested in because of its other nice attributes. even if it didn’t scale nearly as well, it would still be an approach worth trying for me and probably for the lower 80% of web developers who don’t need to handle massive amounts of traffic and for whom maintainability, loose coupling, and reusability trump performance. luckily though, it appears to be a win-win situation and no such tradeoff needs to be made.
i have a ton of other things that i’d like to say about this stuff. thoughts on different approaches to designing these mini applications, how the erlang OTP was a large inspiration to me, tricks and techniques for building, deploying and using them, recommendations for squeezing even more performance out of them and avoiding a couple common pitfalls, ideas for frameworks and libraries that could make this stuff even easier, and even one idea about using WSGI to get the best of both worlds. but i’ve already covered a lot more here than was in the lightning talk so i’ll save those ideas for future posts.
the only other thing i do want to note, and which i didn’t get a chance to mention at pycon, is Ian Bicking’s posts on the subject of small apps instead of frameworks. Ian, i think, has been thinking about some of the same stuff we have. i was building these small apps (of a sort) before i read his post, but i hadn’t really thought much of it. coming across Ian’s posts made me a lot more conscious of what i was doing and helped me crystallize some of my thoughts and really focus my development and experimentation.
anyway, expect more from me on this subject. and hopefully, by next year’s pycon, we’ll have enough nice demos, benchmarks, and experience building these things that we can give a proper full talk then (or actually at just about any other web/programming conference, since these ideas aren’t really python specific at all).
By
anders pearson
04 Mar 2006
i finally got around to porting this site to the TurboGears framework, which i’ve been using a lot lately.
it was already CherryPy + SQLObject, so it wasn’t that big a deal.
i also used Routes to map the urls to controller methods. i’m generally ok with the default cherrypy approach, but for thraxil.org, the url structure is fairly complicated and Routes certainly made that a lot cleaner.
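for the curious, a Routes mapping is just a series of connect() calls on a Mapper. the patterns below are invented for illustration rather than being the actual thraxil.org url structure:

:::python
# a sketch of a Routes url mapping -- the patterns here are made up
from routes import Mapper

map = Mapper()
map.connect('', controller='main', action='index')
map.connect('posts/:year/:month/:day/:slug', controller='post', action='view')
map.connect('users/:username/tags/:tag', controller='tag', action='user_tag')
map.connect(':controller/:action/:id')  # Routes' classic catch-all default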
next, i’m working on refactoring as much of the code out into separate mini REST applications as i can (like Tasty) to make things even more manageable. of course, i need a few more unit tests before i can really do that safely…
By
anders pearson
03 Feb 2006
i’m losing my knack for predicting what sort of horrors the universe has in store for us.
first, there were cockroaches controlling robots and robots controlling mice and i was worried about cockroach controlled rodents. then there were zombie dogs and i was worried about cockroach controlled zombie animals.
now, i learn that the cockroaches are merely pawns of the wasp, hairworms are controlling grasshoppers, and the Toxoplasma gondii protozoan is up to some kind of evil scheme involving controlling rats in order to subtly alter human personalities (and possibly drive us insane) via kitty litter.
there’s layer upon layer of conspiracy and intrigue in the animal world and humans are caught in the middle of it. i’m not sure we’ll ever figure out who’s really pulling the strings or why, but i for one would like to get the hell off this planet now. it’s clearly not safe here.
meanwhile, our fearless leader is worried about human-animal hybrids…
By
anders pearson
13 Dec 2005
While we’re on the subject of tagging, let’s talk a little bit about tag clouds and their display.
Tag clouds are nice visual representations of a user’s tags, with the more common tags displayed in a larger font, perhaps in a different color. Canonical examples are delicious’ tag cloud and flickr’s cloud.
It’s a clever and simple way to display the information.
Doing it right can be a little tricky too. A tagging backend (like Tasty) will generally give you a list of tags along with a count of how many times each appears.
The naive approach is to divide up the range (eg, if the least common tag has a count of 1 and the most common 100) into a few discrete levels and assign each level a particular font-size (eg, between 12px and 30px).
The problem with the naive approach (which I’m now noticing all over the place after spending so much time lately thinking about tag clouds) is that tags, like many real-world phenomena, typically follow a power law. The vast majority of tags in a cloud will appear very infrequently. A small number will appear very frequently.
Here’s an example of a cloud made with this sort of linear scaling.
It’s better than nothing, but as you can see, most tags are at the lowest level and there are just a couple of really big ones. Since it’s dividing up an exponential curve into equal chunks, much of the middle ground is wasted.
To make a cloud more useful, it needs to be divided up into levels with a more equal distribution. The approach I found easiest to implement was to change the cutoff points for the levels. Conceptually, it’s sort of like graphing the distribution curve logarithmically. So instead of dividing 1-100 up as (1-20, 21-40, 41-60, 61-80, 81-100), it becomes something like (0-1, 2-6, 7-15, 16-39, 40-100).
That turns that same cloud into this, which I think makes better use of the size spectrum.
The actual algorithm for doing the scaling requires a bit of tuning; the prototype code I wrote for testing boiled down to bucketing the counts logarithmically.
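A minimal sketch of that kind of bucketing (not the original code; it assumes the tags arrive as a dict mapping each tag to its count, with counts of at least 1, and the level boundaries and font sizes are arbitrary):

:::python
# Bucket tag counts into discrete levels on a logarithmic scale, so the middle
# levels actually get used instead of everything piling up at the bottom.
import math

def tag_levels(counts, levels=5):
    biggest = max(counts.values())
    scaled = {}
    for tag, count in counts.items():
        level = int(round(levels * math.log(count) / math.log(biggest + 1)))
        scaled[tag] = min(max(level, 1), levels)  # clamp to 1..levels
    return scaled

def render_cloud(counts):
    sizes = {1: 12, 2: 15, 3: 18, 4: 24, 5: 30}  # font size in px per level
    levels = tag_levels(counts)
    return " ".join(
        '<span style="font-size: %dpx">%s</span>' % (sizes[levels[tag]], tag)
        for tag in sorted(counts))

With a maximum count of 100, this puts the level boundaries at roughly (1, 2-6, 7-15, 16-39, 40-100), which matches the cutoffs described above.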
By
anders pearson
12 Dec 2005
at work, we build lots of custom applications for specific classes or professors. this means that most of our stuff isn’t very generalizable and gets locked up in the ivory tower, which is a shame since we do (IMHO) good work. every once in a while though, something comes along that is a little more general purpose that other people might find useful.
lately, i’ve been on a kick of building small, self-contained web applications that are designed to be mixed and matched and used to build larger applications. exactly the kind of thing that Ian Bicking wrote about.
my most recent mini-application, and one that’s made it out of the ivory tower, is called Tasty and is a tagging engine. probably the easiest way to think of it is as del.icio.us but designed to be embedded inside an application. it supports a very rich tagging model (basically what’s explained on tagschema.com) and is very efficient.
Tasty was written in python using the excellent TurboGears framework. but Tasty’s interface is just HTTP and JSON, so it can be integrated with pretty much any application written in any language on any platform. there’s even a pure javascript Tasty client in the works.
also, fwiw, Tasty has been powering the tagging on thraxil.org for a few weeks now (it was sort of my sanity check to make sure that the API wasn’t too obnoxious to integrate into an existing architecture and to make sure it performed ok on a reasonably large set of tags).
(update: Ian Bicking has an interesting followup that gets more into the small applications vs frameworks/libraries discussion and even posits an approach to making integration between python applications even easier)
By
anders pearson
11 Dec 2005
i finally gave in and changed the nomenclature on the site from ‘keywords’ to ‘tags’.
this website had ‘keywords’ years before del.icio.us came along and everyone started calling them ‘tags’.
i’m stubborn so i stuck with “keywords” for quite a while, but i guess they finally wore me down.
By
anders pearson
27 Nov 2005
there’s been some controversy lately over at flickr about their policy of not allowing drawings, paintings, illustrations, etc. on the site. well, “not allowing” isn’t quite accurate: they don’t delete the images, they just flag a user’s account as “Not In Public Site Areas” (aka ‘NIPSA’), which means that none of a flagged user’s images will show up in global tag pages or group pools (except to logged in flickr users who are actually signed up to the group). but since flickr’s real draw is the community aspect, being NIPSA’d effectively cuts a user out of the community; they might as well just have their account shut off.
ignoring the issues of whether drawings, paintings, illustrations etc. have any overall negative effect on flickr, whether this is a very “web 2.0” policy for a service often touted as one of the leaders of the whole “web 2.0” scene, whether it’s good business practice to purposefully alienate and frustrate a thriving and enthusiastic sector of their customer-base, and even ignoring the fact that they flag accounts in this way without in any way notifying the users that they’ve been flagged, i think this raises deeper philosophical questions.
reading through the threads in flickr’s forum, the flickr admins seem genuinely astounded that anyone could have an issue with this policy. the common refrain repeated over and over again is that “flickr is a photosharing site. it’s for sharing photos only” so why is anyone surprised?
my natural response is to ask, “well, what is a ‘photo’, anyway?”
what exactly is the magical defining quality that makes one image a ‘photo’ and another a ‘non-photo’ and thus not suitable to be posted on flickr? where is the line?
let’s start with something we can probably all agree on. here’s a photo from my flickr stream. it was taken with my digital camera and uploaded with no processing (aside from flickr resizing it). it might not be a great photo, but it’s pretty typical of what’s on flickr.
<img src="http://static.flickr.com/26/67009686_01b1c4b9ef_m.jpg" width="240" height="180" alt="melon grate" />
now, right off, we run into the issue that this is a digital photo. accepting a digital photo as a photo is a relatively new phenomenon in the photographic community. i don’t think you’d have to look very hard to find some stodgy old gray-beard photographers who would still insist that if it doesn’t involve film and chemicals and a dark room, it’s not “real” photography. but most of us aren’t that snobby so we’ll agree that a digital photo taken with a digital camera is still a “photo”.
next is the issue of PhotoShop. this is where things get murky real fast. most professional photographers use photoshop or some other image editing software to post-process their photos. cropping them, adjusting the contrast, removing red-eye. these are all common operations and most people wouldn’t revoke an image’s status as a “photo” because of them. of course, once again, you can also find communities of digital photographers who shun the use of photoshop and insist that it only counts if the image is left exactly as the camera recorded it. every profession or hobby has its share of cranks.
but how much can you really get away with? can i crop a photo down to one pixel and still have it be a ‘photo’? if the answer is that yes, it’s still a ‘photo’, then what about another image which consists of a single pixel of the same color except that it was created entirely digitally without light ever being reflected off an object, passing through a lens and onto a photo-sensitive surface? the resulting image files will be exactly identical so how could one justify a difference?
here are two images, one the result of photoshopping a digital camera photo (actually using the Gimp, not PhotoShop, but same diff) and the other an immaculate digital creation. can you tell which is which?
<img src="http://static.flickr.com/35/67717664_2d87d2d5d2_o.jpg" width="200" height="200" alt="black1" /><img src="http://static.flickr.com/24/67717665_8917870683_o.jpg" width="200" height="200" alt="black2" />
if there is some point at which manipulating a photo makes it no longer a photo, where exactly is that point? does cropping a photo down to less than 11% of its original size change its nature while 12% is ok? does it depend on what the subject matter of the photo is?
if i take this photo of a painting:
<img src="http://static.flickr.com/34/67719740_22f8712214_m.jpg" width="240" height="180" alt="imgp3769" />
and crop it down and clean it up into this:
<img src="http://static.flickr.com/30/42430283_a7b597830e_m.jpg" width="240" height="185" alt="roots1" />
does that make it no longer a photo? or is any photo with a painting in it at all, not a “photo”? even if it’s not the focus of the picture?
<img src="http://static.flickr.com/27/38953056_dff09f669e_m.jpg" width="240" height="180" alt="Buenos Aires 2005 - lani, sveta, eduardo’s apt" />
what percentage of the image is the painting allowed to take up and still be considered a “photo”? what if someone uses photoshop to composite several images together like this?
is that still a photo? is it only because all of the sub-images pass as “photos”? what if one out of the four were a “non-photo”? two out of four? where’s the cutoff?
personally, i would call it a “photo of a painting”. actually, i would probably just call everything an “image” and leave it at that. but these are the kinds of photos that flickr has decided are not photos.
anyone who’s seriously tried to take decent photographs of paintings or drawings also knows that it’s not a trivial task. actually, it’s a royal pain in the ass to get it to come out right and requires some real photographic skills like an understanding of lighting and focus and depth-of-field issues. that’s why my photos of paintings suck; i’m not a very good photographer.
ok, i don’t think it’s too much of a stretch to argue that a photo of a painting or drawing is still a ‘photo’. what about scanned drawings and illustrations? does ‘photo’ mean that at some point in its life, light must have passed into some device that we label a “camera”? a scanner functions very similarly to a camera, but i guess you could argue that it’s different enough that it doesn’t really count as a “camera” and thus images that it creates aren’t “photos”.
so, how about this “photo” taken with a 35mm film camera and scanned in?
<img src="http://static.flickr.com/21/27466986_0f1c09a9d5_m.jpg" width="174" height="240" alt="001" />
does passing through a scanner strip it of its “photo” nature? how about this photo of a painting taken with a 35mm film camera and then scanned in:
<img src="http://static.flickr.com/22/27387448_456ac24f02_m.jpg" width="172" height="240" alt="wreck" />
if a “photo” can only come from a camera, what about photograms, the staple of introductory photography classes? are they “photos”?
what about scanner photography? are scans of flowers ok but scans of pieces of paper with ink on them not?
clearly, i think this whole business is absurd, arbitrary, and petty. i think flickr should lighten up, remove the non-photo related NIPSA flags from accounts and promise never to do it again. flickr happens to be a great tool for sharing drawings, paintings, and illustrations whether they’re “photos” or not and i think they would do well to embrace that rather than start punishing their customers for using the service in a way that they hadn’t thought of.
By
anders pearson
21 Nov 2005
for the second year in a row, music for robots has been nominated for a PLUG award under the “Music Website Of The Year” category.
last year, pitchfork beat us but i think we have a really good chance of winning this year. since last year, we’ve been on MTV, put out a compilation album, organized some good shows (including our CD release party), had a glowing writeup in the new york times, Ben Gibbard from Death Cab for Cutie plugged us in Wired, and we’ve been kicking all kinds of ass with the quality of posts there.
so go vote for us and tell all your friends to vote for us. also vote for Jesu as metal album of the year even though i wouldn’t really call it metal. as i said last year, we consider it an amazing honor just to be mentioned on the same page as all these great artists. being nominated on the same page as Jesu for me is particularly wonderful.
at the very least, i would like to see mfr kick the pants off myspace (who were recently bought for $580 million by Rupert Murdoch). i have long hated Fox, but now it’s personal.