Archive for March, 2009
PostGrails is a sniglet I created to describe my journey as a developer (in a similar vein, PostLib is a blog I recently created to describe my spiritual journey). Think of it as “Post” = after, and “Grails” refers to looking for “holy grails” in web development. It’s also a not-so-subtle jab at Ruby on Rails, and it expresses my preference for the PostgreSQL database server. Basically, it means that I’m done looking for the end-all in web frameworks, and want clean, light, fast, efficient tools for doing the work of web development that don’t try to mother me, don’t “abstract away” the underlying technologies, and generally stay out of the way as much as possible.
I have been specializing in web application development for several years. In the beginning, I had read Philip and Alex’s Guide to Web Publishing and responded by diving in with the OpenACS web framework. ACS was one of the earliest “full stack” web frameworks, and it still performs admirably, powering such sites as photo.net and .LRN.
Then, I moved to Python (because I like Python a lot) and began working in mod_python/Apache to create my own “full stack web framework to end all others” which, like every other individual project of the kind, died with a whimper rather than a bang.
About that time, I ran into Ruby on Rails and said, “Hey, looks great!” and dove in. That was around August 2006. Rails is a great framework in many ways, and it taught me a ton of things that I needed to know about web development and good practices. And with it I have gotten some nice projects off the ground.
I recently had an eye-opening experience, however. I deployed to staging an application that I have been working on for quite some time, on a public server but protected behind a login until it’s ready for production. A couple of days later, I received an automated email from my virtual server hosting company telling me that my usage for the month was going to cost more than my pre-paid level covers. I did some sleuthing and found that Rails was using an enormous amount of resident memory (RAM), which is very pricey in the shared VPS world.
Basically, Rails loads everything including the kitchen sink, whether you need it or not. That makes it possible to do all the neat magic that they do. But it sure is expensive when it comes to deployment.
What web frameworks like Rails try to do is to get you writing everything in the server-side scripting language. That’s a big mistake. It doesn’t absolve you from learning the other technologies: you still need to understand all of those languages (HTML, CSS, JavaScript, SQL) in order to develop for the web. And on top of learning them, you now have to learn an additional language, the language of the abstraction layer over each of those technologies. That makes things seem easy to learn at first, but it actually makes them harder in the long run.
BTW, if Rails is guilty, it is not an awful guilt. They’ve done a great job, and the abstractions are well-designed. ASP.NET is a different story, with layers 10 stories deep of the ugliest abstractions imaginable.
Instead of doing magic and providing thick abstractions, a web development platform should provide functional tools and as thin a layer of abstraction as possible to enable developers to get their work done.
There comes a time, when one is maturing as a developer, that one gets tired of the be-all solutions, the magic and the abstractions, and just wants a good toolkit that helps one to develop fast, efficient applications. It’s time to grow up. That’s what this blog is all about.
One of the issues that comes up in database design is whether to use part of the data in each record as the primary key, or whether to generate unique identifiers automatically that have nothing to do with the data. Automatically generated primary keys are called surrogate keys because they act as a surrogate for the data in the database. The other option is to see if there is a way to represent each record with a unique identifier that is derived from the data itself. There is a decent introduction to the matter at http://en.wikipedia.org/wiki/Surrogate_key, with a summary of the advantages and disadvantages of each approach.
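To make the distinction concrete, here is a minimal sketch of the two approaches. It uses Python’s built-in sqlite3 module only so that it runs without a server; the table names, the columns, and the choice of an email address as the meaningful key are all assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Surrogate key: the database generates an id that has nothing to do with the data.
conn.execute("""
    CREATE TABLE people_surrogate (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,
        email TEXT NOT NULL UNIQUE,
        name  TEXT NOT NULL
    )
""")

# Meaningful key: a column taken from the data itself identifies the row.
# (Assumes, for this sketch, that an email address is unique per person.)
conn.execute("""
    CREATE TABLE people_meaningful (
        email TEXT PRIMARY KEY,
        name  TEXT NOT NULL
    )
""")

row = ("ada@example.com", "Ada")
conn.execute("INSERT INTO people_surrogate (email, name) VALUES (?, ?)", row)
conn.execute("INSERT INTO people_meaningful (email, name) VALUES (?, ?)", row)

# The key another table would have to store in order to reference this person.
surrogate_key = conn.execute("SELECT id FROM people_surrogate").fetchone()[0]
meaningful_key = conn.execute("SELECT email FROM people_meaningful").fetchone()[0]
print(surrogate_key, meaningful_key)
```

With the surrogate design, a referencing table stores an opaque integer; with the meaningful design, it stores a value that a person can read directly.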
Now, I am not really going to provide a satisfying rant in the “Considered Harmful” tradition. But I am in a position to state a clear preference for meaningful, rather than surrogate, keys. Here are a few of my reasons, coming from the perspective of a web application developer:
- Surrogate keys require a join whenever I want a piece of meaningful information about a record that another table references. But in many instances, I need just one piece of information from that record. If this information were the primary key, it would already be sitting in the referencing table and no join would be required. Joins consume resources and should be avoided, wherever possible, in a web application. Surrogate keys slow things down. They limit speed.
- Surrogate keys make it difficult for human beings to understand the data in a join table quickly and easily, requiring the construction of additional views. For example, if you have a people table and a books table, and a join table (people_books, say) indicating which people have which books, surrogate keys make the join table meaningless to a human reader, whereas meaningful keys make it very easy to see at a glance which people have which books. Surrogate keys make things hard to understand. They limit discoverability.
- One of the arguments in favor of surrogate keys is that changes to the data or the model do not have to result in a difficult-to-implement change to the meaningful key. For example, many systems do not allow cascading updates to fields that are referenced by foreign keys. That’s not a problem for people who use an enterprise-class database like PostgreSQL. And when the actual schema of the data changes, it is not too hard to figure out how to transform the data to match the new schema. Surrogate keys attempt to solve a problem that doesn’t really amount to much. They have limited utility.
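The people_books example and the cascading-update point can be sketched together. This is only an illustration: it uses Python’s built-in sqlite3 module as a stand-in so that it runs without a server (PostgreSQL accepts the same `ON UPDATE CASCADE` clause), and the usernames, the book titles, and the choice of a title as the books key are assumptions made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys (and cascades) when this is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE people (username TEXT PRIMARY KEY);
    CREATE TABLE books  (title    TEXT PRIMARY KEY);

    -- Join table keyed on the meaningful values themselves.
    CREATE TABLE people_books (
        username TEXT NOT NULL REFERENCES people (username) ON UPDATE CASCADE,
        title    TEXT NOT NULL REFERENCES books (title)     ON UPDATE CASCADE,
        PRIMARY KEY (username, title)
    );

    INSERT INTO people VALUES ('ada'), ('grace');
    INSERT INTO books  VALUES ('SICP'), ('TAOCP');
    INSERT INTO people_books VALUES ('ada', 'SICP'), ('grace', 'TAOCP');
""")

query = "SELECT username, title FROM people_books ORDER BY username"

# No join needed: the join table is readable on its own.
before = conn.execute(query).fetchall()

# Changing a meaningful key: the database propagates it to referencing rows.
conn.execute("UPDATE people SET username = 'ada_l' WHERE username = 'ada'")
after = conn.execute(query).fetchall()

print(before)
print(after)
```

Reading people_books directly answers “who has which book” without touching the other two tables, and the rename in people shows up in people_books without any application code.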
I’ll stop there. Perhaps later I’ll provide a couple of examples of situations in which we started with surrogate keys but ended up ignoring them and creating meaningful primary keys that we used throughout the application.
In the meantime, what are your thoughts about surrogate vs. meaningful keys?