Note: I wrote this March 15, 2009 but never bothered posting it until now. -Sean
The model is the data storage and the logic for handling it. It is not dependent on the view or the controller, but is accessed by both. It includes the database, whatever it is, and the data access layer. Data objects should know how to get themselves, update/save themselves and present their data to the application. There are several ways to model the data access layer, one of the most popular being the Active Record pattern (http://en.wikipedia.org/wiki/Active_record_pattern). Because most applications are essentially about finding, displaying, and changing data, the model has been called the “functional core of an application” (http://www.phpwact.org/pattern/model_view_controller).
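A minimal sketch of the Active Record idea in Python, using the standard-library sqlite3 module; the `Record` class and `books` table are hypothetical examples, not tied to any particular framework:

```python
import sqlite3

class Record:
    """Minimal Active Record sketch: each instance wraps one row and knows
    how to find itself, save itself, and present its data."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE IF NOT EXISTS books (title TEXT PRIMARY KEY, author TEXT)")

    def __init__(self, title, author):
        self.title, self.author = title, author

    @classmethod
    def find(cls, title):
        row = cls.db.execute(
            "SELECT title, author FROM books WHERE title = ?", (title,)).fetchone()
        return cls(*row) if row else None

    def save(self):
        self.db.execute(
            "INSERT OR REPLACE INTO books VALUES (?, ?)", (self.title, self.author))
        self.db.commit()

    def to_dict(self):
        # "present their data to the application"
        return {"title": self.title, "author": self.author}

Record("Moby-Dick", "Melville").save()
print(Record.find("Moby-Dick").to_dict())
```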
The job of the view, when the controller passes it a user’s interaction (a request, in web terms), is to get data from the model and display it for the user.
It follows that, in web programming, views should only be handling GET requests directly.
When people submit data through a form, we often need to give the user feedback on what they have submitted. This suggests either a tighter coupling between “controller” and view, or a way of representing in the model the feedback that the application is giving to the user in this situation.
In these cases, we _could_ use the controller to hand this feedback to the view (tighter coupling), but this tighter coupling might entangle the view with the controller more than is desirable. It makes better sense to have this feedback live in the model somehow, and have the view get this feedback when it queries the model.
I have been intrigued by the possibilities of what I now know is called the Transform View pattern (http://www.phpwact.org/pattern/transform_view). The model produces a representation of itself, which the view then transforms into a form that is appropriate for that context. XSLT is the likely choice of language for this type of situation. A couple of things are intriguing about this pattern:
- Different views can be constructed for different contexts, without changing or duplicating anything else about the underlying application. This makes it much faster and easier, for instance, to create a mobile web app for different platforms, once the initial application is complete.
- In the case of web programming, some or all of the transformation can be pushed to the client (browser), because all of the modern browsers are XSLT capable.
My inclination at this point would be to build up the application as an XML representation of the model, use CSS for as much as possible, and do the minimal XSLT transformation that is needed to deal with presentational realities. Do XSLT on the browser when possible, on the server when necessary. This workflow maximizes the agility of the application itself and the ability to create new outputs for new contexts as they come up.
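As a sketch of the first half of that workflow, the model can produce its XML representation with nothing but the standard library (the `book` element and its fields here are hypothetical); an XSLT stylesheet applied in the browser where possible, or on the server where necessary, would then handle the transform:

```python
import xml.etree.ElementTree as ET

def book_to_xml(book):
    """Produce the model's XML representation of a record; a view would
    then transform it (ideally via XSLT in the browser) for its context."""
    root = ET.Element("book")
    for field in ("title", "author"):
        ET.SubElement(root, field).text = book[field]
    return ET.tostring(root, encoding="unicode")

print(book_to_xml({"title": "Moby-Dick", "author": "Melville"}))
# <book><title>Moby-Dick</title><author>Melville</author></book>
```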
That brings us to the controller, which is the part of this that I struggle with the most.
A controller is the means by which the user interacts with the application. A controller accepts input from the user and instructs the model and viewport to perform actions based on that input. In effect, the controller is responsible for mapping end-user action to application response. For example, if the user clicks the mouse button or chooses a menu item, the controller is responsible for determining how the application should respond. — http://ootips.org/mvc-pattern.html
In a web application, if the user submits a GET request, the controller will simply hand the request to an appropriate view.
If the request is a POST, PUT, or DELETE, however, the controller’s job is to submit that data to the model, receive the model’s response, and then call the view to get the model’s status. The controller does not necessarily pass data or messages directly to the view (although it is often used that way, as in Ruby on Rails). For decoupling, it makes better sense for the controller simply to tell the model to update itself, and then notify the view that the data has been updated, without acting as a courier. The view is then able to get whatever it needs from the model directly.
In Ajax apps, it may seem that the view is updating the model directly, but what’s really happening is that the view is either posting changes to the controller and mirroring those changes locally, or querying the controller for another view or partial view, which is used to modify the current view.
The controller, then, becomes a fairly simple routing mechanism. If the request is a POST or other data-modifying request, it simply passes that to the model, which processes it and returns. (This processing can be set up to be asynchronous where necessary.) After passing the request to the model, the controller notifies the view, which can then query the model for the updates. (If it’s an asynchronous application, the view can poll the model to wait for updates; this is potentially where the Observer pattern, below, becomes useful.)
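Here is a minimal sketch of that routing idea, with hypothetical `BookModel` and `BookView` classes: the controller hands data-modifying requests to the model, then lets the view query the model directly rather than couriering data itself.

```python
class BookModel:
    """Hypothetical model: owns the data and the logic to change it."""
    def __init__(self):
        self.books = {}
    def update(self, method, data):
        if method in ("POST", "PUT"):
            self.books[data["title"]] = data
        elif method == "DELETE":
            self.books.pop(data["title"], None)

class BookView:
    """Hypothetical view: queries the model directly when asked to render."""
    def __init__(self, model):
        self.model = model
    def render(self):
        return sorted(self.model.books)

model = BookModel()
view = BookView(model)
ROUTES = {"/books": (model, view)}  # a dictionary maps requests to models and views

def controller(method, path, data=None):
    m, v = ROUTES[path]
    if method in ("POST", "PUT", "DELETE"):
        m.update(method, data)  # pass data-modifying requests to the model...
    return v.render()           # ...then notify the view, which queries the model

controller("POST", "/books", {"title": "Moby-Dick"})
print(controller("GET", "/books"))  # ['Moby-Dick']
```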
The controller also putatively has the job of doing authentication (authn) / authorization (authz). In fact, that’s a major purpose, because the outcome of authn & authz will determine exactly what is done in relation to models and views (e.g., a controller won’t POST to the model if that post is unauthorized, and will provide a view that indicates as much).
Still, I see a need to keep the controller very specifically to its limited job of routing requests and posting/putting/deleting to models. This is a very different way of thinking about the controller from what is currently being done in, say, Ruby on Rails.
It looks like what I have in mind is the Front Controller pattern: http://www.phpwact.org/pattern/front_controller
A completely different approach is called the Page Controller pattern: http://www.phpwact.org/pattern/page_controller
In the Page Controller pattern, routing passes transparently to template files, which are also the “views,” which then process everything about the request. The advantage is simplicity and immediacy. The disadvantages are (1) strong coupling between the URL space and the template space (solvable by adding a routing layer); (2) strong coupling between control logic and presentation; and (3) the potential for duplication of control code in the presentation layer. Thus it does make sense for the controller to be a bit more than a completely transparent layer passing requests directly to templates/scripts at URLs.
An observer provides a mechanism for an *active model*: views and controllers register to be notified whenever a model changes, so that they can respond to those changes.
Requests are initially handled by the Front Controller, which does authn & authz, passes legitimate POST/PUT/DELETE requests to the appropriate model, and then routes requests to the appropriate view. It also hands request and session data to the view, since it would be redundant to fetch this information from the database more than once. A dictionary maps requests to models and views.
Views receive requests from controller, deal with session data, and interact with the model to produce a presentation of the model to the client.
Models provide all the data and the logic required to manipulate that data. They also provide representations of the data based on the context. For example, at a bare minimum a model class should know how to produce XML of a record or set of records. It probably also knows how to produce an HTML representation of itself, and an HTML form representation of itself, but these things are less necessary because they can both be derived from the XML representation by the view, depending on the context.
Observers can be used to send notifications to views, which pick up these notifications and use them. Doing this in a webserver context requires some sort of storage of notifications for views to access, because of the asynchronous, event-driven nature of the interface.
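A sketch of that arrangement, with a hypothetical `ObservableModel`: the model notifies registered observers on every change, and in a webserver context the notification can simply be appended to stored state that views poll on their next request.

```python
class ObservableModel:
    """Hypothetical active model: observers register callbacks and are
    notified whenever the model changes."""
    def __init__(self):
        self.data = {}
        self.observers = []
    def register(self, callback):
        self.observers.append(callback)
    def update(self, key, value):
        self.data[key] = value
        for notify in self.observers:
            notify(key)

# In a webserver context, a notification can simply be stored (here, in a
# list) for views to pick up asynchronously:
pending = []
model = ObservableModel()
model.register(pending.append)
model.update("title", "Moby-Dick")
print(pending)  # ['title']
```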
I was recently reading some online discussions about continuations in web frameworks, and it was clear from the discussions that most of the people involved didn’t have a clear idea of when continuations would be useful, nor did I run across any good examples in my brief survey. Continuations can be hard to understand, especially in the absence of good example cases. But there is a very specific kind of situation in web programming in which continuations are exactly the right solution.
I am working on an application in which the user can tie into their WordPress.com blog. The application goes through several preparatory steps, then redirects to WordPress, where the user authenticates to the blog, then returns and goes through several more steps in my application. It’s really a very straightforward list of tasks, but not all of the tasks can happen at the same time, in the same request, or even on the same server. What is needed is a way to save the procedure in process at any point along the way, and pick it up again when the user comes back for the next step.
A continuation is nothing more than a stored procedure-in-process that can be resumed later. In the context of a web application, the procedure-in-process can be stored and then picked up by the user when they return. A continuation provides a way to model a web application as a procedure, even if the procedure requires several request and response cycles.
Unfortunately, most programming languages (and therefore, web frameworks) don’t support continuations, which are seen as too arcane or hard to understand. Even those that have something called “continuations” often don’t have the kind of continuation that is needed for this particular programming job: one in which the state of a procedure can be stored as an object or file (“pickled”) for an indefinite period of time until the user happens to return.
I once wrote just such a thing in Stackless Python, by creating Python generators that could be pickled, and therefore, stored as files. I fondly called them continuators. When a continuator was reloaded from the file, it would continue from where it left off; if the same continuator was reloaded, it would re-run from the same point. This behavior is quite special, because generators in regular Python (CPython and the like) cannot be pickled, and they are run-once affairs. It is conceivable that a single-process web application, or one which uses strict session affinity, could make use of generators in “normal” Python to serve as a kind of continuation. But pickling enables the generator to be stored, and therefore loaded by whatever process happens to be the next one called. And it also allows the generator to be run from the same point any number of times, which means that the web application “back” button can work as expected.
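To illustrate the single-process case, here is a sketch using an ordinary Python generator as a poor man’s continuation; `signup_flow` and its steps are hypothetical stand-ins for the WordPress scenario above. Unlike the pickled continuators, this only works while the process (or a sticky session) lives, and it cannot be re-run from the same point.

```python
def signup_flow():
    """A multi-step procedure written straight through; each yield is a
    point where the procedure waits for the user's next request."""
    blog_url = yield "Which blog?"
    token = yield "Redirecting to %s for auth..." % blog_url
    yield "Authenticated with token %s; finishing setup." % token

# In a single-process app (or one with strict session affinity), the live
# generator can simply be kept in a session table between requests:
sessions = {}
sessions["abc123"] = flow = signup_flow()
print(next(flow))                          # Which blog?
print(flow.send("example.wordpress.com"))  # Redirecting to example.wordpress.com for auth...
print(flow.send("TOKEN"))                  # Authenticated with token TOKEN; finishing setup.
```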
But I haven’t yet found the old code that I wrote (something like 8 years ago), and I’m not currently building on Stackless. So I have to solve the application problem by more conventional means.
The other two main ways to solve this kind of situation are (a) to use a different URL for each step of the application, or (b) to create a state machine and store state on the server as the application proceeds. The state-machine approach is not at all ideal (for one thing, it completely breaks the “Back” button), but it does provide a way through. The basic idea of a state machine is that the application runs through the same overall loop several times. Each time through, the state changes: some data is stored, some question is answered, or some authentication token is received. On each iteration, a massive if-then-elif-then-elif… structure tests for the different possible states and carries on accordingly. It works, but it’s not pretty, and it can be very hard to trace.
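For comparison, a bare-bones sketch of the state-machine approach; the states and messages here are hypothetical, and each request makes one pass through the if-elif structure.

```python
def handle(session, request):
    """One pass through the state-machine loop; all state lives in `session`,
    which is stored on the server between requests."""
    state = session.get("state", "start")
    if state == "start":
        session["state"] = "need_auth"
        return "redirect to WordPress for authentication"
    elif state == "need_auth":
        session["token"] = request["token"]  # e.g. auth token from the redirect back
        session["state"] = "done"
        return "finish setup"
    elif state == "done":
        return "already finished"

session = {}
print(handle(session, {}))                  # redirect to WordPress for authentication
print(handle(session, {"token": "TOKEN"}))  # finish setup
```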
Some people say that continuations make it hard to follow control flow, but I found in using my homebrew continuators that it made application logic very sensible. So perhaps I’ll look harder for that old code, and see if it can be used in web programming (which now requires concurrency and real-time — topics for other posts!). Or I might see if Scala continuations will support the kind of continuators that I have in mind.
What solutions have you used for the problem of storing procedure-in-process in a web application? What has worked well for you?
It’s really not hard to access a SQL Server database from Python, but you have to make sure everything is set up correctly.
First, make sure that the database server is set up for “Windows and SQL Server Authentication.” This can be set in the server security properties (in SQL Server Management Studio = SSMS, right click the server and go to Properties > Security). I spent quite a while pulling my hair out about failed logins, when this was the issue. After changing this setting, you’ll need to restart the server (using the Windows services administrative panel).
Then, within SSMS create a login, and a user for each database connected to that login, and make sure the user has been GRANTed access to the database (giving it the owner role is the easiest way to do that).
Now you should be able to log in from Python. Using adodbapi:
>>> import adodbapi
>>> adodbapi.connect(r"Provider=SQLOLEDB; Data Source=.\SQLEXPRESS; Initial Catalog=database_name; User Id=user_id; Password=user_password")
PostGrails is a sniglet I created to describe my journey as a developer (in a similar vein, PostLib is a blog I recently created to describe my spiritual journey). Think of it as “Post” = after, and “Grails” refers to looking for “holy grails” in web development. It’s also a not-so-subtle jab at Ruby on Rails, and it expresses my preference for the PostgreSQL database server. Basically, it means that I’m done looking for the end-all in web frameworks, and want clean, light, fast, efficient tools for doing the work of web development that don’t try to mother me, don’t “abstract away” the underlying technologies, and generally stay out of the way as much as possible.
I have been specializing in web application development for several years. In the beginning, I had read Philip and Alex’s Guide to Web Publishing and responded by diving in with the OpenACS web framework. ACS was one of the earliest “full stack” web frameworks, and it still performs admirably, powering such sites as photo.net and .LRN.
Then, I moved to Python (because I like Python a lot) and began working in mod_python/Apache to create my own “full stack web framework to end all others” which, like every other individual project of the kind, died with a whimper rather than a bang.
About that time, I ran into Ruby on Rails and said, “Hey, looks great!” and dove in. That was around August 2006. Rails is a great framework in many ways, and it taught me a ton of things that I needed to know about web development and good practices. And with it I have gotten some nice projects off the ground.
I recently had an eye-opening experience, however: I deployed in staging an application that I have been working on for quite some time, on a public server but protected under a login until it’s ready for production. A couple of days later, I received an autogen email from my virtual server hosting company telling me that my usage for the month was going to cost more than the pre-pay level that I am at. I did some sleuthing, and basically found that Rails is using an enormous amount of resident memory (RAM), which is very pricey in the shared VPS world.
Basically, Rails loads everything including the kitchen sink, whether you need it or not. That makes it possible to do all the neat magic that they do. But it sure is expensive when it comes to deployment.
What web frameworks like Rails try to do is get you writing everything in the server scripting language. That’s a big mistake. It doesn’t absolve you from learning the other technologies — you need to understand all of those languages in order to develop for the web. And on top of learning them, you now have to learn an additional language: the abstraction layer over each of those technologies. It makes things seem easy to learn, but it actually makes them harder in the long run.
BTW, if Rails is guilty, it is not an awful guilt. They’ve done a great job, and the abstractions are well-designed. ASP.NET is a different story, with layers 10 stories deep of the ugliest abstractions imaginable.
Instead of doing magic and providing thick abstractions, a web development platform should provide functional tools and as thin a layer of abstraction as possible to enable developers to get their work done.
There comes a time, when one is maturing as a developer, that one gets tired of the be-all solutions, the magic and the abstractions, and just wants a good toolkit that helps one to develop fast, efficient applications. It’s time to grow up. That’s what this blog is all about.
One of the issues that comes up in database design is whether to use part of the data in each record as a primary key, or whether to automatically generate unique identifiers that have nothing to do with the data. An automatically generated primary key is called a surrogate key because it acts as a surrogate for the data in the database. The other option is to see whether each record can be represented by a unique identifier derived from the data itself. There is a decent introduction to the matter at http://en.wikipedia.org/wiki/Surrogate_key, with a summary of the advantages and disadvantages of each approach.
Now, I am not really going to provide a satisfying rant in the “Considered Harmful” tradition. But I am in a position to state a clear preference for meaningful, rather than surrogate, keys. Here are a few of my reasons, coming from the perspective of a web application developer:
- Surrogate keys always require a join whenever I want a piece of meaningful information. But in many instances, I need just one piece of information from the record. If this information were the primary key, no join would be required. Joins always consume resources and should be avoided, wherever possible, in a web application. Surrogate keys slow things down. They limit speed.
- Surrogate keys make it difficult for human beings quickly and easily to understand the data in a join table, requiring the construction of additional views. For example, if you have a people table and a books table, and a join table (people_books, say) indicating which people have which books, surrogate keys make the join table meaningless to a human reader. Whereas using meaningful keys makes it very easy to see at a glance which people have which books. Surrogate keys make things hard to understand. They limit discoverability.
- One of the arguments in favor of surrogate keys is that it means that changes to the data or the model do not have to result in a difficult-to-implement change to the meaningful key. For example, many systems do not allow cascading updates to fields that are referenced by foreign keys. That’s not a problem for people who use an enterprise-class database like PostgreSQL. In cases when the actual schema of the data changes, then it is not too hard to figure out how to transform the data to match the new schema. Surrogate keys attempt to solve a problem that doesn’t really amount to much. They have limited utility.
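To illustrate the join-table point above, here is a small sketch using Python’s sqlite3 with meaningful keys; the table and values are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# With meaningful keys, the join table itself reads like plain English:
db.execute("""CREATE TABLE people_books
              (person TEXT, book TEXT, PRIMARY KEY (person, book))""")
db.executemany("INSERT INTO people_books VALUES (?, ?)",
               [("alice", "Moby-Dick"), ("bob", "Walden")])
# No join is needed to see which people have which books:
rows = db.execute("SELECT person, book FROM people_books ORDER BY person").fetchall()
print(rows)  # [('alice', 'Moby-Dick'), ('bob', 'Walden')]
```

With surrogate integer keys, the same query would need two extra joins (to `people` and `books`) just to recover the names.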
I’ll stop there. Perhaps later I’ll provide a couple of examples of situations in which we started with surrogate keys but ended up ignoring them and creating meaningful, primary keys that we used throughout the application.
In the meantime, what are your thoughts about surrogate vs. meaningful keys?