WebAlchemy accelerates Django 100 times

November 18, 2007 [last comment: June 10, 2008]

I host my Django blog in a shared hosting environment on Dreamhost, where hundreds of other sites run on the same server. In general I enjoy Dreamhost hosting (especially considering the price of their unlimited-domains plan), but sometimes I need more speed. When I wrote WebAlchemy and configured www.mysoftparade.com to use it, my web site started responding almost instantly.

With WebAlchemy, only pages involved in form processing are always served by Django; most of the time the rest of the pages are served directly by Apache as static content, at static-content speed. In other words, a Django-powered site can reach roughly 2000 requests/sec, versus about 500 requests/sec with memcached and about 20 requests/sec for a "typical" page (10 fast SQL queries) with no caching at all. Actual results will of course vary with the server, application, and configuration.
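Working from the rough figures above (the article's illustrative numbers, not a new benchmark), the ratios behind the title are:

```latex
\frac{2000\ \text{req/s (static)}}{20\ \text{req/s (no cache)}} = 100,
\qquad
\frac{2000\ \text{req/s (static)}}{500\ \text{req/s (memcached)}} = 4
```

So the "100 times" compares static serving with an uncached page; relative to memcached, the same figures give roughly a fourfold gain.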

As the saying goes, there are only two hard things in Computer Science: cache invalidation and naming things. WebAlchemy radically solves the problem of stale resources in the cache, a problem inherent in any caching layer based on memcached, Squid, etc. With WebAlchemy, pages never become stale.

The magic happens inside concept.webalchemy.core.WebAlchemyMiddleware. When a page request passes through this middleware, it saves the response data as a file inside the Apache htdocs directory and adds a special rewrite instruction to the .htaccess file in the same directory. From then on, Apache treats all requests to the same URL as requests for the static file saved in the previous step. When something that affects the rewritten URL changes in the application, WebAlchemy updates the corresponding .htaccess file again, and the default behavior of the web server is restored. So, depending on the situation, WebAlchemy makes the same URL dynamic or static: when someone writes to a URL, it becomes dynamic; when someone reads it, it becomes static. Like yin and yang in Chinese philosophy.
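Stripped of Django, the publish/invalidate cycle described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not WebAlchemy's actual code: the helpers page_dir, publish and invalidate are hypothetical, and the rewrite rules are passed in as ready-made lines.

```python
import os

BEGIN, END = "#webalchemy_begin", "#webalchemy_end"


def page_dir(htdocs_root, url_path):
    """Map a URL path such as '/blog/entry/' to htdocs_root/blog/entry."""
    return os.path.join(htdocs_root, *[p for p in url_path.split("/") if p])


def publish(htdocs_root, url_path, html, rewrite_rules):
    """Read path: save the rendered page as index.html plus rewrite rules."""
    directory = page_dir(htdocs_root, url_path)
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, "index.html"), "w", encoding="utf-8") as f:
        f.write(html)
    # Simplification: the whole .htaccess is overwritten; merging with
    # hand-written rules is not handled in this sketch.
    with open(os.path.join(directory, ".htaccess"), "w", encoding="utf-8") as f:
        f.write("\n".join([BEGIN, *rewrite_rules, END]) + "\n")
    return directory


def invalidate(htdocs_root, url_path):
    """Write path: delete the static copy so the next request reaches Django."""
    directory = page_dir(htdocs_root, url_path)
    for name in ("index.html", ".htaccess"):
        try:
            os.remove(os.path.join(directory, name))
        except FileNotFoundError:
            pass  # nothing cached for this URL: it is already dynamic
```

Because invalidation only deletes files, it needs no timeout: a page stays static until the next change touches it.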

For example, let's trace the life cycle of this very page, http://www.mysoftparade.com/blog/webalchemy-django-apache/.

  1. The blog entry has just been written and stored in the database.
  2. The web server receives a request for the page /blog/webalchemy-django-apache/. The request is passed to mod_python/FastCGI/etc. and then to Django, which routes it through concept.webalchemy.core.WebAlchemyMiddleware.
  3. The middleware finds the view responsible for serving the URL, concept.blog.views.entry_detail, and sees that this view is in the list of bound views. WebAlchemyMiddleware saves the response generated by the entry_detail view in the file htdocs/blog/webalchemy-django-apache/index.html and populates the file htdocs/blog/webalchemy-django-apache/.htaccess with the following content:
    #webalchemy_begin
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /blog/webalchemy-django-apache/
    RewriteRule ^$ /blog/webalchemy-django-apache/index.html [L]
    RewriteRule ^index\.html$ - [L]
    RewriteRule ^dispatch\.fcgi/.*$ - [L]
    RewriteRule ^(.*)$ /dispatch.fcgi/blog/webalchemy-django-apache/$1 [L]
    </IfModule>
    #webalchemy_end
    
  4. The web server receives a new request for the page /blog/webalchemy-django-apache/, but this time the request is not passed to Django: the .htaccess file created in the previous step contains a rewrite instruction telling Apache to send the file /blog/webalchemy-django-apache/index.html back to the client immediately. Apache does this, and will keep doing it even if you stop Django and MySQL for maintenance. One important nuance I want to highlight: WebAlchemy doesn't use any timeouts, so once everyone has forgotten this article and it is no longer changed or commented on, the web server will serve it as a static file forever: days, weeks, months...
  5. However, if something that affects this page happens (for example, a comment on this entry is posted or the entry is updated), WebAlchemy immediately removes htdocs/blog/webalchemy-django-apache/index.html and htdocs/blog/webalchemy-django-apache/.htaccess, so the next request for this page is again passed to Django. We are back at step 2 of the page life cycle.
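The .htaccess shown in step 3 follows a regular pattern, so it can be generated from the URL path alone. The sketch below is a hypothetical helper, not the real FastCGIRewriter; it assumes the dispatcher lives at /dispatch.fcgi and that the catch-all rule forwards to the page's own path under it.

```python
import re


def fastcgi_rules(url_path, dispatcher="/dispatch.fcgi"):
    """Build per-directory mod_rewrite lines for one cached URL.

    url_path must start and end with '/', e.g. '/blog/webalchemy-django-apache/'.
    """
    # In .htaccess context, patterns see the path without the leading slash.
    relative = dispatcher.lstrip("/")
    return [
        "#webalchemy_begin",
        "<IfModule mod_rewrite.c>",
        "RewriteEngine On",
        f"RewriteBase {url_path}",
        # The directory URL itself is answered with the saved static page.
        f"RewriteRule ^$ {url_path}index.html [L]",
        # Direct requests for the static file are left untouched.
        r"RewriteRule ^index\.html$ - [L]",
        # Paths starting with the dispatcher pass through (as in the listing).
        f"RewriteRule ^{re.escape(relative)}/.*$ - [L]",
        # Anything else under this directory still goes to Django via FastCGI.
        f"RewriteRule ^(.*)$ {dispatcher}{url_path}$1 [L]",
        "</IfModule>",
        "#webalchemy_end",
    ]
```

Joining the result of fastcgi_rules("/blog/webalchemy-django-apache/") with newlines reproduces the structure of the listing in step 3.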

WebAlchemy uses the Django signal framework to monitor changes in models. The mapping between models and views is defined in the file webalchemy_settings.py. Below is the content of this file for www.mysoftparade.com:

# -*- coding:utf-8 -*-

# $Id: webalchemy_settings.py 289 2007-11-18 17:03:13Z dogada $
# Copyright (c) 2007, Dima Dogadaylo (www.mysoftparade.com)

"""WebAlchemy configuration for the www.mysoftparade.com.
Mapping between models and views that models affect.
WebAlchemy uses this information to switch between
dynamic and static versions of a URL.
"""
import sys
from os import path

from django.core.urlresolvers import reverse
from django.conf import settings

from concept.views import (home, about,
                           latest_feed, latest_feed_clone, tag_feed)

from concept.tagging.models import TaggedItem, Tag
from concept.tagging.views import tag_list, tag_detail

from concept.blog.views import *
from concept.blog.models import Entry

from concept.comments.models import Comment
from concept.projects.models import Project

from concept.projects.views import *
from concept.webalchemy.core import Site

from concept.webalchemy.utils import content_object

class TagHandler(object):
    """Produce list of urls affected by a changed Tag,
    TaggedItem or object with tags."""

        
    def get_affected_paths(self, obj):
        """Return list of paths affected by this object.
        It handles only pages that depend on tags directly; detail and list
        pages of content objects are handled when the content object is saved.
        It is also possible to rebuild content-object pages here, but
        concept doesn't need this.
        """
        tags = []
        if isinstance(obj, TaggedItem):
            tags = [obj.tag.name]
        elif isinstance(obj, Tag):
            tags = [obj.name]
        else:                           # a tagged object

            tags = [tag.name for tag in Tag.objects.get_for_object(obj)]
        paths = [reverse(tag_list, None, (), {})]
        for tag_name in tags:
            paths.append(reverse(tag_detail, None, (),
                                 {'object_id': tag_name}))
            paths.append(reverse(tag_feed, None, (),
                                 {'url': 'tag/%s' % tag_name}))
        return paths

class EntryArchiveHandler(object):
    """Archive urls affected by the changed Entry."""

    def get_affected_paths(self, obj):
        paths = [reverse(entry_archive_year, None,
                         (str(obj.created.year),))]
        paths.append(reverse(entry_archive_month, None,
                        (str(obj.created.year),str(obj.created.month),)))
        return paths

    
site = Site(settings.WA_PUBLISHER)
site.bind(home, [Entry, Project])

site.bind_static(about)
site.bind([entry_detail, entry_queue], Entry, None, slug="slug")

site.bind([entry_detail, entry_queue], Comment,
          content_object(Entry), slug="slug")

site.bind([entry_list, latest_feed, latest_feed_clone], Entry)

site.bind(project_detail, Project, None, slug="slug")

site.bind(project_list, Project)

site.bind_custom([tag_feed, tag_detail, tag_list], [TaggedItem, Tag],
                 TagHandler())

# even if the tags themselves aren't changed we need to rebuild tag views
# because they may use changed parts of content objects (name, summary, etc.)
site.bind_custom([tag_feed, tag_detail, tag_list], [Entry], TagHandler())

site.bind_custom([entry_archive_year, entry_archive_month], [Entry],
                 EntryArchiveHandler())

This fairly short file defines the mapping between all the models used on the web site and all of its pages. As you can see, it supports custom binders, which make it possible to define rules of any complexity. So if the standard model-to-view binders don't satisfy your requirements, you can define your own binder by providing an object with a single method, get_affected_paths.
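To make the custom-binder idea concrete, here is a toy registry in plain Python. It is a hypothetical stand-in for Site.bind_custom, not WebAlchemy's implementation: Registry, changed(), the binder classes and the URL layout are all assumptions, and in a real project changed() would be called from a Django model signal such as post_save.

```python
from dataclasses import dataclass
from datetime import date


class Registry:
    """Hypothetical stand-in for WebAlchemy's Site: model class -> binders."""

    def __init__(self, invalidate):
        self._binders = {}
        self._invalidate = invalidate  # callable(path): drop the static copy

    def bind_custom(self, model, binder):
        self._binders.setdefault(model, []).append(binder)

    def changed(self, obj):
        """Call when obj is saved or deleted; returns the invalidated paths."""
        paths = []
        for binder in self._binders.get(type(obj), []):
            for path in binder.get_affected_paths(obj):
                if path not in paths:  # keep order, skip duplicates
                    paths.append(path)
        for path in paths:
            self._invalidate(path)
        return paths


@dataclass
class Entry:  # toy model standing in for a Django model
    slug: str
    created: date


class DetailBinder:
    def get_affected_paths(self, obj):
        return [f"/blog/{obj.slug}/"]


class ArchiveBinder:
    """Like EntryArchiveHandler, but with a made-up URL layout instead of reverse()."""

    def get_affected_paths(self, obj):
        year, month = obj.created.year, obj.created.month
        return [f"/blog/{year}/", f"/blog/{year}/{month}/"]
```

With both binders registered for Entry, saving an entry created on 2007-11-18 invalidates its detail page plus the /blog/2007/ and /blog/2007/11/ archives, and nothing else.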

To try WebAlchemy with your project, download webalchemy.tar.gz, which contains the following files:

webalchemy/core.py
webalchemy/htaccess.py
webalchemy/__init__.py
webalchemy/models.py
webalchemy/publishers.py
webalchemy/rewriters.py
webalchemy/utils.py
LICENSE.txt
Place the contents of the archive in your project directory or anywhere on your PYTHONPATH. Make sure that WebAlchemy is importable from your project:
$ ./manage.py shell
>>> import concept.webalchemy
Then activate WebAlchemy in your settings.py. Here at Dreamhost, in a FastCGI environment, I use the following configuration:
WEBALCHEMY_SETTINGS = "concept.webalchemy_settings"

HTDOCS_ROOT = '/root/dogada/mysoftparade.com/'
from webalchemy.rewriters import FastCGIRewriter
from webalchemy.publishers import ApacheHtaccessPublisher

WA_PUBLISHER = ApacheHtaccessPublisher(FastCGIRewriter,
               HTDOCS_ROOT, '', delete_files = True)
The configuration is a bit different on my local workstation, where I use Apache with mod_python:
WEBALCHEMY_SETTINGS = "concept.webalchemy_settings"
HTDOCS_ROOT = '/var/www/concept/public/'
from webalchemy.rewriters import ModPythonRewriter

from webalchemy.publishers import ApacheHtaccessPublisher
WA_PUBLISHER = ApacheHtaccessPublisher(ModPythonRewriter,
               HTDOCS_ROOT, '/cache', delete_files = True)

There is no universal answer to the question of which rewriter to use; it depends on your setup. For my purposes I created two rewriters, ModPythonRewriter and FastCGIRewriter, which produce different .htaccess files for my local environment and for the Dreamhost environment where the blog is hosted. With mod_python I store the files in the /cache/ directory, which has no associated PythonHandler:
<Location "/">
    SetHandler python-program
    PythonHandler django.core.handlers.modpython
    SetEnv DJANGO_SETTINGS_MODULE concept.settings
    SetEnv PYTHON_EGG_CACHE /var/www/concept/.cache
</Location>

<Location "/cache">
    SetHandler None
</Location>
So it's enough to rewrite the request path to /cache/*, and Apache will handle it as a static file. In this case the .htaccess files look like this:
#webalchemy_begin
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /blog/webalchemy-django-apache/
RewriteRule ^$ /cache/blog/webalchemy-django-apache/index.html [L]
</IfModule>
#webalchemy_end
On Dreamhost I don't use a special cache prefix, because the configuration there is slightly different. An example .htaccess for the FastCGI environment here at Dreamhost was shown at the beginning of the article.

You also need to add concept.webalchemy.core.WebAlchemyMiddleware to the end of settings.MIDDLEWARE_CLASSES and create a webalchemy_settings.py file in the same directory as settings.py. You have already seen my webalchemy_settings.py. For your own site you can start with a very simple webalchemy_settings.py that maps some models to the homepage view, for example:

from django.conf import settings
from concept.webalchemy.core import Site
from myproject.views import homepage
from myproject.blog.models import Entry

site = Site(settings.WA_PUBLISHER)
site.bind(homepage, Entry)

With this configuration your home page becomes static and is regenerated whenever an Entry instance changes. Later you can add more advanced bindings that take the attributes of the changed instance into account and update only the pages affected by that particular instance.

When debugging, pay attention to the page response headers. Once WebAlchemy is activated for a site, it adds a WebAlchemy header to the responses it handles.

If you see the header WebAlchemy: ignored for a URL, the view that handles this URL isn't bound. If you see WebAlchemy: saved two or more times for the same URL, something is wrong with the .htaccess files or the web-server configuration: make sure your web server is configured to process rewrite instructions in .htaccess files.

If instead of the page you get an error with code 500, and the Apache error.log contains something like "Maximum Recursion Limit exceeded", the rewrite rules in use are incorrect.

When Apache treats the URL as a static file, it not only responds much faster but also sends different headers, which is useful for testing. For example, here are the headers for a WebAlchemy-powered Django page served as a static file:

HTTP/1.1 200 OK
Date: Sun, 18 Nov 2007 17:33:26 GMT
Server: Apache/2.2.3 (Ubuntu) DAV/2 SVN/1.4.3 mod_python/3.2.10 Python/2.5.1
Last-Modified: Sun, 18 Nov 2007 16:58:22 GMT
Accept-Ranges: bytes
Content-Length: 17810
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=UTF-8
When the same page is served by Django as dynamic content, the headers are different:
HTTP/1.1 200 OK
Date: Sun, 18 Nov 2007 17:36:22 GMT
Server: Apache/2.2.3 (Ubuntu) DAV/2 SVN/1.4.3 mod_python/3.2.10 Python/2.5.1
vary: Cookie
webalchemy: saved
Connection: close
Content-Type: text/html; charset=utf-8
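The debugging rules above can be turned into a small checker. This sketch assumes you have collected response headers (for example with curl -I) into dicts; classify and diagnose are hypothetical helpers, and treating Last-Modified plus Accept-Ranges as a sign of a static response is only a heuristic based on the sample headers above.

```python
def classify(headers):
    """Guess how one response was produced, from a dict of its headers."""
    h = {name.lower(): value for name, value in headers.items()}
    marker = h.get("webalchemy")
    if marker is not None:
        # Django handled the request; WebAlchemy reports what it did.
        return "django:" + marker.strip().lower()
    # Heuristic: Apache's static-file responses carry these two headers.
    if "last-modified" in h and "accept-ranges" in h:
        return "static"
    return "unknown"


def diagnose(results):
    """Interpret classify() results for successive requests to one URL."""
    if "django:ignored" in results:
        return "view is not bound in webalchemy_settings.py"
    if results.count("django:saved") >= 2:
        return "rewrite rules not applied: check .htaccess processing and mod_rewrite"
    if results and results[-1] == "static":
        return "ok: page is now served as a static file"
    return "inconclusive"
```

A healthy URL shows "django:saved" on the first request and "static" on the next one; a second "django:saved" points at the web-server configuration rather than at WebAlchemy's bindings.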

You can create your own rewriter that best suits your specific needs; see concept/webalchemy/rewriters.py for guidelines. You can define the rewriter in your project code and tell WebAlchemy to use it in the standard Django settings.py:

from myproject.utils import MyRewriter
from webalchemy.publishers import ApacheHtaccessPublisher

WA_PUBLISHER = ApacheHtaccessPublisher(MyRewriter, HTDOCS_ROOT)

For example, it's possible to avoid dealing with file extensions altogether (currently extensions are used to preserve the original MIME type of a URL after it becomes a static file) and use the [T=...] (type) flag of RewriteRule in the .htaccess file instead. Or you could use Apache's send-as-is handler to send content with custom headers. It would also be interesting to implement a rewriter for a RESTful web site, where only GET requests are rewritten to the static file, while POST, PUT and DELETE requests always go to Django, even when GET requests for the same URL are served directly by Apache.
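A GET-only variant of the mod_python-style rules might look like the sketch below. It is an illustration under assumptions, not an existing WebAlchemy rewriter: get_only_rules is a hypothetical helper, it relies on the /cache layout described earlier (where any request that is not rewritten falls through to Django's PythonHandler), and it uses mod_rewrite's %{REQUEST_METHOD} variable in a RewriteCond.

```python
def get_only_rules(url_path, static_prefix="/cache"):
    """Per-directory rules that serve the static copy only to GET/HEAD.

    POST, PUT, DELETE, etc. match no rule and fall through to Django.
    """
    return [
        "#webalchemy_begin",
        "<IfModule mod_rewrite.c>",
        "RewriteEngine On",
        f"RewriteBase {url_path}",
        # A RewriteCond applies only to the RewriteRule that follows it.
        "RewriteCond %{REQUEST_METHOD} ^(GET|HEAD)$",
        f"RewriteRule ^$ {static_prefix}{url_path}index.html [L]",
        "</IfModule>",
        "#webalchemy_end",
    ]
```

HEAD is included alongside GET so that header-only checks see the same static response as normal reads.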

If your site is so big that you need several app servers, you will need an htdocs filesystem shared by all of them, to which every app server writes. For reading, however, each app server can use a local htdocs filesystem replicated from the shared one. On the other hand, do you really need many app servers and all the replication/sharding headaches if WebAlchemy can greatly reduce the database load and make "reader" pages 100 times faster?
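Whatever replicates the shared htdocs must propagate deletions as well as new files, because invalidation works by deleting index.html and .htaccess; a copy-only sync would keep serving stale pages from the local copies. In practice rsync with --delete does this. The sketch below shows the same one-way mirror in plain Python, with a hypothetical mirror() helper that detects changes by modification time and size.

```python
import os
import shutil


def mirror(shared_root, local_root):
    """One-way sync of shared_root into local_root, including deletions.

    Returns (files_copied, files_deleted).
    """
    copied = deleted = 0
    # Pass 1: copy new or changed files from the shared tree.
    for dirpath, _dirs, files in os.walk(shared_root):
        target_dir = os.path.join(local_root, os.path.relpath(dirpath, shared_root))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src, dst = os.path.join(dirpath, name), os.path.join(target_dir, name)
            if (not os.path.exists(dst)
                    or os.path.getmtime(src) != os.path.getmtime(dst)
                    or os.path.getsize(src) != os.path.getsize(dst)):
                shutil.copy2(src, dst)  # copy2 also copies the mtime
                copied += 1
    # Pass 2: delete local files that were invalidated (removed) upstream.
    for dirpath, _dirs, files in os.walk(local_root):
        rel = os.path.relpath(dirpath, local_root)
        for name in files:
            if not os.path.exists(os.path.join(shared_root, rel, name)):
                os.remove(os.path.join(dirpath, name))
                deleted += 1
    return copied, deleted
```

Between runs the local copies lag behind the shared tree, so a just-invalidated page can still be served statically for up to one sync interval.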

For now I am leaving aside other interesting issues, such as setting the correct content type for URLs, merging manual and autogenerated content in .htaccess files, creating non-conflicting rewrite rules, etc.

Enjoy the code and have fun!

34 comments
  • David, biologeek on November 18, 2007
    This solution looks interesting but given the fact that your Latest entries are not the same when you read this article and when you read the Django profiler one, I'm a bit sceptic about caching on this site.

    I hope I'm wrong :-).
  • Dima Dogadaylo on November 18, 2007
    Well spotted, David. I had simply forgotten that I temporarily added the latest-articles block to the details page. It's a matter of configuration: the homepage is rebuilt when any Entry is modified, but an entry's details page is rebuilt only when that entry is modified or a comment on it is published.

    I could also rebuild all details pages whenever an entry changes, but I don't, because for things like this it's better to use some kind of server-side includes.

    BTW, I've fixed the issue. Thanks.
  • Michael on November 18, 2007
    Is it possible to run your solution with lighttpd?
  • Dima Dogadaylo on November 18, 2007
    It's possible to proxy "static" requests from Apache to lighttpd or another server. In theory it's also possible to put lighttpd in front of Apache as a reverse proxy and bounce requests between them: lighttpd passes all requests to Apache, then receives back the requests for which a static version exists and serves them from its own htdocs. Dynamic requests continue to be served by Apache, which returns the processed responses to lighttpd.

    Or you can use .htaccess directly with lighttpd, if it supports it :)
  • sceptic on November 19, 2007
    How about storing the blog entries as text files only, and not in a database?
    It'll keep things much simpler and more transparent than this hack.

    All that's required is the proper Django code to create the flat files when pages are edited. The editing pages become the only dynamic part of the site.
  • Dima Dogadaylo on November 19, 2007
    2sceptic: blog entries may contain included sections like "see also", counters, comments, etc. Also, storing part of the data in the db and part as flat files isn't worth it; it's bad design IMHO.
  • wiz on November 19, 2007
    Wouldn't Django's cache middleware + memcached for dynamic pages, with a separate web server (nginx or other) for static files, outperform this stuff?

    You can even deliver pages directly from memcached using Nginx's memcached engine.
  • Dima Dogadaylo on November 19, 2007
    With memcached you will have stale resources in the cache and about 500 req/sec; with static files served by Apache or another server you can get 2000 req/sec. BTW, run ab ;)
  • mer on November 20, 2007
    V. interesting. Any view on how this approach might be extended to more personalised web pages, if at all? For instance, one where a user's personal details are displayed on a particular page?
  • Dima Dogadaylo on November 20, 2007
    mer, personalisation and caching are well-known enemies :) Often people save user state in a cookie and then modify pages with JavaScript (for example, LiveJournal blogs use this technique and use memcached for caching pages). Another approach is AJAX. As a last resort, people buy more hardware.
  • AH on November 20, 2007
    nice work!
  • AH on November 20, 2007
    how to tell WebAlchemy/apache to create this http://www.mysoftparade.com/search/?q... when i browse http://www.mysoftparade.com/search/?q... ?
  • Dima Dogadaylo on November 21, 2007
    Search pages and pages involved in POST form processing are always served by Django. If a page depends on GET or POST parameters, it is always served by Django.

    Actually, on this site only /search/ and comment posting are always served by Django; all the other pages are served by Apache.
  • AH on November 21, 2007
    does Webalchemy work with Django 0.95 ?
  • Dima Dogadaylo on November 21, 2007
    AH, I work with the trunk version of Django, so I'm not sure about the rather old 0.95.
  • AH on November 21, 2007
    oki, i cannot get it running with django 0.95, i have no tracebacks and it does not create any file (.htaccess/index.html).
    maybe i did something wrong in webalchemy_settings.py ? i leave it empty

    Dima, is it needed to put something in webalchemy_settings ? can i leave it empty ?
    i thought it is only to tell Webalchemy that it must update a specified page..

    i will try again with django trunk.
  • Rocketer on November 21, 2007
    Django after first start will create index.html .
    I can't understand why you need any rewrite rules in .htaccess if you already have index.html. Apache will send index.html back to the client by default. When index.html is deleted, Django will start in any case.

    Could you please explain?

    thanks
  • Dima Dogadaylo on November 22, 2007
    2Rocketer: even if you have, say, htdocs/dir/index.html, Apache will pass requests to Django if a PythonHandler is set for Location /. So we need to instruct Apache not to pass the /dir/ request to Django and to return htdocs/dir/index.html instead. By default Apache ignores the contents of htdocs if a handler is set for /: all requests are simply passed to that handler.
  • Dima Dogadaylo on November 22, 2007
    2AH: you should have non-empty webalchemy_settings.py. See examples in the article.
  • Sebastian Macias on January 09, 2008
    What happens if you have multiple servers and a load balancer? Is there a way to write the .htaccess file to multiple servers? How would performance be if you have to write that file to, let's say, 80 servers? Also, would you recommend it for a site with 3000,000 users and around 45,000,000 dynamically generated pages?
  • Dima Dogadaylo on January 09, 2008
    Sebastian, there is no simple answer to your question. It depends. For example, you can create a network share to which all app servers write and replicate this share to the local filesystems of your app servers. That looks tedious with 80 app servers, but with WebAlchemy you can speed up the site 100 times, so you will need only 0.8 servers ;)

    Seriously, you don't need to make all your pages static, but the homepage, RSS feeds, and common entry pages like /community/, /news/, /photos/latest/ IMHO MUST be static and should not use SQL queries.

    You can always start with small changes: make your homepage as fast as a rocket and then think about the other pages.
  • Dima Dogadaylo on January 09, 2008
    New blog entry about WebAlchemy is published. Please see:
    www.mysoftparade.com/blog/webalchemy-vs-staticgenerator/
  • AH on January 10, 2008
    Dima,

    have you used the directive "AllowOverride all" somewhere in your httpd.conf
    so that apache evaluates all .htaccess files it encounters in /cache ??

    <Location "/cache">
        SetHandler None
    </Location>

    thanks
  • oron on January 14, 2008
    Great piece of software
  • compare mobiles on January 14, 2008
    BTW, how do you use it with direct_to_template,
    when there is no view involved? Surely there needs to be some way to bind_static in such cases.
  • Dima Dogadaylo on January 14, 2008
    Instead of direct_to_template I use the @render_to decorator. I found that the simplest way is to use a unique view for every page, for example:
    
    @render_to("home.html")
    def home(request):
        return {}
    
    @render_to("about.html")
    def about(request):
        return {}
    
    
  • Tai Lee on January 16, 2008
    Feel like dumping this in a public SVN repository tagged with version numbers so people can track it easily?

    Also you only need a single .htaccess file in your front Apache server to serve static media or proxy to the back Django server (Apache, Lighttpd, FCGI, mod_python, whatever).

    DirectoryIndex index.html
    RewriteEngine on
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME}/index.html !-f
    RewriteRule ^(.*)$ http://127.0.0.1:1704/$1 [P,QSA,L]
  • oron on February 07, 2008
    What about authenticated and anonymous users?
    Is there any way to use this to cache a page depending on the status of the user (logged in or not), something like what Django's built-in cache has?

    I understand the best case is static-like pages, but, for example, the home page of my site, which is a classic candidate for this kind of cache, shows 'logged in as X' and different menus for logged-in users.
  • Dima Dogadaylo on February 14, 2008
    oron, with .htaccess it's possible to rewrite requests from anonymous users to a static file while requests from logged-in users go to Django. See how RewriteCond can test cookies for details.
  • xeboy on May 22, 2008
    HTTP/1.1 200 OK
    Date: Thu, 22 May 2008 14:05:25 GMT
    Server: Apache/2.2.8 (Win32) DAV/2 mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.8 OpenSSL/0.9.8g mod_autoindex_color PHP/5.2.5
    Vary: Cookie
    Content-Type: text/html; charset=utf-8
    WebAlchemy: saved
    X-Transfer-Encoding: chunked
    Content-length: 13095

    I keep getting headers like these. I'm using TrivialPublisher, not mod_rewrite, to deal with the .htaccess file.

    It does seem to work, in that it generates a static file.
  • Bernard on June 08, 2008
    Hi Dima,

    This seems like a great piece of software and I'm trying to integrate it into one of my projects.

    I've run into a problem that I don't quite understand (I'm still learning Python). When I look at the headers, they always show "WebAlchemy: ignored". You mentioned that this probably means I haven't bound the method to the model yet.

    However, after looking at the source, I found that the check "view in self.get_binded_views()" in the process_response function of the Site object always returns False, even when the view is in the binded_views set.

    I did a small comparison of the view that is resolved from request.path and the actual view:

    >>> resolve(request.path)[0]

    >>> self.get_binded_views()
    set([])

    I'm not exactly sure why this comes out False when they are the same function; I'm guessing it has something to do with the 0x0 addresses not being the same. Do you know why this is happening? I cannot get WebAlchemy to generate static pages when the view is always ignored even though it is bound. I'm sure this probably isn't happening to you, or it would have been fixed.

    I'm using django from the trunk running on Python 2.5 on Windows. I'm currently testing this off the development server that comes with django. Is it possible that this happens only with this configuration?
  • Bernard on June 08, 2008
    this didn't quite come out right with the brackets

    >>> resolve(request.path)[0]
    function popular at 0x019CC2F0
    >>> self.get_binded_views()
    set([function popular at 0x0199F470])

    thanks in advance!
  • Dima Dogadaylo on June 09, 2008
    Bernard, if you see "WebAlchemy: ignored" header, then you need to add proper rule for given view in webalchemy_settings.py.
  • Bernard on June 10, 2008
    Thanks for your reply Dima, as I mentioned above, I have added a rule in the webalchemy_settings.py file. But it still gives the ignored header.


   
Web log, research lab and soft parade of Dima Dogadaylo.
Email: entropyhacker at gmail dot com