Boost your Drupal site!

Boost: Static HTML caching for Drupal

I've recently become quite familiar with Arto Bendiken's Boost for Drupal. For context, Drupal is an open source, modular, PHP-based content management system (CMS) that I use with many of my clients. Boost is a module for Drupal which assists you in caching content as static HTML, bypassing Drupal (and thereby PHP and MySQL) in order to handle much more traffic and serve content much more quickly. Essentially, you let Apache do what it does best -- serve HTML pages. With a busy site or one with a lot of content, this can be a lifesaver.

Arto has a good write-up about Boost in his original blog post. However, Boost is a little more complex than most Drupal modules, so I hope to add a few things here:

  • the basics of what Boost gives you
  • how the two "halves" of Boost complement each other
  • how Boost gets you outside of Drupal entirely
  • the status of Boost with regard to Drupal 5.x
  • a little more detail about how it works
  • some caveats that I've found

Arto's documentation for the setup of Boost is great, so I won't be rehashing that. Rather, I hope to provide a little more technical info about how the module works.

The Basics

Basically, Boost is two parts: first, a Drupal module (in the traditional sense) that manages the cache and provides for an administrative user interface, and second, some rule lines to add to your Drupal site's top-level .htaccess file which allow Apache to bypass Drupal entirely and serve pages from the cache.

The key to understanding Boost is its use of Apache's mod_rewrite via the .htaccess rule lines (by the way, .htaccess is just the default name for the per-directory files in your site that Apache reads for configuration info). Many people may not realize that Drupal's clean URLs depend on mod_rewrite. Every URL on a Drupal site is basically just a path argument to the top-level index.php, which dispatches to various points in the code to handle that argument. So, when you go to /about, index.php actually gets an argument of about and determines what content to serve. Apache's mod_rewrite keeps the browser pointed at /about while actually running index.php. Another popular open source CMS, WordPress, behaves similarly.

Once you understand this, it's easy to understand what Boost does and why it requires mod_rewrite. Boost by default stores the cached versions of pages under /cache on your website (this path is configurable, though). Then, when a request comes in, the .htaccess file is consulted (because that's what Apache does), which tells it to look for cache files first before sending anything to Drupal's index.php. Since the cache files are plain HTML, they go out much more quickly than Apache running PHP, firing up Drupal, querying MySQL, and then serving content. Arto provides some graphs in his original post showing just how dramatic this improvement can be.
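To make that lookup order concrete, here is a rough sketch in Python. It is not Boost's code (the real thing is a handful of mod_rewrite rules in .htaccess, covered in Arto's documentation); the function name resolve_request, the logged_in flag, and the directory layout are my own assumptions for illustration, and the front page is deliberately left out.

```python
import os
import tempfile


def resolve_request(docroot, url_path, method="GET", logged_in=False):
    """Return ("static", file) on a cache hit, else ("drupal", path argument)."""
    arg = url_path.strip("/")
    cache_file = os.path.join(docroot, "cache", arg + ".html")
    # Only anonymous GET requests are eligible for the static cache.
    # The front page (empty path) is not handled in this sketch.
    if method == "GET" and not logged_in and arg and os.path.isfile(cache_file):
        return ("static", cache_file)
    # Everything else falls through to index.php with the path as its argument.
    return ("drupal", arg)


if __name__ == "__main__":
    # Hypothetical document root with a single cached page.
    docroot = tempfile.mkdtemp()
    os.makedirs(os.path.join(docroot, "cache"))
    with open(os.path.join(docroot, "cache", "about.html"), "w") as f:
        f.write("<html>About</html>")
    print(resolve_request(docroot, "/about"))
    print(resolve_request(docroot, "/contact"))
    print(resolve_request(docroot, "/about", method="POST"))
```

The point of the ordering is that the cheap file-existence check happens before anything expensive: only a miss ever reaches PHP.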

Lastly, a word about the cache filename standard. If in fact the /about URL were cached, it would actually be in your site at /cache/about.html. If Apache finds this file, it assumes that the cache is still valid (the Drupal module side takes care of expiring and removing stale content) and serves it directly. For path aliases (such as "/about should serve the same content as /node/137"), Boost uses UNIX symbolic links in the cache filesystem, so /cache/about.html would be a link to /cache/node/137.html.
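The naming scheme and the alias links are easy to model. The following is a minimal Python sketch of the idea, assuming a cache root you control; cache_path and store_page are hypothetical helpers of mine, not functions from Boost, and the use of a relative link is my choice rather than something I have confirmed about the module.

```python
import os
import tempfile


def cache_path(cache_root, drupal_path):
    """Map a Drupal path such as "node/137" to its cache file."""
    return os.path.join(cache_root, drupal_path.strip("/") + ".html")


def store_page(cache_root, system_path, html, alias=None):
    """Write the page under its system path and link the alias to it."""
    target = cache_path(cache_root, system_path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "w") as f:
        f.write(html)
    if alias:
        link = cache_path(cache_root, alias)
        os.makedirs(os.path.dirname(link), exist_ok=True)
        if os.path.lexists(link):
            os.remove(link)
        # A relative link keeps working if the cache directory is moved.
        os.symlink(os.path.relpath(target, os.path.dirname(link)), link)
    return target


if __name__ == "__main__":
    root = os.path.join(tempfile.mkdtemp(), "cache")
    store_page(root, "node/137", "<html>About</html>", alias="about")
    # Shows the link target for the alias, relative to the cache root.
    print(os.readlink(cache_path(root, "about")))
```

Because the alias is only a link, there is a single copy of the HTML on disk, so expiring /node/137 also invalidates /about.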

Boost and Drupal 5.x

I have been using Boost on a Drupal 5.1 site, thanks to this port of Boost to Drupal 5.x by the maintainer of drupal.ru. This currently seems to be the only source of Drupal 5.x-compatible Boost material. The only caveat to be aware of about this version is that by default, the front page is not cached -- more on this below. If your site is anything like the one I used Boost on, you will need to remedy this, since your front page is likely your busiest as well as most complicated page and is in need of caching.

A Little More Detail

A couple other notes about Boost's operation:

  • Cache files are created on demand. For example, if your front page is not cached when someone requests it, Drupal will construct the page and cache the file, but serve the constructed page to the user. Every user thereafter, until the cache file becomes stale and is removed, will receive the cached version. If you have pages that are particularly demanding, think about running a cron job to request them anonymously in order to get them cached for regular users.
  • Special paths like /user/login and /admin, as well as HTTP POST requests and any request for a logged-in user, are not cached. Arto has put a lot of thought into this area. Note that this means that sites with mostly logged-in users will not benefit from Boost very much -- anonymous users see the real benefit.
  • Boost takes over the configuration interface for Drupal's built-in caching mechanism. This just means that it avoids confusion between two types of caching and just "upgrades" your current setup to be Boost-ified.
  • Like the built-in cache, Boost has multiple cache lifetime intervals to choose from; anywhere from one minute up to one day.
  • Boost expires content in one of two ways. It implements hook_nodeapi to catch node updates, insertions, and deletions and responds to those, and it also implements hook_cron to expire content that has become stale but has not had any specific actions performed on it.
  • Technical note: Boost uses PHP's output control functions (i.e. ob_start et al.) and hook_init to intercept every Drupal page request, buffer the content, compare to and update the cache, and then send the content along through Drupal normally.
  • Nothing stops you from expiring content manually by deleting its file from the cache. However, note that for pages which have path aliases (and thus Boost symbolic links) to them, the links do not get removed automatically so you may cause some wonkiness by doing this.
  • Boost inserts a small HTML comment at the very bottom of cached pages with the start and end cache times so that you can tell if it's working and how long a given file will persist in the cache.
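The cron-driven expiry and the dangling-link problem from the notes above can be sketched together. This is a Python model of the idea only, not what Boost's hook_cron does internally; expire_stale is a hypothetical name, and using each file's modification time as the measure of staleness is my assumption.

```python
import os
import tempfile
import time


def expire_stale(cache_root, lifetime_seconds, now=None):
    """Delete cache files older than the lifetime, then any dangling links."""
    now = time.time() if now is None else now
    removed = []
    # Pass 1: remove regular files whose age exceeds the cache lifetime.
    for dirpath, _dirs, files in os.walk(cache_root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            if now - os.path.getmtime(path) > lifetime_seconds:
                os.remove(path)
                removed.append(path)
    # Pass 2: remove alias links whose target no longer exists.
    for dirpath, _dirs, files in os.walk(cache_root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) and not os.path.exists(path):
                os.remove(path)
                removed.append(path)
    return removed


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    page = os.path.join(root, "old.html")
    with open(page, "w") as f:
        f.write("<html>stale</html>")
    two_hours_ago = time.time() - 7200
    os.utime(page, (two_hours_ago, two_hours_ago))
    print(expire_stale(root, lifetime_seconds=3600))
```

The second pass is the part worth copying if you delete cache files by hand: sweeping for links that no longer resolve avoids the wonkiness mentioned above.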

Caveats

Like any somewhat intrusive technology (and by this I mean that it works with every page and changes the way your site operates as a whole), Boost should be used with caution. Arto states that the project is still in an alpha state.

The biggest issue that I've noticed is a strange bug which occasionally caches the front page as a Drupal "access denied" page. Others have seen this as well and I've never been able to nail it down completely. This is the main reason why drupal.ru's port of Boost to Drupal 5.x leaves out the front page from caching. I was able to work around this by hacking Boost's boost.api.inc file, in the boost_cache_set function, to not cache pages containing the words "access denied". I hope to report more once I figure this out.
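For the curious, the workaround amounts to a guard like the one below. My actual change was a small PHP edit inside boost_cache_set; this Python version only illustrates the logic, and should_cache and its markers parameter are names I made up for the sketch.

```python
def should_cache(html, markers=("access denied",)):
    """Return False when the rendered page looks like an error page."""
    lowered = html.lower()
    # Case-insensitive substring match against each marker phrase.
    return not any(marker in lowered for marker in markers)


if __name__ == "__main__":
    print(should_cache("<h1>Welcome</h1>"))
    print(should_cache("<h1>Access denied</h1>"))
```

It is a blunt instrument: a legitimate page that happens to contain the phrase would never be cached either, which is part of why I consider it a workaround rather than a fix.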

The second issue is that currently, Boost will not work for sites that are not at the top level of the domain. That is, if your site is domain.com/mysite, it will not work -- only domain.com would work. I believe this is on the .htaccess side, but it only really affected me in testing a development version of the site, and since I was able to set up a top-level sandbox, I didn't investigate it any further. Once again, if I make any improvements in this area, I'll update this post.

Conclusion

This concludes my overview of Boost. As I mentioned above, I will update this post if I make any progress on the (very minor) issues that I've had with it. It's a great system and I highly recommend it!

You may also be interested in my Drupal page here at Code Sorcery Workshop for more info about my work with Drupal.

Thanks for reading!

Great article, Justin - and a cool logo to boot :-) Glad to hear you persevered to get the module working for 5.x. It does indeed still have a lot of rough edges, but once you grok its modus operandi (and you clearly have) it's based on pretty straightforward concepts underneath.

I've added a link to this article to my original blog post about Boost and to the Drupal.org project page, too. If you have the chance, would you please share your changes for 5.x in the issue tracker - this article certainly gives me a kick in the behind to get a release 1.0 of the module out for 5.x soonest. (The project co-maintainer position is also up for grabs if you want it.)

@Arto: I'd be happy to help maintain Boost, as well as some of the other modules that we've talked about. I'll email you about these before too long.

Justin, can you make the working Drupal 5.x module (with your homepage tweak) available? That'd be great. Thanks.

By the way, if you could use some SEO/usability/copywriting consulting, I am willing to barter to speed up the development of the module.

What about sites that display a different theme & layout for mobile devices? I had to turn caching off because it prevented anonymous users on an iPhone from seeing the optimized view since the standard home page was cached.

What is the status of the Drupal 5.x port of the Boost module?
Why isn't the port from drupal.ru committed to drupal.org (Boost module)?

This is an amazing module. Very, very, very needed.

My biggest question is - without it working on subdomains I personally would be very nervous to light it up on any production site. I always test in a staging area first (stage.LiveDomain.com.) then push live.

What is the recommended course of action?

@Cozmo: Boost has no problem with subdomains (after all, what else is www.livedomain.com but a subdomain of livedomain.com?). The limitation is that the Drupal site needs to be in the root directory.

In other news, the 5.x version is now available from drupal.org. And guys, please note that the right place (I should say, the one and only place) to request support for this module is at http://drupal.org/project/boost. Support requests in comments on blogs here and there will most likely simply be ignored by everyone involved.

for the subdirectory issue, see contemporaneous comment at http://drupal.org/node/101147 for a hack that is working for me.

.b

Nice write up. I like boost too. You can also have smaller threads in your webserver if it's just serving static content; it can be pretty light in fact. So for a step beyond I tried running the Apache server with the static content somewhere else:

http://iskra-it.com/blog/ekes/2008/01/10/boost-module-and-rsynced-apache-mirrors

Any rules for boost to work with lighttpd ? (in .lua or lighttpd.conf)

Re: lighttpd: No, I don't have any experience with it. I'll contact Arto and see if he has heard of anything, but I've only used it (and Drupal, for that matter) with Apache.

I have posted some lua code (if you are using mod_magnet) for boost module to work with lighttpd.

Check the link:

http://drupal.org/node/150909#comment-997460

Try it out and pls provide any feedback to make it better.

Thks.

@drupdrips: Thanks for this. I'll leave the comment for folks to find in this post and if I ever get to trying lighttpd myself in this way, I'll give them a shot.