Drupal

Drupal File Taxonomy Server public release!

August 28, 2008 by Justin


I've finally been able to make a public, alpha release of the File Taxonomy Server for Drupal 6, allowing integration of a Drupal site's category structure into a drag-and-drop WebDAV interface. I blogged about this software a bit previously, but am finally getting some code out there. If it sounds like something you'd be interested in, then I encourage you to check out the project and give it a spin.

I've fixed a couple minor bugs and now the install instructions mention the latest tested versions of the DAV API and File Framework projects, so it should be pretty easy to get going.

You can create and edit categories and upload and tag files to your site, all from the comfort of your WebDAV client, including Mac OS X's Finder, Windows' Explorer, or other applications like Transmit, Coda, and Cyberduck.

Comments and feedback are welcome here, but please keep official feature requests and bug reports over on the project page.

Major advancements in Drupal file handling

July 23, 2008 by Justin


I wanted to mention a few technologies that I've been working with and contributing to lately. This post will be pretty tech-heavy, so I'll throw out some keywords ahead of time so that you can decide if this is something that you'll be interested in reading further: WebDAV, desktop-web integration, RDF, and content-addressable storage, all in the context of my content management system of choice, Drupal.

My work specifically has been in porting and updating some work started by Arto Bendiken (whose projects, it seems, I only start to wrap my head around, and appreciate the foresight of, about two years after he conceives of them). Aside from writing a guide to his high-traffic caching module, Boost, and porting his debugging tool, Trace, to the latest version of Drupal some time ago, I've been porting his Drupal-WebDAV content bridge, File Server, fixing some bugs in the underlying DAV API, and integrating with the updated File Framework that Arto and my other colleague Miglius Alaburda have been working on.

I'll tackle each of these technologies in layers, starting from what the user sees on down to the gory details under the hood in how the files are stored and queried.

File Server

File Server lets you take a WebDAV client, such as Mac OS X's Finder or Windows Explorer or, better yet, a richer program like Transmit or Cyberduck, log in to your Drupal-based website with it using the account that you already have, and drag and drop the files and folders presented to you there. Assuming your Drupal site has file nodes (essentially, chunks of content in the form of file uploads) on it, this dragging and dropping can be used to re-categorize the files, upload new files, and change your site's category structure.

Here's an example:

1) I have a Drupal-based website.

2) I connect to http://my-website-url/dav in a WebDAV client and log in with my Drupal user credentials. I'm then presented with a view of my tags:

3) I drag a file from my desktop into a folder corresponding with the tag that I'd like it to be categorized under:

4) When I go to my Drupal site, the file exists as a node and has been tagged appropriately:

This happens via File Server, which I've recently ported to Drupal 6, and Arto's DAV API, which lets you hook all kinds of Drupal facilities into a standardized DAV interface. In this case, we're connecting Drupal's taxonomy (i.e. tags and categories) and in turn, its file nodes, to DAV. It makes a lot of sense here because the files are going into Drupal as file nodes, with automatic conversion into other formats, automatic metadata extraction (as seen in the sidebar above), and automatic indexing into the search system.
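To make the folder-to-tag idea concrete, here is a minimal Python sketch of how a DAV path could be read as taxonomy terms plus a filename, and how a drag-and-drop (a DAV MOVE) turns into a change of tags. This is not File Server's actual code, which is a Drupal module; the /dav prefix comes from the example above, while the function names and the rule that every folder is one term are assumptions for illustration.

```python
from pathlib import PurePosixPath

DAV_ROOT = "/dav"  # mount point used in the example above


def parse_dav_path(path):
    """Split a DAV path into (terms, filename).

    Hypothetical helper: assumes each folder below the root is a
    taxonomy term and the last segment is the file name.
    """
    parts = PurePosixPath(path).relative_to(DAV_ROOT).parts
    if not parts:
        return [], None
    return list(parts[:-1]), parts[-1]


def retag_on_move(old_path, new_path):
    """Describe a move between folders as (removed terms, added terms)."""
    old_terms, _ = parse_dav_path(old_path)
    new_terms, _ = parse_dav_path(new_path)
    removed = [t for t in old_terms if t not in new_terms]
    added = [t for t in new_terms if t not in old_terms]
    return removed, added
```

Dragging a file from a Photos folder to a Work folder would then come out as "remove Photos, add Work" on the corresponding file node.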

File Framework

File Framework is the link between the actual files and the node structure in Drupal. In short, File Framework takes the default facility in Drupal for file uploads and replaces it with a more robust system for backend storage, exporting of info, and conversion into alternative formats. For example, you can upload a PowerPoint presentation and automatically get related nodes out as PNG, PDF, Flash slideshow, and more.

File Framework is a lot of under-the-hood stuff and is in active development for Drupal 6. It also builds upon two other frameworks, RDF and Bitcache.

RDF

RDF stands for Resource Description Framework and is not Drupal-specific, though it embodies a concept that Drupal is trying to move towards. RDF is a step in the direction towards the semantic web, where computers can understand what data is about, not just what it contains.

The most common application of this technology is in search engines. Today, when trying to find a picture, we search for web pages that contain the words "picture of a sunset" and only turn up hits if those words are found. In the semantic web, this info could be found because the search engine can understand that there is a person who has a name of Joe, who has a profession of professional photographer, and who has a website at http://joephoto.com, which in turn has a file which is a digital photograph, which has a description containing the word sunset. As a result, the computers indexing all of this stuff can glean some context, such as the fact that this is likely to be a good photo of a sunset since Joe is a professional photographer, and can provide much richer info than a search engine merely looking for words that the content author may or may not have written near the object in question.

Anyway, RDF is the stuff that stores these triples, the idea of subject-predicate-object, e.g., website has a file. Arto has hacked together an API for RDF storage in Drupal and the File Framework now uses it. This RDF storage facilitates not only descriptiveness on the site, but also helps with cross-site searches when this data is needed.
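As a rough illustration of the subject-predicate-object idea, here is a small Python sketch of an in-memory triple store using the Joe example from above. It is not Arto's RDF API for Drupal; the class and method names, the predicate wording, and the file name sunset.jpg are all made up for the example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """Toy in-memory store; names here are illustrative only."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add(Triple(subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


store = TripleStore()
store.add("Joe", "has profession", "professional photographer")
store.add("Joe", "has website", "http://joephoto.com")
store.add("http://joephoto.com", "has file", "sunset.jpg")
store.add("sunset.jpg", "has description", "a sunset over the bay")
```

Asking store.match(subject="Joe") returns everything known about Joe, and following the "has website" and "has file" triples leads from the photographer to the photo, which is the kind of context a plain keyword search does not have.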

None of this stuff is useful if people have to focus on creating the RDF, so this RDF API combined with Drupal makes it easy for people to keep doing what they were doing before and have the system take care of all of this context stuff.

Bitcache

Bitcache is a means for content-addressable storage (CAS), which means that, unlike most filesystems that we deal with today, the address or URL pointing to a file is based merely on the content, not on the set location. To put it another way, when you put a file on your hard drive, it gets an address like file:///users/justin/myfile.txt that is assigned arbitrarily (well, actually by you based on how you name it and where you put it, but it's arbitrarily related to the actual content). In Bitcache, a unique string of letters is calculated when the file is put into the system and as long as the content remains the same, the pointer URL to that file will remain unchanged. When the file changes, a new copy is created which necessarily has a different address; however, the old file remains in the system as well. So, a benefit is that a given file that exists in a given state is never put into the system twice, since it can be continually referred to by its content address. Another benefit is that the old versions are necessarily retained for archival purposes as well.

A quick hypothetical example: an office of people uses a server to store their MP3 collections. When someone puts a new song on the server, it gets a content address. When a second person later puts the same song on the server, a second copy is not created -- instead, the second person gets a reference to the original file's address, since the file is the same anyway. That way, half the storage is used. As more people add the same song to their collections, the storage benefit increases. This is a simplified and contrived example, but it's the basic gist of things.
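Here is a minimal Python sketch of the same idea. It is not Bitcache itself: the class name is made up, and SHA-1 is simply an assumption for illustration, since any hash that is a function of the content alone would show the same behavior.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: the address is a hash of the bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Assumption: SHA-1 as the content hash, chosen only for the sketch.
        address = hashlib.sha1(data).hexdigest()
        # Identical content maps to the same address, so nothing is stored twice.
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._blobs[address]

    def __len__(self):
        return len(self._blobs)
```

Two people storing the same song get the same address back and the store still holds one copy; an edited version gets a new address while the old bytes remain retrievable under the old one.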

The File Framework for Drupal makes use of Arto's Bitcache project. So, when you are using the stack of tools I've been talking about here, under it all, you also get the benefits of CAS behind the scenes.

Conclusion

I'm going to stop there, as I'm sure I've done some grave injustice to some of the complexity involved, left out some of the cons that come with the pros of these systems, and I've probably mangled some of the descriptions too. But then again, that's why I'm working on a piece of it and not the whole thing ;-)

I get excited about this stuff not just because of the Cool Factor™, but also because this sort of thing, when combined with Drupal 6's capabilities in the workflow department with triggers and actions, can lead to some powerful publishing and conversion capabilities for file uploads. And all of this paves the way for Drupal to be the most forward-thinking content management system out there.

Questions? Comments? Corrections? Leave a note below and let's sort it out.

17 comments

The Meerkat is loose!

May 28, 2008 by Justin


After many months of development, I'm pleased to announce that Meerkat is ready! As the tag line goes, UNIX power, Mac style: SSH tunnels made easy. Meerkat is an easy-to-use SSH tunnel manager built specifically for the Mac. A lot of blood, sweat, and tears have gone into this release and I'm happy (and relieved) to get to this point.

If you're reading this on the actual site, you may also notice that I've redesigned the website. I hope that it makes more information available, while staying uncluttered and more easily manageable.

Lastly, I've also moved from WordPress to Drupal as my content management system. There are many reasons why, and I hope to blog about them in the near future, but for now, please do drop me a line if anything seems to be out of place.

And now, go get Meerkat! :-)

18 comments

Pets for the environment!

April 18, 2008 by Justin


Congratulations to Environmental Working Group for launching Pets for the Environment yesterday! From the announcement:

According to the latest research from EWG, I'm the canary in the living room, soaking up more chemicals than you or your children.

New tests confirmed that I'm full of toxic industrial chemicals, and I'm barking mad. You should be, too. I grow 7 times faster than humans, so what happens to pets like me - increased cancer rates, for one - might be happening to people soon.

Did you know that the humans' government doesn't make companies test our toys, furniture, or even our food for safety?

I did a little work on this site (built in Drupal, by the way), but the bulk of the credit goes to the EWG team and Mike McCaffrey.

Clearly, this is a big problem and I look forward to more of EWG's research as this is something that is affecting our cat, too. Code Sorcery Workshop and Macy are definitely on board!

5 comments

Come say hi at Drupalcon Boston!

March 3, 2008 by Justin


Though I'm headed to SXSW in Austin later this week, I'm in Boston right now for the twice-annual Drupal conference, aptly named Drupalcon. If you're around, say hi -- this is a fairly good representation of what I look like, though sadly I do not have my hat, or at least a non-winter hat. It's kind of cold up here!

Today, the first day, was pretty engaging and inspiring. I'm excited to see the roadmap for the future of Drupal, and I'm always on the lookout for knowledge that will benefit me both in the Drupal services that I provide as well as some cross-pollination of ideas between both web services and Mac desktop software, be they data storage, user interface, industry trends, or any number of other issues. Not to mention meeting & hanging out with some great folks!

Probably the most interesting session so far for me was the last, on Drupal in China, covering issues as varied as outsourcing & off-shoring, governmental technology initiatives, software piracy, language barriers & internationalization, community involvement, and lifestyle & free-time activities -- which of course, if you look at them, are all integral to China's current standing and future trajectory in the technology scene. Really fascinating stuff!

Though it's of course easy for me to hang out with the large DC representation up here, I do need to meet some other folks, so again, don't be shy -- come say hello. And enjoy Drupalcon!

DrupalCon Barcelona

September 27, 2007 by Justin


Nothing like a vacation followed immediately by a European software conference to get things off track! I'm a little behind, but I'm starting to catch up and I wanted to post about a number of things, the first of which is Drupal.

I was fortunate enough to travel to Barcelona last week for DrupalCon, pretty much the ground zero for all things Drupal (more on my work with Drupal here). With over 400 people in attendance, the growth in the community has been tremendous and after three years working with the software myself, it's really inspiring to see both others picking it up as well as the future directions in which things are headed.

I attended a number of excellent panels (which I hope to blog about soon) including topics such as multilingual websites, advanced JavaScript, asset management, publishing workflows, database abstraction, new things in PHP 5, feed aggregation, and the scaling of the Drupal website, itself running Drupal (of course).

I had the pleasure of traveling and working with the team behind the project featured in this panel -- essentially, open source software in support of humanitarian and stabilization operations around the world -- as well as seeing Workshop friends and fellow DC residents Development Seed as they proceeded to tear up the conference with a multitude of outstanding sessions highlighting their amazing work. Read more about Development Seed's work on their blog and you might even see a cameo by yours truly!

Lastly, I was inspired to participate in Friday's Lightning Talks with a quick presentation of the Boost module, which I now co-maintain and have talked about previously. Will White from Development Seed was kind enough to capture a couple photos of me presenting the Environmental Working Group site. You can check out more of Will's DrupalCon pics here.

I'm looking forward to next year -- and if you're interested in Drupal, leave a comment and I may be able to point you in the right direction, be you a coder, site maintainer, or just a curious tinkerer!

Boost your Drupal site!

July 23, 2007 by Justin

Posted in Drupal

Boost: Static HTML caching for Drupal

I've recently become quite familiar with Arto Bendiken's Boost for Drupal. For context, Drupal is an open source, modular, PHP-based content management system (CMS) that I use with many of my clients. Boost is a module for Drupal which assists you in caching content as static HTML, bypassing Drupal (and thereby PHP and MySQL) in order to handle much more traffic and serve content much more quickly. Essentially, you let Apache do what it does best -- serve HTML pages. With a busy site or one with a lot of content, this can be a lifesaver.

Arto has a good write-up about Boost in his original blog post. However, Boost is a little more complex than most Drupal modules, so what I hope to add here are a couple of things:

the basics of what Boost gives you
how the two "halves" of Boost complement each other
how Boost gets you outside of Drupal entirely
the status of Boost with regard to Drupal 5.x
a little more detail about how it works
some caveats that I've found

Arto's documentation for the setup of Boost is great, so I won't be rehashing that. Rather, I hope to provide a little more technical info about how the module works.

The Basics

Basically, Boost is two parts: first, a Drupal module (in the traditional sense) that manages the cache and provides for an administrative user interface, and second, some rule lines to add to your Drupal site's top-level .htaccess file which allow Apache to bypass Drupal entirely and serve pages from the cache.

The key to understanding how Boost works is its use of Apache's mod_rewrite via the .htaccess rule lines (by the way, .htaccess is just the default name for files in your site that Apache will read for configuration info). Many people may not realize that Drupal's clean URLs depend upon mod_rewrite. Every URL on a Drupal site is basically just a path argument to the top-level index.php, which dispatches calls to various points in the code to handle that argument. So, when you go to /about, the index.php file actually gets an argument of about and determines what content to serve. Apache's mod_rewrite is able to keep the browser pointed at /about while actually running index.php. Another popular open source CMS, WordPress, behaves similarly.

Once you understand this, it's easy to understand what Boost does and why it requires mod_rewrite. Boost by default stores the cached versions of pages under /cache on your website (this path is configurable, though). Then, when a request comes in, the .htaccess file is consulted (because that's what Apache does), which tells it to look for cache files first before sending anything to Drupal's index.php. Since the cache files are plain HTML, they go out much more quickly than Apache running PHP, firing up Drupal, querying MySQL, and then serving content. Arto provides some graphs in his original post showing just how dramatic this improvement can be.

Lastly, a word about the cache filename standard. If in fact the /about URL were cached, it would actually be in your site at /cache/about.html. If Apache finds this file, it assumes that the cache is still valid (the Drupal module side takes care of expiring and removing stale content) and serves it directly. For path aliases (such as "/about should serve the same content as /node/137"), Boost uses UNIX symbolic links in the cache filesystem, so /cache/about.html would be a link to /cache/node/137.html.
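The decision that the rewrite rules make for each request can be sketched in a few lines of Python. This is only a model of the logic described above, not Boost's .htaccess rules or its PHP code; the function names are made up, and because the text above does not say how the front page is named in the cache, the sketch simply sends it to Drupal.

```python
import os


def cache_file_for(url_path, cache_root):
    """Map a URL path such as /about to <cache_root>/about.html."""
    relative = url_path.strip("/")
    if not relative:
        # Front page naming is not covered above, so this sketch skips it.
        return None
    return os.path.join(cache_root, relative + ".html")


def route(url_path, cache_root):
    """Return ("static", file) on a cache hit, else hand off to Drupal."""
    cached = cache_file_for(url_path, cache_root)
    # os.path.isfile follows symbolic links, so an alias link such as
    # about.html -> node/137.html counts as a hit too.
    if cached is not None and os.path.isfile(cached):
        return ("static", cached)
    # Drupal's clean URLs pass the path to index.php as the q argument.
    return ("drupal", "index.php?q=" + url_path.strip("/"))
```

Note that the presence of the file is the only check made here, which mirrors the point above: whoever serves the file assumes it is still valid, and expiring stale files is the module's job.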

Boost and Drupal 5.x

I have been using Boost on a Drupal 5.1 site, thanks to this port of Boost to Drupal 5.x by the maintainer of drupal.ru. This seems to be the only source of Drupal 5.x-compatible Boost material currently. The only caveat to be aware of about this version is that by default, the front page is not cached -- more on this below. If your site is anything like the one I used Boost on, you will need to remedy this since your front page is likely your busiest as well as most complicated page and is in need of caching.

A Little More Detail

A couple other notes about Boost's operation:

Cache files are created on demand. For example, if your front page is not cached when someone requests it, Drupal will construct the page and cache the file, but serve the constructed page to the user. Every user thereafter, until the cache file becomes stale and is removed, will receive the cached version. If you have pages that are particularly demanding, think about running a cron to request them anonymously in order to get them cached for regular users.
Special paths like /user/login and /admin, as well as HTTP POST requests and any request for a logged-in user, are not cached. Arto has put a lot of thought into this area. Note that this means that sites with mostly logged-in users will not benefit from Boost very much -- anonymous users see the real benefit.
Boost takes over the configuration interface for Drupal's built-in caching mechanism. This just means that it avoids confusion between two types of caching and just "upgrades" your current setup to be Boost-ified.
Like the built-in cache, Boost has multiple cache lifetime intervals to choose from; anywhere from one minute up to one day.
Boost expires content in one of two ways. It implements hook_nodeapi to catch node updates, insertions, and deletions and responds to those, and it also implements hook_cron to expire content that has become stale but has not had any specific actions performed on it.
Technical note: Boost uses PHP's output control functions (i.e. ob_start et al.) and hook_init to intercept every Drupal page request, buffer the content, compare to and update the cache, and then send the content along through Drupal normally.
Nothing stops you from expiring content manually by deleting its file from the cache. However, note that for pages which have path aliases (and thus Boost symbolic links) to them, the links do not get removed automatically so you may cause some wonkiness by doing this.
Boost inserts a small HTML comment at the very bottom of cached pages with the start and end cache times so that you can tell if it's working and how long a given file will persist in the cache.
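Two of the rules in these notes, which requests are eligible for caching and when a cached file counts as stale, can be sketched in Python as follows. This is a simplified model under stated assumptions rather than Boost's PHP code: the function names are made up, and the prefix matching for special paths is a guess at reasonable behavior.

```python
import time

# Special paths named in the notes above.
UNCACHED_PREFIXES = ("/user/login", "/admin")


def is_cacheable(method, path, logged_in):
    """Only anonymous, non-POST requests to ordinary paths are cached."""
    if method.upper() == "POST" or logged_in:
        return False
    return not any(
        path == prefix or path.startswith(prefix + "/")
        for prefix in UNCACHED_PREFIXES
    )


def is_stale(cached_at, lifetime_seconds, now=None):
    """True once a cache file is older than the configured lifetime."""
    now = time.time() if now is None else now
    return now - cached_at > lifetime_seconds
```

With a lifetime anywhere from 60 seconds (one minute) to 86400 seconds (one day), a cron run could use a check like is_stale to decide which files to remove.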

Caveats

Like any somewhat intrusive technology (and by this I mean that it works with every page and changes the way your site operates as a whole), Boost should be used with caution. Arto states that the project is still in an alpha state.

The biggest issue that I've noticed is a strange bug which occasionally caches the front page as a Drupal "access denied" page. Others have seen this as well and I've never been able to nail it down completely. This is the main reason why drupal.ru's port of Boost to Drupal 5.x leaves out the front page from caching. I was able to work around this by hacking Boost's boost.api.inc file, in the boost_cache_set function, to not cache pages containing the words "access denied". I hope to report more once I figure this out.
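The idea behind that workaround looks roughly like this in Python. The real change was a small PHP edit inside boost_cache_set; the dictionary standing in for the cache and the case-insensitive match are assumptions made for the sketch.

```python
def cache_set(cache, path, html):
    """Store a page unless it looks like an "access denied" page.

    `cache` is a plain dict standing in for the cache directory.
    Returns True if the page was stored.
    """
    # Assumption: match case-insensitively so "Access denied" is caught too.
    if "access denied" in html.lower():
        return False
    cache[path] = html
    return True
```

It is a blunt check, since a legitimate page that happens to contain those words would never be cached either, but it keeps the bad front page out of the cache.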

The second issue is that currently, Boost will not work for sites that are not at the top-level. That is, if your site is domain.com/mysite, it will not work -- only domain.com would work. I believe this is on the .htaccess side, but it only really affected me in testing a development version of the site and since I was able to set up a top-level sandbox, I didn't investigate it any further. Once again, if I make any improvements in this area, I'll update this post.

Conclusion

This concludes my overview of Boost. As I mentioned above, I will update this post if I make any progress on the (very minor) issues that I've had with it. It's a great system and I highly recommend it!

You may also be interested in my Drupal page here at Code Sorcery Workshop for more info about my work with Drupal.

Thanks for reading!

13 comments

Drupal DC: Of Jabber and HTTP Redirects

June 6, 2007 by Justin

Posted in Drupal

Last night was the semi-regular monthly DC Drupal MeetUp and I got the chance to show what I've been working on lately. Since I take a keen interest in the systems side of web applications, two of the projects that I've been working on have been a perfect fit and made for great demos last night.

The first project is with MakaluMedia, whom you may have heard of from the winning Slashdot redesign. I have been involved in some research into the Drupal/XMPP space. XMPP is the protocol behind Jabber, the open source instant messaging protocol, and Drupal is a robust open source content management system that I've been involved with for a few years. Right now, things are only in the research stages, but eventually we will be developing a whole host of functionality in this area. I've put up some screenshots of what I showed last night on my Flickr account.

The other item that I showed was something that already existed in some form, but that I enhanced a bit. I've been working with DC's very own Environmental Working Group on their upcoming transition to a Drupal-based site and believe it or not, Drupal does not have any built-in functionality for doing external redirects (i.e. redirecting something like /blog to an external URL like http://blog.domain.com). While Jon over at professionalnerd.com had put together the http_redirect module, it was not compatible with Drupal 5.x and needed some slight UI adjustments to be useful for us. I recently made those changes and as per the GPL, am releasing them back out to the world. The new version is not yet available in Drupal CVS, but you can grab a tarball locally right here on my new Drupal contributions page.
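For anyone unfamiliar with what an external redirect amounts to, here is a tiny Python sketch of the lookup. It is not the http_redirect module's code; the table, the function name, and the default 301 status are assumptions, with the /blog entry taken from the example above.

```python
# Site path -> external URL, using the example from the text above.
REDIRECTS = {"/blog": "http://blog.domain.com"}


def resolve_redirect(path, redirects=REDIRECTS, status=301):
    """Return (status, headers) for a redirected path, or None otherwise."""
    target = redirects.get(path.rstrip("/") or "/")
    if target is None:
        return None
    return status, {"Location": target}
```

A request for /blog gets a Location header pointing at the external site, and anything not in the table falls through to normal page handling.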

In Cocoa news, I've got some things in the works that I'll hopefully post more about soon. Stay tuned!

2 comments

Drupal meets Cocoa at the corner of XML and RPC

April 5, 2007 by Justin

Posted in Drupal

I had the pleasure of giving a "lightning talk" at the Washington DC Drupal MeetUp last night. Given the chance to talk about something vaguely Drupal-related for no more than five minutes, I gave a brief overview of taking advantage of the built-in XML-RPC capability of Drupal by showing a quick Cocoa app that I put together for file uploading with the asset module. I originally wrote asset.module over a year ago while working at EchoDitto but I recently extended it by adding XML-RPC capability (specifically for this talk, as a matter of fact). You can check out the slides, module, and app (including source, released under the BSD License) over here.
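To give a feel for what travels over the wire in an upload like this, here is a short Python sketch that builds an XML-RPC request body with the standard library. The method name asset.upload and the (filename, data) parameter order are hypothetical; the real interface is whatever the module defines, and the demo app itself was written in Cocoa.

```python
import xmlrpc.client


def build_upload_call(filename, data):
    """Build the XML body of an XML-RPC call that carries a file."""
    return xmlrpc.client.dumps(
        (filename, xmlrpc.client.Binary(data)),  # Binary is sent as base64
        methodname="asset.upload",  # hypothetical method name
    )
```

Against a live site, the same call could be made through xmlrpc.client.ServerProxy pointed at the site's XML-RPC endpoint; the sketch stops at building the payload so that it runs without a server.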

Happy uploading!

About

My name is Justin Miller and I founded Code Sorcery Workshop in 2006 as a way to keep abreast of technologies that fascinate me and to provide my skills to other individuals and organizations to help them do what it is they do best.