This is the first part in a series of articles on RDFizing Drupal, showing how you can make use of the RDF module for Drupal 6.x to set your data free and connect your Drupal site to the emerging Linked Data web.
If you've been wanting to use Drupal 6.x to create RDF-enabled websites, you've probably been annoyed that Drupal outputs its feeds in RSS 2.0 format, which isn't based on RDF. This article will show you, step by step, how to upgrade[1] all of Drupal's RSS feeds to the clean, extensible and RDF-compatible RSS 1.0 format.
To get started, you must first, of course, install the RDF module. Any release from RDF 6.x-1.0-alpha6 onward should do fine. There are no dependencies to worry about[2] other than a sufficiently recent PHP version (PHP 5.2.0 or newer). Just follow the installation instructions in the accompanying INSTALL.txt file, and then enable the module at Administer » Site building » Modules:
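If you want to check which flavor a given feed is, the difference shows up at the document root: an RSS 2.0 feed is an `<rss version="2.0">` element, while an RSS 1.0 feed is an `rdf:RDF` element containing a `channel` in the `http://purl.org/rss/1.0/` namespace. Here is a minimal Python sketch using only the standard library. The `feed_flavor` name is made up for illustration, and since it only inspects the root element it is no substitute for proper validation:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS1_NS = "http://purl.org/rss/1.0/"


def feed_flavor(xml_text: str) -> str:
    """Return 'rss1', 'rss2' or 'unknown' by inspecting the root element only."""
    root = ET.fromstring(xml_text)
    # RSS 2.0: a plain, un-namespaced <rss version="2.0"> root.
    if root.tag == "rss" and root.get("version") == "2.0":
        return "rss2"
    # RSS 1.0: an rdf:RDF root holding a channel in the RSS 1.0 namespace.
    if root.tag == f"{{{RDF_NS}}}RDF" and root.find(f"{{{RSS1_NS}}}channel") is not None:
        return "rss1"
    return "unknown"
```

Run against the text of your rss.xml before and after the upgrade, it should flip from rss2 to rss1.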
After enabling the module, navigate to Administer » Site configuration » RDF settings » Feeds:
On this screen you'll see a listing of all the available RSS feeds output by Drupal's core modules. These are published, in Drupal 6.x, by the Node, Taxonomy, Blog and Aggregator modules. (If you've installed the Views module, this screen will also list any RDF feeds you've enabled for your views[3].)
To upgrade any of the core feeds to RDF format, simply use the enable action. This will present you with the following choice:
Note that you can downgrade back to the default RSS 2.0 feeds at any time, so don't be afraid to experiment. To RDFize your feed, simply select the RSS 1.0 option and save the configuration. You will be returned to the same screen with a status message indicating that the feed was upgraded:
Once you've upgraded a feed, some additional configuration options will be made available to you. (Note that you don't need to change any of these settings if you don't want to, and everything will work just as before using the defaults; feel free to skip ahead several paragraphs if you don't care to tinker with this at the moment.)
In the Channel settings section, you will find settings that implement the RSS 1.0 syndication hints specification (the RSS 1.0 Syndication module, mod_syndication). This standard defines advisory metadata that you can include in your feed to tell feed readers how often it is updated. This allows aggregators to optimize how often they re-fetch your feed, and hence also gives you some control over your bandwidth usage:
Here you can also change the RSS feed's serialization format. RDF can be represented in a wide variety of serialization formats, and the RDF module provides support for some of the most popular ones (if you install the optional ARC RDF library, you will get support for yet more formats). However, only explicitly RDF-aware feed aggregators can handle anything other than the default RDF/XML serialization, so changing this setting is probably a bad idea for the time being.
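To make these hints concrete: the Syndication module defines `sy:updatePeriod` (one of hourly, daily, weekly, monthly or yearly, defaulting to daily) and `sy:updateFrequency` (how many updates per period, defaulting to 1). The Python sketch below shows how an aggregator might turn them into a polling interval. Treating a month as 30 days and a year as 365 days is a simplifying assumption of this sketch, not part of the specification, and `poll_interval` is a hypothetical name:

```python
from datetime import timedelta
import xml.etree.ElementTree as ET

SY_NS = "http://purl.org/rss/1.0/modules/syndication/"

# Approximate period lengths (assumption: 30-day months, 365-day years).
PERIODS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),
    "yearly": timedelta(days=365),
}


def poll_interval(channel: ET.Element) -> timedelta:
    """Derive a re-fetch interval from a channel's sy:* hints."""
    # Missing or empty elements fall back to the spec defaults (daily, 1).
    period = (channel.findtext(f"{{{SY_NS}}}updatePeriod") or "daily").strip()
    frequency = int((channel.findtext(f"{{{SY_NS}}}updateFrequency") or "1").strip())
    # The feed is updated `frequency` times per `period`.
    return PERIODS.get(period, PERIODS["daily"]) / max(frequency, 1)
```

For instance, a channel advertising hourly updates with a frequency of 2 yields an interval of 30 minutes.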
Below the channel settings you will find a section for configuring how feed items (that is, Drupal nodes, taxonomy terms, and such) are output in the RSS feed. At the moment, there are two additional settings: you can configure how body fields get output (using the teaser only, or including the full text), and you can configure whether date/time information in the feed includes the time zone component (if applicable, such as for Date module fields) or whether all times will be output in UTC:
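The difference between the two date/time options is easiest to see side by side. The Python sketch below formats one made-up instant both ways in the W3C-DTF (ISO 8601) style commonly used for `dc:date` values in RSS 1.0 feeds; the exact markup the RDF module emits may differ, and `w3cdtf` is a hypothetical name:

```python
from datetime import datetime, timedelta, timezone


def w3cdtf(moment: datetime, keep_offset: bool) -> str:
    """Format an aware datetime either with its own offset or normalized to UTC."""
    if not keep_offset:
        moment = moment.astimezone(timezone.utc)
    text = moment.isoformat(timespec="seconds")
    # W3C-DTF allows 'Z' as shorthand for a zero UTC offset.
    return text.replace("+00:00", "Z")


local = datetime(2008, 6, 1, 14, 30, tzinfo=timezone(timedelta(hours=2)))
print(w3cdtf(local, keep_offset=True))   # 2008-06-01T14:30:00+02:00
print(w3cdtf(local, keep_offset=False))  # 2008-06-01T12:30:00Z
```

Both strings denote the same instant; the UTC form simply discards the information about the author's local time zone.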
Once you're done with the feed settings, save the configuration and you'll be returned to the RDF feed management screen. Notice that the Operations column indicates which feeds have been upgraded to RDF, with the enable action changing to configure where applicable:
A special note on Drupal's front page feed, rss.xml: once you've RDFized this feed, it isn't ideal that it still has the all-too-generic URL extension .xml. You can certainly keep it that way if you wish (feed aggregators parse feeds based on the MIME content type, not the file extension), but Drupal makes it so trivially easy to rename the feed's URL that I'd recommend doing so. A more appropriate extension for RSS 1.0 feeds would be .rss or .rdf. You can rename the feed URL by navigating to Administer » Site building » URL aliases » Add alias and entering something like the following:
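The point about MIME types can be illustrated with a small Python sketch: the decision is made from the HTTP Content-Type header alone, so whether the URL ends in .xml, .rss or .rdf makes no difference to it. The mapping below is an assumption for illustration (real aggregators differ in which types they accept), and `classify` is a hypothetical name:

```python
# Hypothetical mapping of media types to what a feed reader might infer.
FEED_TYPES = {
    "application/rdf+xml": "RDF/XML (e.g. RSS 1.0)",
    "application/rss+xml": "RSS",
    "application/xml": "generic XML",
    "text/xml": "generic XML",
}


def classify(content_type: str) -> str:
    """Classify a response by its Content-Type header; the URL plays no part."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return FEED_TYPES.get(media_type, "unknown")
```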
Based on the feeds listed at Planet RDF, index.rdf would seem to be the most popular URL for a front page feed, so that's a data point to take into consideration. (I've been contrarian on this, myself, and named my blog's feed simply blog.rss, intending it to only include blog posts. I'm using the .rss extension to differentiate my RSS feeds from other RDF data that I will publish here later using the usual .rdf extension.)
In the same way that you renamed Drupal's rss.xml, you can also define URL aliases for any of the other non-wildcard feeds listed on the RDF feeds management screen:
And if you'd like to rename any of the displayed wildcard feeds, such as the taxonomy feeds at paths of the form taxonomy/term/%/0/feed, that's easy enough: install the excellent Pathauto module, which will automatically create such URL aliases where needed. Here on my blog, for instance, all my tags have RDFized feeds with URLs of the form http://ar.to/tags/drupal.rss.
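For the curious, the kind of transformation involved can be approximated as follows. This Python sketch is not Pathauto's actual algorithm (which is configurable and handles transliteration, punctuation and stop words in its own way); the `tags/<term>.rss` pattern simply mirrors the example URL above, and `term_feed_alias` is a hypothetical name:

```python
import re


def term_feed_alias(term_name: str, prefix: str = "tags", ext: str = "rss") -> str:
    """Build an alias such as tags/drupal.rss from a taxonomy term name."""
    # Lowercase, collapse runs of non-alphanumerics into single hyphens,
    # and trim hyphens from both ends.
    slug = re.sub(r"[^a-z0-9]+", "-", term_name.lower()).strip("-")
    return f"{prefix}/{slug}.{ext}"
```

Under these assumptions, a term named "Linked Data" would get its feed at tags/linked-data.rss.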
If you're a perfectionist, consider also installing the Global Redirect module to ensure that attempts to access a non-aliased feed URL will result in an HTTP redirect to the canonical aliased URL. For example, should you try to load up http://ar.to/rss.xml, you will be redirected to http://ar.to/blog.rss which is the alias I've defined for my front page feed. Among other benefits, this makes sure that search engines won't index both URLs.
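Conceptually, what this achieves for your feeds is a lookup from system path to alias followed by a permanent redirect. The following Python sketch illustrates only that idea with a hard-coded alias table; it is not how Global Redirect is implemented, the `ALIASES` table and `resolve` function are hypothetical, and the taxonomy term ID is invented (only the rss.xml to blog.rss alias comes from the example above):

```python
# Hypothetical alias table: system path -> canonical alias.
ALIASES = {
    "rss.xml": "blog.rss",
    "taxonomy/term/1/0/feed": "tags/drupal.rss",  # invented term ID
}


def resolve(path: str) -> tuple[int, str]:
    """Return (HTTP status, path), redirecting system paths that have an alias."""
    path = path.lstrip("/")
    if path in ALIASES:
        # 301 Moved Permanently tells clients and crawlers to use the alias.
        return 301, "/" + ALIASES[path]
    return 200, "/" + path
```

Because the redirect is permanent, a search engine that encounters /rss.xml should end up indexing only /blog.rss.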
Once you've RDFized your feeds, you may want to use W3C's RDF Validation Service to double-check that everything turned out a-okay and that your feeds are indeed valid RDF. My blog feed is clearly bursting at the seams with RDFness, as validating it yields the following reassuring message:
In case you are still learning RDF, the validation service is also a great way to view the underlying triples (RDF statements) that constitute RDF documents such as your RSS feed. You can get the triples listed both in table format and rendered as a graph in a variety of graphics formats; this can really be helpful in grokking how simple RDF actually is beneath all that XML verbiage.
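If you'd like to peek at the triples locally as well, the Python sketch below handles only the simple, regular RDF/XML that RSS 1.0 feeds typically use: typed nodes with rdf:about, literal property elements and rdf:resource references. It is not a general RDF/XML parser. It skips rdf:Seq containers, nested nodes, rdf:parseType, xml:lang and datatypes, so use the validator or a real RDF library for anything serious. The `simple_triples` and `to_uri` names are made up for illustration:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = f"{{{RDF_NS}}}about"
RESOURCE = f"{{{RDF_NS}}}resource"


def to_uri(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' notation into a full URI."""
    if not tag.startswith("{"):
        return tag  # un-namespaced element; valid RSS 1.0 shouldn't contain these
    namespace, _, local = tag[1:].partition("}")
    return namespace + local


def simple_triples(xml_text: str):
    """Yield (subject, predicate, object) string triples from simple RSS 1.0 RDF/XML."""
    root = ET.fromstring(xml_text)
    for node in root:  # top-level nodes such as channel, item, image
        subject = node.get(ABOUT)
        if subject is None:
            continue  # blank nodes are not handled in this sketch
        # A typed node element implies an rdf:type statement.
        yield subject, RDF_NS + "type", to_uri(node.tag)
        for prop in node:
            resource = prop.get(RESOURCE)
            if resource is not None:
                yield subject, to_uri(prop.tag), resource
            elif len(prop) == 0:
                # Literal objects are quoted, N-Triples style (but not escaped).
                yield subject, to_uri(prop.tag), '"' + (prop.text or "").strip() + '"'
            # Properties with nested content (e.g. rdf:Seq in <items>) are skipped.
```

Feeding it the text of a downloaded feed and printing each tuple gives roughly one line per statement, much like the validator's table of triples.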
Well, that's all for now. Go forth and RDFize all your feeds; you know you want to. I will add a link here to the first several people who upgrade their Drupal feeds per these instructions (just leave me a note with a link to your site). And should you run into any trouble with these instructions, please post an issue at drupal.org and we'll see if we can sort it out.
In the next couple of parts of this article series, I'll be talking about how you can include additional CCK fields in your RSS feeds, and how to enable RDFa (affectionately known as "microformats on steroids") on your Drupal site. Be sure to subscribe to the aforementioned feed to get these upcoming articles!
Update: Julia Kulla-Mader (RSS) and Kaido Toomingas (RSS) are the first pioneers to brave these waters and RDFize their feeds. Anyone else?
[1]
I won't delve here into the history of the RSS 2.0 controversy, but suffice it to say that RSS 2.0 ("Really Simple Syndication") represents a downgrade from RSS 1.0 ("RDF Site Summary") in terms of capabilities and potential. You've heard of "embrace and extend", right? Well, try "co-opt and cripple" on for size. (Update: I posted some more on this at the Reddit thread and at groups.drupal.org.)
[2]
Note that for the purposes described in this article, you don't have to install the optional ARC RDF library; the RDF module includes native support for RDF/XML output using PHP's XMLWriter extension. This extension has been enabled by default since PHP 5.1.2, though FreeBSD users may need to explicitly install the php5-xmlwriter package.
[3]
Developers: see hook_rdf_feeds() in rdf.module for an example of how you can declare RDF-compatible feeds that will be listed on this screen.