The use of RDF in RSS has produced reactions ranging from support so strong that its proponents can't envision a sane argument against it, to a dislike verging on phobia (often related to onomacoelphobia - the irrational fear of namespaces).
A very common reaction has been confusion or apathy (the latter in particular once one realises that RSS can be used in complete ignorance of RDF).
This document aims to explain the RDF nature of RSS to those who have at least passing familiarity with RSS, but who have little or no knowledge of RDF (those of you who do know about RDF but still don't like it will have to wait for another document for me to argue against you☺, those of you who grok RDF and like it are invited to comment on my, no doubt flawed, explanation). It will begin with a very simple (over-simple?) explanation of RDF and then construct an RSS document in a "cookbook" manner, examining its RDF at each step.
RDF says stuff about things. Whenever RDF begins to seem complicated it can be helpful to remember that simple definition: all RDF ever does is say stuff about things.
What are these "things" RDF says stuff about? In RDF terminology they are resources. On the World Wide Web resources are identified by URIs; by extension a resource is anything that can be identified by a URI. Sometimes these URIs can be used to download or access the resource, and sometimes they can't. The important thing is that anyone who uses that URI uses it to identify the same thing.
What "stuff" does RDF say about these resources? RDF makes statements called triples, which are built from URIs and strings. As the name implies, each triple has three parts. The first is the URI of the resource we are talking about. This is called the subject.
The second is called the predicate, and the third is called the object. It can sometimes be useful to think of the predicate as a property and the object as the value of the property, or the predicate as a verb, with subject and object being equivalent to subject and object in grammar. The predicate is also a resource (we'll explain how later), and the object is either a resource or a literal string.
The canonical beginner's RDF triple (the "Hello World" of RDF) is an attempt to use RDF to express the sentence:
Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila.
(Ora Lassila was co-author of the first W3C Recommendation about RDF).
The subject here is the resource at http://www.w3.org/Home/Lassila, the predicate is "Creator", and the object is the string "Ora Lassila".
This can be represented by the following diagram:
Now as we said earlier predicates are also resources. Since they are resources they have URIs. How can "creator" have a URI? It obviously can't have a URL like that of a web page from which we can download "creator", but we can have a URI that identifies the concept of "creator".
Who assigns these URIs to predicates? Anyone. RDF defines a few that are particularly useful, and RDF Schema defines a few more for describing resources and predicates. Using these predicates other people can create their own predicates and describe them in RDF itself.
The Dublin Core Metadata Initiative is an organisation that defines properties that it is useful to record about many different types of document. One possible use for such properties is as RDF predicates, and there is a Dublin Core property of "Creator". Dublin Core have assigned the URI http://purl.org/dc/elements/1.1/creator to mean "Creator", so we can now use that URI in the above graph, changing it to:
Another way of expressing this triple is with the following "N-Triples" syntax:
<http://www.w3.org/Home/Lassila> <http://purl.org/dc/elements/1.1/creator> "Ora Lassila" .
Where resources are represented by their URI between less-than and greater-than symbols, and strings are enclosed in quotes.
From now on we will express all RDF as a diagram like the one above, using the N-Triples syntax as a textual alternative.
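As a rough illustration of the triple model, the following Python sketch holds the Lassila statement as a (subject, predicate, object) tuple and writes it out in N-Triples form. The names `URI`, `Literal`, `Triple` and `to_ntriples` are invented for this example and do not come from any RDF library. Note that N-Triples ends each statement with a full stop.

```python
from typing import NamedTuple, Union


class URI(str):
    """A resource, identified by its URI (hypothetical helper type)."""


class Literal(str):
    """A literal string value (hypothetical helper type)."""


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]


def to_ntriples(triple: Triple) -> str:
    """Serialise one triple: URIs in angle brackets, strings in quotes."""
    def term(value: str) -> str:
        if isinstance(value, URI):
            return f"<{value}>"
        # Escape backslashes and double quotes inside the literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    return f"{term(triple.subject)} {term(triple.predicate)} {term(triple.obj)} ."


lassila = Triple(
    URI("http://www.w3.org/Home/Lassila"),
    URI("http://purl.org/dc/elements/1.1/creator"),
    Literal("Ora Lassila"),
)
print(to_ntriples(lassila))
```

A whole graph is then nothing more than a set of such tuples, which is why the same graph can be drawn as a diagram or listed as text.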
The RSS hacker might be wondering "where's the XML?" RDF is not in itself an XML application. It does however use XML for serialisation (converting to bytes for storage or transmission), which is called RDF/XML. One way of encoding the graph above in XML would be:
Another possible encoding would be:
Both of these encode exactly the same graph, and from an RDF perspective are exactly the same. RSS does not have the same freedom of expression; there is generally only one correct way of expressing the RDF behind RSS in the RSS document.
One of the clever things about RSS is that it can be used either as RDF, or as a more traditional XML application. Restricting the number of ways RSS can be written makes it easier for those parsers that don't care about RDF to process the RSS. At the same time the fact that it is still one of the allowed encodings of RDF means that an RDF processor can read RSS as well.
The way RDF/XML makes use of namespaces is interesting. But first a brief note on namespaces in general. Skip this if you are already familiar with namespaces.
One of the strengths of XML is that anyone can define his or her own document type for his or her own uses. In doing so one decides on the names of elements and attributes, what they mean and how they are used.
This is great if you are using the XML internally and in isolation.
When you use XML for a public document type the first problem with this comes up. While your XML may only ever be seen by a tool that is designed to handle it (for example, XHTML is primarily read by web browsers, many of which don't even know they are dealing with valid XML), it may also be processed by a general purpose XML tool which has no way of knowing if <li> is an HTML list item, an RDF list item, or something completely different.
The second problem is that if you try to mix different types of XML together then there may be an element name used by both types of XML and it could be difficult or impossible to tell which is which. This means you have to either disallow such mixing, or else have strict rules on how they should be mixed which requires one person or organisation to set these rules.
One particularly nasty side-effect of this inability to mix freely is that either one person has the sole rights to decide what is and isn't allowed in the next version of the document type, or else lots of competing, and possibly conflicting, versions can be produced resulting in the kind of hassles that the "browser wars" between Microsoft and Netscape produced (and that was relatively harmless given that there were only two powerful combatants, if there'd been just one more browser producer with comparable weight it could have become simply too confusing to keep track of).
Namespaces solve this problem by enabling document authors to associate an element or attribute with a URI. There is no conflict between the <li> in HTML and the <li> in RDF because the first has a URI of http://www.w3.org/1999/xhtml and the second has a URI of http://www.w3.org/1999/02/22-rdf-syntax-ns#.
These URIs (called "Namespace Names") are associated with elements and attributes through a mixture of prefixes and xmlns attributes.
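Given a default namespace declaration such as xmlns="http://www.w3.org/1999/xhtml" on <html>, <html> and every child element of <html> that doesn't have a prefix has a namespace name of http://www.w3.org/1999/xhtml (unless such a child overrides this by providing its own xmlns attribute).
Given a prefixed declaration such as xmlns:ht="http://www.w3.org/1999/xhtml" on <ht:html>, <ht:html> and every child element of <ht:html> that has a ht: prefix has a namespace name of http://www.w3.org/1999/xhtml.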
To a modern XML parser this is exactly the same element as the previous example. However it wouldn't be backwards compatible with old browsers that don't pay attention to namespaces.
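The equivalence of the default and prefixed forms can be checked with any namespace-aware parser. The sketch below uses Python's standard `xml.etree.ElementTree`, which reports each element name as `{namespace name}local name`. The two small documents are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# The same element, once with a default namespace and once with a prefix.
default_form = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>'
)
prefixed_form = ET.fromstring(
    '<ht:html xmlns:ht="http://www.w3.org/1999/xhtml"><ht:body/></ht:html>'
)

# ElementTree drops the prefix and keeps only namespace name + local name.
print(default_form.tag)
print(prefixed_form.tag)

same_root = default_form.tag == prefixed_form.tag
same_children = [c.tag for c in default_form] == [c.tag for c in prefixed_form]
```

Both roots come back with the same tag, so code written against the parsed tree never needs to know which prefix, if any, the author chose.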
This comes into its own when we mix different types of document. XSLT always uses this; a typical example would be:
That is the first element of a stylesheet designed to convert an RSS document into an HTML document. Within that document an element name beginning with xsl: is interpreted as being part of the stylesheet itself, an element name beginning with rdf: is interpreted as being from the RDF namespace, and an element name beginning with rss: is interpreted as being from the RSS namespace. Elements with no prefix are interpreted as being from the XHTML namespace.
Without such a mechanism it would be impossible for the XSLT to tell the difference between an RDF <li> and an HTML <li>, or even an XSL <param> and an HTML <param>.
This doesn't mean we can necessarily mix any XML in an ad hoc manner; but it does mean that when creating a type of XML document you can state places where it is acceptable to have elements from either particular foreign namespaces, or from any namespace.
Because of this I, you, or anyone else can extend such a document type by introducing elements from our own namespace and we will do no damage to each other's work. Parsers are written to expect namespaces they don't know to be used and can handle such elements in an appropriate manner (in the case of RSS it's generally best to ignore them entirely).
Most attributes in most namespace-using documents are defined in the context
of their containing element. To find out what type means in
However attributes can also be defined as belonging to a namespace. The xml:lang attribute is an example of this; when it is used in <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-IE"> its meaning isn't defined by the element it is used on, or in the context of HTML at all. It is defined by the namespace http://www.w3.org/XML/1998/namespace (this is a special namespace that is always considered to have been declared, as if every element had the attribute xmlns:xml="http://www.w3.org/XML/1998/namespace" on it).
It is important to note that there is nothing significant about the prefixes used. As such <html xmlns="http://www.w3.org/1999/xhtml"> and <ht:html xmlns:ht="http://www.w3.org/1999/xhtml"> should always be considered as being exactly the same element. You can use any prefix you want as long as it doesn't begin with the letters "xml" in any case (names beginning with those letters are reserved for elements and attributes core to the behaviour of XML).
Namespaces play a special rôle in RDF/XML. Firstly it's important for the growth of RDF that anyone can define types of resources and predicates, and the use of namespaces decentralises such development and allows us all to define our own RDF resources.
Secondly RDF/XML uses namespaces in tandem with the way that RDF uses URIs to identify resources. Consider the earlier example again:
The <rdf:RDF> element is just a wrapper around the RDF. The <rdf:Description> element and rdf:about attribute identify the subject.
The <dc:creator> element is interesting. When an RDF/XML parser encounters any element or attribute, apart from a few (like <rdf:RDF>) that are used by the syntax itself, it constructs a URI by adding the local name of the element or attribute (in this case "creator") to the namespace name (in this case http://purl.org/dc/elements/1.1/) to create a new URI (http://purl.org/dc/elements/1.1/creator) that it uses in the RDF. Because of this concatenation, namespace names that are primarily intended for use with RDF often end with #, so that the constructed URI is a fragment in the original URI, or with /, so the constructed URI is a step further along the path hierarchy than the original URI.
So it can take the subject from rdf:about (http://www.w3.org/Home/Lassila), the URI it has constructed in this manner (http://purl.org/dc/elements/1.1/creator) and the value "Ora Lassila" and construct the RDF:
Which round-trips us back to the RDF we began with.
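The concatenation step can be sketched in a few lines of Python. This is a toy reader that assumes the simplest RDF/XML shape, an `rdf:Description` with `rdf:about` and string-valued child elements; it is not a conforming RDF/XML parser. The function names `expand` and `triples` are invented for the example, and the sample document is written to match the description of the Lassila example above.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.w3.org/Home/Lassila">
    <dc:creator>Ora Lassila</dc:creator>
  </rdf:Description>
</rdf:RDF>
"""


def expand(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' name into namespace + local."""
    namespace, local = tag[1:].split("}", 1)
    return namespace + local


def triples(rdf_xml: str):
    """Yield (subject, predicate, object) for each child of rdf:Description."""
    root = ET.fromstring(rdf_xml)
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            yield (subject, expand(prop.tag), prop.text)


result = list(triples(RDF_XML))
print(result)
```

Because the namespace name ends in `/`, plain string concatenation is all that is needed to arrive at the Dublin Core predicate URI.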
There is a predicate that is very commonly used, and of great importance to RSS which is http://www.w3.org/1999/02/22-rdf-syntax-ns#type (abbreviated to rdf:type in most RDF/XML documents). This predicate identifies the resource as being of a particular "Class". Classes are RDF categorisations of resources which can be further used in statements which may, for example, state that a predicate only ever applies to resources of that Class, and so on. Because rdf:type is so commonly used there is a special RDF/XML abbreviation which is heavily used in RSS. Beginning with the following RDF/XML:
We can see that that equates to a diagram like:
Here for the resource http://www.example.com/feed.rss we have two statements using it as a subject; one with a predicate of http://purl.org/rss/1.0/title and an object of the string "Example", and the other with a predicate of http://www.w3.org/1999/02/22-rdf-syntax-ns#type and an object which is another resource, identified by http://purl.org/rss/1.0/channel.
The abbreviated syntax for this construction performs a similar namespace-based abbreviation on the URI of the object of the statement containing the predicate rdf:type. In the above example this would give us an element with no namespace prefix (hence binding it to http://purl.org/rss/1.0/) and a local name "channel".
This element name is then used in place of the rdf:Description element to give us:
Already we have the beginnings of an RSS document. When using this way of encoding RDF into XML we end up with elements encoding subjects, enclosing elements encoding predicates, enclosing either strings or elements which encode objects.
In the case where the object is a resource, and hence encoded as an element, it can contain further elements to define predicates of statements where it is the subject. Because this style of encoding results in an alternation between resources and predicates it is called "Striped RDF syntax". (see Dan Brickley's RDF: Understanding the Striped RDF/XML Syntax).
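To make the rdf:type abbreviation concrete, here is a hedged Python sketch that reads both the long form (an `rdf:Description` with an explicit `rdf:type`) and the abbreviated form (a `channel` element) and checks that they give the same set of statements. It only handles this one shape of document, it does not distinguish resource objects from literal ones, and the names `expand` and `triples` are invented for the example. The two sample documents are reconstructions based on the statements described above.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_TYPE = RDF_NS + "type"

LONG_FORM = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rss="http://purl.org/rss/1.0/">
  <rdf:Description rdf:about="http://www.example.com/feed.rss">
    <rss:title>Example</rss:title>
    <rdf:type rdf:resource="http://purl.org/rss/1.0/channel"/>
  </rdf:Description>
</rdf:RDF>
"""

ABBREVIATED = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://www.example.com/feed.rss">
    <title>Example</title>
  </channel>
</rdf:RDF>
"""


def expand(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' name into namespace + local."""
    namespace, local = tag[1:].split("}", 1)
    return namespace + local


def triples(rdf_xml: str) -> set:
    """Collect statements, expanding a typed node into an rdf:type triple."""
    found = set()
    for node in ET.fromstring(rdf_xml):
        subject = node.get(f"{{{RDF_NS}}}about")
        node_uri = expand(node.tag)
        if node_uri != RDF_NS + "Description":
            # The element name itself names the Class of the subject.
            found.add((subject, RDF_TYPE, node_uri))
        for prop in node:
            # Object is the rdf:resource URI if present, else the text.
            obj = prop.get(f"{{{RDF_NS}}}resource") or prop.text
            found.add((subject, expand(prop.tag), obj))
    return found


same_graph = triples(LONG_FORM) == triples(ABBREVIATED)
print(same_graph)
```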
We're going to take the beginning RSS document we arrived at earlier and add to it. First we'll add the simpler elements which take PCDATA values:
Example.com - Home of thousands of hypothetical sites that never got built.
This gives us the graph:
We can see how the three strings are associated with the channel http://www.example.com/feed.rss. At this point it may be worth tackling a question that frequently comes up: what should be the rdf:about URI of the channel element? The question arises because there are two different practices in common use: one is to use the URI of the feed itself, and the other is to use the same URI as the channel's <link> element.
It's clear from the graph that RDF considers that URI to be an identifier for the channel. So the question really is, what is a channel?
In some cases the channel can be considered to be another view of the same information that is in the page <link>ed to. An example is the W3C's own feed, which contains much the same information as the central "news" section of the W3C home page (and in fact the RSS is produced from an XSLT transformation of the page's HTML). In such a case using that URI as the rdf:about URI makes a lot of sense.
Another view though is that an RSS channel doesn't really have any existence outside of an RSS document. This would seem to make the URI of the document itself, or a URI of "" (which would after all be a relative URI that resolves to the URI of the document), the obvious choice.
Neither choice is terribly wrong. However a better understanding of the RDF nature of RSS may help you decide which one seems most correct for a given RSS document.
The association of the channel with its string children is pretty straightforward, so let's press ahead and add an image:
Example.com - Home of thousands of hypothetical sites that never got built.
Which produces the graph:
Here we can see that the object of one statement is the subject of more statements. RDF graphs can be extremely rich sources of information in this manner as more and more resources are linked by being the subjects and objects of statements.
The linkage between the two is provided by the rdf:resource attribute on the image element. This gives the URI of a resource which is an object of the statement. Because this URI matches that in the rdf:about attribute of the latter image element we know that the same resource is being referred to.
It's worth noting that from an RDF perspective the following document would be identical:
Example.com - Home of thousands of hypothetical sites that never got built.
However RSS doesn't allow this syntax. Why not? Again, it's easier for parsers that aren't based on RDF to process RSS documents if we limit the ways that RSS can be written. The separation also helps avoid confusion about the fact that http://purl.org/rss/1.0/image is being used both as a predicate and as a Class type.
<textinput> works in a similar manner to
<image> and the reader should understand how it works in
RDF from the above, so the only part of the core RSS spec that we haven't
used yet is probably the most important; the RSS items.
We'll only add two to keep our example relatively simple:
Example.com - Home of thousands of hypothetical sites that never got built.
<title>Our First Example</title>
<description>Our first example RSS item.</description>
<title>Our Second Example</title>
<description>Our second example RSS item.</description>
<rdf:li> could also have an "unqualified" resource attribute, that is, one without a prefix. This is commonly used in RSS, but has been deprecated in RDF. This document assumes a similar deprecation will occur in RSS and uses the qualified form.
This produces the graph:
There are a few interesting things here.
The first is the node which has no URI in the graph and is denoted as _:gen1 in the N-triples in the text version. This resource doesn't have a URI because it doesn't need one. As far as RDF is concerned it could have a URI that the RDF parser doesn't know about (but may learn about later), but it is enough here that it is correctly placed in the graph. The _:gen1 in the N-triples is used to correctly match up various statements for which this resource is either the subject or object, but it doesn't mean anything outside of that context (it could be a resource that another RDF document assigns a URI to, or it could be referred to with a different code in another N-triples description).
The second interesting thing is that there are no predicates with the URI http://www.w3.org/1999/02/22-rdf-syntax-ns#li; instead these have been replaced with http://www.w3.org/1999/02/22-rdf-syntax-ns#_1 and http://www.w3.org/1999/02/22-rdf-syntax-ns#_2. <rdf:li> is just a convenient way of encoding a list that begins with <rdf:_1> and potentially goes on forever.
The anonymous resource is of type http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq. RDF defines 3 types of collection: rdf:Seq (which we are using here), which contains a sequence of items for which the order is significant; rdf:Bag, which contains items for which the order is not significant; and rdf:Alt, which contains items which are equivalent to each other and of which one may serve the purposes of the others (the first item being the preferred alternative).
Because rdf:Seq is used RDF parsers know that the order of the items is important. In other cases RDF ignores document order.
So rather than saying "this channel has two items; doc1.html and doc2.html" we say "this channel has items; an ordered sequence. This ordered sequence's first member is doc1.html and its second member is doc2.html".
By the way, to RDF there is nothing to say that there isn't a third member of the sequence; there may be a third member that we aren't telling the RDF processor about, and there may not. In the case of RSS this is pretty much an academic point.
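The way rdf:li turns into numbered membership predicates can be sketched as follows. The sketch assumes the sequence members are given with rdf:resource attributes, uses `_:gen1` as a stand-in label for the anonymous node, and uses full item URLs under www.example.com that are hypothetical (the text only names doc1.html and doc2.html). The function name `seq_triples` is invented for the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical item URLs; only the file names come from the text.
ITEMS = """\
<rdf:Seq xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:li rdf:resource="http://www.example.com/doc1.html"/>
  <rdf:li rdf:resource="http://www.example.com/doc2.html"/>
</rdf:Seq>
"""


def seq_triples(seq_xml: str, node_id: str = "_:gen1") -> list:
    """Expand each rdf:li into rdf:_1, rdf:_2, ... in document order."""
    seq = ET.fromstring(seq_xml)
    out = [(node_id, RDF_NS + "type", RDF_NS + "Seq")]
    counter = 0
    for child in seq:
        if child.tag == f"{{{RDF_NS}}}li":
            counter += 1
            member = child.get(f"{{{RDF_NS}}}resource")
            out.append((node_id, f"{RDF_NS}_{counter}", member))
    return out


for statement in seq_triples(ITEMS):
    print(statement)
```

The ordering information survives in the predicate URIs themselves, so the resulting set of statements can be stored in any order without losing the sequence.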
Okay we now have a bit of an understanding of how an RDF processor interprets an RSS document. The one remaining question is "who cares?". After all we can write RSS parsers which don't use RDF, so the only possible value to all this is if an RDF processor written by someone with no knowledge of or interest in RSS can do something useful with all this.
Well the first thing an RDF processor can do is obtain the RDF Schema document for RSS.
It is in itself an RDF document, and is actually about the very class types and predicates defined by RSS. We'll take just a tiny sub-graph from that document and examine it closely:
The schema has more stuff to say about the rss:description predicate, but we'll just consider the consequences of the statement using rdfs:subPropertyOf, which produces the graph:
Now http://purl.org/dc/elements/1.1/description is a URI defined by Dublin Core, who we mentioned before. It refers to a predicate that is used to indicate a description. This is comparable to rss:description, but because of Dublin Core's almost universal applicability dc:description is much more likely to be understood by a general purpose RDF application. It is also likely that someone will be seeking to find a particular string somewhere inside the dc:description of a resource as part of an Internet search.
Now if we add this graph to the graph of our RSS we can then apply some rules to produce the "RDFS closure" of the graph.
One of these rules states that if we have a sub-graph of the form:
And we have another sub-graph of the form:
Then we can add the sub-graph:
Therefore any RDF processor can create a new statement which is the same as our description, but using the Dublin Core description instead. Hence without any knowledge of RSS this RDF processor now has a piece of text it knows is a description of the resource at the URI in the rdf:about.
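The subPropertyOf rule can be written down directly: whenever the graph says that P is a sub-property of Q, and also contains a statement (S, P, O), the statement (S, Q, O) may be added. The Python sketch below applies only this one rule, not the full RDFS closure, to a tiny graph. The item URL is hypothetical and the function name `apply_subproperty_rule` is invented for the example.

```python
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
RSS_DESCRIPTION = "http://purl.org/rss/1.0/description"
DC_DESCRIPTION = "http://purl.org/dc/elements/1.1/description"
ITEM = "http://www.example.com/doc1.html"  # hypothetical item URL


def apply_subproperty_rule(graph: set) -> set:
    """Repeatedly add (s, super, o) for every (s, sub, o) until nothing changes."""
    closed = set(graph)
    while True:
        sub_props = {(s, o) for s, p, o in closed if p == RDFS_SUBPROPERTY_OF}
        inferred = {
            (s, sup, o)
            for s, p, o in closed
            for sub, sup in sub_props
            if p == sub
        }
        if inferred <= closed:
            return closed
        closed |= inferred


graph = {
    # From the RSS schema, as described above.
    (RSS_DESCRIPTION, RDFS_SUBPROPERTY_OF, DC_DESCRIPTION),
    # From the RSS document itself.
    (ITEM, RSS_DESCRIPTION, "Our first example RSS item."),
}

closure = apply_subproperty_rule(graph)
print((ITEM, DC_DESCRIPTION, "Our first example RSS item.") in closure)
```

Nothing in the function mentions RSS; the RSS-specific knowledge arrives entirely as data, which is the point being made about general purpose RDF processors.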
Similar graphs can be added for RSS's title, link and URL predicates.
When considering RSS Modules and RDF we can group the modules into 2 main groups; those for which RDF is of great importance, and those for which it is secondary.
An example of a module for which RDF is of great importance is the Dublin Core module. The Dublin Core module allows one to add more statements to an RSS document which use the Dublin Core predicates. The main advantage in doing so is that the RSS document now serves as an RDF document about the items and image it uses for little additional overhead over that required to produce the RSS. This is a great way to begin providing Dublin Core metadata about your site in RDF form "cheaply", especially at this stage in the development of RDF when a more complete solution may be hard to justify.
Another example is the Creative Commons module proposed recently. This allows metadata records to contain information about the copyright license of the feed itself, and of the objects (items, images, etc.) pointed to by it. The fact that the copyright license is "out of band" with the item pointed to means that a search-engine catalogue of the item can store information about the license with its information about the resource even if the license restricts its right to store the resource itself.
It's harder to think of a clear example of a module that doesn't benefit from RDF at all, so for our example we'll invent a nightmare "Anti-RDF-Module-From-Hell" and then show how it can co-exist happily with RDF.
Our module is going to allow any XML content whatsoever. As long as it's well formed we aren't going to stop people using it. We would probably advise that the content be namespace-qualified, but we won't even insist on that.
Obviously if we're allowing any XML content we can't be sure that this XML content will be okay as RDF/XML. However we can survive this. Let's give our module the namespace http://www.example.com/arbitrary#, and define a predicate of http://www.example.com/arbitrary#stuff to mean the arbitrary stuff we're associating with an RSS item.
We can prevent this from destroying all our RDF goodness by adding the attribute rdf:parseType="Literal" on the stuff element like so:
<!-- more elements elided for brevity -->
The graph this produces looks like:
As you can see the contents of the arb:stuff element haven't been processed as RDF/XML, but are considered to be a literal value in much the same way as the string contents of a predicate like rss:title.
Because this mechanism is available to us there is no need for any module to damage the RDF qualities of RSS, and it adds little complication (one attribute with a fixed value) to how such modules work.
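A rough Python sketch of how a reader might treat rdf:parseType="Literal" follows. When the attribute is present, the element's content is kept as one string instead of being interpreted as further RDF. The sample item, its URL and the `<anything>` markup are made up for the illustration, the function names are invented, the rdf:type statement for the item is left out for brevity, and re-serialising with ElementTree only approximates the original markup (a real RDF/XML processor would preserve it more carefully).

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical item; the arb: namespace is the one chosen in the text.
ITEM_XML = """\
<rss:item xmlns:rss="http://purl.org/rss/1.0/"
          xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns:arb="http://www.example.com/arbitrary#"
          rdf:about="http://www.example.com/doc1.html">
  <rss:title>Our First Example</rss:title>
  <arb:stuff rdf:parseType="Literal"><anything>goes <here/></anything></arb:stuff>
</rss:item>
"""


def expand(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' name into namespace + local."""
    namespace, local = tag[1:].split("}", 1)
    return namespace + local


def item_triples(item_xml: str) -> list:
    """One statement per child element; Literal content stays a string."""
    item = ET.fromstring(item_xml)
    subject = item.get(f"{{{RDF_NS}}}about")
    out = []
    for prop in item:
        if prop.get(f"{{{RDF_NS}}}parseType") == "Literal":
            # Keep the markup as text rather than descending into it.
            value = (prop.text or "") + "".join(
                ET.tostring(child, encoding="unicode") for child in prop
            )
        else:
            value = prop.text
        out.append((subject, expand(prop.tag), value))
    return out


statements = item_triples(ITEM_XML)
for statement in statements:
    print(statement)
```

The inner `<anything>` and `<here/>` elements never become subjects or predicates; they only appear inside the string value of the arb:stuff statement.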