Understanding XML Namespaces
Namespaces are used heavily across many XML applications, tools, and technologies. Even if you decide not to use namespaces for your own custom XML vocabularies, you will eventually need to learn how they work. For example, XSLT uses namespaces to disambiguate between code and data. All of XSLT’s elements are in the XSLT namespace, while other elements are taken to be literal result elements, i.e. elements that are copied to the result as data rather than interpreted as code.
Motivation
The “Namespaces in XML 1.0” recommendation cites two primary reasons you might want to use XML namespaces: to avoid name collisions, and to facilitate name recognition.
Avoiding name collisions
Namespaces allow you to use multiple markup vocabularies within the same document without having to worry about name collisions. For example, you might have an XML document that contains two different elements named title. One of them might describe the title of a bibliographic reference, whereas the other might describe a person’s professional title. It’s not really an issue if you control the definition of both elements; you could tell the difference by their context within the document. But the possibility of name collisions becomes a bigger problem when you don’t control both definitions, perhaps because they were defined as part of distinct schemas by different parties. XML namespaces address that problem by supplementing (in this case) the name title with a universally unique namespace name, also called a namespace URI.
Facilitating name recognition
Avoiding collisions is the most common rationale that’s given for using XML namespaces, but an even stronger (and more positive) motivation for using them is that they facilitate recognition of elements or attributes based only on their namespace URI. For example, software modules that are designed for processing elements in a given vocabulary, such as UBL (Universal Business Language) orders, can be automatically invoked as soon as an element in the UBL namespace appears in a document that you’re processing. In that case, your code may not need to know anything about UBL orders except that their namespace URI is urn:oasis:names:specification:ubl:schema:xsd:Order-1.0. When you come across any element in that namespace, you can then dispatch to the appropriate module that knows how to process UBL orders and let it do the work.
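As a rough illustration of that kind of dispatch, here is a minimal Python sketch using the standard library's ElementTree, which reports each element name as "{namespace URI}local part". The handler function and the registry are hypothetical names invented for this example; they are not part of UBL or of any real library.

```python
import xml.etree.ElementTree as ET

UBL_ORDER_NS = "urn:oasis:names:specification:ubl:schema:xsd:Order-1.0"

def namespace_of(element):
    """Return the namespace URI of an ElementTree element, or None."""
    tag = element.tag
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None

def handle_ubl_order(element):
    # Hypothetical handler; a real module would understand the UBL vocabulary.
    return "UBL order handler received <%s>" % element.tag.split("}", 1)[1]

# Hypothetical registry mapping namespace URIs to the code that understands them.
HANDLERS = {UBL_ORDER_NS: handle_ubl_order}

def dispatch(element):
    """Hand the element to the handler registered for its namespace, if any."""
    handler = HANDLERS.get(namespace_of(element))
    return handler(element) if handler else None

doc = ET.fromstring('<Order xmlns="%s"><ID>42</ID></Order>' % UBL_ORDER_NS)
print(dispatch(doc))
```

The dispatching code never inspects the local name or the prefix; the namespace URI alone decides which module gets the element.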
Grafted into the Foundation
Namespaces in XML were defined as a layer on top of XML 1.0. But in practice, that layer has become a required layer. Nowadays, when people say “XML”, they usually mean “XML 1.0 + Namespaces”. That doesn’t mean you must always use namespaces. It just means that if you don’t want to use namespaces, you must ensure that you:
- Don’t use colons (:) in your element and attribute names, and
- Don’t use the xmlns attribute
Colons are reserved for namespace prefixes, and the xmlns attribute is reserved for namespace declarations. If you avoid both of those, then your XML can peacefully coexist with namespace-aware XML parsers. If your only purpose for reading this chapter is figuring out how to avoid namespaces, then you can stop here! But since you probably won’t be able to avoid them anyway, and since namespaces are in fact quite useful, let’s take a look at how they work.
A Namespaces Primer
Below is a brief tutorial that explains how the XML namespaces mechanism assigns expanded names to elements and attributes. We’ll conclude this section with a brief FAQ that ties up some loose ends not addressed by the examples.
A simple example
An example use of namespaces is the Atom Syndication Format (RFC 4287), which is an XML vocabulary used for describing blog content. Take a look at the example Atom feed below.
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:xhtml="http://www.w3.org/1999/xhtml"
      xmlns:my="http://xmlportfolio.com/xmlguild-examples">
  <title>Example Feed</title>
  <rights type="xhtml" my:type="silly">
    <xhtml:div>
      You may not read, utter, interpret, or otherwise
      <xhtml:strong>verbally process</xhtml:strong> the words
      contained in this feed without
      <xhtml:em>express written permission</xhtml:em> from
      the authors.
    </xhtml:div>
  </rights>
  <!-- ... -->
</feed>
At the top of this document are three namespace declarations—one for Atom, one for XHTML, and one for a custom extension namespace that I made up. The one for Atom is called a default namespace declaration, because it applies to unprefixed elements:
xmlns="http://www.w3.org/2005/Atom"
It declares that all unprefixed element names in the document (in this case, <feed>, <title>, and <rights>) have the namespace URI http://www.w3.org/2005/Atom. In other words, these elements are “in the Atom namespace”. Default namespace declarations like this one only apply to element names; they do not apply to attributes. That means, for example, that the unprefixed type attribute on the <rights> element is considered to be not in a namespace, even though it is defined by the Atom specification.
The next namespace declaration denotes that all elements and attributes prefixed with “xhtml:” (namely, <xhtml:div>, <xhtml:strong>, and <xhtml:em>) are a part of the XHTML namespace:
xmlns:xhtml="http://www.w3.org/1999/xhtml"
If it were not for that declaration, the document would not be well-formed with respect to namespaces, and a namespace-aware XML parser would complain with a message like “The prefix xhtml is not bound.” Any time you use a colon in an element or attribute name, you must include a corresponding namespace declaration that binds that prefix to a non-empty namespace URI.
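To see the unbound-prefix rule enforced, here is a small Python sketch using the standard library's ElementTree, whose parser is namespace-aware. The helper name is_namespace_well_formed is made up for this example, and the exact wording of the parser's error message may differ from the one quoted above.

```python
import xml.etree.ElementTree as ET

DECLARED = '<rights xmlns:xhtml="http://www.w3.org/1999/xhtml"><xhtml:div/></rights>'
UNDECLARED = '<rights><xhtml:div/></rights>'

def is_namespace_well_formed(xml_text):
    """Return True if a namespace-aware parser accepts the document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised, among other reasons, when a prefix has no in-scope binding.
        return False
    return True

print(is_namespace_well_formed(DECLARED))
print(is_namespace_well_formed(UNDECLARED))
```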
Atom allows you to extend its vocabulary by defining elements or attributes in your own namespace. That’s what our third namespace declaration is for:
xmlns:my="http://xmlportfolio.com/xmlguild-examples"
In this case, we add our own my:type attribute to the <rights> element. This attribute has the same local name as Atom’s built-in type attribute, but it has a different namespace URI, as indicated by the my prefix. More accurately, since my:type has a prefix, we know that it is in a namespace. On the other hand, we know that the naked type attribute is not in a namespace, because it does not have a prefix. Sometimes we say that elements or attributes that are not in a namespace have the “null” or “empty” namespace URI. Either way, it means the same thing.
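One way to check these claims is to ask a namespace-aware parser for the expanded names. The following Python sketch uses the standard library's ElementTree, which writes an expanded name as "{namespace URI}local part" and leaves names that are not in a namespace bare. The feed is a shortened copy of the example above.

```python
import xml.etree.ElementTree as ET

FEED = """\
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:xhtml="http://www.w3.org/1999/xhtml"
      xmlns:my="http://xmlportfolio.com/xmlguild-examples">
  <title>Example Feed</title>
  <rights type="xhtml" my:type="silly">
    <xhtml:div>Some <xhtml:strong>rights</xhtml:strong> text.</xhtml:div>
  </rights>
</feed>
"""

root = ET.fromstring(FEED)

# Expanded names of all elements, in document order.
element_names = [el.tag for el in root.iter()]

# Expanded names of the two attributes on <rights>.
rights = root.find("{http://www.w3.org/2005/Atom}rights")
attribute_names = sorted(rights.attrib)

for name in element_names + attribute_names:
    print(name)
```

The unprefixed type attribute comes back as plain type, while my:type comes back qualified with the extension namespace URI. The namespace declarations themselves do not show up as attributes.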
An equivalent example
Below is an alternative representation of the same Atom document. Before reading further, see what else you can conclude about how namespace declarations work, based on comparing these two examples.
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <rights type="xhtml" example:type="silly"
          xmlns:example="http://xmlportfolio.com/xmlguild-examples">
    <div xmlns="http://www.w3.org/1999/xhtml">
      You may not read, utter, interpret, or otherwise
      <strong>verbally process</strong> the words
      contained in this feed without
      <em>express written permission</em> from
      the authors.
    </div>
  </rights>
  <!-- ... -->
</feed>
Did you notice the differences? This example reveals some additional features of namespaces:
- You can put a namespace declaration on any element, not just the document element. That binding is said to be in scope for that element and its descendants.
- You can override an in-scope namespace declaration. For example, the <div> element overrides the default namespace declaration, so that unprefixed element names among <div> and its descendants will be in the XHTML namespace, not the Atom namespace.
- It doesn’t matter what prefix you use. All that matters is that the namespace URI is the correct one. For example, we used example as the prefix instead of my this time around, but the example:type attribute still has the same expanded name as the my:type attribute in the previous example. The expanded name has two parts: the local part (type) and the namespace URI (http://xmlportfolio.com/xmlguild-examples).
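The equivalence of the two representations can be checked mechanically. This Python sketch (standard library ElementTree) compares cut-down versions of both feeds by expanded name only; the helper expanded_names is a name invented for this example.

```python
import xml.etree.ElementTree as ET

FIRST = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:xhtml="http://www.w3.org/1999/xhtml"
      xmlns:my="http://xmlportfolio.com/xmlguild-examples">
  <rights type="xhtml" my:type="silly">
    <xhtml:div><xhtml:em>text</xhtml:em></xhtml:div>
  </rights>
</feed>"""

SECOND = """<feed xmlns="http://www.w3.org/2005/Atom">
  <rights type="xhtml" example:type="silly"
          xmlns:example="http://xmlportfolio.com/xmlguild-examples">
    <div xmlns="http://www.w3.org/1999/xhtml"><em>text</em></div>
  </rights>
</feed>"""

def expanded_names(xml_text):
    """List each element's expanded name with its attributes (by expanded name)."""
    root = ET.fromstring(xml_text)
    return [(el.tag, sorted(el.attrib.items())) for el in root.iter()]

same = expanded_names(FIRST) == expanded_names(SECOND)
print(same)
```

Different prefixes, different placement of the declarations, and a different use of the default namespace all yield the same sequence of expanded names.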
Disabling the default namespace declaration
A namespace-qualified attribute is easy to spot. If it has a prefix, then it’s namespace-qualified. If it doesn’t, then it’s not. Unprefixed elements, on the other hand, may or may not be namespace-qualified. That depends on whether a default namespace declaration is in scope. Consider the following simple XSLT stylesheet.
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <head>
        <title>My Web Page</title>
      </head>
      <body>
        <!-- ... -->
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
As mentioned earlier, XSLT uses namespaces to distinguish between XSLT instructions and literal result elements. In this stylesheet, <html>, <head>, <title>, and <body> are the literal result elements. These unprefixed elements are not in a namespace, because the stylesheet doesn’t have a default namespace declaration.
As you can see, using namespaces isn’t an all-or-nothing proposition. A document may
contain some elements that are namespace-qualified and some that are not.
The stylesheet above uses the conventional xsl prefix for XSLT elements. What would happen if we decided that we didn’t want to use a prefix at all? In that case, we would need to use a default namespace declaration:
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
This associates unprefixed elements with the XSLT namespace. The only problem with that is that our stylesheet includes elements that aren’t in a namespace at all. We don’t want <html>, for example, to be interpreted as an XSLT instruction (causing an XSLT error). We need some way of disabling the default namespace declaration. Fortunately, the authors of the namespaces recommendation thought of that scenario. Here’s the full stylesheet example again, using only default namespace declarations (no prefixes):
<stylesheet version="1.0"
    xmlns="http://www.w3.org/1999/XSL/Transform">
  <template match="/">
    <html xmlns="">
      <head>
        <title>My Web Page</title>
      </head>
      <body>
        <!-- ... -->
      </body>
    </html>
  </template>
</stylesheet>
As in the previous example, the <stylesheet> and <template> elements are in a namespace, but all the other elements are not. The xmlns="" declaration on the <html> element disables the default namespace declaration for that branch of the document tree. Another way of looking at it is that xmlns="" sets the default namespace to the empty string, which is effectively saying the same thing.
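A quick way to confirm the effect of xmlns="" is to parse the prefix-free stylesheet and list the expanded names. This is a minimal Python sketch with the standard library's ElementTree; elements that are not in a namespace are reported with a bare local name.

```python
import xml.etree.ElementTree as ET

STYLESHEET = """<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
  <template match="/">
    <html xmlns="">
      <head><title>My Web Page</title></head>
      <body/>
    </html>
  </template>
</stylesheet>"""

root = ET.fromstring(STYLESHEET)

# Expanded names in document order; "{uri}local" means namespace-qualified.
tags = [el.tag for el in root.iter()]
for tag in tags:
    print(tag)
```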
Primer FAQ
The rest of this Primer is formatted as a FAQ, as a quick way to fill in the missing pieces and to clear up any misconceptions you might have at this point.
You showed examples of redefining the default namespace. Can you also override namespace declarations that use a prefix, binding the prefix to a different namespace URI?
Yes. In this example, <my:foo> and <my:bar> are in different namespaces:
<my:foo xmlns:my="http://example.com/uri1">
  <my:bar xmlns:my="http://example.com/uri2"/>
</my:foo>
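Parsing that fragment with a namespace-aware parser shows the two bindings at work. A minimal Python sketch, using the standard library's ElementTree:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<my:foo xmlns:my="http://example.com/uri1">'
    '<my:bar xmlns:my="http://example.com/uri2"/>'
    '</my:foo>'
)
child = root[0]

print(root.tag)   # the outer binding of "my" applies here
print(child.tag)  # the inner declaration rebinds "my" for this element
```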
Does the parser retrieve the namespace URI from the Web?
No. Consider the namespace URI as nothing more than a case-sensitive string.
Are http://example.com/ and http://EXAMPLE.COM/ the same namespace URI?
No. Even though they’re the same logical URI, they are different namespace URIs. An XML parser will not treat them as equivalent. They’re only the same if they’re the same string, character-for-character.
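A short Python sketch (standard library ElementTree) makes the point concrete: the parser performs no URI normalization, so the two elements below end up with different expanded names.

```python
import xml.etree.ElementTree as ET

lower = ET.fromstring('<a xmlns="http://example.com/"/>')
upper = ET.fromstring('<a xmlns="http://EXAMPLE.COM/"/>')

# The namespace URIs are compared as plain strings, so these names differ.
print(lower.tag == upper.tag)
```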
Is it okay to use a relative URI reference as a namespace name?
It’s strongly discouraged, though chances are it will just be treated like any other string. The Namespaces recommendation itself does not forbid the use of relative URI references, but because of inconsistent stories on whether or not they should be resolved into absolute URIs during namespace processing, the W3C decided to officially deprecate them. You can (and should) sidestep the whole issue by always using absolute URI references for namespace names. For more background on the decision to deprecate them, see http://www.w3.org/2000/09/xppa.
What about the xml:lang and xml:space attributes? I’ve seen those used without a corresponding namespace declaration.
This is the one exception to the rule that names with a colon must have a corresponding namespace declaration. The Namespaces recommendation defines a fixed binding between the xml prefix and this namespace URI:
http://www.w3.org/XML/1998/namespace
Both the prefix and the namespace URI are reserved, which means you can’t override this binding or bind a different prefix to this namespace name. While it’s legal to explicitly declare the implicit binding using xmlns:xml, it’s never necessary to do so.
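The built-in binding is easy to observe with a namespace-aware parser. In this Python sketch (standard library ElementTree), the document contains no namespace declaration at all, yet the xml:lang attribute is reported with the reserved namespace URI.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

# No xmlns:xml declaration is needed; the binding is implicit.
para = ET.fromstring('<p xml:lang="en" xml:space="preserve">Hello</p>')

print(para.get("{%s}lang" % XML_NS))
print(sorted(para.attrib))
```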
I see how to determine what namespace an element or attribute belongs to. Is there a way to determine what elements or attributes belong to a given namespace?
No, there is neither a standard mechanism nor a requirement for “registering” a namespace. One possible objection is that W3C XML Schema’s notion of “target namespace”, which associates a namespace URI with the elements in a given schema, provides a way to do this, as does the RDDL specification (http://www.openhealth.org/RDDL/). While these mechanisms exist, no one is required to use them. Namespaces are ripe for the picking. If you can type a namespace declaration, you can use a namespace. Thus, while we say an element is “in a namespace”, we really are referring to the fact that it has such-and-such a namespace URI.
I see that you can disable a default namespace declaration using xmlns="". Can you also disable a namespace declaration that uses a prefix?
No, you can’t in XML 1.0. But you can in XML 1.1 (more precisely, under Namespaces in XML 1.1). It looks like this: xmlns:foo=""
But if a prefixed name is invalid without a corresponding declaration that binds the prefix to a non-empty namespace URI, why would you ever need to do something like <my-element xmlns:foo="">? What would that even mean?
Excellent question. You’re right that such an “undeclaration” of a prefix binding does nothing to aid the representation of element and attribute names. That’s because prefixed elements and attributes must be associated with a non-empty namespace URI. Unfortunately, things are more complicated than this: namespace declarations are now used for more than just representing element and attribute names. They are also used to qualify names that appear in XML content. Before we can fully answer this question, we have to consider the formidable topic of QNames in content.
QNames in Content
In an ideal world, XML namespaces would have a simple, cohesive purpose: the representation of element and attribute names in an XML document. The details of where you put your namespace declarations, what prefixes you use, and whether or not you use a default namespace would be mere lexical details. In that world, an XML processor could throw away those details and report just the expanded names of elements and attributes to the application.
Alas, that ideal world never really existed. That’s because core XML technologies, including XSLT and W3C XML Schema, use namespace declarations not only to expand the names of elements and attributes, but also to expand QNames that appear in attribute values or document content (character data).
We’ll see an example of this shortly. But first let’s define what we mean by “QName.”
Remember that an expanded name describes a pair of strings: the local part and the namespace URI. Well, a QName (short for “qualified name”) is the syntactic construct for representing an expanded name. It’s an XML name with an optional prefix, which is separated from the local part by a colon (:) character. For example, both foo and my:foo are QNames. The QName by itself doesn’t tell you what the expanded name is. You have to consult the in-scope namespace declarations to determine that. (The one exception, of course, is an unprefixed attribute name; in that case, you know that the local part is the QName itself and that the namespace URI is null.)
One of the most common uses of QNames in content is in XSLT, as shown in this example:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:x="http://www.w3.org/1999/xhtml">
  <xsl:template match="/">
    <x:html>
      <x:head>
        <x:title>
          <xsl:value-of select="/x:html/x:head/x:title"/>
        </x:title>
      </x:head>
      <x:body>
        <!-- ... -->
      </x:body>
    </x:html>
  </xsl:template>
</xsl:stylesheet>
In this example, the stylesheet is using the XHTML namespace declaration not only for element names but also for resolving names in the XPath expression /x:html/x:head/x:title (for extracting the title value from the input document). In the ideal world I alluded to, you would be able to change the lexical details of a namespace declaration without breaking a thing. In this example, I’ve changed the x prefix to xhtml:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <xsl:template match="/">
    <xhtml:html>
      <xhtml:head>
        <xhtml:title>
          <!-- broken -->
          <xsl:value-of select="/x:html/x:head/x:title"/>
        </xhtml:title>
      </xhtml:head>
      <xhtml:body>
        <!-- ... -->
      </xhtml:body>
    </xhtml:html>
  </xsl:template>
</xsl:stylesheet>
Now the elements still have the same expanded names and the document is perfectly well-formed, but the stylesheet breaks. Specifically, the XPath expression can’t be evaluated, because there’s no way to determine what namespace the x prefix binds to. You may be wondering why this is really a problem. Why didn’t I just update the XPath expression to use the xhtml prefix instead? Admittedly, that would be easy enough to do since I was just editing the XML document by hand. Having to make one more edit is not the issue. The real problem is the additional burden this dependency places on general XML processors. No longer is it safe for an XML processor to merely report the expanded names of elements and attributes, throwing away the details of how the namespaces are declared. Instead, the XML processor must now preserve the set of in-scope namespace bindings for each element, all because document content (or attribute values in the above example) might depend on that information.
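To make that burden concrete, here is a sketch of what a processor has to do in order to resolve a QName found in an attribute value. It is written in Python with the standard library's ElementTree, whose iterparse function can report namespace declarations as "start-ns" events. The function names (elements_with_scope, resolve_qname, stylesheet, resolve_last_step) are invented for this example, the stylesheet is a cut-down version of the one above, and treating unprefixed names as being in no namespace is an assumption borrowed from XPath 1.0.

```python
import io
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"
XHTML_NS = "http://www.w3.org/1999/xhtml"

def elements_with_scope(xml_text):
    """Yield (element, bindings) at each start tag; bindings maps every
    in-scope prefix ('' for the default namespace) to its namespace URI."""
    stack = [{"xml": "http://www.w3.org/XML/1998/namespace"}]
    pending = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start", "end")):
        if event == "start-ns":      # reported before the element's "start"
            prefix, uri = payload
            pending[prefix] = uri
        elif event == "start":       # enter a new scope: inherited + new bindings
            stack.append({**stack[-1], **pending})
            pending = {}
            yield payload, stack[-1]
        else:                        # "end": leave this element's scope
            stack.pop()

def resolve_qname(qname, bindings):
    """Expand a QName found in content into (namespace URI, local part)."""
    prefix, colon, local = qname.partition(":")
    if not colon:
        return (None, qname)         # assumption: unprefixed means no namespace
    if prefix not in bindings:
        raise LookupError("prefix %r is not bound" % prefix)
    return (bindings[prefix], local)

def stylesheet(prefix):
    # Cut-down stylesheet that binds the XHTML namespace to the given prefix.
    return (
        '<xsl:stylesheet version="1.0" xmlns:xsl="%s" xmlns:%s="%s">'
        '<xsl:template match="/">'
        '<xsl:value-of select="/x:html/x:head/x:title"/>'
        '</xsl:template></xsl:stylesheet>' % (XSLT_NS, prefix, XHTML_NS)
    )

def resolve_last_step(xml_text):
    """Resolve the last step of the select expression on xsl:value-of."""
    for element, bindings in elements_with_scope(xml_text):
        if element.tag == "{%s}value-of" % XSLT_NS:
            last_step = element.get("select").split("/")[-1]
            return resolve_qname(last_step, bindings)

print(resolve_last_step(stylesheet("x")))
try:
    resolve_last_step(stylesheet("xhtml"))
except LookupError as error:
    print("broken:", error)
```

The expanded element names alone are not enough here. The resolver needs the prefix-to-URI map that was in scope at the xsl:value-of element, which is exactly the information an "ideal world" processor would have discarded.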
Such practice went against the original spirit of the Namespaces in XML recommendation, but we have to live with it today. And while this puts an additional complexity burden on everyone (beginners who are learning XML, developers writing XML processing tools, and writers of specifications that depend on XML), the good news is that it’s a one-time cost. All you have to remember is that, when writing generic XML processing tools, the namespace prefix bindings are significant too, not just the element and attribute names they help define. More precisely, for each element, you must preserve its set of in-scope namespaces, a property that is formalized in the XML Infoset specification.
Coming to terms with QNames in content is a rite of passage every XML developer must go through: the initial realization, the horror, the protest, and finally acceptance. Mine has even been documented.
Misusing the namespace context
We’ve now seen that because QNames might appear in content, namespace context must be preserved, including the prefixes. But just because you can get the value of a namespace prefix doesn’t mean you should let any of your application code depend on it.
At a client site, I recently used an internally developed tool for generating HTML documentation for W3C XML Schema documents. I ran it against some sample .xsd files and was dismayed to see this error message: No xs:schema element found. Well, the problem was that my input .xsd files used the xsd prefix rather than the xs prefix. That is a perfectly acceptable choice, as would be any other prefix (or no prefix, using a default namespace declaration), so long as it’s bound to the right namespace URI. The problem was that this application was hard-coded to only recognize the xs prefix. What it should have been doing is looking for the W3C XML Schema namespace URI and ignoring the prefix. There are only a few practices that I would describe as universally bad, and this is one of them.
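For contrast, here is a minimal Python sketch of the prefix-independent check such a tool could have used, written with the standard library's ElementTree. The function name is_schema_document is made up for this example; the namespace URI is the one defined for W3C XML Schema.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

def is_schema_document(xml_text):
    """True if the document element is {XSD namespace}schema, whatever the prefix."""
    return ET.fromstring(xml_text).tag == "{%s}schema" % XSD_NS

with_xs = '<xs:schema xmlns:xs="%s"/>' % XSD_NS
with_xsd = '<xsd:schema xmlns:xsd="%s"/>' % XSD_NS
with_default = '<schema xmlns="%s"/>' % XSD_NS
wrong_namespace = '<xs:schema xmlns:xs="http://example.com/not-xsd"/>'

for doc in (with_xs, with_xsd, with_default, wrong_namespace):
    print(is_schema_document(doc))
```

The first three documents are accepted regardless of prefix, and the last one is rejected even though it uses the "right" prefix, because its namespace URI is different.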
XSLT doesn’t prevent you from making this same mistake. See “Perils of the name() function” later in this chapter.
Overloading “QName”
While QName is defined as an XML name with an optional prefix and colon (:), more recent usage has complicated things a bit. In XML Schema Part 2: Datatypes, the value space of the xs:QName datatype is the set of all possible expanded names (that is, tuples of local part and namespace URI), while the lexical space is the set of all strings that match the QName production of the Namespaces recommendation (that is, what we have been calling a QName: a name with an optional prefix). In XPath 2.0 and XQuery 1.0, the datatype xs:QName is actually a triple consisting of namespace URI, local part, and prefix.
The upshot is that you have to be careful to note the context when someone uses the term “QName”. They might be referring to a simple, self-contained string consisting of a name with an optional prefix; or otherwise to an “object” from which you can also extract the namespace URI.
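The different senses can be modeled in a few lines. The Python class below is purely illustrative (ExpandedQName is a hypothetical name, not an API from any XML library): it stores the triple of namespace URI, local part, and prefix, but leaves the prefix out of equality comparisons, on the assumption that two QName values denote the same name whenever their expanded names match.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class ExpandedQName:
    """Illustrative QName "object": an expanded name that remembers its prefix."""
    namespace_uri: Optional[str]
    local_part: str
    # The prefix is carried along but ignored by == and hash().
    prefix: Optional[str] = field(default=None, compare=False)

    def lexical(self):
        """The lexical form: what this chapter has been calling a QName."""
        if self.prefix:
            return "%s:%s" % (self.prefix, self.local_part)
        return self.local_part

NS = "http://xmlportfolio.com/xmlguild-examples"
first = ExpandedQName(NS, "type", "my")
second = ExpandedQName(NS, "type", "example")

print(first.lexical(), second.lexical())
print(first == second)
```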
Un-declaring Namespaces
Now we can finally answer that last question from the Primer FAQ, earlier in this chapter. Here it is again, in a nutshell:
Why does Namespaces in XML 1.1 allow this: <my-element xmlns:foo="">?
The first thing to keep in mind is that Namespaces in XML 1.1 only applies to XML 1.1, so you may not find yourself directly using this very often (if ever). But you might see it output from XML 1.1-aware tools.
The reason it’s needed is that it allows you to embed an XML fragment into another document, using technologies like XInclude, without cluttering the in-scope namespaces property of the elements in that fragment.
For example, the following document uses XInclude to embed another document (doc2.xml) inside it:
<my:doc xmlns:my="http://xmlportfolio.com/xmlguild-examples">
  <xi:include href="doc2.xml"
      xmlns:xi="http://www.w3.org/2001/XInclude"/>
</my:doc>
Let’s say that the content of doc2.xml looks like this:
<simple>
  <remark>We don't use namespaces.</remark>
</simple>
Namespaces aren’t used in doc2.xml, so the only in-scope namespace that is present for the <simple> and <remark> elements is the implicit one that binds the reserved prefix xml. But the situation changes when we perform the inclusion:
<my:doc xmlns:my="http://xmlportfolio.com/xmlguild-examples">
  <simple>
    <remark>We don't use namespaces.</remark>
  </simple>
</my:doc>
Now the <simple> and <remark> elements have an additional namespace binding, for the my prefix, which is inherited from their ancestor. The embedded document has effectively been altered simply by being included inside another document. To be sure, the names of the elements have not changed, but their set of in-scope namespaces has been augmented.
The only way the XInclude processor can avoid making this alteration is if it is able to un-declare that namespace binding. And the only way it can do that is if it supports XML 1.1 as its output format:
<?xml version="1.1"?>
<my:doc xmlns:my="http://xmlportfolio.com/xmlguild-examples">
  <simple xmlns:my="">
    <remark>We don't use namespaces.</remark>
  </simple>
</my:doc>
While that may look more cluttered, it’s actually less cluttered with respect to the set of in-scope namespaces for each element. The xmlns:my="" declaration (or rather undeclaration) has the effect of removing the my namespace binding from the scope of <simple> and its descendants.
What is the practical import of all this? Why does it matter? The negative impact of unwanted namespaces is really only felt when you later go to extract that same document out of its containing envelope. For example, in XSLT, you could use an instruction like this to perform a deep copy of the <simple> element:
<xsl:copy-of select="//simple"/>
If your input was of the XML 1.0 flavor (without namespace undeclarations), then the serialized result of that copy will look like this:
<simple xmlns:my="http://xmlportfolio.com/xmlguild-examples">
  <remark>We don't use namespaces.</remark>
</simple>
All we wanted to do was get back the contents of our original doc2.xml file, but instead we see that a namespace has “bled through” as an artifact of the document’s processing history. That’s almost certainly not what we intended. We have no use for that namespace declaration, but there’s no getting around it. It must be present in order to accurately represent the in-scope namespaces of the <simple> and <remark> elements as they occurred in the input document.
On the other hand, if the input was of the XML 1.1 variety and it used a namespace undeclaration to keep <simple> and its descendants pristine (free from unwanted namespaces), then we’ll get the uncluttered result that we wanted:
<simple>
  <remark>We don't use namespaces.</remark>
</simple>
Also, you can see that the namespace undeclaration isn’t present anymore, now that <simple> has been extracted from the containing document. The my prefix binding is no longer present on an ancestor element, so there’s no need to disable it.
SOAP, which uses XML “envelopes” as a transport mechanism for other XML documents, has the same problem as XInclude. In XML 1.0, such embedding simply cannot be done in a clean way: any SOAP-related namespace prefixes will bleed through into the embedded document. The real kicker was described in the requirements document for Namespaces in XML 1.1:
Even worse, the inability to roundtrip an infoset through XML accurately prevents accurate canonicalization, and the security features based upon it [like XML Digital Signatures and XML Encryption].
So does that mean we should all be using XML 1.1? Not hardly. First of all, the addition of namespace undeclarations is a small change compared to other changes in XML 1.1 (such as the expanded set of allowed Unicode characters in element names). Secondly, while it’s possible that most of the world will eventually migrate to XML 1.1, the most likely situation is that XML 1.0 will continue to be used alongside XML 1.1 for a long time. XML 1.0 is firmly entrenched and meets the needs of most applications. Use XML 1.1 (and supporting tools) only when XML 1.0 does not meet your needs—such as when you absolutely must have the ability to un-declare namespace prefix bindings.