How XSLT Works
XSLT is a language for transforming XML documents. As described in Chapter 1, “Data Model”, the XSLT processor is concerned with three XPath data model trees: the source tree, the stylesheet tree, and the result tree. Figure 1 shows the relationship between these three. The stylesheet and source trees are fed to the XSLT processor, which then produces the result tree.
Stylesheet Structure
The general structure of an XSLT stylesheet looks like this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- optional top-level elements, such as: -->
  <xsl:import href="..."/>
  <xsl:param name="..."/>
  <!-- set of template rules: -->
  <xsl:template match="...">...</xsl:template>
  <xsl:template match="...">...</xsl:template>
  ...
</xsl:stylesheet>
The document, or root, element of the stylesheet is xsl:stylesheet. Alternatively, you can use the xsl:transform element, which behaves exactly the same way. Which you use is a matter of personal preference. The XSLT namespace is http://www.w3.org/1999/XSL/Transform. The conventional namespace prefix is xsl, but any prefix can be used, provided that it binds to the XSLT namespace URI.
See Chapter 4, “Elements”, for a classification of all of XSLT's elements and where they can occur within a stylesheet.
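To see that the prefix is only a label for the namespace URI, the following sketch runs the same one-rule stylesheet twice: once as xsl:stylesheet with the xsl prefix, and once as xsl:transform with an arbitrary t prefix. It assumes a Java runtime whose default JAXP TransformerFactory is an XSLT 1.0 processor (the JDK ships one); the class PrefixDemo and its members are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: one stylesheet written two equivalent ways.
class PrefixDemo {
    // Conventional form: xsl:stylesheet with the xsl prefix.
    static final String CONVENTIONAL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>hello</xsl:template>"
        + "</xsl:stylesheet>";

    // Equivalent form: xsl:transform, with an arbitrary prefix (t) bound to the XSLT namespace.
    static final String CUSTOM_PREFIX =
        "<t:transform version='1.0' xmlns:t='http://www.w3.org/1999/XSL/Transform'>"
        + "<t:output method='text'/>"
        + "<t:template match='/'>hello</t:template>"
        + "</t:transform>";

    // Applies an XSLT 1.0 stylesheet to a source document using the default JAXP processor.
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Both PrefixDemo.transform(PrefixDemo.CONVENTIONAL, "<doc/>") and PrefixDemo.transform(PrefixDemo.CUSTOM_PREFIX, "<doc/>") should return the same text, hello.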
Processing Model
All XSLT processing consists of iterations over lists of nodes. At any given point in the execution of a stylesheet there is a current node for that iteration, and there is a current node list being iterated over. The current node list is the ordered list of nodes being iterated over, and the current node is a member of that list.
There are two mechanisms for iterating over a list of nodes: xsl:apply-templates and xsl:for-each. xsl:apply-templates is an XSLT instruction that iterates over a given node-set. It invokes the best-matching template rule for each of the nodes in the node-set. For example, here is an instruction that iterates over the node-set returned by the expression foo:
<xsl:apply-templates select="foo"/>
The value of the select attribute is an XPath expression that must evaluate to a node-set. The nodes in that set populate the current node list in document order. Then, for each node in the list, the XSLT processor invokes the best-matching template rule.
Regardless of what your stylesheet contains, XSLT processing always begins with a virtual call to:
<xsl:apply-templates select="/"/>
This sets the current node list to a list containing only one node—the root node of the source tree. It then invokes the template rule that matches the root node. This virtual call constructs the entire result tree, which, after all, is the point of executing a stylesheet. Your job as an XSLT stylesheet author is to define—using template rules—what happens when the XSLT processor executes this instruction.
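A small sketch of this model follows, again assuming the JDK's XSLT 1.0 processor reached through JAXP (ProcessingModelDemo and its members are hypothetical names). The rule for / runs first because of the implicit xsl:apply-templates call, and its own xsl:apply-templates then visits the selected foo elements in document order.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class illustrating where processing starts and how selection drives it.
class ProcessingModelDemo {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        // Chosen by the implicit <xsl:apply-templates select="/"/> that starts processing.
        + "<xsl:template match='/'>[root]<xsl:apply-templates select='doc/foo'/></xsl:template>"
        // Instantiated once per selected foo element, in document order.
        + "<xsl:template match='foo'>(<xsl:value-of select='.'/>)</xsl:template>"
        + "</xsl:stylesheet>";

    static final String SOURCE = "<doc><foo>a</foo><bar>x</bar><foo>b</foo></doc>";

    static String run() {
        return transform(XSL, SOURCE);
    }

    // Applies an XSLT 1.0 stylesheet to a source document using the default JAXP processor.
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

ProcessingModelDemo.run() should yield [root](a)(b). The bar element is never selected, so nothing is produced for it.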
Now, let's define what a template rule is.
Template Rules
An XSLT stylesheet contains a set of template rules. Broadly speaking, there are two kinds of template rules:
- Those you define
- Those that XSLT defines for you, a.k.a. the built-in template rules
XSLT defines a built-in template rule for each of the seven types of node. This ensures that any call to xsl:apply-templates will never fail to find a matching template rule for the current node, even if your stylesheet contains no explicit template rules at all (an empty stylesheet). We'll define exactly what the built-in template rules are in the Built-In Template Rules section later in this chapter.
The template rules that you define are explicitly present in your stylesheet. They are xsl:template elements that have a match attribute. Example 1 shows a template rule that matches any foo element.
<xsl:template match="foo">
  <!-- construct part of the result tree -->
  <xsl:apply-templates/>
  ...
</xsl:template>
The value of the match attribute is an XSLT pattern. Unlike an XPath expression, a pattern is not concerned with selecting a set of nodes from a given context. Instead, it plays a more passive role. The above template rule effectively announces “I know how to process foo elements.” It is only ever invoked when the node being processed is a foo element. We'll look more closely at how patterns are interpreted in the upcoming section Patterns.
Applying Template Rules
Whenever xsl:apply-templates is called, the XSLT processor examines the patterns of all the stylesheet's template rules. For each node being iterated over, it first finds all the template rules with patterns that match the node and then instantiates the best-matching template rule among them.
You can make xsl:apply-templates iterate over a node-set in an order other than document order by giving it xsl:sort children. See the section “xsl:sort” in Chapter 4.
The template rule we saw in Example 1 contained an xsl:apply-templates element. There are two things worth noting about this. First, its presence illustrates the recursive nature of XSLT processing. Starting from the initial virtual call to <xsl:apply-templates select="/"/>, the recursion continues as long as the instantiated template rules contain xsl:apply-templates instructions that select a non-empty list of nodes. When there are no more nodes to process, the entire result tree has been constructed.
The second thing worth noting is that the select attribute on xsl:apply-templates is optional:
<xsl:apply-templates/>
When absent, it is short for:
<xsl:apply-templates select="node()"/>
This instruction applies templates to the child nodes of the current node. In other words, it populates the current node list with the set of nodes returned by the node() expression, and it iterates over them in document order, invoking the best-matching template rule for each node.
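The recursion can be observed with a sketch like the following (RecursionDemo is a hypothetical name; the JDK's default JAXP XSLT 1.0 processor is assumed). There is no rule for the root node, so the built-in rule hands off to doc; each <xsl:apply-templates/> then descends into the child nodes, and text children fall through to the built-in text rule, which copies them.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: recursion driven by <xsl:apply-templates/> with no select.
class RecursionDemo {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        // Wraps the processed children of doc in brackets.
        + "<xsl:template match='doc'>[<xsl:apply-templates/>]</xsl:template>"
        // Wraps the processed children of b in asterisks.
        + "<xsl:template match='b'>*<xsl:apply-templates/>*</xsl:template>"
        + "</xsl:stylesheet>";

    static final String SOURCE = "<doc>one <b>two</b> three</doc>";

    static String run() {
        return transform(XSL, SOURCE);
    }

    // Applies an XSLT 1.0 stylesheet to a source document using the default JAXP processor.
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The expected result is [one *two* three].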
xsl:apply-templates is like a function that iterates over a list of objects (nodes) and, for each object, calls the same polymorphic function. Each template rule in your stylesheet defines a different implementation of that single polymorphic function. Which implementation is chosen depends on the runtime characteristics of the object (node). Loosely speaking, you define all the potential bindings by associating a “type” (pattern) with each implementation (template rule).
Patterns
XSLT patterns appear most commonly inside the xsl:template element's match attribute. (They also appear in certain attributes of the xsl:key and xsl:number elements.) The allowed syntax for a pattern is a subset of the allowed syntax for XPath expressions (see Chapter 2, “The XPath Language”). In other words, every pattern is also a syntactically valid expression, but not every expression is a valid pattern.
A pattern consists of one or more location path patterns separated by |. A location path pattern is a location path whose steps each use only the child or attribute axis. The // operator (in its abbreviated form only) can also be used. Finally, a location path pattern can also start with an id() or key() function call whose arguments are literal strings.
The following table lists several example patterns and what they match.
Pattern | What it matches |
---|---|
/ | the root node |
/doc[@format='simple'] | the root element only if its name is doc and it has a format attribute with the value simple |
bar | any bar element |
foo/bar | any bar element whose parent is a foo element |
id('xyz')/foo | any foo element whose parent is an element that has an ID-typed attribute with the value xyz |
section//para | any para element that has a section element ancestor |
@foo | any attribute named foo |
@* | any attribute |
node() | any child node (i.e., element, text, comment, or processing instruction) |
text() | any text node |
* | any element |
xyz:* | any element in the namespace designated by the xyz prefix |
*[not(self::xyz:*)] | any element that is not in the namespace designated by the xyz prefix |
para[2] | any para element that is the second para child of its parent |
para[last()] | any para element that is the last para child of its parent |
Whether a given node matches a pattern may be intuitive, but the precise definition is this:
A node matches a pattern if the node is a member of the result of evaluating the pattern as an expression with respect to some possible context node.
Because patterns allow only downward-looking axes (child, attribute, and //), the “possible context node” will always be one of the node's ancestors (or the node itself in the case of the pattern “/”).
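The sketch below applies the patterns foo/bar and section//para to a document in which only some bar and para elements satisfy the parent or ancestor condition. PatternMatchDemo is a hypothetical name, and the JDK's default JAXP XSLT 1.0 processor is assumed. Elements that match neither pattern are handled by the built-in rules, which simply output their text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: which nodes satisfy a parent test and an ancestor test.
class PatternMatchDemo {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        // Matches bar only when its parent is foo.
        + "<xsl:template match='foo/bar'>(foo/bar:<xsl:value-of select='.'/>)</xsl:template>"
        // Matches para only when some ancestor is section.
        + "<xsl:template match='section//para'>(section//para:<xsl:value-of select='.'/>)</xsl:template>"
        + "</xsl:stylesheet>";

    // bar 2 and para 4 match neither pattern, so the built-in rules just copy their text.
    static final String SOURCE =
        "<doc><foo><bar>1</bar></foo><bar>2</bar>"
        + "<section><div><para>3</para></div></section><para>4</para></doc>";

    static String run() {
        return transform(XSL, SOURCE);
    }

    // Applies an XSLT 1.0 stylesheet to a source document using the default JAXP processor.
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The result should be (foo/bar:1)2(section//para:3)4.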
When a pattern consists of more than one location path pattern separated by |, the location path patterns are treated as alternatives. A node matches the pattern if it matches any of the alternatives. The upshot of this is that two or more template rules that have the same content can be syntactically combined into one xsl:template element, simply by putting their match values together and separating them with |. For example, this:
<xsl:template match="foo | bar">
  <hi/>
</xsl:template>
is short for this:
<xsl:template match="foo">
  <hi/>
</xsl:template>
<xsl:template match="bar">
  <hi/>
</xsl:template>
Conflict Resolution for Template Rules
When a given node matches the patterns of more than one template rule, the XSLT processor decides which template rule to instantiate according to its rules for conflict resolution. For example, it is quite common to have a stylesheet that includes two template rules like these:
<xsl:template match="foo">
  <!-- ... -->
</xsl:template>
<xsl:template match="*">
  <!-- ... -->
</xsl:template>
The first template rule matches foo elements. The second matches any element. That means that a foo element will match both template rules. But the XSLT processor has to pick only one of them. Assuming that the stylesheet containing these rules isn't imported into another stylesheet that overrides them, the XSLT processor will pick the template rule with the foo pattern. Based on a comparison of the patterns * and foo, it determines that foo has higher priority. Generally speaking, the more specific a pattern is, the higher priority it has.
We'll describe exactly how priority is determined in the next section, Priority.
However, before patterns are ever examined for their relative priority, the XSLT processor first eliminates all matching template rules that have lower import precedence. Basically, template rules in an imported stylesheet have lower import precedence than template rules in the importing stylesheet. For the precise rules on how import precedence is determined, see the “xsl:import” section in Chapter 4.
Thus, there are two steps in this process of elimination:
- The XSLT processor eliminates rules with lower import precedence.
- Among the remaining template rules, the XSLT processor eliminates the rules with lower priority.
It is an error if there is more than one template rule left. If that happens, the XSLT processor can either signal the error or recover by invoking the matching template rule that occurs last in the stylesheet. Most processors will at least give a warning if this happens.
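Here is a runnable version of the foo and * pair above, assuming the JDK's default JAXP XSLT 1.0 processor and using a hypothetical ConflictDemo class. Both rules live in one stylesheet, so their import precedence is equal and the decision falls to priority.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: a name test (priority 0) versus a wildcard (priority -.5).
class ConflictDemo {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='foo'>[foo rule]</xsl:template>"
        // Reports which element it matched, then keeps processing the children.
        + "<xsl:template match='*'>[* rule: <xsl:value-of select='name()'/>]<xsl:apply-templates/></xsl:template>"
        + "</xsl:stylesheet>";

    static final String SOURCE = "<doc><foo/><bar/></doc>";

    static String run() {
        return transform(XSL, SOURCE);
    }

    // Applies an XSLT 1.0 stylesheet to a source document using the default JAXP processor.
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The doc and bar elements match only *, while foo matches both rules and gets the foo rule, so the output should be [* rule: doc][foo rule][* rule: bar].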
Priority
A template rule can explicitly specify its priority using the optional priority attribute on the xsl:template element. The value of the priority attribute may be any decimal number, positive or negative. The higher the number in the priority attribute, the higher the priority of the template rule.
If the priority attribute is absent (which is most often the case), then the template rule assumes a default priority based on the format of its pattern, i.e., the format of the match attribute's value. If the pattern consists of multiple location path patterns separated by |, then the alternatives are treated as separate template rules for purposes of assigning a default priority. There are four default priority values: -.5, -.25, 0, and .5. Every location path pattern falls into exactly one of these four categories, as shown in the following table.
Default priority | Format of location path pattern | Examples |
---|---|---|
-.5 | Name test wildcard (any name), or node type test (regardless of name) | *, @*, node(), text(), comment(), processing-instruction() |
-.25 | Namespace-qualified wildcard (regardless of local name) | xyz:*, @xyz:* |
0 | Name test for a particular name, or processing-instruction(Literal) | foo, xyz:foo, @foo, @xyz:foo, processing-instruction('foo') |
.5 | Any other location path pattern; in other words, any one that includes any of these operators: /, //, or [] | /, /foo, foo/bar, foo[2], /foo[bar='bat'] |
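To check the table against a real processor, the following sketch defines rules with default priorities 0 and .5, plus one rule whose explicit priority of -1 drops it below a plain name test. PriorityDemo is a hypothetical name, and the JDK's default JAXP XSLT 1.0 processor is assumed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: default priorities versus an explicit priority attribute.
class PriorityDemo {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        // Name test: default priority 0.
        + "<xsl:template match='foo'>[foo:0]</xsl:template>"
        // Contains a predicate: default priority .5, so it beats the name test.
        + "<xsl:template match='foo[@x]'>[foo[@x]:.5]</xsl:template>"
        // Would default to .5, but the explicit -1 makes it lose to the name test.
        + "<xsl:template match='foo[@y]' priority='-1'>[foo[@y]:-1]</xsl:template>"
        + "</xsl:stylesheet>";

    static final String SOURCE = "<doc><foo/><foo x='1'/><foo y='1'/></doc>";

    static String run() {
        return transform(XSL, SOURCE);
    }

    // Applies an XSLT 1.0 stylesheet to a source document using the default JAXP processor.
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The three foo elements should produce [foo:0][foo[@x]:.5][foo:0]: foo[@x] beats foo, but foo[@y] loses to foo because its explicit priority is lower.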
Modes
Both the xsl:template and xsl:apply-templates elements can have an optional mode attribute. Modes let you partition sets of template rules into different independent scopes. For example, this instruction will only consider template rules associated with the foo mode:
<xsl:apply-templates mode="foo"/>
And here is an example template rule that's associated with the foo mode:
<xsl:template match="*" mode="foo">
  <!-- ... -->
</xsl:template>
If you leave the mode attribute off xsl:apply-templates, then only the template rules that have no mode attribute will be considered. This is the unnamed, or default, mode.
Modes effectively allow you to define different template rules for the same node. In other words, you can process the same node two different times and do something different each time. A common use case for modes is generating a table of contents. Most of your template rules in a stylesheet might be concerned with generating the document content (headings, paragraphs, etc.) like this:
<xsl:template match="heading">
  <h1>
    <xsl:value-of select="."/>
  </h1>
</xsl:template>
However, to generate entries for a table of contents, you could define corresponding template rules in the toc mode:
<xsl:template match="heading" mode="toc">
  <li>
    <xsl:value-of select="."/>
  </li>
</xsl:template>
For heading elements, <xsl:apply-templates/> will generate h1 elements, and <xsl:apply-templates mode="toc"/> will generate li elements.
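The table-of-contents idea can be sketched end to end as follows. This assumes the JDK's default JAXP XSLT 1.0 processor; ModeDemo, the page and ul wrapper elements, and the sample headings are all hypothetical. The root rule processes the same heading elements twice, once in the toc mode and once in the default mode.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: the same heading nodes processed in two modes.
class ModeDemo {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes' indent='no'/>"
        + "<xsl:template match='/'>"
        + "<page>"
        // First pass: only rules with mode='toc' are considered.
        + "<ul><xsl:apply-templates select='doc/heading' mode='toc'/></ul>"
        // Second pass: only rules without a mode attribute are considered.
        + "<xsl:apply-templates select='doc/heading'/>"
        + "</page>"
        + "</xsl:template>"
        + "<xsl:template match='heading' mode='toc'><li><xsl:value-of select='.'/></li></xsl:template>"
        + "<xsl:template match='heading'><h1><xsl:value-of select='.'/></h1></xsl:template>"
        + "</xsl:stylesheet>";

    static final String SOURCE = "<doc><heading>Intro</heading><heading>Usage</heading></doc>";

    static String run() {
        return transform(XSL, SOURCE);
    }

    // Applies an XSLT 1.0 stylesheet to a source document using the default JAXP processor.
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The output should contain <li>Intro</li><li>Usage</li> inside ul, followed by <h1>Intro</h1><h1>Usage</h1>.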
Modes extend the polymorphic-function analogy for xsl:apply-templates. (See the second Tip in the Applying Template Rules section earlier in this chapter.) When mode="foo" is set, foo acts as the name of a polymorphic function, and each template rule with mode="foo" defines an implementation of the foo “function”.
Built-In Template Rules
By definition, the built-in template rules have lower import precedence than any template rules that you explicitly define. Thus, explicit template rules always override built-in template rules. The built-in rules come in handy when you don't specify an explicit template rule to match a particular node.
The built-in template rule for root nodes and element nodes is to apply templates to children. The explicit formulation of this rule is:
<xsl:template match="/ | *">
  <xsl:apply-templates/>
</xsl:template>
Many stylesheets include a template rule that explicitly matches the root node: <xsl:template match="/">.... This allows you to take control of processing right off the bat; however, it isn't required. Instead, you could rely on the built-in template rule for root nodes and elements to recursively apply templates until they reach a node for which you have defined an explicit template rule.
For each mode that's used in a stylesheet, XSLT also automatically defines an equivalent built-in template rule for root nodes and elements that automates continued processing of children within the same mode:
<xsl:template match="/ | *" mode="mode-name">
  <xsl:apply-templates mode="mode-name"/>
</xsl:template>
The built-in template rule for text nodes and attribute nodes is to create a text node with the string-value of the node. The explicit formulation of this rule is:
<xsl:template match="text() | @*">
  <xsl:value-of select="."/>
</xsl:template>
To see the built-in template rules in action, try applying a stylesheet consisting only of an xsl:stylesheet element that contains no explicit template rules. The result tree consists of one large text node: a concatenation of all text nodes in the source tree.
The built-in template rule for processing instructions and comments is to do nothing:
<xsl:template match="processing-instruction() | comment()"/>
The built-in template rule for namespace nodes is also to do nothing. Since no pattern can match a namespace node, there is no explicit formulation of this rule, and it cannot be overridden.
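The built-in rules can be exercised with a stylesheet that contains no template rules at all. In this sketch (hypothetical BuiltInRulesDemo class; the JDK's default JAXP XSLT 1.0 processor is assumed), the only top-level element is xsl:output, used so that the result is serialized as plain text rather than with an XML declaration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: a stylesheet with no template rules relies entirely on built-ins.
class BuiltInRulesDemo {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "</xsl:stylesheet>";

    // Contains text, a processing instruction, a comment, and an attribute.
    static final String SOURCE =
        "<doc><a>Hello</a><?note skip?><!--skip--><b attr='skip'>World</b></doc>";

    static String run() {
        return transform(XSL, SOURCE);
    }

    // Applies an XSLT 1.0 stylesheet to a source document using the default JAXP processor.
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The result should be HelloWorld. The processing instruction and comment produce nothing, and the attribute value does not appear because xsl:apply-templates with no select attribute visits only child nodes, not attributes.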
Template Rule Content
When an XSLT processor invokes a template rule, it instantiates the contents of the template rule, thereby constructing part of the result tree.
The content of the xsl:template element (following zero or more optional xsl:param elements) is a “template” for constructing part of the result tree. This “template” can contain both elements and text. Elements in the XSLT namespace are called instructions, elements in an extension namespace are called extension elements, and elements in any other namespace (or no namespace) are called literal result elements.
A text node acts as an instruction to create a corresponding text node in the result tree. In other words, text nodes in the stylesheet are copied to the result tree automatically.
Comments and processing instructions in the stylesheet are ignored. To create those, you must use the corresponding XSLT instruction for doing so.
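A quick way to confirm both rules is a template rule that contains literal text, a stylesheet comment, and an xsl:comment instruction. TemplateContentDemo is a hypothetical name, and the JDK's default JAXP XSLT 1.0 processor is assumed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: text is copied, stylesheet comments are ignored.
class TemplateContentDemo {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes' indent='no'/>"
        + "<xsl:template match='/'>"
        // 'text' is copied; the XML comment is dropped; xsl:comment creates a result comment.
        + "<out>text<!--not copied--><xsl:comment>copied</xsl:comment></out>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String run() {
        return transform(XSL, "<doc/>");
    }

    // Applies an XSLT 1.0 stylesheet to a source document using the default JAXP processor.
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The serialized result should contain text and <!--copied-->, but not the stylesheet's own comment.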
Literal Result Elements
A literal result element acts as an instruction to construct an element node with the same name in the result tree. The XSLT processor effectively creates a shallow copy of the literal result element from the stylesheet and inserts it into the result tree at the location within the result tree that is currently being constructed.
Attributes that appear on literal result elements, except for attributes in the XSLT namespace, are also copied to the result tree, attached to the corresponding element in the result tree. For example, this template rule creates an order element with a num attribute:
<xsl:template match="...">
  <order num="123-987">
    <!-- ... -->
  </order>
</xsl:template>
Each time this template rule gets instantiated, the order element is copied shallowly to the result tree along with its num attribute. The content of the order element in the result tree is the result of instantiating the content of the order element in the stylesheet.
Attribute Value Templates
Attributes on literal result elements are interpreted as attribute value templates (AVTs). This means that you can use curly braces ({...}) to insert a dynamically computed value into the attribute value. For example, here is a modification of the previous example:
<order num="{$prefix}-987">
  <!-- ... -->
</order>
The curly braces within the attribute delimit an XPath expression evaluated in the current XSLT context. In this case, the expression is a variable reference. The $prefix expression is evaluated, and in place of {$prefix}, the value of the expression, converted to a string, appears in the result. For example, if $prefix evaluates to the string (or number) 555, then the result would look like this: <order num="555-987"/>.
In addition to the attributes of literal result elements, some attributes of elements in the XSLT namespace are interpreted as AVTs. In other words, the curly braces ({...}) have the special significance just illustrated. In either case, if you want to include an actual brace character in the resulting attribute value, you can escape it by repeating the brace. In an AVT context, {{ is the escape sequence for {, and }} is the escape sequence for }.
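The following sketch combines both points. It assumes the JDK's default JAXP XSLT 1.0 processor; AvtDemo and the note attribute are hypothetical, and prefix is declared here with xsl:param (default '000') so that the standard Transformer.setParameter method can supply the value 555 from the caller.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: an AVT fed by a stylesheet parameter, plus escaped braces.
class AvtDemo {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes' indent='no'/>"
        // Top-level parameter with a default value; the caller may override it.
        + "<xsl:param name='prefix' select=\"'000'\"/>"
        // num is computed by an AVT; {{ and }} in note produce literal braces.
        + "<xsl:template match='/'><order num='{$prefix}-987' note='{{literal}}'/></xsl:template>"
        + "</xsl:stylesheet>";

    // Passes prefix as a stylesheet parameter when it is non-null, then runs the transform.
    static String render(String prefix) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            if (prefix != null) {
                transformer.setParameter("prefix", prefix);
            }
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

render("555") should contain num="555-987" and note="{literal}"; render(null) falls back to the parameter's default and yields num="000-987".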
How XPath Context Is Initialized
Many XSLT instructions have XPath expressions in their attributes, e.g., the respective select attributes of xsl:value-of, xsl:copy-of, xsl:for-each, and xsl:apply-templates. XPath expressions may also, of course, appear inside attribute value templates.
As far as XSLT processing goes, an XPath expression is a black box that yields a value: a node-set, number, string, boolean, or result tree fragment. However, as noted in the previous chapter, all XPath expressions are evaluated in a context. The current node and current node list supply an important part of that context, as shown in the following table.
Context component | Set to: |
---|---|
Context node | The current node |
Context size | The number of nodes in the current node list (1 or greater) |
Context position | The position of the current node in the current node list (1 or greater) |
Namespace declarations | The namespace declarations in scope for the element whose attribute contains the expression (excluding any default namespace declarations) |
Variable bindings | The variable bindings in scope for the element whose attribute contains the expression |
Function library | The built-in XPath/XSLT functions, in addition to any extension functions that are available |
The current node and current node list remain the same throughout the content of a given template rule, with one important exception. XSLT's other mechanism for iterating over a list of nodes, the xsl:for-each instruction, also changes the current node and current node list. Like xsl:apply-templates, xsl:for-each iterates over a given node-set in document order (by default). But rather than dispatching the behavior for each node to a template rule, it instantiates the content of the xsl:for-each element itself: the same content for every node in the list.
For example, the following template rule includes several relative XPath expressions. The context node for each expression depends on what the current node is in XSLT processing:
<xsl:template match="order">
  <!-- current node is an "order" element -->
  <p>Order: <xsl:value-of select="number"/></p>
  <xsl:for-each select="item">
    <!-- current node changes to an "item" element -->
    <p>Item: <xsl:value-of select="name"/></p>
  </xsl:for-each>
  <!-- current node changes back to the "order" element -->
  <p>Total: <xsl:value-of select="total"/></p>
</xsl:template>
The number, item, and total expressions are evaluated with an order element as the context node. However, the expression name is evaluated with an item element as the context node. That's because the current node, and thus the XPath context node, changes as processing enters the xsl:for-each instruction and changes back after it completes. Thus, the document that this template rule is designed to process probably has a structure like this:
<order>
  <number>123</number>
  <total>$34.95</total>
  <item>
    <name>Widget</name>
  </item>
  <item>
    <name>Dingbat</name>
  </item>
  ...
</order>
“Current node” and “context node” refer to the same node, except inside predicates. Inside a predicate, the context node changes for each evaluation of the predicate expression. “Current node”, however, is an XSLT term and refers to the outer context node of the entire expression. XSLT provides a function specifically for the purpose of accessing the current node from inside a predicate expression. See the current() function in Chapter 5, “Functions”.
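Both behaviors can be checked with a sketch like this one (hypothetical ContextDemo class; the JDK's default JAXP XSLT 1.0 processor is assumed). The first stylesheet is a text-output variant of the order template above. The second evaluates two predicates inside xsl:for-each: current()/name refers to the item being iterated over, while ./name refers to whichever item the predicate is currently testing.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: how xsl:for-each and predicates affect the context node.
class ContextDemo {
    static final String ORDER_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        // number and total use the order element as context; name uses each item.
        + "<xsl:template match='order'>Order <xsl:value-of select='number'/>:"
        + "<xsl:for-each select='item'>[<xsl:value-of select='name'/>]</xsl:for-each>"
        + " total <xsl:value-of select='total'/></xsl:template>"
        + "</xsl:stylesheet>";

    static final String CURRENT_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='order'><xsl:for-each select='item'>"
        // current() is the item being iterated over; '.' is the item the predicate tests.
        + "[<xsl:value-of select='count(../item[name = current()/name])'/>"
        + "/<xsl:value-of select='count(../item[name = ./name])'/>]"
        + "</xsl:for-each></xsl:template>"
        + "</xsl:stylesheet>";

    static String orderSummary() {
        return transform(ORDER_XSL, "<order><number>123</number><total>$34.95</total>"
            + "<item><name>Widget</name></item><item><name>Dingbat</name></item></order>");
    }

    static String currentVersusContext() {
        return transform(CURRENT_XSL, "<order><item><name>Widget</name></item>"
            + "<item><name>Dingbat</name></item><item><name>Widget</name></item></order>");
    }

    // Applies an XSLT 1.0 stylesheet to a source document using the default JAXP processor.
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

orderSummary() should return Order 123:[Widget][Dingbat] total $34.95. For the three items Widget, Dingbat, Widget, currentVersusContext() should return [2/3][1/3][2/3]: the current() predicate counts items sharing the current item's name, while the ./name predicate is true for every item.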
Whitespace Stripping
Whitespace-only text nodes in an XSLT stylesheet are considered insignificant and are stripped from the stylesheet tree before XSLT processing begins, except when they occur inside xsl:text elements or elements with the declaration xml:space="preserve". See the “xsl:text” section in the next chapter.
Whitespace stripping is also an optional process that can be applied to the source tree before XSLT processing begins. By default, unlike the stylesheet tree, all whitespace is preserved in the source tree. See the sections “xsl:strip-space” and “xsl:preserve-space” in the next chapter.
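As a sketch of the difference, the same indented source document can be run through two stylesheets that rely only on the built-in rules, one of which adds xsl:strip-space elements="*". StripSpaceDemo is a hypothetical name, and the JDK's default JAXP XSLT 1.0 processor is assumed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: source whitespace is kept unless xsl:strip-space asks otherwise.
class StripSpaceDemo {
    static final String PRESERVE =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "</xsl:stylesheet>";

    // Strips whitespace-only text nodes from every element in the source tree.
    static final String STRIP =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:strip-space elements='*'/>"
        + "<xsl:output method='text'/>"
        + "</xsl:stylesheet>";

    static final String SOURCE = "<doc>\n  <a>x</a>\n  <b>y</b>\n</doc>";

    // Applies an XSLT 1.0 stylesheet to a source document using the default JAXP processor.
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Without stripping, the indentation survives as text ("\n  x\n  y\n"); with it, the result is just xy.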
Serializing the Result Tree
XSLT processing is primarily concerned with constructing a result tree. Serialization involves converting that result tree to an actual XML stream or file. The xsl:output element is a top-level element that lets you give hints to the XSLT processor about how you want your result tree to be serialized. Technically, the XSLT processor is not required to heed the hints you give it (or even to serialize the result tree at all), but if it does heed your hints, it must follow the rules for interpreting the xsl:output element. See the “xsl:output” section in the next chapter.
Disabling Output Escaping
The xsl:value-of and xsl:text instructions have an optional attribute named disable-output-escaping, whose value must be yes or no. The default value is no. When the value is yes, the XSLT processor disables the normal escaping of markup characters in the value of the text node when it serializes the result. For example, consider this instruction:
<xsl:text disable-output-escaping="yes">&lt;</xsl:text>
The above instruction will output a literal < character in the result instead of its normal escaped representation (&lt;).
You should rarely, if ever, use the disable-output-escaping attribute. Quoting the XSLT recommendation itself:
Since disabling output escaping may not work with all XSLT processors and can result in XML that is not well-formed, it should be used only when there is no alternative.
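If you do need to see the effect, the sketch below emits the same &lt; text twice, once normally and once with escaping disabled. It assumes the JDK's default JAXP XSLT 1.0 processor and that this processor honors disable-output-escaping when serializing to a StreamResult; as the quotation warns, other processors may not. OutputEscapingDemo is a hypothetical name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: normal escaping versus disable-output-escaping='yes'.
class OutputEscapingDemo {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes' indent='no'/>"
        + "<xsl:template match='/'><r>"
        // Serialized as the escaped form.
        + "<xsl:text>&lt;</xsl:text>|"
        // Serialized as a raw '<', which makes the output ill-formed.
        + "<xsl:text disable-output-escaping='yes'>&lt;</xsl:text>"
        + "</r></xsl:template>"
        + "</xsl:stylesheet>";

    static String run() {
        return transform(XSL, "<doc/>");
    }

    // Applies an XSLT 1.0 stylesheet to a source document using the default JAXP processor.
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The output should contain &lt;|<, and the resulting r element is no longer well-formed XML, which is exactly the risk the recommendation describes.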
XSLT Elements by Use Case
Chapter 4, “Elements”, contains a reference for all of the XSLT elements. The following table shows a list of general programming use cases and the corresponding XSLT elements that you should refer to in that chapter. If you don't already know what you're looking for, this table can serve as a map.
Use case | Relevant XSLT elements |
---|---|
Creating nodes | xsl:element , xsl:attribute , xsl:text , xsl:comment , xsl:processing-instruction |
Copying nodes | xsl:copy-of , xsl:copy |
Repetition (looping) | xsl:for-each |
Sorting | xsl:sort |
Conditional processing | xsl:choose , xsl:if |
Computing or extracting a value | xsl:value-of |
Defining variables and parameters | xsl:variable , xsl:param |
Defining and calling subprocedures (named templates) | xsl:template , xsl:call-template |
Defining and applying template rules | xsl:template , xsl:apply-templates , xsl:apply-imports |
Numbering and number formatting | xsl:number , xsl:decimal-format |
Debugging | xsl:message |
Combining stylesheets (modularization) | xsl:import , xsl:include |
Compatibility | xsl:fallback |
Building lookup indexes | xsl:key |
XSLT code generation | xsl:namespace-alias |
Output formatting | xsl:output |
Whitespace stripping | xsl:strip-space , xsl:preserve-space |