What's New in XSLT 2.0
By Evan Lenz
In my previous article, I went through a brief overview of some of the features that XPath 2.0 offers over and above XPath 1.0. We saw that XPath 2.0 represents a significant increase in functionality for XSLT users. In this article, we'll take a look at some of the new features specific to XSLT 2.0, as outlined in the latest working draft. Again, this assumes that you are familiar with the basics of XSLT/XPath 1.0.
XSLT 2.0 and XPath 2.0
XSLT 2.0 goes hand in hand with XPath 2.0. The two languages are specified separately and have separate requirements documents only because XPath 2.0 is also meant to be used in contexts other than XSLT, such as XQuery 1.0. But for the purposes of XSLT users, the two are linked together. You can't use XPath 2.0 with XSLT 1.0 or XPath 1.0 with XSLT 2.0. (At least, the W3C is not currently proposing any such combination.)
A Welcome Arrival
A new version of XSLT has been heavily anticipated in the XSLT user community for some time. As is true with the first versions of many languages, it did not become clear which extensions to the language would prove to be the most important until there had been some real-world experience with it. Since November 16, 1999, when XSLT 1.0 became a recommendation, it has become quite apparent that certain areas of missing functionality are due for inclusion in the next version of the language. In this article, we'll show how XSLT 2.0 addresses four of these areas.
- Conversion of result tree fragments to node-sets
- Multiple output documents
- Built-in support for grouping
- User-defined functions (implemented in XSLT)
Death To the Result Tree Fragment!
In XSLT 1.0 the result tree fragment (RTF) type is like a node-set, but it is really a second-class citizen. An RTF is what you get whenever you use xsl:variable to construct a temporary tree. The problem is that you can't then use an XPath expression to access the innards of this tree, unless you use a vendor-specific extension function, usually called something like node-set(), to convert the RTF into a first-class node-set (consisting of one root node). The rationale for the RTF data type was that it would reduce implementation burden, but since almost all existing XSLT processors provide their own version of a node-set() extension function anyway, that consideration has become moot. In any case, the need to overcome this limitation has been clear for some time, as it is important to be able to break up complex transformations into sequences of simpler transformations.
If you haven't guessed already, XSLT 2.0 has shown RTFs the door. Now when you use xsl:variable to create a temporary tree, the value of that variable is a true node-set. Actually, in XPath 2.0 terms, it is a true node sequence, consisting of one document node, which is XPath 2.0's name for what XPath 1.0 called a "root node". With that sequence you can then use path expressions to drill down inside the tree, apply templates to it, and so on, just like you would with any other source document. With XSLT 2.0, there is no longer a need for the node-set() extension function.
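To make this concrete, here is a minimal sketch of a temporary tree being queried directly with a path expression. It is not taken from the working draft; the variable name lookup and the country elements inside it are made up for illustration. In XSLT 1.0 the select expression in the template would be an error unless $lookup were first passed through a node-set() extension function.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Temporary tree: in XSLT 2.0 the value of $lookup is a document node.
       The element and attribute names here are hypothetical. -->
  <xsl:variable name="lookup">
    <country code="it">Italy</country>
    <country code="fr">France</country>
  </xsl:variable>
  <xsl:template match="/">
    <result>
      <!-- Drill into the temporary tree with an ordinary path expression -->
      <xsl:value-of select="$lookup/country[@code = 'fr']"/>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

The same variable could just as well be handed to xsl:apply-templates, which is what makes multi-pass transformations practical within a single stylesheet.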
Enabling Multiple Output Documents
Another extension which many XSLT 1.0 processors provide is support for multiple output documents. This extension has proven very useful, especially for statically generating web sites containing multiple pages. The problem with extensions is that they aren't standard. Each XSLT processor has a different extension element for doing this, e.g. saxon:output, xt:document, etc.
XSLT 2.0 provides a standard way to output multiple documents, using the xsl:result-document element. The following example stylesheet constructs multiple output documents, one "principal result document" and a variable number of "secondary result documents". The principal result document will be serialized as XHTML, and the secondary result documents will be serialized as text.
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="http://www.w3.org/1999/xhtml">
  <xsl:output method="xhtml"/>
  <xsl:output method="text" name="textFormat"/>
  <xsl:template match="/">
    <html>
      <head>
        <title>Links to text documents</title>
      </head>
      <body>
        <p>Here is a list of links to text files:</p>
        <ul>
          <xsl:apply-templates select="//textBlob"/>
        </ul>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="textBlob">
    <xsl:variable name="uri" select="concat('text', position(), '.txt')"/>
    <li>
      <a href="{$uri}">
        <xsl:value-of select="$uri"/>
      </a>
    </li>
    <xsl:result-document href="{$uri}" format="textFormat">
      <xsl:value-of select="."/>
    </xsl:result-document>
  </xsl:template>
</xsl:stylesheet>
The href attribute of xsl:result-document is used to assign the URI of the corresponding output document. For many processors, this will mean writing the document to a file with that name. The format attribute refers to a named output definition. In this case, it points to the xsl:output element that we appropriately named textFormat.
Another thing worth noting from the example above is the use of the XHTML output method, newly introduced in XSLT 2.0.
Grouping Simplified
XSLT 1.0 did not include built-in support for grouping. Many grouping problems can be solved using various techniques, such as the Muenchian Method, but such solutions tend to be rather complex and verbose. One of XSLT 2.0's requirements was that it must simplify grouping. As we shall see from a simple example below, it is well on its way to meeting that goal.
An example that's used in both the Requirements document and the XSLT 2.0 working draft involves converting the list of cities in the following simple XML document,
<cities>
  <city name="milan" country="italy" pop="5"/>
  <city name="paris" country="france" pop="7"/>
  <city name="munich" country="germany" pop="4"/>
  <city name="lyon" country="france" pop="2"/>
  <city name="venice" country="italy" pop="1"/>
</cities>
to an HTML table that groups the cities by the country they are in, as follows:
<table>
  <tr>
    <th>Country</th>
    <th>City List</th>
    <th>Population</th>
  </tr>
  <tr>
    <td>italy</td>
    <td>milan, venice</td>
    <td>6</td>
  </tr>
  <tr>
    <td>france</td>
    <td>paris, lyon</td>
    <td>9</td>
  </tr>
  <tr>
    <td>germany</td>
    <td>munich</td>
    <td>4</td>
  </tr>
</table>
The difficult part of this transformation is generating the last three rows, one for each country. An XSLT 1.0 solution can be seen below:
<xsl:for-each select="cities/city[not(@country = preceding::*/@country)]">
  <tr>
    <td><xsl:value-of select="@country"/></td>
    <td>
      <xsl:for-each select="../city[@country = current()/@country]">
        <xsl:value-of select="@name"/>
        <xsl:if test="position() != last()">, </xsl:if>
      </xsl:for-each>
    </td>
    <td><xsl:value-of select="sum(../city[@country = current()/@country]/@pop)"/></td>
  </tr>
</xsl:for-each>
In the above example, we first identify the first city for each unique country, which is selected by the following XPath expression:
cities/city[not(@country = preceding::*/@country)]
Then, for each group, we need to be able to refer back to all other members of the group, in order to get the list of city names for each country as well as the total population for each country. In each case, we have some redundancy because the only way to refer to the current group is with an expression such as the following:
../city[@country = current()/@country]
This is clearly not an ideal situation, since the redundancy tends to make the stylesheet rather error-prone. Enter xsl:for-each-group, XSLT 2.0's answer to many of your grouping problems. The following example shows the much simpler XSLT 2.0 solution to this problem:
<xsl:for-each-group select="cities/city" group-by="@country">
  <tr>
    <td><xsl:value-of select="@country"/></td>
    <td>
      <xsl:value-of select="current-group()/@name" separator=", "/>
    </td>
    <td><xsl:value-of select="sum(current-group()/@pop)"/></td>
  </tr>
</xsl:for-each-group>
In the above example, xsl:for-each-group initializes the "current group" as part of the XPath evaluation context. The current group is simply a sequence. Once we've set up our group using the group-by attribute, we can thereafter refer to the current group using the current-group() function. This completely eliminates the redundancy that was present in the XSLT 1.0 solution.
Note also the separator attribute on xsl:value-of. The mere presence of this attribute instructs the processor to output not just the string value of the first member of the sequence (XSLT 1.0's behavior), but the string values of all members of the sequence, in sequence order. The value of the separator attribute is an optional string that is used as a delimiter between each string in the output. For the sake of backward compatibility with XSLT 1.0, only the sequence's first member's string value is output when the separator attribute is not present.
Note: The statement above about the separator attribute is only true when XPath 1.0 compatibility mode is enabled (by using version="1.0" in that context of the stylesheet, usually on the <xsl:stylesheet> element). When the stylesheet indicates version="2.0", <xsl:value-of> will still output all nodes in the sequence, even when there's no separator attribute. In that case, it uses the default separator value: the space character. At the time this article was written, the default space separator hadn't been invented yet. Watch out for this potential compatibility issue when updating 1.0 stylesheets to 2.0.
Finally, xsl:for-each-group is able to solve different kinds of grouping problems depending on which of three attributes you choose: group-by (which we saw in action above), group-adjacent (which enables grouping based on adjacency of nodes in document order, e.g. transforming inline <para> elements into block <para> elements), and group-starting-with (which groups by patterns of elements in a sequence). Examples of each of these can be found in the latest XSLT 2.0 Working Draft in "13.3 Examples of Grouping."
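To get a feel for how group-adjacent differs from group-by, here is a minimal sketch (not one of the working draft's examples) that runs against the cities document shown earlier. It assumes the xsl:for-each-group syntax used above. Because no two neighboring city elements in that document share a country, every city ends up in a group of its own, so the table gets five rows instead of three.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <table>
      <!-- A new group starts whenever @country differs from the previous city -->
      <xsl:for-each-group select="cities/city" group-adjacent="@country">
        <tr>
          <td><xsl:value-of select="@country"/></td>
          <td><xsl:value-of select="current-group()/@name" separator=", "/></td>
        </tr>
      </xsl:for-each-group>
    </table>
  </xsl:template>
</xsl:stylesheet>
```

If the city elements were first sorted by country, adjacent grouping would produce the same three groups that group-by does; the difference only shows up when equal keys are separated in the input sequence.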
Note: The final XSLT 2.0 recommendation also provides a fourth option, the group-ending-with attribute.
User-defined Functions
XSLT 2.0 introduces the ability for users to define their own functions which can then be used in XPath expressions. This is an extremely powerful mechanism that should prove to be very useful. Stylesheet functions, as they are called, are defined using the xsl:function element. This element has one required attribute, the name attribute. It contains zero or more xsl:param elements, followed by zero or more xsl:variable elements, followed by exactly one xsl:result element.
This restricted content model may sound limiting, but you will discover that the real power lies in the use of XPath 2.0 to define the result in the select attribute of the xsl:result element. As you may recall, XPath 2.0 includes the ability to do conditional expressions (if...then) and iterative expressions (for...return).
Note: There is no <xsl:result> element in the final XSLT 2.0 recommendation. Instead, after the optional <xsl:param> elements, the <xsl:function> element may contain any arbitrary sequence constructor (i.e. essentially any sequence of XSLT instructions). The function simply returns the sequence that results from evaluating those instructions. Oftentimes, you'll want to use the new-in-2.0 <xsl:sequence> element, so you can return a sequence using XPath. In the code example below, I replaced the now-obsolete <xsl:result> element with the correct element: <xsl:sequence>.
As the following example (taken straight from the latest working draft) shows, most of the work is done inside the select attribute of <xsl:sequence>. This stylesheet invokes the user's recursively-defined function, str:reverse(), to output the string "MAN BITES DOG".
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
               xmlns:str="http://user.com/namespace"
               version="2.0"
               exclude-result-prefixes="str">
  <xsl:function name="str:reverse">
    <xsl:param name="sentence"/>
    <xsl:sequence select="if (contains($sentence, ' '))
                          then concat(str:reverse(substring-after($sentence, ' ')),
                                      ' ',
                                      substring-before($sentence, ' '))
                          else $sentence"/>
  </xsl:function>
  <xsl:template match="/">
    <output>
      <xsl:value-of select="str:reverse('DOG BITES MAN')"/>
    </output>
  </xsl:template>
</xsl:transform>
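A function body does not have to be a single expression. The following is a minimal sketch, assuming the final recommendation's content model (parameters followed by a sequence constructor), of a function that binds an intermediate xsl:variable before returning its result. The function name str:first-word-upper is made up for illustration and does not appear in the specification; upper-case() is a standard XPath 2.0 function.

```xml
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
               xmlns:str="http://user.com/namespace"
               version="2.0"
               exclude-result-prefixes="str">
  <!-- Hypothetical helper: returns the first word of a sentence in upper case -->
  <xsl:function name="str:first-word-upper">
    <xsl:param name="sentence"/>
    <!-- Intermediate step: isolate the first word (or the whole string) -->
    <xsl:variable name="first"
                  select="if (contains($sentence, ' '))
                          then substring-before($sentence, ' ')
                          else $sentence"/>
    <xsl:sequence select="upper-case($first)"/>
  </xsl:function>
  <xsl:template match="/">
    <output>
      <xsl:value-of select="str:first-word-upper('dog bites man')"/>
    </output>
  </xsl:template>
</xsl:transform>
```

Splitting the work across a variable and a final xsl:sequence keeps each expression short, which can be easier to read than one deeply nested select attribute.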
Other Useful Stuff
XSLT 2.0 includes a number of other useful features that we won't go into in detail here. They include a mechanism for defining a default namespace for XPath expressions, the ability to use variables in match pattern predicates, named sort specifications, the ability to read external files as unparsed text, and so on.
In addition, a large part of the XSLT 2.0 specification remains to be written, particularly the material dealing with the construction and copying of W3C XML Schema-typed content. About this, the latest working draft says, "This is work in progress. Facilities for associating type information with constructed elements and attributes are likely to appear in future drafts of XSLT 2."
Getting Your Hands Dirty
For those of you who can't wait to start trying some of this stuff out, Michael Kay has released Saxon 7.0, which includes an "experimental implementation of XSLT 2.0 and XPath 2.0". It implements a number of features in the XSLT 2.0 and XPath 2.0 working drafts, with particular attention to those features that are likely the most stable. I've tested each of the examples in this article, and Saxon 7.0 executes them all as expected.
XSLT 2.0 is still very much a work in progress, so be forewarned that a number of things could change between now and the time it reaches Recommendation status. Until then, the public is encouraged to review the specification and send their comments to [email protected].
See also: What's New in XPath 2.0