XQuery: Reinventing the Wheel?
Abstract
There is a tremendous amount of overlap in the functionality provided by XQuery, the newly drafted XML query language published by the W3C, and that provided by XSLT. The recommendation of two separate languages, one for XML query and one for XML transformations, if they don't have some sort of common base, may cause confusion as to which language should be used for various applications. Despite certain limitations, XSLT as it currently stands may function well as an XML query language. In any case, the development of an XML query language should be informed by XSLT.
Table of Contents
- Introduction
- Deconstructing XQuery
- Using XSLT as a query language
- Advantages of using XSLT as a query language
- Conclusion
- References
Introduction
The proliferation of XML[1] as a data interchange format and document format is creating new problems and opportunities in the field of information retrieval. While much of the world's information is housed in relational database management systems, not all information is able to fit within the confines of the relational data model. XML's hierarchical structure provides a unified format for data-centric information, document-centric information, and information that blurs the distinction between data and documents. Accordingly, a data model for XML could provide a unified way of viewing information, whether that information is actually stored as XML or not. Access to, extraction from, and manipulation of this information together comprise the problem of an XML query language.
This paper explores some issues, advantages, and disadvantages of using XSLT[2] as a query language for XML. It attempts to show that the basic building blocks of an XML query language can be found in XSLT, by way of an introduction to and comparison with XQuery[3], the newly drafted XML query language published by the W3C XML Query Working Group. This paper is not a proposal for a specific implementation.
Deconstructing XQuery
XQuery is the newly drafted W3C XML Query language. This section serves as an introduction to XQuery for XSLT users, as well as a comparison between the two languages.
All of the XQuery examples in this section are taken directly from chapter 2 of the XQuery working draft[3]. Each of the headings in this section, beginning with "Path expressions," is also borrowed from that work. To get the most out of this section, the reader is strongly encouraged to consult the corresponding sections of the XQuery working draft in parallel, for a fuller explanation of each of the XQuery constructs demonstrated here.
Following each XQuery example is an equivalent query expressed in XSLT. Commentary is interspersed between the examples to highlight key differences and similarities between the two languages.
Path expressions
XQuery's path expression syntax might be called an "extended subset" of XPath[4], where this oxymoron describes the fact that, while similar to XPath and defined in terms of XPath, it is decidedly not XPath. One of the most notable differences is that XQuery's expressions do not return node-sets, but instead return "ordered forests" of nodes. XPath's node-sets are proper sets in that they are not ordered. However, the evaluation context of XPath expressions requires the entire document to be accessible to enable operations on document order and traversal of the XPath axes. XQuery, on the other hand, only supports axes that correspond to the axes available when using XPath's abbreviated syntax. This means, essentially, that XQuery path expressions support something similar to XPath's child, descendant, and attribute axes. The nodes returned by an XQuery path expression are ordered so that the original document need not be subsequently consulted for operations using document order.
The first six examples below compare XQuery's path expression language to XPath.
Query 1 shows that the XQuery path expression language is very much like that used in XSLT. In fact, it even borrows the document() function from XSLT for selecting the root nodes of documents by URI.
Example 1. (Q1) In the second chapter of the document named "zoo.xml", find the figure(s) with caption "Tree Frogs".
document("zoo.xml")/chapter[2]//figure[caption = "Tree Frogs"]
Query 2 shows one of the extensions to the XPath syntax provided in the XQuery path expression language. This extension had its origin in XQL[5], which, when it was defined, was an extension not to XPath but to the "XSL pattern" syntax, XPath's predecessor. Apart from its conciseness in comparison to the XPath equivalent, it adds nothing.
Query 3 introduces XQuery's syntax for dereferencing ID references, which is provided as a replacement for XPath's id() function. Apart from its purportedly better readability, it too adds nothing. Note, however, that the -> syntax includes an implicit name test; it will return only those referenced elements that are named "fig", whereas id() will return all referenced elements, regardless of name. Thus, in the XSLT equivalents below, I've included an explicit name test on the self axis.
It's not clear in the XQuery draft whether the -> syntax will operate only on IDREF and IDREFS, or values of any type (as is the case with XPath).
Query 4 seeks to demonstrate the better readability of the -> syntax. In addition to the straight XPath equivalent, I've included an XPath version that uses a variable reference, in order to show that better readability can be attained without deviating from the XPath standard. Note that variable bindings are part of the XPath specification[4].
The XPath equivalents here are admittedly not as readable as the XQuery example, but that is primarily due to the inclusion of the [self::emp] predicate. The added value of an automatic name test for referenced elements is debatable. In any case, this functionality is already available in standard XPath.
Example 7. (Q4) List the names of the second-level managers of all employees whose rating is "Poor".
/emp[rating = "Poor"]/@mgr->emp/@mgr->emp/name
Example 8. XPath equivalent to (Q4)
id(id(/emp[rating = "Poor"]/@mgr)[self::emp]/@mgr)[self::emp]/name
Or, for better readability, use this:
id($poorEmpManagers/@mgr)[self::emp]/name
...in conjunction with a variable binding, shown here in XSLT:
<xsl:variable name="poorEmpManagers" select="id(/emp[rating = 'Poor']/@mgr)[self::emp]"/>
Query 5 demonstrates how namespace prefix mappings may be used with XQuery's path expression language. Note that this is perfectly compatible with XPath, whose evaluation context includes a set of namespace prefix mappings. XSLT's approach, for better or worse, is to use the in-scope XML namespace declarations as its set of mappings.
Query 6 shows that a default namespace can be defined for use in path expressions. Users of XSLT may find this particularly surprising, since in XSLT the default namespace is not used to evaluate XPath expressions.
Query 7 introduces XQuery's "element constructor" mechanism for constructing arbitrary XML. Clearly, element constructors are much like XSLT's literal result elements. Likewise, inline variable references in XQuery are interpreted in almost the exact same way as XSLT's xsl:copy-of instruction in conjunction with a variable reference.
The XQuery syntax is not quite XML; it is only designed to look like XML. Readability is arguably easier to attain in XQuery, because it has more syntactic flexibility. Note, however, that XSLT, at least in this example, does a pretty good job of attaining readability, even within the confines of XML syntax (thanks in large part to attribute value templates).
Query 8 shows that the names of result elements may also be dynamically constructed. The equivalent XSLT instruction used to accomplish this is xsl:element.
Query 9 introduces XQuery's FLWR (pronounced "flower") expression. The FOR statement creates a collection of variable bindings, one for each node in the "ordered forest" returned by the path expression. For each of these bindings, the subsequent statements are executed. The WHERE clause filters the nodes in the collection. RETURN constructs the result.
The XSLT equivalent is similar; there's even a rough one-to-one mapping between the XQuery constructs and the XSLT instructions used. xsl:for-each corresponds to FOR, xsl:if corresponds to WHERE, and xsl:copy-of corresponds to RETURN. Note that in both the XQuery and XSLT versions, this particular query could be formed differently. Instead of using WHERE or xsl:if, an XPath predicate could be used.
This correspondence is rough, because slightly different processing models are used. XSLT does not need to use variables here, because its notion of context includes a current node and current node list. XQuery needs variables here because it retains no such context. Also, the "ordered forest" returned by the path expression in the FOR statement is what dictates the order of the result. In XSLT, the XPath expression returns a node-set rather than a list. xsl:for-each then operates on that node-set using document order. This distinction may seem somewhat academic, but it greatly simplifies the concept of node-sets, which conveniently inherit the semantics of sets. For a given node-set, the user doesn't need to worry about order until certain kinds of operations are performed. Order is not part of the set; it is applied to the set. In any case, XQuery's "ordered forest" is not consistent with the semantics of XPath.
Since all of the remaining XQuery examples stand alone as queries, the equivalent query in XSLT will be expressed as a full stand-alone stylesheet. The use of xsl:transform as opposed to xsl:stylesheet is deliberate.
Example 17. (Q9) List the titles of books published by Morgan Kaufmann in 1998.
FOR $b IN document("bib.xml")//book
WHERE $b/publisher = "Morgan Kaufmann"
  AND $b/year = "1998"
RETURN $b/title
Example 18. XSLT equivalent to (Q9)
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:for-each select="document('bib.xml')//book">
      <xsl:if test="publisher='Morgan Kaufmann' and year='1998'">
        <xsl:copy-of select="title"/>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:transform>
Query 10 introduces the "L" in "FLWR". LET binds a variable to a value, which may be a scalar or a collection. In this case, it is bound to a number. The use of a LET statement below a FOR clause is akin, in XSLT, to declaring a variable inside the body of xsl:for-each. The equivalent XSLT query shown below does precisely this (using two variables only for convenience).
XQuery's distinct() function filters out all duplicates, where “two elements are considered to be duplicates if their values (including name, attributes, and normalized content) are equal.”[3] Precisely what this means is unclear. Also, the distinct() function is said to return an "unordered set of elements." Assuming that the path expression in the FOR statement is what determines the order of the result, the resulting order when using the distinct() function is apparently undefined. This too is unclear. XQuery is a work in progress; accordingly, it should not be criticized too heavily for underspecification. However, I believe there is a greater ground for criticism to the extent that XQuery deviates from established W3C standards, regarding, for example, what is returned by an XPath expression.
In XPath, the equals (=) operator compares the string-value of nodes. Assuming this approximates an XQuery comparison between duplicates, the XSLT query shown below filters duplicates by selecting the first node in document order for each unique value found. The problem of finding unique values is essentially what's known as the grouping problem. XSLT lacks an intuitive way of filtering duplicates. Thus, while it is possible to accomplish this in XSLT, it is not very convenient. This problem is a well-known and common one, and a solution to it is among the requirements listed in the XSLT 2.0 requirements working draft[6].
The lack of an avg() function in XSLT is another convenience issue. It is certainly possible to compute the average of a set of values, by using sum() and count(). While this is slightly less convenient than if an avg() function were available, it is certainly more intuitive than the XSLT solution to the grouping problem. In any case, the requirement for an extended set of aggregation functions is included in the XPath 2.0 requirements working draft[7].
Example 19. (Q10) List each publisher and the average price of its books.
FOR $p IN distinct(document("bib.xml")//publisher)
LET $a := avg(document("bib.xml")/book[publisher = $p]/price)
RETURN
  <publisher>
    <name> $p/text() </name> ,
    <avgprice> $a </avgprice>
  </publisher>
Example 20. XSLT equivalent to (Q10)
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:for-each select="document('bib.xml')//publisher[not(.=preceding::publisher)]">
      <!-- //book rather than /book: in XSLT, document() returns the root node, not the document element -->
      <xsl:variable name="prices" select="document('bib.xml')//book[publisher=current()]/price"/>
      <xsl:variable name="avgPrice" select="sum($prices) div count($prices)"/>
      <publisher>
        <name><xsl:value-of select="."/></name>
        <avgprice><xsl:value-of select="$avgPrice"/></avgprice>
      </publisher>
    </xsl:for-each>
  </xsl:template>
</xsl:transform>
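For readers less familiar with the XPath idioms involved, the two workarounds discussed above (keeping only the first occurrence of each value, and dividing a sum by a count in place of avg()) can be sketched in Python using only the standard library. The sample document and the function name are assumptions made for illustration; the actual contents of bib.xml are not shown in the draft excerpts used here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real bib.xml is not reproduced in this paper.
BIB = """
<bib>
  <book><publisher>Morgan Kaufmann</publisher><price>40</price></book>
  <book><publisher>Addison-Wesley</publisher><price>30</price></book>
  <book><publisher>Morgan Kaufmann</publisher><price>60</price></book>
</bib>
"""

def average_price_by_publisher(root):
    """Return (publisher, average price) pairs in order of first occurrence."""
    results = []
    seen = set()
    for pub in root.iter("publisher"):      # document order
        name = pub.text
        if name in seen:
            # Plays the role of [not(. = preceding::publisher)]:
            # later duplicates of a value are skipped.
            continue
        seen.add(name)
        prices = [
            float(book.findtext("price"))
            for book in root.iter("book")
            if book.findtext("publisher") == name
        ]
        # No avg() available: divide the sum by the count.
        results.append((name, sum(prices) / len(prices)))
    return results

averages = average_price_by_publisher(ET.fromstring(BIB))
print(averages)
```

The explicit "seen" set makes visible the bookkeeping that the preceding-axis predicate performs implicitly, which is one way to see why the XSLT 1.0 formulation is usually described as unintuitive.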
Queries 11 and 12 introduce the ability to nest a FLWR expression within an element constructor, in order to return a well-formed XML document.
Again, in these examples, there is a close one-to-one correspondence between XSLT instructions and XQuery clauses. In fact, the XSLT queries almost look like possible formulations of XQuery into XML syntax. When the difference in appearance between two languages is this small, one has to wonder whether they should be promoted as separate languages (one for query and one for transformation), or whether they should build from a common base, both semantically and syntactically.
Note that the optional simplified syntax for XSLT is used here, as specified in section 2.3, "Literal Result Element as Stylesheet," of the XSLT recommendation[2]. Wherever it is possible to do so, I will use this syntax in order to (1) illustrate the possible conciseness of XSLT, and (2) make explicit the fact that template rules are not being used.
Many of the example queries cannot fit within this syntax, not because they use template rules, but because the specified result is an XML fragment, rather than a well-formed document containing only one root element. The simplified syntax can only specify a result tree having one root element.
Example 21. (Q11) List the publishers who have published more than 100 books.
<big_publishers>
  FOR $p IN distinct(document("bib.xml")//publisher)
  LET $b := document("bib.xml")/book[publisher = $p]
  WHERE count($b) > 100
  RETURN $p
</big_publishers>
Example 22. XSLT equivalent to (Q11)
<big_publishers xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:for-each select="document('bib.xml')//publisher[not(.=preceding::publisher)]">
    <!-- //book rather than /book: in XSLT, document() returns the root node, not the document element -->
    <xsl:variable name="b" select="document('bib.xml')//book[publisher=current()]"/>
    <xsl:if test="count($b) > 100">
      <xsl:copy-of select="."/>
    </xsl:if>
  </xsl:for-each>
</big_publishers>
Example 23. (Q12) Invert the structure of the input document so that, instead of each book element containing a list of authors, each distinct author element contains a list of book-titles.
<author_list>
  FOR $a IN distinct(document("bib.xml")//author)
  RETURN
    <author>
      <name> $a/text() </name>,
      FOR $b IN document("bib.xml")//book[author = $a]
      RETURN $b/title
    </author>
</author_list>
Example 24. XSLT equivalent to (Q12)
<author_list xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:for-each select="document('bib.xml')//author[not(.=preceding::author)]">
    <author>
      <name><xsl:value-of select="."/></name>
      <xsl:for-each select="document('bib.xml')//book[author=current()]">
        <xsl:copy-of select="title"/>
      </xsl:for-each>
    </author>
  </xsl:for-each>
</author_list>
Queries 13 and 14 show convenient uses of XQuery's variable binding mechanism. Once again, the XSLT equivalents primarily differ in mere syntax.
Example 25. (Q13) For each book whose price is greater than the average price, return the title of the book and the amount by which the book's price exceeds the average price.
<result>
  LET $a := avg(//book/price)
  FOR $b IN /book
  WHERE $b/price > $a
  RETURN
    <expensive_book>
      $b/title ,
      <price_difference>
        $b/price - $a
      </price_difference>
    </expensive_book>
</result>
Example 26. XSLT equivalent to (Q13)
<result xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:variable name="avgPrice" select="sum(//book/price) div count(//book/price)"/>
  <xsl:for-each select="/bib/book[price > $avgPrice]">
    <expensive_book>
      <xsl:copy-of select="title"/>
      <price_difference>
        <xsl:value-of select="price - $avgPrice"/>
      </price_difference>
    </expensive_book>
  </xsl:for-each>
</result>
Example 27. (Q14) Variable $e is bound to some element with numeric content. Construct a new element having the same name and attributes as $e, and with numeric content equal to twice the content of $e.
LET $tagname := name($e)
RETURN
  <$tagname>
    $e/@*, -- replicates the attributes of $e
    2 * number($e)
  </$tagname>
Example 28. XSLT equivalent to (Q14)
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:variable name="e" select="/foo"/>
  <xsl:variable name="tagname" select="name($e)"/>
  <xsl:template match="/">
    <xsl:element name="{$tagname}">
      <xsl:copy-of select="$e/@*"/>
      <xsl:value-of select="2 * $e"/>
    </xsl:element>
  </xsl:template>
</xsl:transform>
Query 15 introduces the SORTBY clause. Note that sorting is handled differently in XQuery than in XSLT. The SORTBY clause is applied after an intermediate result has been constructed. To do this in XSLT would require processing a constructed tree as a node-set, using either a node-set() extension function or XSLT 1.1[8]. Sorting in XSLT is applied to a selected node-set before a result is constructed for each of the nodes in that node-set. As it turns out, the example below sorts on values that are available before the intermediate result is constructed. Thus, though the corresponding XSLT query does not mean exactly the same thing, it will achieve the same result.
Example 29. (Q15) Make an alphabetic list of publishers. Within each publisher, make a list of books, each containing a title and a price, in descending order by price.
<publisher_list>
  FOR $p IN distinct(document("bib.xml")//publisher)
  RETURN
    <publisher>
      <name> $p/text() </name> ,
      FOR $b IN document("bib.xml")//book[publisher = $p]
      RETURN
        <book>
          $b/title ,
          $b/price
        </book> SORTBY(price DESCENDING)
    </publisher> SORTBY(name)
</publisher_list>
Example 30. XSLT equivalent to (Q15)
<publisher_list xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:for-each select="document('bib.xml')//publisher[not(.=preceding::publisher)]">
    <xsl:sort select="."/>
    <publisher>
      <name><xsl:value-of select="."/></name>
      <xsl:for-each select="document('bib.xml')//book[publisher=current()]">
        <!-- data-type="number" because xsl:sort compares keys as text by default -->
        <xsl:sort select="price" data-type="number" order="descending"/>
        <book>
          <xsl:copy-of select="title"/>
          <xsl:copy-of select="price"/>
        </book>
      </xsl:for-each>
    </publisher>
  </xsl:for-each>
</publisher_list>
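The difference in the order of operations between the two languages can be made concrete with a small Python sketch. It is only an analogy: the tuples stand in for selected book nodes, the dictionaries stand in for constructed book elements, and all names and values are hypothetical. It shows why the two approaches agree whenever the sort key is already available on the input.

```python
# Hypothetical (title, price) pairs standing in for the books of one publisher.
books = [("A", 30.0), ("B", 45.5), ("C", 12.0)]

def construct(book):
    """Build the result for one book (stands in for the <book> constructor)."""
    title, price = book
    return {"title": title, "price": price}

# XSLT order of operations: sort the selected nodes, then construct each result.
sort_then_construct = [
    construct(b) for b in sorted(books, key=lambda b: b[1], reverse=True)
]

# XQuery order of operations: construct the results, then sort them (SORTBY).
construct_then_sort = sorted(
    (construct(b) for b in books), key=lambda r: r["price"], reverse=True
)

same_result = sort_then_construct == construct_then_sort
print(same_result)
```

If the sort key existed only in the constructed result (a computed value, say), the first formulation would have no key to sort on, which is the case that calls for processing a constructed tree as a node-set in XSLT.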
Query 16 introduces the BEFORE and AFTER extensions to the XPath syntax. “BEFORE operates on two lists of elements and returns those elements in the first list that occur before at least one element of the second list in document order.”[3] The XPath axes for document order, irrespective of hierarchy, are preceding and following. Thus, it is possible to express this in XPath, although it is not very intuitive. The XSLT solution below uses three variables in an attempt to make it more readable.
The XSLT approach used below gets the intersection of two node-sets, namely all elements after the first incision, and all elements before the second incision. The technique used for getting the intersection of two node-sets, as shown in the binding of the $between variable, is not very intuitive. Among the requirements in the XPath 2.0 requirements working draft[7] is support for intersection and difference functions. More to the point, a future version of XPath may provide equivalents to XQuery's BEFORE and AFTER operators, although these are not currently listed in the requirements. Again, this is a matter of convenience. Beyond convenience, such operators might also serve as optimization hints to an XSLT processor, which would be particularly important in the context of using XSLT as a query language.
Example 31. (Q16) Prepare a "critical sequence" report consisting of all elements that occur between the first and second incision in the first procedure.
<critical_sequence>
  LET $p := //procedure[1]
  FOR $e IN //* AFTER ($p//incision)[1]
            BEFORE ($p//incision)[2]
  RETURN shallow($e)
</critical_sequence>
Example 32. XSLT equivalent to (Q16)
<critical_sequence xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:variable name="p" select="(//procedure)[1]"/>
  <!-- Parenthesized so that [1] and [2] index the incisions in document order, as in the query -->
  <xsl:variable name="before" select="($p//incision)[2]/preceding::*"/>
  <xsl:variable name="after" select="($p//incision)[1]/following::*"/>
  <xsl:variable name="between" select="$before[count(.|$after)=count($after)]"/>
  <xsl:for-each select="$between">
    <xsl:copy/>
  </xsl:for-each>
</critical_sequence>
Example 33. An alternative XSLT solution to (Q16) that is less general, but potentially more efficient:
<critical_sequence xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:variable name="p" select="(//procedure)[1]"/>
  <xsl:for-each select="$p//incision[1]/following::*[not(ancestor-or-self::incision) and count(preceding::incision) = 1]">
    <xsl:copy/>
  </xsl:for-each>
</critical_sequence>
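The count-based membership test used to bind $between relies on a property of set union: adding a node that is already a member leaves the size of the set unchanged. The Python sketch below mirrors that test over a small, hypothetical procedure. The sample is deliberately flat, so simple list slices can stand in for the following and preceding axes; element names other than "incision" are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, flat sample; only the "incision" name comes from the query.
PROC = ET.fromstring(
    "<procedure><prep/><incision/><clamp/><suture/><incision/><close/></procedure>"
)

def intersect(nodes_a, nodes_b):
    """Nodes of nodes_a that are also in nodes_b, compared by node identity."""
    b = set(nodes_b)
    # Same idea as $a[count(. | $b) = count($b)]: the union with one
    # extra node has the same size only if that node was already in $b.
    return [n for n in nodes_a if len(b | {n}) == len(b)]

ordered = list(PROC.iter())                    # document order, root first
incisions = [e for e in ordered if e.tag == "incision"]
first = ordered.index(incisions[0])
second = ordered.index(incisions[1])

after = ordered[first + 1:]    # stands in for following::* of the first incision
before = ordered[:second]      # stands in for preceding::* of the second incision
between = [e.tag for e in intersect(before, after)]
print(between)
```

In a language with a built-in intersection operation the membership test would simply be "n in b"; the roundabout form is kept here only to show what the XPath 1.0 expression is doing.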
Query 17 introduces the empty() function, which operates exactly the same on a node-set as XPath's not() function. The XSLT equivalent to this query uses the same approach as in the last query, which is to find the intersection of two node-sets.
Example 34. (Q17) Find procedures in which no anesthesia occurs before the first incision.
-- Finds potential lawsuits
FOR $p in //procedure
WHERE empty($p//anesthesia BEFORE ($p//incision)[1])
RETURN $p
Example 35. XSLT equivalent to (Q17)
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:for-each select="//procedure">
      <xsl:variable name="before" select="(.//incision)[1]/preceding::*"/>
      <xsl:if test="not(.//anesthesia[count(.|$before)=count($before)])">
        <xsl:copy-of select="."/>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:transform>
Query 18 shows how conditional expressions in XQuery compare with those in XSLT. Once again, they differ primarily in syntax.
Example 36. (Q18) Make a list of holdings, ordered by title. For journals, include the editor, and for all other holdings, include the author.
FOR $h IN //holding
RETURN
  <holding>
    $h/title,
    IF $h/@type = "Journal"
    THEN $h/editor
    ELSE $h/author
  </holding> SORTBY (title)
Example 37. XSLT equivalent to (Q18)
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:for-each select="//holding">
      <xsl:sort select="title"/>
      <holding>
        <xsl:copy-of select="title"/>
        <xsl:choose>
          <xsl:when test="@type='Journal'">
            <xsl:copy-of select="editor"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:copy-of select="author"/>
          </xsl:otherwise>
        </xsl:choose>
      </holding>
    </xsl:for-each>
  </xsl:template>
</xsl:transform>
Query 19 introduces XQuery's SOME operator, the "existential quantifier". Testing whether a condition applies for some node within a given node-set is natural in XPath. In the XSLT solution below, if any descendant para element contains both "sailing" and "windsurfing", then the returned node-set will be non-empty and the test will return true.
Example 38. (Q19) Find titles of books in which both sailing and windsurfing are mentioned in the same paragraph.
FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
  contains($p, "sailing") AND contains($p, "windsurfing")
RETURN $b/title
Example 39. XSLT equivalent to (Q19)
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:for-each select="//book">
      <xsl:if test=".//para[contains(., 'sailing') and contains(., 'windsurfing')]">
        <xsl:copy-of select="title"/>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:transform>
Query 20 introduces XQuery's EVERY operator, the "universal quantifier". Testing whether a condition applies for every node within a node-set is not quite as natural in XSLT, though it is possible. Two possible XSLT solutions are presented below (the second as a comment).
The first solution determines whether the number of para elements equals the number of para elements that contain "sailing". If so, then we know there are no para elements that do not contain "sailing". Since this also might mean that there are simply no para elements, we ensure that that is not the case by additionally testing that there is at least one of them.
The second solution tests whether any of the nodes in the node-set do not satisfy the given constraint. If any do not, then the test will return false. Since we want to determine whether or not every descendant para element contains "sailing", we test the emptiness (via not()) of the node-set consisting of para elements that do not contain "sailing". If it is empty, then we at least know there are no para elements that do not contain "sailing". Since this also might mean that there are simply no para elements, we ensure that that is not the case by additionally testing that there is at least one of them.
The current working draft of the XPath 2.0 requirements states that XPath 2.0 “Must Support Explicit "For Any" or "For All" Comparison and Equality Semantics”[7]. Thus, we can expect that XPath will support something similar to XQuery's existential and universal quantifiers.
Example 40. (Q20) Find titles of books in which sailing is mentioned in every paragraph.
FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES
  contains($p, "sailing")
RETURN $b/title
Example 41. XSLT equivalent to (Q20)
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:for-each select="//book">
      <xsl:if test="count(.//para)=count(.//para[contains(., 'sailing')]) and .//para">
        <!-- OR: "not(.//para[not(contains(., 'sailing'))]) and .//para" -->
        <xsl:copy-of select="title"/>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:transform>
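The two formulations of the universal test, together with the extra non-emptiness check, can be compared side by side in a short Python sketch. The sample data and function names are hypothetical. The sketch follows the reasoning given above, including the assumption that a book with no paragraphs should not be returned.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: A passes, B has one non-matching para, C has no paras.
BOOKS = ET.fromstring("""
<books>
  <book><title>A</title><para>sailing north</para><para>more sailing</para></book>
  <book><title>B</title><para>sailing</para><para>windsurfing</para></book>
  <book><title>C</title></book>
</books>
""")

def mentions(para, word):
    """Substring test on the element's text content, like XPath contains()."""
    return word in "".join(para.itertext())

def every_by_count(book, word):
    # count(.//para) = count(.//para[contains(., word)]) and .//para
    paras = list(book.iter("para"))
    matching = [p for p in paras if mentions(p, word)]
    return len(paras) == len(matching) and bool(paras)

def every_by_negation(book, word):
    # not(.//para[not(contains(., word))]) and .//para
    paras = list(book.iter("para"))
    failing = [p for p in paras if not mentions(p, word)]
    return (not failing) and bool(paras)

by_count = [b.findtext("title") for b in BOOKS.iter("book")
            if every_by_count(b, "sailing")]
by_negation = [b.findtext("title") for b in BOOKS.iter("book")
               if every_by_negation(b, "sailing")]
print(by_count, by_negation)
```

Without the final non-emptiness check, both functions would accept book C, since a condition on every member of an empty collection holds vacuously.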
Query 21 introduces XQuery's filter function, which provides functionality very similar to a subset of XSLT's pattern-matching template rules. “This function takes two operands, each of which is an expression that, in general, evaluates to an ordered forest of nodes. filter returns copies of some of the nodes in the forest represented by the first operand, while preserving their hierarchic and sequential relationships.”[3] Essentially, it filters out all descendant nodes of those in the first set that aren't in the second, resulting in a tree that consists of a shallow copy of each remaining node preserved in its original hierarchy with respect to the other remaining nodes. This is exactly the sort of thing that can be done using XSLT template rules, as shown in the XSLT solution below. Note that the built-in rules for elements and text nodes are overridden in this example.
Incidentally, this problem can also be solved in XSLT without using template rules. The second XSLT solution below is arguably closer to the semantics of XQuery's filter function, although it is much more painful to look at. It uses explicit recursive processing in conjunction with a named template. This is included in order to emphasize the handiness of the implicit recursive processing of XSLT's template rules.
Example 42. (Q21) Prepare a table of contents for the document "cookbook.xml", containing nested sections and their titles.
LET $b := document("cookbook.xml")
RETURN
  <toc>
    filter($b, $b//section | $b//section/title | $b//section/title/text() )
  </toc>
Example 43. XSLT equivalent to (Q21)
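<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <toc>
      <xsl:apply-templates select="document('cookbook.xml')/node()"/>
    </toc>
  </xsl:template>
  <xsl:template match="section | section/title | section/title/text()">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  <!-- Other elements are skipped, but processing continues into them so that nested sections are still reached -->
  <xsl:template match="*">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- All other text nodes are dropped -->
  <xsl:template match="text()"/>
</xsl:transform>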
Example 44. An alternative XSLT equivalent to (Q21) (no template rules)
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <!-- Bound once so that both parameters select nodes from the same tree -->
    <xsl:variable name="b" select="document('cookbook.xml')"/>
    <toc>
      <xsl:call-template name="filter">
        <xsl:with-param name="first" select="$b"/>
        <xsl:with-param name="second" select="$b//section | $b//section/title | $b//section/title/text()"/>
      </xsl:call-template>
    </toc>
  </xsl:template>
  <xsl:template name="filter">
    <xsl:param name="first"/>
    <xsl:param name="second"/>
    <xsl:for-each select="$first/node()">
      <xsl:choose>
        <xsl:when test="count(.|$second)=count($second)">
          <xsl:copy>
            <xsl:call-template name="filter">
              <xsl:with-param name="first" select="."/>
              <xsl:with-param name="second" select="$second"/>
            </xsl:call-template>
          </xsl:copy>
        </xsl:when>
        <xsl:otherwise>
          <xsl:call-template name="filter">
            <xsl:with-param name="first" select="."/>
            <xsl:with-param name="second" select="$second"/>
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>
</xsl:transform>
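The explicit recursion in the second solution may be easier to follow in a general-purpose language. The following Python sketch implements the same idea: nodes in the "keep" set are shallow-copied, and all other nodes are skipped while their descendants are still visited. The sample cookbook is hypothetical. Because ElementTree stores text as a property of an element rather than as a separate node, copying the text of kept title elements is used here as a stand-in for selecting section/title/text().

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real cookbook.xml is not shown in the draft.
COOKBOOK = ET.fromstring("""
<cookbook>
  <section><title>Soups</title><p>Intro</p>
    <section><title>Cold soups</title><p>Gazpacho</p></section>
  </section>
  <appendix><section><title>Index</title></section></appendix>
</cookbook>
""")

def filter_tree(node, keep, out_parent):
    """Copy kept descendants of node under out_parent, preserving hierarchy."""
    for child in node:
        if child in keep:
            copy = ET.SubElement(out_parent, child.tag)  # shallow: no attributes
            if child.tag == "title":
                copy.text = child.text  # stand-in for section/title/text()
            filter_tree(child, keep, copy)
        else:
            # Not kept: skip the node itself but keep looking below it.
            filter_tree(child, keep, out_parent)

keep = set(COOKBOOK.iter("section"))
keep.update(t for s in COOKBOOK.iter("section") for t in s.findall("title"))

toc = ET.Element("toc")
filter_tree(COOKBOOK, keep, toc)
toc_xml = ET.tostring(toc, encoding="unicode")
print(toc_xml)
```

Note how the section inside the appendix element is promoted to the top level of the result: the appendix is not in the keep set, but its descendants are still examined.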
Query 22 introduces XQuery's facility for user-defined functions. The XSLT equivalent uses a recursive named template to achieve the same result as XQuery's recursive function. Note, however, that a recursive named template, although it may function similarly to an XQuery user-defined function, is qualitatively different, because a named template always produces a tree, rather than an arbitrary node-set or a value of an arbitrary data type. By processing intermediate trees as node-sets (using a node-set() extension function or XSLT 1.1[8]), this limitation can arguably be worked around, but the fact remains that user-definable functions, apart from extension functions, are not currently supported in XSLT.
Example 45. (Q22) Find the maximum depth of the document named "partlist.xml."
NAMESPACE xsd = "http://www.w3.org/2000/10/XMLSchema-datatypes"

FUNCTION depth(ELEMENT $e) RETURNS xsd:integer
{
  -- An empty element has depth 1
  -- Otherwise, add 1 to max depth of children
  IF empty($e/*) THEN 1
  ELSE max(depth($e/*)) + 1
}

depth(document("partlist.xml"))
Example 46. XSLT equivalent to (Q22)
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template name="depth">
    <xsl:param name="node"/>
    <xsl:param name="level" select="1"/>
    <xsl:choose>
      <xsl:when test="not($node/*)">
        <xsl:value-of select="$level"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:call-template name="depth">
          <xsl:with-param name="level" select="$level + 1"/>
          <xsl:with-param name="node" select="$node/*"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  <xsl:template match="/">
    <xsl:call-template name="depth">
      <xsl:with-param name="node" select="document('partlist.xml')"/>
    </xsl:call-template>
  </xsl:template>
</xsl:transform>
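For comparison, the shape of the XQuery function (a function that returns a number, rather than a template that emits a text node) looks like this in Python. This is a sketch over a hypothetical parts document, and it follows the convention stated in the XQuery comments that an empty element has depth 1.

```python
import xml.etree.ElementTree as ET

def depth(element):
    """Depth of the subtree rooted at element; an empty element has depth 1."""
    children = list(element)
    if not children:
        return 1
    # Otherwise, add 1 to the maximum depth of the children.
    return 1 + max(depth(child) for child in children)

# Hypothetical sample standing in for partlist.xml.
PARTS = ET.fromstring("<parts><part><part/></part><part/></parts>")
max_depth = depth(PARTS)
print(max_depth)
```

Because the function returns an ordinary integer, its result can be fed directly into further arithmetic or comparisons, which is the qualitative difference from a named template described above.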
In the XSLT equivalent to Query 23, we are able to effect the same result as the XQuery query by leveraging the built-in recursive processing of XSLT's template rules, rather than by implementing a recursive named template. It is a matter of debate which of the two approaches below is a better route to achieving the desired result. In my opinion, the implicit, "automatic" recursive processing supplied by template rules is easier to follow than the explicit recursion used in the XQuery version.
Example 47. (Q23) In the document "company.xml", find all the elements that are reachable from the employee with serial number 12345 by child or reference connections.
FUNCTION connected(ELEMENT $e) RETURNS LIST(ELEMENT)
{
  $e/* UNION $e/@*->*
}

FUNCTION reachable(ELEMENT $e) RETURNS LIST(ELEMENT)
{
  $e UNION reachable(connected($e))
}

reachable(document("company.xml")/emp[serial="12345"])
Example 48. XSLT equivalent to (Q23)
<xsl:transform version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <xsl:variable name="e" select="//emp[serial='12345']"/>
    <xsl:apply-templates select="$e/* | id($e/@*)"/>
  </xsl:template>

  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="* | id(@*)"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()"/>

</xsl:transform>
One of the most significant issues with regard to XSLT's use over a large data set is that of performance and scalability. Current XSLT processors require the entire source tree to be in memory. Clearly, another approach must be taken to enable XSLT processing over a large data set. This paper does not directly address implementation approaches. Instead, it is concerned with how the semantics of XSLT might fulfill the requirements of an XML query language. I will note, however, that the use of XSLT as a query language may reverse the common paradigm of compiled stylesheets and unknown input. XSLT's implementation over a large data set may be made possible when the input to an XSLT processor is a repository consisting of pre-parsed, indexed XML.
XSLT operates on a single source tree whose data model is the same as that used by XPath[4], except with a few small additions. The most notable addition is the relaxation of what may be contained by the root node. The root node in the XPath tree model may contain more than one child, so that processing instructions and comments outside of the root element may be modeled. The XSLT tree model extends this to allow the root node to contain text nodes and multiple element nodes as well. What this means is that the source tree does not need to model a well-formed XML document that has only one root element. However, the source tree must always be a "well-formed external general parsed entity"[2]. All this means is that elements must be properly nested, special characters properly escaped, etc.
A collection of XML documents, and even XML fragments, could be modeled by an XSLT processor as one source tree by logically concatenating them all together in some order. In this framework, the root node would contain the entire repository. Each well-formed XML document in the repository would correspond to an element child of the root node. This is similar to how XQuery models a collection as an "ordered forest".
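As a rough sketch of what a query over such a repository source tree might look like, the stylesheet below summarizes every document in the repository. It assumes, as described above, that the processor presents the repository as a single source tree whose root node has one element child per document; the result element and attribute names (repository, document, and so on) are hypothetical and chosen only for illustration.

```xml
<xsl:transform version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- count(/*) is the number of documents under the repository root -->
    <repository documents="{count(/*)}">
      <!-- Each element child of the root node is one document's root element -->
      <xsl:for-each select="/*">
        <document position="{position()}"
                  root="{name()}"
                  elements="{count(descendant-or-self::*)}"/>
      </xsl:for-each>
    </repository>
  </xsl:template>

</xsl:transform>
```

Run against an ordinary single-document source tree, the same stylesheet simply reports one document, which suggests that the repository model is a generalization of the existing model rather than a departure from it.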
In addition to accessing the content of documents from the repository source tree, the document() function could be used to retrieve individual repository documents by name. It would return a root node containing only the particular document, rather than the source tree's root node which contains the entire repository of documents. The implementation would need to remember the distinction between documents, keeping the appropriate prologs and epilogs (processing instructions and comments occurring before and after the document element) with their respective documents. This means, of course, that, for a document to be retrieved by the document() function, it must have a name.
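A minimal sketch of retrieval by name follows. It assumes that a document named partlist.xml is resolvable by the document() function; the labels written to the output are illustrative only. Because document() returns a root node, the comments and processing instructions outside the document element remain reachable as children of that node.

```xml
<xsl:transform version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Root node of the named document only, not of the whole source tree -->
    <xsl:variable name="doc" select="document('partlist.xml')"/>

    <!-- Comments that are children of the root node lie outside the document element -->
    <xsl:text>comments outside the document element: </xsl:text>
    <xsl:value-of select="count($doc/comment())"/>

    <!-- The content of an xml-stylesheet processing instruction in the prolog, if any -->
    <xsl:text>&#10;stylesheet processing instruction: </xsl:text>
    <xsl:value-of select="$doc/processing-instruction('xml-stylesheet')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:transform>
```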
A more difficult problem arises when considering the desire to return documents as single units when their name is not known. For example, we may wish to retrieve the document whose root element's "foo" attribute's value is "bar". Note that this is not quite the same as retrieving the root element of that document; instead, it will return the root node that contains not only the root element, but also any processing instructions or comments outside of the root element. This could be accomplished by means of an extension function that returns a node-set containing the root nodes of documents that contain the nodes specified by a given XPath expression. For example, abc:document-of(/*[@foo='bar']) would return a node-set containing one root node for each document whose root element's foo attribute's value is bar. This would effectively enable the retrieval of multiple documents according to criteria other than their URIs.
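The following sketch shows how such an extension function might be used. The function abc:document-of() is the hypothetical function described above, and the namespace URI bound to the abc prefix is likewise invented for illustration; no existing processor is assumed to provide it. The stylesheet therefore tests for the function with function-available() and otherwise falls back to copying only the matching root elements, which loses any processing instructions or comments outside them.

```xml
<xsl:transform version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:abc="http://example.org/hypothetical/abc"
    exclude-result-prefixes="abc">

  <xsl:template match="/">
    <results>
      <xsl:choose>
        <!-- Hypothetical extension: one root node per matching document -->
        <xsl:when test="function-available('abc:document-of')">
          <xsl:for-each select="abc:document-of(/*[@foo='bar'])">
            <document>
              <!-- Children of the root node: prolog, root element, epilog -->
              <xsl:copy-of select="node()"/>
            </document>
          </xsl:for-each>
        </xsl:when>
        <!-- Fallback in plain XSLT 1.0: the root elements alone -->
        <xsl:otherwise>
          <xsl:for-each select="/*[@foo='bar']">
            <document>
              <xsl:copy-of select="."/>
            </document>
          </xsl:for-each>
        </xsl:otherwise>
      </xsl:choose>
    </results>
  </xsl:template>

</xsl:transform>
```

The difference between the two branches is exactly the difference between retrieving a document's root element and retrieving the document as a whole.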
If support for XML fragments is included in the repository, this would not work as cleanly. For example, by concatenating two XML fragments together into one source tree, what originally was two text nodes would now be interpreted as one. In that case, the semantics of something like abc:document-of(/text()[1]) would need to be defined.
In any case, the need for an extension function shows that this is a possible limitation of XSLT's use as a query language. The problem of maintaining document delimiters is not addressed by XQuery either (apart from its use of the document() function which it borrows from XSLT). For most data-oriented applications, the coupling of processing instructions and comments with their original documents is probably not terribly important. For document-oriented applications, however, the ability to retrieve the document as a whole without knowing its name might be more important. For example, when a document is retrieved, any included stylesheet processing instructions could be used to render the document as it was originally intended.
XSLT and XPath currently have no schema-aware facilities. Since one of the W3C requirements for an XML query language[9] is that it operate on datatypes, this is one of XSLT's primary limitations as a query language. The XSLT 2.0[6] and XPath 2.0[7] requirements working drafts include a requirement to support XML Schema[10] datatypes, so this limitation will apparently be overcome.
A number of XSLT's current limitations became apparent in the previous comparison between XQuery and XSLT. As was demonstrated, these are not inherent limitations of XSLT that can't be overcome. Instead, they could easily be resolved in a future version of XSLT or XPath.
One of the chief strengths of XML is that it is not a new technology. As a formulation of the most widely used subset of SGML, more or less, its usefulness has been proven over time. In a similar way, XSLT, whose concepts are based on those of its predecessor, DSSSL[11], has a good deal of history behind it. The usefulness of these basic concepts for processing documents whose structure is unexpected, flexible, and recursive, has arguably been proven over time.
In the area of XML schemas, the W3C is not the only game in town. Other proposals such as RELAX[12], TREX[13], and Schematron[14], reveal multiple ways of approaching the problem of schemas. While there is experience with DTDs to draw from, the problems current proposals are addressing extend well beyond what DTDs are able to express. The fact that there is no real experience in this area, of course, does not mean that we should not proceed. (Indeed, the need for a schema language that includes datatypes is particularly clear in the context of an XML query language.) This lack of history does, however, suggest that we may not get it right the first time. What people will require in a schema language will only emerge as they gain experience using schema languages.
I believe this will also be the case for XML query languages. The W3C XML Query Working Group is drawing from extensive experience and knowledge in XML, databases, and information retrieval, but the concept of XML query is still in its infancy. What people will require in an XML query language will only become clear after they gain experience using XML query languages.
XSLT has been a W3C Recommendation for well over a year now. The XSLT user base is growing rapidly, as are implementations. The W3C only stands to benefit by promoting a reuse of its own technology. Insofar as XQuery leverages the XPath standard, this is good (and I recognize that its data model could be brought into harmony with XPath's). But a query language extends well beyond the selection technology provided by XPath. A query language also involves transformations--the kinds of transformations that are already enabled by XSLT.
I understand that some applications work better with different technologies. There was a great deal of controversy over XSL in its early days. The claim was that XSL merely duplicated functionality already provided by CSS and the DOM. At this point in time, however, it is well recognized that XSLT (particularly when considered separately from FO, which still raises controversy) is a very different kind of language, especially with respect to its declarative processing model. On the other hand, the differences between XQuery and XSLT are far too small to warrant W3C recommendation of them both. Both are declarative, functional programming languages, both operate on the same basic data model, and, insofar as an XML syntax is one of the W3C requirements for a query language, both may be expressed in XML.
At QL'98, over two years ago, the W3C XSL Working Group submitted one of the many position papers on the requirements for an XML query language[15]. It stated that a seed of a query language could be found in XSL:
The Extensible Stylesheet Language (XSL) has facilities that could serve as a basis for an XML query language. The XSL working group believes that it would be constructive for the W3C to first look in-house for technologies that might seed a W3C-endorsed query language. It is important to the working group that the W3C strive to maximize the reuse of technology within the W3C.[15]
In addition, it recognized that the problems of XML transformation and XML query are closely related:
The problems involved in querying XML are closely linked to transformation or result construction capabilities. The similarities between XML-QL and XSL suggest that these two proposals should cross-fertilize. The results of a collaboration along these lines could result in a powerful general purpose query and transformation mechanism for XML.[15]
The conclusion was that the development of an XML query language should be closely informed by and coordinated with the development of XSL:
The coordination group would either strive to ensure that a single query language meets the requirements of all working groups or that a common query model underlies all W3C query languages.[15]
In my opinion, these words are even more applicable now than when originally written. XSLT is a de facto standard, and the burden is on the W3C to respect this fact in its development of an XML query language.
If the W3C recommended one language for both XML transformations and XML query, or two languages built from the same semantic and syntactic base, the benefit for software developers would be great. XML has already begun to enable standardization in the training of people dealing primarily with documents and people dealing primarily with data processing. A unified XML query and transformation language would take great strides to further that benefit.
The XSL Working Group's position paper at QL'98[15] explains the issues well. Below is the text of the section entitled "Why Start with XSL?":
Suppose someone has an XML document and needs to create another XML document from it. If both XSL and the query language are capable of generating XML from XML, the person has a bit of a dilemma. Either technology would suffice. Let's say the person decides to go with XSL. A few weeks or months down the road this person may find that the query language was the proper choice and now must replace all occurrences of XSL stylesheets with XML queries. Substitute "person" with "W3C working group" and it becomes easy to see the dilemmas we could be creating for the W3C in the future.
This scenario suggests that the W3C should at least attempt to ensure that it recommends compatible technologies for similar functionality. Here are some reasons for borrowing technology from XSL:
When there are fewer standards for a given task, vendor support is less divided among the standards, and vendor products are more interoperable.
The fewer technologies users have to learn, the easier and faster it is for users to learn new products, and the less time and money companies have to spend educating users.
The W3C can get a head start by starting with related technologies that it already espouses.
XSL uses separate technologies for information retrieval and document construction, which allows the information retrieval mechanism to be used in places where construction is not required.
The W3C should be able to accrue the above benefits by using XSL technologies as the foundation of the query language and by building on this foundation to satisfy requirements that exceed XSL query requirements.[15]
In the long run, the XML Query Working Group is probably doing the right thing in first formally defining the semantics of the query language. To attain the sophistication of query optimization that we currently have with SQL, an XML query language's underlying mathematics must be well understood. But these semantics should not be developed in a vacuum. However well understood a particular set of semantics is, we will not truly understand which set of semantics is useful in an XML query language until people have built real applications involving XML query. This is the reason why XSLT should be seriously addressed: it is the most widely used and implemented XML query language yet.
[1] Extensible Markup Language (XML) 1.0. W3C Recommendation, Feb. 10, 1998. See http://www.w3.org/TR/1998/REC-xml-19980210.
[2] XSL Transformations (XSLT) Version 1.0. W3C Recommendation, Nov. 16, 1999. See http://www.w3.org/TR/1999/REC-xslt-19991116.
[3] XQuery: A Query Language for XML. W3C Working Draft, Feb. 15, 2001. See http://www.w3.org/TR/2001/WD-xquery-20010215.
[4] XML Path Language (XPath) Version 1.0. W3C Recommendation, Nov. 16, 1999. See http://www.w3.org/TR/1999/REC-xpath-19991116.
[5] XML Query Language (XQL). See http://www.w3.org/TandS/QL/QL98/pp/xql.html.
[6] XSLT Requirements Version 2.0. W3C Working Draft, Feb. 14, 2001. See http://www.w3.org/TR/2001/WD-xslt20req-20010214.
[7] XPath Requirements Version 2.0. W3C Working Draft, Feb. 14, 2001. See http://www.w3.org/TR/2001/WD-xpath20req-20010214.
[8] XSL Transformations (XSLT) Version 1.1. W3C Working Draft, Dec. 12, 2000. See http://www.w3.org/TR/2000/WD-xslt11-20001212/.
[9] XML Query Requirements. W3C Working Draft, Aug. 15, 2000. See http://www.w3.org/TR/2000/WD-xmlquery-req-20000815.
[10] XML Schema, Parts 0, 1, and 2. W3C Candidate Recommendation, Oct. 24, 2000. See http://www.w3.org/TR/xmlschema-0, 1, and 2.
[11] ISO/IEC DIS 10179.2:1994. Information Technology - Text and Office Systems - Document Style Semantics and Specification Language (DSSSL).
[12] ISO DIS 22250-1: Regular Language Description for XML (RELAX) - Part 1: RELAX Core. See http://www.xml.gr.jp/relax.
[13] "TREX - Tree Regular Expressions for XML". See http://thaiopensource.com/trex.
[14] "The Schematron: An XML Structure Validation Language using Patterns in Trees". See http://www.ascc.net/xml/schematron.
[15] The Query Language Position Paper of the XSL Working Group (Draft 11/18/98). See http://www.w3.org/TandS/QL/QL98/pp/xsl-wg-position.html.