Toward χίμαιραλ/superset

Background

Up to now, my approach to explore possible solutions to support JSON in XDM has been to evaluate the proposal to introduce maps in XSLT 3.0, leading to the χίμαιραλ proposal.

Finding out that in the current XSLT 3.0 Working Draft, JSON objects would be second class XDM citizens, I think that it’s worth considering other approaches and consider Jürgen Rennau’s UDL proposal has a very good starting point.

In their comments after Rennau’s presentation, John Cowan and Steven DeRose stressed out the need to forget about the serialization and focus on the data model, at least as a first step, when proposing alternative solutions.

Following their advise, I’d like to propose an update to core XDM 3.0 that would natively support JSON objects without introducing any new item kind.

Because this proposal is a chimera and because it’s a superset of the current XDM (any current XDM instance would be compatible with the updates I am proposing), I’ll call this proposal χίμαιραλ/superset.

The XDM carefully avoids to draw class models and prefers to specify nodes through nodes kinds and accessors. However, I think fair to say that elements are defined as nodes with three basic properties:

  • A mandatory name which is a QName.
  • A map of attributes which keys are QName.
  • An array of children nodes.

On the other hand, the JSON data model is composed of arrays, objects and atomic values. JSON atomic values can be numbers, strings, booleans or null. JSON key arrays can be any atomic values.

Traditional approaches to bind JSON and XML use XML element’s children to represent JSON object properties and UDL is no different in that matter.

There are obvious good reasons to do, the main one being that XML attribute values can only be atomic values while JSON object properties can also be maps or arrays.

However, the end result is that the interface mismatch resulting from binding JSON objects (which are maps) into XML children (which are arrays) is at the origin of most of the complexity of these proposals.

Proposal ( χίμαιραλ/superset)

Since XML elements already have both a map (of attributes) and an array (of children nodes), why not use these native features to bind JSON maps and arrays and just update the data model to remove the restrictions that make attribute maps and children arrays unfit for being bound respectively to JSON objects and arrays?

A JSON object would then just be bound to an XML element with attributes and without children nodes and a JSON array would be bound to an XML element with children nodes and without attribute.

This seems especially easy if we focus on the data model and postpone the (optional) definition of a serialization syntax.

First modification: elements can be anonymous

Neither JSON objects nor JSON arrays have name but XML elements have names and this name is mandatory.

To fix this issue, element names should become optional (in other words, we introduce the notion of anonymous elements).

Second modification: attribute names should also possibly be strings, booleans or null

If we bind JSON object keys on attribute names, it should be possible to use all the basic types that JSON accept for its keys.

Additionally, we may want to consider supporting other XML Schema simple types, possibly each of them.

To make this possible, the definition of the dm:node-name() accessor should be updated to return a typed values rather than a QName. This modification should concern attribute nodes at minima but for maximum homogeneity, we should probably extend that to other node types.

Third and last modification: attributes should have (optional) attributes and children

JSON object values can be objects and arrays and since objects are bound to attributes and arrays are bound to children, attributes should support both.

Mapping JSON into χίμαιραλ/superset

With these updates, binding JSON into XML become quite straightforward:

  • A JSON object is mapped into an anonymous element without children and one attribute per key/value pair.
  • A JSON array is mapped into an anonymous element without attribute and a child element per item.

Let’s take the now famous (at least on this blog) JSON snippet borrowed from the XSLT 3.0 Working Draft:

{ "accounting" : [
      { "firstName" : "John",
        "lastName"  : "Doe",
        "age"       : 23 },

      { "firstName" : "Mary",
        "lastName"  : "Smith",
        "age"       : 32 }
                 ],                                
  "sales"     : [
      { "firstName" : "Sally",
        "lastName"  : "Green",
        "age"       : 27 },

      { "firstName" : "Jim", 
        "lastName"  : "Galley",
        "age"       : 41 }
                  ]
}

Becomes:

  • Anonymous element without children and two attributes:
    • Attribute “accounting” (as a string) with no attributes and the two following children:
      • Anonymous element with no children and the three following attributes:
        • Attribute “firstName” (as a string) and a value “John” (as a string)
        • Attribute “lastName” (as a string) and a value “Doe” (as a string)
        • Attribute “age” (as a string) and a value 23 (as a number)
      • Anonymous element with no children and the three following attributes:
        • Attribute “firstName” (as a string) and a value “Mary” (as a string)
        • Attribute “lastName” (as a string) and a value “Smith” (as a string)
        • Attribute “age” (as a string) and a value 32 (as a number)
    • Attribute “sales” (as a string) with no attributes and the two following children:
      • Anonymous element with no children and the three following attributes:
        • Attribute “firstName” (as a string) and a value “Sally” (as a string)
        • Attribute “lastName” (as a string) and a value “Green” (as a string)
        • Attribute “age” (as a string) and a value 27 (as a number)
      • Anonymous element with no children and the three following attributes:
        • Attribute “firstName” (as a string) and a value “Jim” (as a string)
        • Attribute “lastName” (as a string) and a value “Galley” (as a string)
        • Attribute “age” (as a string) and a value 41 (as a number)

What do you think (comments very welcome)!

Share and Enjoy:
  • Identi.ca
  • StumbleUpon
  • del.icio.us
  • Facebook
  • Twitter
  • Add to favorites

Fleshing the XDM chimera

Note: this article is derived from the presentation I have given at Balisage (precedings). See also the slides (and video) used during this presentation.

Abstract

The XQuery and XPath Data Model 3.0 (XDM) is the kernel of the XML ecosystem. XDM had been extended with foreign item types to embrace new data sources such as JSON, taking the risk to become a chimera. This talk explores some ways to move this fundamental piece of the XML stack forward.


Motivation

Chimera (mythology): The Chimera (also Chimaera or Chimæra) (Greek: Χίμαιρα, Khimaira, from χίμαρος, khimaros, “she-goat”) was, according to Greek mythology, a monstrous fire-breathing female creature of Lycia in Asia Minor, composed of the parts of multiple animals: upon the body of a lioness with a tail that ended in a snake’s head, the head of a goat arose on her back at the center of her spine. The Chimera was one of the offspring of Typhon and Echidna and a sibling of such monsters as Cerberus and the Lernaean Hydra. The term chimera has also come to describe any mythical animal with parts taken from various animals and, more generally, an impossible or foolish fantasy.
Wikipedia
Chimera (genetics): A chimera or chimaera is a single organism (usually an animal) that is composed of two or more different populations of genetically distinct cells that originated from different zygotes involved in sexual reproduction. If the different cells have emerged from the same zygote, the organism is called a mosaic. Chimeras are formed from at least four parent cells (two fertilized eggs or early embryos fused together). Each population of cells keeps its own character and the resulting organism is a mixture of tissues.
Wikipedia

During her opening keynote at XML Prague 2012, speaking about the relation between XML, HTML, JSON and RDF, Jeni Tennison warned us against the temptation to create chimeras: [chimera are usually ugly, foolish or impossible fantasies].

The next morning, Michael Kay and Jonathan Robie came to present new features in XPath/XQuery/XSLT 3.0. A lot of these features are directly based on the XQuery and XPath Data Model 3.0 (aka XDM):

The XPath Data Model is the abstraction over which XPath expressions are evaluated. Historically, all of the items in the data model could be derived directly (nodes) or indirectly (typed values, sequences) from an XML document. However, as the XPath expression language has matured, new features have been added which require additional types of items to appear in the data model. These items have no direct XML serialization, but they are never the less part of the data model.

XDM 3.0 is composed of items from a number of different technologies:

  • Items from the XML Infoset (nodes, attributes, …)
  • Datatype information borrowed from the Post Schema Validation Infoset
  • Sequences
  • Atomic values
  • Functions that can also be used to model JSON arrays
A note Note
The feature that will be introduced to model JSON arrays is called “maps” and it will be specified as a XSLT feature in the XSLT 3.0 recommendation (not published yet). The XSLT 3.0 editor, Michael Kay has published an early version of this feature in his blog. In this paper, XDM 3.0 will refer to the XSLT 3.0 data model (the XPath 3.0 data model augmented with maps).

XDM 3.0 being a single data model composed of items from different data models, it is fair to say that it is a chimera!

Following Jeni Tennison on stage, I have tried to show that in a world where HTML 5 on one hand and JSON on the other hand are gaining traction, XML has become an ecosystem in a competitive environment and that it’s data model is a major competitive advantage.

Among other factors, the continued success of XML will thus come from its ability to seamlessly integrate other data models such as JSON.

If we follow this conclusion, we must admit that this chimera is essential to the future of XML and do our best to make it elegant and smart.

XML Data Models

Whether it’s a bug or a feature could be debated endlessly, but a remarkable feature of the XML recommendation it’s all about syntax and parsing rule and does not really define a data model. The big advantage is that everyone can find pretty much what he wants in XML documents but for the sake of this paper we need to choose a well known -and well defined- data model to work on.

The most common XML data model is probably the data model defined by the trio XPath/XSLT/XQuery known as “XDM” since XPath version 2.0 and that’s the one we will choose.

XDM version 3.0, still work in progress, will be the third version of this data model. It’s important to understand its design and evolution to use its most advanced features and we’ll start our prospective by a short history of its versions.

XPath/XSLT 1.0

The XPath 1.0 data model is described as being composed of seven types of nodes (root, elements, text, attributes, namespaces, processing instructions and comments).

The XSLT 1.0 data model is defined as being the XPath 1.0 data model with:

  • Relaxed constraints on root node children to support well-formed external general parsed entities that are not well formed XML documents
  • An additional “base URI” property on every node.
  • An additional “unparsed entities” property on the root node.

It’s fair to say that these two -very close- data models are completely focused on XML, but is that all?

Not entirely and these two specifications introduce other notions that should be considered as related to the data model even if they are not described in their sections called “Data Model”…

XSLT 1.0 inadvertently mentions the four basic XPath data-types (string, number, boolean, node-set) to explicitly add a fifth one: result tree fragments”.

These four basic data-types are implicitly defined in XPath 1.0 in its section about its function library but no formal description of these types is given.

XDML 2.0: XPath 2.0/XSLT 2.0/XQuery 1.0

In version 2.0, the XDM is promoted to get its own specification.

XDM 2.0 keeps the same seven types of nodes as XPath 1.0 and integrates the additions from the XSLT 1.0 data model. A number of properties are added to these nodes to capture information that had been left outside the data model by the previous version and also to support the data-type system from the PSVI (Post Schema Validation Infoset).

The term “data-type” or simply “type” being now used to refer to XML Schema data-types, a new terminology is introduced where the data model is composed of “information items” (or items) being either XML nodes or “atomic values”.

The concept of “sequences” is also introduced. Sequences are not strictly considered as items but play a very important role in XDM. They are defined as an ordered collection of zero or more items”.

The data model is thus now composed of three different concepts:

  • nodes
  • atomic values
  • sequences

XDM 2.0 notes that an important difference between nodes and atomic values is that only nodes have identities:

Each node has a unique identity. Every node in an instance of the data model is unique: identical to itself, and not identical to any other node. (Atomic values do not have identity; every instance of the value “5” as an integer is identical to every other instance of the value “5” as an integer.)
XQuery 1.0 and XPath 2.0 Data Model (XDM) (Second Edition)

This is a crucial distinction that divides the data model into two different kind of items (those which have an identity and those which haven’t one). Let’s take an example:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <foo>5</foo>
    <foo>5</foo>
    <bar foo="5">
        <foo>5</foo>
    </bar>
</root>

The three <foo>5</foo> look similar and can be considered “deeply equal” but they are three different elements with three different identities. This is needed because some of their properties are different: the parent of the first two is <root/> while the parent of the third one is <bar/>, the preceding sibling of the second one is the first one while the first one has no preceeding sibling, …

The three “5” text nodes are similar but they still are different text nodes with different identities and this is necessary because they don’t have the same parent elements.

By contrast, the atomic values of the three <foo/> element (and the atomic value of the @foo attribute) are the same atomic value, the “5” (assuming they have all been declared with the same datatype). Among many other things, this means that when you manipulate their values, you can’t access back to the node that is holding the value).

XDM 3.0: XPath 3.0/XSLT 3.0/XQuery 3.0

A note Note
These specifications are still work on progress, currently divided between XQuery and XPath Data Model 3.0and data model extensions described in XSL Transformations (XSLT) Version 3.0.

XDM 3.0 adds functions as a third kind of items, transforming XQuery and XSLT into functional languages.

Like atomic values, functions have no identity:

Functions have no identity, cannot be compared, and have no serialization.
XQuery and XPath Data Model 3.0 – W3C Working Draft 13 December

2011

XSLT 3.0 adds to XDM 3.0 a fourth king of items: maps, derived from functions which, among many other use cases, can be used to model JSON objects:

Like atomic values and functions (from which they are derived), maps have no identity:

Like sequences, maps have no identity. It is meaningful to compare the contents of two maps, but there is no way of asking whether they are “the same map”: two maps with the same content are indistinguishable.
XSL Transformations (XSLT) Version 3.0 – W3C Working Draft 10 July 2012
A note Note
In this statement, the specification does acknowledge that sequences have no identity either. This is understandable but didn’t seem to be clearly specified elsewhere.

Of course, XSLT 3.0 is also adding functions to create, manipulate maps and serialize/deserialize them as JSON and a syntax to define map literals. It does not any new pattern to select of match maps or map entries, though.

Identity Crisis

Appolonius’ ship is a beautiful ship. Over the years it has been repaired so many times that there is not a single piece of the original materials remaining. The question is, therefore, is it really still Appolonius’ ship?
ObjectIdentity on c2.com

Object identity is often confused with mutability. The need for objects to have identities is more obvious when they are mutable, their identities being then used to track them despite their changes like Appolonius’ ship. However, XDM 3.0 gives us a good opportunity to explore the meaning and consequences of having (or not having) an identity for immutable object structures.

The definition of node identity in XDM 3.0 is directly copied from XDM 2.0:

Each node has a unique identity. Every node in an instance of the data model is unique: identical to itself, and not identical to any other node. (Atomic values do not have identity; every instance of the value “5” as an integer is identical to every other instance of the value “5” as an integer.)
XQuery and XPath Data Model 3.0 – W3C Working Draft 13 December 2011

I find this definition confusing:

  • Why should the value “5” as an integer be instantiated and why should we care? The value “5” as an integer is… the value “5” as an integer! It’s unique and being unique, doesn’t it have an identity?
  • A node, with all the properties defined in XDM (including its document-uri and parent accessors) would be unique if it had “previous-sibling” or “document-order” accessors.
A note Note
To find the previous siblings of a node relying only on the accessors defined in XDM (2.0 or 3.0), you’d have to access to the node’s parent and loop over it’s children until you find the current node that you would identify as such by checking its identity.

Rather than focussing on uniqueness, which for immutable information items does not really matter, a better differentiation could be between information items which have enough context information to “know where they belong” in the data model and those which don’t.

This differentiation has the benefit of highlighting the consequences of having or not having an identity: to be able to navigate between an information item and its ancestors or sibling this item must know where it belongs. When that’s not the case, it is still be possible to navigate between the item and its descendants but axis such as ancestor:: or sibling:: are not available.

A note Note
Identity can be seen as the price to pay for the ancestor:: and sibling:: axis.

Let’s take back a simple example:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <foo>5</foo>
    <foo>5</foo>
    <bar> 
        <foo>5</foo>
    </bar>
</root>

In an hypothetical data model where nodes have no identity, there would be only 3 elements:

  • The root element
  • The bar element
  • The foo element (referred twice has children of root end once as child of bar)

If we add identity (or context information) properties, the foo elements become three information different items since they defer by these properties.

The process of adding these properties to an information item looks familiar. Depending on your background, you can compare it to:

  • class/object instantiation in class based Object Oriented Programming
  • clones in prototype based Object Oriented Programming
  • RDF reification.

We’ve seen that XDM 3.0 acknowledges this difference between information items which have context information and those which don’t have. I don’t want to deny that both types of data models have their use cases: there are obviously many use cases where context information is needed and use cases where lightweight structures are a better fit.

That being said, if we are serious about the support of JSON in XDM, we should offer the same features to access data whether this data is stored in maps or in XML nodes.

Let’s consider this JSON object borrowed from the XSLT 3.0 Working Draft:

{ "accounting" : [ 
      { "firstName" : "John", 
        "lastName"  : "Doe",
        "age"       : 23 },

      { "firstName" : "Mary", 
        "lastName"  : "Smith",
        "age"       : 32 }
                 ],                                 
  "sales"     : [ 
      { "firstName" : "Sally", 
        "lastName"  : "Green",
        "age"       : 27 },

      { "firstName" : "Jim",  
        "lastName"  : "Galley",
        "age"       : 41 }
                  ]
}

This object could be represented in XML by the following document:

<?xml version="1.0" encoding="UTF-8"?>
<company>
    <department name="sales">
        <employee>
            <firstName>Sally</firstName>
            <lastName>Green</lastName>
            <age>27</age>
        </employee>
        <employee>
            <firstName>Jim</firstName>
            <lastName>Galley</lastName>
            <age>41</age>
        </employee>
    </department>
    <department name="accounting">
        <employee>
            <firstName>John</firstName>
            <lastName>Doe</lastName>
            <age>23</age>
        </employee>
        <employee>
            <firstName>Mary</firstName>
            <lastName>Smith</lastName>
            <age>32</age>
        </employee>
    </department>
</company>

The features introduced in the latest XSLT 3.0 Working Draft do allow to transform rather easily from one model to the other, but these two models do not have, bar far, the same features.

In the XML flavor, when the context item is the employee “John Doe”, you can easily find out what his department is because this is an element and element do carry context information.

In the map flavor by contrast when the context item is an employee map, this object has no context information and you can’t tell which is his department without looping within the containing map.

This important restriction is at a purely data model level. It is aggravated by the XPath syntax has not been extended to generalize axis so that they can work with maps. If I work with the XML version of this structure, it’s obvious to evaluate things such as the number of employees, the average age of employees, the number of departments, the number of employees by department, the average age by department, obvious to find out if there is an employee called “Mary Smith” in one of the departments, the employees who are more than 40, to get a list of employees from all the department sorted by age, … In the map flavor by contrast, I don’t have any XPath axis available and must do all these operations using a limited number of map functions (map:keys(), map:contains(), map:get()). In other words, while I can use XPath expressions with the XML version, I must use DOM like operations to access the map version!

To summarize, yes XDM 3.0 does support JSON but to do pretty much anything interesting with JSON objects, you’d better transform them into XML nodes first! XSLT 3.0 does give you the tools to do this transformation quite easily but the message to JSON users is that we don’t treat their data model as a first class citizen.

To make it worse, XPath is used by many other specifications, within and outside the W3C and the level of support for JSON provided by XDM and XPath will determine how these specifications will be able to support for JSON. Specifications that are impacted by this issue include XForms, XProc and Schematron. Supporting JSON would be really useful for these three specifications if and only if map items could have the same features than nodes.

Furthermore, the same asymmetry exists when you went to create these two structures from other sources: to create the XML structure you can use sequence constructors but to create the map structure, you have to use the map:new() and map:item() functions.

My proposal to solve this issue is:

  • To acknowledge the fact that any type of information item can be either “context independent” or include context information and explore the consequences of thisstatement.
  • To generalize XPath axis so that they can be used with map items.
  • To create sequence constructors for maps and map entries.

You are welcome to discuss this further:

Introducing χίμαιραλ (chimeral), the Chimera Language

When I started to work on χίμαιραλ a few months ago, my first motivation was to propose an XDM serialization for maps which would turn the rather abstract prose from the specification into concrete angle brackets that you could see and read.

The exercise has been very instructive and helped me a lot to understand the spec, however a more ambitious use pattern has emerged while I was making progress. The XSLT 3.0 Working Draft is part of a batch of Working Drafts which are far more advanced. My proposals to solve the “map identity crisis” are probably too intrusive and too late to be taken into account and the batch of specifications will most probably carry on with the current proposal.

If that’s the case, we’ve seen that it makes a lot of sense to convert maps into nodes to enable to use XPath axis and χίμαιραλ provides a generic target format for these conversions.

Example

Let’s take again the JSON object borrowed from the XSLT 3.0 Working Draft:

{ "accounting" : [ 
      { "firstName" : "John", 
        "lastName"  : "Doe",
        "age"       : 23 },

      { "firstName" : "Mary", 
        "lastName"  : "Smith",
        "age"       : 32 }
                 ],                                 
  "sales"     : [ 
      { "firstName" : "Sally", 
        "lastName"  : "Green",
        "age"       : 27 },

      { "firstName" : "Jim",  
        "lastName"  : "Galley",
        "age"       : 41 }
                  ]
}

Its χίμαιραλ representation is:

<?xml version="1.0" encoding="UTF-8"?>
<χ:data-model xmlns:χ="http://χίμαιραλ.com#">
    <χ:map>
        <χ:entry key="sales" keyType="string">
            <χ:map>
                <χ:entry key="1" keyType="number">
                    <χ:map>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Green</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">27</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">Sally</χ:atomic-value>
                        </χ:entry>
                    </χ:map>
                </χ:entry>
                <χ:entry key="2" keyType="number">
                    <χ:map>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Galley</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">41</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">Jim</χ:atomic-value>
                        </χ:entry>
                    </χ:map>
                </χ:entry>
            </χ:map>
        </χ:entry>
        <χ:entry key="accounting" keyType="string">
            <χ:map>
                <χ:entry key="1" keyType="number">
                    <χ:map>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Doe</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">23</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">John</χ:atomic-value>
                        </χ:entry>
                    </χ:map>
                </χ:entry>
                <χ:entry key="2" keyType="number">
                    <χ:map>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Smith</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">32</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">Mary</χ:atomic-value>
                        </χ:entry>
                    </χ:map>
                </χ:entry>
            </χ:map>
        </χ:entry>
    </χ:map>
</χ:data-model>

Granted, it’s much more verbose than the JSON version, but it’s the exact translation of the XDM corresponding to the JSON object in XML.

χίμαιραλ In a Nutshell

The design goals are:

  • Be as close as possible to the XDM and its terminology
  • Represent XML nodes as… XML nodes
  • Allow round-trips (an XDM model serialized as χίμαιραλ should give a XDM model identical to the original one when de-serialized)
  • Be easy to process using XPath/XQuery/XSLT
  • Support of the PSVI is not a goal

χίμαιραλ is not the only proposal to serialize XDM as XML. Two other notable ones are:

  • Zorba’s XDM serializationis a straight andaccurate XDM serialization which does support PSVI annotations. As a consequence, nodes are serialized as xdm:*elements (an element is anxdm:element, an attribute an xdm:attribute element, …). This does’n meet by second requirement to represent nodes as themselves.
  • XDML, presented by Rennau, Hans-Jürgen, and David A. Lee atBalisage 2011 is more than just an XDM serialization and also includes manipulation and processing definitions. It introduces its own terminology and concepts and is toofar away from XDM for my design goals.

A lot of attention has been given to the first design goal: the structure of a χίμαιραλ model and the name of its elements and attributes are directly derived from the specifications.

In XDM, map entries’ values can be arrays (an array beeing nothing else than a map with integer keys) but also sequences (which is not possible in JSON). χίμαιραλ respects the fact that in XDM there is no difference between a sequence composed of a single element and represents sequences by a repetition of values.

The map map{1:= 'foo'} is serialized as:

<χ:data-model xmlns:χ="http://χίμαιραλ.com#">
   <χ:map>
      <χ:entry key="1" keyType="number">
         <χ:atomic-value type="string">foo</χ:atomic-value>
      </χ:entry>
   </χ:map>
</χ:data-model>

And the map map{1:= ('foo', 'bar')} is serialized as:

<χ:data-model xmlns:χ="http://χίμαιραλ.com#">
   <χ:map>
      <χ:entry key="1" keyType="number">
         <χ:atomic-value type="string">foo</χ:atomic-value>
         <χ:atomic-value type="string">bar</χ:atomic-value>
      </χ:entry>
   </χ:map>
</χ:data-model>

We’ve seen that XDM makes a clear distinction between nodes which have identities and other item types (atomic values, functions and maps) which haven’t. XDM allows to use nodes as map entry values. χίμαιραλ allows this feature too, but copying the nodes would create new nodes with different identities.

To avoid that, documents to which these nodes belong are copied into χ:instance elements and references between map entries values and instances are made using XPath expressions.

The following $map variable:

<xsl:variable name="a-node">
    <foo/>
</xsl:variable>
<xsl:variable name="map" select="map{'a-node':= $a-node}"/>

Is serialized as:

<χ:data-model xmlns:χ="http://χίμαιραλ.com#">
   <χ:instance id="d4" kind="document">
      <foo/>
   </χ:instance>
   <χ:map>
      <χ:entry key="a-node" keyType="string">
         <χ:node kind="document" instance="d4" path="/"/>
      </χ:entry>
   </χ:map>
</χ:data-model>

Like XSLT variable, instances do not always contain document nodes and the following $map variable:

<xsl:variable name="a-node" as="node()">
    <foo/>
</xsl:variable>
<xsl:variable name="map" select="map{'a-node':= $a-node}"/>

Is serialized as:

<χ:data-model xmlns:χ="http://χίμαιραλ.com#">
   <χ:instance id="d4e0" kind="fragment">
      <foo/>
   </χ:instance>
   <χ:map>
      <χ:entry key="a-node" keyType="string">
         <χ:node kind="element" instance="d4e0" path="root()" name="foo"/>
      </χ:entry>
   </χ:map>
</χ:data-model>

Nodes can belong to more than one instances, and this $map variable:

<xsl:variable name="a-node" as="node()*">
    <foo/>
    <bar/>
</xsl:variable>
<xsl:variable name="map" select="map{'a-node':= $a-node}"/>

Is serialized as:

<χ:data-model xmlns:χ="http://χίμαιραλ.com#">
   <χ:instance id="d4e0" kind="fragment">
      <foo/>
   </χ:instance>
   <χ:instance id="d4e3" kind="fragment">
      <bar/>
   </χ:instance>
   <χ:map>
      <χ:entry key="a-node" keyType="string">
         <χ:node kind="element" instance="d4e0" path="root()" name="foo"/>
         <χ:node kind="element" instance="d4e3" path="root()" name="bar"/>
      </χ:entry>
   </χ:map>
</χ:data-model>

Nodes can be “deep linked”, a same node can be linked several times and nodes can be mixed with atomic values at wish. The following $map variable:

<xsl:variable name="doc">
    <department name="sales">
        <employee>
            <firstName>Sally</firstName>
            <lastName>Green</lastName>
            <age>27</age>
        </employee>
        <employee>
            <firstName>Jim</firstName>
            <lastName>Galley</lastName>
            <age>41</age>
        </employee>
    </department>
    <department name="accounting">
        <employee>
            <firstName>John</firstName>
            <lastName>Doe</lastName>
            <age>23</age>
        </employee>
        <employee>
            <firstName>Mary</firstName>
            <lastName>Smith</lastName>
            <age>32</age>
        </employee>
    </department>
</xsl:variable>
<xsl:variable name="map"
    select="map{
            'sales' := $doc/department[@name='sales'],
            'Sally' := $doc//employee[firstName = 'Sally'],
            'kids'  := $doc//employee[age &lt; 30],
            'dep-names-attributes' := $doc/department/@name,
            'dep-names' := for $name in $doc/department/@name return string($name)
            }"/>

Is serialized as:

<χ:data-model xmlns:χ="http://χίμαιραλ.com#">
   <χ:instance id="d4" kind="document">
      <department name="sales">
         <employee>
            <firstName>Sally</firstName>
            <lastName>Green</lastName>
            <age>27</age>
         </employee>
         <employee>
            <firstName>Jim</firstName>
            <lastName>Galley</lastName>
            <age>41</age>
         </employee>
      </department>
      <department name="accounting">
         <employee>
            <firstName>John</firstName>
            <lastName>Doe</lastName>
            <age>23</age>
         </employee>
         <employee>
            <firstName>Mary</firstName>
            <lastName>Smith</lastName>
            <age>32</age>
         </employee>
      </department>
   </χ:instance>
   <χ:map>
      <χ:entry key="sales" keyType="string">
         <χ:node kind="element"
                 instance="d4"
                 path="/&#34;&#34;:department[1]"
                 name="department"/>
      </χ:entry>
      <χ:entry key="Sally" keyType="string">
         <χ:node kind="element"
                 instance="d4"
                 path="/&#34;&#34;:department[1]/&#34;&#34;:employee[1]"
                 name="employee"/>
      </χ:entry>
      <χ:entry key="kids" keyType="string">
         <χ:node kind="element"
                 instance="d4"
                 path="/&#34;&#34;:department[1]/&#34;&#34;:employee[1]"
                 name="employee"/>
         <χ:node kind="element"
                 instance="d4"
                 path="/&#34;&#34;:department[2]/&#34;&#34;:employee[1]"
                 name="employee"/>
      </χ:entry>
      <χ:entry key="dep-names-attributes" keyType="string">
         <χ:node kind="attribute"
                 instance="d4"
                 path="/&#34;&#34;:department[1]/@name"
                 name="name">sales</χ:node>
         <χ:node kind="attribute"
                 instance="d4"
                 path="/&#34;&#34;:department[2]/@name"
                 name="name">accounting</χ:node>
      </χ:entry>
      <χ:entry key="dep-names" keyType="string">
         <χ:atomic-value type="string">sales</χ:atomic-value>
         <χ:atomic-value type="string">accounting</χ:atomic-value>
      </χ:entry>
   </χ:map>
</χ:data-model>

Remaining Issues

A collation property should be added to <χ:map/>, probably as an attribute, the transformation to serialize to χίμαιραλ should be cleaned up and the reverse transformation should be implemented.

These are pretty trivial issues and the biggest one is probably to find a way to cleanly serialize references to nodes that are not contained within an element, such as the following $map variable:

<xsl:variable name="attribute" as="node()">
    <xsl:attribute name="foo">bar</xsl:attribute>
</xsl:variable>
<xsl:variable name="map"
    select="map{
            'attribute' := $attribute
            }"/>

Support of functions should also be considered.

χίμαιραλ and the identity crisis

To some extend, χίμαιραλ can be considered as a solution to the XDM identity crisis:

  • Serializing an XDM model as χίμαιραλ creates elements for maps, map entries and atomic values and these elements, being nodes, have identities. The serialization istherefore also an instantiation of XDM information items as defined above.
  • De-serializing a χίμαιραλ to create an XDM data model is also a de-instantiation– except of course that the identity of XML nodes is not “removed”.

However, χίμαιραλ does keep a strong difference between nodes which are kept in <χ:instance> elements and maps and atomic values.

Moving the chimera forward

χίμαιραλ is a good playground to explore the new possibilities offered by XDM 3.0. Here is a (non exhaustive) list of a few directions that seem interesting…

A note Note
Don’t expect to find fully baked proposals in this section which contains, on the contrary very early drafts of ideas to follow to support XDM maps as “first class citizens”!

Embracing RDF

If you had the opportunity to enjoy the sunny weather of Orlando in December 2001, you may remember “The Syntactic Web” a provocative talk where Jonathan Robie has shown how XQuery 1.0 could be used to query normalized XML/RDF documents.

The gap between RDF triples and the versatility of its XML representation was a big issue, but the new features brought by this new version of the XPath/XQuery/XSLT package should help us.

The basic data model of RDF is based on triples, a triple being a composed of a subject, a predicate and an object. In XDM, a triple can now be represented by either a sequence, an array or a map of three items.

XDM sequences have the property that they cannot include other sequences and representing triples as sequences would mean that you couldn’t define sequences of triples. For that reason it is probably better to define triples as maps or arrays. An array being a map indexed by integers, that doesn’t make a huge difference at a conceptual level, but I find it cleaner to access to the subject of a triple using a QName (such as rdf:subject) rather than an index. Following this principle, we could define a triple as:

map {
    xs:QName('rdf:subject')   := xs:anyURI('http://www.example.org/index.html'),
    xs:QName('rdf:predicate') := xs:anyURI('http://purl.org/dc/elements/1.1/creator'),
    xs:QName('rdf:object')    := xs:anyURI('http://www.example.org/staffid/85740')
}

The χίμαιραλ serialization of this map is:

<χ:data-model xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:χ="http://χίμαιραλ.com#"
              xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   <χ:map>
      <χ:entry key="rdf:object"
               keyType="xs:QName">
         <χ:atomic-value type="xs:anyURI">http://www.example.org/staffid/85740</χ:atomic-value>
      </χ:entry>
      <χ:entry key="rdf:predicate"
               keyType="xs:QName">
         <χ:atomic-value type="xs:anyURI">http://purl.org/dc/elements/1.1/creator</χ:atomic-value>
      </χ:entry>
      <χ:entry key="rdf:subject"
               keyType="xs:QName">
         <χ:atomic-value type="xs:anyURI">http://www.example.org/index.html</χ:atomic-value>
      </χ:entry>
   </χ:map>
</χ:data-model>

What can we do with such triples? Using higher order functions, it should not be too difficult to define triple stores with basic query features!

Is this lightweight enough? Or does RDF support deserve new information item types to be supported by XDM?

Syntactical sugar

We’ve seen that this JSON object

{ "accounting" : [ 
      { "firstName" : "John", 
        "lastName"  : "Doe",
        "age"       : 23 },

      { "firstName" : "Mary", 
        "lastName"  : "Smith",
        "age"       : 32 }
                 ],                                 
  "sales"     : [ 
      { "firstName" : "Sally", 
        "lastName"  : "Green",
        "age"       : 27 },

      { "firstName" : "Jim",  
        "lastName"  : "Galley",
        "age"       : 41 }
                  ]
}

Is serialized in χίμαιραλ as:

<?xml version="1.0" encoding="UTF-8"?>
<χ:data-model xmlns:χ="http://χίμαιραλ.com#">
    <χ:map>
        <χ:entry key="sales" keyType="string">
            <χ:map>
                <χ:entry key="1" keyType="number">
                    <χ:map>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Green</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">27</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">Sally</χ:atomic-value>
                        </χ:entry>
                    </χ:map>
                </χ:entry>
                <χ:entry key="2" keyType="number">
                    <χ:map>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Galley</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">41</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">Jim</χ:atomic-value>
                        </χ:entry>
                    </χ:map>
                </χ:entry>
            </χ:map>
        </χ:entry>
        <χ:entry key="accounting" keyType="string">
            <χ:map>
                <χ:entry key="1" keyType="number">
                    <χ:map>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Doe</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">23</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">John</χ:atomic-value>
                        </χ:entry>
                    </χ:map>
                </χ:entry>
                <χ:entry key="2" keyType="number">
                    <χ:map>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Smith</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">32</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">Mary</χ:atomic-value>
                        </χ:entry>
                    </χ:map>
                </χ:entry>
            </χ:map>
        </χ:entry>
    </χ:map>
</χ:data-model>

We can work with that, but wouldn’t it be nice if we had a native syntax that does not use XML elements and attributes to represent maps?

Depending on the requirements, many approaches are possible.

A first option would be to define pluggable notation parsers within XML and write:

<χ:notation mediatype="application/json"><![CDATA[
{ "accounting" : [ 
      { "firstName" : "John", 
        "lastName"  : "Doe",
        "age"       : 23 },

      { "firstName" : "Mary", 
        "lastName"  : "Smith",
        "age"       : 32 }
                 ],                                 
  "sales"     : [ 
      { "firstName" : "Sally", 
        "lastName"  : "Green",
        "age"       : 27 },

      { "firstName" : "Jim",  
        "lastName"  : "Galley",
        "age"       : 41 }
                  ]
}                  
]]></χ:notation>

The meaning of the <χ:notation/> element would be to trigger a parser supporting the application/json datatype. This is less verbose, more natural to JSON users, but doesn’t allow to add XML nodes in maps or sequences.

Another direction would be to extend the syntax of XML itself. To do so, again, there are many possibilities. The markup in XML is based on angle brackets and the distinction between the different XML productions is usually done through the character following the bracket in the opening tags.

This principle leaves a lot of possibilities. For instance, maps could be identified by the tags <{> and </}> to follow the characters used by XDM map literals and JSON objects:

<χ:data-model>
    <{>
        <χ:entry key="sales" keyType="string">
            <{>
                <χ:entry key="1" keyType="number">
                    <{>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Green</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">27</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">Sally</χ:atomic-value>
                        </χ:entry>
                    </}>
                </χ:entry>
                <χ:entry key="2" keyType="number">
                    <{>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Galley</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">41</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">Jim</χ:atomic-value>
                        </χ:entry>
                    </}>
                </χ:entry>
            </}>
        </χ:entry>
        <χ:entry key="accounting" keyType="string">
            <{>
                <χ:entry key="1" keyType="number">
                    <{>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Doe</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">23</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">John</χ:atomic-value>
                        </χ:entry>
                    </}>
                </χ:entry>
                <χ:entry key="2" keyType="number">
                    <{>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Smith</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">32</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">Mary</χ:atomic-value>
                        </χ:entry>
                    </}>
                </χ:entry>
            </}>
        </χ:entry>
    </}>
</χ:data-model>

Map entries are not ordered and in that respect they are similar to XML attributes. We could use this similarity and use the character @ to identify map entries:

<χ:data-model>
    <{>
        <@"sales" keyType="string">
            <{>
                <@"1" keyType="number">
                    <{>
                        <@"lastName" keyType="string">
                            <χ:atomic-value type="string">Green</χ:atomic-value>
                        </@"lastName">
                        <@"age" keyType="string">
                            <χ:atomic-value type="number">27</χ:atomic-value>
                        </@"age">
                        <@"firstName" keyType="string">
                            <χ:atomic-value type="string">Sally</χ:atomic-value>
                        </@"firstName">
                    </}>
                </@"1">
                <@"2" keyType="number">
                    <{>
                        <@"lastName" keyType="string">
                            <χ:atomic-value type="string">Galley</χ:atomic-value>
                        </@"lastName">
                        <@"age" keyType="string">
                            <χ:atomic-value type="number">41</χ:atomic-value>
                        </@"age">
                        <@"firstName" keyType="string">
                            <χ:atomic-value type="string">Jim</χ:atomic-value>
                        </@"firstName">
                    </}>
                </@"2">
            </}>
        </@"sales">
        <@"accounting" keyType="string">
            <{>
                <@"1" keyType="number">
                    <{>
                        <@"lastName" keyType="string">
                            <χ:atomic-value type="string">Doe</χ:atomic-value>
                        </@"lastName">
                        <@"age" keyType="string">
                            <χ:atomic-value type="number">23</χ:atomic-value>
                        </@"age">
                        <@"firstName" keyType="string">
                            <χ:atomic-value type="string">John</χ:atomic-value>
                        </@"firstName">
                    </}>
                </@"1">
                <@"2" keyType="number">
                    <{>
                        <@"lastName" keyType="string">
                            <χ:atomic-value type="string">Smith</χ:atomic-value>
                        </@"lastName">
                        <@"age" keyType="string">
                            <χ:atomic-value type="number">32</χ:atomic-value>
                        </@"age">
                        <@"firstName" keyType="string">
                            <χ:atomic-value type="string">Mary</χ:atomic-value>
                        </@"firstName">
                    </}>
                </@"2">
            </}>
        </@"accounting">
    </}>
</χ:data-model>

The key names have been enclosed between quotes because map keys can include any character including whitespaces, but they can be made optional when they are not needed. We could also give to the keyType a default value of “string”:

<χ:data-model>
    <{>
        <@sales>
            <{>
                <@1 keyType="number">
                    <{>
                        <@lastName>
                            <χ:atomic-value type="string">Green</χ:atomic-value>
                        </@lastName
                        <@age>
                            <χ:atomic-value type="number">27</χ:atomic-value>
                        </@age
                        <@firstName>
                            <χ:atomic-value type="string">Sally</χ:atomic-value>
                        </@firstName
                    </}>
                </@1
                <@2 keyType="number">
                    <{>
                        <@lastName>
                            <χ:atomic-value type="string">Galley</χ:atomic-value>
                        </@lastName
                        <@age>
                            <χ:atomic-value type="number">41</χ:atomic-value>
                        </@age
                        <@firstName>
                            <χ:atomic-value type="string">Jim</χ:atomic-value>
                        </@firstName
                    </}>
                </@2
            </}>
        </@sales
        <@accounting>
            <{>
                <@1 keyType="number">
                    <{>
                        <@lastName>
                            <χ:atomic-value type="string">Doe</χ:atomic-value>
                        </@lastName
                        <@age>
                            <χ:atomic-value type="number">23</χ:atomic-value>
                        </@age
                        <@firstName>
                            <χ:atomic-value type="string">John</χ:atomic-value>
                        </@firstName
                    </}>
                </@1
                <@2 keyType="number">
                    <{>
                        <@lastName>
                            <χ:atomic-value type="string">Smith</χ:atomic-value>
                        </@lastName
                        <@age>
                            <χ:atomic-value type="number">32</χ:atomic-value>
                        </@age
                        <@firstName>
                            <χ:atomic-value type="string">Mary</χ:atomic-value>
                        </@firstName
                    </}>
                </@2
            </}>
        </@accounting
    </}>
</χ:data-model>

Atomic values could be identified by <=> and </=> and the same default value applied to its type attribute:

<χ:data-model>
    <{>
        <@sales>
            <{>
                <@1 keyType="number">
                    <{>
                        <@lastName>
                            <=>Green</=>
                        </@lastName>
                        <@age>
                            <= type="number">27</=>
                        </@age>
                        <@firstName>
                            <=>Sally</=>
                        </@firstName>
                    </}>
                </@1>
                <@2 keyType="number">
                    <{>
                        <@lastName>
                            <=>Galley</=>
                        </@lastName>
                        <@age>
                            <= type="number">41</=>
                        </@age>
                        <@firstName>
                            <=>Jim</=>
                        </@firstName>
                    </}>
                </@2>
            </}>
        </@sales>
        <@accounting>
            <{>
                <@1 keyType="number">
                    <{>
                        <@lastName>
                            <=>Doe</=>
                        </@lastName>
                        <@age>
                            <= type="number">23</=>
                        </@age>
                        <@firstName>
                            <=>John</=>
                        </@firstName>
                    </}>
                </@1>
                <@2 keyType="number">
                    <{>
                        <@lastName>
                            <=>Smith</=>
                        </@lastName>
                        <@age>
                            <= type="number">32</=>
                        </@age>
                        <@firstName>
                            <=>Mary</=>
                        </@firstName>
                    </}>
                </@2>
            </}>
        </@accounting>
    </}>
</χ:data-model>

The tags that surround atomic values are useful when these values are within a sequence but look superfluous when the item has a single value. The next step could be to define that in that case as a shortcut the value and its type attribute could be directly included in the item:

<χ:data-model>
    <{>
        <@sales>
            <{>
                <@1 keyType="number">
                    <{>
                        <@lastName>Green</@lastName>
                        <@age type="number">27</@age>
                        <@firstName>Sally</@firstName>
                    </}>
                </@1>
                <@2 keyType="number">
                    <{>
                        <@lastName>Galley</@lastName>
                        <@age type="number">41</@age>
                        <@firstName>Jim</@firstName>
                    </}>
                </@2>
            </}>
        </@sales>
        <@accounting>
            <{>
                <@1 keyType="number">
                    <{>
                        <@lastName>Doe</@lastName>
                        <@age type="number">23</@age>
                        <@firstName>John</@firstName>
                    </}>
                </@1>
                <@2 keyType="number">
                    <{>
                        <@lastName>Smith</@lastName>
                        <@age type="number">32</@age>
                        <@firstName>Mary</@firstName>
                    </}>
                </@2>
            </}>
        </@accounting>
    </}>
</χ:data-model>

XPath

The χίμαιραλ serialization being XML, it is possible to use XPath path expressions to query its structure. For instance, to get a list of employees which are less than 30, we can

write:

χ:map/χ:entry/χ:map/χ:entry/χ:map[χ:entry[@key='age'][χ:atomic-value < 30]]

Or, if we’re feeling lucky:

//χ:map[χ:entry[@key='age'][χ:atomic-value < 30]]

Again, that’s good as long we work on a χίμαιραλ serialization but it would be good to be able to use path expressions directly on map data structures. To do so we would need at minima to define steps to match maps and entries.

XSLT 3.0 introduces a new map() item type which could be used as a kind test to identify maps.

If we follow the idea that map entries are similar to XML attributes, we could use the @ notation to identify them. The XPath expression would then become:

map()/@*/map()/@*/map()[@age < 30]]

Or, if we’re feeling lucky:

//map()[@age < 30]]

Validation

These data models can be complex. Wouldn’t it be useful to be able to validate them with schema languages? This would give us a way to validate JSON maps!

Of course we can already serialize them in χίμαιραλ and validate the serialization using any schema language, but again it would be good to be able to validate these structures directly.

A RELAX NG schema to validate the χίμαιραλ serialization of our example would be:

namespace χ = "http://χίμαιραλ.com#"

start = element χ:data-model { top-level-map }

# Top level map: departments
top-level-map =
    element χ:map {
        element χ:entry {
            attribute key { xsd:NMTOKEN },
            attribute keyType { "string" },
            emp-array
        }*
    }

# List of employees
emp-array =
    element χ:map {
        element χ:entry {
            attribute key { xsd:positiveInteger },
            attribute keyType { "number" },
            emp-map
        }*
    }

# Description of an employee
emp-map = element χ:map { (age | firstName | lastName) + }

age =
    element χ:entry {
        attribute key { "age" },
        attribute keyType { "string" },
        element χ:atomic-value {
            attribute type { "number" },
            xsd:positiveInteger
        }
    }

firstName =
    element χ:entry {
        attribute key { "firstName" },
        attribute keyType { "string" },
        element χ:atomic-value {
            attribute type { "string" },
            xsd:token
        }
    }

lastName =
    element χ:entry {
        attribute key { "lastName" },
        attribute keyType { "string" },
        element χ:atomic-value {
            attribute type { "string" },
            xsd:token
        }
    }
A note Note
In the description of the maps used to describe employees, we cannot use interleave patterns because of the restriction on interleave and the schema is approximate. In this specific case, we could

enumerate the six possible combinations but the exercise would quickly become verbose if the number of items

grew:

emp-map = element χ:map { 
      (age, firstName, lastName)  
    | (age, lastName, firstName) 
    | (firstName, age, lastName)  
    | (firstName, lastName, age) 
    | (lastName, age, firstName)  
    | (lastName, firstName, age) 
}

A Schematron schema for the χίμαιραλ serialization could be developed based on XPath expressions similar to those that have been shown in the previous section.

Again, it would be interesting to support maps directly as first class citizens in XML schema languages.

The ability to use Schematron on XDM maps depends directly on the ability to browse maps using patterns and path expressions in XPath and XSLT (see above)…

The main impact on RELAX NG would be to add map and item patterns and the schema could look like:

namespace χ = "http://χίμαιραλ.com#"

start = element χ:data-model { top-level-map }

# Top level map: departments
top-level-map =
    map  {
        entry xsd:NMTOKEN {
            emp-array
        }*
    }

# List of employees
emp-array =
    map {
        entry xsd:positiveInteger {
            emp-map
        }*
    }

# Description of an employee
emp-map = map { age, firstName, lastName  }

age =
    entry age {
            xsd:positiveInteger
        }
    }

firstName =
    entry firstName {
             xsd:token
        }
    }

lastName =
    entry lastName {
            xsd:token
        }
    }

Sequences could probably be supported without adding a new pattern but would require to relax some restrictions to allow the description of sequences mixing atomic values, maps and nodes (in Relax NG, sequences of atomic values are already possible in list datatypes, sequences of nodes are of course available to describe node contents but these two type of sequences cannot be mixed).

Conclusion

According to the definition of chimeras in genetics from Wikipedia quoted in the introduction, [chimeras are formed from at least four parent cells (two fertilized eggs or early embryos fused together). Each population of cells keeps its own character and the resulting organism is a mixture of tissues].

The current XDM proposals have added to the XML data model a foreign model to represent maps. This new model is a superset of the JSON data model. The two data models keep their own character and the resulting model is a mixture of information items.

It’s far to say that the current XDM proposal is a chimera, something described as [usually ugly, foolish or impossible fantasies] by Jeni Tennison.

I hope that the proposals sketched in this paper will help to address this situation and fully integrate these new information items in the XML echosystem.

Share and Enjoy:
  • Identi.ca
  • StumbleUpon
  • del.icio.us
  • Facebook
  • Twitter
  • Add to favorites

XML Prague 2011 : XML à l’attaque du web

Salle de conférence pendant la pause café

Après une période un peu folle entre 2000 et 2008 pendant laquelle j’ai participé à un nombre impressionnant de conférences, je m’étais mis un peu en retrait et n’avais plus participé à aucune conférence depuis XTech 2008.

XML Prague 2011 était donc pour moi l’occasion de rencontrer à nouveau la communauté des experts XML internationaux et j’étais curieux de voir comment elle avait évolué pendant ces trois dernières années.

MURATA Makoto (EPUB3: Global Language and Comic)A côté des aspects plus techniques, je n’oublierai pas l’image de Murata Makoto exprimant sobrement sa peine pour les victimes du tremblement de terre au Japon.

La tagline de XML Prague 2011 était “XML devait être l’espéranto du Web. Pourquoi n’est-ce pas le cas?” (“XML as new lingua franca for the Web. Why did it never happen?”).

Michael Sperberg-McQueen (Closing keynote)Le contenu de la conférence est resté proche de cette ligne mais il a été résumé de manière plus exacte par Michael Sperberg-McQueen lors de sa clôture : “Mettons du XML dans le navigateur, qu’ils le veuillent ou non!”

Norman Walsh (HTML+XML: The W3C HTML/XML Task Force)Le ton a été donné par Norman Walsh dès la toute première présentation: la convergence entre HTML et XML n’aura pas lieu.

XML a tenté d’être un format neutre convenant aussi bien aux documents qu’aux données sur le web. On peut dire aujourd’hui que cet objectif n’a pas été atteint et que les formats les plus populaires sur le web sont HTML pour les documents et JSON pour les données.

Cela ne semble pas préoccuper plus que mesure le public de XML Prague composé d’aficionados des langages à balises : si la “masse des développeurs web” n’est pas intéressée par XML c’est son problème. Les bénéfices liés à XML sont bien connus et cela signifie simplement que la communauté XML devra développer les outils nécessaires pour utiliser XML dasn le navigateur aussi bien que sur le serveur.

Sur ce thème, beaucoup de présentations couvraient le support de XML dans le navigateur ainsi que les passerelles entre JSON et XML :

  • Validation XML Schema côté client par Henry S. Thompson and Aleksejs Goremikins
  • JSON pour XForms par Alain Couthures
  • XSLT dans le navigateur par Michael Kay
  • Traitement de XML efficace dans les navigateurs par Alex Milowski
  • XQuery dans le navigateur par Peter Fischer

Les outils côté serveurs ont fait l’objet de moins de sessions, peut être parce que le sujet est plus ancien :

  • Une façade JSON pour le serveur MarkLogic par Jason Hunter
  • CXAN: étude de cas pour Servlex, un framework XML pour le web par Florent Georges
  • Akara – “Spicy Bean Fritters” et services de données XML par Uche Ogbuji

Bien entendu, les standards étaient aussi au programme :

  • HTML+XML: la task force W3C HTML/XML (déjà mentionnée) par Norman Walsh
  • Standards update: XSLT 3.0 par Michael Kay
  • Standards update: XML, XQuery, XML Processing Profiles, DSDL par Liam Quin, Henry S. Thompson, Jirka Kosek

Ainsi que les applications de XML :

  • Configuration d’équipements réseau avec NETCONF et YANG par Ladislav Lhotka
  • Développements XML – XML Projects par George Bina
  • EPUB3: le langage et les bandes dessinées par Murata Makoto
  • EPUB: Chapitres et versets par Tony Graham
  • DITA NG – une implémentation Relax NG de DITA par George Bina

Sans oublier quelques présentations techniques sur les implémentations elles mêmes :

  • Traduction de SPARQL et SQL en XQuery par Martin Kaufmann
  • Réécritures déclaratives de XQuery pour le profit et le plaisir par John Snelson

Et la séance de clôture par le roi de cet excercice, Michael Sperberg-McQueen.

Ma présentation, “injection XQuery”, était assez atypique dans cet ensemble et il a fallu tout le talent de Michael Sperberg-McQueen pour lui trouver un point commun en faisant remarquer que pour avoir une chance de mettre XML sur le web il faudrait se préoccuper un peu plus de sécurité.

J’avais été impressionné lors des conférences XTech par l’évolution des techniques de présentation, la plupart des intervenants rejetant les traditionnelles présentation powerpoint et leurs “transparents” surchargés pour des alternatives plus légères et beaucoup plus imagées.

John Snelson (Declarative XQuery Rewrites for Profit or Pleasure)Je pensais ce mouvement inéluctable et ai été bien surpris de voir qu’il n’avait guère atteint les intervenants de XML Prague 2011 qui (à l’exception très notable de John Snelson) continuaient à utiliser powerpoint de manière très traditionnelle.

J’avais conçu ma présentation en suivant ce que je croyais être la technique de présentation devenue classique. Utilisant Slidy, j’avais pas moins de 35 pages très concises à présenter en 25 minutes. Chaque page avait une photo différente en arrière plan et ne comprenait que quelques mots.

Les commentaires ont été plutôt positifs bien que certaines photos d’injections aient choqué quelques participants.

Ma présentation étant du HTML standard, j’avais jugé plus sur d’utiliser l’ordinateur mis à disposition par les organisateurs. C’était sans compter sur les 74 Moctets d’images à charger pour les fonds de pages qui ont mis à mal cet ordinateur un peu poussif et les pages étaient un peu lentes à l’affichage (note personnelle : la prochaine fois, utilise ton ordinateur)!

The twitter wall (and Norman Walsh)Le “mur twitter” projeté au moyen d’un second vidéo projecteur a eu également beaucoup de succès.

Ce mur a été bien pratique pour communiquer pendant les sessions et il remplace avantageusement les canaux IRC que nous utilisions auparavant.

Twitter ne permet malheureusement pas de rechercher dans ses archives et, alors que j’écris ces mots, je ne peux déjà plus accéder aux tweets du premier jour de la conférence!

Avec un peu de recul, si j’essaye d’analyser ce qui s’est dit à XML Prague 2011, j’ai des sentiments mitigés à propos de ce fossé qui se creuse entre communautés Web et XML.

Le rêve que XML puisse être accepté par l’ensemble de la communauté des développeurs web était une vision très forte et nous ne devons pas oublier que XML a été conçu pour mettre “SGML sur le web“.

Ceci dit, il faut bien reconnaître que les développeurs web ont toujours été réticents devant la complexité additionnelle (réelle ou perçue) de XHTML. Ce fossé a toujours existé et après que XML ait manqué le virage du Web 2.0 il était trop tard pour espérer le combler.

XML sur le web restera donc une niche et continuera à être utilisé par une minorité, mais la créativité et le dynamisme de la communauté qui s’est manifesté à Prague est impressionnant et encourageant : il y a encore place pour beaucoup d’innovations et XML est, plus que jamais, une technologie de choix pour développer des applications web.

Photos

Share and Enjoy:
  • Identi.ca
  • StumbleUpon
  • del.icio.us
  • Facebook
  • Twitter
  • Add to favorites

XML Prague 2011: XML against the web

Coffee break

After a frenzy period between 2000 and 2008 where I have spoken at an impressive number of conferences, I temporally retired and hadn’t been at a conference since XTech 2008.

For me, XML Prague 2011 was the first opportunity to meet again face to face with the community of XML core techies and I was curious to find out what the evolution had been during the past three years.

MURATA Makoto (EPUB3: Global Language and Comic)Aside from all the technical food for thought, an image of the conference that I won’t forget is Murata Makoto expressing his grief for the victims of the earthquake in Japan with simple and sober terms.

The tag line of XML Prague 2011 was “XML as new lingua franca for the Web. Why did it never happen?”.

Michael Sperberg-McQueen (Closing keynote)The actual content of the conference has been close to this tag line but was better summarized by Michael Sperberg-McQueen during his closing keynotes: “Let’s put XML in the browser, whether they want it there or not!”

Norman Walsh (HTML+XML: The W3C HTML/XML Task Force)The tone was given by Norman Walsh during the very first session: the convergence between HTML and XML will not happen.

XML has been trying hard to be an application neutral format for the web that could be used both for documents and data. It is fair to say that it has failed to reach this goal and that the preferred formats on the web are HTML for documents and JSON for data.

That doesn’t seem to bother that much the XML Prague attendees who are markup language addicts anyway: if the “mass of web developers” do not care about XML that’s their problem. The benefits of using XML is well known and that just means that we have to develop the XML tools we need on the server as well as on the browser.

Following this line, many sessions were about developing XML support on the browser and bridging the gaps between XML and HTML/JSON:

  • Client-side XML Schema validation by Henry S. Thompson and Aleksejs Goremikins
  • JSON for XForms by Alain Couthures
  • XSLT on the browser by Michael Kay
  • Efficient XML Processing in Browsers by Alex Milowski
  • XQuery in the Browser reloaded by Peter Fischer

By contrast, server side tools have been less represented, maybe because the domain had been better covered in the past:

  • A JSON Facade on MarkLogic Server by Jason Hunter
  • CXAN: a case-study for Servlex, an XML web framework by Florent Georges
  • Akara – Spicy Bean Fritters and XML Data Services by Uche Ogbuji

Of course, standard updates were also on the program:

  • HTML+XML: The W3C HTML/XML Task Force (already mentioned) by Norman Walsh
  • Standards update: XSLT 3.0 by Michael Kay
  • Standards update: XML, XQuery, XML Processing Profiles, DSDL by Liam Quin, Henry S. Thompson, Jirka Kosek

We also had talks about XML applications:

  • Configuring Network Devices with NETCONF and YANG by Ladislav Lhotka
  • Advanced XML development – XML Projects by George Bina
  • EPUB3: Global Language and Comic by Murata Makoto
  • EPUB: Chapter and Verse by Tony Graham
  • DITA NG – A Relax NG implementation of DITA by George Bina

Without forgetting a couple of implementation considerations:

  • Translating SPARQL and SQL to XQuery by Martin Kaufmann
  • Declarative XQuery Rewrites for Profit or Pleasure by John Snelson

And the traditional and always impressive closing keynote by Michael Sperberg-McQueen.

My own presentation, “XQuery injection”, was quite atypical and it took all the talent of Michael Sperberg-McQueen to kindly relate it to “XML on the web” by noticing that security would have to be taken more seriously to make it happen.

One of the things that had impressed me during XTech conferences was the shift in presentation styles, most speakers moving away from heavy bullet points stuffed traditional powerpoint presentations to lighter and better illustrated shows.

John Snelson (Declarative XQuery Rewrites for Profit or Pleasure)I had expected the move to continue and have been surprised to see that the movement doesn’t seem to have caught XML Prague presenters whom continued to do with traditional bullet points with only a couple of exceptions (John Snelson being a notable exception).

I had worked my presentation to use what I thought would be a common style. Using Slidy, I had created no less than 35 short pages to present in 25 minutes. Each page had a different high resolution picture as a background and contained only a few words.

The comments have been generally good even though some pictures chosen to represent injections seem to have hurt the feelings of some attendees.

Since my presentation is just standard HTML, I had been brave enough to use the shared computer. Unfortunately, the presentation loads 74 Megs of background pictures and that was a little bit high for the shared computer that took several seconds to change pages (note to self: next time, use your own laptop)!

The twitter wall (and Norman Walsh)Another interesting feature of this conference was the “twitter wall” that was projected in the room using a second video projector.

This wall has proven to be very handy to communicate during the sessions and it can be seen like a more modern incarnation of the IRC channels used in earlier conferences.

Unfortunately, twitter doesn’t allow to search in archives and while I am writing these words, I can no longer go back in the past and read the tweets of the first day of the conference.

Looking backward at the conference, I have mixed feelings about this gap that now seems to be widely accepted on both sides between the XML and the web developers communities.

The dream that XML could be accepted by the web community at large was a nice vision and we should not forget that XML has been designed to be “SGML on the web“.

Web developers have always been reluctant to accept the perceived additional complexity of XHTML and the gap has been there from the beginning and after XML missed the train of Web 2.0 it was too late to close it.

XML on the web will stay a niche and will be used by a minority but the creativity and dynamism of the community shown at Prague is inspiring and encouraging: there is still room for a lot of innovation and XML is more than ever the technology of choice to power web applications.

All the pictures

Share and Enjoy:
  • Identi.ca
  • StumbleUpon
  • del.icio.us
  • Facebook
  • Twitter
  • Add to favorites