Toward χίμαιραλ/superset

Background

Up to now, my approach to exploring possible solutions to support JSON in XDM has been to evaluate the proposal to introduce maps in XSLT 3.0, leading to the χίμαιραλ proposal.

Having found that JSON objects would be second-class XDM citizens in the current XSLT 3.0 Working Draft, I think it’s worth considering other approaches, and I consider Jürgen Rennau’s UDL proposal a very good starting point.

In their comments after Rennau’s presentation, John Cowan and Steven DeRose stressed the need to forget about the serialization and focus on the data model, at least as a first step, when proposing alternative solutions.

Following their advice, I’d like to propose an update to core XDM 3.0 that would natively support JSON objects without introducing any new item kind.

Because this proposal is a chimera and because it’s a superset of the current XDM (any current XDM instance would be compatible with the updates I am proposing), I’ll call this proposal χίμαιραλ/superset.

The XDM carefully avoids drawing class models and prefers to specify nodes through node kinds and accessors. However, I think it’s fair to say that elements are defined as nodes with three basic properties:

A mandatory name which is a QName.
A map of attributes whose keys are QNames.
An array of child nodes.
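
The three properties above can be sketched as a small class model. This is only an illustration of the simplified view taken in this post, not the XDM itself: the class names `QName`, `Text` and `Element` are hypothetical, and real XDM attributes are nodes with their own typed values rather than plain strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass(frozen=True)
class QName:
    """Simplified qualified name: local name plus optional namespace URI."""
    local: str
    namespace: str = ""


@dataclass
class Text:
    """Simplified text node."""
    value: str


@dataclass
class Element:
    # A mandatory name which is a QName.
    name: QName
    # A map of attributes whose keys are QNames (values simplified to strings).
    attributes: Dict[QName, str] = field(default_factory=dict)
    # An ordered array of child nodes.
    children: List[Union["Element", Text]] = field(default_factory=list)


# Rough equivalent of <p class="intro">Hello</p>
para = Element(
    name=QName("p"),
    attributes={QName("class"): "intro"},
    children=[Text("Hello")],
)
```

Seen this way, an element already carries both a map and an array, which is the observation the rest of this post builds on.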

On the other hand, the JSON data model is composed of arrays, objects and atomic values. JSON atomic values can be numbers, strings, booleans or null. JSON object keys can be any atomic values.

Traditional approaches to binding JSON and XML use an XML element’s children to represent JSON object properties, and UDL is no different in that respect.

There are obvious good reasons to do so, the main one being that XML attribute values can only be atomic values while JSON object properties can also be maps or arrays.

However, the interface mismatch that results from binding JSON objects (which are maps) to XML children (which are arrays) is the source of most of the complexity of these proposals.

Proposal (χίμαιραλ/superset)

Since XML elements already have both a map (of attributes) and an array (of children nodes), why not use these native features to bind JSON maps and arrays and just update the data model to remove the restrictions that make attribute maps and children arrays unfit for being bound respectively to JSON objects and arrays?

A JSON object would then simply be bound to an XML element with attributes and without child nodes, and a JSON array would be bound to an XML element with child nodes and without attributes.

This seems especially easy if we focus on the data model and postpone the (optional) definition of a serialization syntax.

First modification: elements can be anonymous

Neither JSON objects nor JSON arrays have names, but XML elements do, and the name is mandatory.

To fix this issue, element names should become optional (in other words, we introduce the notion of anonymous elements).

Second modification: attribute names should also possibly be strings, booleans or null

If we bind JSON object keys to attribute names, it should be possible to use all the basic types that JSON accepts for its keys.

Additionally, we may want to consider supporting other XML Schema simple types, possibly all of them.

To make this possible, the definition of the dm:node-name() accessor should be updated to return a typed value rather than a QName. This modification should apply to attribute nodes at a minimum, but for maximum homogeneity we should probably extend it to other node kinds.

Third and last modification: attributes should have (optional) attributes and children

JSON object values can be objects and arrays and since objects are bound to attributes and arrays are bound to children, attributes should support both.
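
To make the three modifications concrete, here is a minimal sketch of what the updated model could look like. The names `Atomic`, `Attribute` and `Element` are hypothetical and chosen for illustration only; other node kinds (text, comments and so on) are left out, and names are reduced to plain atomic values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

# Assumption: a "typed" name or value is reduced here to a Python atomic value.
Atomic = Union[str, int, float, bool, None]


@dataclass
class Attribute:
    # Modification 2: the name is a typed value, not necessarily a QName.
    name: Atomic
    # Attributes keep their simple value...
    value: Atomic = None
    # ...but (modification 3) may also carry attributes and children.
    attributes: Dict[Atomic, "Attribute"] = field(default_factory=dict)
    children: List["Element"] = field(default_factory=list)


@dataclass
class Element:
    # Modification 1: the name is optional; None marks an anonymous element.
    name: Optional[str] = None
    attributes: Dict[Atomic, Attribute] = field(default_factory=dict)
    children: List["Element"] = field(default_factory=list)


# An anonymous element with one attribute holding two children and one
# attribute whose name is null.
tags = Attribute(name="tags", children=[Element(), Element()])
obj = Element(attributes={"tags": tags, None: Attribute(name=None, value="n/a")})
```

An existing application that only reads `value` on an attribute would keep working unchanged in this sketch, since the extra `attributes` and `children` default to empty.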

Mapping JSON into χίμαιραλ/superset

With these updates, binding JSON into XML becomes quite straightforward:

A JSON object is mapped into an anonymous element without children and with one attribute per key/value pair.
A JSON array is mapped into an anonymous element without attributes and with one child element per item.

Let’s take the now famous (at least on this blog) JSON snippet borrowed from the XSLT 3.0 Working Draft:

{ "accounting" : [
      { "firstName" : "John",
        "lastName"  : "Doe",
        "age"       : 23 },

      { "firstName" : "Mary",
        "lastName"  : "Smith",
        "age"       : 32 }
                 ],                                
  "sales"     : [
      { "firstName" : "Sally",
        "lastName"  : "Green",
        "age"       : 27 },

      { "firstName" : "Jim", 
        "lastName"  : "Galley",
        "age"       : 41 }
                  ]
}

Becomes:

Anonymous element without children and two attributes:
- Attribute « accounting » (as a string) with no attributes and the two following children:
  - Anonymous element with no children and the three following attributes:
    - Attribute « firstName » (as a string) and a value « John » (as a string)
    - Attribute « lastName » (as a string) and a value « Doe » (as a string)
    - Attribute « age » (as a string) and a value 23 (as a number)
  - Anonymous element with no children and the three following attributes:
    - Attribute « firstName » (as a string) and a value « Mary » (as a string)
    - Attribute « lastName » (as a string) and a value « Smith » (as a string)
    - Attribute « age » (as a string) and a value 32 (as a number)
- Attribute « sales » (as a string) with no attributes and the two following children:
  - Anonymous element with no children and the three following attributes:
    - Attribute « firstName » (as a string) and a value « Sally » (as a string)
    - Attribute « lastName » (as a string) and a value « Green » (as a string)
    - Attribute « age » (as a string) and a value 27 (as a number)
  - Anonymous element with no children and the three following attributes:
    - Attribute « firstName » (as a string) and a value « Jim » (as a string)
    - Attribute « lastName » (as a string) and a value « Galley » (as a string)
    - Attribute « age » (as a string) and a value 41 (as a number)
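
The mapping shown above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: the `Node` class and the `fill` and `from_json` functions are hypothetical names, a single class stands in for both elements and attributes, and an atomic array item is assumed to become an anonymous element carrying a simple value (the example above does not cover that case).

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    """Element or attribute in this sketch; name None marks an anonymous element."""
    kind: str                                   # "element" or "attribute"
    name: Any = None
    value: Any = None
    attributes: Dict[Any, "Node"] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def fill(node: Node, data: Any) -> Node:
    """Attach a parsed JSON value to an element or attribute node."""
    if isinstance(data, dict):
        # JSON object: one attribute per key/value pair, no children.
        for key, item in data.items():
            node.attributes[key] = fill(Node("attribute", name=key), item)
    elif isinstance(data, list):
        # JSON array: one anonymous child element per item, no attributes.
        node.children = [fill(Node("element"), item) for item in data]
    else:
        # Atomic value (string, number, boolean or null) kept as a simple value.
        node.value = data
    return node


def from_json(text: str) -> Node:
    """Map a JSON document onto an anonymous root element."""
    return fill(Node("element"), json.loads(text))


doc = from_json('''
{ "accounting": [ {"firstName": "John", "lastName": "Doe", "age": 23},
                  {"firstName": "Mary", "lastName": "Smith", "age": 32} ],
  "sales":      [ {"firstName": "Sally", "lastName": "Green", "age": 27},
                  {"firstName": "Jim", "lastName": "Galley", "age": 41} ] }
''')

john = doc.attributes["accounting"].children[0]
print(john.attributes["firstName"].value, john.attributes["age"].value)  # John 23
```

Note that the function needs no special cases for keys or nesting: objects always land in the attribute map and arrays always land in the children array, which is the point of the proposal.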

What do you think? Comments are very welcome!

5 thoughts on “Toward χίμαιραλ/superset”

Eric, I am strictly opposed to two proposed modifications.

Criticism #1: « Names must remain names. »
In my opinion, a node name should remain a QName (optional namespace URI + mandatory local name which is an NCName). First of all, a node name (attribute or element, no matter) is a *name*, which is a different concept than a key, just as a person’s name must not contain slashes, per cent signs or hash marks. Second, the disruptions caused by suddenly allowing names to be arbitrary strings would be … unpredictable. I cannot even imagine that the W3C would seriously consider such a step – at least for very many years.

Criticism #2: « attributes must remain simple values. »
In principle, the XML model’s constraint that attributes have simple values is arbitrary – but this arbitrariness is clearly in the service of simplicity. This simplicity must not be given up: (a) the proposed change would amount to elements having two collections of children, one being an ordered set, the other being a map, which is a severe complication and not intuitive, compared with the status quo (« content is a sequence of children, plus a set of named simple values »); (b) if attributes may suddenly have complex content, an unbelievable amount of existing code would become unreliable and inadequate. Again, I cannot imagine the W3C even remotely considering this change.

Eric van der Vlist says:

August 9, 2012 at 12:21 pm

Jürgen,

As you may imagine, I do not agree with these criticisms ;) …

#1: « Names must remain names »

I would like to highlight that:
#1a: Attribute names are already used as keys.
#1b: Attribute and element names are already more than names since they are « QNames ».

Therefore, names are already more than names in the common sense of the word.

#2: “attributes must remain simple values”

Hmmm…. Attributes would still have simple values but they could also have children. This would not break existing applications that would simply continue to access attribute values without even noticing their children.

I personally think that this would be a rather nice feature and that the fact that attributes cannot be annotated is a major restriction that I would be happy to see fixed (it’s a feature that I really like in LMNL for instance)!

The complication would be minimal (depending on how you phrase it, of course) and you could still say that content is a sequence of children plus a set of named attributes.

Thanks for your comment,

Eric

Let me clarify the UDL proposal as opposed to the proposals made here. The latter include attributes becoming potentially complex and node names becoming arbitrary strings.

I regard both changes as highly disruptive. They are not an addition to the existing model, but a radical change of fundamental rules. The UDL proposal, on the other hand, is not disruptive at all. It is a very small addition, which can be summarized as follows: (a) element nodes have two new properties, optional [key] and mandatory [model]; (b) the [model] value « sequence » (the default) corresponds to conventional XML: child nodes are an ordered sequence, and the child elements must not have a [key]; (c) the [model] value « map » switches the content model over to a pure map model: there must not be text children, every child element must have a [key], and child elements are unordered. Taken together: element content is a map of child elements, with the child [key]s serving as map keys.

A JSON object is then modeled by an XML element with [model] equal to « map », and the object members’ keys are captured by the [key] property of the « map-element’s » child elements. JSON arrays and simple values are modeled by conventional XML elements ([model] equal to « sequence »).

Finally I would like to clarify that the UDL proposal could not be more focused on the data model, as opposed to markup or serialization: the key idea is that JSON markup must be redefined as a node tree; UDL is all about a shift of focus from markup (one or the other) to information content, defined in terms of a node tree.