XDM Maps should be first class citizens

Note: This issue has been submitted to the W3C as #16118.

The XPath/XQuery/XSLT 3.0 data model (XDM) distinguishes three types of items:

Nodes, which relate directly to the XML Infoset, with some information borrowed from the PSVI.
Functions.
Atomic values.

Michael Kay has recently proposed adding maps as a fourth item type, derived from functions.

The main motivation for this addition is to support JSON objects, which can be considered a subset of map items.

However, in the current proposal map items are treated very differently from XML nodes and this has deep practical consequences.

Take for instance the following simple JSON sample borrowed from Wikipedia:

{
     "firstName": "John",
     "lastName" : "Smith",
     "age"      : 25,
     "address"  :
     {
         "streetAddress": "21 2nd Street",
         "city"         : "New York",
         "state"        : "NY",
         "postalCode"   : "10021"
     },
     "phoneNumber":
     [
         {
           "type"  : "home",
           "number": "212 555-1234"
         },
         {
           "type"  : "fax",
           "number": "646 555-4567"
         }
     ]
 }

To get the postalCode from an equivalent structure expressed as XML and stored in the variable $person, one would just use the following XPath expression: $person/address/postalCode.

When the same structure is expressed in JSON and parsed into an XDM map, XPath axes can no longer be used (their purpose is to traverse documents, i.e. nodes) and we need to use map functions instead: map:get(map:get($person, 'address'), 'postalCode').

That’s not as bad as it sounds, because maps can be invoked as functions and this can be rewritten as $person('address')('postalCode'). Still, it gives a first idea of the deep differences between maps and nodes, and things would get worse if I wanted the postal codes of all persons whose first name is “John”…
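As a rough illustration outside XPath, here is a minimal Python sketch of what lookup-only access looks like. Python is used purely for illustration; the `people` list and the second person ("Jane Doe") are assumptions added for the example and are not part of the Wikipedia sample. Nested dictionary lookups play the role of $person('address')('postalCode').

```python
import json

# The Wikipedia sample, trimmed to the fields used in this sketch.
person = json.loads("""
{
  "firstName": "John",
  "lastName": "Smith",
  "address": {"city": "New York", "postalCode": "10021"}
}
""")

# Chained lookups, analogous to $person('address')('postalCode').
postal_code = person["address"]["postalCode"]

# Hypothetical second person, added only to make the filter meaningful.
people = [
    person,
    {"firstName": "Jane", "lastName": "Doe",
     "address": {"city": "Boston", "postalCode": "02101"}},
]

# Postal codes of persons whose first name is "John": with lookups only,
# both the predicate and the navigation have to be spelled out by hand.
john_codes = [p["address"]["postalCode"]
              for p in people
              if p["firstName"] == "John"]

print(postal_code)
print(john_codes)
```

With nodes, the filter and the navigation would fit in a single path expression with a predicate; with lookups, each step is an explicit function call or loop.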

Another important difference is that node items are the only ones that have a context or an identity.

When I write <foo><bar>5</bar></foo><bat><bar>5</bar></bat>, the two bar elements happen to have the same name and value, but they are considered two different elements, and even the text nodes that are their children are two different text nodes.

When I write foo: {bar: 5}, bat: {bar: 5} the two bar entries are actually the same thing and can’t be distinguished.

This difference is important because it means that XPath axes as we know them for nodes could never be implemented on maps: if an entry in a map can’t be distinguished from an identical entry elsewhere in another map, there is no hope of determining its parent, for instance.
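The role that identity plays in parent lookup can be sketched in Python. This is only an analogy under stated assumptions: the <root> wrapper is added so that the XML fragment is well formed, and map entries are modelled as plain (key, value) tuples, which compare by value, to mimic entries without identity.

```python
import xml.etree.ElementTree as ET

# XML side: two <bar> elements with the same name and value.
# The <root> wrapper is an assumption added to make the fragment well formed.
root = ET.fromstring(
    "<root><foo><bar>5</bar></foo><bat><bar>5</bar></bat></root>"
)
bar_in_foo, bar_in_bat = root.findall(".//bar")

# Elements hash by identity, so a child -> parent table is unambiguous.
parent_of = {child: parent for parent in root.iter() for child in parent}
xml_parents = (parent_of[bar_in_foo].tag, parent_of[bar_in_bat].tag)

# Map side: entries modelled as (key, value) pairs without identity.
data = {"foo": {"bar": 5}, "bat": {"bar": 5}}
entry_in_foo = ("bar", data["foo"]["bar"])
entry_in_bat = ("bar", data["bat"]["bar"])

# The two entries are indistinguishable, so a table keyed by entry can
# only remember one "parent": the last one written wins.
entry_parent = {}
for name, inner in data.items():
    for key, value in inner.items():
        entry_parent[(key, value)] = name

print(xml_parents)
print(entry_in_foo == entry_in_bat, entry_parent)
```

The element table keeps two distinct parents, while the entry table collapses to a single record, which is the reason a parent axis cannot be defined on entries that have no identity.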

Now, why is it important to be able to define axes on maps and map entries?

I think that it is important for XSLT and XQuery users to be able to traverse maps the way they traverse XML fragments (with the same level of flexibility and with syntaxes kept as close as possible). And yes, that means being able to apply templates over maps and to update maps using XQuery Update…

But I also think that this will be important to other technologies that rely on XPath, such as (to name those I know best) XForms, pipeline languages (XPL, XProc, …) and Schematron.

Being able to use XForms to edit JSON objects is an obvious need, which XForms 2.0 is trying to address through a “hack” that was presented at XML Prague 2012.

In the longer term, we can hope that XForms will abandon this hack and rely on XDM maps, but XForms relies heavily on the notions of nodes and axes. XForms binds controls to instance nodes, and the semantics of such bindings would be quite different if they had to be applied to XDM map entries as currently proposed.

XML pipeline languages are also good candidates to support JSON objects. Both XPL and XProc have features to loop over document fragments and choose actions depending on the results of XPath expressions and again the semantics of these features would be affected if they had to support XDM maps as currently proposed.

Schematron could be a nice answer to the issue of validating JSON objects. Schematron relies on XPath at two different levels: its rules are defined as XPath expressions (and it is often very convenient to be able to use XPath axes such as ancestor), and its processing model is defined in terms of traversing a tree. Again, updating Schematron to support maps would be more difficult if maps are not similar to XML nodes.
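To give an idea of the extra work involved, here is a hedged Python sketch of a rule-style check over a JSON structure. The `walk` helper and the rule itself (phone numbers must start with "212") are hypothetical and exist only for this example. Because values carry no parent link, the traversal has to carry the chain of ancestor keys explicitly.

```python
def walk(value, ancestors=()):
    """Yield (ancestors, value) for every value in a JSON-like structure.

    `ancestors` is the tuple of keys and indexes leading to `value`; it is
    carried explicitly because the value itself has no parent link.
    """
    yield ancestors, value
    if isinstance(value, dict):
        for key, child in value.items():
            yield from walk(child, ancestors + (key,))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from walk(child, ancestors + (index,))


person = {
    "firstName": "John",
    "phoneNumber": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "fax", "number": "646 555-4567"},
    ],
}

# Hypothetical rule: every "number" below "phoneNumber" must start with "212".
# The ancestor test ("phoneNumber" in path) replaces what an ancestor axis
# would provide on nodes.
failures = [path for path, value in walk(person)
            if path and path[-1] == "number"
            and "phoneNumber" in path
            and not value.startswith("212")]

print(failures)
```

The check works, but the context that axes give for free on nodes has to be rebuilt by the traversal code itself.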

Given the place of JSON on the web, I think that it is really important to support maps, and the question we have to face is: “Do we want a minimal level of support that may require hard work from developers and other standards to support JSON, or do we want to make it as painless as possible for them?”

And obviously, my preference is the latter: if we add maps to the XDM, we need to give them full citizenship from the beginning!

Note: The fact that map entries are unordered (and they need to be, because the properties of JSON objects are unordered) is less of an issue to me. We already have two node types (namespace nodes and attributes) whose relative order is “stable but implementation-dependent”.

13 thoughts on “XDM Maps should be first class citizens”

Dimitre Novatchev says:

February 25, 2012 at 6:42 pm

This post raises some very well thought-out, really necessary and useful requirements. Given that the W3C Working Drafts on XPath 3.0, XDM 3.0 and XQuery 3.0 are all in “Last Call” status, would you recommend that the work on a full definition of maps be carried out in a next version, or that the documents be returned to draft status and maps be included in version 3.0? Facts tell us that there is typically a five-year period between two versions.

1. Eric van der Vlist says:

  February 25, 2012 at 7:06 pm
  
  Hi Dimitre,
  
  I must say I am rather confused by the current status anyway: XDM 3.0 is in Last Call but XSLT 3.0 is behind (the last public version is still labeled XSLT 2.1), and I am not sure whether that means that maps would be considered specific to XSLT 3.0 or that XDM 3.0 would need to be revised…
  
  That being said, features added to these specs tend to become “legacy”, and if maps are introduced we probably won’t be able to make the kind of changes I am asking for afterwards. If the choice is between having limited support for maps in 6 months and forever, and full support in 5 years, I think I’d prefer the latter, but that’s a tough question…
  
  Thanks for your comment!
  
  Eric
  
Jakub Malý says:

February 25, 2012 at 7:05 pm

Maps are indeed a good candidate for representing JSON object properties.
How about arrays? Do you find the solution of representing an array as a map from integers to something satisfactory?

1. Eric van der Vlist says:

  February 25, 2012 at 7:17 pm
  
  Jakub,
  
  Another good question!
  
  I have no access to the editor’s copy of XSLT 3.0, where arrays are probably described, so it is difficult to answer this question.
  
  One important feature for arrays is to be able to loop over their items. If arrays are nothing else than maps, you can do that indirectly by looping over integers between 1 and their size and getting the corresponding entry, but I would find it useful to also have the possibility to loop directly over their entries.
  
  If that’s the case, that kind of hides the fact that they are internally represented as maps, don’t you think?
  
  Eric
  
  1. Jakub Malý says:

    February 26, 2012 at 8:27 am
    
    Iteration is certainly possible (also using the map:keys function). One would have to take special care with empty items, since creating an array[10] should really create an array with 10 slots, initially empty.
    
    But the whole idea that a dictionary is hidden behind an array just seems wrong to me: if it does not have the standard properties (random access in O(1), memory footprint O(n)), it is not “really” an array.
    
    1. Eric van der Vlist says:

      February 26, 2012 at 9:04 am
      
      The idea that a dictionary is hidden behind an array does seem unusual to me, but I don’t see any practical downside!
      
      These arrays would still meet the definition given in Wikipedia and would be similar to PHP arrays (except, of course, that in PHP they are ordered).
      
Michael Kay says:

February 25, 2012 at 7:30 pm

We’ve had a working party looking at requirements and use cases for maps for the last several months, and these issues have been hotly debated. There’s a case for making maps look more like nodes (with identity semantics) and there’s a case for making them behave more like sequences (where it’s not meaningful to ask whether (1,2,3) and (1 to 3) are “the same sequence”). You seem to have come down in favour of the former but it’s not entirely clear why. In a functional language, identity gets in the way and prevents optimizations. Having parent navigation in a nested map seems to me very undesirable indeed: one of the aims of introducing maps is to have a lighter-weight data structure than XML node trees, and it’s identity and parent pointers that make the XML trees so heavyweight, with a frequent need for otherwise-unnecessary copying of subtrees.

1. Eric van der Vlist says:

  February 25, 2012 at 7:46 pm
  
  Hi Michael,
  
  Of course it’s a matter of requirements.
  
  If the requirements are to have “a lighter-weight data structure than XML node trees”, then your proposal is very fine indeed.
  
  But if the requirement is to embrace JSON so that you do not have to say “Oh dear, yet another JSON-to-XML mapping coming…”, I think that you need to provide the same full range of features over maps that we have over XML nodes.
  
  Eric
  
Jakub Malý says:

February 28, 2012 at 1:28 pm

Coming back to this: can you explain in more detail what you mean by full citizenship? Do you propose using axes to iterate over the items in a map? And for that, would you like to use existing axes such as child? E.g. should using the parent axis on a member of a map return the map itself?

1. Eric van der Vlist says:

  February 28, 2012 at 3:27 pm
  
  Yes, axes are a fundamental feature of XPath and I think we need them for maps too if we’re serious about supporting JSON.
  
  Now, these axes need to be carefully defined. They are similar to the axes that operate on XML nodes, but some of them are meaningless because map entries are not ordered and there is no notion of following or preceding.
  
  The syntax can’t be exactly the same either: node names and map keys have different lexical and value spaces.
  
  That being said, the child and parent axes would be very similar and, yes, the parent of a map entry would be the map itself.
  
  Eric
  
  1. Jakub Malý says:

    February 28, 2012 at 9:36 pm
    
    So, you are considering parent and child axes. In that case, I will definitely occupy the other camp, because with what you are suggesting, an item loses its place in the source tree once it is inserted into a map.

    I like maps being similar to sequences. Both can be used as temporary collections, and it is possible to ask for an item’s parent/sibling/descendant etc. even after the item has been inserted into the collection.

    And I suppose that your approach would also result in copying large pieces of the tree (at least that is what I think an XSLT processor does when you put an item under a new parent). If that is true, then in the extreme case, e.g. when I put a root node into the map, it would make the memory footprint significantly bigger. (But maybe some XSLT processors optimize this scenario; that is a question for other debaters.)

    Can you think of a use case for your semantics of maps that cannot be achieved with current maps or temporary documents?
    
    1. Eric van der Vlist says:

      February 28, 2012 at 9:51 pm
      
      I am writing a new post to try to clarify these aspects!
      
Pingback: Fleshing the XDM chimera | Eric van der Vlist
