Introducing χίμαιραλ (chimeral), the Chimera Language

In the presentation I gave at XML Prague 2012 (see my paper), one of my conclusions was that the XML data model, as extended by the XPath/XQuery/XSLT Working Group to embrace other data models such as JSON, was an important foundation of the whole XML ecosystem.

In her amazing keynote, Jeni Tennison warned us against chimeras, “ugly, foolish or impossible fantasies”, and I thought it would be useful to check to what extent the XPath/XQuery/XSLT 3.0 data model (aka XDM 3.0) deserves to be called a chimera.

The foundation of this data model is the XML infoset, but it also borrows information items from the Post Schema Validation Infoset (the [in]famous PSVI) and adds its own abstract items such as sequences and, new in 3.0, functions and maps (needed to represent JSON objects).

I started to think more seriously about this, doing some research and writing a proposal for Balisage, and my plan was to wait until the conference to publish anything.

One of the things I planned to present is a simple XML serialization format for the XDM. My initial motivation for proposing such a format was to have a visualization of the XDM: I find it difficult to picture the data model if its instances stay purely abstract and can’t be serialized and deserialized.

Working on this, I soon discovered that this serialization can have other concrete benefits: the items that have recently been added to the XDM, such as maps and even sequences, are not treated as first class citizens by XPath/XQuery/XSLT, and the data model can be easier to traverse using its serialization!

When, for instance, you have a complex map imported from JSON by the brand new parse-json() function, you can’t easily apply templates to the map items and sub-items. And of course, with an XML serialization that becomes trivial to do.
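To make the idea concrete outside of XSLT, here is a minimal Python sketch of what "serialize, then traverse" could look like. It is only an illustration: the element names (map, entry, sequence, item) and the type labels are assumptions chosen for this example and are not the actual χίμαιραλ vocabulary, and Python's json and xml.etree.ElementTree modules stand in for parse-json() and an XSLT processor.

```python
import json
import xml.etree.ElementTree as ET


def to_xml(value):
    """Serialize a JSON-derived value as an XML element.

    The element names used here are hypothetical, not the chimeral format.
    """
    if isinstance(value, dict):
        elem = ET.Element("map")
        for key, member in value.items():
            entry = ET.SubElement(elem, "entry", key=str(key))
            entry.append(to_xml(member))
        return elem
    if isinstance(value, list):
        elem = ET.Element("sequence")
        for member in value:
            elem.append(to_xml(member))
        return elem
    # bool must be tested before numbers: in Python, bool is a subclass of int
    if isinstance(value, bool):
        type_name, text = "boolean", "true" if value else "false"
    elif value is None:
        type_name, text = "null", ""
    elif isinstance(value, (int, float)):
        type_name, text = "number", str(value)
    else:
        type_name, text = "string", str(value)
    elem = ET.Element("item", type=type_name)
    elem.text = text
    return elem


source = '{"title": "chimera", "tags": ["xml", "json"], "draft": true}'
doc = to_xml(json.loads(source))
serialized = ET.tostring(doc, encoding="unicode")

# Once serialized, the former map can be walked like any other XML tree.
keys = [entry.get("key") for entry in doc.findall("entry")]
nested_strings = [item.text for item in doc.iter("item") if item.get("type") == "string"]
```

After serialization, top-level keys and deeply nested string values are both reachable with ordinary tree navigation, which is the property that makes template-style processing straightforward.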

If such a serialization can be useful, there is no reason to wait until Balisage in August to discuss it, and I’d like to introduce the very first version of χίμαιραλ (chimeral), the Chimera Language.

The URL itself http://χίμαιραλ.com is a chimera composed of letters from two different alphabets and merging concepts from two different civilizations!

This first version is not complete. It already supports rather complex cases, but I need to think more about how to deal with maps or sequences of nodes such as namespace nodes or attributes.

So far I am really impressed by XPath 3.0, but also surprised by many limitations in terms of reflection:

  • No built-in function to determine the basic type of an item (node, attribute, sequence, map, function, …).
  • The dm:node-kind() accessor to determine the type of a node is abstract and XPath 3.0 does not expose it.
  • The behavior of the exslt:object-type() function is surprising.

I may have missed something, but in practice I have found it quite difficult to browse the data model of a variable.

The other aspect that I don’t like in XPath/XQuery/XSLT 3.0 is the lack of homogeneity in the way the different types of items are manipulated. This strengthens the feeling that we have a real chimera!

In XSLT, for instance, I’d like to be able to apply templates and match items in the same way for any item type. Unfortunately, the features that are needed to do so (node tests, axes, …) are reserved for XML nodes. I can’t define a template that matches a map (nor a sequence, by the way), I can’t apply templates over map items, …

It may be too late for version 3.0, but I really think that we should incorporate these recent additions to make them first class citizens!

Going forward, we could reconsider the way these items mix and match. Currently you can have sequences of maps, functions, nodes and atomic values, and maps whose values are sequences, functions, nodes and atomic values, but nodes are only composed of other nodes. Even if the XML syntax doesn’t support this, I would really like to see more symmetry and be able to add sequences and maps within nodes!

In other words, I think that it would be much more coherent to treat maps and sequences like nodes…

Note: The χίμαιραλ website is currently “read only” but comments are very welcome on this blog or by mail.

 


12 thoughts on “Introducing χίμαιραλ (chimeral), the Chimera Language”

    1. Hi Jirka,

      We have function(), but map() seems to be missing, maybe because it’s a more recent addition.

      I have not found a way to test if a variable is a sequence (or a nodeset) either.

      I thought the review period had expired for these specs… Is there still time to submit feedback?

      Thanks,

      Eric

      1. Unfortunately, the latest XSLT 3.0 draft is still inside the working group; the publicly available draft is rather old and doesn’t cover maps.

        Mike Kay published a section about maps on his blog: http://dev.saxonica.com/blog/mike/2012/01/#000188
        There you can see that map is added as an additional item type.

        $var instance of node()* should test whether $var is a nodeset.

        Feedback can be sent (and is welcome) at any time. For some specs you will miss the deadline for last-call comments, but this only means that your comments don’t have to be processed as quickly as those received during the LC period. But they will be processed.

        1. Jirka,

          Saxon 9.4 doesn’t seem to recognize the map item type, I’ll report a bug for that.

          node()* doesn’t differentiate a nodeset from a sequence of nodesets… Do you know if/how I can differentiate these two item types?

          I’ll have to do some more research and I’ll see what feedback I can send to the working group!

          Thanks for your help.

          Eric

  1. Eric, much of what you say is valid but unfortunately you don’t have the full picture as to what’s in 3.0 and what isn’t. Nor does anyone else, because it’s not finished yet. XQuery 3.0 and XPath 3.0 have gone to Last Call without support for maps, though they are wanted for XSLT 3.0; there are issues here that the working groups still have to work through.

    We’ve done a lot of work in XSLT that isn’t published yet. It’s been a long time since a published working draft; the blame for that rests to a considerable extent on me, though it’s a reflection of the capabilities/resources of the working group as a whole that we’re moving so slowly.

    We have done work to generalize patterns and allow apply-templates to be applied to any item. In the current model, however, the entries in a map are not items, so there’s limited scope for doing apply-templates over the entries in a map. It would be interesting to see a use case where this would be useful.

    Using non-ASCII characters in a URL is brave. I was able to click on your link in Safari, though the URL then displayed in my browser is http://xn--kxaea5alc1b5b.com/. I’ve been wanting for years to use non-ASCII characters for new operators in the XPath grammar, but seeing the problems people have with transcoding errors in XML/HTML, and the problem we’ve had recently with transcoding errors moving Java code from one SVN repository to another, I fear it’s still a risky thing to do.

    As to the detail of your proposal, I’m not sure why you’ve made some of the choices you have. For example, your representation of a value (in a key-value pair) containing a single string is very different from the representation of a value containing two strings, which would seem to make programming against the model unnecessarily complicated.

    1. Michael,

      I think you have nailed down the reason for my uneasiness with maps: shouldn’t map entries be considered as items?

      A use case would be, of course, someone wanting to write a transformation to serialize a map like I am doing here.

      Regarding non-ASCII characters, I don’t know if I would recommend them to the W3C (OTOH, the W3C has been preaching I18N for some time now and using non-ASCII characters would be a good example!) but for a project that will probably only interest a few geeks the risk seemed to be limited!

      You’re right that, in my serialization, I moved away from the principle that there is no distinction between a singleton sequence and a single item…

      Would the following be closer to the principles of the map data model as you’re designing it:

      <map>
        <entry key="a key value" keyType="the key type">
          <item type="string">foo</item>
        </entry>
      </map>

      ???

      When the entry is a sequence, the item element could just be repeated and the processing would become similar for single values and sequences.
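      As a rough illustration of that last point, here is a small Python sketch, assuming the map/entry/item layout proposed just above. The sample keys and the entry_values helper are made up for this example, and Python's ElementTree stands in for whatever would actually consume the serialization.

```python
import xml.etree.ElementTree as ET

# Two entries in the layout proposed above: one single value, one sequence.
# The keys "single" and "several" are invented for this illustration.
SERIALIZED = """
<map>
  <entry key="single" keyType="string">
    <item type="string">foo</item>
  </entry>
  <entry key="several" keyType="string">
    <item type="string">foo</item>
    <item type="string">bar</item>
  </entry>
</map>
"""


def entry_values(entry):
    """Return the values of an entry as a list, whatever their number."""
    return [item.text for item in entry.findall("item")]


root = ET.fromstring(SERIALIZED.strip())
values = {entry.get("key"): entry_values(entry) for entry in root.findall("entry")}
```

      The same helper handles both entries without a special case for the singleton, which is the simplification the repeated item element is meant to buy.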

      Thanks for your feedback.

      Eric

  2. I hadn’t made the link before, but your talking about serializing JSON (amongst others) made me think of Steven’s presentation about converting JSON to XML and back to make it accessible within XForms. Making JSON objects first class citizens would make this unnecessary.

    But perhaps that is a bit optimistic, since XForms often depends on the XML standards supported by browsers. Anyhow, I would be curious to see how well JSON mapping in XForms works. It would be a clear case for judging whether making JSON and the like first class citizens is worth the effort.

    Cheers
