XML Amsterdam

XML Amsterdam 2015 is over


Once again this year, XML Amsterdam connected XML developers from around the world, and it’s time to post the links to my presentations.

Many thanks to everyone involved in making this event happen!

XForms Generation (XForms pre-conference day).

Testing with XForms Unit (XForms pre-conference day).

Backtracking and XPDL (conference day).

The adventures of XSLT 2.0 in Browserland

Introduction

The purpose of my presentation at XML Amsterdam was to explore what can be done with Saxon-CE in the domain of forms applications.

I won’t repeat here what I said during the presentation; instead I’ll focus on a few key points and give as many pointers as possible for those who want to go further.

For those of you who missed my presentation or would like to hear it again, the next best thing is the screencast of my last rehearsal before the event:

The screencast of the last rehearsal of my presentation at XML Amsterdam

The presentation itself is powered by Saxon-CE, and if you want to try it yourself, the best way is to download or clone it from our GitLab server.

An alternative is to browse it directly from this server. Note, however, that serving raw resources through GitLab is not very efficient and response times will be slow.

You may notice differences between the screencast, what I presented, the demo shown by Geert Josten at the DemoJam and the latest version on GitLab.

A look at the network of changes should help you understand what happened! The version I presented is tagged xmlamsterdam2013. The version used for the screencast was the previous one, before I added a page with metrics. I wanted to play it safe, so the features needed to implement Geert’s calculator were developed separately in the grtjn branch, which was merged into the master branch after the conference…

Impedance mismatches

A key point when manipulating XML documents based on user interaction is the ability to store these documents in the browser between user actions, and I spent eight slides, from slide 13, “Storing instance in an XML hostile environment”, to slide 20, “Storing XML in the global (window) object”, exploring different options.

It’s time to explain the issues we face when storing instances as JavaScript properties.

While Saxon-CE is at home in browsers and does a good job of acting as a good citizen, speaking JavaScript almost as if it were its native language, a number of glitches remind us that this isn’t really the case…

Among them, many conversions that are implicit in XSLT don’t work when accessing JavaScript object properties or methods.

In XSLT, we are used to writing things such as:

    <xsl:template name="init">
      <xsl:variable name="element" as="element()">
        <element>foo</element>
      </xsl:variable>
      <div><xsl:value-of select="concat('Element value: ', $element)"/></div>
    </xsl:template>

In the <xsl:value-of/> instruction, the second argument of the concat() function is an element node, which is implicitly atomized and converted into a string.
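To make that implicit conversion concrete: for an untyped element, the string that ends up in concat() is the element’s string value, i.e. the concatenation of all its descendant text nodes in document order. The TypeScript sketch below models that rule on a hypothetical, minimal node type (MiniNode is an illustration, not a Saxon-CE class). It is this step that Saxon-CE does not perform when the same node is handed to a JavaScript method.

```typescript
// Hypothetical, minimal model of an XML tree (not a Saxon-CE type).
type MiniNode =
  | { kind: "element"; name: string; children: MiniNode[] }
  | { kind: "text"; value: string }
  | { kind: "comment"; value: string };

// XPath string value: for an element, the concatenation of its descendant
// text nodes in document order. Comments do not contribute to it.
function stringValue(node: MiniNode): string {
  switch (node.kind) {
    case "text":
    case "comment":
      return node.value;
    case "element":
      return node.children
        .filter((child) => child.kind !== "comment")
        .map(stringValue)
        .join("");
  }
}

// Rough equivalent of concat('Element value: ', $element) for this model.
function concatWithElement(prefix: string, element: MiniNode): string {
  return prefix + stringValue(element);
}
```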

As a naive user, I thought I could do the same when calling JavaScript methods, for instance:

      <xsl:variable name="element" as="element()">
        <element>foo</element>
      </xsl:variable>
      <div><xsl:value-of select="js:alert($element)"/></div>

But the browser strongly disagreed:

SaxonCE.XSLT20Processor 20:25:07.774
SEVERE: XPathException in invokeTransform: Cannot convert class client.net.sf.saxon.ce.tree.linked.ElementImpl to Javascript object

This message is a reminder that two incompatible types of objects live within Saxon-CE and that native Java classes cannot always be converted into JavaScript objects!

In this simple case, we can explicitly convert the element to a simple type that is compatible with the JavaScript object system:

      <xsl:variable name="element" as="element()">
        <element>foo</element>
      </xsl:variable>
      <div><xsl:value-of select="js:alert(string($element))"/></div>

Things get more difficult if you really need to pass an XML node as a JavaScript parameter or property. For instance, if I want to save this element as a property, this raises the same kind of error:

      <xsl:variable name="element" as="element()">
        <element>foo</element>
      </xsl:variable>
      <div><ixsl:set-property name="element" select="$element"/></div>

Here I need to convert the element node into a JavaScript object, and I would even argue that this conversion should ideally be implicit.

The bad news is that Saxon-CE does not provide a function to perform this conversion.

The other bad news is that if you thought you could use Saxon-CE’s JavaScript functions js:Saxon.serializeXML() and js:Saxon.parseXML() to perform that task by writing:

      <xsl:variable name="element" as="element()">
        <element>foo</element>
      </xsl:variable>
      <div><ixsl:set-property name="element" select="js:Saxon.parseXML(js:Saxon.serializeXML($element))"/></div>

you’re out of luck: js:Saxon.serializeXML() is a JavaScript wrapper around a native JavaScript function that won’t accept your XML node as an argument.

Michael Kay has implemented an ixsl:serialize-xml() function in the ce1.1plus branch. Unfortunately, while this function works fine with Saxon’s tiny tree nodes, it silently fails when its argument is a JavaScript DOM object. Furthermore, ixsl:serialize-xml() does not always preserve comments and PIs.

This conversion underpins most of the things I showed during my presentation, and I ended up writing a “d:serializeXML()” function based on Evan Lenz’ xml-to-string.xsl, then wrapping this function in a js:Saxon.parseXML() call to provide the d:convert-to-jsdom() function that I needed:

      <xsl:variable name="element" as="element()">
        <element>foo</element>
      </xsl:variable>
      <div><ixsl:set-property name="element" select="d:convert-to-jsdom($element)"/></div>

This is probably a very inefficient way to perform the conversion. A somewhat more efficient approach could be an XSLT implementation of this function that uses JavaScript methods to build the DOM tree from within XSLT templates rather than serializing and then parsing again.
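The serialization step is where comments and PIs can get lost. As a rough illustration of what a serializer in the spirit of xml-to-string.xsl has to handle, here is a minimal TypeScript sketch over a hypothetical node type (XNode and toXmlString are illustrative names; this is neither Saxon-CE’s nor Evan Lenz’ code). It escapes text and attribute values and keeps comments and processing instructions; in the browser, a string like the one it produces is what would then be handed to js:Saxon.parseXML().

```typescript
// Hypothetical, minimal node model standing in for an XML tree.
type XNode =
  | { kind: "element"; name: string; attributes?: { [name: string]: string }; children?: XNode[] }
  | { kind: "text"; value: string }
  | { kind: "comment"; value: string }
  | { kind: "pi"; target: string; value: string };

// Characters that must be escaped in text content.
function escapeText(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Attribute values are delimited by double quotes here, so quotes are escaped too.
function escapeAttribute(s: string): string {
  return escapeText(s).replace(/"/g, "&quot;");
}

// Serialize a node to a string, keeping comments and processing instructions.
function toXmlString(node: XNode): string {
  switch (node.kind) {
    case "text":
      return escapeText(node.value);
    case "comment":
      return "<!--" + node.value + "-->";
    case "pi":
      return "<?" + node.target + (node.value ? " " + node.value : "") + "?>";
    case "element": {
      const attributes = node.attributes || {};
      const attributeText = Object.keys(attributes)
        .map((name) => " " + name + '="' + escapeAttribute(attributes[name]) + '"')
        .join("");
      const children = (node.children || []).map(toXmlString).join("");
      return children === ""
        ? "<" + node.name + attributeText + "/>"
        : "<" + node.name + attributeText + ">" + children + "</" + node.name + ">";
    }
  }
}
```

Building the DOM directly, as suggested above, would skip the string entirely; the sketch only shows why a hand-written serializer is needed when the built-in one drops comments and PIs.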

At the end of the day, I think that Saxon-CE should really provide a built-in method to perform the conversion and that this conversion should be implicit!

One transformation is not enough!

When you love, you don’t count the cost

Saxon-CE scans the HTML page for the first script element with @type='application/xslt+xml' and @language='xslt2.0', creates an XSLT transformer from it and calls its updateHTMLDocument() method.

This is a great idea and a perfect, easy way to run a transformation that acts as a set of event handlers on the page!

However, why should we be limited to a single transformation?

In my presentation, I found that I had several disjoint sets of features to implement on each slide, and in such cases I much prefer to write one transformation per set of features.

The bad news is that Saxon-CE won’t do that for you automatically.

The good news is that you can do it using the JavaScript API provided by Saxon-CE, and since you can call JavaScript methods from XSLT, you can even do it in XSLT.

That’s what I did in my presentation: the first transformation, which Saxon-CE invokes automatically, is called boot-saxon.xsl. This transformation matches the subsequent script elements linking to XSLT 2.0 transformations and, for each of them, creates an XSLT transformer and invokes its updateHTMLDocument() method…
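The logic of such a boot step can be sketched outside XSLT. The TypeScript below is not boot-saxon.xsl; it is a minimal sketch under stated assumptions. updateHTMLDocument() is the method named above, while requestXML() and newXSLT20Processor() are assumed to mirror the corresponding functions of Saxon-CE’s JavaScript API (check them against the Saxon-CE documentation). The interfaces are declared here, and the Saxon object is injected, so the selection logic can be exercised without a browser.

```typescript
// Hypothetical description of a <script> element found in the page.
interface ScriptInfo {
  type: string;
  language: string;
  src: string;
}

// Assumed subset of Saxon-CE's JavaScript API, declared so the logic can run
// against a stand-in object; in the browser the real Saxon object would be passed.
interface XsltProcessorLike {
  updateHTMLDocument(source?: unknown, target?: unknown): void;
}
interface SaxonLike {
  requestXML(uri: string): unknown;
  newXSLT20Processor(stylesheet: unknown): XsltProcessorLike;
}

// Same criteria Saxon-CE uses to find the stylesheet it runs automatically.
function isXslt20Script(script: ScriptInfo): boolean {
  return script.type === "application/xslt+xml" && script.language === "xslt2.0";
}

// Start every XSLT 2.0 stylesheet except the first one, which Saxon-CE has
// already started (in the presentation, that first one is the boot stylesheet).
// Returns the URIs of the stylesheets that were started.
function startRemainingTransformations(saxon: SaxonLike, scripts: ScriptInfo[]): string[] {
  const started: string[] = [];
  scripts
    .filter(isXslt20Script)
    .slice(1)
    .forEach((script) => {
      const processor = saxon.newXSLT20Processor(saxon.requestXML(script.src));
      processor.updateHTMLDocument();
      started.push(script.src);
    });
  return started;
}
```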

If you look at this transformation, you’ll see that it can also invoke dynamic transformations (transformations created by transforming the web page into XSLT). I use this feature in my proof of concept of an XForms implementation.

Multiple transformations are a great feature, and I think they really deserve to become a standard feature of Saxon-CE.

Dynamic transformations are probably more of a niche feature, and I am not sure they should be integrated into the product.

The beauty of XSLT as an event handler self-modifying document

updateHTMLDocument() is a misnomer

Besides bringing XSLT 2.0 to the browser, I think the main innovation of Saxon-CE is the way it uses XSLT to define event handlers that update the current page in response to interaction events.

In slide 26, “A dream come true”, I use this method with an XSLT transformation run against an XML fragment and, as you can see, it seems to work pretty well!

Why would we need to run this method on anything other than the HTML page?

This was one of the things I explained in detail during my presentation: I think the new paradigm introduced by Saxon-CE, using XSLT “transformations” as event handlers able to update document fragments, is really powerful and has use cases well beyond “typical” Saxon-CE applications.

The purpose of this slide is to show how nice it would be to run a client-side MVC application implemented as two separate transformations communicating through events:

  • a first one acting on the page, reacting to user interactions
  • a second one acting on an instance and controlling its updates.
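The interplay between the two transformations can be modeled without Saxon-CE. In this minimal TypeScript sketch, a hypothetical listener table stands in for the browser’s event system and a plain object stands in for the XML instance; the target names ("input#given-name", "instance", "div#output") are made up for illustration. The page side forwards user input to the instance, the instance side updates its data and then signals the page, and only the page side writes the output.

```typescript
// Hypothetical stand-in for the event system: listeners keyed by target name.
type ChannelListener = (detail: string) => void;
const listenersByTarget: { [target: string]: ChannelListener[] | undefined } = Object.create(null);

function listen(target: string, listener: ChannelListener): void {
  listenersByTarget[target] = (listenersByTarget[target] || []).concat([listener]);
}

function send(target: string, detail: string): void {
  (listenersByTarget[target] || []).forEach((listener) => listener(detail));
}

// Hypothetical stand-ins for the XML instance and the output <div>.
const modelInstance = { personGivenName: "" };
let outputText = "";

// Page transformation: forwards user input to the instance.
listen("input#given-name", (value) => send("instance", value));

// Instance transformation: updates the data, then tells the page the output is stale.
listen("instance", (value) => {
  modelInstance.personGivenName = value;
  send("div#output", "change");
});

// Page transformation: rebuilds the output from the instance only.
listen("div#output", () => {
  outputText = "Hello " + modelInstance.personGivenName + ". We hope you like Saxon CE!";
});
```

Keeping the output a function of the instance alone is what lets the two sides evolve independently: the page never reads its own input field to render the greeting.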

I see no reason why this shouldn’t be extended to other use cases.

For instance, such a feature would open interesting perspectives for XML databases, where XSLT transformations could be used to define event handlers on XML nodes…

To come back to Saxon-CE, I think that both the name of the method and the documentation are misleading:

  • updateHTMLDocument() could be renamed updateDocument()
  • The description of the method’s $target parameter should read “The Document object to update” instead of “The HTML Document object to update”.

Events seem (too) strongly limited

This proposal to generalize the usage of the updateHTMLDocument() method assumes that events can be used on XML documents.

Unfortunately, the usage of events in Saxon-CE seems to be strictly limited.

The documentation distinguishes two types of events:

  • User input events: Event handlers for user input are written in the form of template rules. The match pattern of the template rule must match the element that receives the event, and the mode name reflects the type of event, for example ixsl:onclick.
  • Client system events: Saxon-CE also handles events raised by objects such as window that live outside the DOM. Event handlers for such objects are written in the form of template rules. The match pattern is different from that for conventional templates because there is no node to match. Instead, the pattern must be an ixsl function (e.g. ixsl:window() ) that returns the object whose event is to be handled.

Unfortunately, both types of events suffer from important restrictions:

  1. User input events seem to be limited to a subset of the standard HTML events, as listed for instance by MDN. Saxon-CE itself doesn’t enforce this limitation explicitly, but in practice most events are simply ignored. This includes not only custom events but also standard events such as the input event. If you define a template with a mode matching an unsupported event (such as ixsl:oninput), Saxon-CE silently ignores this definition and doesn’t install any event handler.
  2. Furthermore, when updateHTMLDocument() is used with an XML document as its target, Saxon-CE ignores any event handler defined through ixsl:onXXX modes.
  3. Client system events, on the contrary, do support any type of event. However, I have never been able to use them with any JavaScript object other than the global window object!
  4. Custom events sent to the window object using standard dispatch methods do not seem to work, and it is safer to call the event handler directly than to dispatch events.

A consequence of the first limitation is that I haven’t found any way to implement the equivalent of the XForms incremental mode. This should be trivial using input events: I have checked that these events are fired synchronously when an input is changed, but since Saxon-CE doesn’t install any event handler for them, there seems to be no way to catch them.

Another consequence, as you may have noticed, is that in hello-world-2xslt.xsl we define an event handler for a change event on the output:

    <!-- The output needs to be updated -->
    <xsl:template match="div[@id='output']" mode="ixsl:onchange">
        <xsl:message>The output needs to be updated</xsl:message>
        <xsl:variable name="instance" select="ixsl:get(ixsl:window(), 'instance')"/>
        <xsl:result-document href="#output" method="ixsl:replace-content">
            <xsl:text>Hello </xsl:text>
            <xsl:value-of select="$instance/data/person-given-name"/>
            <xsl:text>. We hope you like Saxon CE!</xsl:text>
        </xsl:result-document>
    </xsl:template>

You may wonder how a change event can be sent to an HTML <div/>. In fact, this event is sent as a custom event by the transformation implementing the model. It would have been much less confusing to use a custom name, but because Saxon-CE ignores custom events on DOM nodes, I had to hijack an existing HTML event, and I chose one that should otherwise never be fired, to avoid any conflict with user interaction.

At this point you probably wonder how slide 26 can work if we can’t define event handlers for XML nodes (the second restriction listed above).

I had to find a workaround and ended up simulating proper event handling in functions defined in events.xsl.

The most controversial pieces in this library are probably:

  <xsl:function name="d:new-custom-event">
    <xsl:param name="name" as="xs:string"/>
    <xsl:variable name="js-statement" as="element()">
      <root statement="new CustomEvent ( '{$name}',{{ detail: {{}} }})"/>
    </xsl:variable>
    <xsl:sequence select="ixsl:eval(string($js-statement/@statement))"/>
  </xsl:function>

  <xsl:function name="d:dispatch-event-to-instance">
    <xsl:param name="target"/>
    <xsl:param name="event"/>
    <ixsl:set-property object="$event" name="detail.target" select="$target"/>
    <ixsl:set-property name="dummy" select="ixsl:call(ixsl:window(), 'onEventWorkaround', $event)"/>
  </xsl:function>

Instead of creating an event with the type defined as a parameter, d:new-custom-event() creates an event and stores the type as a property.

Instead of really dispatching an event to a target, d:dispatch-event-to-instance() stores the target as a property of the event and performs a direct call to the onEventWorkaround() event handler on the window object.

This event handler is defined as:

  <xsl:template match="ixsl:window()" mode="ixsl:onEventWorkaround">
    <xsl:message>Got an event</xsl:message>
    <xsl:variable name="event" select="ixsl:event()"/>
    <xsl:choose>
      <xsl:when test="ixsl:get($event, 'type') = 'ModelUpdate'">
        <!--<xsl:apply-templates select="ixsl:get(event, 'detail.target')" mode="d:onModelUpdate">-->
        <xsl:apply-templates
          select="ixsl:get(ixsl:window(), 'instance')//*[d:path(.) = d:path(ixsl:get($event, 'detail.target'))]"
          mode="d:onModelUpdate">
          <xsl:with-param name="event" select="$event"/>
        </xsl:apply-templates>
      </xsl:when>
    </xsl:choose>
  </xsl:template>

This single event handler serves as a channel for events sent to the XML instance. To simulate the standard behavior of event handling in Saxon-CE, it applies templates to the target passed as an event property (located in the stored instance by comparing paths computed with d:path()). The biggest downside of this workaround is probably that it isn’t really extensible: a new <xsl:when/> needs to be added for each supported event!
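One way to avoid adding an <xsl:when/> per event type is to look handlers up by type instead of testing for each type. The TypeScript sketch below models only the dispatch logic of the workaround: the event carries its type and its target (here a hypothetical path string standing in for what d:path() computes), a single channel function plays the role of onEventWorkaround, and handlers are registered per type. The registry is an assumption for illustration; Saxon-CE provides nothing like it, and in XSLT the per-type lookup would have to be expressed differently.

```typescript
// Hypothetical event shape mirroring the workaround: the type and the target
// travel inside the event object instead of going through a real dispatch.
interface WorkaroundEvent {
  type: string;
  detail: { target: string }; // e.g. a path such as "/data/person-given-name"
}

type WorkaroundHandler = (targetPath: string, event: WorkaroundEvent) => void;

// Handlers registered per event type.
const handlersByType: { [type: string]: WorkaroundHandler[] | undefined } = Object.create(null);

function register(type: string, handler: WorkaroundHandler): void {
  handlersByType[type] = (handlersByType[type] || []).concat([handler]);
}

// The single channel (the role played by onEventWorkaround on window):
// supporting a new event type means registering a handler, not editing this function.
// Returns the number of handlers invoked.
function onEventWorkaround(event: WorkaroundEvent): number {
  const handlers = handlersByType[event.type] || [];
  handlers.forEach((handler) => handler(event.detail.target, event));
  return handlers.length;
}
```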

I find these four issues really problematic and think they deserve to be analyzed further and fixed!

Side effect

This point is briefly mentioned in slide 23, “Side effect”.

XSLT was designed to be free of side effects, yet to develop this kind of user interface we rely very heavily on them.

Among other side effects, we often set properties on JavaScript objects.

In hello-world-mvc.xsl, for instance (used in slide 21, “Complete example (when all we have is a hammer…)”), we write:

        <!-- Store the new value -->
        <ixsl:set-property object="ixsl:window()" name="instance"
            select="d:convert-to-jsdom($instance)"/>
        <!-- Rely on the instance to write the output -->
        <xsl:result-document href="#output" method="ixsl:replace-content">
            <xsl:text>Hello </xsl:text>
            <xsl:value-of select="$instance/data/person-given-name"/>
            <xsl:text>. We hope you like Saxon CE!</xsl:text>
        </xsl:result-document>

Since XSLT is side-effect free, nothing in the recommendation ensures that the <xsl:result-document/> will be executed after the <ixsl:set-property/>.

It happens to work, but we need to realize that we are on the edge of what should be done with XSLT and, of course, hope that Saxon-CE won’t break our assumption that, in most cases, XSLT instructions are executed in document order!

 

Ittoqqortoormiit

Ittoqqortoormiit, July 2058

La Gazette d’Ittoqqortoormiit: Professor Sean Lafontaine, you head the technicology laboratory at the University of Ittoqqortoormiit. Can you explain what your work involves and what makes it special?

Sean Lafontaine: Technicology is the branch of archaeology devoted to the technician civilization. Although this civilization is much more recent than those studied by the other branches of archaeology, the available materials raise such problems of interpretation that a truly scientific, archaeological approach is required.

LGI: What kind of problems?

SL: At the end of the last century, this civilization began a process of “dematerialization” of its intellectual output. This dematerialization took the form of so-called “electronic” storage of documents (writings, photos, sounds, films, …). The technologies used for this storage varied, but in every case technological skills and electrical power were needed to access the “dematerialized” information.

LGI: In other words, access to these resources is now restricted to specialists! How do you solve these problems?

SL: Our laboratory is authorized to produce, under strict control of course, the electrical power it needs to carry out its research. As for technological knowledge, it is a discipline in its own right, just like sigillography or epigraphy, to take two better-known examples. In technicology we are lucky to be able to rely on living witnesses who used these technologies. These survivors are few, but their testimonies have been invaluable in laying the foundations of our discipline.

LGI: Tell us about this mysterious black box discovered last month…

SL: We found a small rectangular box, 101 mm wide, 115 mm long and 27 mm thick, weighing less than 400 g, in a ruin in a rural area north-west of the dry diagonal. This metal box bore signs suggesting a “dematerialization” kind of use.

LGI: Can you tell us what these signs are, so that our readers can alert you if they find objects of this kind?

SL: The areas we now live in were almost deserted before climate change, and such objects are unlikely to be found here, but it may still be useful for your readers to know how to recognize them… Their shapes and sizes vary a lot, and the most reliable way to identify them is the presence of “connectors”. Connectors are small holes, often rectangular or round, in which copper parts can sometimes be seen.

LGI: Thank you, Professor. Now I can ask the question that has been burning on my lips… Will this black box give up its secrets?

SL: Yes, and they are fabulous! Yesterday our team finally managed to make it talk. It will take us several months to work through the documents it has revealed. We have already identified more than 100,000 electronic letters, 50,000 photos and several hours of film, but that is only a tiny part of the box’s content, and there are many documents whose nature we do not yet understand.

LGI: So many documents in such a small volume!

SL: Yes, we estimate that more than a trillion characters could be stored in this box.

LGI: The technical feats of this civilization are truly astonishing. What a pity it could not control the uses of its technologies!

SL: As a historian, I take care not to pass value judgments on the subject of my research, but I must admit that I am indeed often surprised by the performance of the objects I study.

LGI: To achieve such feats, this black box must have produced a lot of CO2!

SL: Not directly, no. Our measurements show that it consumes less than 10 W, which, even with a high-CO2 way of producing electricity, amounts to less than 250 g of CO2 per day (a quarter of what human breathing produces). On the other hand, to be useful, this box needed to be connected to a whole set of equipment (to display the information, exchange it with other machines, …). If all this equipment is taken into account, the CO2 output is considerable.

LGI: Do we know what this black box was used for?

SL: We think it served as an archive. Dematerialized data had to be duplicated and stored on several devices to survive the destruction of one of them.

LGI: So it’s a bit like books, whose survival is ensured by their dissemination! What was this box doing in a rural area?

SL: We think its owner had two homes, one in town and one in the countryside, and that he kept archives in each of them.

LGI: Clever… So you were able to identify its owner?

SL: His name was Éric van der Vlist, and we are trying to get a picture of his habits and personality.

LGI: You have enough information for that?

SL: Yes, we have found many messages, some of them very personal, as well as photos.

LGI: What was his job?

SL: He worked in the field of information dematerialization. Machines had become so important that documents were written to be understood by machines before being understood by humans, which, incidentally, makes them considerably harder for us to understand. There were several competing vocabularies for this purpose. Eric van der Vlist was a specialist in one of these vocabularies, called XML. His main professional activity was advising companies on the use of this vocabulary. He was a recognized expert, and he traveled by plane to meet his peers at large gatherings called conferences.

LGI: By plane? For such a futile reason? Did he have no awareness of the environmental stakes?

SL: Indeed, every year he made one trip to Montréal (1,400 kg of CO2) and one to Prague (300 kg of CO2), but that was common practice at the time, and these gatherings brought together hundreds of people. To come back to your question, what makes our discovery particularly interesting is that, contrary to what you might think, Eric’s awareness of environmental issues was rather high compared with his contemporaries.

LGI: Doesn’t that awareness make him all the more guilty?

SL: My work is that of a historian, and it is not my place to judge behavior. However, studying his personality can help us understand why this civilization was unable to stop, or even slow down, the ecological catastrophe it was fueling and of which it was aware.

LGI: And this van der Vlist seems to you an interesting subject from that point of view?

SL: Undeniably. Not least because he was born in 1958, at the turning point when the technician civilization seemed to lose control of its own evolution. His life coincides with the last act of this civilization.

LGI: If I understand correctly, he is a privileged witness of the fall of the technician civilization.

SL: Yes, but unfortunately he seems to have left few texts about the environment, and his journal mostly contains texts related to his technical work.

LGI: How do you know he was aware of environmental issues?

SL: He did leave a few texts on the subject, as well as a short autobiography written for his fiftieth birthday. In it he explains, in particular, that in 1981 he was involved in a program measuring the CO2 level in the atmosphere and that he became aware of the risks of global warming at that time.

LGI: So the scientific community knew as early as 1981?

SL: Of course. While we have few documents from the period of massive dematerialization, we paradoxically have many more documents from earlier periods, and we have found work dating from 1895 establishing a link between our planet’s average temperature and the CO2 concentration of its atmosphere.

LGI: 1895? You mean 1985?

SL: No, it really was at the end of the nineteenth century, in 1895, that this work was published. It probably went somewhat unnoticed, and it is not this work that earned Svante Arrhenius his Nobel Prize in Chemistry.

LGI: So our predecessors knew perfectly well the risks they were running?

SL: In the relevant scientific circles, unquestionably. Yet it would take until 2001 and the IPCC report for the subject to really be brought to the attention of the general public.

LGI: 2001! What a waste of time! More than a century after the discovery of the influence of the greenhouse effect on the climate! Twenty years after this van der Vlist made his CO2 measurements! What did he do during those twenty years to warn his contemporaries?

SL: He explains in his autobiography that he hesitated, but that he gave up his research career for better-paid technical work. He also got married and had four children.

LGI: Four children! Wasn’t he aware of the overpopulation of the time? How many inhabitants did our planet have then?

SL: About five and a half billion when they had their last child. To understand his behavior, you need to know that the world was divided into “countries” that competed with one another. Eric lived in a country with an aging population that encouraged births in order to keep its population stable.

LGI: So Eric van der Vlist measures CO2 and becomes aware of the risks of global warming in 1981 without it changing his behavior? That’s incredible!

SL: His behavior was influenced by this awareness, but only to a limited extent.

LGI: What did he do?

SL: When he left the research institute where he worked on CO2, he joined a company to work in the field of mobile telephony.

LGI: A company? You mean that until then he had shunned company?

SL: No, technicians were organized in groups called “companies” or “moral persons” (legal entities) whose purpose was to grow and make money, and Eric joined one of these groups.

LGI: These companies must have been bound by very strong moral imperatives to deserve the name of moral person!

SL: On the contrary… They escaped all moral or religious principles, and their only rules were to obey the laws in force and to earn more money than they spent.

LGI: Can you explain what mobile telephony was?

SL: Technicians had developed an irrational need to stay “connected” to one another through mobile phones, which were small devices, running on electricity, that transmitted sounds and images by means of electromagnetic waves.

LGI: Electricity again… This need to feel connected seems to me a strange mania! So Eric van der Vlist was both one of its initiators and one of its followers?

SL: Before pointing out its harmful effects, yes. But his role in the development of this technology should not be exaggerated. He changed his field of activity while it was still in its infancy.

LGI: So he quickly perceived the pathological nature of this kind of behavior?

SL: No, his text on the harmful effects of mobile phones came fairly late, in 2011. He changed fields because mobile phones were being developed by large multinational companies in which he did not feel comfortable.

LGI: You told us that the purpose of these companies was to grow, so the big companies must have been the best of them. Why would he want to leave?

SL: Most companies were highly hierarchical structures in which personal initiative was not encouraged and in which competition was strongly encouraged.

LGI: I thought the technician civilization was organized according to democratic principles. Wasn’t that the case?

SL: Some countries were governed according to democratic principles, but companies were not subject to that rule.

LGI: Was your man able to escape their control?

SL: He worked for smaller and smaller companies before creating a one-person company to sell the results of his work. That is when he specialized in this XML, which made it possible to write texts that machines could understand.

LGI : Pourquoi ce choix ?

SL : Je suppose que cela devait correspondre à un sujet en vogue à cette époque. Il faut également dire que c’était un sujet « ouvert » dans lequel les contributions de petites structures indépendantes étaient bienvenues.

LGI : Il n’avait pas conscience de la fragilité des documents dématérialisés ?

SL : Si et il a même lancé un projet d’archivage de ces documents. Malheureusement, les archives étaient elles aussi dématérialisées ce qui réduit leur pérennité.

LGI : Je dirais même que ça les rend inutiles, non ?

SL : Non, pas totalement. La boite que nous venons de retrouver contient à la fois les textes d’Eric van der Vlist mais également les archives des documents auxquels il faisait référence ce qui s’avère très précieux à leur compréhension.

LGI: In any case, I don’t see anything in all this that suggests an awareness of environmental problems!

SL: In 1994 he bought the farm where we found the black box and tried to live closer to nature and to reduce his ecological footprint.

LGI: That was not a moment too soon to realize he had to change his way of life!

SL: No, but this change was only partial and he began to lead a double life: weekdays in the city devoted to technical activities, and Saturdays and Sundays in the countryside tending his garden.

LGI: I hope he at least tried to minimize the carbon footprint of his weekly migrations!

SL: He doesn’t say much about it. According to our research, he used a vehicle called a “Renault Espace”, which emitted nearly 200 g of CO2 per kilometre, and he drove about 350 km a week, i.e. roughly 3,500 kg of CO2 a year…

LGI: And apart from that?

SL: In 2007, and especially in 2008, he extended his farming activity by buying new land and planting fruit trees. He favoured old varieties threatened with extinction in the short term. He also introduced sheep and geese of rare breeds to maintain his land.

LGI: That would no doubt have been very useful if he had also been concerned with mitigating the climate change that was going to destroy those trees and animals. Did his trees offset his CO2 emissions?

SL: No. He planted 400 trees that could each capture 5 kg of CO2, which amounts to barely 2,000 kg of CO2 per year, just enough to cover 1,700 kg of air-travel emissions.

LGI: Did his commitment stop there?

SL: He also helped his wife Catherine create two shops selling so-called “organic” products, grown and made without chemicals and with a reduced impact on the environment. These shops, called “le Retour à la Terre”, tried to raise their customers’ awareness and organized meetings to debate environmental topics.

LGI: All this seems ridiculously insufficient! It looks like an error of analysis: his archives are dematerialized without his questioning the sustainability of the electrical resources they require, just as he wants to save animal and plant species without questioning the sustainability of their ecosystem as a whole.

SL: You have put your finger on the heart of the problem, and that is what we would like to understand!

LGI: Didn’t he leave any clues about it?

SL: He did, but in a text that is difficult to analyse.

LGI: What do you mean?

SL: In early 2013, he wrote an unusual text, presented as fiction, in which he imagined his writings being found after a major environmental catastrophe.

LGI: Well, well, so he was aware of the risk! And why is this text difficult to interpret?

SL: How can fiction be distinguished from reality?

LGI: Let’s try anyway! Do we know what might have happened in early 2013 to prompt him to write this “fiction”?

SL: At the very beginning of the twenty-first century, one could hope that the growing scarcity of oil resources would force the technological civilization to drastically limit its consumption of hydrocarbons, and therefore its production of greenhouse gases.

LGI: And it was in 2013 that people realized this would not be the case?

SL: In any case, it was at the end of 2012 that Éric van der Vlist seems to have realized it.

LGI: What happened?

SL: It was becoming more and more obvious that the technological civilization was going to deal with the scarcity of hydrocarbon resources, as was its habit, in a purely technical way, without worrying about the consequences of that decision.

LGI: How so? When there are no more hydrocarbons, there are no more!

SL: Unless you go and look for the last drop wherever it is, whatever the consequences. Only the relatively easy deposits had been exploited, and technicians developed technologies to exploit gas and oil deposits that were harder to work, described as unconventional.

LGI: Was that possible?

SL: When you are prepared to extract oil by drilling more than 1,500 m below sea level from huge rafts, or to fracture rocks to the point of causing earthquakes, it is always possible to find hydrocarbons.

LGI: So they were ready to take every risk to prolong a way of life that favoured the production of greenhouse gases?

SL: That is certainly my impression, yes.

LGI: Was Eric van der Vlist alone in this awareness?

SL: Not really. The imminence of climate change was even fairly widely accepted in European countries.

LGI: If that is the case, given what was at stake, didn’t the men and women aware of this imminent catastrophe have more effective means of action?

SL: Many people felt powerless to change things.

LGI: Wasn’t their society organized democratically?

SL: Many countries were indeed governed according to democratic principles, but their mode of organization does not seem to have allowed them to take the necessary measures.

LGI: Why?

SL: It is a complex subject on which we can only put forward hypotheses.

LGI: Which ones?

SL: The “companies” whose sole purpose was to make money, which we have already discussed, were undeniably one of the factors. They had grown so large that they had become genuine counter-powers. The biggest of them were “multinationals” and extended their reach across the whole planet. Their development relied on economic growth, itself tied to the use of non-renewable resources that led to the production of greenhouse gases. Their size allowed them to put pressure on all scientific and political structures and to block decisions that went against their interests.

LGI: Didn’t the democratic structures give the people enough power to counter this?

SL: They did, but only provided that the people were properly informed and that the choices put to them were genuinely open.

LGI: Properly informed?

SL: The companies partially controlled scientific research and funded studies whose purpose was to muddy the waters by denying the effects of certain technological developments, and sometimes even the reality of the greenhouse effect and global warming.

LGI: You imply that the options offered to the people were not really open. Why?

SL: Political power was exercised by a caste of professionals, one of whose goals was to stay in power. Being elected, they could not afford to make unpopular decisions, and those among them who questioned the pursuit of economic “growth”, necessarily more or less tied to growing greenhouse gas emissions, were regarded as utopians and their proposals were dismissed out of hand.

LGI: So this society knowingly headed for disaster in order to keep over-exploiting the planet. That seems mind-boggling!

SL: It is hard to understand, but the facts are there.

LGI: Thank you, Professor. As a reminder, this week our guest was Professor Sean Lafontaine, head of the technicology laboratory at the University of Ittoqqortoormiit.

XML Prague 2013

XML Prague is over…


The XML Prague 2013 team

As usual, it’s very difficult to go back to work after these three intense days of deeply technical and friendly exchanges. The feeling is well known to attendees and has been described by Alex Milowski as “Post XML Prague depression”!

Unlike previous years, I will not publish a post covering the whole conference (I am too busy at the moment) but will instead publish several short articles on specific topics.
Stay tuned! In the meantime, you may want to have a look at my pictures.

Toward χίμαιραλ/superset

Background

Up to now, my approach to exploring possible solutions for supporting JSON in XDM has been to evaluate the proposal to introduce maps in XSLT 3.0, which led to the χίμαιραλ proposal.

Having found that in the current XSLT 3.0 Working Draft JSON objects would be second-class XDM citizens, I think it’s worth considering other approaches, and I consider Jürgen Rennau’s UDL proposal a very good starting point.

In their comments after Rennau’s presentation, John Cowan and Steven DeRose stressed the need to forget about the serialization and focus on the data model, at least as a first step, when proposing alternative solutions.

Following their advice, I’d like to propose an update to the core XDM 3.0 that would natively support JSON objects without introducing any new item kind.

Because this proposal is a chimera and because it’s a superset of the current XDM (any current XDM instance would be compatible with the updates I am proposing), I’ll call this proposal χίμαιραλ/superset.

The XDM carefully avoids drawing class models and prefers to specify nodes through node kinds and accessors. However, I think it is fair to say that elements are defined as nodes with three basic properties:

  • A mandatory name which is a QName.
  • A map of attributes whose keys are QNames.
  • An array of child nodes.

On the other hand, the JSON data model is composed of arrays, objects and atomic values. JSON atomic values can be numbers, strings, booleans or null. JSON key arrays can be any atomic values.

Traditional approaches to binding JSON and XML use XML elements’ children to represent JSON object properties, and UDL is no different in that respect.

There are obviously good reasons to do so, the main one being that XML attribute values can only be atomic values while JSON object properties can also be maps or arrays.

However, the end result is that the interface mismatch caused by binding JSON objects (which are maps) to XML children (which are arrays) is the source of most of the complexity of these proposals.

Proposal (χίμαιραλ/superset)

Since XML elements already have both a map (of attributes) and an array (of child nodes), why not use these native features to bind JSON maps and arrays, and just update the data model to remove the restrictions that make attribute maps and child arrays unfit for being bound to JSON objects and arrays respectively?

A JSON object would then simply be bound to an XML element with attributes and without child nodes, and a JSON array would be bound to an XML element with child nodes and without attributes.

This seems especially easy if we focus on the data model and postpone the (optional) definition of a serialization syntax.

First modification: elements can be anonymous

Neither JSON objects nor JSON arrays have names, whereas XML elements do, and their name is mandatory.

To fix this issue, element names should become optional (in other words, we introduce the notion of anonymous elements).

Second modification: attribute names should also possibly be strings, booleans or null

If we bind JSON object keys to attribute names, it should be possible to use all the basic types that JSON accepts for its keys.

Additionally, we may want to consider supporting other XML Schema simple types, possibly all of them.

To make this possible, the definition of the dm:node-name() accessor should be updated to return a typed value rather than a QName. This modification should concern attribute nodes at a minimum, but for maximum homogeneity we should probably extend it to other node types.

Third and last modification: attributes should have (optional) attributes and children

JSON object values can be objects or arrays, and since objects are bound to attributes and arrays to children, attributes should support both.

Mapping JSON into χίμαιραλ/superset

With these updates, binding JSON into XML becomes quite straightforward:

  • A JSON object is mapped into an anonymous element without children and one attribute per key/value pair.
  • A JSON array is mapped into an anonymous element without attributes and with one child element per item.
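To make these two mapping rules concrete, here is a minimal Python sketch of the χίμαιραλ/superset model. The `Element` and `Attribute` classes and the `to_node()`/`to_attribute()` helpers are hypothetical names introduced for illustration only; they are not part of XDM or of any processor. The sketch also assumes that atomic items inside a JSON array stay plain atomic values among an element’s children, a case the rules above leave open.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Element:
    # Hypothetical χίμαιραλ/superset element: name=None means "anonymous".
    name: Optional[str] = None
    # Attribute names are typed atomic values (str, bool, None, ...), not QNames.
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


@dataclass
class Attribute:
    # Hypothetical attribute: holds an atomic value or, per the third modification,
    # its own attributes (a JSON object) or children (a JSON array).
    name: Any
    value: Any = None
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def to_node(value: Any) -> Any:
    """Map a JSON value to an anonymous element; atomic values are returned unchanged."""
    if isinstance(value, dict):
        # JSON object: anonymous element, no children, one attribute per key/value pair.
        return Element(attributes={k: to_attribute(k, v) for k, v in value.items()})
    if isinstance(value, list):
        # JSON array: anonymous element, no attributes, one child per item.
        return Element(children=[to_node(item) for item in value])
    return value  # assumption: atomic array items stay atomic values


def to_attribute(key: Any, value: Any) -> Attribute:
    """Map one JSON key/value pair to an attribute."""
    if isinstance(value, dict):
        return Attribute(key, attributes={k: to_attribute(k, v) for k, v in value.items()})
    if isinstance(value, list):
        return Attribute(key, children=[to_node(item) for item in value])
    return Attribute(key, value=value)


if __name__ == "__main__":
    doc = to_node(json.loads('{"sales": [{"firstName": "Sally", "age": 27}]}'))
    print(doc.attributes["sales"].children[0].attributes["age"].value)  # 27
```

Note how the object/array distinction is carried entirely by which of the two native containers (attributes or children) is populated, which is the point of the proposal.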

Let’s take the now famous (at least on this blog) JSON snippet borrowed from the XSLT 3.0 Working Draft:

{ "accounting" : [
      { "firstName" : "John",
        "lastName"  : "Doe",
        "age"       : 23 },

      { "firstName" : "Mary",
        "lastName"  : "Smith",
        "age"       : 32 }
                 ],                                
  "sales"     : [
      { "firstName" : "Sally",
        "lastName"  : "Green",
        "age"       : 27 },

      { "firstName" : "Jim", 
        "lastName"  : "Galley",
        "age"       : 41 }
                  ]
}

Becomes:

  • Anonymous element without children and two attributes:
    • Attribute “accounting” (as a string) with no attributes and the two following children:
      • Anonymous element with no children and the three following attributes:
        • Attribute “firstName” (as a string) and a value “John” (as a string)
        • Attribute “lastName” (as a string) and a value “Doe” (as a string)
        • Attribute “age” (as a string) and a value 23 (as a number)
      • Anonymous element with no children and the three following attributes:
        • Attribute “firstName” (as a string) and a value “Mary” (as a string)
        • Attribute “lastName” (as a string) and a value “Smith” (as a string)
        • Attribute “age” (as a string) and a value 32 (as a number)
    • Attribute “sales” (as a string) with no attributes and the two following children:
      • Anonymous element with no children and the three following attributes:
        • Attribute “firstName” (as a string) and a value “Sally” (as a string)
        • Attribute “lastName” (as a string) and a value “Green” (as a string)
        • Attribute “age” (as a string) and a value 27 (as a number)
      • Anonymous element with no children and the three following attributes:
        • Attribute “firstName” (as a string) and a value “Jim” (as a string)
        • Attribute “lastName” (as a string) and a value “Galley” (as a string)
        • Attribute “age” (as a string) and a value 41 (as a number)
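The outline above can be produced mechanically. This Python sketch (the function names are hypothetical) walks a parsed JSON value and yields the same structure as indented lines. It reports counts as digits and quotes string values the way JSON does, so its wording differs slightly from the list above.

```python
import json

EXAMPLE = """{ "accounting" : [ { "firstName" : "John", "lastName" : "Doe", "age" : 23 },
                                { "firstName" : "Mary", "lastName" : "Smith", "age" : 32 } ],
               "sales" : [ { "firstName" : "Sally", "lastName" : "Green", "age" : 27 },
                           { "firstName" : "Jim", "lastName" : "Galley", "age" : 41 } ] }"""


def type_name(value):
    """Name the JSON type of an atomic value (bool is tested before int on purpose)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if value is None:
        return "null"
    return "string"


def describe_node(value, depth=0):
    """Yield outline lines for a JSON value bound to an anonymous element."""
    pad = "  " * depth
    if isinstance(value, dict):
        yield f"{pad}Anonymous element with no children and {len(value)} attributes:"
        for key, item in value.items():
            yield from describe_attribute(key, item, depth + 1)
    elif isinstance(value, list):
        yield f"{pad}Anonymous element with no attributes and {len(value)} children:"
        for item in value:
            yield from describe_node(item, depth + 1)
    else:
        yield f"{pad}Atomic value {json.dumps(value)} (as a {type_name(value)})"


def describe_attribute(key, value, depth):
    """Yield outline lines for a key/value pair bound to an attribute."""
    head = "  " * depth + f'Attribute "{key}" (as a string)'
    if isinstance(value, dict):
        yield f"{head} with no children and {len(value)} attributes:"
        for k, item in value.items():
            yield from describe_attribute(k, item, depth + 1)
    elif isinstance(value, list):
        yield f"{head} with no attributes and {len(value)} children:"
        for item in value:
            yield from describe_node(item, depth + 1)
    else:
        yield f"{head} and a value {json.dumps(value)} (as a {type_name(value)})"


if __name__ == "__main__":
    print("\n".join(describe_node(json.loads(EXAMPLE))))
```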

What do you think? Comments are very welcome!

Fleshing the XDM chimera

Note: this article is derived from the presentation I gave at Balisage (proceedings). See also the slides (and video) used during this presentation.

Abstract

The XQuery and XPath Data Model 3.0 (XDM) is the kernel of the XML ecosystem. XDM has been extended with foreign item types to embrace new data sources such as JSON, at the risk of becoming a chimera. This talk explores some ways to move this fundamental piece of the XML stack forward.


Motivation

Chimera (mythology): The Chimera (also Chimaera or Chimæra) (Greek: Χίμαιρα, Khimaira, from χίμαρος, khimaros, “she-goat”) was, according to Greek mythology, a monstrous fire-breathing female creature of Lycia in Asia Minor, composed of the parts of multiple animals: upon the body of a lioness with a tail that ended in a snake’s head, the head of a goat arose on her back at the center of her spine. The Chimera was one of the offspring of Typhon and Echidna and a sibling of such monsters as Cerberus and the Lernaean Hydra. The term chimera has also come to describe any mythical animal with parts taken from various animals and, more generally, an impossible or foolish fantasy.
Wikipedia
Chimera (genetics): A chimera or chimaera is a single organism (usually an animal) that is composed of two or more different populations of genetically distinct cells that originated from different zygotes involved in sexual reproduction. If the different cells have emerged from the same zygote, the organism is called a mosaic. Chimeras are formed from at least four parent cells (two fertilized eggs or early embryos fused together). Each population of cells keeps its own character and the resulting organism is a mixture of tissues.
Wikipedia

During her opening keynote at XML Prague 2012, speaking about the relation between XML, HTML, JSON and RDF, Jeni Tennison warned us against the temptation to create chimeras: “chimeras are usually ugly, foolish or impossible fantasies”.

The next morning, Michael Kay and Jonathan Robie came to present new features in XPath/XQuery/XSLT 3.0. A lot of these features are directly based on the XQuery and XPath Data Model 3.0 (aka XDM):

The XPath Data Model is the abstraction over which XPath expressions are evaluated. Historically, all of the items in the data model could be derived directly (nodes) or indirectly (typed values, sequences) from an XML document. However, as the XPath expression language has matured, new features have been added which require additional types of items to appear in the data model. These items have no direct XML serialization, but they are never the less part of the data model.

XDM 3.0 is composed of items from a number of different technologies:

  • Items from the XML Infoset (nodes, attributes, …)
  • Datatype information borrowed from the Post Schema Validation Infoset
  • Sequences
  • Atomic values
  • Functions, which can also be used (as maps) to model JSON objects
Note
The feature that will be introduced to model JSON objects is called “maps”, and it will be specified as an XSLT feature in the XSLT 3.0 recommendation (not published yet). The XSLT 3.0 editor, Michael Kay, has published an early version of this feature on his blog. In this paper, XDM 3.0 will refer to the XSLT 3.0 data model (the XPath 3.0 data model augmented with maps).

Since XDM 3.0 is a single data model composed of items from different data models, it is fair to say that it is a chimera!

Following Jeni Tennison on stage, I tried to show that in a world where HTML 5 on one hand and JSON on the other are gaining traction, XML has become an ecosystem in a competitive environment and that its data model is a major competitive advantage.

Among other factors, the continued success of XML will thus come from its ability to seamlessly integrate other data models such as JSON.

If we follow this conclusion, we must admit that this chimera is essential to the future of XML and do our best to make it elegant and smart.

XML Data Models

Whether it’s a bug or a feature could be debated endlessly, but a remarkable characteristic of the XML recommendation is that it’s all about syntax and parsing rules and does not really define a data model. The big advantage is that everyone can find pretty much what they want in XML documents, but for the sake of this paper we need to choose a well-known (and well-defined) data model to work on.

The most common XML data model is probably the data model defined by the trio XPath/XSLT/XQuery known as “XDM” since XPath version 2.0 and that’s the one we will choose.

XDM version 3.0, still a work in progress, will be the third version of this data model. It’s important to understand its design and evolution to use its most advanced features, and we’ll start our exploration with a short history of its versions.

XPath/XSLT 1.0

The XPath 1.0 data model is described as being composed of seven types of nodes (root, elements, text, attributes, namespaces, processing instructions and comments).

The XSLT 1.0 data model is defined as being the XPath 1.0 data model with:

  • Relaxed constraints on root node children to support well-formed external general parsed entities that are not well formed XML documents
  • An additional “base URI” property on every node.
  • An additional “unparsed entities” property on the root node.

It’s fair to say that these two very close data models are completely focused on XML, but is that all?

Not entirely: these two specifications introduce other notions that should be considered as related to the data model even if they are not described in their sections called “Data Model”…

XSLT 1.0 incidentally mentions the four basic XPath data-types (string, number, boolean, node-set) when it explicitly adds a fifth one: “result tree fragments”.

These four basic data-types are implicitly defined in XPath 1.0 in its section about its function library but no formal description of these types is given.

XDM 2.0: XPath 2.0/XSLT 2.0/XQuery 1.0

In version 2.0, the XDM is promoted to its own specification.

XDM 2.0 keeps the same seven types of nodes as XPath 1.0 and integrates the additions from the XSLT 1.0 data model. A number of properties are added to these nodes to capture information that had been left outside the data model by the previous version and also to support the data-type system from the PSVI (Post Schema Validation Infoset).

Since the term “data-type” (or simply “type”) is now used to refer to XML Schema data-types, a new terminology is introduced in which the data model is composed of “information items” (or items) that are either XML nodes or “atomic values”.

The concept of “sequences” is also introduced. Sequences are not strictly considered as items but play a very important role in XDM. They are defined as “an ordered collection of zero or more items”.

The data model is thus now composed of three different concepts:

  • nodes
  • atomic values
  • sequences

XDM 2.0 notes that an important difference between nodes and atomic values is that only nodes have identities:

Each node has a unique identity. Every node in an instance of the data model is unique: identical to itself, and not identical to any other node. (Atomic values do not have identity; every instance of the value “5” as an integer is identical to every other instance of the value “5” as an integer.)
XQuery 1.0 and XPath 2.0 Data Model (XDM) (Second Edition)

This is a crucial distinction that divides the data model into two different kinds of items (those which have an identity and those which don’t). Let’s take an example:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <foo>5</foo>
    <foo>5</foo>
    <bar foo="5">
        <foo>5</foo>
    </bar>
</root>

The three <foo>5</foo> elements look similar and can be considered “deeply equal”, but they are three different elements with three different identities. This is needed because some of their properties are different: the parent of the first two is <root/> while the parent of the third one is <bar/>, the preceding sibling of the second one is the first one while the first one has no preceding sibling, …

The three “5” text nodes are similar but they still are different text nodes with different identities and this is necessary because they don’t have the same parent elements.

By contrast, the atomic values of the three <foo/> elements (and the atomic value of the @foo attribute) are the same atomic value, the “5” (assuming they have all been declared with the same datatype). Among many other things, this means that when you manipulate their values, you can’t get back to the node that holds the value.
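Python’s standard `xml.etree.ElementTree` can illustrate the distinction, with the caveat that it is not an XDM implementation: elements are Python objects distinguished by identity (`is`, `id()`), and `deep_equal()` below is a hypothetical, rough stand-in for `fn:deep-equal()` that compares names, attributes, trimmed text and children. The XML declaration is left out of the string for brevity.

```python
import xml.etree.ElementTree as ET

SOURCE = """<root>
    <foo>5</foo>
    <foo>5</foo>
    <bar foo="5">
        <foo>5</foo>
    </bar>
</root>"""


def deep_equal(a, b):
    """Rough stand-in for fn:deep-equal(): compare content recursively, ignore location."""
    return (
        a.tag == b.tag
        and a.attrib == b.attrib
        and (a.text or "").strip() == (b.text or "").strip()
        and len(a) == len(b)
        and all(deep_equal(x, y) for x, y in zip(a, b))
    )


root = ET.fromstring(SOURCE)
foos = root.findall(".//foo")  # the three <foo> elements, in document order
# ElementTree keeps no parent pointers; this mapping rebuilds the parent relation.
parent_of = {child: parent for parent in root.iter() for child in parent}

# Same content, but three distinct objects with different parents.
print(all(deep_equal(foos[0], f) for f in foos))  # True
print(len({id(f) for f in foos}))                 # 3
print([parent_of[f].tag for f in foos])           # ['root', 'root', 'bar']

# Atomic values: four equal integers that carry no link back to their nodes.
values = [int(f.text) for f in foos] + [int(root.find("bar").get("foo"))]
print(set(values))                                # {5}
```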

XDM 3.0: XPath 3.0/XSLT 3.0/XQuery 3.0

Note
These specifications are still a work in progress, currently divided between XQuery and XPath Data Model 3.0 and data model extensions described in XSL Transformations (XSLT) Version 3.0.

XDM 3.0 adds functions as a third kind of item, transforming XQuery and XSLT into functional languages.

Like atomic values, functions have no identity:

Functions have no identity, cannot be compared, and have no serialization.
XQuery and XPath Data Model 3.0 – W3C Working Draft 13 December 2011

XSLT 3.0 adds to XDM 3.0 a fourth kind of item: maps, derived from functions, which among many other use cases can be used to model JSON objects.

Like atomic values and functions (from which they are derived), maps have no identity:

Like sequences, maps have no identity. It is meaningful to compare the contents of two maps, but there is no way of asking whether they are “the same map”: two maps with the same content are indistinguishable.
XSL Transformations (XSLT) Version 3.0 – W3C Working Draft 10 July 2012
Note
In this statement, the specification does acknowledge that sequences have no identity either. This is understandable but didn’t seem to be clearly specified elsewhere.

Of course, XSLT 3.0 also adds functions to create and manipulate maps and to serialize/deserialize them as JSON, as well as a syntax to define map literals. It does not add any new pattern to select or match maps or map entries, though.

Identity Crisis

Appolonius’ ship is a beautiful ship. Over the years it has been repaired so many times that there is not a single piece of the original materials remaining. The question is, therefore, is it really still Appolonius’ ship?
ObjectIdentity on c2.com

Object identity is often confused with mutability. The need for objects to have identities is more obvious when they are mutable, since their identities are then used to track them through their changes, like Appolonius’ ship. However, XDM 3.0 gives us a good opportunity to explore the meaning and consequences of having (or not having) an identity for immutable object structures.

The definition of node identity in XDM 3.0 is directly copied from XDM 2.0:

Each node has a unique identity. Every node in an instance of the data model is unique: identical to itself, and not identical to any other node. (Atomic values do not have identity; every instance of the value “5” as an integer is identical to every other instance of the value “5” as an integer.)
XQuery and XPath Data Model 3.0 – W3C Working Draft 13 December 2011

I find this definition confusing:

  • Why should the value “5” as an integer be instantiated and why should we care? The value “5” as an integer is… the value “5” as an integer! It’s unique and being unique, doesn’t it have an identity?
  • A node, with all the properties defined in XDM (including its document-uri and parent accessors) would be unique if it had “previous-sibling” or “document-order” accessors.
Note
To find the previous siblings of a node relying only on the accessors defined in XDM (2.0 or 3.0), you’d have to access the node’s parent and loop over its children until you find the current node, which you would recognize as such by checking its identity.
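The loop described in this note can be sketched in Python with `xml.etree.ElementTree`. `previous_sibling()` is a hypothetical helper; it only sees element children (ElementTree stores text in `.text`/`.tail` strings rather than as text nodes), and it recognizes the current node with `is` because two siblings can be deep-equal while being different nodes.

```python
import xml.etree.ElementTree as ET


def previous_sibling(node, parent):
    """Return the element child of `parent` just before `node`, or None if `node` is first.

    Without a previous-sibling accessor, walk the parent's children and recognize
    `node` by identity: a value comparison could not tell deep-equal siblings apart.
    """
    previous = None
    for child in parent:
        if child is node:
            return previous
        previous = child
    raise ValueError("node is not a child of parent")


if __name__ == "__main__":
    root = ET.fromstring("<root><foo>5</foo><foo>5</foo><bar/></root>")
    first, second, bar = list(root)
    print(previous_sibling(second, root) is first)  # True
    print(previous_sibling(first, root))            # None
```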

Rather than focussing on uniqueness, which for immutable information items does not really matter, a better differentiation could be between information items which have enough context information to “know where they belong” in the data model and those which don’t.

This differentiation has the benefit of highlighting the consequences of having or not having an identity: to be able to navigate between an information item and its ancestors or siblings, this item must know where it belongs. When that’s not the case, it is still possible to navigate between the item and its descendants, but axes such as ancestor::, preceding-sibling:: or following-sibling:: are not available.

Note
Identity can be seen as the price to pay for the ancestor:: and sibling axes.

Let’s go back to a simple example:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <foo>5</foo>
    <foo>5</foo>
    <bar> 
        <foo>5</foo>
    </bar>
</root>

In a hypothetical data model where nodes have no identity, there would be only 3 elements:

  • The root element
  • The bar element
  • The foo element (referred to twice as a child of root and once as a child of bar)

If we add identity (or context information) properties, the foo elements become three different information items since they differ by these properties.
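A small Python sketch can make this concrete. It is an illustration under stated assumptions, not XDM: `Node` is a hypothetical immutable, context-free node compared purely by value (so the three foo elements collapse into a single value, as in the list above), and `Located` is a hypothetical wrapper adding the context properties (parent and position) that turn that single value into three distinct items.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Context-free node: like an atomic value, equal content means the same item."""
    name: str
    children: Tuple["Node", ...] = ()
    text: str = ""


@dataclass(frozen=True)
class Located:
    """A node plus context information: its parent item and its position."""
    node: Node
    parent: Optional["Located"]
    position: int


def locate(node: Node, parent: Optional[Located] = None, position: int = 0) -> Iterator[Located]:
    """Add context to every node of a tree, in document order."""
    here = Located(node, parent, position)
    yield here
    for index, child in enumerate(node.children):
        yield from locate(child, here, index)


foo = Node("foo", text="5")
bar = Node("bar", children=(foo,))
root = Node("root", children=(foo, foo, bar))  # the same foo value is reused three times

items = list(locate(root))
located_foos = [item for item in items if item.node.name == "foo"]
print(len({item.node for item in items}))  # 3 distinct values: root, foo, bar
print(len(set(located_foos)))              # 3 distinct located foo items
```

The wrapper plays the role of the “instantiation”, “clone” or “reification” step listed next: the content is shared, and only the added context makes the items differ.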

The process of adding these properties to an information item looks familiar. Depending on your background, you can compare it to:

  • class/object instantiation in class based Object Oriented Programming
  • clones in prototype based Object Oriented Programming
  • RDF reification.

We’ve seen that XDM 3.0 acknowledges this difference between information items which have context information and those which don’t. I don’t want to deny that both types of data models have their use cases: there are obviously many use cases where context information is needed and use cases where lightweight structures are a better fit.

That being said, if we are serious about the support of JSON in XDM, we should offer the same features to access data whether this data is stored in maps or in XML nodes.

Let’s consider this JSON object borrowed from the XSLT 3.0 Working Draft:

{ "accounting" : [ 
      { "firstName" : "John", 
        "lastName"  : "Doe",
        "age"       : 23 },

      { "firstName" : "Mary", 
        "lastName"  : "Smith",
        "age"       : 32 }
                 ],                                 
  "sales"     : [ 
      { "firstName" : "Sally", 
        "lastName"  : "Green",
        "age"       : 27 },

      { "firstName" : "Jim",  
        "lastName"  : "Galley",
        "age"       : 41 }
                  ]
}

This object could be represented in XML by the following document:

<?xml version="1.0" encoding="UTF-8"?>
<company>
    <department name="sales">
        <employee>
            <firstName>Sally</firstName>
            <lastName>Green</lastName>
            <age>27</age>
        </employee>
        <employee>
            <firstName>Jim</firstName>
            <lastName>Galley</lastName>
            <age>41</age>
        </employee>
    </department>
    <department name="accounting">
        <employee>
            <firstName>John</firstName>
            <lastName>Doe</lastName>
            <age>23</age>
        </employee>
        <employee>
            <firstName>Mary</firstName>
            <lastName>Smith</lastName>
            <age>32</age>
        </employee>
    </department>
</company>

The features introduced in the latest XSLT 3.0 Working Draft make it rather easy to transform one model into the other, but these two models are far from having the same features.
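As an illustration of going from the map flavor to the XML flavor, here is a Python sketch using `json` and `xml.etree.ElementTree`. `company_to_xml()` is a hypothetical helper, not the XSLT 3.0 mechanism the paragraph refers to. It emits departments in the map’s iteration order, so the output may list departments in a different order from the document above (JSON object members are unordered).

```python
import json
import xml.etree.ElementTree as ET


def company_to_xml(company: dict) -> ET.Element:
    """Build <company>/<department name=...>/<employee> from a department -> employees map."""
    root = ET.Element("company")
    for department_name, employees in company.items():
        department = ET.SubElement(root, "department", name=department_name)
        for employee in employees:
            element = ET.SubElement(department, "employee")
            for key, value in employee.items():
                # Each JSON property becomes a child element; the text is Python's str()
                # of the value, so booleans or nested values would need extra care.
                ET.SubElement(element, key).text = str(value)
    return root


if __name__ == "__main__":
    data = json.loads('{"accounting": [{"firstName": "John", "lastName": "Doe", "age": 23}],'
                      ' "sales": [{"firstName": "Sally", "lastName": "Green", "age": 27}]}')
    print(ET.tostring(company_to_xml(data), encoding="unicode"))
```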

In the XML flavor, when the context item is the employee “John Doe”, you can easily find out what his department is because this is an element, and elements do carry context information.

In the map flavor, by contrast, when the context item is an employee map, this object has no context information and you can’t tell which department he belongs to without looping over the containing map.
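The asymmetry can be sketched in Python. In the map flavor, `department_of()` (a hypothetical helper) can only scan the containing map, and since maps have no identity it compares employees by value, so two deep-equal employees in different departments could not be told apart. On the XML side, Python’s ElementTree keeps no parent pointers either, so the sketch uses the `..` step of ElementTree’s limited XPath support, evaluated from the root; a full XPath processor would simply use the parent axis from the employee element.

```python
import xml.etree.ElementTree as ET

COMPANY = {
    "accounting": [{"firstName": "John", "lastName": "Doe", "age": 23},
                   {"firstName": "Mary", "lastName": "Smith", "age": 32}],
    "sales": [{"firstName": "Sally", "lastName": "Green", "age": 27},
              {"firstName": "Jim", "lastName": "Galley", "age": 41}],
}

COMPANY_XML = ET.fromstring(
    "<company>"
    "<department name='accounting'>"
    "<employee><firstName>John</firstName><lastName>Doe</lastName><age>23</age></employee>"
    "<employee><firstName>Mary</firstName><lastName>Smith</lastName><age>32</age></employee>"
    "</department>"
    "<department name='sales'>"
    "<employee><firstName>Sally</firstName><lastName>Green</lastName><age>27</age></employee>"
    "<employee><firstName>Jim</firstName><lastName>Galley</lastName><age>41</age></employee>"
    "</department>"
    "</company>"
)


def department_of(company: dict, employee: dict):
    """Map flavor: the employee map knows nothing about its container, so scan them all."""
    for name, employees in company.items():
        if employee in employees:  # list membership falls back on == (value comparison)
            return name
    return None


# Map flavor: a free-standing employee map is found again only by scanning.
john = {"firstName": "John", "lastName": "Doe", "age": 23}
print(department_of(COMPANY, john))  # accounting

# XML flavor: a path expression goes from the employee up to its department.
print(COMPANY_XML.find(".//employee[lastName='Doe']/..").get("name"))  # accounting
```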

This important restriction is at a purely data model level. It is aggravated by the fact that the XPath syntax has not been extended to generalize axes so that they work with maps. If I work with the XML version of this structure, it’s easy to evaluate things such as the number of employees, the average age of employees, the number of departments, the number of employees by department or the average age by department; it’s just as easy to find out whether there is an employee called “Mary Smith” in one of the departments, to list the employees who are over 40, or to get a list of employees from all the departments sorted by age, … In the map flavor, by contrast, I don’t have any XPath axis available and must do all these operations using a limited number of map functions (map:keys(), map:contains(), map:get()). In other words, while I can use XPath expressions with the XML version, I must use DOM-like operations to access the map version!
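Here is a rough Python rendering of a few of these queries. On the XML side it relies on path expressions from ElementTree’s limited XPath subset; on the map side it deliberately restricts itself to `keys()` and `get()`, used here as stand-ins for `map:keys()` and `map:get()`. In Python both sides stay short because the host language supplies the loops; the sketch only shows that the XML side is navigated with paths while the map side has to be walked by hand.

```python
import xml.etree.ElementTree as ET
from statistics import mean

COMPANY = {
    "accounting": [{"firstName": "John", "lastName": "Doe", "age": 23},
                   {"firstName": "Mary", "lastName": "Smith", "age": 32}],
    "sales": [{"firstName": "Sally", "lastName": "Green", "age": 27},
              {"firstName": "Jim", "lastName": "Galley", "age": 41}],
}

root = ET.fromstring(
    "<company>"
    "<department name='sales'>"
    "<employee><firstName>Sally</firstName><lastName>Green</lastName><age>27</age></employee>"
    "<employee><firstName>Jim</firstName><lastName>Galley</lastName><age>41</age></employee>"
    "</department>"
    "<department name='accounting'>"
    "<employee><firstName>John</firstName><lastName>Doe</lastName><age>23</age></employee>"
    "<employee><firstName>Mary</firstName><lastName>Smith</lastName><age>32</age></employee>"
    "</department>"
    "</company>"
)

# XML flavor: one descendant path reaches every employee, whatever the department.
xml_employees = root.findall(".//employee")
xml_average_age = mean(int(e.findtext("age")) for e in xml_employees)
xml_over_40 = [e.findtext("lastName") for e in xml_employees if int(e.findtext("age")) > 40]
xml_has_mary_smith = any(e.findtext("firstName") == "Mary"
                         for e in root.findall(".//employee[lastName='Smith']"))
xml_by_age = [e.findtext("firstName")
              for e in sorted(xml_employees, key=lambda e: int(e.findtext("age")))]

# Map flavor: only keys() and get(), the analogues of map:keys() and map:get().
map_employees = [emp for dept in COMPANY.keys() for emp in COMPANY.get(dept)]
map_average_age = mean(emp.get("age") for emp in map_employees)
map_over_40 = [emp.get("lastName") for emp in map_employees if emp.get("age") > 40]
map_by_age = [emp.get("firstName")
              for emp in sorted(map_employees, key=lambda emp: emp.get("age"))]

print(xml_average_age, map_average_age)  # 30.75 30.75
print(xml_over_40, map_over_40)          # ['Galley'] ['Galley']
print(xml_by_age == map_by_age)          # True
```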

To summarize, yes XDM 3.0 does support JSON but to do pretty much anything interesting with JSON objects, you’d better transform them into XML nodes first! XSLT 3.0 does give you the tools to do this transformation quite easily but the message to JSON users is that we don’t treat their data model as a first class citizen.

To make matters worse, XPath is used by many other specifications, within and outside the W3C, and the level of JSON support provided by XDM and XPath will determine how well these specifications can support JSON. Specifications impacted by this issue include XForms, XProc and Schematron. Supporting JSON would be really useful for these three specifications if and only if map items had the same features as nodes.

Furthermore, the same asymmetry exists when you want to create these two structures from other sources: to create the XML structure you can use sequence constructors, but to create the map structure you have to use the map:new() and map:item() functions.

My proposal to solve this issue is:

  • To acknowledge the fact that any type of information item can be either “context independent” or include context information, and to explore the consequences of this statement.
  • To generalize XPath axes so that they can be used with map items.
  • To create sequence constructors for maps and map entries.

You are welcome to discuss this further:

Introducing χίμαιραλ (chimeral), the Chimera Language

When I started to work on χίμαιραλ a few months ago, my first motivation was to propose an XDM serialization for maps which would turn the rather abstract prose from the specification into concrete angle brackets that you could see and read.

The exercise has been very instructive and helped me a lot to understand the spec; however, a more ambitious use case emerged as I was making progress. The XSLT 3.0 Working Draft is part of a batch of Working Drafts that are already at a fairly advanced stage. My proposals to solve the “map identity crisis” are probably too intrusive and come too late to be taken into account, and the batch of specifications will most probably carry on with the current proposal.

If that’s the case, we’ve seen that it makes a lot of sense to convert maps into nodes so that XPath axes can be used, and χίμαιραλ provides a generic target format for these conversions.

Example

Let’s take the JSON object borrowed from the XSLT 3.0 Working Draft again:

{ "accounting" : [ 
      { "firstName" : "John", 
        "lastName"  : "Doe",
        "age"       : 23 },

      { "firstName" : "Mary", 
        "lastName"  : "Smith",
        "age"       : 32 }
                 ],                                 
  "sales"     : [ 
      { "firstName" : "Sally", 
        "lastName"  : "Green",
        "age"       : 27 },

      { "firstName" : "Jim",  
        "lastName"  : "Galley",
        "age"       : 41 }
                  ]
}

Its χίμαιραλ representation is:

<?xml version="1.0" encoding="UTF-8"?>
<χ:data-model xmlns:χ="http://χίμαιραλ.com#">
    <χ:map>
        <χ:entry key="sales" keyType="string">
            <χ:map>
                <χ:entry key="1" keyType="number">
                    <χ:map>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Green</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">27</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">Sally</χ:atomic-value>
                        </χ:entry>
                    </χ:map>
                </χ:entry>
                <χ:entry key="2" keyType="number">
                    <χ:map>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Galley</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">41</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">Jim</χ:atomic-value>
                        </χ:entry>
                    </χ:map>
                </χ:entry>
            </χ:map>
        </χ:entry>
        <χ:entry key="accounting" keyType="string">
            <χ:map>
                <χ:entry key="1" keyType="number">
                    <χ:map>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Doe</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">23</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">John</χ:atomic-value>
                        </χ:entry>
                    </χ:map>
                </χ:entry>
                <χ:entry key="2" keyType="number">
                    <χ:map>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Smith</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">32</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">Mary</χ:atomic-value>
                        </χ:entry>
                    </χ:map>
                </χ:entry>
            </χ:map>
        </χ:entry>
    </χ:map>
</χ:data-model>

Granted, it’s much more verbose than the JSON version, but it’s an exact XML translation of the XDM instance corresponding to the JSON object.

χίμαιραλ In a Nutshell

The design goals are:

  • Be as close as possible to the XDM and its terminology
  • Represent XML nodes as… XML nodes
  • Allow round-trips (an XDM model serialized as χίμαιραλ should give an XDM model identical to the original one when de-serialized)
  • Be easy to process using XPath/XQuery/XSLT
  • Supporting the PSVI is not a goal

χίμαιραλ is not the only proposal to serialize XDM as XML. Two other notable ones are:

  • Zorba’s XDM serialization is a straightforward and accurate XDM serialization that does support PSVI annotations. As a consequence, nodes are serialized as xdm:* elements (an element is an xdm:element element, an attribute an xdm:attribute element, …). This doesn’t meet my second requirement to represent nodes as themselves.
  • XDML, presented by Hans-Jürgen Rennau and David A. Lee at Balisage 2011, is more than just an XDM serialization and also includes manipulation and processing definitions. It introduces its own terminology and concepts and is too far away from XDM for my design goals.

A lot of attention has been given to the first design goal: the structure of a χίμαιραλ model and the name of its elements and attributes are directly derived from the specifications.

In XDM, map entry values can be arrays (an array being nothing more than a map with integer keys) but also sequences (which is not possible in JSON). χίμαιραλ respects the fact that in XDM there is no difference between a single item and a sequence containing only that item, and represents sequences as a repetition of values.

The map map{1:= 'foo'} is serialized as:

<χ:data-model xmlns:χ="http://χίμαιραλ.com#">
   <χ:map>
      <χ:entry key="1" keyType="number">
         <χ:atomic-value type="string">foo</χ:atomic-value>
      </χ:entry>
   </χ:map>
</χ:data-model>

And the map map{1:= ('foo', 'bar')} is serialized as:

<χ:data-model xmlns:χ="http://χίμαιραλ.com#">
   <χ:map>
      <χ:entry key="1" keyType="number">
         <χ:atomic-value type="string">foo</χ:atomic-value>
         <χ:atomic-value type="string">bar</χ:atomic-value>
      </χ:entry>
   </χ:map>
</χ:data-model>
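The serializations above follow a handful of rules that can be sketched in a few lines of Python with the standard xml.etree.ElementTree module. This is a minimal sketch under stated assumptions, not the author’s transformation: dicts stand for maps, lists for JSON arrays (written as maps keyed 1..n, as in the JSON example earlier), and tuples for XDM sequences, whose items are simply repeated and, like XDM sequences, flatten when nested. The `boolean` type name and the function names `chi`, `add_value` and `to_chimeral` are assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace used by the χίμαιραλ examples in this article.
CHI = "http://χίμαιραλ.com#"
ET.register_namespace("χ", CHI)


def chi(local):
    """ElementTree name ({namespace}local) for a χίμαιραλ element."""
    return f"{{{CHI}}}{local}"


def add_value(parent, value):
    """Append the χίμαιραλ form of a Python value to `parent` (sketch)."""
    if isinstance(value, dict):
        map_elt = ET.SubElement(parent, chi("map"))
        for key, item in value.items():
            key_type = "number" if isinstance(key, int) else "string"
            entry = ET.SubElement(map_elt, chi("entry"), key=str(key), keyType=key_type)
            add_value(entry, item)
    elif isinstance(value, list):
        # An array is a map whose keys are the integers 1..n.
        add_value(parent, {i: item for i, item in enumerate(value, start=1)})
    elif isinstance(value, tuple):
        # A sequence is not wrapped: its items become sibling elements,
        # so a one-item sequence serializes exactly like the item itself.
        for item in value:
            add_value(parent, item)
    else:
        if isinstance(value, bool):  # test before int: bool is an int subclass
            type_name, text = "boolean", "true" if value else "false"
        elif isinstance(value, (int, float)):
            type_name, text = "number", str(value)
        elif isinstance(value, str):
            type_name, text = "string", value
        else:
            raise TypeError(f"unsupported value: {value!r}")
        atom = ET.SubElement(parent, chi("atomic-value"), type=type_name)
        atom.text = text


def to_chimeral(value):
    """Wrap the serialization of `value` in a χ:data-model element."""
    root = ET.Element(chi("data-model"))
    add_value(root, value)
    return root


# Python counterpart of map{1:= ('foo', 'bar')}
print(ET.tostring(to_chimeral({1: ("foo", "bar")}), encoding="unicode"))
```

Node values (χ:instance and χ:node) are deliberately left out of this sketch.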

We’ve seen that XDM makes a clear distinction between nodes, which have identities, and other item types (atomic values, functions and maps), which don’t. XDM allows nodes to be used as map entry values. χίμαιραλ supports this too, but simply copying the nodes into the entries would create new nodes with different identities.

To avoid that, the documents these nodes belong to are copied into χ:instance elements, and map entry values reference nodes within these instances through XPath expressions.

The following $map variable:

<xsl:variable name="a-node">
    <foo/>
</xsl:variable>
<xsl:variable name="map" select="map{'a-node':= $a-node}"/>

Is serialized as:

<χ:data-model xmlns:χ="http://χίμαιραλ.com#">
   <χ:instance id="d4" kind="document">
      <foo/>
   </χ:instance>
   <χ:map>
      <χ:entry key="a-node" keyType="string">
         <χ:node kind="document" instance="d4" path="/"/>
      </χ:entry>
   </χ:map>
</χ:data-model>

Like XSLT variables, instances do not always contain document nodes, and the following $map variable:

<xsl:variable name="a-node" as="node()">
    <foo/>
</xsl:variable>
<xsl:variable name="map" select="map{'a-node':= $a-node}"/>

Is serialized as:

<χ:data-model xmlns:χ="http://χίμαιραλ.com#">
   <χ:instance id="d4e0" kind="fragment">
      <foo/>
   </χ:instance>
   <χ:map>
      <χ:entry key="a-node" keyType="string">
         <χ:node kind="element" instance="d4e0" path="root()" name="foo"/>
      </χ:entry>
   </χ:map>
</χ:data-model>

The nodes of a single entry can belong to more than one instance, and this $map variable:

<xsl:variable name="a-node" as="node()*">
    <foo/>
    <bar/>
</xsl:variable>
<xsl:variable name="map" select="map{'a-node':= $a-node}"/>

Is serialized as:

<χ:data-model xmlns:χ="http://χίμαιραλ.com#">
   <χ:instance id="d4e0" kind="fragment">
      <foo/>
   </χ:instance>
   <χ:instance id="d4e3" kind="fragment">
      <bar/>
   </χ:instance>
   <χ:map>
      <χ:entry key="a-node" keyType="string">
         <χ:node kind="element" instance="d4e0" path="root()" name="foo"/>
         <χ:node kind="element" instance="d4e3" path="root()" name="bar"/>
      </χ:entry>
   </χ:map>
</χ:data-model>

Nodes can be “deep linked”, the same node can be linked several times, and nodes can be mixed freely with atomic values. The following $map variable:

<xsl:variable name="doc">
    <department name="sales">
        <employee>
            <firstName>Sally</firstName>
            <lastName>Green</lastName>
            <age>27</age>
        </employee>
        <employee>
            <firstName>Jim</firstName>
            <lastName>Galley</lastName>
            <age>41</age>
        </employee>
    </department>
    <department name="accounting">
        <employee>
            <firstName>John</firstName>
            <lastName>Doe</lastName>
            <age>23</age>
        </employee>
        <employee>
            <firstName>Mary</firstName>
            <lastName>Smith</lastName>
            <age>32</age>
        </employee>
    </department>
</xsl:variable>
<xsl:variable name="map"
    select="map{
            'sales' := $doc/department[@name='sales'],
            'Sally' := $doc//employee[firstName = 'Sally'],
            'kids'  := $doc//employee[age &lt; 30],
            'dep-names-attributes' := $doc/department/@name,
            'dep-names' := for $name in $doc/department/@name return string($name)
            }"/>

Is serialized as:

<χ:data-model xmlns:χ="http://χίμαιραλ.com#">
   <χ:instance id="d4" kind="document">
      <department name="sales">
         <employee>
            <firstName>Sally</firstName>
            <lastName>Green</lastName>
            <age>27</age>
         </employee>
         <employee>
            <firstName>Jim</firstName>
            <lastName>Galley</lastName>
            <age>41</age>
         </employee>
      </department>
      <department name="accounting">
         <employee>
            <firstName>John</firstName>
            <lastName>Doe</lastName>
            <age>23</age>
         </employee>
         <employee>
            <firstName>Mary</firstName>
            <lastName>Smith</lastName>
            <age>32</age>
         </employee>
      </department>
   </χ:instance>
   <χ:map>
      <χ:entry key="sales" keyType="string">
         <χ:node kind="element"
                 instance="d4"
                 path="/&#34;&#34;:department[1]"
                 name="department"/>
      </χ:entry>
      <χ:entry key="Sally" keyType="string">
         <χ:node kind="element"
                 instance="d4"
                 path="/&#34;&#34;:department[1]/&#34;&#34;:employee[1]"
                 name="employee"/>
      </χ:entry>
      <χ:entry key="kids" keyType="string">
         <χ:node kind="element"
                 instance="d4"
                 path="/&#34;&#34;:department[1]/&#34;&#34;:employee[1]"
                 name="employee"/>
         <χ:node kind="element"
                 instance="d4"
                 path="/&#34;&#34;:department[2]/&#34;&#34;:employee[1]"
                 name="employee"/>
      </χ:entry>
      <χ:entry key="dep-names-attributes" keyType="string">
         <χ:node kind="attribute"
                 instance="d4"
                 path="/&#34;&#34;:department[1]/@name"
                 name="name">sales</χ:node>
         <χ:node kind="attribute"
                 instance="d4"
                 path="/&#34;&#34;:department[2]/@name"
                 name="name">accounting</χ:node>
      </χ:entry>
      <χ:entry key="dep-names" keyType="string">
         <χ:atomic-value type="string">sales</χ:atomic-value>
         <χ:atomic-value type="string">accounting</χ:atomic-value>
      </χ:entry>
   </χ:map>
</χ:data-model>
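The path attributes above locate each node from the root of its instance, one positional step per ancestor. A minimal Python sketch of how such a path could be computed, assuming ElementTree as the tree model, elements in no namespace only (hence the `""` prefix copied from the serialization above), and an `<instance>` wrapper element standing for χ:instance; `node_path` is a hypothetical helper and the sample document is the $doc variable shortened:

```python
import xml.etree.ElementTree as ET


def node_path(instance, target, attribute=None):
    """Return a path like /"":department[1]/"":employee[1] (hypothetical helper).

    `instance` stands for the χ:instance element: its children are the
    top-level nodes of the copied document. ElementTree has no parent axis,
    so a child -> parent map is built first. Only elements in no namespace
    are handled, hence the "" prefix in every step.
    """
    parents = {child: parent for parent in instance.iter() for child in parent}
    # Steps are collected from the target up to the instance, then reversed.
    steps = [f"@{attribute}"] if attribute is not None else []
    node = target
    while node is not instance:
        parent = parents[node]
        same_name = [sibling for sibling in parent if sibling.tag == node.tag]
        steps.append(f'"":{node.tag}[{same_name.index(node) + 1}]')
        node = parent
    return "/" + "/".join(reversed(steps))


# The $doc variable of the example (shortened), wrapped in an element
# playing the role of χ:instance.
doc = ET.fromstring("""<instance>
  <department name="sales">
    <employee><firstName>Sally</firstName><age>27</age></employee>
    <employee><firstName>Jim</firstName><age>41</age></employee>
  </department>
  <department name="accounting">
    <employee><firstName>John</firstName><age>23</age></employee>
    <employee><firstName>Mary</firstName><age>32</age></employee>
  </department>
</instance>""")

# Equivalent of $doc//employee[age < 30] (the 'kids' entry).
kids = [e for e in doc.iter("employee") if int(e.findtext("age")) < 30]
for employee in kids:
    print(node_path(doc, employee))
# /"":department[1]/"":employee[1]
# /"":department[2]/"":employee[1]
print(node_path(doc, doc[1], attribute="name"))  # /"":department[2]/@name
```

Building the parent map is exactly the kind of context information that nodes provide natively and that maps lack.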

Remaining Issues

A collation property should be added to <χ:map/>, probably as an attribute; the transformation that serializes to χίμαιραλ should be cleaned up; and the reverse transformation should be implemented.

These are fairly trivial issues, and the biggest one is probably to find a way to cleanly serialize references to nodes that are not contained within an element, such as the parentless attribute in the following $map variable:

<xsl:variable name="attribute" as="node()">
    <xsl:attribute name="foo">bar</xsl:attribute>
</xsl:variable>
<xsl:variable name="map"
    select="map{
            'attribute' := $attribute
            }"/>

Support for functions should also be considered.
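The reverse transformation listed among these issues does not exist yet. As a rough idea of its shape, here is a minimal Python sketch limited to maps and atomic values. Its assumptions: χ:node references are rejected rather than resolved against χ:instance content, keys with keyType="number" are taken to be integers, arrays come back as integer-keyed dicts (they are maps in XDM), and types other than number and boolean are read as strings. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

CHI = "http://χίμαιραλ.com#"


def chi(local):
    """ElementTree name ({namespace}local) for a χίμαιραλ element."""
    return f"{{{CHI}}}{local}"


def atomic(elt):
    """Convert a χ:atomic-value element to a Python value."""
    kind, text = elt.get("type"), elt.text or ""
    if kind == "number":
        number = float(text)
        return int(number) if number.is_integer() else number
    if kind == "boolean":
        return text == "true"
    return text  # "string" and, in this sketch, any other type


def sequence(elements):
    """Decode sibling value elements; a one-item sequence is the item itself."""
    items = []
    for elt in elements:
        if elt.tag == chi("map"):
            items.append(decode_map(elt))
        elif elt.tag == chi("atomic-value"):
            items.append(atomic(elt))
        else:
            # χ:node references to χ:instance content are not handled here.
            raise NotImplementedError(elt.tag)
    return items[0] if len(items) == 1 else tuple(items)


def decode_map(elt):
    """Decode a χ:map element into a dict."""
    result = {}
    for entry in elt.findall(chi("entry")):
        key = entry.get("key")
        if entry.get("keyType") == "number":
            key = int(key)  # assumption: numeric keys are integers
        result[key] = sequence(list(entry))
    return result


def from_chimeral(data_model):
    """Decode a χ:data-model element, ignoring any χ:instance children."""
    return sequence([c for c in data_model if c.tag != chi("instance")])


SAMPLE = """<χ:data-model xmlns:χ="http://χίμαιραλ.com#">
   <χ:map>
      <χ:entry key="1" keyType="number">
         <χ:atomic-value type="string">foo</χ:atomic-value>
         <χ:atomic-value type="string">bar</χ:atomic-value>
      </χ:entry>
   </χ:map>
</χ:data-model>"""

print(from_chimeral(ET.fromstring(SAMPLE)))  # {1: ('foo', 'bar')}
```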

χίμαιραλ and the identity crisis

To some extent, χίμαιραλ can be considered a solution to the XDM identity crisis:

  • Serializing an XDM model as χίμαιραλ creates elements for maps, map entries and atomic values, and these elements, being nodes, have identities. The serialization is therefore also an instantiation of XDM information items as defined above.
  • De-serializing a χίμαιραλ document to create an XDM data model is also a de-instantiation, except of course that the identity of XML nodes is not “removed”.

However, χίμαιραλ keeps a strong distinction between nodes, which are kept in <χ:instance> elements, on the one hand, and maps and atomic values on the other hand.

Moving the chimera forward

χίμαιραλ is a good playground to explore the new possibilities offered by XDM 3.0. Here is a (non-exhaustive) list of directions that seem interesting…

Note
Don’t expect to find fully baked proposals in this section, which on the contrary contains very early drafts of ideas for supporting XDM maps as “first class citizens”!

Embracing RDF

If you had the opportunity to enjoy the sunny weather of Orlando in December 2001, you may remember “The Syntactic Web”, a provocative talk in which Jonathan Robie showed how XQuery 1.0 could be used to query normalized XML/RDF documents.

The gap between RDF triples and the versatility of their XML representation was a big issue, but the new features brought by this version of the XPath/XQuery/XSLT package should help us.

The basic data model of RDF is based on triples, a triple being composed of a subject, a predicate and an object. In XDM, a triple can now be represented by either a sequence, an array or a map of three items.

XDM sequences cannot include other sequences, so representing triples as sequences would mean that you couldn’t define sequences of triples. For that reason, it is probably better to define triples as maps or arrays. An array being a map indexed by integers, that doesn’t make a huge difference at a conceptual level, but I find it cleaner to access the subject of a triple using a QName (such as rdf:subject) rather than an index. Following this principle, we could define a triple as:

map {
    xs:QName('rdf:subject')   := xs:anyURI('http://www.example.org/index.html'),
    xs:QName('rdf:predicate') := xs:anyURI('http://purl.org/dc/elements/1.1/creator'),
    xs:QName('rdf:object')    := xs:anyURI('http://www.example.org/staffid/85740')
}

The χίμαιραλ serialization of this map is:

<χ:data-model xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:χ="http://χίμαιραλ.com#"
              xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   <χ:map>
      <χ:entry key="rdf:object"
               keyType="xs:QName">
         <χ:atomic-value type="xs:anyURI">http://www.example.org/staffid/85740</χ:atomic-value>
      </χ:entry>
      <χ:entry key="rdf:predicate"
               keyType="xs:QName">
         <χ:atomic-value type="xs:anyURI">http://purl.org/dc/elements/1.1/creator</χ:atomic-value>
      </χ:entry>
      <χ:entry key="rdf:subject"
               keyType="xs:QName">
         <χ:atomic-value type="xs:anyURI">http://www.example.org/index.html</χ:atomic-value>
      </χ:entry>
   </χ:map>
</χ:data-model>

What can we do with such triples? Using higher-order functions, it should not be too difficult to define triple stores with basic query features!
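As an illustration of that idea, and only as an analogy in Python rather than XPath, here is a minimal sketch where a triple is a three-entry map and a “triple store” is just a sequence of such maps queried with functions returned by other functions. Assumptions: (namespace URI, local name) tuples stand in for xs:QName keys, the helpers `triple`, `pattern` and `select` are hypothetical, and the second triple in the store is made up so that the query has something to filter.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Expanded names (namespace URI, local name) stand in for xs:QName keys.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SUBJECT, PREDICATE, OBJECT = (RDF, "subject"), (RDF, "predicate"), (RDF, "object")

QName = Tuple[str, str]
Triple = Dict[QName, str]


def triple(s: str, p: str, o: str) -> Triple:
    """Build a triple as a map keyed by rdf:subject, rdf:predicate, rdf:object."""
    return {SUBJECT: s, PREDICATE: p, OBJECT: o}


def pattern(s=None, p=None, o=None) -> Callable[[Triple], bool]:
    """Higher-order helper: return a test function; None acts as a wildcard."""
    wanted = {SUBJECT: s, PREDICATE: p, OBJECT: o}

    def matches(t: Triple) -> bool:
        return all(value is None or t[key] == value for key, value in wanted.items())

    return matches


def select(store: Iterable[Triple], test: Callable[[Triple], bool]) -> List[Triple]:
    """The 'store' is just a sequence of maps, queried with a filter function."""
    return [t for t in store if test(t)]


DC = "http://purl.org/dc/elements/1.1/"
store = [
    # The triple from the article.
    triple("http://www.example.org/index.html", DC + "creator",
           "http://www.example.org/staffid/85740"),
    # A second, made-up triple so that queries have something to filter.
    triple("http://www.example.org/index.html", DC + "language", "en"),
]

creators = [t[OBJECT] for t in select(store, pattern(p=DC + "creator"))]
print(creators)  # ['http://www.example.org/staffid/85740']
```

A real store would need indexes, but the query side already reduces to composing predicates, which is what higher-order functions in XPath 3.0 make possible.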

Is this lightweight enough? Or does RDF support deserve new information item types to be supported by XDM?

Syntactic sugar

We’ve seen that this JSON object

{ "accounting" : [ 
      { "firstName" : "John", 
        "lastName"  : "Doe",
        "age"       : 23 },

      { "firstName" : "Mary", 
        "lastName"  : "Smith",
        "age"       : 32 }
                 ],                                 
  "sales"     : [ 
      { "firstName" : "Sally", 
        "lastName"  : "Green",
        "age"       : 27 },

      { "firstName" : "Jim",  
        "lastName"  : "Galley",
        "age"       : 41 }
                  ]
}

Is serialized in χίμαιραλ as:

<?xml version="1.0" encoding="UTF-8"?>
<χ:data-model xmlns:χ="http://χίμαιραλ.com#">
    <χ:map>
        <χ:entry key="sales" keyType="string">
            <χ:map>
                <χ:entry key="1" keyType="number">
                    <χ:map>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Green</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">27</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">Sally</χ:atomic-value>
                        </χ:entry>
                    </χ:map>
                </χ:entry>
                <χ:entry key="2" keyType="number">
                    <χ:map>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Galley</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">41</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">Jim</χ:atomic-value>
                        </χ:entry>
                    </χ:map>
                </χ:entry>
            </χ:map>
        </χ:entry>
        <χ:entry key="accounting" keyType="string">
            <χ:map>
                <χ:entry key="1" keyType="number">
                    <χ:map>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Doe</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">23</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">John</χ:atomic-value>
                        </χ:entry>
                    </χ:map>
                </χ:entry>
                <χ:entry key="2" keyType="number">
                    <χ:map>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Smith</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">32</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">Mary</χ:atomic-value>
                        </χ:entry>
                    </χ:map>
                </χ:entry>
            </χ:map>
        </χ:entry>
    </χ:map>
</χ:data-model>

We can work with that, but wouldn’t it be nice if we had a native syntax that does not use XML elements and attributes to represent maps?

Depending on the requirements, many approaches are possible.

A first option would be to define pluggable notation parsers within XML and write:

<χ:notation mediatype="application/json"><![CDATA[
{ "accounting" : [ 
      { "firstName" : "John", 
        "lastName"  : "Doe",
        "age"       : 23 },

      { "firstName" : "Mary", 
        "lastName"  : "Smith",
        "age"       : 32 }
                 ],                                 
  "sales"     : [ 
      { "firstName" : "Sally", 
        "lastName"  : "Green",
        "age"       : 27 },

      { "firstName" : "Jim",  
        "lastName"  : "Galley",
        "age"       : 41 }
                  ]
}                  
]]></χ:notation>

The meaning of the <χ:notation/> element would be to trigger a parser supporting the application/json media type. This is less verbose and more natural to JSON users, but it doesn’t allow XML nodes to be added to maps or sequences.
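Pluggable parsers of this kind boil down to a registry keyed by media type. A minimal Python sketch, assuming a decorator-based registry (the names `PARSERS`, `notation_parser` and `parse_notation` are hypothetical) and a namespace declaration added to the sample so that it parses; turning the parsed value into XDM maps is left out:

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Callable, Dict

# Hypothetical registry: media type -> function turning the text into data.
PARSERS: Dict[str, Callable[[str], Any]] = {}


def notation_parser(media_type: str):
    """Decorator registering a parser for one media type."""
    def register(func):
        PARSERS[media_type] = func
        return func
    return register


@notation_parser("application/json")
def parse_json(text: str) -> Any:
    return json.loads(text)


def parse_notation(elt: ET.Element) -> Any:
    """Dispatch the text of a χ:notation element on its mediatype attribute."""
    media_type = elt.get("mediatype")
    if media_type not in PARSERS:
        raise ValueError(f"no parser registered for {media_type!r}")
    # CDATA content is reported as ordinary text by the XML parser.
    return PARSERS[media_type](elt.text or "")


SAMPLE = """<χ:notation xmlns:χ="http://χίμαιραλ.com#" mediatype="application/json"><![CDATA[
{ "sales" : [ { "firstName" : "Sally", "lastName" : "Green", "age" : 27 } ] }
]]></χ:notation>"""

data = parse_notation(ET.fromstring(SAMPLE))
print(data["sales"][0]["firstName"])  # Sally
```

Supporting another notation would only mean registering one more function, which is the point of making the parsers pluggable.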

Another direction would be to extend the syntax of XML itself. Here again, there are many possibilities. Markup in XML is based on angle brackets, and the different XML productions are usually distinguished by the character that follows the opening bracket (<? for processing instructions, <! for comments, CDATA sections and declarations, </ for end tags).
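To show how little such a dispatch requires, here is a minimal Python sketch that classifies a markup token by the characters following `<`. The standard XML cases come first; the `{`, `}`, `@` and `=` rules are the hypothetical extensions discussed in this section, and `classify_markup` is a made-up name, not part of any parser.

```python
def classify_markup(markup: str) -> str:
    """Classify a markup token by the characters that follow '<'.

    Longer prefixes are tested before shorter ones ("<!--" before "<!",
    "</}" before "</"). Entries marked "hypothetical" are not XML.
    """
    rules = [
        ("<!--", "comment"),
        ("<![CDATA[", "CDATA section"),
        ("<!", "declaration"),
        ("<?", "processing instruction or XML declaration"),
        ("</}", "end of map"),              # hypothetical
        ("</@", "end of map entry"),        # hypothetical
        ("</=", "end of atomic value"),     # hypothetical
        ("</", "end tag"),
        ("<{", "start of map"),             # hypothetical
        ("<@", "start of map entry"),       # hypothetical
        ("<=", "start of atomic value"),    # hypothetical
        ("<", "start tag"),
    ]
    for prefix, kind in rules:
        if markup.startswith(prefix):
            return kind
    raise ValueError(f"not markup: {markup!r}")


tokens = ["<{>", '<@"age">', "<=>", "</=>", '</@"age">', "</}>"]
print([classify_markup(t) for t in tokens])
```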

This principle leaves a lot of possibilities. For instance, maps could be identified by the tags <{> and </}> to follow the characters used by XDM map literals and JSON objects:

<χ:data-model>
    <{>
        <χ:entry key="sales" keyType="string">
            <{>
                <χ:entry key="1" keyType="number">
                    <{>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Green</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">27</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">Sally</χ:atomic-value>
                        </χ:entry>
                    </}>
                </χ:entry>
                <χ:entry key="2" keyType="number">
                    <{>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Galley</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">41</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">Jim</χ:atomic-value>
                        </χ:entry>
                    </}>
                </χ:entry>
            </}>
        </χ:entry>
        <χ:entry key="accounting" keyType="string">
            <{>
                <χ:entry key="1" keyType="number">
                    <{>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Doe</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">23</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">John</χ:atomic-value>
                        </χ:entry>
                    </}>
                </χ:entry>
                <χ:entry key="2" keyType="number">
                    <{>
                        <χ:entry key="lastName" keyType="string">
                            <χ:atomic-value type="string">Smith</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="age" keyType="string">
                            <χ:atomic-value type="number">32</χ:atomic-value>
                        </χ:entry>
                        <χ:entry key="firstName" keyType="string">
                            <χ:atomic-value type="string">Mary</χ:atomic-value>
                        </χ:entry>
                    </}>
                </χ:entry>
            </}>
        </χ:entry>
    </}>
</χ:data-model>

Map entries are not ordered, and in that respect they are similar to XML attributes. We could build on this similarity and use the @ character to identify map entries:

<χ:data-model>
    <{>
        <@"sales" keyType="string">
            <{>
                <@"1" keyType="number">
                    <{>
                        <@"lastName" keyType="string">
                            <χ:atomic-value type="string">Green</χ:atomic-value>
                        </@"lastName">
                        <@"age" keyType="string">
                            <χ:atomic-value type="number">27</χ:atomic-value>
                        </@"age">
                        <@"firstName" keyType="string">
                            <χ:atomic-value type="string">Sally</χ:atomic-value>
                        </@"firstName">
                    </}>
                </@"1">
                <@"2" keyType="number">
                    <{>
                        <@"lastName" keyType="string">
                            <χ:atomic-value type="string">Galley</χ:atomic-value>
                        </@"lastName">
                        <@"age" keyType="string">
                            <χ:atomic-value type="number">41</χ:atomic-value>
                        </@"age">
                        <@"firstName" keyType="string">
                            <χ:atomic-value type="string">Jim</χ:atomic-value>
                        </@"firstName">
                    </}>
                </@"2">
            </}>
        </@"sales">
        <@"accounting" keyType="string">
            <{>
                <@"1" keyType="number">
                    <{>
                        <@"lastName" keyType="string">
                            <χ:atomic-value type="string">Doe</χ:atomic-value>
                        </@"lastName">
                        <@"age" keyType="string">
                            <χ:atomic-value type="number">23</χ:atomic-value>
                        </@"age">
                        <@"firstName" keyType="string">
                            <χ:atomic-value type="string">John</χ:atomic-value>
                        </@"firstName">
                    </}>
                </@"1">
                <@"2" keyType="number">
                    <{>
                        <@"lastName" keyType="string">
                            <χ:atomic-value type="string">Smith</χ:atomic-value>
                        </@"lastName">
                        <@"age" keyType="string">
                            <χ:atomic-value type="number">32</χ:atomic-value>
                        </@"age">
                        <@"firstName" keyType="string">
                            <χ:atomic-value type="string">Mary</χ:atomic-value>
                        </@"firstName">
                    </}>
                </@"2">
            </}>
        </@"accounting">
    </}>
</χ:data-model>

The key names have been enclosed in quotes because map keys can include any character, including whitespace, but the quotes could be made optional when they are not needed. We could also give keyType a default value of “string”:

<χ:data-model>
    <{>
        <@sales>
            <{>
                <@1 keyType="number">
                    <{>
                        <@lastName>
                            <χ:atomic-value type="string">Green</χ:atomic-value>
                        </@lastName>
                        <@age>
                            <χ:atomic-value type="number">27</χ:atomic-value>
                        </@age>
                        <@firstName>
                            <χ:atomic-value type="string">Sally</χ:atomic-value>
                        </@firstName>
                    </}>
                </@1>
                <@2 keyType="number">
                    <{>
                        <@lastName>
                            <χ:atomic-value type="string">Galley</χ:atomic-value>
                        </@lastName>
                        <@age>
                            <χ:atomic-value type="number">41</χ:atomic-value>
                        </@age>
                        <@firstName>
                            <χ:atomic-value type="string">Jim</χ:atomic-value>
                        </@firstName>
                    </}>
                </@2>
            </}>
        </@sales>
        <@accounting>
            <{>
                <@1 keyType="number">
                    <{>
                        <@lastName>
                            <χ:atomic-value type="string">Doe</χ:atomic-value>
                        </@lastName>
                        <@age>
                            <χ:atomic-value type="number">23</χ:atomic-value>
                        </@age>
                        <@firstName>
                            <χ:atomic-value type="string">John</χ:atomic-value>
                        </@firstName>
                    </}>
                </@1>
                <@2 keyType="number">
                    <{>
                        <@lastName>
                            <χ:atomic-value type="string">Smith</χ:atomic-value>
                        </@lastName>
                        <@age>
                            <χ:atomic-value type="number">32</χ:atomic-value>
                        </@age>
                        <@firstName>
                            <χ:atomic-value type="string">Mary</χ:atomic-value>
                        </@firstName>
                    </}>
                </@2>
            </}>
        </@accounting>
    </}>
</χ:data-model>

Atomic values could be identified by <=> and </=>, with the same default value (“string”) applied to their type attribute:

<χ:data-model>
    <{>
        <@sales>
            <{>
                <@1 keyType="number">
                    <{>
                        <@lastName>
                            <=>Green</=>
                        </@lastName>
                        <@age>
                            <= type="number">27</=>
                        </@age>
                        <@firstName>
                            <=>Sally</=>
                        </@firstName>
                    </}>
                </@1>
                <@2 keyType="number">
                    <{>
                        <@lastName>
                            <=>Galley</=>
                        </@lastName>
                        <@age>
                            <= type="number">41</=>
                        </@age>
                        <@firstName>
                            <=>Jim</=>
                        </@firstName>
                    </}>
                </@2>
            </}>
        </@sales>
        <@accounting>
            <{>
                <@1 keyType="number">
                    <{>
                        <@lastName>
                            <=>Doe</=>
                        </@lastName>
                        <@age>
                            <= type="number">23</=>
                        </@age>
                        <@firstName>
                            <=>John</=>
                        </@firstName>
                    </}>
                </@1>
                <@2 keyType="number">
                    <{>
                        <@lastName>
                            <=>Smith</=>
                        </@lastName>
                        <@age>
                            <= type="number">32</=>
                        </@age>
                        <@firstName>
                            <=>Mary</=>
                        </@firstName>
                    </}>
                </@2>
            </}>
        </@accounting>
    </}>
</χ:data-model>

The tags that surround atomic values are useful when these values are within a sequence, but they look superfluous when the item has a single value. As a next step, we could define a shortcut for that case, in which the value and its type attribute are included directly in the item:

<χ:data-model>
    <{>
        <@sales>
            <{>
                <@1 keyType="number">
                    <{>
                        <@lastName>Green</@lastName>
                        <@age type="number">27</@age>
                        <@firstName>Sally</@firstName>
                    </}>
                </@1>
                <@2 keyType="number">
                    <{>
                        <@lastName>Galley</@lastName>
                        <@age type="number">41</@age>
                        <@firstName>Jim</@firstName>
                    </}>
                </@2>
            </}>
        </@sales>
        <@accounting>
            <{>
                <@1 keyType="number">
                    <{>
                        <@lastName>Doe</@lastName>
                        <@age type="number">23</@age>
                        <@firstName>John</@firstName>
                    </}>
                </@1>
                <@2 keyType="number">
                    <{>
                        <@lastName>Smith</@lastName>
                        <@age type="number">32</@age>
                        <@firstName>Mary</@firstName>
                    </}>
                </@2>
            </}>
        </@accounting>
    </}>
</χ:data-model>
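To make the shortcut conventions concrete, here is a minimal Python sketch that prints a nested dict in this notation. It assumes, as in the examples, that maps become <{> and </}>, that each entry becomes <@key>, that numeric keys get keyType="number" and numeric values type="number", and that strings rely on the default type. The helper name to_shortcut is hypothetical, and since names such as <@lastName> are not well-formed XML, the output is built as plain text rather than with an XML library.

```python
def _is_number(value: object) -> bool:
    # bool is a subclass of int in Python; it is not treated as a number here.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _render(value: dict, depth: int, out: list) -> None:
    pad = "    " * depth
    out.append(pad + "<{>")
    for key, item in value.items():
        # String keys use the default keyType; numeric keys are flagged.
        key_attr = ' keyType="number"' if _is_number(key) else ""
        if isinstance(item, dict):
            # Nested map: open the entry and render the map two levels deeper.
            out.append(f"{pad}    <@{key}{key_attr}>")
            _render(item, depth + 2, out)
            out.append(f"{pad}    </@{key}>")
        else:
            # Single atomic value: inlined in the entry (the shortcut).
            type_attr = ' type="number"' if _is_number(item) else ""
            out.append(f"{pad}    <@{key}{key_attr}{type_attr}>{item}</@{key}>")
    out.append(pad + "</}>")


def to_shortcut(data: dict) -> str:
    """Render a nested dict of strings and numbers in the shortcut notation."""
    out = ["<χ:data-model>"]
    _render(data, 1, out)
    out.append("</χ:data-model>")
    return "\n".join(out)


print(to_shortcut({"sales": {1: {"lastName": "Green", "age": 27, "firstName": "Sally"}}}))
```

The indentation mirrors the examples above, so the printed text can be compared line by line with the sales department of the previous listing.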

XPath

Since the χίμαιραλ serialization is XML, XPath path expressions can be used to query its structure. For instance, to get the list of employees younger than 30, we can write:

χ:map/χ:entry/χ:map/χ:entry/χ:map[χ:entry[@key='age'][χ:atomic-value < 30]]

Or, if we’re feeling lucky:

//χ:map[χ:entry[@key='age'][χ:atomic-value < 30]]
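For readers who want to try the second expression outside an XSLT processor, here is a Python sketch using the standard xml.etree.ElementTree module. ElementTree's limited XPath support has no numeric comparisons, so the "< 30" predicate is evaluated in Python. The trimmed sample document (two departments, three of the employees from the example, without their first names) is an assumption that follows the structure of the RELAX NG schema given in the Validation section.

```python
import xml.etree.ElementTree as ET

# Prefix bound to the χίμαιραλ namespace URI used in this article.
NS = {"chi": "http://χίμαιραλ.com#"}

# Hypothetical, trimmed serialization following the structure of the RELAX NG schema.
SAMPLE = """<χ:data-model xmlns:χ="http://χίμαιραλ.com#">
  <χ:map>
    <χ:entry key="sales" keyType="string">
      <χ:map>
        <χ:entry key="1" keyType="number">
          <χ:map>
            <χ:entry key="lastName" keyType="string"><χ:atomic-value type="string">Green</χ:atomic-value></χ:entry>
            <χ:entry key="age" keyType="string"><χ:atomic-value type="number">27</χ:atomic-value></χ:entry>
          </χ:map>
        </χ:entry>
        <χ:entry key="2" keyType="number">
          <χ:map>
            <χ:entry key="lastName" keyType="string"><χ:atomic-value type="string">Galley</χ:atomic-value></χ:entry>
            <χ:entry key="age" keyType="string"><χ:atomic-value type="number">41</χ:atomic-value></χ:entry>
          </χ:map>
        </χ:entry>
      </χ:map>
    </χ:entry>
    <χ:entry key="accounting" keyType="string">
      <χ:map>
        <χ:entry key="1" keyType="number">
          <χ:map>
            <χ:entry key="lastName" keyType="string"><χ:atomic-value type="string">Doe</χ:atomic-value></χ:entry>
            <χ:entry key="age" keyType="string"><χ:atomic-value type="number">23</χ:atomic-value></χ:entry>
          </χ:map>
        </χ:entry>
      </χ:map>
    </χ:entry>
  </χ:map>
</χ:data-model>"""


def younger_than(root: ET.Element, limit: float) -> list:
    """Last names selected by //χ:map[χ:entry[@key='age'][χ:atomic-value < limit]]."""
    names = []
    for candidate in root.iterfind(".//chi:map", NS):
        age = candidate.find("chi:entry[@key='age']/chi:atomic-value", NS)
        # Maps without an age entry (the departments, the top-level map) are skipped.
        if age is not None and float(age.text) < limit:
            last = candidate.find("chi:entry[@key='lastName']/chi:atomic-value", NS)
            names.append(last.text)
    return names


print(younger_than(ET.fromstring(SAMPLE), 30))  # ['Green', 'Doe']
```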

Again, that’s fine as long as we work on a χίμαιραλ serialization, but it would be good to be able to use path expressions directly on map data structures. To do so, we would need, at a minimum, to define steps to match maps and entries.

XSLT 3.0 introduces a new map() item type which could be used as a kind test to identify maps.

If we follow the idea that map entries are similar to XML attributes, we could use the @ notation to identify them. The XPath expression would then become:

map()/@*/map()/@*/map()[@age < 30]

Or, if we’re feeling lucky:

//map()[@age < 30]
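To illustrate what these two proposed expressions would select, here is a Python sketch that models XDM maps as dicts. Reading @* as "the values of all the entries of a map" and //map() as "this map and every nested map" is an assumption based on the attribute analogy above; these expressions are proposals, not current XPath syntax, and the helper names are hypothetical.

```python
from typing import Any, Iterator

# The example data model as native Python dicts (standing in for XDM maps).
DATA = {
    "sales": {
        1: {"lastName": "Green", "age": 27, "firstName": "Sally"},
        2: {"lastName": "Galley", "age": 41, "firstName": "Jim"},
    },
    "accounting": {
        1: {"lastName": "Doe", "age": 23, "firstName": "John"},
        2: {"lastName": "Smith", "age": 32, "firstName": "Mary"},
    },
}


def entries(m: dict) -> Iterator[Any]:
    """The proposed '@*' step: the values of all the entries of a map."""
    return iter(m.values())


def age_below(m: dict, limit: float) -> bool:
    """The predicate [@age < limit]; false when there is no numeric age entry."""
    age = m.get("age")
    return isinstance(age, (int, float)) and age < limit


def explicit_path(top: dict, limit: float) -> list:
    """map()/@*/map()/@*/map()[@age < limit]"""
    return [emp
            for dept in entries(top) if isinstance(dept, dict)
            for emp in entries(dept) if isinstance(emp, dict)
            if age_below(emp, limit)]


def descendant_maps(value: Any) -> Iterator[dict]:
    """The '//map()' step: value itself if it is a map, then every nested map."""
    if isinstance(value, dict):
        yield value
        for item in value.values():
            yield from descendant_maps(item)


def lucky_path(top: dict, limit: float) -> list:
    """//map()[@age < limit]"""
    return [m for m in descendant_maps(top) if age_below(m, limit)]


print([e["lastName"] for e in explicit_path(DATA, 30)])  # ['Green', 'Doe']
```

As with the serialized version, the "lucky" form also visits the top-level and department maps, which simply fail the predicate because they have no age entry.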

Validation

These data models can be complex. Wouldn’t it be useful to be able to validate them with schema languages? This would give us a way to validate JSON maps!

Of course we can already serialize them in χίμαιραλ and validate the serialization using any schema language, but again it would be good to be able to validate these structures directly.

A RELAX NG schema to validate the χίμαιραλ serialization of our example would be:

namespace χ = "http://χίμαιραλ.com#"

start = element χ:data-model { top-level-map }

# Top level map: departments
top-level-map =
    element χ:map {
        element χ:entry {
            attribute key { xsd:NMTOKEN },
            attribute keyType { "string" },
            emp-array
        }*
    }

# List of employees
emp-array =
    element χ:map {
        element χ:entry {
            attribute key { xsd:positiveInteger },
            attribute keyType { "number" },
            emp-map
        }*
    }

# Description of an employee
emp-map = element χ:map { (age | firstName | lastName) + }

age =
    element χ:entry {
        attribute key { "age" },
        attribute keyType { "string" },
        element χ:atomic-value {
            attribute type { "number" },
            xsd:positiveInteger
        }
    }

firstName =
    element χ:entry {
        attribute key { "firstName" },
        attribute keyType { "string" },
        element χ:atomic-value {
            attribute type { "string" },
            xsd:token
        }
    }

lastName =
    element χ:entry {
        attribute key { "lastName" },
        attribute keyType { "string" },
        element χ:atomic-value {
            attribute type { "string" },
            xsd:token
        }
    }
Note
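In the description of the maps used to describe employees, we cannot use interleave patterns: the three entries are all χ:entry elements, and RELAX NG forbids the same element name from appearing in both operands of an interleave. The schema is therefore only approximate (it accepts, for instance, a map with two age entries and no firstName). In this specific case, we could enumerate the six possible combinations, but the exercise would quickly become verbose if the number of items grew: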

emp-map = element χ:map { 
      (age, firstName, lastName)  
    | (age, lastName, firstName) 
    | (firstName, age, lastName)  
    | (firstName, lastName, age) 
    | (lastName, age, firstName)  
    | (lastName, firstName, age) 
}
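The enumeration grows factorially: n entries allowed in any order need n! alternatives. A small Python sketch that generates the pattern with itertools.permutations makes this visible (the helper name enumerate_orders is hypothetical); for the three employee entries it reproduces the six alternatives above, in the same order.

```python
from itertools import permutations
from math import factorial


def enumerate_orders(name: str, entries: list) -> str:
    """Spell out every ordering of entries as a RELAX NG compact-syntax choice."""
    alternatives = ["(" + ", ".join(order) + ")" for order in permutations(entries)]
    body = "\n    | ".join(alternatives)
    return f"{name} = element χ:map {{\n      {body}\n}}"


print(enumerate_orders("emp-map", ["age", "firstName", "lastName"]))

# Number of alternatives needed as the number of entries grows.
for n in range(3, 7):
    print(n, factorial(n))  # prints 3 6, 4 24, 5 120 and 6 720 on separate lines
```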

A Schematron schema for the χίμαιραλ serialization could be developed based on XPath expressions similar to those that have been shown in the previous section.

Again, it would be interesting to support maps directly as first class citizens in XML schema languages.

The ability to use Schematron on XDM maps depends directly on the ability to browse maps using patterns and path expressions in XPath and XSLT (see above)…

The main impact on RELAX NG would be to add map and entry patterns, and the schema could look like:

namespace χ = "http://χίμαιραλ.com#"

start = element χ:data-model { top-level-map }

# Top level map: departments
top-level-map =
    map  {
        entry xsd:NMTOKEN {
            emp-array
        }*
    }

# List of employees
emp-array =
    map {
        entry xsd:positiveInteger {
            emp-map
        }*
    }

# Description of an employee
emp-map = map { age, firstName, lastName  }

age =
    entry age {
        xsd:positiveInteger
    }

firstName =
    entry firstName {
        xsd:token
    }

lastName =
    entry lastName {
        xsd:token
    }

Sequences could probably be supported without adding a new pattern, but this would require relaxing some restrictions to allow the description of sequences mixing atomic values, maps and nodes (in RELAX NG, sequences of atomic values are already possible in list patterns, and sequences of nodes are of course available to describe node contents, but these two types of sequences cannot be mixed).

Conclusion

According to the definition of chimeras in genetics from Wikipedia quoted in the introduction, [chimeras are formed from at least four parent cells (two fertilized eggs or early embryos fused together). Each population of cells keeps its own character and the resulting organism is a mixture of tissues].

The current XDM proposals have added a foreign model to the XML data model to represent maps. This new model is a superset of the JSON data model. The two data models keep their own character and the resulting model is a mixture of information items.

It’s fair to say that the current XDM proposal is a chimera, something described as [usually ugly, foolish or impossible fantasies] by Jeni Tennison.

I hope that the proposals sketched in this paper will help to address this situation and fully integrate these new information items in the XML ecosystem.

Test driven XML development

Can test driven development be applied to XML technologies? To XSLT? To XML schema languages? To XML pipelines?

This is the topic of the poster I am presenting this year at the Balisage conference.

For those of you who are not lucky enough to attend, here is a copy of this poster: Test driven development in XML Balisage 2012 poster

Schema test driven development

Note: this article is derived from the presentation I gave at the International Symposium on Quality Assurance and Quality Control in XML (proceedings).

Abstract

Ever modified an XML schema? Ever broken something while fixing a bug or adding a new feature? As with any piece of engineering, the more complex a schema is, the harder it is to maintain. In other domains, unit tests dramatically reduce the number of regressions and thus provide a kind of safety net for maintainers. Can we learn from these techniques and adapt them to XML schema languages? In this workshop session, we develop a schema using unit test techniques, to illustrate their benefits in this domain.


The workshop is run as an exchange between a customer (played by Tommie Usdin) and a schema expert (played by Eric van der Vlist).

The customer, who needs a schema for her todo list XML application, is puzzled by the “test first programming” technique imposed by the schema expert.

At the end of the day (or workshop), will she be converted to this well-known agile or extreme programming technique adapted to the development of XML schemas?

Step 1: Getting started

Hi Eric, can you help me to write a schema?
Customer
Hi Tommie, yes, sure, what will the schema be about?
Expert
I need a vocabulary for my todo lists, with todo ite…
Customer
OK, you’ve told me enough. Let’s get started!
Expert (interrupting his customer)
Get started? But I haven’t told you anything about it!
Customer
Right, but it’s never too soon to write tests when you do test first programming!
Expert
Note
In test first programming (also called test driven development), developers create test cases (usually unit test cases) before implementing a function. The test suite is run, code is written based on the results of these tests, and the test suite and code are updated until all the tests pass.

Test suite

(suite.xml):

<tf:suite xmlns:tf="http://xmlschemata.org/test-first/" xmlns:todo="http://balisage.net/todo#" title="Basic tests">
    <tf:case title="Root element" expected="valid" id="root">
        <todo:list/>
    </tf:case>
</tf:suite>
Note
The vocabulary used to define these test cases has been inspired by the SUT (XML Schema Unit Test) project. It’s a simple vocabulary (composed of only three different elements) that packs several XML instances together with their expected validation results. It uses conventions that you’ll discover during the course of this workshop.

Figure 1. Test results

Note
The test suite is run using a simple Orbeon Forms application. The rendering relies on the Orbeon Forms XForms implementation, while the test suite itself is run by an Orbeon Forms XPL pipeline.
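The Orbeon application itself is not reproduced here, but the decision it reports for each test case can be sketched in a few lines of Python: a case passes when the actual outcome (valid or error) matches its expected attribute. The run_cases function and the validator callable are assumptions, the latter standing in for the real schema validation.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical signature: a validator returns True when an instance is valid.
Validator = Callable[[ET.Element], bool]


def run_cases(cases: Iterable[Tuple[str, str, ET.Element]],
              validate: Validator) -> Dict[str, bool]:
    """Map each case id to True (pass) when the outcome matches tf:case/@expected."""
    results = {}
    for case_id, expected, instance in cases:
        actual = "valid" if validate(instance) else "error"
        results[case_id] = actual == expected
    return results


TODO_NS = "http://balisage.net/todo#"
cases = [
    ("root", "valid", ET.Element(f"{{{TODO_NS}}}list")),
    ("other-root", "error", ET.Element(f"{{{TODO_NS}}}title")),
]

# With no schema attached, every instance is accepted, so a case expecting an error fails.
print(run_cases(cases, lambda instance: True))  # {'root': True, 'other-root': False}
```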

Step 2: Adding a schema

You see, you can already write todo lists!
Expert
Hold on, we don’t have any schema!
Customer
That’s true, but you don’t have to write a schema to write XML documents.
Expert
I know, but you’re here to write a schema! Furthermore, right now we accept anything. I don’t want to have XML documents with anything as a root element!
Customer
That’s a good reason to write a schema, but before that we need to add a test to our suite first.
Expert

Test suite

(suite.xml):

<?xml version="1.0" encoding="UTF-8"?>
<tf:suite xmlns:tf="http://xmlschemata.org/test-first/" xmlns:todo="http://balisage.net/todo#" title="Basic tests">
    <tf:case title="TODO list root element" expected="valid" id="root">
        <todo:list/>
    </tf:case>
    <tf:case title="Other root element" expected="error" id="other-root">
        <todo:title>A title</todo:title>
    </tf:case>
</tf:suite>
Now that we’ve updated the test suite, we run it again.
Expert

Figure 2. Test results

This result was expected and we can now proceed to create a schema and attach it to the test suite.
Expert
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    elementFormDefault="qualified"
    targetNamespace="http://balisage.net/todo#"
    xmlns="http://balisage.net/todo#">

    <xs:element
        name="list"/>
</xs:schema>

todo.xsd

<?xml version="1.0" encoding="UTF-8"?>
<tf:suite
    xmlns:tf="http://xmlschemata.org/test-first/"
    xmlns:todo="http://balisage.net/todo#"
    title="Basic tests">
    <tf:validation
        href="todo.xsd"
        type="xsd"/>
    <tf:case
        title="TODO list root element"
        expected="valid"
        id="root">
        <todo:list/>
    </tf:case>
    <tf:case
        title="Other root element"
        expected="error"
        id="other-root">
        <todo:title>A title</todo:title>
    </tf:case>
</tf:suite>

suite.xml

It’s time to test what we’ve done again.
Expert

Figure 3. Test results

Step 3: Adding list title elements

I am happy to see some progress, at last, but I don’t want to accept any content in the todo list element. Can you add list title elements?
Customer
Sure, back to the test suite…
Expert

Test suite

(suite.xml):

<?xml version="1.0" encoding="UTF-8"?>
<tf:suite
    xmlns:tf="http://xmlschemata.org/test-first/"
    xmlns:todo="http://balisage.net/todo#"
    title="Basic tests">
    <tf:validation
        href="todo.xsd"
        type="xsd"/>
    <tf:case
        title="TODO list root element"
        expected="valid"
        id="root">
        <todo:list/>
    </tf:case>
    <tf:case
        title="TODO list with a title"
        expected="valid"
        id="list-title">
        <todo:list>
            <todo:title/>
        </todo:list>
    </tf:case>
    <tf:case
        title="Other root element"
        expected="error"
        id="other-root">
        <todo:title>A title</todo:title>
    </tf:case>
</tf:suite>
Now that we’ve updated the test suite, we run it again.
Expert

Figure 4. Test results

You see? We do already support list title elements!
Expert
Sure, but I don’t want to accept any content in my todo list. And the title element should be mandatory. And it should not be empty but have at least one character!
Customer
Back to the test suite, then…
Expert

Test suite

(suite.xml):

<?xml version="1.0" encoding="UTF-8"?>
<tf:suite
    xmlns:tf="http://xmlschemata.org/test-first/"
    xmlns:todo="http://balisage.net/todo#"
    title="Basic tests">
    <tf:validation
        href="todo.xsd"
        type="xsd"/>
    <todo:list>
        <tf:case
            title="Empty list element"
            expected="error"
            id="root-empty"/> 
        <todo:title>
            <tf:case title="empty title" expected="error" id="empty-title"/>
            <tf:case title="non empty title" expected="valid" id="non-empty-title">A title</tf:case>
        </todo:title>
        <tf:case
            title="Unexpected element"
            expected="error"
            id="unexpected">
            <todo:foo/>
        </tf:case>
    </todo:list>
    <tf:case
        title="Other root element"
        expected="error"
        id="other-root">
        <todo:title>A title</todo:title>
    </tf:case>
</tf:suite>
Note
This is the first example with non-top-level tf:case elements. To understand how this works, we must look in more detail at the algorithm used by the framework to split a test suite into instances. The algorithm consists of two steps:

  • Loop over each tf:case element.
  • For each of them, suppress the tf:case elements (keeping only the content of the current one) and the top-level elements which are not ancestors of the current tf:case element.

This description may look complex, but the result is a rather intuitive way to define sub-trees that are common to several test cases.
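The two steps can be sketched in Python with the standard ElementTree module. This is a minimal sketch of my reading of the description, not a port of split-tests.xsl: the function name split_suite is hypothetical, the current tf:case is unwrapped so that its content stays in place, and details such as whitespace handling or elements left empty by the suppression may differ in the real transformation.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

TF_CASE = "{http://xmlschemata.org/test-first/}case"


def _append_text(parent: ET.Element, text: Optional[str]) -> None:
    """Append text after parent's last child (or to parent.text if it has none)."""
    if not text:
        return
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def _copy(elem: ET.Element, target: ET.Element) -> ET.Element:
    """Copy elem, dropping every tf:case except target, whose content is kept."""
    out = ET.Element(elem.tag, elem.attrib)
    out.text = elem.text
    for child in elem:
        if child.tag != TF_CASE:
            out.append(_copy(child, target))
        elif child is target:
            # Unwrap the current case: its text and children take its place.
            _append_text(out, child.text)
            for grandchild in child:
                out.append(_copy(grandchild, target))
                _append_text(out, grandchild.tail)
        # The tail text of any child, kept or dropped, stays in place.
        _append_text(out, child.tail)
    return out


def split_suite(suite: ET.Element) -> List[Tuple[str, str, ET.Element]]:
    """Return (id, expected, instance) for every tf:case, in document order."""
    instances = []
    for case in suite.iter(TF_CASE):
        kept = []
        for top in suite:
            if top is case:
                # Top-level case: its element content is the instance itself.
                kept.extend(_copy(child, case) for child in top)
            elif any(node is case for node in top.iter()):
                # Keep only the top-level element that is an ancestor of the case;
                # tf:validation and the other top-level elements are dropped.
                kept.append(_copy(top, case))
        if len(kept) != 1:
            raise ValueError(f"case {case.get('id')!r} does not yield one root element")
        instances.append((case.get("id"), case.get("expected"), kept[0]))
    return instances
```

Applied to the todo:title sub-tree of the suite above, this reading gives, for empty-title, a todo:list containing an empty todo:title and, for non-empty-title, the same list with "A title" as the content of its title.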

Now that we’ve updated the test suite, we run it again.
Expert

Figure 5. Test results

Sounds good, now we can update the schema.
Expert
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    elementFormDefault="qualified"
    targetNamespace="http://balisage.net/todo#"
    xmlns="http://balisage.net/todo#">

    <xs:element
        name="list">
        <xs:complexType>
            <xs:sequence>
                <xs:element
                    name="title">
                    <xs:simpleType>
                        <xs:restriction
                            base="xs:token">
                            <xs:minLength
                                value="1"/>
                        </xs:restriction>
                    </xs:simpleType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>

todo.xsd

And run the test suite again.
Expert

Figure 6. Test results

Step 4: Adding todo item elements

Good. Now I want to add todo items. And lists should have at least one of them, by the way.
Customer
Sure, back to the test suite…
Expert

Test suite

(suite.xml):

<?xml version="1.0" encoding="UTF-8"?>
<tf:suite
    xmlns:tf="http://xmlschemata.org/test-first/"
    xmlns:todo="http://balisage.net/todo#"
    title="Basic tests">
    <tf:validation
        href="todo.xsd"
        type="xsd"/>
    <tf:case
        title="Empty list element"
        expected="error"
        id="root-empty">
        <todo:list/>
    </tf:case>
    <todo:list>
        <!-- Testing title elements -->
        <todo:title>
            <tf:case
                title="empty title"
                expected="error"
                id="empty-title"/>
            <tf:case
                title="non empty title"
                expected="valid"
                id="non-empty-title">A title</tf:case>
        </todo:title>
        <todo:item>
            <todo:title>A title</todo:title>
        </todo:item>
        <tf:case
            title="Unexpected element"
            expected="error"
            id="unexpected">
            <todo:foo/>
        </tf:case>
    </todo:list>
    <todo:list>
        <!-- Testing todo items -->
        <todo:title>Testing todo items</todo:title>
        <tf:case
            title="No todo items"
            expected="error"
            id="no-items"/>
        <todo:item>
            <tf:case
                title="empty item"
                expected="error"
                id="empty-item"/>
            <tf:case
                title="item with a title"
                expected="valid"
                id="item-title">
                <todo:title>A title</todo:title>
            </tf:case>
        </todo:item>
    </todo:list>
    <tf:case
        title="Other root element"
        expected="error"
        id="other-root">
        <todo:title>A title</todo:title>
    </tf:case>
</tf:suite>
Let’s see what we get before any update to the schema:
Expert

Figure 7. Test results

It’s time to update the schema and fix what needs to be fixed:
Expert
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    elementFormDefault="qualified"
    targetNamespace="http://balisage.net/todo#"
    xmlns="http://balisage.net/todo#">

    <xs:element
        name="list">
        <xs:complexType>
            <xs:sequence>
                <xs:element
                    name="title">
                    <xs:simpleType>
                        <xs:restriction
                            base="xs:token">
                            <xs:minLength
                                value="1"/>
                        </xs:restriction>
                    </xs:simpleType>
                </xs:element>
                <xs:element
                    maxOccurs="unbounded"
                    name="item">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element
                                name="title">
                                <xs:simpleType>
                                    <xs:restriction
                                        base="xs:token">
                                        <xs:minLength
                                            value="1"/>
                                    </xs:restriction>
                                </xs:simpleType>
                            </xs:element>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>

todo.xsd

And now we can check whether we got it right.
Expert

Figure 8. Test results

Step 5: Modularizing the schema

Eric, you should be ashamed, it’s a pure Russian doll schema, not modular at all! Why not define the title and item elements globally?
Customer
Sure, we can do that! If we just change the structure of the schema, we don’t need to update the test suite and can work directly on the schema:
Expert
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    elementFormDefault="qualified"
    targetNamespace="http://balisage.net/todo#"
    xmlns="http://balisage.net/todo#">

    <xs:element
        name="list">
        <xs:complexType>
            <xs:sequence>
                <xs:element
                    ref="title"/>
                <xs:element
                    maxOccurs="unbounded"
                    ref="item"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:element
        name="title">
        <xs:simpleType>
            <xs:restriction
                base="xs:token">
                <xs:minLength
                    value="1"/>
            </xs:restriction>
        </xs:simpleType>
    </xs:element>
    <xs:element
        name="item">
        <xs:complexType>
            <xs:sequence>
                <xs:element
                    ref="title"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>

todo.xsd

But of course, each time we update the schema we must check that we haven’t introduced any bugs:
Expert

Figure 9. Test results

Wow, what’s happening now?
Customer
Now that our elements are global in the schema, we accept a valid title as a root element. Is that what you want?
Expert
No way, a title is not a valid list!
Customer
Then we have a number of options… We can go back to local elements, or we can add a Schematron schema to check this constraint.
Expert
Schematron is fine, we’ll probably find many other constraints to check in my todo lists anyway…
Customer
OK. We still don’t have to update the test suite since we’ve not changed our requirement. Let’s write this Schematron schema then:
Expert
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <ns uri="http://balisage.net/todo#" prefix="todo"/>
    <pattern>
        <rule context="/*">
            <assert test="self::todo:list">The root element should be a todo:list</assert>
        </rule>
    </pattern>
</schema>

todo.sch

Saying that we don’t have to update the test suite wasn’t totally accurate, because the schemas are referenced in the test suite:
Expert

Test suite

(suite.xml):

<?xml version="1.0" encoding="UTF-8"?>
<tf:suite
    xmlns:tf="http://xmlschemata.org/test-first/"
    xmlns:todo="http://balisage.net/todo#"
    title="Basic tests">
    <tf:validation
        href="todo.sch"
        type="sch"/>
    <tf:validation
        href="todo.xsd"
        type="xsd"/>
    <tf:case
        title="Empty list element"
        expected="error"
        id="root-empty">
        <todo:list/>
    </tf:case>
    <todo:list>
        <todo:title>
            <tf:case
                title="empty title"
                expected="error"
                id="empty-title"/>
            <tf:case
                title="non empty title"
                expected="valid"
                id="non-empty-title">A title</tf:case>
        </todo:title>
        <todo:item>
            <todo:title>A title</todo:title>
        </todo:item>
        <tf:case
            title="Unexpected element"
            expected="error"
            id="unexpected">
            <todo:foo/>
        </tf:case>
    </todo:list>
    <todo:list>
        <todo:title>Testing todo items</todo:title>
        <tf:case
            title="No todo items"
            expected="error"
            id="no-items"/>
        <todo:item>
            <tf:case
                title="empty item"
                expected="error"
                id="empty-item"/>
            <tf:case
                title="item with a title"
                expected="valid"
                id="item-title">
                <todo:title>A title</todo:title>
            </tf:case>
        </todo:item>
    </todo:list>
    <tf:case
        title="Other root element"
        expected="error"
        id="other-root">
        <todo:title>A title</todo:title>
    </tf:case>
</tf:suite>
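The suite now lists two tf:validation elements. Assuming that an instance is reported valid only when it passes every referenced schema (my reading; the combination rule is not spelled out here), the fix can be sketched in Python. root_is_list mirrors the todo.sch rule, while global_roots is a hypothetical stand-in for the modular todo.xsd, reduced to the single point that matters here: any globally declared element is accepted as a root.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Sequence

TODO = "{http://balisage.net/todo#}"
Validator = Callable[[ET.Element], bool]


def root_is_list(root: ET.Element) -> bool:
    """Counterpart of the todo.sch rule: context="/*", assert test="self::todo:list"."""
    return root.tag == TODO + "list"


def global_roots(root: ET.Element) -> bool:
    """Hypothetical stand-in for the modular todo.xsd, reduced to its root check:
    list, title and item are all global, so any of them is accepted as a root."""
    return root.tag in {TODO + "list", TODO + "title", TODO + "item"}


def outcome(instance: ET.Element, validators: Sequence[Validator]) -> str:
    """Assumed combination rule: valid only if every validator accepts the instance."""
    return "valid" if all(check(instance) for check in validators) else "error"


other_root = ET.fromstring('<todo:title xmlns:todo="http://balisage.net/todo#">A title</todo:title>')
print(outcome(other_root, [global_roots]))                # valid, although an error is expected
print(outcome(other_root, [root_is_list, global_roots]))  # error, as the test case expects
```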
Time to see if we’ve fixed our issue!
Expert

Figure 10. Test results

Great, we’ve made it, thanks!
Customer

Want to try it?

The application used to run the test suite and display its result is available at http://svn.xmlschemata.org/repository/downloads/tefisc/.

If you just want to understand how the test suite is split into XML instances, you can have a look at http://svn.xmlschemata.org/repository/downloads/tefisc/orbeon-resources/apps/tefisc/ .

In this directory:

  • split-tests.xsl is the XSLT transformation that splits a test suite into top-level element test cases. This transformation does not depend on Orbeon Forms and can be run manually against a test suite.
  • run-test.xpl is the XPL pipeline that runs a test case.
  • list-suites.xpl is the XPL pipeline that gives the list of available test cases.
  • view.xhtml is the XForms application that displays the results.

To install this application:

  • Install Orbeon Forms
  • copy the orbeon-resources/ directory under /WEB-INF/resources/apps/ in your Orbeon webapp directory
  • or, alternatively, copy the tefisc/ directory wherever you want, edit web.xml.sav to replace <param-value>/home/vdv/projects/tefisc/orbeon-resources</param-value> with the location of this directory on your filesystem, replace /WEB-INF/web.xml with this file and restart your application server.

XML Prague 2011 : XML à l’attaque du web

Salle de conférence pendant la pause café

Après une période un peu folle entre 2000 et 2008 pendant laquelle j’ai participé à un nombre impressionnant de conférences, je m’étais mis un peu en retrait et n’avais plus participé à aucune conférence depuis XTech 2008.

XML Prague 2011 était donc pour moi l’occasion de rencontrer à nouveau la communauté des experts XML internationaux et j’étais curieux de voir comment elle avait évolué pendant ces trois dernières années.

MURATA Makoto (EPUB3: Global Language and Comic)A côté des aspects plus techniques, je n’oublierai pas l’image de Murata Makoto exprimant sobrement sa peine pour les victimes du tremblement de terre au Japon.

La tagline de XML Prague 2011 était “XML devait être l’espéranto du Web. Pourquoi n’est-ce pas le cas?” (“XML as new lingua franca for the Web. Why did it never happen?”).

Michael Sperberg-McQueen (Closing keynote)Le contenu de la conférence est resté proche de cette ligne mais il a été résumé de manière plus exacte par Michael Sperberg-McQueen lors de sa clôture : “Mettons du XML dans le navigateur, qu’ils le veuillent ou non!”

Norman Walsh (HTML+XML: The W3C HTML/XML Task Force)Le ton a été donné par Norman Walsh dès la toute première présentation: la convergence entre HTML et XML n’aura pas lieu.

XML a tenté d’être un format neutre convenant aussi bien aux documents qu’aux données sur le web. On peut dire aujourd’hui que cet objectif n’a pas été atteint et que les formats les plus populaires sur le web sont HTML pour les documents et JSON pour les données.

Cela ne semble pas préoccuper plus que mesure le public de XML Prague composé d’aficionados des langages à balises : si la “masse des développeurs web” n’est pas intéressée par XML c’est son problème. Les bénéfices liés à XML sont bien connus et cela signifie simplement que la communauté XML devra développer les outils nécessaires pour utiliser XML dasn le navigateur aussi bien que sur le serveur.

Sur ce thème, beaucoup de présentations couvraient le support de XML dans le navigateur ainsi que les passerelles entre JSON et XML :

  • Validation XML Schema côté client par Henry S. Thompson and Aleksejs Goremikins
  • JSON pour XForms par Alain Couthures
  • XSLT dans le navigateur par Michael Kay
  • Traitement de XML efficace dans les navigateurs par Alex Milowski
  • XQuery dans le navigateur par Peter Fischer

Les outils côté serveurs ont fait l’objet de moins de sessions, peut être parce que le sujet est plus ancien :

  • Une façade JSON pour le serveur MarkLogic par Jason Hunter
  • CXAN: étude de cas pour Servlex, un framework XML pour le web par Florent Georges
  • Akara – “Spicy Bean Fritters” et services de données XML par Uche Ogbuji

Bien entendu, les standards étaient aussi au programme :

  • HTML+XML: la task force W3C HTML/XML (déjà mentionnée) par Norman Walsh
  • Standards update: XSLT 3.0 par Michael Kay
  • Standards update: XML, XQuery, XML Processing Profiles, DSDL par Liam Quin, Henry S. Thompson, Jirka Kosek

Ainsi que les applications de XML :

  • Configuration d’équipements réseau avec NETCONF et YANG par Ladislav Lhotka
  • Développements XML – XML Projects par George Bina
  • EPUB3: le langage et les bandes dessinées par Murata Makoto
  • EPUB: Chapitres et versets par Tony Graham
  • DITA NG – une implémentation Relax NG de DITA par George Bina

Sans oublier quelques présentations techniques sur les implémentations elles mêmes :

  • Traduction de SPARQL et SQL en XQuery par Martin Kaufmann
  • Réécritures déclaratives de XQuery pour le profit et le plaisir par John Snelson

Et la séance de clôture par le roi de cet excercice, Michael Sperberg-McQueen.

Ma présentation, “injection XQuery”, était assez atypique dans cet ensemble et il a fallu tout le talent de Michael Sperberg-McQueen pour lui trouver un point commun en faisant remarquer que pour avoir une chance de mettre XML sur le web il faudrait se préoccuper un peu plus de sécurité.

J’avais été impressionné lors des conférences XTech par l’évolution des techniques de présentation, la plupart des intervenants rejetant les traditionnelles présentation powerpoint et leurs “transparents” surchargés pour des alternatives plus légères et beaucoup plus imagées.

John Snelson (Declarative XQuery Rewrites for Profit or Pleasure)Je pensais ce mouvement inéluctable et ai été bien surpris de voir qu’il n’avait guère atteint les intervenants de XML Prague 2011 qui (à l’exception très notable de John Snelson) continuaient à utiliser powerpoint de manière très traditionnelle.

J’avais conçu ma présentation en suivant ce que je croyais être la technique de présentation devenue classique. Utilisant Slidy, j’avais pas moins de 35 pages très concises à présenter en 25 minutes. Chaque page avait une photo différente en arrière plan et ne comprenait que quelques mots.

Les commentaires ont été plutôt positifs bien que certaines photos d’injections aient choqué quelques participants.

Ma présentation étant du HTML standard, j’avais jugé plus sur d’utiliser l’ordinateur mis à disposition par les organisateurs. C’était sans compter sur les 74 Moctets d’images à charger pour les fonds de pages qui ont mis à mal cet ordinateur un peu poussif et les pages étaient un peu lentes à l’affichage (note personnelle : la prochaine fois, utilise ton ordinateur)!

The twitter wall (and Norman Walsh)

The “twitter wall”, projected with a second video projector, was also very successful.

This wall was very handy for communicating during the sessions and is a good replacement for the IRC channels we used to rely on.

Unfortunately, Twitter does not allow searching its archives and, as I write these words, I can already no longer access the tweets from the first day of the conference!

Looking back, when I try to analyze what was said at XML Prague 2011, I have mixed feelings about the widening gap between the Web and XML communities.

The dream that XML could be accepted by the whole web developer community was a very strong vision, and we must not forget that XML was designed to put “SGML on the web”.

That said, it must be acknowledged that web developers have always been reluctant to face the additional complexity (real or perceived) of XHTML. This gap has always existed, and once XML had missed the Web 2.0 turn it was too late to hope to bridge it.

XML on the web will therefore remain a niche and will continue to be used by a minority, but the creativity and dynamism the community showed in Prague are impressive and encouraging: there is still room for a lot of innovation and XML is, more than ever, a technology of choice for developing web applications.

Photos


XML Prague 2011: XML against the web

Coffee break

After a frenzied period between 2000 and 2008 during which I spoke at an impressive number of conferences, I temporarily retired and hadn’t attended a conference since XTech 2008.

For me, XML Prague 2011 was the first opportunity to meet the community of core XML techies face to face again, and I was curious to find out how things had evolved over the past three years.

MURATA Makoto (EPUB3: Global Language and Comic)

Aside from all the technical food for thought, one image of the conference that I won’t forget is Murata Makoto expressing his grief for the victims of the earthquake in Japan in simple and sober terms.

The tag line of XML Prague 2011 was “XML as new lingua franca for the Web. Why did it never happen?”.

Michael Sperberg-McQueen (Closing keynote)

The actual content of the conference stayed close to this tag line, but it was better summarized by Michael Sperberg-McQueen in his closing keynote: “Let’s put XML in the browser, whether they want it there or not!”

Norman Walsh (HTML+XML: The W3C HTML/XML Task Force)

The tone was set by Norman Walsh in the very first session: the convergence between HTML and XML will not happen.

XML has tried hard to be an application-neutral format for the web, usable for both documents and data. It is fair to say that it has failed to reach this goal and that the preferred formats on the web are HTML for documents and JSON for data.

That doesn’t seem to bother the XML Prague attendees much, since they are markup language addicts anyway: if the “mass of web developers” doesn’t care about XML, that’s their problem. The benefits of using XML are well known, and this just means that we have to develop the XML tools we need on the server as well as in the browser.

Following this line, many sessions were about developing XML support in the browser and bridging the gaps between XML and HTML/JSON:

  • Client-side XML Schema validation by Henry S. Thompson and Aleksejs Goremikins
  • JSON for XForms by Alain Couthures
  • XSLT on the browser by Michael Kay
  • Efficient XML Processing in Browsers by Alex Milowski
  • XQuery in the Browser reloaded by Peter Fischer

By contrast, server-side tools were less represented, maybe because the domain had been better covered in the past:

  • A JSON Facade on MarkLogic Server by Jason Hunter
  • CXAN: a case-study for Servlex, an XML web framework by Florent Georges
  • Akara – Spicy Bean Fritters and XML Data Services by Uche Ogbuji

Of course, standards updates were also on the program:

  • HTML+XML: The W3C HTML/XML Task Force (already mentioned) by Norman Walsh
  • Standards update: XSLT 3.0 by Michael Kay
  • Standards update: XML, XQuery, XML Processing Profiles, DSDL by Liam Quin, Henry S. Thompson, Jirka Kosek

We also had talks about XML applications:

  • Configuring Network Devices with NETCONF and YANG by Ladislav Lhotka
  • Advanced XML development – XML Projects by George Bina
  • EPUB3: Global Language and Comic by Murata Makoto
  • EPUB: Chapter and Verse by Tony Graham
  • DITA NG – A Relax NG implementation of DITA by George Bina

Not forgetting a couple of talks on implementation considerations:

  • Translating SPARQL and SQL to XQuery by Martin Kaufmann
  • Declarative XQuery Rewrites for Profit or Pleasure by John Snelson

And the traditional and always impressive closing keynote by Michael Sperberg-McQueen.

My own presentation, “XQuery injection”, was quite atypical, and it took all the talent of Michael Sperberg-McQueen to kindly relate it to “XML on the web” by noting that security would have to be taken more seriously for that to happen.

One of the things that had impressed me at the XTech conferences was the shift in presentation styles, with most speakers moving away from traditional powerpoint presentations stuffed with bullet points toward lighter, better illustrated shows.

John Snelson (Declarative XQuery Rewrites for Profit or Pleasure)

I had expected the trend to continue and was surprised to see that it doesn’t seem to have caught on with XML Prague presenters, who kept using traditional bullet points with only a couple of exceptions (John Snelson being a notable one).

I had designed my presentation in what I thought would be a common style. Using Slidy, I had created no fewer than 35 short pages to present in 25 minutes. Each page had a different high resolution picture as its background and contained only a few words.

The comments were generally good, even though some of the pictures chosen to represent injections seem to have hurt the feelings of some attendees.

Since my presentation is just standard HTML, I had been brave enough to use the shared computer. Unfortunately, the presentation loads 74 MB of background pictures, which was a bit much for the shared computer: it took several seconds to change pages (note to self: next time, use your own laptop)!

The twitter wall (and Norman Walsh)

Another interesting feature of this conference was the “twitter wall” projected in the room with a second video projector.

This wall proved very handy for communicating during the sessions, and it can be seen as a more modern incarnation of the IRC channels used at earlier conferences.

Unfortunately, Twitter doesn’t allow searching its archives, and as I write these words I can no longer go back and read the tweets from the first day of the conference.

Looking back at the conference, I have mixed feelings about the gap between the XML and web developer communities, which now seems to be widely accepted on both sides.

The dream that XML could be accepted by the web community at large was a nice vision, and we should not forget that XML was designed to be “SGML on the web”.

Web developers have always been reluctant to accept the perceived additional complexity of XHTML; the gap has been there from the beginning, and after XML missed the Web 2.0 train it was too late to close it.

XML on the web will stay a niche and will be used by a minority, but the creativity and dynamism the community showed in Prague are inspiring and encouraging: there is still room for a lot of innovation, and XML is more than ever the technology of choice to power web applications.

All the pictures
