XDM Maps should be first-class citizens

Note: This issue has been submitted to the W3C as #16118.

The XPath/XQuery/XSLT 3.0 data model distinguishes three kinds of items:

  • Nodes, which directly relate to the XML Infoset, with some information borrowed from the PSVI.
  • Functions.
  • Atomic values.

Michael Kay has recently proposed adding maps as a fourth item type, derived from functions.

The main motivation for this addition is to support JSON objects, which can be considered a subset of map items.

However, in the current proposal map items are treated very differently from XML nodes, and this has deep practical consequences.

Take for instance the following simple JSON sample borrowed from Wikipedia:

{
     "firstName": "John",
     "lastName" : "Smith",
     "age"      : 25,
     "address"  :
     {
         "streetAddress": "21 2nd Street",
         "city"         : "New York",
         "state"        : "NY",
         "postalCode"   : "10021"
     },
     "phoneNumber":
     [
         {
           "type"  : "home",
           "number": "212 555-1234"
         },
         {
           "type"  : "fax",
           "number": "646 555-4567"
         }
     ]
 }

To get the postalCode from an equivalent structure expressed as XML and stored in the variable $person, one would just use the following XPath expression: $person/address/postalCode.

When the same structure is expressed in JSON and parsed into an XDM map, XPath axes can no longer be used (their purpose is to traverse documents, i.e. nodes) and we need to use map functions: map:get(map:get($person, 'address'), 'postalCode').

That’s not as bad as it sounds because maps can be invoked as functions, so this can be rewritten as $person('address')('postalCode'), but it gives a first idea of the deep differences between maps and nodes, and things would become worse if I wanted to get the postal code of persons whose first name is « John »…
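
To make the difference concrete, here is how that query might look on each side. This is only a sketch: $people is a hypothetical variable holding the persons (as nodes on one side, as a sequence of maps on the other), and the map invocations follow the current proposal.

    (: Nodes: axes and predicates do the work :)
    $people/person[firstName = 'John']/address/postalCode

    (: Maps: no axes, so we fall back on a FLWOR expression
       and on invoking maps as functions :)
    for $person in $people
    where $person('firstName') = 'John'
    return $person('address')('postalCode')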

Another important difference is that node items are the only ones that have a context or an identity.

When I write <foo><bar>5</bar></foo><bat><bar>5</bar></bat>, each of the two bar elements happens to have the same name and value, but they are considered two different elements, and even the two text nodes that are their children are two different text nodes.

When I write foo: {bar: 5}, bat: {bar: 5}, the two bar entries are actually the same thing and can’t be distinguished.

This difference is important because it means that XPath axes as we know them for nodes could never be implemented on maps: if an entry in a map can’t be distinguished from an identical entry elsewhere in another map, there is no hope of being able to determine, for instance, its parent.
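
A quick illustration, using the is operator, which compares node identities (the map constructor syntax below follows the current proposal and may still change):

    (: Nodes have an identity: :)
    let $foo := <foo><bar>5</bar></foo>,
        $bat := <bat><bar>5</bar></bat>
    return $foo/bar is $bat/bar
    (: false: same name, same value, yet two distinct nodes :)

    (: Maps don't: map { 'bar' := 5 } and map { 'bar' := 5 } are
       indistinguishable values, and "is" is not even defined for them :)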

Now, why is it important to be able to define axes on maps and map entries?

I think it is important for XSLT and XQuery users to be able to traverse maps like they traverse XML fragments (with the same level of flexibility and with syntaxes kept as close as possible). And yes, that means being able to apply templates over maps and being able to update maps using XQuery Update…

But I also think that this will be important to other technologies that rely on XPath, such as (to name those I know best) XForms, pipeline languages (XPL, XProc, …) and Schematron.

Being able to use XForms to edit JSON objects is an obvious need that XForms 2.0 is trying to address through a « hack » that was presented at XML Prague 2012.

In the longer term, we can hope that XForms will abandon this hack and rely on XDM maps. XForms relies a lot on the notions of nodes and axes: it binds controls to instance nodes, and the semantics of such bindings would be quite different if they had to be applied to XDM map entries as currently proposed.

XML pipeline languages are also good candidates to support JSON objects. Both XPL and XProc have features to loop over document fragments and choose actions depending on the results of XPath expressions, and again the semantics of these features would be affected if they had to support XDM maps as currently proposed.

Schematron could be a nice answer to the issue of validating JSON objects. Schematron relies on XPath at two different levels: its rules are defined as XPath expressions (where it is often very convenient to be able to use XPath axes such as ancestor), and its processing model is defined in terms of traversing a tree. Again, updating Schematron to support maps would be more difficult if maps are not similar to XML nodes.

Given the place of JSON on the web, I think that it is really important to support maps, and the question we have to face is: « do we want a minimal level of support that may require hard work from developers and other standards to support JSON, or do we want to make it as painless as possible for them? »

And obviously, my preference is the latter: if we add maps to the XDM, we need to give them full citizenship from the beginning!

Note: The fact that map entries are unordered (and they need to be, because the properties of JSON objects are unordered) is less of an issue to me. We already have two node kinds (namespace nodes and attributes) whose relative order is « stable but implementation-dependent ».

 

Introducing χίμαιραλ (chimeral), the Chimera Language

In the presentation I gave at XML Prague 2012 (see my paper), one of my conclusions was that the XML data model, extended by the XPath/XQuery/XSLT Working Group to embrace other data models such as JSON, was an important foundation of the whole XML ecosystem.

In her amazing keynote, Jeni Tennison warned us against chimeras, “ugly, foolish or impossible fantasies”, and I thought it would be useful to check to what extent the XPath/XQuery/XSLT 3.0 data model (aka XDM 3.0) deserves to be called a chimera.

The foundation of this data model is the XML Infoset, but it also borrows information items from the Post Schema Validation Infoset (the [in]famous PSVI) and adds its own abstract items such as sequences and, new in 3.0, functions and maps (needed to represent JSON objects).

I started to think more seriously about this, doing some research and writing a proposal for Balisage, and my plan was to wait until the conference to publish anything.

One of the things I planned to present is a simple XML serialization format for the XDM. My initial motivation for proposing such a format was to have a visualization of the XDM: I find it difficult to picture the data model if its instances stay purely abstract and can’t be serialized and deserialized.

Working on this, I soon discovered that such a serialization can have other concrete benefits: the items that have recently been added to the XDM, such as maps and even sequences, are not treated as first-class citizens by XPath/XQuery/XSLT, and the data model can be easier to traverse through its serialization!

When, for instance, you have a complex map imported from JSON by the brand new parse-json() function, you can’t easily apply templates to the map items and sub-items. And of course, with an XML serialization that becomes trivial to do.
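
To give an idea of the boilerplate this requires today, here is the kind of hand-written recursion needed to walk such a map. It is a minimal sketch assuming the map:keys() and map:get() functions and the map(*) sequence type from the current drafts:

    declare namespace map = "http://www.w3.org/2005/xpath-functions/map";
    declare variable $json external;

    (: A poor man's apply-templates: visit every leaf of a parsed JSON object :)
    declare function local:walk($items) {
      for $item in $items
      return
        if ($item instance of map(*)) then
          for $key in map:keys($item)
          return local:walk(map:get($item, $key))
        else
          $item
    };

    local:walk(parse-json($json))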

If such a serialization can be useful, there is no reason to wait until Balisage in August to discuss it, and I’d like to introduce the very first version of χίμαιραλ (chimeral), the Chimera Language.

The URL itself, http://χίμαιραλ.com, is a chimera composed of letters from two different alphabets, merging concepts from two different civilizations!

This first version is not complete. It already supports rather complex cases, but I need to think more about how to deal with maps or sequences of nodes such as namespace nodes or attributes.

So far I am really impressed by XPath 3.0, but also surprised by many limitations in terms of reflection:

  • No built-in function to determine the basic kind of an item (node, attribute, sequence, map, function, …); a hand-rolled workaround is sketched after this list.
  • The dm:node-kind() accessor that determines the kind of a node is abstract, and XPath 3.0 does not expose it.
  • The behavior of the exslt:object-type() function is surprising.
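
The first of these limitations can be worked around by hand; here is a minimal sketch relying only on instance of tests (the map(*) test assumes the maps proposal, and it must come before the function test since maps are functions):

    declare function local:kind-of($item as item()) as xs:string {
      if ($item instance of document-node()) then 'document'
      else if ($item instance of element()) then 'element'
      else if ($item instance of attribute()) then 'attribute'
      else if ($item instance of text()) then 'text'
      else if ($item instance of comment()) then 'comment'
      else if ($item instance of processing-instruction()) then 'processing-instruction'
      else if ($item instance of map(*)) then 'map'
      else if ($item instance of function(*)) then 'function'
      else 'atomic value'
    };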

I may have missed something, but in practice I have found it quite difficult, when you have a variable, to browse its data model.

The other aspect that I don’t like in XPath/XQuery/XSLT 3.0 is the lack of homogeneity in the way the different kinds of items are manipulated. This strengthens the feeling that we have a real chimera!

In XSLT, for instance, I’d like to be able to apply templates and match items in the same way for any item type. Unfortunately, the features needed to do so (node tests, axes, …) are reserved for XML nodes. I can’t define a template that matches a map (nor a sequence, by the way), I can’t apply templates over map items, …

It may be too late for version 3.0, but I really think that we should incorporate these recent additions and make them first-class citizens!

Going forward, we could reconsider the way these items mix and match. Currently you can have sequences of maps, functions, nodes and atomic values, and maps whose values are sequences, functions, nodes and atomic values, but nodes are composed only of other nodes. Even if the XML syntax doesn’t support this, I would really like to see more symmetry and be able to add sequences and maps within nodes!

In other words, I think that it would be much more coherent to treat maps and sequences like nodes…

Note: The χίμαιραλ website is currently « read only », but comments are very welcome on this blog or by mail.

 

XML Prague 2012: The web would be so cool without the web developers

Note: XML Prague is also a very interesting pre-conference day, a traditional dinner, posters, sponsor announcements, meals, coffee breaks, discussions and walks that I have not covered in this article for lack of time.


When I was a child, I used to say that I felt Dutch when I was in France and French when I was in the Netherlands. It was nice to feel slightly different, and I liked to analyze the differences between the Dutch, who seemed more adult and civilized, and the French, who seemed to me more spontaneous and fierce.

I found this old feeling of being torn between two different cultures again, very strongly, this weekend at XML Prague. Of course, it was no longer between French and Dutch but between the XML and Web communities.

The conference also reminded me of the old joke about the Parisian visiting Corsica and saying « Corsica would be so cool without the Corsicans! », and for me the tag line could have been « the web would be so cool without web developers! ».

Jeni Tennison’s amazing opening keynote was of course more subtle than that!

She started by acknowledging that the web is split into no fewer than four major formats: HTML, JSON, XML and RDF.

Her presentation was a set of clever considerations on how we can deal with these different formats and cultures, concluding that we should accept the fact that « the web is varied, complex, dynamic, beautiful ».

I then gave my talk, « XML, the eX Markup Language » (read also my paper on this blog), where I analyzed the reasons for the failure of XML to become the one major web format and gave my view on where XML should be heading.

While Jeni had explained why « chimeras are usually ugly, foolish or impossible fantasies », my conclusion was that we should focus on the data model and extend or bridge it to embrace JSON and HTML, as the XPath 3.0 data model is proposing to do.

I still think so, but what is such a data model if not a chimera? Is it ugly, foolish or impossible, then? There is a lot to think about beyond what Hans-Jürgen Rennau and David Lee proposed at Balisage 2011, and I think I’ll submit a proposal at Balisage 2012 on this topic!

Robin Berjon and Norman Walsh then tried to bridge the gap with their presentation « XML and HTML Cross-Pollination: A Bridge Too Far? », an interesting talk where they tried to show how ideas could be shared between these two communities: « forget about angle brackets, look at the ideas ». This gave a nice list of things that do work in the browser (did you know that you can run JavaScript against any XML document?) and fascinating new ideas such as JS-SLT, a JavaScript transformation library, or CSS-Schema, a CSS-based schema assertion language.

Anne van Kesteren had chosen a provocative title for his talk: « What XML can learn from HTML; also known as XML5 ». Working for Opera, Anne was probably the only real representative of the web community at this conference. Under that title, his presentation was an advocacy for relaxing the strictness of the XML parsing rules and defining an error recovery mechanism in XML like those that exist in HTML5.

His talk was followed by a panel discussion on HTML/XML convergence, and this subject of error recovery monopolized the whole panel! Some of the panelists (Anne van Kesteren, Robin Berjon and myself) were less hostile, but the audience unanimously rejected the idea of changing anything in the well-formedness rules of the XML recommendation.

Speaking of errors may be part of the problem: errors have a bad connotation, and if a syntactical construct is allowed by the error recovery mechanism with a well-defined meaning, why should we still consider it an error?

However, a consensus formed that it could be useful to specify an error recovery mechanism to be used when applications need to read non-well-formed XML documents that may be found on the wide web. This consensus led to the creation of the W3C XML Error Recovery Community Group.

The reaction of the room, which didn’t accept even considering a discussion of what XML well-formedness means, seems rather irrational to me. Michael Sperberg-McQueen reinforced this feeling in his closing keynote when he pleaded to define this as « a separate add-on rule rather than as a spec that changes the fundamental rules of XML ».

What can be so fundamental about the definition of XML well-formedness? These reactions made me feel like we were discussing kashrut rules rather than parsing rules, and the debate often looked more religious than technical!

The next talk, XProc: Beyond application/xml by Vojtěch Toman, was again about bridging technologies but was less controversial, probably because the technologies to bridge with were not seen as XML competitors.

Taking a look at the workarounds used by XML pipelines to support non-XML data (either encoding the data or storing it outside the pipeline), Vojtěch proposed extending the data model flowing through the pipelines to directly support non-XML content. That kind of proposal looks so obvious and simple that you wonder why it hasn’t been done before!

George Bina came next to present Understanding NVDL – the Anatomy of an Open Source XProc/XSLT implementation of NVDL. NVDL is a cool technology for bridging different schema languages, and it greatly facilitates the validation of compound XML documents.

Next was Jonathan Robie, presenting JSONiq: XQuery for JSON, JSON for XQuery. JSONiq is both a syntax and a set of extensions to query JSON documents in an XQuery flavor that looks like JSON. Both the syntax and the extensions look elegant and clever.

The room was usually very quiet during the talks, waiting for the Q&A sessions at the end of the talks to ask questions or give comments, but as soon as Jonathan displayed the first example, Anne van Kesteren couldn’t help gasping: « what? arrays are not zero based! »

Having put JSON clothes on top of an XPath data model, JSONiq uses a base index of one for its arrays, while JavaScript and most programming languages use zero as their base index.
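
For illustration, the same lookup on each side of the fence (the JavaScript behavior is given in comments):

    (: XPath/XQuery sequences are one-based: :)
    ("home", "fax")[1]    (: returns "home" :)

    (: whereas JavaScript arrays are zero-based:
       ["home", "fax"][0] === "home" :)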

Proposing one-based arrays inside a JSONic syntax to web developers is like wearing a kippah to visit an orthodox Jew and bringing him baked ham: if you want to be kosher, you need to be fully kosher!

Norman Walsh came back on stage to present Corona: Managing and querying XML and JSON via REST, a project to « expose the core MarkLogic functionality—the important things developers need—as a set of services callable from other languages » in a format-agnostic way (XML and JSON can be used interchangeably).

The last talk of the first day was given by Steven Pemberton: Treating JSON as a subset of XML: Using XForms to read and submit JSON. After a short introduction to XForms, Steven explained how the W3C XForms Working Group is considering supporting JSON in XForms 2.0.

While Steven was speaking, Michael Kay tweeted what many of us were thinking: « Oh dear, yet another JSON-to-XML mapping coming… ». Unfortunately, until JSON finds its way into the XML Data Model, every application that wants to expose JSON to XML tools has to propose a mapping!

The first sessions of the second day were devoted to Jonathan Robie and Michael Kay presenting What’s New in XPath/XSLT/XQuery 3.0 and XML Schema 1.1.

A lot of good things indeed! XML Schema 1.1 in particular will correct the biggest limitations of XML Schema 1.0 and borrow some features from Schematron, making XML Schema an almost decent schema language!

But the biggest news is about XPath/XSLT/XQuery 3.0, which brings impressive new features that will turn these languages into fully functional programming languages. And of course new types in the data model to support the JSON data model.

One of these new features is annotations, and Adam Retter gave a good illustration of how they can be used in his talk RESTful XQuery – Standardised XQuery 3.0 Annotations for REST. With XQuery being used to power web applications, these annotations can define how stored queries are associated with HTTP requests, and Adam proposes to standardize them to ensure interoperability between implementations.
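
Here is a small example in the spirit of Adam’s proposal; the annotation names and the namespace follow the draft as I understand it, so treat the details as illustrative:

    declare namespace rest = "http://exquery.org/ns/restxq";

    (: Associate this stored query with GET requests on /person/{id} :)
    declare
      %rest:GET
      %rest:path("/person/{$id}")
    function local:person($id as xs:string) {
      <person id="{$id}"/>
    };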

For those of us whose heads were not spinning yet, Alain Couthures came to explain how he is Compiling XQuery code into Javascript instructions using XSLT 1.0 for his XSLTForms implementation. If we can use XSLT 1.0 to compile XQuery into JavaScript, what are the next steps? XSLT 2.0?

After lunch, Evan Lenz came to present Carrot, « an appetizing hybrid of XQuery and XSLT », which was first presented at Balisage 2011. This hybrid is not a chimera but a nice compromise for those of us who can’t really decide whether they prefer XSLT or XQuery: Carrot extends the non-XML syntax of XQuery to expose the templating system of XSLT.

It can be seen as yet another non-XML syntax for XSLT or as a templating extension for XQuery, and it borrows the best features of both languages!

Speaking of defining templates in XQuery, John Snelson came next to present Transform.XQ: A Transformation Library for XQuery 3.0. Taking advantage of the functional programming features of XQuery 3.0, Transform.XQ is an XQuery library that implements templates in XQuery. These templates are not exactly similar to XSLT templates (the priority system is different) but, like in XSLT, you’ll find template definitions, apply-templates methods, modes, priorities and other goodies.

Java had not been mentioned yet, and Charles Foster came to propose Building Bridges from Java to XQuery. Based on the XQuery API for Java (XQJ), these bridges rely on Java annotations to map Java classes to XQuery stored queries, and of course POJOs are also mapped to XML to provide a very sleek integration.

The last talk was a use case by Lorenzo Bossi presenting A Wiki-based System for Schema and Data Evolution, providing a good summary of the kinds of problems you face when you need to update schemas and corpora of documents.

Everyone was then holding their breath waiting for Michael Sperberg-McQueen’s closing keynote, which was brilliant as usual, almost impossible to summarize, and should be watched on video!

Michael chose to use John Amos Comenius as an introduction for his keynote. Comenius was the last bishop of the Unity of the Brethren and became a religious refugee. That gave Michael an opportunity to call for tolerance and diversity in document formats as in real life. Comenius was also one of the earliest champions of universal education, and in his final conclusion Michael pointed out that structured markup languages are the new champions of this noble goal.

Of course, there was much more than that in his keynote, Michael taking care to mention each presentation, but this focus on Comenius confirmed my sense of a religious feeling toward XML.

I agree with most of what Michael said in his keynote, except maybe when he seems to deny that XML adoption can be considered disappointing. When he says that the original goal of XML, to be able to use SGML on the web, has been achieved because he, Michael Sperberg-McQueen, can use XML on his web sites, that’s true of course, but was the goal really to allow SGML experts to use SGML on the web?

It’s difficult for me to dissent, because he was involved in XML at a time when I had never heard of SGML, but I would still argue that SGML was already usable on the web by SGML experts, and I don’t understand the motivation for the simplification that gave birth to XML if it was not to lower the price of entry so that web developers could use XML.

The consequences of this simplification have been very heavy: the whole stack of XML technologies had to be reinvented, and SGML experts lost a lot of time before these technologies could be considered to be at the same level as before. And even now, some features of SGML that were stripped out could be very useful to experts on the web, such as DTDs powerful enough to describe wiki syntaxes.

Similarly, when discussing my talk during lunch with Liam Quin, he said that he had always thought that XHTML would never replace HTML. I have no reason to contradict Liam, but the vision of the W3C Markup Activity was clearly to « Deliver the Web of the Future Today: Recasting HTML in XML », as can be seen in this archive.

It’s not pleasant to admit that we’ve failed, but replacing HTML with XHTML so that XML would become dominant in the browser was clearly the official vision of the W3C, shared by a lot of us, and this vision has failed!

We need to acknowledge that we’ve lost this battle and make peace with the web developers who have won…

Curiously, there seems to be much less aggressiveness toward JSON than toward HTML5 in the XML community, as shown by the number of efforts to bridge XML and JSON. Can we explain this by the fact that many XML purists considered data-oriented XML less interesting and noble than document-oriented XML?

Anyway, the key point is that a very strong ecosystem has been created, with an innovative, motivated and almost religious community and a technology stack which is both modern and mature.

XML, the eX Markup Language?

Note: this article is a copy of the paper that I presented at XML Prague 2012.

Abstract

Revisiting the question that was the tag line of XML Prague last year, « XML as new lingua franca for the Web. Why did it never happen? », Eric tries to answer other questions such as « where is XML going? » and « is XML declining, becoming an eX Markup Language? ».

XML as new lingua franca for the Web. Why did it never happen?

This was the tagline of XML Prague 2011, but the question wasn’t really answered last year, and I’ll start this talk by giving my view on it.

Flashback

February 1998 is a looong time ago, a date from another century, and for those of you who were not born or don’t remember, here is a small summary of what happened in February 1998:

[A summary of the events of February 1998, excerpted from Wikipedia, appeared here.]

While the Iraq disarmament crisis was raging, the World Wide Web Consortium waited until the third day of the Winter Olympics held in Nagano to make the following announcement:

Advancing its mission to lead the Web to its full potential, the World Wide Web Consortium (W3C) today announced the release of the XML 1.0 specification as a W3C Recommendation. XML 1.0 is the W3C’s first Recommendation for the Extensible Markup Language, a system for defining, validating, and sharing document formats on the Web
W3C Press Release (February 1998)

People curious enough to click on the second link of the announcement could easily double-check that, beyond the marketing bias, XML was something to be used over the Internet:

The design goals for XML are:

  1. XML shall be straightforwardly usable over the Internet.
  2. XML shall support a wide variety of applications.
  3. XML shall be compatible with SGML.
  4. It shall be easy to write programs which process XML documents.
  5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
  6. XML documents should be human-legible and reasonably clear.
  7. The XML design should be prepared quickly.
  8. The design of XML shall be formal and concise.
  9. XML documents shall be easy to create.
  10. Terseness in XML markup is of minimal importance.
W3C Recommendation (February 1998)

And the point was reinforced by the man who had led the « Web SGML » initiative and is often referred to as the father of XML:

XML arose from the recognition that key components of the original web infrastructure — HTML tagging, simple hypertext linking, and hardcoded presentation — would not scale up to meet the future needs of the web. This awareness started with people like me who were involved in industrial-strength electronic publishing before the web came into existence.
Jon Bosak

This has often been summarized by saying that XML is about « putting SGML on the Web ».

Among the design goals, the second one (« XML shall support a wide variety of applications ») has been especially successful, and by the end of 1998 Liora Alschuler reported that the motivations of the different players pushing XML forward were very diverse:

The big-gun database vendors, IBM and Oracle, see XML as a pathway into and out of their data management tools. The big-gun browser vendors, Netscape and Microsoft, see XML as the e-commerce everywhere technology. The big-gun book and document publishers, for all media, are seeing a new influx of tools, integrators, and interest but the direction XML publishing will take is less well-defined and more contingent on linking and style specs still in the hands of the W3C.
Liora Alschuler for XML.com (December 1998)

One thing these « big-gun » players pushing XML in different directions did achieve was to develop an incredible hype that rapidly covered everything, and by 2001 the situation had become hardly bearable:

Stop the XML hype, I want to get off

As editor of XML.com, I welcome the massive success XML has had. But things prized by the XML community — openness and interoperability — are getting swallowed up in a blaze of marketing hype. Is this the price of success, or something we can avoid?
Edd Dumbill (March 2001)

Marketers behind the hype being who they were, the image of XML that they promoted was so shiny that the XML gurus didn’t recognize their own technology and tried to fight against the hype:

I’ve spent years learning XML / I like XML / This is why www.XmlSuck.com is here
PaulT (January 2001)

The attraction was high and people rushed to participate in the W3C working groups:

Working Group size – so many people means it is difficult to gain consensus, or even know everyone’s face. Conference calls are difficult.
Mark Nottingham, about the SOAP W3C WG (May 2000)

Huge working groups with people pushing in different directions are not the best recipe for publishing high-quality standards, and even though XML itself was already baked, the perception of XML depends on the full « stack »:

This is a huge responsibility for the Schema Working Group since it means that the defects of W3C XML Schema will be perceived by most as defects of XML.
Eric van der Vlist on xml-dev (April 2001)

The hype was so huge that XML geeks rapidly thought that they had won the war and that XML was everywhere:

XML is now as important for the Web as HTML was to the foundation of the Web. XML is everywhere.
connet.us (February 2001)

Why this hype? My guess is that the IT industry had such a desperate need for a data interchange format that any one of them could have been adopted at that time, and that XML happened to be the one crossing the radar screen at the right moment:

When the wind is strong enough, even flatirons can fly.
Anonymous (February 2012)

The W3C now had to maintain:

  • XML, an SGML subset
  • HTML, an SGML application that did not match the XML subset

Technically speaking, the thing to do was to refactor HTML to meet the XML requirements. Given the perceived success of XML, it seemed obvious that everyone would jump on the XML bandwagon and be eager to adopt XHTML.

Unfortunately, from a web developer’s perspective, the benefits of XHTML 1.0 were not that obvious:

The problem with XHTML is: a) it’s different enough from HTML to create new compatibility problems; b) it’s not different enough from HTML to bring significant advantages.
Eric van der Vlist on XHTML-DEV (May 2000)

It is fair to say that Microsoft had been promoting XML since the beginning:

XML, XML, Everywhere

There’s no avoiding XML in the .NET world. XML isn’t just used in Web applications, it’s at the heart of the way data is stored, manipulated, and exchanged in .NET systems.
Rob Macdonald for MSDN (February 2001)

However, despite their strong commitment to XML, Microsoft had frozen new developments on Internet Explorer. The browser has never been updated to support the XHTML media type, meaning that the few web sites using XHTML had to serve their pages as HTML!

By 2001, the landscape was set:

  • XML had become a dominant buzzword, giving the false impression that it had been widely adopted
  • Under the hood, many developers, even among the XML community, were deeply upset by this hype
  • Serving XHTML web pages as such was not an option for most web sites

The landscape was set, but the hype was still high and XML was still gaining traction as a data interchange format.

In the meantime, another hype was growing…

Wikipedia traces the origin of the term Web 2.0 back to 1999:

The Web we know now, which loads into a browser window in essentially static screenfuls, is only an embryo of the Web to come. …/… Ironically, the defining trait of Web 2.0 will be that it won’t have any visible characteristics at all. The Web will be identified only by its underlying DNA structure – TCP/IP (the protocol that controls how files are transported across the Internet); HTTP (the protocol that rules the communication between computers on the Web), and URLs (a method for identifying files).

…/…

The Web will be understood not as screenfuls of text and graphics but as a transport mechanism, the ether through which interactivity happens.

Darcy DiNucci (1999)

The term became widely known with the first Web 2.0 conferences in 2003 and 2004, and XML was an important piece of the Web 2.0 puzzle through Ajax (Asynchronous JavaScript and XML), coined and defined by Jesse James Garrett in 2005 as:

Ajax isn’t a technology. It’s really several technologies, each flourishing in its own right, coming together in powerful new ways. Ajax incorporates:

Jesse James Garrett (February 2005)

This definition shows how, back in 2005, some of us still thought that XML could dominate the Web and be used both to exchange documents (in XHTML) and data.

Unfortunately, this vision, defended by the W3C, was rapidly torpedoed by Ian Hickson and Douglas Crockford.

Founded in 1994 for that purpose, the W3C had been the place where HTML was normalized. Among other things, it had been the place where the antagonists of the first browser war could meet and discuss on neutral ground.

In 2004, Netscape had disappeared, Microsoft had frozen the development of their browser, and browser innovation had moved into the hands of new players: Mozilla, Apple/Safari and Opera, which was starting to gain traction.

Complaining that the W3C did not meet their requirements and that HTML needed to be urgently updated to meet the requirements of what would soon be known as Web 2.0, they decided to fork the development of HTML:

Software developers are increasingly using the Internet as a software platform, with Web browsers serving as front ends for server-based services. Existing W3C technologies — including HTML, CSS and the DOM — are used, together with other technologies such as JavaScript, to build user interfaces for these Web-based applications. However, the aforementioned technologies were not developed with Web Applications in mind, and these systems often have to rely on poorly documented behaviors. Furthermore, the next generation of Web Applications will add new requirements to the development environment — requirements these technologies are not prepared to fulfill alone. The new technologies being developed by the W3C and IETF can contribute to Web Applications, but these are often designed to address other needs and only consider Web Applications in a peripheral way. The Web Hypertext Applications Technology working group therefore intends to address the need for one coherent development environment for Web Applications. To this end, the working group will create technical specifications that are intended for implementation in mass-market Web browsers, in particular Safari, Mozilla, and Opera.
WHATWG (June 2004)

The W3C was left with a simple choice: either push XHTML recommendations that would never be implemented in any browser, or ditch XHTML and ask the WHATWG to come back and continue their work toward HTML5 as a W3C Working Group. The latter option was eventually chosen, and HTML work resumed within the W3C in 2007.

JSON had been around since 2001. It took a few years of Douglas Crockford’s energy to popularize this JavaScript subset, but around 2005 JSON rapidly became a technology of choice as a « Fat-Free Alternative to XML » in Ajax applications.

There is no direct link between HTML5 and JSON but the reaction against XML, its hype and its perceived complexity is a strong motivation in both cases.

Why?

A number of reasons can be found for this failure:

  • Bad timing between the XML and HTML specifications (see Adam Retter’s presentation at XML Amsterdam 2011).
  • Lack of quality of some XML recommendations (XML Namespaces, XML Schema, …).
  • Lack of pedagogy to explain why XML is the nicest technology on earth.
  • Dumbness of web developers who do not use XML.

There is some truth in all these explanations, but the main reason is that from the beginning we (the XML crowd) have been arrogant and overconfident, and we made a significant design error.

When we read this quote:

XML arose from the recognition that key components of the original web infrastructure — HTML tagging, simple hypertext linking, and hardcoded presentation — would not scale up to meet the future needs of the web. This awareness started with people like me who were involved in industrial-strength electronic publishing before the web came into existence.
Jon Bosak

We all understand what Jon Bosak meant, and we probably all agree that HTML is limited and that something more extensible makes our lives easier, but we must also admit that we have been proven wrong and that HTML has been able to scale up to the amazing applications we see today.

Of course, the timing was wrong, and everything would have been easier if Tim Berners-Lee had come up with a first version of HTML that was a well-formed XML document, but on the other hand the web had to exist before we could put SGML on the web, and there had to be a prior technology.

In 1998 it was already clear that HTML was widespread, and the decision to create XML as an SGML subset incompatible with HTML was a bad one:

  • Technically speaking, because that meant that millions of existing pages would be non-well-formed XML (« the first Google index in 1998 already had 26 million pages »).
  • Tactically speaking, because it could be understood as « what you’ve done so far was crappy, now you must do what we tell you to do ».

To avoid this deadly risk, the first design goal of XML should have been that existing valid HTML documents be well-formed XML documents. The result might have been a more complex format and specification, but the risk of creating a gap between the XML and HTML communities would have been minimized.

Another reason for this failure is that XML is about extensibility. This is both its main strength and its main weakness: extensibility comes at a price, and XML is more complex than domain-specific languages.

Remove the need for extensibility and XML will always lose against DSLs; we’ve seen a number of examples in the past:

  • RELAX NG compact syntax
  • JSON
  • HTML
  • N3
  • CSS

Is it time to refactor XML? Converge or convert?

Hmmm… It’s time to address the questions asked this year by XML Prague!

We’ve failed to establish XML as the format to use on the web, but we’ve succeeded in creating a strong toolbox which is very powerful for powering websites and exchanging information.

I don’t know if it’s to compensate for the ecosystems that we are destroying on our planet, but one of the current buzzwords among developers is « ecosystem »: dominant programming languages such as Java and JavaScript are becoming « ecosystems » that you can use to run a number of applications that may be written in other programming languages.

What we’ve built with XML during the past 14 years is a very strong ecosystem.

The XML ecosystem is based on an (almost) universal data model that can represent not only well-formed XML documents but also HTML5 documents and (with an impedance mismatch that may be reduced in future versions) JSON objects.

Note: Notable exceptions that cannot be represented by the XML data model include overlapping structures and graphs.

On top of this data model, we have a unique toolbox that includes:

  • transformation and query languages
  • schema languages
  • processing (pipeline) languages
  • databases
  • web forms
  • APIs for traditional programming languages
  • signature and encryption standards
  • a text based serialization syntax
  • binary serialization syntaxes

We can truly say that what’s important in XML is not the syntax but that:

Angle Brackets Are a Way of Life
Planet XMLHack

Rather than fighting battles that we’ve already lost, we need to develop our ecosystem.

The number one priority is to make sure that our data model embraces the web that is taking shape (which means HTML5 and JSON) as efficiently as possible. Rather than converge or convert, we must embrace; the actual syntax is not that important after all!

To grow our ecosystem, we could also consider embracing more data models, such as graphs (RDF), name/value pairs (NOSQL), relations (SQL), overlaps (LMNL).

I am more skeptical about refactoring XML at this stage.

It’s always interesting to think about what could be done better, but refactoring a technology as widespread as XML is tough: the result needs to be either backward compatible or to provide a huge benefit that compensates for the incompatibilities.

Will we see a proposal that will prove me wrong during the conference?

SOPA, PIPA, Owark and long-term preservation

We tend to distinguish things that are eternal from those that are ephemeral, but how valid is our judgment?

Take Wikipedia, for instance. I used to take for granted that Wikipedia was here forever, up and running and ready to send me any version of any page in any language, and I wondered how useful it was for my Owark project to archive Wikipedia pages.

Enter SOPA and PIPA, and suddenly we realize that Wikipedia is threatened:

Wikipedia would be threatened in many ways. For example, in its current form, SOPA could require Wikipedia to actively monitor every site we link to, to ensure it doesn’t host infringing content. Any link to an infringing site could put us in jeopardy of being forced offline. The trust and openness that underlies the entire Wikipedia project would be threatened, and new, restrictive policies would make it harder for us to be open to new contributors.

If we can read the Odyssey today, it’s not because its original « editor » was able to preserve it, but because « Lots of Copies Keep Stuff Safe » and enough copies had been spread to ensure its transmission.

If Wikipedia (or any other website) is weaker than we used to think and can be closed down, we need to spread as many copies as possible, and this is really what Owark is about.

Now, is that enough?

I have mixed feelings when I read (tweeted by Karl Dubost and written by Sarah Lacy) that:

Long-term there’s no future in printed books.

I understand the point, and there may be no future in printed books in the medium term, but electronic books depend on cheap and ubiquitous electricity, and I wouldn’t bet that this will be the case in the long term!

We know that sooner or later we will have to dramatically reduce our power consumption, and we don’t know how smooth or brutal the transition will be.

If we are wise enough to manage a smooth transition, the industry might be able to adapt itself.

If not, there is a serious risk that many books, web pages, digital photos, songs, pieces of music and videos that rely on cheap energy will simply be lost forever!

Should we print web pages to archive them?

Happy New Year 2012

As usual, it’s been tough to select a picture…

This one has been taken in the small woodland that isolates our orchard from the chemical crops grown by our neighbors.

I like the contrast between the yellow of the ginkgo biloba and the brown and green of the other leaves.

It’s also a good illustration of the concept of « Retour à la Terre » (return to the earth or soil) that we promote elsewhere and if you look carefully, you’ll see a small woodlouse and a redworm which are major actors of soil ecology.

That being said, this year I have decided to show you the other pictures that I had selected!

Dear oreilly.com, cool URIs don’t change, please!

That shouldn't happen!

Dear oreilly.com, I hope you will not mind if I generate some buzz using your name, but it’s for owark, the Open Web Archive, a project that was launched at OSCON 2011 and that could make good use of some additional visibility.

Owark is currently implemented as a WordPress plugin that runs on three of my websites and replaces broken links with links to local archives.

As I was presenting owark today during a workshop at Paris Web 2011, I noticed that http://oreilly.com/catalog/9780596529321/index.html was one of these broken links and used this example to demonstrate the usefulness of this project.

Now that I have made my point, explaining to my attendees that even websites as geeky as oreilly.com could not be trusted to avoid linkrot, can I suggest that you add redirections from these old URLs to the new ones?

Of course, I know that you know that « cool URIs don’t change » and I have noticed that you are already redirecting http://oreilly.com/catalog/9780596529321/ (without the trailing « index.html ») to its new location but that’s not enough ;) …

Thanks,

Eric

PS: owark needs your help. If you’re interested, please leave a comment (if you log in using an OpenID you’ll get the bonus of not being moderated), drop me a mail at vdv@dyomedea.com or contact me on Identica, Twitter, Skype, Yahoo or MSN where my pseudo is « evlist ».

Owark WordPress plugin v0.1

I am proud to announce that I have installed the very first version of my owark WordPress plugin on this blog.

Note: the plugin is still at an early stage and I wouldn’t recommend installing it on your blog!

Standing for Open Web Archive, owark is a project that I’ll be presenting at OSCON 2011.

This first version is only a small piece in the bigger vision I’ll be presenting at OSCON, but I find it already pretty cool…

The plugin relies on the Broken Link Checker to harvest the links in the blog content and check for broken links, and on GNU Wget to perform the archiving itself.

I had been archiving links for a while with a bash script, and I already have some stuff in my archive database, so this plugin doesn’t start from scratch and can take advantage of this history.

These are just a couple of simple examples, but I am happy with the progress so far…

What’s next for XML?

It seems to be « what’s next for XML » time again!

For as long as I can remember working with XML, people have been discussing what’s next for XML…

Back in 1999 (yes, that was last century!), when XML was only one year old, there was SML, announced on XML-DEV and developed on SML-DEV, which forked into YAML, which can be seen as a superset of JSON.

SML inspired Simon St.Laurent, who published in 2000 the specification of a profile of XML called Common XML.

Refactoring suggestions blossomed again in late 2004 and I felt the need to write why I thought this was a bad idea.

The discussions went on and on, and in 2008 Norman Walsh explained why he thought it was a bad idea.

At some point, I lost interest in such discussions, until I was awakened by James Clark’s MicroXML suggestion in December last year.

For Norman Walsh, the trigger appears to have been the MicroXML poster at XML Prague, and he answered with his own suggestions, « XML v.next », as soon as he was back home from the conference.

These new proposals are cool and very tempting, but I don’t see what makes them different from all those we’ve seen in the past, and I still think such efforts are very unlikely to succeed.

I feel comforted in this judgment by the feeling I had at XML Prague that the XML community has lost faith in the idea of « XML as new lingua franca for the Web »: in this context, these efforts look like desperate attempts to reshape an old technology so that it looks sexy again!

Does that mean that I see no future for XML?

In his first post about MicroXML, James Clark distinguishes three kinds of work on XML (XML 2.0, XML.next and MicroXML).

XML.next (« something that is intended to be a more functional replacement for XML, but is not designed to be compatible ») seems to me both more interesting and more likely to happen.

James Clark also says:

XML.next is a big project, because it needs to tackle not just XML but the whole XML stack. It is not something that can be designed by a committee from nothing; there would need to be one or more solid implementations that could serve as a basis for standardization. Also given the lack of compatibility, the design will have to be really compelling to get traction. I have a lot of thoughts about this, but I will leave them for another post.

The last time he made this kind of promise, he was speaking of a better schema language and came up with TREX, an ancestor of RELAX NG.

I am really curious and excited to see what he has in mind for replacing XML!

XML Prague 2011: XML against the web

Coffee break

After a frenzied period between 2000 and 2008 during which I spoke at an impressive number of conferences, I temporarily retired and hadn’t been to a conference since XTech 2008.

For me, XML Prague 2011 was the first opportunity to meet face to face again with the community of XML core techies, and I was curious to find out how things had evolved during the past three years.

MURATA Makoto (EPUB3: Global Language and Comic)

Aside from all the technical food for thought, an image of the conference that I won’t forget is Murata Makoto expressing his grief for the victims of the earthquake in Japan in simple and sober terms.

The tag line of XML Prague 2011 was « XML as new lingua franca for the Web. Why did it never happen? ».

Michael Sperberg-McQueen (Closing keynote)

The actual content of the conference was close to this tag line, but it was better summarized by Michael Sperberg-McQueen during his closing keynote: « Let’s put XML in the browser, whether they want it there or not! »

Norman Walsh (HTML+XML: The W3C HTML/XML Task Force)

The tone was set by Norman Walsh during the very first session: the convergence between HTML and XML will not happen.

XML has been trying hard to be an application-neutral format for the web that could be used both for documents and for data. It is fair to say that it has failed to reach this goal and that the preferred formats on the web are HTML for documents and JSON for data.

That doesn’t seem to bother the XML Prague attendees much, as they are markup language addicts anyway: if the « mass of web developers » does not care about XML, that’s their problem. The benefits of using XML are well known, and that just means that we have to develop the XML tools we need on the server as well as in the browser.

Following this line, many sessions were about developing XML support on the browser and bridging the gaps between XML and HTML/JSON:

  • Client-side XML Schema validation by Henry S. Thompson and Aleksejs Goremikins
  • JSON for XForms by Alain Couthures
  • XSLT on the browser by Michael Kay
  • Efficient XML Processing in Browsers by Alex Milowski
  • XQuery in the Browser reloaded by Peter Fischer

By contrast, server-side tools were less represented, maybe because the domain had been better covered in the past:

  • A JSON Facade on MarkLogic Server by Jason Hunter
  • CXAN: a case-study for Servlex, an XML web framework by Florent Georges
  • Akara – Spicy Bean Fritters and XML Data Services by Uche Ogbuji

Of course, standard updates were also on the program:

  • HTML+XML: The W3C HTML/XML Task Force (already mentioned) by Norman Walsh
  • Standards update: XSLT 3.0 by Michael Kay
  • Standards update: XML, XQuery, XML Processing Profiles, DSDL by Liam Quin, Henry S. Thompson, Jirka Kosek

We also had talks about XML applications:

  • Configuring Network Devices with NETCONF and YANG by Ladislav Lhotka
  • Advanced XML development – XML Projects by George Bina
  • EPUB3: Global Language and Comic by Murata Makoto
  • EPUB: Chapter and Verse by Tony Graham
  • DITA NG – A Relax NG implementation of DITA by George Bina

Without forgetting a couple of implementation considerations:

  • Translating SPARQL and SQL to XQuery by Martin Kaufmann
  • Declarative XQuery Rewrites for Profit or Pleasure by John Snelson

And the traditional and always impressive closing keynote by Michael Sperberg-McQueen.

My own presentation, « XQuery injection », was quite atypical, and it took all the talent of Michael Sperberg-McQueen to kindly relate it to « XML on the web » by noticing that security would have to be taken more seriously to make that happen.

One of the things that had impressed me during the XTech conferences was the shift in presentation styles, with most speakers moving away from traditional PowerPoint presentations stuffed with heavy bullet points to lighter and better-illustrated shows.

John Snelson (Declarative XQuery Rewrites for Profit or Pleasure)

I had expected the move to continue and was surprised to see that it doesn’t seem to have caught on with XML Prague presenters, who continued with traditional bullet points, with only a couple of exceptions (John Snelson being a notable one).

I had crafted my presentation in what I thought would be a common style. Using Slidy, I had created no fewer than 35 short pages to present in 25 minutes. Each page had a different high-resolution picture as a background and contained only a few words.

The comments were generally good, even though some pictures chosen to represent injections seem to have hurt the feelings of some attendees.

Since my presentation was just standard HTML, I was brave enough to use the shared computer. Unfortunately, the presentation loads 74 megs of background pictures, and that was a bit too much for the shared computer, which took several seconds to change pages (note to self: next time, use your own laptop)!

The twitter wall (and Norman Walsh)

Another interesting feature of this conference was the « twitter wall » projected in the room using a second video projector.

This wall proved to be very handy for communicating during the sessions, and it can be seen as a more modern incarnation of the IRC channels used at earlier conferences.

Unfortunately, Twitter doesn’t allow searching its archives, and as I write these words I can no longer go back and read the tweets from the first day of the conference.

Looking back at the conference, I have mixed feelings about this gap between the XML and web developer communities, which now seems to be widely accepted on both sides.

The dream that XML could be accepted by the web community at large was a nice vision, and we should not forget that XML was designed to be « SGML on the web ».

Web developers have always been reluctant to accept the perceived additional complexity of XHTML; the gap has been there from the beginning, and after XML missed the Web 2.0 train it was too late to close it.

XML on the web will stay a niche used by a minority, but the creativity and dynamism of the community shown in Prague are inspiring and encouraging: there is still room for a lot of innovation, and XML is more than ever the technology of choice to power web applications.

All the pictures