XML Prague 2011: XML against the web

Coffee break

After a frenzied period between 2000 and 2008 during which I spoke at an impressive number of conferences, I had temporarily retired and hadn't attended a conference since XTech 2008.

For me, XML Prague 2011 was the first opportunity to meet the community of XML core techies face to face again, and I was curious to find out how it had evolved over the past three years.

MURATA Makoto (EPUB3: Global Language and Comic)

Aside from all the technical food for thought, an image of the conference that I won't forget is Murata Makoto expressing his grief for the victims of the earthquake in Japan in simple and sober terms.

The tag line of XML Prague 2011 was “XML as new lingua franca for the Web. Why did it never happen?”.

Michael Sperberg-McQueen (Closing keynote)

The actual content of the conference stayed close to this tag line, but it was better summarized by Michael Sperberg-McQueen during his closing keynote: “Let’s put XML in the browser, whether they want it there or not!”

Norman Walsh (HTML+XML: The W3C HTML/XML Task Force)

The tone was set by Norman Walsh during the very first session: the convergence between HTML and XML will not happen.

XML has been trying hard to be an application neutral format for the web that could be used both for documents and data. It is fair to say that it has failed to reach this goal and that the preferred formats on the web are HTML for documents and JSON for data.

That doesn’t seem to bother the XML Prague attendees much; they are markup language addicts anyway: if the “mass of web developers” do not care about XML, that’s their problem. The benefits of using XML are well known, and that just means that we have to develop the XML tools we need on the server as well as in the browser.

Following this line, many sessions were about developing XML support in the browser and bridging the gap between XML and HTML/JSON:

  • Client-side XML Schema validation by Henry S. Thompson and Aleksejs Goremikins
  • JSON for XForms by Alain Couthures
  • XSLT on the browser by Michael Kay
  • Efficient XML Processing in Browsers by Alex Milowski
  • XQuery in the Browser reloaded by Peter Fischer

By contrast, server-side tools were less represented, maybe because the domain has been better covered in the past:

  • A JSON Facade on MarkLogic Server by Jason Hunter
  • CXAN: a case-study for Servlex, an XML web framework by Florent Georges
  • Akara – Spicy Bean Fritters and XML Data Services by Uche Ogbuji

Of course, standard updates were also on the program:

  • HTML+XML: The W3C HTML/XML Task Force (already mentioned) by Norman Walsh
  • Standards update: XSLT 3.0 by Michael Kay
  • Standards update: XML, XQuery, XML Processing Profiles, DSDL by Liam Quin, Henry S. Thompson, Jirka Kosek

We also had talks about XML applications:

  • Configuring Network Devices with NETCONF and YANG by Ladislav Lhotka
  • Advanced XML development – XML Projects by George Bina
  • EPUB3: Global Language and Comic by Murata Makoto
  • EPUB: Chapter and Verse by Tony Graham
  • DITA NG – A Relax NG implementation of DITA by George Bina

Not forgetting a couple of talks on implementation considerations:

  • Translating SPARQL and SQL to XQuery by Martin Kaufmann
  • Declarative XQuery Rewrites for Profit or Pleasure by John Snelson

And the traditional and always impressive closing keynote by Michael Sperberg-McQueen.

My own presentation, “XQuery injection”, was quite atypical in this programme, and it took all the talent of Michael Sperberg-McQueen to kindly relate it to “XML on the web” by noting that security would have to be taken more seriously for that to happen.

One of the things that had impressed me at the XTech conferences was the shift in presentation styles, with most speakers moving away from traditional PowerPoint decks stuffed with bullet points towards lighter and better illustrated shows.

John Snelson (Declarative XQuery Rewrites for Profit or Pleasure)

I had expected this move to continue and was surprised to see that it doesn’t seem to have reached the XML Prague presenters, who, with John Snelson as a very notable exception, continued to use traditional bullet points.

I had designed my presentation in what I thought had become the common style. Using Slidy, I had created no less than 35 short pages to present in 25 minutes. Each page had a different high-resolution picture as a background and contained only a few words.

The comments were generally positive, even though some of the pictures chosen to represent injections seem to have hurt the feelings of a few attendees.

Since my presentation was just standard HTML, I had thought it safe to use the shared computer provided by the organizers. Unfortunately, the presentation loads 74 MB of background pictures, which was a little too much for that machine, and it took several seconds to change pages (note to self: next time, use your own laptop)!

The twitter wall (and Norman Walsh)

Another interesting feature of this conference was the “twitter wall” that was projected in the room using a second video projector.

This wall proved very handy for communicating during the sessions, and it can be seen as a more modern incarnation of the IRC channels used at earlier conferences.

Unfortunately, Twitter doesn’t allow searching its archives, and as I write these words I can no longer go back and read the tweets from the first day of the conference.

Looking back at the conference, I have mixed feelings about this gap between the XML and web developer communities, which now seems to be widely accepted on both sides.

The dream that XML could be accepted by the web community at large was a nice vision, and we should not forget that XML was designed to be “SGML on the web”.

Web developers have always been reluctant to accept the additional complexity, real or perceived, of XHTML. The gap has been there from the beginning, and after XML missed the Web 2.0 train it was too late to close it.

XML on the web will remain a niche and will be used by a minority, but the creativity and dynamism the community showed in Prague are inspiring and encouraging: there is still room for a lot of innovation, and XML is more than ever a technology of choice to power web applications.


HTML 5 turns documents into applications

 See also the French version of this article on XMLfr.

HTML 5 is not just HTML 4 + 1

This announcement has already been widely commented on, and I won’t go back over the details of the differences between HTML 4.01 and HTML 5, which are described in one of the documents published with the Working Draft. What I find unfortunate is that this document, and much of the commentary about HTML 5, focuses on the detail of the syntactical differences between these versions rather than on the more fundamental differences.

These differences are clearly visible as soon as you read the introduction:

The World Wide Web’s markup language has always been HTML. HTML was primarily designed as a language for semantically describing scientific documents, although its general design and adaptations over the years has enabled it to be used to describe a number of other types of documents.

The main area that has not been adequately addressed by HTML is a vague subject referred to as Web Applications. This specification attempts to rectify this, while at the same time updating the HTML specifications to address issues raised in the past few years.

This introduction does a good job of setting the context and expectations: the goal of HTML 5 is to move from documents to applications, and this is confirmed in many other places, such as the section titled “Relationship to XUL, Flash, Silverlight, and other proprietary UI languages”:

This specification is independent of the various proprietary UI languages that various vendors provide. As an open, vendor-neutral language, HTML provides for a solution to the same problems without the risk of vendor lock-in.

To understand this bold move, we need to set this back into context.

Nobody denies that HTML was created to represent documents, but its success comes from its neutrality: even if it is fair to say that Web 2.0 is the web as it was meant to be, the makers of HTML couldn’t have imagined everything that can be done in modern web applications. If these applications are possible in HTML, it is because HTML was designed to be neutral enough to describe yesterday’s, today’s and probably tomorrow’s applications.

If, on the contrary, HTML 4.01 had attempted to describe, in 1999, what a web application was, it is pretty obvious that this description would at best have had to be worked around, and that it might even have slowed down the development of Web 2.0.

This is the reason why I would level at HTML 5 the same kind of criticism I made of W3C XML Schema: over-specifying how a document is to be used risks blocking creativity and increasing the coupling between applications.

While many people agree that web applications should be designed as documents, HTML 5 appears to propose to move from documents to applications. This seems to me to be a major step… backward!

Flashback on HTML’s history

Another point that needs to be highlighted is the relationship between HTML 5 and XML in general, and XHTML in particular.

HTML 5 presents itself as the successor of both HTML 4.01 and XHTML 1.1 and as a competitor of XHTML 2.0.

To understand why the W3C is developing two competing standards, we need a brief reminder of the history of HTML.

HTML was originally designed as an SGML vocabulary and uses some of SGML’s features to reduce the verbosity of its documents. This is the case, for instance, of tags such as <img> or <link> that do not need to be closed in HTML.

XML was designed to be a simplification of SGML, and this simplification does not allow the features HTML uses to reduce its verbosity.
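
To make this concrete, here is a small illustration of my own (the file names and content are made up; this is a sketch, not an excerpt from either specification) showing the same fragment in HTML’s SGML-flavoured syntax and in an XML-compatible syntax:

  <!-- HTML (SGML-style minimization): void elements such as <img>
       are left unclosed, attribute values may be unquoted,
       and the </p> end tag is optional -->
  <p>An illustrated paragraph
  <img src=logo.png alt=Logo>

  <!-- XML/XHTML: every element is explicitly closed (or self-closed)
       and every attribute value is quoted -->
  <p>An illustrated paragraph
    <img src="logo.png" alt="Logo"/>
  </p>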

When XML was published, the W3C found itself with an SGML application in one hand (HTML) and a simplification of SGML in the other (XML), and these two recommendations were incompatible.

To make them compatible, they decided to revamp HTML so that it would conform to the XML recommendation while keeping exactly the same features. This led to XHTML 1.0, and then to XHTML 1.1, which is roughly the same thing cut into modules that can be used independently.

One of the weaknesses of HTML being its forms, the W3C also worked on XForms, a new generation of web forms, and started moving forward on a new version of XHTML with new features: XHTML 2.0, which is still a work in progress.

The approach looked so obvious that the W3C probably neglected to check that the community was still following its work. In the euphoria that followed the publication of XML 1.0, many people were convinced that the browser war was over; the interest in HTML, which had been partly fueled by this war, started to decline, and the W3C’s work in this domain didn’t seem to raise much interest compared with, say, XML Schema languages or Web Services.

It is also fair to say that the practical benefit of moving from HTML to XHTML wasn’t (and still isn’t) obvious to website developers, since the features are the same. Migrating a site from HTML to XHTML involves additional work that is only compensated by the joy of displaying a “W3C XHTML 1.x compliant” logo!

This is also the moment when Microsoft stopped all development on Internet Explorer and Netscape transferred its development to Mozilla.

The old actors of the browser war, well represented at the W3C, which had been one of their battlefields, gave way to new actors (Mozilla, Opera and Apple/Safari) that were younger and less willing to accept the heaviness of W3C procedures.

At the same time, the first Web 2.0 applications sparked a new wave of creativity among web developers, and all this happened outside the W3C. This is not necessarily a bad thing, since the mission of standards bodies such as the W3C is to standardize rather than innovate, but the W3C doesn’t appear to have correctly estimated the importance of these changes and seems to have lost contact with its users.

And when these users, led by Opera, Mozilla and Safari, decided that it was time to move HTML forward, rather than jump on the XHTML 2.0 wagon they created their own working group, the WHATWG, outside the W3C. This is where the first versions of HTML 5 were drafted, together with Web Forms 2.0, a sister specification designed to be an enhancement of HTML forms simpler than XForms.

Microsoft was still silent on the subject, and the W3C found itself editing a promising new specification, XHTML 2.0, that didn’t seem to attract much attention while, outside, a new specification claiming to be the true successor of HTML was being developed by the most promising outsiders in the browser market.

At XTech 2007, I had a chance to measure the depth of the gulf that separates the two communities by attending a debate between the two working groups.

Tim Berners-Lee must have found that this gulf was too deep when he decided to invite the WHATWG to continue its work within the W3C, in a working group created for this purpose and distinct from the XHTML 2.0 Working Group, which continues its work as if nothing had changed.

HTML 5 or XHTML 2.0?

So, the W3C has now two distinct and competing Working Groups.

Missions are very close

The XHTML 2.0 Working Group develops an extensible vocabulary based on XML:

The mission of the XHTML2 Working Group is to fulfill the promise of XML for applying XHTML to a wide variety of platforms with proper attention paid to internationalization, accessibility, device-independence, usability and document structuring. The group will provide an essential piece for supporting rich Web content that combines XHTML with other W3C work on areas such as math, scalable vector graphics, synchronized multimedia, and forms, in cooperation with other Working Groups.

The HTML Working Group focuses on the continuity with previous HTML versions:

The mission of the HTML Working Group, part of the HTML Activity, is to continue the evolution of HTML (including classic HTML and XML syntaxes).

The conciseness of this sentence doesn’t imply that the HTML Working Group isn’t concerned about extensibility and cross-platform support, since the list of deliverables says “there is a single specification deliverable for the HTML Working Group, the HTML specification, a platform-neutral and device-independent design” and later on “the HTML WG is encouraged to provide a mechanism to permit independently developed vocabularies such as Internationalization Tag Set (ITS), Ruby, and RDFa to be mixed into HTML documents”.

The policy is thus clearly, at the risk of seeing a standards war develop, to produce two specifications and let users choose.

XHTML 5 is a weak alibi

We find this policy within the HTML 5 specification itself, which proposes a choice between two syntaxes:

This specification defines an abstract language for describing documents and applications, and some APIs for interacting with in-memory representations of resources that use this language.

The in-memory representation is known as “DOM5 HTML”, or “the DOM” for short.

There are various concrete syntaxes that can be used to transmit resources that use this abstract language, two of which are defined in this specification.

The first such concrete syntax is “HTML5”. This is the format recommended for most authors. It is compatible with all legacy Web browsers. If a document is transmitted with the MIME type text/html, then it will be processed as an “HTML5” document by Web browsers.

The second concrete syntax uses XML, and is known as “XHTML5”. When a document is transmitted with an XML MIME type, such as application/xhtml+xml, then it is processed by an XML processor by Web browsers, and treated as an “XHTML5” document. Authors are reminded that the processing for XML and HTML differs; in particular, even minor syntax errors will prevent an XML document from being rendered fully, whereas they would be ignored in the “HTML5” syntax.

This section, which fortunately is non-normative, appears to rule out a browser accepting any HTML document other than HTML5, or any XHTML other than XHTML5!

Furthermore, with such a notice, I wonder who would want to choose XHTML 5 over HTML5…

This notice relies on a frequent misunderstanding of the XML recommendation. It is often said that XML parsing must stop after the first error, but the recommendation is much more flexible than that and distinguishes two types of errors:

  • An error is “a violation of the rules of this specification; results are undefined. Unless otherwise specified, failure to observe a prescription of this specification indicated by one of the keywords MUST, REQUIRED, MUST NOT, SHALL and SHALL NOT is an error. Conforming software MAY detect and report an error and MAY recover from it.”
  • A fatal error is “an error which a conforming XML processor MUST detect and report to the application. After encountering a fatal error, the processor MAY continue processing the data to search for further errors and MAY report such errors to the application. In order to support correction of errors, the processor MAY make unprocessed data from the document (with intermingled character data and markup) available to the application. Once a fatal error is detected, however, the processor MUST NOT continue normal processing (i.e., it MUST NOT continue to pass character data and information about the document’s logical structure to the application in the normal way).”

We see that, on the contrary, the XML recommendation allows an XML processor to recover from simple errors.

One may argue that some of what XML considers fatal errors would be seen by users as simple errors; this would be the case, for instance, of an <img> tag that isn’t closed. But even for fatal errors, the recommendation doesn’t stipulate that the browser should not display the document. It does require that the parser report the error to the application, but it doesn’t say anything about how the browser should react. Similarly, the recommendation imposes that normal processing stop, because the parser would be unable to reliably report the structure of the document, but it doesn’t say that the browser shouldn’t switch to a recovery mode where it could try to correct the error.
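
As an illustration of these two categories (my own example, not taken from the recommendation; the element names are made up), the first document below is well-formed but violates its DTD, which is a reportable, recoverable error for a validating processor, while the second contains an unclosed <img> start tag, a well-formedness violation and therefore a fatal error:

  <!-- Validity error (non-fatal): <b> is not allowed by the DTD,
       so a validating processor MUST report it but MAY keep going -->
  <!DOCTYPE note [ <!ELEMENT note (#PCDATA)> ]>
  <note>Some <b>bold</b> text</note>

  <!-- Fatal error: the <img> start tag is never closed, so normal
       processing MUST stop once the error is detected -->
  <note>A picture: <img src="photo.jpg" alt="A photo"></note>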

In fact, if browsers are so strict when they display XML documents, it isn’t in order to conform to the XML recommendation but because, at the time they implemented their XML support, there was a consensus that they should be strict.

At that time, everyone had in mind the consequences of the browser war, which was one of the reasons why browsers accepted pretty much anything that pretended to be HTML. While this can be considered a good thing in some cases, it also means implementing a lot of undocumented algorithms, and that leads to major interoperability issues.

The decision to be strict when displaying XML documents came as a new era’s good resolution, and nobody seemed to dissent at the time.

If this position needs to be revisited, it would be ridiculous to throw XML away with it since, as we have seen, this strictness isn’t imposed by the recommendation.

The whole way in which the two HTML5 syntaxes are presented is a clear indication that the XML syntax, which was not mentioned in the first HTML5 drafts, has been added as a compromise so that the W3C doesn’t look as if it rejected XML, but that the real intention is to maintain and promote a non-XML syntax.

HTML 5 gets rid of its SGML roots

Not only does HTML 5 reject XML, but it also abandons any kind of compatibility with SGML, stating clearly that “while the HTML form of HTML5 bears a close resemblance to SGML and XML, it is a separate language with its own parsing rules”.

This sentence is symptomatic of the overall attitude of the specification, which claims to build on the experience of the web while ignoring the experience of markup languages, once again taking the risk of freezing the web in its current state.

The attitude of the XHTML Working Group is more balanced. Of course, XHTML 2.0 is about building on the most recent web developments, but it does so without discarding the experience acquired in developing XML and SGML vocabularies.

Radically different technical approaches

Without entering into a detailed comparison, two points are worth mentioning.

XHTML 2.0 is more extensible

Both specifications acknowledge the need to take into account requirements that have appeared since HTML was created and that are not correctly supported, but their methods for doing so are totally different.

HTML 5 has adopted a method that looks simple: if a new need is considered important enough, a new element is added. Since many pages contain articles, a new <article> element is added. And since most pages have navigation bars, a new <nav> element is added…
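
For illustration, this is the kind of markup that approach produces (a sketch of my own using the new HTML 5 elements; the page content is invented):

  <nav>
    <ul>
      <li><a href="/">Home</a></li>
      <li><a href="/archives">Archives</a></li>
    </ul>
  </nav>
  <article>
    <h1>HTML 5 turns documents into applications</h1>
    <p>Every new need gets its own dedicated element…</p>
  </article>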

We have seen, with the big vocabularies used in document applications, the limits of this approach: it leads to an explosion in the number of elements, and simplicity turns into complexity. It becomes difficult to pick the right element, and since these elements are specialized, they never exactly meet your needs.

Using this approach with HTML is more or less a way to transform it into a kind of DocBook clone for the web in the long term.

XHTML 2.0 has taken the opposite approach. The idea is, on the contrary, to start with a clean-up and remove from XHTML any element that isn’t absolutely necessary.

It then relies on current practice: how do we represent an article or a navigation bar today? The most common approach is to use a generic element, often a <div>, and hijack the class attribute to apply a CSS style or a JavaScript behaviour.

The downside is that the values of the class attribute aren’t standardised, and that the class attribute ends up conveying information about the meaning of an element rather than defining the way it should be displayed. This kind of hijacking is pretty common, since it is also the foundation of microformats.

To avoid this hijacking while keeping the flexibility of the approach, XHTML 2.0 proposes to add a role attribute that defines the role of XHTML elements. This attribute can take a set of predefined values as well as ad hoc values differentiated by their namespaces.

This method is a way to introduce the same kind of features that will be added to HTML 5 without adding new elements. It is more flexible, since anyone can create new values in new namespaces. It also gives microformats something more solid to build upon than the class attribute, which can then be used again to define how elements should be presented.
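
A sketch of the two styles side by side (the class names, role values and namespace URI are illustrative only; “navigation” is one of the predefined role values, but the specifications should be checked for the exact vocabulary):

  <!-- Common practice today: the class attribute is hijacked to carry meaning -->
  <div class="navigation">…</div>

  <!-- XHTML 2.0 style: the role attribute carries the meaning, possibly with
       namespaced ad hoc values, and class remains free for presentation -->
  <div role="navigation" class="sidebar">…</div>
  <div xmlns:ex="http://example.org/roles" role="ex:product-review">…</div>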

Documents versus applications

Another important point that differentiates these two specifications is their balance between data on one side and applications, or processing, on the other.

XHTML 2.0 is built upon the XML stack:

  • The lower level is syntactical and consists of the XML and namespaces recommendations.
  • On top of this layer, the XML infoset defines a data model independent of any kind of treatment.
  • APIs, specific languages (XPath, XQuery, …) and schema languages are built on top of this data model.

It took a few years to build this architecture, and things haven’t always been that clear and simple, but its big benefit is that it separates data from processing and is just right for loose coupling between applications.

We’ve seen that HTML 5 has cut all its links to XML and SGML, which means that it doesn’t rely on this architecture. On the contrary, it mixes everything (syntax, data model and API, the DOM) in a single specification.

This is because, as we’ve already seen, HTML 5 is a vocabulary to develop web applications rather than a vocabulary to write documents.

The difference seems important to me not only in terms of architecture but also in terms of sustainability. Everyone agrees that XML is one of the best formats for long-term preservation of documents. What is true of documents is probably not true of applications, and I don’t think an HTML 5 application with a good proportion of JavaScript will be as sustainable as an XHTML 2.0 document.

The architecture on which XHTML 2.0 is built doesn’t prevent people from developing applications, but it dissociates these applications more clearly from the content.

Furthermore, the XHTML 2.0 effort is also trying to develop and promote declarative alternatives, such as XForms, for defining web applications; these should be a better fit for documents than JavaScript.
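
As a rough illustration of what “declarative” means here, this is a minimal XForms sketch of my own (the instance data, field names and submission target are invented; the xf prefix is assumed to be declared on the host document’s root element) where the form’s data model and behaviour are described as markup rather than scripted:

  <!-- In the document head: the data model and where to submit it -->
  <xf:model xmlns:xf="http://www.w3.org/2002/xforms">
    <xf:instance>
      <comment xmlns=""><author/><text/></comment>
    </xf:instance>
    <xf:submission id="save" method="post" action="/comments"/>
  </xf:model>

  <!-- In the document body: controls bound to the instance, no JavaScript -->
  <xf:input ref="author"><xf:label>Your name</xf:label></xf:input>
  <xf:textarea ref="text"><xf:label>Comment</xf:label></xf:textarea>
  <xf:submit submission="save"><xf:label>Post</xf:label></xf:submit>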

Will the best specification win?

For all these reasons, HTML 5 looks to me as a big step backward and XHTML 2.0 seems to be a much better alternative.

Does that mean that XHTML 2.0 will be the winner, or, on the contrary, does the fact that HTML 5 is written by those who develop web browsers mean that XHTML 2.0 is doomed?

XHTML 2.0 has a serious handicap, but the battle isn’t lost yet. The HTML Working Group doesn’t expect HTML 5 to become a recommendation before Q3 2010, and before that date anything can happen.

It is up to us, the users, to vote with our feet and our pens, and to start by boycotting the HTML 5 features that are already implemented in some browsers.

And in the short term, checking that a page is valid XHTML 1.x is a good way to make sure that it doesn’t contain HTML 5 features!
