XML, the eX Markup Language?

Note: this article is a copy of the paper that I presented at XML Prague 2012.

Abstract

Revisiting the question that was the tag line of XML Prague last year: “XML as new lingua franca for the Web. Why did it never happen?”, Eric tries to answer other questions such as: “where is XML going?” or “is XML declining, becoming an eX Markup Language?”.

XML as new lingua franca for the Web. Why did it never happen?

This was the tagline of XML Prague 2011, but the question wasn’t really answered last year, and I’ll start this talk by giving my view on it.

Flashback

February 1998 is a looong time ago, a date from another century, and for those of you who were not born yet or don’t remember, here is a small summary of what happened in February 1998:

[Embedded excerpt: Wikipedia’s summary of the events of February 1998]

While the Iraq disarmament crisis was raging, the World Wide Web Consortium waited until the third day of the Winter Olympics held in Nagano to make the following announcement:

Advancing its mission to lead the Web to its full potential, the World Wide Web Consortium (W3C) today announced the release of the XML 1.0 specification as a W3C Recommendation. XML 1.0 is the W3C’s first Recommendation for the Extensible Markup Language, a system for defining, validating, and sharing document formats on the Web
W3C Press Release (February 1998)

People curious enough to click on the second link of the announcement could easily double-check that, beyond the marketing bias, XML was something to be used over the Internet:

The design goals for XML are:

  1. XML shall be straightforwardly usable over the Internet.
  2. XML shall support a wide variety of applications.
  3. XML shall be compatible with SGML.
  4. It shall be easy to write programs which process XML documents.
  5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
  6. XML documents should be human-legible and reasonably clear.
  7. The XML design should be prepared quickly.
  8. The design of XML shall be formal and concise.
  9. XML documents shall be easy to create.
  10. Terseness in XML markup is of minimal importance.
W3C Recommendation (February 1998)

And the point was reinforced by the man who had led the “Web SGML” initiative and is often referred to as the father of XML:

XML arose from the recognition that key components of the original web infrastructure — HTML tagging, simple hypertext linking, and hardcoded presentation — would not scale up to meet the future needs of the web. This awareness started with people like me who were involved in industrial-strength electronic publishing before the web came into existence.
Jon Bosak

This has often been summarized by saying that XML is about “putting SGML on the Web”.

Among the design goals, the second one (“XML shall support a wide variety of applications”) has been especially successful, and by the end of 1998, Liora Alschuler reported that the motivations of the different players pushing XML forward were very diverse:

The big-gun database vendors, IBM and Oracle, see XML as a pathway into and out of their data management tools. The big-gun browser vendors, Netscape and Microsoft, see XML as the e-commerce everywhere technology. The big-gun book and document publishers, for all media, are seeing a new influx of tools, integrators, and interest but the direction XML publishing will take is less well-defined and more contingent on linking and style specs still in the hands of the W3C.
Liora Alschuler for XML.com (December 1998)

One thing these “big-gun” players, each pushing XML in a different direction, did achieve was to build an incredible hype that rapidly covered everything; by 2001 the situation had become hardly bearable:

Stop the XML hype, I want to get off

As editor of XML.com, I welcome the massive success XML has had. But things prized by the XML community — openness and interoperability — are getting swallowed up in a blaze of marketing hype. Is this the price of success, or something we can avoid?
Edd Dumbill (March 2001)

Marketers behind the hype being who they were, the image of XML that they promoted was so shiny that the XML gurus didn’t recognize their own technology and tried to fight against the hype:

I’ve spent years learning XML / I like XML / This is why www.XmlSuck.com is here
PaulT (January 2001)

The attraction was high and people rushed to participate in the W3C working groups:

Working Group size – so many people means it is difficult to gain consensus, or even know everyone’s face. Conference calls are difficult.
Mark Nottingham, about the SOAP W3C WG (May 2000)

Huge working groups with people pushing in different directions are not the best recipe for publishing high-quality standards, and even though XML itself was already baked, the perception of XML depends on the full “stack”:

This is a huge responsibility for the Schema Working Group since it means that the defects of W3C XML Schema will be perceived by most as defects of XML.
Eric van der Vlist on xml-dev (April 2001)

The hype was so huge that XML geeks rapidly thought that they had won the war and that XML was everywhere:

XML is now as important for the Web as HTML was to the foundation of the Web. XML is everywhere.
connet.us (February 2001)

Why this hype? My guess is that the IT industry had such a desperate need for a data interchange format that almost any candidate could have been adopted at that time, and that XML happened to be the one that crossed the radar screen at the right moment:

When the wind is strong enough, even flatirons can fly.
Anonymous (February 2012)

The W3C now had to maintain:

  • XML, an SGML subset
  • HTML, an SGML application that did not match the XML subset

Technically speaking, the thing to do was to refactor HTML to meet the XML requirements. Given the perceived success of XML, it seemed obvious that everyone would jump on the XML bandwagon and be eager to adopt XHTML.

Unfortunately from a web developer perspective the benefits of XHTML 1.0 were not that obvious:

The problem with XHTML is:
a) it’s different enough from HTML to create new compatibility problems.
b) it’s not different enough from HTML to bring significant advantages.
Eric van der Vlist on XHTML-DEV (May 2000)

It is fair to say that Microsoft had been promoting XML since the beginning:

XML, XML, Everywhere

There’s no avoiding XML in the .NET world. XML isn’t just used in Web applications, it’s at the heart of the way data is stored, manipulated, and exchanged in .NET systems.
Rob Macdonald for MSDN (February 2001)

However, despite this strong commitment to XML, Microsoft had frozen new development of Internet Explorer. The browser was never updated to support the XHTML media type (application/xhtml+xml), meaning that the few web sites using XHTML had to serve their pages as HTML!

By 2001, the landscape was set:

  • XML had become a dominant buzzword, giving the false impression that it had been widely adopted
  • Under the hood, many developers, even within the XML community, were deeply upset by this hype
  • Serving XHTML web pages as such was not an option for most web sites

The landscape was set, but the hype was still high and XML was still gaining traction as a data interchange format.

In the meantime, another hype was growing…

Wikipedia has tracked the origin of the term Web 2.0 back to 1999:

The Web we know now, which loads into a browser window in essentially static screenfuls, is only an embryo of the Web to come. …/… Ironically, the defining trait of Web 2.0 will be that it won’t have any visible characteristics at all. The Web will be identified only by its underlying DNA structure — TCP/IP (the protocol that controls how files are transported across the Internet); HTTP (the protocol that rules the communication between computers on the Web), and URLs (a method for identifying files).

…/…

The Web will be understood not as screenfuls of text and graphics but as a transport mechanism, the ether through which interactivity happens.

Darcy DiNucci (1999)

The term became widely known with the first Web 2.0 conferences in 2003 and 2004 and XML was an important piece of the Web 2.0 puzzle through Ajax (Asynchronous JavaScript and XML), coined and defined by Jesse James Garrett in 2005 as:

Ajax isn’t a technology. It’s really several technologies, each flourishing in its own right, coming together in powerful new ways. Ajax incorporates:

  • standards-based presentation using XHTML and CSS;
  • dynamic display and interaction using the Document Object Model;
  • data interchange and manipulation using XML and XSLT;
  • asynchronous data retrieval using XMLHttpRequest;
  • and JavaScript binding everything together.

Jesse James Garrett (February 2005)

This definition shows how, back in 2005, some of us still thought that XML could dominate the Web and be used both to exchange documents (in XHTML) and data.

Unfortunately, this vision, defended by the W3C, was rapidly torpedoed by Ian Hickson and Douglas Crockford.

Founded in 1994 for that purpose, the W3C had been the place where HTML was standardized. Among other things, it had been the place where the antagonists of the first browser war could meet and discuss on neutral ground.

In 2004, Netscape had disappeared, Microsoft had frozen the development of their browser, and browser innovation had moved into the hands of new players: Mozilla, Apple/Safari and Opera, which was starting to gain traction.

Complaining that the W3C did not meet their requirements and that HTML urgently needed to be updated to meet the requirements of what would soon be known as Web 2.0, these new players decided to fork the development of HTML:

Software developers are increasingly using the Internet as a software platform, with Web browsers serving as front ends for server-based services. Existing W3C technologies — including HTML, CSS and the DOM — are used, together with other technologies such as JavaScript, to build user interfaces for these Web-based applications.

However, the aforementioned technologies were not developed with Web Applications in mind, and these systems often have to rely on poorly documented behaviors. Furthermore, the next generation of Web Applications will add new requirements to the development environment — requirements these technologies are not prepared to fulfill alone. The new technologies being developed by the W3C and IETF can contribute to Web Applications, but these are often designed to address other needs and only consider Web Applications in a peripheral way.

The Web Hypertext Applications Technology working group therefore intends to address the need for one coherent development environment for Web Applications. To this end, the working group will create technical specifications that are intended for implementation in mass-market Web browsers, in particular Safari, Mozilla, and Opera.
WHATWG (June 2004)

The W3C faced a simple choice: either push XHTML recommendations that would never be implemented in any browser, or ditch XHTML and ask the WHATWG to come back and continue their work toward HTML5 as a W3C Working Group. The latter option was eventually chosen and HTML work resumed within the W3C in 2007.

JSON had been around since 2001. It took a few years of Douglas Crockford’s energy to popularize this JavaScript subset, but around 2005 JSON rapidly became a technology of choice as a “Fat-Free Alternative to XML” in Ajax applications.
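To make the “fat-free” argument concrete, here is the kind of comparison that was made at the time. Both snippets below carry the same invented record, first as XML, then as JSON; the names are purely illustrative:

    <person>
      <name>Douglas Crockford</name>
      <role>editor</role>
      <languages>
        <language>JavaScript</language>
        <language>JSON</language>
      </languages>
    </person>

    {
      "name": "Douglas Crockford",
      "role": "editor",
      "languages": ["JavaScript", "JSON"]
    }

For a browser-side script, the JSON version can be turned into a native JavaScript object directly, while the XML version has to be walked through DOM APIs, which explains a good part of JSON’s appeal in Ajax applications.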

There is no direct link between HTML5 and JSON but the reaction against XML, its hype and its perceived complexity is a strong motivation in both cases.

Why?

A number of reasons can be found for this failure:

  • Bad timing between the XML and HTML specifications (see Adam Retter’s presentation at XML Amsterdam 2011).
  • Lack of quality of some XML recommendations (XML Namespaces, XML Schema, …).
  • Lack of pedagogy to explain why XML is the nicest technology on earth.
  • Dumbness of Web developers who do not use XML.

There is some truth in all these explanations, but the main reason is that from the beginning we (the XML crowd) have been arrogant and overconfident, and have made a significant design error.

When we read this quote:

XML arose from the recognition that key components of the original web infrastructure — HTML tagging, simple hypertext linking, and hardcoded presentation — would not scale up to meet the future needs of the web. This awareness started with people like me who were involved in industrial-strength electronic publishing before the web came into existence.
Jon Bosak

We all understand what Jon Bosak meant and we probably all agree that HTML is limited and that something more extensible makes our lives easier, but we must also admit that we have been proven wrong and that HTML has been enough to scale up to the amazing applications we see today.

Of course, the timing was wrong and everything would have been easier if Tim Berners-Lee had come up with a first version of HTML that was a well-formed XML document, but on the other hand, the web had to exist before we could put SGML on the web and there had to be a prior technology.

In 1998 it was already clear that HTML was widespread, and the decision to create XML as an SGML subset that would be incompatible with HTML was a bad one:

  • Technically speaking, because it meant that millions of existing pages would not be well-formed XML (“the first Google index in 1998 already had 26 million pages“).
  • Tactically speaking, because it could be understood as “what you’ve done so far was crappy, now you must do what we tell you to do”.

To avoid this deadly risk, the first design goal of XML should have been that existing valid HTML documents were well-formed XML documents. The result might have been a more complex format and specification, but the risk of creating a gap between the XML and HTML communities would have been minimized.
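As an illustration (the markup is invented, but representative), the following fragment is perfectly valid HTML 4 and yet not well-formed XML: the attribute value is unquoted, the <p> and <li> end tags are omitted (which HTML allows), and <br> has no end tag at all:

    <p align=center>Hello, Web<br>
    <ul>
      <li>first item
      <li>second item
    </ul>

A design goal stating that such documents must also be well-formed XML would have forced different trade-offs (optional end tags, unquoted attributes, empty elements), but it would have kept the installed base of HTML pages inside the XML world.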

Another reason for this failure is that XML is about extensibility. This is both its main strength and its main weakness: extensibility comes at a price, and XML is more complex than domain-specific languages (DSLs).

Remove the need for extensibility and XML will always lose against DSLs; we’ve seen a number of examples in the past (a sketch contrasting the two RELAX NG syntaxes follows the list):

  • RELAX NG compact syntax
  • JSON
  • HTML
  • N3
  • CSS
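As a reminder of what that looks like in practice, here are two equivalent RELAX NG patterns for a made-up “person” element, first in the full XML syntax, then in the compact syntax; the vocabulary itself is invented for the example:

    <element name="person"
             xmlns="http://relaxng.org/ns/structure/1.0"
             datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
      <element name="name"><text/></element>
      <element name="age"><data type="integer"/></element>
    </element>

    # The same pattern in the RELAX NG compact syntax
    element person {
      element name { text },
      element age { xsd:integer }
    }

When no extensibility is needed, the DSL wins on readability every time; the XML syntax only pays off when schemas are themselves generated, transformed or annotated with other vocabularies.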

Is it time to refactor XML? Converge or convert?

Hmmm… It’s time to address the questions asked this year by XML Prague!

We’ve failed to establish XML as the format to use on the web, but we’ve succeeded in creating a strong toolbox that is very effective for powering websites and exchanging information.

I don’t know if it’s to compensate for the ecosystems that we are destroying on our planet, but one of the current buzzwords among developers is “ecosystem”: dominant programming languages such as Java and JavaScript are becoming “ecosystems” that you can use to run a number of applications that may be written in other programming languages.

What we’ve built with XML during the past 14 years is a very strong ecosystem.

The XML ecosystem is based on an (almost) universal data model that can represent not only well-formed XML documents but also HTML5 documents and (with an impedance mismatch that may be reduced in future versions) JSON objects.

Note: Notable exceptions that cannot be represented by the XML data model include overlapping structures and graphs.
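As a purely hypothetical sketch of that impedance mismatch, here is a small JSON object and one of the many possible ways to spell it as XML; the element names and the type attributes are arbitrary choices, not a standard mapping:

    { "name": "XML", "age": 14, "tags": ["markup", "web"] }

    <object>
      <name type="string">XML</name>
      <age type="number">14</age>
      <tags type="array">
        <item type="string">markup</item>
        <item type="string">web</item>
      </tags>
    </object>

Nothing in the XML data model natively distinguishes a number from a string, or an array from repeated elements, which is why any mapping has to invent conventions such as these type attributes.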

On top of this data model, we have a unique toolbox that includes:

  • transformation and query languages
  • schema languages
  • processing (pipeline) languages
  • databases
  • web forms
  • APIs for traditional programming languages
  • signature and encryption standards
  • a text based serialization syntax
  • binary serialization syntaxes

We can truly say that what’s important in XML is not the syntax but that:

Angle Brackets Are a Way of Life
Planet XMLHack

Rather than fighting fights that we’ve already lost, we need to develop our ecosystem.

The number one priority is to make sure that our data model embraces the web that is taking shape (which means HTML5 and JSON) as efficiently as possible. Rather than converge or convert, we must embrace; the actual syntax is not that important after all!

To grow our ecosystem, we could also consider embracing more data models, such as graphs (RDF), name/value pairs (NOSQL), relations (SQL), overlaps (LMNL).

I am more skeptical about refactoring XML at this stage.

It’s always interesting to think about what could be done better, but refactoring a technology as widespread as XML is tough: the result needs to be either backward compatible or to provide a huge benefit that compensates for the incompatibilities.

Will we see a proposal that will prove me wrong during the conference?


7 Responses to XML, the eX Markup Language?

  1. I don’t see here IMHO the crucial difference between XML-as-document and XML-as-data-transfer-format. Whereas the first (where XML is truly the successor of SGML) is IMHO a success (see Docbook, TEI, and, if you want to programmatically process HTML, that as well), the latter (which was always pushing XML to do something it wasn’t designed to do) is pretty much a failure, and JSON, YAML et al. have already eaten its lunch.

    • Matěj,

      I see what you mean, but did DocBook, TEI and HTML need XML? They were already popular SGML vocabularies and I don’t think they required XML to work well. I would even argue that the huge refactoring that was required to rebuild (reinvent?) the XML stack has been a waste of time for them and that some of the components of this stack (such as XML Schema) are not very relevant for them.

      The promise of XML to be “application agnostic” and be useful both for document and for data oriented applications still makes a lot of sense to me.

      XML as a data transfer format can’t be considered a complete failure either, and XML is still used a lot in proprietary enterprise applications, but ironically it’s on the web that it has lost the battle against JSON.

      Eric

      • It is unpleasant how much you are right ;). There is a bit of an advantage in the very rich XML toolchain we have compared to poor old nsgmls, but yes, the advantage of XML for Docbook and TEI is an interesting question. Hmm.

        And yes, JSON ate a lot of lunch from XML when the same developer holds both ends of the communication (which happens a lot with AJAX apps; or at least where there is just one server, e.g., Bugzilla JSON-RPC) or when the format is rather simple so there are no doubts. Looking over the shoulder of my JBoss colleagues, I cannot imagine that a JSON-based format (without validation, and if you want validation for JSON, then I really don’t know why you don’t use XML) would be able to supply a multi-server, multi-client standard format for many highly competing vendors, at least on the level of J2EE et al.

        Maybe we just believed the XML propaganda too much. Maybe XML is just good for some things and is not a panacea for all the evils of the world (BTW, I do prefer XML over SGML even for Docbook/TEI/OSIS)? Maybe http://www.tbray.org/ongoing/When/200x/2006/12/21/JSON (to have it from the horse’s mouth, so to speak) is right and JSON and XML can live side by side, and both of them have enough raison d’être to survive.

  2. Pingback: Introducing χίμαιραλ (chimeral), the Chimera Language | Eric van der Vlist

  3. Ryan Sharp says:

    The problem with XML (and most W3C standards for that matter) is that they seem to be completely divorced from the realities of the common use cases. Ultimate flexibility and extensibility are NOT worth the complexity of XML and they never will be.

    Using XML to express flat or shallow data structures is frankly rather moronic. Yet that’s exactly how people are being encouraged to use it. It’s like using a hammer to slice a tomato.

    The W3C really do need to get this “One True Way” approach out of their heads but it seems the exact opposite is happening right now. You are mustering up plans and projections about how to ram XML down people’s throats from a new angle. Just let it die for heaven’s sake.

    • Ryan,

      Using XML to express flat or shallow data structures is frankly rather moronic.

      I agree that using the XML infoset to express flat or shallow data structures is kind of weird and that’s why I found the addition of maps in XDM 3.0 so useful. In that way, XDM is no longer XML…

      However, this won’t fly unless this is done in a smooth and coherent way (see Jeni Tennison’s keynote and χίμαιραλ).

  4. Peter Rushforth says:

    XML could directly embrace the key fact about the Web that makes it a Web: URIs and media types. If placed in the xml: namespace, it would be backwards compatible with all existing XML. If not used, it would hurt nothing. If provided/encouraged by the XML community, it would integrate XML with the Web in a way not yet seen, except perhaps in Atom, which is not for everyone and might be overkill for some applications. See http://www.w3.org/community/xmlhypermedia/ for details.
