Wireless

In the wild without your wireless phone? I dare you!

The first time I went out without a wireless phone I felt isolated, weak, out of reach and out of control: I was 100 meters from home but I could not be reached and could not immediately reach my relatives if anything happened. Anything could happen to them and anything could happen to me and we wouldn’t know.

This unpleasant feeling was soon replaced by another one: anything could happen, for the worse but also for the better, and I was fully in charge of myself again, able to think and look at the outside world without being interrupted.

These phones are wireless, but they keep you strongly wired to your network!

Why not try it?

Cut your wire: leave your wireless phone at home and make an adventure out of the most boring shopping trip…

String Formatting in XSLT 2.0

To migrate my photo albums from Gallery to WordPress, I ended up having to transform XML fragments such as:

<row>
  <field name="g_id">22909</field>
  <field name="g_description" xsi:nil="true" />
  <field name="g_keywords" xsi:nil="true" />
  <field name="g_summary" xsi:nil="true" />
  <field name="g_title">dsc00001</field>
  <field name="g_pathComponent">aaa.jpg</field>
  <field name="g_pathComponent">011120-Forum-XML-2001</field>
  <field name="g_orderWeight">1000</field>
</row>

Into SQL statements such as:

update
  wp_ngg_gallery g,
  wp_ngg_pictures p
set
  p.image_slug = "dsc00001",
  p.description = "dsc00001",
  p.alttext = "dsc00001",
  p.sortorder = 1
where
  p.galleryid = g.gid
  and g.title = "011120-Forum-XML-2001"
  and p.filename = "aaa.jpg";

That’s pretty obvious to do, but the usual technique is verbose, boring and unreadable, giving something more or less like this:

<xsl:template match="row">
 <xsl:variable name="title" select="if (field[@name='g_title'] != '') then field[@name='g_title'] else field[@name='g_pathComponent'][1]"/>
 <xsl:variable name="description" select="if (field[@name='g_description'] != '') then field[@name='g_description'] else $title"/>
 <xsl:variable name="summary" select="if (field[@name='g_summary'] != '') then field[@name='g_summary'] else $title"/>

 <xsl:text><![CDATA[update
 wp_ngg_gallery g,
 wp_ngg_pictures p
set
 p.image_slug = "]]></xsl:text>
 <xsl:value-of select="$title"/>
 <xsl:text><![CDATA[",
 p.description = "]]></xsl:text>
 <xsl:value-of select="$description"/>
 <xsl:text><![CDATA[",
 p.alttext = "]]></xsl:text>
 <xsl:value-of select="$summary"/>
 <xsl:text><![CDATA[",
 p.sortorder = ]]></xsl:text>
 <xsl:value-of select="position()"/>
 <xsl:text><![CDATA[
where
 p.galleryid = g.gid
 and g.title = "]]></xsl:text>
 <xsl:value-of select="field[@name='g_pathComponent'][2]"/>
 <xsl:text><![CDATA["
 and p.filename = "]]></xsl:text>
 <xsl:value-of select="field[@name='g_pathComponent'][1]"/>
 <xsl:text><![CDATA[";
]]></xsl:text>
 </xsl:template>

Halfway through typing this awful tag soup, I wished I could use some string formatting feature such as what I would have done in Python:

print '''
update
 wp_ngg_gallery g,
 wp_ngg_pictures p
set
 p.image_slug = "%(title)s",
 p.description = "%(description)s",
 p.alttext = "%(alttext)s",
 p.sortorder = %(sortOrder)s
where
 p.galleryid = g.gid
 and g.title = "%(galleryTitle)s"
 and p.filename = "%(filename)s";
''' % {"title": ..., "description": ..., ...}

Much better, don’t you agree?

The good news is that it’s easy to implement in XSLT 2.0!

I have chosen to use a « ${param-name} » syntax rather than the Pythonic « %(param-name) », but you could easily adapt the implementation to stick to the Pythonic syntax (see the sketch after the template). The code is as simple as defining this named template:

<xsl:template name="template">
 <xsl:param name="template"/>
 <xsl:param name="parameters"/>
 <xsl:analyze-string select="$template" regex="\$\{{(\i\c*)\}}" flags="">
   <xsl:matching-substring>
     <xsl:value-of select="$parameters/*[name() = regex-group(1)]"/>
   </xsl:matching-substring>
   <xsl:non-matching-substring>
     <xsl:value-of select="."/>
   </xsl:non-matching-substring>
 </xsl:analyze-string>
</xsl:template>
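
If you would rather keep the Pythonic « %(param-name) » syntax, the only thing to change is the regular expression. Something like this should do the trick (an untested sketch; the rest of the template is unchanged):

<xsl:analyze-string select="$template" regex="%\((\i\c*)\)">
   <xsl:matching-substring>
     <xsl:value-of select="$parameters/*[name() = regex-group(1)]"/>
   </xsl:matching-substring>
   <xsl:non-matching-substring>
     <xsl:value-of select="."/>
   </xsl:non-matching-substring>
 </xsl:analyze-string>

Note that the curly braces had to be doubled in the original regex because the regex attribute is an attribute value template; the parentheses used by the Pythonic syntax don’t need such escaping.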

Having done so, I can now use it to format my string:

 <xsl:variable name="title"
   select="if (field[@name='g_title'] != '') then field[@name='g_title'] else field[@name='g_pathComponent'][1]"/>
 <xsl:variable name="description" select="if (field[@name='g_description'] != '') then field[@name='g_description'] else $title"/>
 <xsl:variable name="summary" select="if (field[@name='g_summary'] != '') then field[@name='g_summary'] else $title"/>
 <xsl:call-template name="template">
   <xsl:with-param name="parameters">
     <title>
       <xsl:value-of select="$title"/>
     </title>
     <description>
       <xsl:value-of select="$description"/>
     </description>
     <alttext>
       <xsl:value-of select="$summary"/>
     </alttext>
     <gallery-title>
       <xsl:value-of select="field[@name='g_pathComponent'][2]"/>
     </gallery-title>
     <filename>
       <xsl:value-of select="field[@name='g_pathComponent'][1]"/>
     </filename>
     <sort-order>
       <xsl:value-of select="position()"/>
     </sort-order>
   </xsl:with-param>
   <xsl:with-param name="template"><![CDATA[update
 wp_ngg_gallery g,
 wp_ngg_pictures p
set
 p.image_slug = "${title}",
 p.description = "${description}",
 p.alttext = "${alttext}",
 p.sortorder = ${sort-order}
where
 p.galleryid = g.gid
 and g.title = "${gallery-title}"
 and p.filename = "${filename}";
]]></xsl:with-param>
</xsl:call-template>

Much nicer, don’t you think so?

Debian/Ubuntu PHP packages and virtual hosts: introducing adminstance

As a short-term way to deal with my Debian/Ubuntu PHP packages and virtual hosts issue, I have written a pretty crude Python script that I have called « adminstance ».

This script can currently install, update and remove an instance of a web package such as websvn:

vdv@studio:~/Documents/Dyomedea/code/adminstance$ ./adminstance


Usages:  

adminstance -h|--help
  print this message

adminstance -l|--list 
  lists the installed instances for this directory

adminstance -i|--install [-f|--force]  
  installs an instance for a root directory
  
adminstance -u|--update [-f|--force]  
  updates an instance for a root directory
  
adminstance -r|--remove [-f|--force] [-p|--purge]  
  removes an instance for a root directory

Options:

  -i, --install : action = installation 
  -f, --force   : when action = install, update or remove, install
                  without prompting the user for a confirmation
  -h, --help    : prints this message
  -l, --list    : action = list 
  -p, --purge   : when action = remove, remove also files and directories
                  under /var and /etc (by default, these are preserved)
  -r, --remove  : action = remove
  -u, --update  : action = update
   
  

To install an instance of websvn named « foo », type:

vdv@studio:~/Documents/Dyomedea/code/adminstance$ sudo ./adminstance -i /usr/share/websvn/ foo
[sudo] password for vdv: 
install an instance of /usr/share/websvn/ named foo? (y|N) y
Copying /var/cache/websvn to /var/cache/adminstance/websvn/foo

Copying /usr/share/websvn to /usr/share/adminstance/websvn/foo

Copying /etc/websvn to /etc/adminstance/websvn/foo

Creating a symlink from /etc/adminstance/websvn/foo/config.php to /usr/share/adminstance/websvn/foo/include/config.php
Creating a symlink from /var/cache/adminstance/websvn/foo/tmp to /usr/share/adminstance/websvn/foo/temp
Creating a symlink from /var/cache/adminstance/websvn/foo to /usr/share/adminstance/websvn/foo/cache
Creating a symlink from /etc/adminstance/websvn/foo/wsvn.php to /usr/share/adminstance/websvn/foo/wsvn.php
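
For the curious, the install step essentially boils down to something like this Python sketch (this is not the actual script; the function name is made up and the symlink layout is simply the one visible in the trace above):

import os, shutil

def install_instance(package, name):
    # Copy the package's /var/cache, /usr/share and /etc trees into
    # per-instance directories...
    for root in ("/var/cache", "/usr/share", "/etc"):
        src = os.path.join(root, package)
        dst = os.path.join(root, "adminstance", package, name)
        if os.path.isdir(src):
            print "Copying %s to %s" % (src, dst)
            shutil.copytree(src, dst, symlinks=True)
    # ... then re-create the package's symlinks so that the instance's copy
    # under /usr/share points to its own copies under /etc and /var
    # (websvn specific: config.php, wsvn.php, temp and cache).
    share = os.path.join("/usr/share/adminstance", package, name)
    etc = os.path.join("/etc/adminstance", package, name)
    cache = os.path.join("/var/cache/adminstance", package, name)
    links = {
        os.path.join(share, "include/config.php"): os.path.join(etc, "config.php"),
        os.path.join(share, "temp"): os.path.join(cache, "tmp"),
        os.path.join(share, "cache"): cache,
        os.path.join(share, "wsvn.php"): os.path.join(etc, "wsvn.php"),
    }
    for link, target in links.items():
        if os.path.lexists(link):
            os.remove(link)
        print "Creating a symlink from %s to %s" % (target, link)
        os.symlink(target, link)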

To update it if you get a new version of websvn:

vdv@studio:~/Documents/Dyomedea/code/adminstance$ sudo ./adminstance -u /usr/share/websvn/ foo
update an instance of /usr/share/websvn/ named foo? (y|N) y
Synchronizing /usr/share/websvn to /usr/share/adminstance/websvn/foo
rsync -a --delete /usr/share/websvn/ /usr/share/adminstance/websvn/foo/

Creating a symlink from /etc/adminstance/websvn/foo/config.php to /usr/share/adminstance/websvn/foo/include/config.php
Creating a symlink from /var/cache/adminstance/websvn/foo/tmp to /usr/share/adminstance/websvn/foo/temp
Creating a symlink from /var/cache/adminstance/websvn/foo to /usr/share/adminstance/websvn/foo/cache
Creating a symlink from /etc/adminstance/websvn/foo/wsvn.php to /usr/share/adminstance/websvn/foo/wsvn.php

To list the instances of websvn:

vdv@studio:~/Documents/Dyomedea/code/adminstance$ sudo ./adminstance -l /usr/share/websvn/ 
List of instances for the package websvn:
	bar
	foo

To remove the instance foo:

vdv@studio:~/Documents/Dyomedea/code/adminstance$ sudo ./adminstance -r /usr/share/websvn/ foo
remove an instance of /usr/share/websvn/ named foo? (y|N) y
Deleting /usr/share/adminstance/websvn/foo
rm -r /usr/share/adminstance/websvn/foo

To remove it, including its directories under /etc and /var:

vdv@studio:~/Documents/Dyomedea/code/adminstance$ sudo ./adminstance -rp /usr/share/websvn/ foo
remove an instance of /usr/share/websvn/ named foo? (y|N) y
Deleting /var/cache/adminstance/websvn/foo
rm -r /var/cache/adminstance/websvn/foo
Deleting /usr/share/adminstance/websvn/foo
rm -r /usr/share/adminstance/websvn/foo
Deleting /etc/adminstance/websvn/foo
rm -r /etc/adminstance/websvn/foo

It’s pretty basic and has a few limitations but that should be enough for me for the moment.

In the longer term, it should be possible to pack it as a .deb that uses dpkg triggers to automate the update of all its instances when a package is updated through apt…

Debian/Ubuntu PHP packages and virtual hosts

I am a big fan of the Debian packaging system and use it on my Ubuntu systems as much as I can, as it greatly simplifies both the installation of new software and, more importantly, its maintenance and security updates.

There is unfortunately one downside that bites me so often that I am really surprised that nobody seems to care…

When you run a web server, it is often the case that you want to install popular web applications such as WordPress, Gallery, websvn or whatever, and Debian/Ubuntu packages are perfectly fine until you want to run these applications on multiple virtual hosts.

To enforce the strict separation between /usr, /var and /etc that is part of the Debian policy, these packages usually put their PHP source files under /usr/share and replace the configuration files with symbolic links to files located under /etc. Symbolic links to files located under /var are also added in some cases.
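
For instance, the websvn package ships something like this (an illustrative extract; the exact layout depends on the package):

/usr/share/websvn/include/config.php -> /etc/websvn/config.php
/usr/share/websvn/cache -> /var/cache/websvn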

I understand the reasons for this policy but when you want to run several instances of these applications, links from the source to a single set of configuration files just seem plain wrong! Ideally you’d want things to work the other way round and get instances that have their own configuration and variable space under /etc and /var and link to a common set of source files located under /usr.

Taking a package such as WordPress and converting it into a « virtual host friendly » form isn’t that difficult, but as soon as you start modifying a package after it’s been installed you need to redo these modifications after each new package update and lose a lot of the benefit of using a package.

Have I missed something obvious and is there an easy solution for this issue?

See also Debian/Ubuntu PHP packages and virtual hosts: introducing adminstance.

HTML 5 turns documents into applications

See also the French version of this article on XMLfr.

HTML 5 is not just HTML 4 + 1

This announcement has already been widely commented on, and I won’t go back over the details of the differences between HTML 4.01 and HTML 5, which are described in one of the documents published with the Working Draft. What I find unfortunate is that this document and much of the commentary about HTML 5 focus on the detail of the syntactical differences between these versions rather than on the more fundamental differences.

These differences are clearly visible as soon as you read the introduction:

The World Wide Web’s markup language has always been HTML. HTML was primarily designed as a language for semantically describing scientific documents, although its general design and adaptations over the years has enabled it to be used to describe a number of other types of documents.

The main area that has not been adequately addressed by HTML is a vague subject referred to as Web Applications. This specification attempts to rectify this, while at the same time updating the HTML specifications to address issues raised in the past few years.

This introduction does a good job of setting the context and expectations: the goal of HTML 5 is to move from documents to applications, and this is confirmed in many other places, such as the section titled “Relationship to XUL, Flash, Silverlight, and other proprietary UI languages”:

This specification is independent of the various proprietary UI languages that various vendors provide. As an open, vendor-neutral language, HTML provides for a solution to the same problems without the risk of vendor lock-in.

To understand this bold move, we need to set this back into context.

Nobody denies that HTML was created to represent documents, but its success comes from its neutrality: even if it is fair to say that Web 2.0 is the web as it was meant to be, the makers of HTML couldn’t imagine everything that can be done in modern web applications. If these applications are possible in HTML, it is because HTML was designed to be neutral enough to describe yesterday’s, today’s and probably tomorrow’s applications.

If, on the contrary, HTML 4.01 had attempted to describe, in 1999, what a web application was, it is pretty obvious that this description would at best have had to be worked around, and that it might even have slowed down the development of Web 2.0.

This is the reason why I would level at HTML 5 the same kind of criticism I made of W3C XML Schema: over-specifying how a document is to be used risks blocking creativity and increasing the coupling between applications.

While many people agree that web applications should be designed as documents, HTML 5 appears to propose to move from documents to applications. This seems to me to be a major step… backward!

Flashback on HTML’s history

Another point that needs to be highlighted is the relationship between HTML 5 and XML in general, and XHTML in particular.

HTML 5 presents itself as the sibling of both HTML 4.01 and XHTML 1.1 and as a competitor of XHTML 2.0.

To understand why the W3C is developing two competing standards, we need a brief reminder of the history of HTML.

HTML was originally designed as an SGML vocabulary and uses some of its features to reduce the verbosity of its documents. This is the case, for instance, of tags such as <img> or <link> that do not need to be closed in HTML.

XML was designed as a simplification of SGML, and this simplification does not allow the features HTML uses to reduce its verbosity.

When XML was published, the W3C found itself with an SGML application in one hand (HTML) and a simplification of SGML in the other (XML), and these two recommendations were incompatible.

To make these recommendations compatible, the W3C decided to create XHTML 1.0, a revamping of HTML to make it compatible with the XML recommendation while keeping the exact same features. This led to XHTML 1.0 and then XHTML 1.1, which is roughly the same thing cut into modules that can be used independently.

One of the weaknesses of HTML being its forms, the W3C also worked on XForms, a new generation of web forms, and started to move forward with a new version of XHTML with new features, XHTML 2.0, which is still work in progress.

The approach looked so obvious that the W3C probably neglected to check that the community was still following its work. In the euphoria that followed the publication of XML 1.0, many people were convinced that the browser war was over; the interest in HTML, which had been partly fueled by this war, started to decline, and the W3C’s work in this domain didn’t seem to raise much interest compared to, say, XML Schema languages or Web Services.

It is also fair to say that the practical interest of moving from HTML to XHTML wasn’t (and still isn’t) obvious for web site developers, since the features are the same. Migrating a site from HTML to XHTML involves additional work which is only compensated by the joy of displaying a “W3C XHTML 1.x compliant” logo!

This is also the moment when Microsoft stopped all development on Internet Explorer and Netscape transferred its development to Mozilla.

The old actors of the browser war, well represented at the W3C, which had been one of their battlefields, gave way to new actors, Mozilla, Opera and Apple/Safari, younger and less keen to accept the heaviness of W3C procedures.

At the same time, the first Web 2.0 applications sparked a new wave of creativity among web developers, and all this happened outside the W3C. This is not necessarily a bad thing, since the mission of standards bodies such as the W3C is to standardize rather than innovate, but the W3C doesn’t appear to have correctly estimated the importance of these changes and seems to have lost contact with its users.

And when these users, led by Opera, Mozilla and Safari, decided that it was time to move HTML forward, rather than jump on the XHTML 2.0 wagon they created their own Working Group, the WHATWG, outside the W3C. This is where the first versions of HTML 5 were drafted, together with Web Forms 2.0, a sister specification designed to be an enhancement of HTML forms simpler than XForms.

Microsoft was still silent on the subject, and the W3C found itself the editor of a promising new specification, XHTML 2.0, which didn’t seem to attract much attention while, outside, a new specification claiming to be the true successor of HTML was being developed by the most promising outsiders in the browser market.

At XTech 2007, I had a chance to measure the depth of the channel that separates the two communities by attending a debate between the two working groups.

Tim Berners-Lee must have found that this channel was too deep when he decided to invite the WHATWG to continue their work within the W3C, in a Working Group created for this purpose and distinct from the XHTML 2.0 Working Group, which continues its work as if nothing had changed.

HTML 5 or XHTML 2.0?

So, the W3C now has two distinct and competing Working Groups.

Their missions are very close

The XHTML 2.0 Working Group develops an extensible vocabulary based on XML:

The mission of the XHTML2 Working Group is to fulfill the promise of XML for applying XHTML to a wide variety of platforms with proper attention paid to internationalization, accessibility, device-independence, usability and document structuring. The group will provide an essential piece for supporting rich Web content that combines XHTML with other W3C work on areas such as math, scalable vector graphics, synchronized multimedia, and forms, in cooperation with other Working Groups.

The HTML Working Group focuses on the continuity with previous HTML versions:

The mission of the HTML Working Group, part of the HTML Activity, is to continue the evolution of HTML (including classic HTML and XML syntaxes).

The conciseness of this sentence doesn’t imply that the HTML Working Group isn’t concerned with extensibility and cross-platform support, since the list of deliverables says “there is a single specification deliverable for the HTML Working Group, the HTML specification, a platform-neutral and device-independent design” and later on “the HTML WG is encouraged to provide a mechanism to permit independently developed vocabularies such as Internationalization Tag Set (ITS), Ruby, and RDFa to be mixed into HTML documents”.

The policy is thus clearly, at the risk of seeing a standards war develop, to produce two specifications and let users choose.

XHTML 5 is a weak alibi

We find this policy again within the HTML 5 specification itself, which proposes a choice between two syntaxes:

This specification defines an abstract language for describing documents and applications, and some APIs for interacting with in-memory representations of resources that use this language.

The in-memory representation is known as « DOM5 HTML », or « the DOM » for short.

There are various concrete syntaxes that can be used to transmit resources that use this abstract language, two of which are defined in this specification.

The first such concrete syntax is « HTML5 ». This is the format recommended for most authors. It is compatible with all legacy Web browsers. If a document is transmitted with the MIME type text/html, then it will be processed as an « HTML5 » document by Web browsers.

The second concrete syntax uses XML, and is known as « XHTML5 ». When a document is transmitted with an XML MIME type, such as application/xhtml+xml, then it is processed by an XML processor by Web browsers, and treated as an « XHTML5 » document. Authors are reminded that the processing for XML and HTML differs; in particular, even minor syntax errors will prevent an XML document from being rendered fully, whereas they would be ignored in the « HTML5 » syntax.

This section, which fortunately is non-normative, appears to rule out the possibility that a browser might accept any HTML document other than HTML5, or any XHTML other than XHTML5!

Furthermore, with such a notice, I wonder who would want to choose XHTML 5 over HTML5…

This notice relies on a frequent misunderstanding of the XML recommendation. It is often said that XML parsing must stop after the first error, but the recommendation is much more flexible than that and distinguishes two types of errors:

  • An error is “a violation of the rules of this specification; results are undefined. Unless otherwise specified, failure to observe a prescription of this specification indicated by one of the keywords MUST, REQUIRED, MUST NOT, SHALL and SHALL NOT is an error. Conforming software MAY detect and report an error and MAY recover from it.”
  • A fatal error is “an error which a conforming XML processor MUST detect and report to the application. After encountering a fatal error, the processor MAY continue processing the data to search for further errors and MAY report such errors to the application. In order to support correction of errors, the processor MAY make unprocessed data from the document (with intermingled character data and markup) available to the application. Once a fatal error is detected, however, the processor MUST NOT continue normal processing (i.e., it MUST NOT continue to pass character data and information about the document’s logical structure to the application in the normal way).”

We see that, on the contrary, the XML recommendation allows an XML processor to recover from simple errors.

One may argue that what XML considers a fatal error can be considered by users a simple error; this would be the case, for instance, of an <img> tag that wasn’t closed. But even for fatal errors, the recommendation doesn’t stipulate that the browser should not display the document. It does require that the parser report the error to the browser, but doesn’t say anything about how the browser should react. Similarly, the recommendation imposes that normal processing should stop, because the parser would be unable to reliably report the structure of the document, but it doesn’t say that the browser shouldn’t switch to a recovery mode where it could try to correct this error.

In fact, if browsers are so strict when they display XML documents, this isn’t to conform to the XML recommendation but because there was a consensus, at the time they implemented their XML support, that they should be strict.

At that time, everyone had in mind the consequences of the browser war, which was one of the reasons why browsers accepted pretty much anything that pretended to be HTML. While this can be considered a good thing in some cases, it also means implementing a lot of undocumented algorithms, and this leads to major interoperability issues.

The decision to be strict when displaying XML documents came as a good resolution for the new era, and nobody seemed to dissent at the time.

If this position needs to be revisited, it would be ridiculous to throw away XML, since we have seen that such strictness isn’t imposed by the recommendation.

The whole way in which the two HTML5 syntaxes are presented is a clear indication that the XML syntax, which was not mentioned in the first HTML5 drafts, has been added as a compromise so that the W3C doesn’t look as if it rejected XML, but that the idea is to maintain and promote a non-XML syntax.

HTML 5 gets rid of its SGML roots

Not only does HTML 5 reject XML, but it also abandons any kind of compatibility with SGML and says clearly “while the HTML form of HTML5 bears a close resemblance to SGML and XML, it is a separate language with its own parsing rules”.

This sentence is symptomatic of the overall attitude of the specification, which claims to build on the experience of the web while ignoring the experience of markup languages, taking the risk, once again, of freezing the web in its current state.

The attitude of the XHTML Working Group is better balanced. Of course, XHTML 2.0 is about building on the most recent web developments, but it does so while keeping the experience acquired in developing XML and SGML vocabularies.

Radically different technical approaches

Without entering into a detailed comparison, two points are worth mentioning.

XHTML 2.0 is more extensible

Both specifications acknowledge the need to take into account requirements that have appeared since HTML was created and are not correctly supported, but their methods for doing so are totally different.

HTML 5 has adopted a method that looks simple: if a new need is considered important enough, a new element is added. Since many pages contain articles, a new <article> element is added. And since most pages have navigation bars, a new <nav> element is added…

We have seen, with the big vocabularies used in document applications, what the limits of this approach are: it leads to an explosion in the number of elements, and the simplicity turns into complexity. It becomes difficult to pick the right element, and since these elements are specialized, they never meet your needs exactly.

Using this approach with HTML is more or less a way to transform it into a kind of DocBook clone for the web in the long term.

XHTML 2.0 has taken the opposite approach. The idea is, on the contrary, to start with a clean-up and remove from XHTML any element that isn’t absolutely necessary.

It then relies on current practice: what do we do to represent an article or a navigation bar? The most common approach is to use a generic element, often a <div>, and hijack the class attribute to apply a CSS style or a JavaScript animation.

The downside is that the values of the class attribute aren’t standardised and that the class attribute is used to convey information about the meaning of an element rather than define the way it should be displayed. This kind of hijack is pretty common since this is also the foundation of microformats.

To avoid this hijack while keeping the flexibility of this approach, XHTML 2.0 proposes to add a role attribute that defines the role of XHTML elements. This attribute can take a set of predefined values together with ad hoc values differentiated by their namespaces.

This method is a way to introduce the same kind of features that will be added to HTML 5 without adding new elements. This is more flexible, since anyone can create new values in new namespaces. It also gives microformats a way to build upon something more solid than the class attribute, which can then go back to being used to define how elements should be presented.
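
To make the contrast concrete, the approaches look roughly like this (an illustrative sketch; “navigation” is one of the predefined role values, and the ex: prefix and its namespace are made up for the example):

<!-- HTML 5: a new specialized element -->
<nav>...</nav>

<!-- Current practice: hijacking the class attribute -->
<div class="navigation">...</div>

<!-- XHTML 2.0: a generic element with a role attribute -->
<div role="navigation">...</div>

<!-- An ad hoc role defined in its own namespace -->
<div xmlns:ex="http://example.org/roles" role="ex:photo-album">...</div>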

Documents versus applications

Another important point that differentiates these two specifications is their balance between data and applications, or processing.

XHTML 2.0 is built upon the XML stack:

  • The lower level is syntactical and consists of the XML and namespaces recommendations.
  • On top of this layer, the XML infoset defines a data model independent of any kind of processing.
  • APIs, specific languages (XPath, XQuery, …) and schema languages are built on top of this data model.

It took a few years to build this architecture and things haven’t always been that clear and simple, but its big benefit is that it separates data from processing and is just right for loose coupling between applications.

We’ve seen that HTML 5 has cut all its links to XML and SGML, which means that it doesn’t rely on this architecture. On the contrary, it mixes everything, syntax, data model and API (the DOM), in a single specification.

This is because, as we’ve already seen, HTML 5 is a vocabulary to develop web applications rather than a vocabulary to write documents.

The difference seems important to me not only in terms of architecture but also in terms of sustainability. Everyone agrees that XML is one of the best formats for long-term preservation of documents. What is true of documents is probably not true of applications, and I don’t think an HTML 5 application with a good proportion of JavaScript will be as sustainable as an XHTML 2.0 document.

The architecture on which XHTML 2.0 is built doesn’t prevent people from developing applications, but it dissociates these applications more clearly from the content.

Furthermore, the XHTML 2.0 effort is also trying to develop and promote declarative alternatives such as XForms for defining web applications, which should be a better fit for documents than JavaScript.

Will the best specification win?

For all these reasons, HTML 5 looks to me like a big step backward, and XHTML 2.0 seems to be a much better alternative.

Does that mean that XHTML 2.0 will be the winner or on the contrary, does the fact that HTML 5 is written by those who develop web browsers mean that XHTML 2.0 is doomed?

XHTML 2.0 has a strong handicap, but the battle isn’t lost yet. The HTML Working Group doesn’t expect HTML 5 to become a recommendation before Q3 2010, and before that date anything can happen.

It is up to us, the users, to vote with our feet and pen and start by boycotting the HTML 5 features that are already implemented in some browsers.

And in the short term, checking that a page is valid XHTML 1.x is a good way to make sure that it doesn’t contain HTML 5 features!

Sun to buy the M from LAMP

Sun has announced their intention to buy MySQL, the number one database for web applications, used by both Google and Amazon but also powering most personal blogs.

Sun has considered that being the M from “LAMP” (Linux, Apache, MySQL, PHP) would be a good step toward being the “.” in “.com”, as they used to say in one of their taglines.

This announcement has been and will be widely commented on… Personally, I do hope that it will speed up better support of XML in MySQL.

I had the opportunity to have a look at XML support in MySQL 5.1 for the chapters about databases in the book “Beginning XML” that I have co-written with Joe Fawcett (he covered SQL Server and wrote two sections about eXist and MySQL). My conclusion is that these features are a good start, but there is still a lot of work before they reach something that can match modern databases!

Knowing Sun’s long-term commitment to XML, I do hope that this will boost the development of new XML features.

While we’re speaking of modern databases, one of the leaders in terms of XML support is Oracle.

And today is also the date they’ve chosen to announce that they’re buying BEA. What’s the link between these two announcements? It’s a factor of 8.5! It will cost Oracle $8.5 billion to buy BEA and Sun only $1 billion to buy MySQL.

I don’t want to underestimate BEA’s business value, but it looks to me that in terms of overall visibility and contribution to the net economy, the factor should be the other way round!

That’s probably a good illustration that it remains more difficult to monetize open source than commercial developments.

To XForms or not to XForms?

Yahoo! has released its Yahoo! Mobile Developer Platform under the name « Blueprint », and the news has been widely commented on by XForms fans: Micah Dubinko states that “Yahoo! introduces mobile XForms” and Erik Bruchez that “Yahoo! goes XForms”.

The roadmap published by Yahoo! appears to be much more cautious and just says “Much of Blueprint’s philosophy and syntax comes from XForms”.

The developers’ guide clearly shows that while Yahoo! did borrow elements from the XForms recommendation, these elements do not belong to the XForms namespace: they cohabit with elements similarly borrowed from XHTML and with elements that are specific to Yahoo!, and they are all declared under a single namespace.

The result seems as different from XForms as the WAP Forum’s WML was from XHTML.

While the defenders of a declarative approach can celebrate the fact that Yahoo! preferred it over a more procedural approach based on JavaScript, I think it is an overstatement to say that this is a success for XForms.

XForms was designed to be user-agent agnostic, and the development of a Basic version was even started for low-end terminals.

Mobile devices were obviously a target for XForms from the beginning, and the adoption by Yahoo! of a not-really-compatible clone can, on the contrary, be seen as a new failure.

This is especially regrettable for a technology that has a huge technical potential.

 

Adios Syncato

It’s been fun to use Syncato, but the lack of any kind of efficient anti-spam is really overwhelming, and I had to switch to something else to reopen the comments that I had had to close with Syncato.

I am giving a try to WordPress, which is in a way the complete opposite of Syncato: I don’t like its technical foundation that much (I had a look at its implementation of XML features and I’ll come back to that if time permits), but it is so much more user-friendly that it’s difficult to resist… After all, I may be an XML Geek, but I am also a user!

XSLT has been my friend again for this migration (the XML import used to initialize the WordPress database and the rewrite rules were both generated with XSLT). As a result, all the posts, comments and feeds are available at the same URIs and the side effects should be minimal for the readers of this blog.

Farewell Syncato, I’ll miss your XML abilities!

First contact with Facebook

Already using LinkedIn and Viadeo, I wasn’t really excited about joining yet another networking community, but a friend of mine kept saying that Facebook was different and more open, and he convinced me to join by insisting that his life would be easier if I joined a group he had created on the site.

You can’t join if Facebook judges you too old for them

Still only half convinced, I clicked on the link in the invitation he had sent me and found out that Facebook wanted to know my birth date. Being in a bad mood, I entered the earliest date that the web page lets you enter: « Jan 1st, 1910 » and the site told me:

                You must enter your real birthday to register.
                You can restrict who can see your birthday after you join.

I agree that there are probably not that many people born on « Jan 1st, 1910 » or before who will want to join Facebook, but denying them the right to join seems plain wrong to me.

Furthermore, you shouldn’t be forced to give private information to join. It’s not only a matter of restricting who can see this information but also a matter of trusting the web site enough to want to give it to them.

Extensive use of cookies

As you’ll have guessed, this is not the only thing I tried in order to test their registration procedure, and I also noticed that they store a bunch of cookies in your browser (around six of them can usually be seen). These cookies do not contain values in the clear, but they are sensitive enough that the error messages you get from the site when you try weird things change if you delete them.

I tend to consider that insane for all kinds of well-known technical reasons.

Not able to spell my name

After a while, I eventually got registered, only to see that they are not able to spell my name correctly.

My name is « Eric van der Vlist » and they keep displaying it as « Eric Van Der Vlist » whatever I do to change it.

This is just rude and disrespectful.

They use Flash

Meaning that a lot of features don’t work for people who, like me, do not want to install a flash plugin.

Abusive terms of use

This got me angry enough to make me read their terms of use and I find the following term really abusive:

When you post User Content to the Site, you authorize and direct us to make such copies thereof as we deem necessary in order to facilitate the posting and storage of the User Content on the Site. By posting User Content to any part of the Site, you automatically grant, and you represent and warrant that you have the right to grant, to the Company an irrevocable, perpetual, non-exclusive, transferable, fully paid, worldwide license (with the right to sublicense) to use, copy, publicly perform, publicly display, reformat, translate, excerpt (in whole or in part) and distribute such User Content for any purpose on or in connection with the Site or the promotion thereof, to prepare derivative works of, or incorporate into other works, such User Content, and to grant and authorize sublicenses of the foregoing. You may remove your User Content from the Site at any time. If you choose to remove your User Content, the license granted above will automatically expire, however you acknowledge that the Company may retain archived copies of your User Content.

If I read it correctly, this means that they can reuse whatever I publish on their site, whether I choose to make it readable only by me, by my friends or by everyone, and, let’s say, make a book with a distorted version of what I wrote.

They even copyright what you tell their support

Why not contact their support rather than posting on my blog?

If you read further into their terms of use, you’ll find this other interesting piece:

You acknowledge and agree that any questions, comments, suggestions, ideas, feedback or other information about the Site or the Service (« Submissions »), provided by you to Company are non-confidential and shall become the sole property of Company. Company shall own exclusive rights, including all intellectual property rights, and shall be entitled to the unrestricted use and dissemination of these Submissions for any purpose, commercial or otherwise, without acknowledgment or compensation to you.

Unlike the other stuff that you may post on their site, for which you grant them non-exclusive rights (meaning that you can at least reuse what you’ve contributed), you grant them exclusive rights over any feedback, meaning that they can sue you if you want to reuse an idea that you’ve submitted!

Data lock-in

My friend had told me that Facebook was more open than other networking systems.

I find, on the contrary, that they are a perfect example of the Data Lock-In era announced by Tim O’Reilly, which I already mentioned in my first post about Web 2.0.

Are LinkedIn and Viadeo better? Probably not that much, but they seem less insidious to me because they are single-purpose and do not tempt you, as Facebook does, to share everything about yourself under their umbrella.