September 2005 – Eric van der Vlist

Lawyers shouldn’t edit XML documents

One of my customers has found out that the DTD published by Sun to validate property files in J2SE 1.5.0 is not well-formed!

The Javadoc explains:

Note that the system URI (http://java.sun.com/dtd/properties.dtd) is not accessed when exporting or importing properties; it merely serves as a string to uniquely identify the DTD, which is:

<?xml version="1.0" encoding="UTF-8"?>

<!-- DTD for properties -->

<!ELEMENT properties ( comment?, entry* ) >

<!ATTLIST properties version CDATA #FIXED "1.0">

<!ELEMENT comment (#PCDATA) >

<!ELEMENT entry (#PCDATA) >

<!ATTLIST entry key CDATA #REQUIRED>

Reducing the system URI to a mere identifier is a simplification that can lead to problems when you parse your document: XML parsers are free to load DTDs even if you specify standalone="yes" in your XML declaration and even if you run them in non-validating mode.

In that case, including a system URI that points to a DTD that is not well-formed means that, depending on your parser and on the options you pass to it at parse time, you may or may not get a well-formedness error.

Interestingly, the DTD listed above and borrowed from the javadoc is well formed.

The DTD published at http://java.sun.com/dtd/properties.dtd appears to have been modified to:

<!--
   Copyright 2005 Sun Microsystems, Inc.  All rights reserved.
  -->

<?xml version="1.0" encoding="UTF-8"?>

<!-- DTD for properties -->

<!ELEMENT properties ( comment?, entry* ) >

<!ATTLIST properties version CDATA #FIXED "1.0">

<!ELEMENT comment (#PCDATA) >

<!ELEMENT entry (#PCDATA) >

<!ATTLIST entry key CDATA #REQUIRED>

See what has happened? Someone has probably insisted that they should add a copyright statement at the beginning of each of their documents, forgetting that XML forbids comments before the XML declaration…
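The rule is easy to reproduce with any XML parser. The following is a minimal Python sketch, not a test of Sun's DTD itself: it applies the same rule to a small document entity, since the standard library parser does not load external DTDs. The helper name `parses` and the sample strings are made up for the illustration.

```python
import xml.etree.ElementTree as ET


def parses(xml_bytes):
    """Return True if xml_bytes is a well-formed XML document."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


# Illustrative fragments, not the actual content of Sun's file.
DECLARATION = b'<?xml version="1.0" encoding="UTF-8"?>\n'
COMMENT = b"<!-- Copyright notice -->\n"
BODY = b"<properties/>"

# Declaration first, then the comment: accepted.
declaration_first = parses(DECLARATION + COMMENT + BODY)

# Comment first, then the declaration: rejected by the parser.
comment_first = parses(COMMENT + DECLARATION + BODY)

print(declaration_first, comment_first)
```

Moving the copyright comment after the declaration is enough to make the parser happy again.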

We shouldn’t let lawyers edit XML documents!

Unemployment figures

Two small figures about unemployment in France, picked up this morning while listening to the very interesting programme "la nouvelle fabrique" with Richard Dethyre on France Culture.

According to this speaker:

only four out of ten unemployed people are said to be counted in the official unemployment statistics,
the labour force participation rate in France is said to be 63%.

Web page thumbnails

To brighten up the articles page of the site http://dyomedea.com, I have added thumbnails made from screenshots.

To build these thumbnails, I wanted to avoid the brute-force method of "taking a screenshot and resizing it by hand with Gimp".

Since the process is not very original, I looked for tools that do this, and the only Open Source one I found was webthumb, a Perl script that chains commands to launch Mozilla on an Xvfb server and take a screenshot of it.

For a reason that I did not try to investigate, webthumb does not seem to run directly on my workstation (Ubuntu Hoary). On the other hand, running the commands by hand to get the same result is fairly easy.

In a first terminal, just launch Xvfb and the commands whose output you want to capture, for example:

vdv@grosbill:~ $ Xvfb :2 -screen 0 1024x768x24 -ac -fbdir /tmp/xvfb/ &
[1] 14006
vdv@grosbill:~ $ Could not init font path element /usr/X11R6/lib/X11/fonts/TTF/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/CID/, removing from list!

vdv@grosbill:~ $ export DISPLAY=:2
vdv@grosbill:~ $ firefox http://dyomedea.com
Could not init font path element /usr/X11R6/lib/X11/fonts/TTF/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/CID/, removing from list!

In a second terminal, you can then check the display with xwud and save it with xwdtopnm. To get these captures, I used the following sequences:

vdv@grosbill:~ $ xwud -in /tmp/xvfb/Xvfb_screen0
vdv@grosbill:~ $ xwdtopnm /tmp/xvfb/Xvfb_screen0| pnmscale -xysize 120 120 | pnmtojpeg -quality 95 > thumb.jpg
xwdtopnm: writing PPM file
vdv@grosbill:~ $ gimp thumb.jpg
*** attempt to put segment in horiz list twice
*** attempt to put segment in horiz list twice

Simple, isn’t it?

Notes:

The warning messages shown above do not seem to matter.
The Debian/Ubuntu packages needed to run these commands are xvfb (virtual framebuffer X server), netpbm (Graphics conversion tools) and of course Firefox.
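The manual sequence above could be scripted. Here is a minimal Python sketch that chains the same three netpbm commands with the same options as in the transcript. It assumes that Xvfb is already running with its -fbdir option and that netpbm is installed. The function names `thumbnail_commands` and `make_thumbnail` are hypothetical and are not part of webthumb.

```python
import subprocess


def thumbnail_commands(xwd_path, size=120, quality=95):
    """Return the argv lists for xwdtopnm | pnmscale | pnmtojpeg."""
    return [
        ["xwdtopnm", xwd_path],
        ["pnmscale", "-xysize", str(size), str(size)],
        ["pnmtojpeg", "-quality", str(quality)],
    ]


def make_thumbnail(xwd_path, jpeg_path, size=120, quality=95):
    """Run the pipeline and write the JPEG thumbnail to jpeg_path."""
    commands = thumbnail_commands(xwd_path, size, quality)
    processes = []
    upstream = None  # stdout of the previous stage, if any
    with open(jpeg_path, "wb") as jpeg:
        for index, argv in enumerate(commands):
            is_last = index == len(commands) - 1
            process = subprocess.Popen(
                argv,
                stdin=upstream,
                stdout=jpeg if is_last else subprocess.PIPE,
            )
            if upstream is not None:
                # The child holds its own copy of the pipe; close ours.
                upstream.close()
            upstream = process.stdout
            processes.append(process)
        for process, argv in zip(processes, commands):
            if process.wait() != 0:
                raise RuntimeError("command failed: " + " ".join(argv))
```

A call such as make_thumbnail("/tmp/xvfb/Xvfb_screen0", "thumb.jpg") would then be the equivalent of the second command of the transcript.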

Dyomedea.com is finally valid!

After spending years explaining to my customers that they should follow the W3C recommendations, I have only just applied these principles to my own corporate site: http://dyomedea.com/!

For those of you who would like to see the difference, the old version now belongs to the Web archives…

The new site looks very different, but its structure is similar and the URIs have not changed.

This new site is of course conformant to XHTML 1.1 and CSS 2.0, free of layout tables and, as it should be, based on XML.

In addition to the great classics (GNU/Linux, Apache, …), the site is powered by a new beta version of Orbeon PresentationServer.

This new version brings plenty of attractive features such as XForms support based on Ajax technologies (which I am not using here) and out-of-the-box XHTML support (which was not the case with earlier versions, which had to be hacked to generate valid XHTML).

I am using this product (a bit too powerful for the needs of this site) because I like it (that’s a good reason!) and also to generate dynamic pages, which has some advantages even for a relatively static site like this one:

I send XHTML (with the media type "application/xhtml+xml") only to browsers that announce in their request that they support it (the W3C XHTML validator, which does not say what it supports, being the exception; if you think it is wrong, you can vote for this bug!) and HTML to the others (curiously, Konqueror, which should not be, appears to be in that list).
Of course, I take the opportunity to aggregate RSS 1.0 feeds (from XMLfr and from this weblog) to display my latest articles and the XMLfr agenda.
More interestingly, I have developed two new OPS generators that fetch from my mailbox the latest messages I have sent to public lists.
These generators use my XML/Java binding API to read their configurations.
And of course, an XML/XSLT platform greatly eases the management of internationalisation (the site is in French and English) and makes it possible to add gadgets such as a site map.

All this was very interesting to do; I should have done it earlier!

All that remains is to do the same with XMLfr…

Dyomedea.com is valid, at last!

There is a French dictum that says that cobblers are the worst shod (curiously, the English equivalent, "shoemaker’s children are the worst shod", brings children into the picture).

After spending years teaching my customers that they should follow the W3C recommendations, I have just finished applying them to my own corporate site, http://dyomedea.com/english/!

For those of you who would like to see the difference, the old one now belongs to web.archive.org…

The new site looks very different, but the structure has been kept similar and the old URIs haven’t changed.

Of course, the new site is now valid XHTML 1.1 and CSS 2.0, free from layout tables and powered by XML.

In addition to classics (GNU/Linux, Apache, …), the site is powered by the new beta version of Orbeon PresentationServer.

This version has a lot of fancy stuff such as its Ajax-based XForms support (which I am not using here) and out-of-the-box support for XHTML (which wasn’t the case in previous versions).

I am using it because I like this product (that’s a good reason, isn’t it?) and also to create dynamic pages:

I send XHTML (as application/xhtml+xml) to browsers that announce they support it (and also to the W3C XHTML validator, which doesn’t send Accept headers; if you think that this is wrong, vote for this bug!) and HTML to the others (Konqueror appears to be in that list!).
Of course, I aggregate RSS 1.0 feeds (from XMLfr and from this blog) to display my latest articles and the XMLfr agenda.
More interestingly, I have developed a couple of new OPS generators to fetch from my mailbox the latest messages I have sent to public lists.
These generators use my TreeBind Java/XML API to read their config inputs.
And, of course, an XML/XSLT platform helps a lot to manage the i18n issues (the site is in English and French) and to add goodies such as a site map.
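
The media type selection described in the first item of this list can be sketched in a few lines. This is a minimal Python illustration and not the actual OPS pipeline used on the site. The function names and the `is_validator` flag are hypothetical, and how the validator is recognised is left out on purpose.

```python
from typing import Optional

XHTML = "application/xhtml+xml"
HTML = "text/html"


def accepts_xhtml(accept_header: str) -> bool:
    """Return True if the header lists application/xhtml+xml with q > 0."""
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if parts[0].lower() != XHTML:
            continue
        quality = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        return quality > 0
    return False


def choose_media_type(accept_header: Optional[str],
                      is_validator: bool = False) -> str:
    """Pick the media type to serve for one request."""
    if accept_header is None:
        # No Accept header at all: only the validator gets XHTML.
        return XHTML if is_validator else HTML
    return XHTML if accepts_xhtml(accept_header) else HTML


print(choose_media_type("text/html,application/xhtml+xml,application/xml;q=0.9"))
print(choose_media_type("text/html, */*"))
```

Note that a wildcard such as */* is deliberately not treated as an announcement of XHTML support, which matches the idea of sending XHTML only to clients that ask for it explicitly.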

That’s been fun, I should have done it before!

Next on my list should be to do the same with XMLfr…

When old good practices become bad

There are some people with whom you just can’t disagree in their domains of expertise.

These people are always precise and accurate, and when you read what one of them writes, you have the feeling that each word has been carefully weighed and is the most accurate that could have been chosen.

In XML land, names that come to mind in that category are (to name a few) James Clark, Rick Jelliffe, Murata Makoto, Jeni Tennison, David Carlisle, Uche Ogbuji and, of course, Michael Kay.

It is very rare that one can disagree with Michael Kay: his books appear to be 100% bulletproof, and it can seem unbelievable that Joris Gillis dared to write on the xsl-list:

You nearly gave me a heart attack when I encountered the following code in your – in all other aspects excellent – XSLT 2.0 book (3rd edition):…/…

You’ll have guessed that the reason why this happened is that the complaint was not related to XSLT skills. The code that followed is:

<xsl:variable name="table-heading">
        <tr>
                <td><b>Date</b></td>
                <td><b>Home Team</b></td>
                <td><b>Away Team</b></td>
                <td><b>Result</b></td>
        </tr>
</xsl:variable>

Michael Kay apologized:

I think it’s true to say that practices like this were commonplace five years ago when many of these examples were written – they are still commonplace today, but no longer regarded as good practice.

And the thread ended up as a discussion about common sense and good practices:

"Common sense" is after all by definition what the majority of people think at the time – it was common sense back then to use tables, it’s common sense now to avoid them…

This thinking itself is also common sense but still good food for thought: good practices of yesterday become poor practices and it’s always worth reconsidering our practices.

When I saw Derrick Story’s announcement of O’Reilly Network Homepage beta, I was quite sure that the publisher of Eric Meyer would have taken the opportunity to follow today’s good practices…

Guess what? The W3C HTML validator reports 40 errors on that page and I can’t disagree with that comment posted on their site:

Well. […] 2 different sites to allow for cookies, redirects that went nowhere and all I really wanted to say was "IT’S A TABLE-BASED LAYOUT!". Good grief.