Web page thumbnails

[Thumbnail of the XMLfr web site]

To brighten up the articles page of the http://dyomedea.com web site, I have added thumbnails made from screen captures.

To produce these thumbnails, I wanted to avoid the brute-force method of taking a screenshot and resizing it by hand with Gimp.

The process is hardly original, so I looked for tools that already did this; the only Open Source one I found was webthumb, a Perl script that chains the commands needed to launch Mozilla on an Xvfb server and take a screenshot of it.

For a reason I haven’t tried to investigate, webthumb doesn’t seem to run out of the box on my workstation (Ubuntu Hoary). Running the commands by hand to get the same result is easy enough, though.

In a first terminal, just launch Xvfb and the commands whose output you want to capture, for example:

vdv@grosbill:~ $ Xvfb :2 -screen 0 1024x768x24 -ac -fbdir /tmp/xvfb/ &
[1] 14006
vdv@grosbill:~ $ Could not init font path element /usr/X11R6/lib/X11/fonts/TTF/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/CID/, removing from list!

vdv@grosbill:~ $ export DISPLAY=:2
vdv@grosbill:~ $ firefox http://dyomedea.com
Could not init font path element /usr/X11R6/lib/X11/fonts/TTF/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/CID/, removing from list!
        

In a second terminal, you can then check the display with xwud and save it with xwdtopnm. To produce the captures above, I used the following sequences:

vdv@grosbill:~ $ xwud -in /tmp/xvfb/Xvfb_screen0
vdv@grosbill:~ $ xwdtopnm /tmp/xvfb/Xvfb_screen0| pnmscale -xysize 120 120 | pnmtojpeg -quality 95 > thumb.jpg
xwdtopnm: writing PPM file
vdv@grosbill:~ $ gimp thumb.jpg
*** attempt to put segment in horiz list twice
*** attempt to put segment in horiz list twice
            

Simple, isn’t it?
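Once the manual sequence works, it is easy enough to wrap it in a script. Here is a minimal Perl sketch of the same pipeline; it assumes Xvfb, firefox and the netpbm tools are installed and that the /tmp/xvfb/ directory exists, and the display number, geometry and delays are arbitrary choices of mine (a more robust version would poll for the page to be rendered instead of sleeping blindly):

    #!/usr/bin/perl
    # Capture a thumbnail of a web page through Xvfb (sketch).

    my $url = shift or die "usage: $0 <url>\n";

    $ENV{DISPLAY} = ':2';
    system('Xvfb :2 -screen 0 1024x768x24 -ac -fbdir /tmp/xvfb/ &');
    sleep 2;    # give the X server time to start
    system("firefox '$url' &");
    sleep 20;   # wait (blindly) for the page to render
    system('xwdtopnm /tmp/xvfb/Xvfb_screen0 | '
         . 'pnmscale -xysize 120 120 | '
         . 'pnmtojpeg -quality 95 > thumb.jpg');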

Notes:

  • The warning messages displayed above do not seem to matter.
  • The Debian/Ubuntu packages needed to run these commands are xvfb (virtual framebuffer X server), netpbm (graphics conversion tools) and, of course, Firefox.


Dyomedea.com is valid, at last!

There is a French dictum saying that cobblers are the worst shod (curiously, the English equivalent, « the shoemaker’s children are the worst shod », brings children into the picture).

After having spent years teaching my customers that they should follow the W3C recommendations, I have just finished applying that advice to my own corporate site, http://dyomedea.com/english/!

For those of you who would like to see the difference, the old one now belongs to web.archive.org.

The new site looks very different, but the structure has been kept similar and the old URIs haven’t changed.

The new site is, of course, valid XHTML 1.1 and CSS 2.0, free from layout tables and, as it should be, powered by XML.

In addition to the classics (GNU/Linux, Apache, …), the site is powered by the new beta version of Orbeon PresentationServer.

This version has a lot of fancy stuff such as its Ajax-based XForms support (which I am not using here) and out-of-the-box support for XHTML (which wasn’t the case in previous versions).

I am using it because I like this product (that’s a good reason, isn’t it?) and also to create dynamic pages, which has a few advantages even for a fairly static site such as this one:

  • I send XHTML (as « application/xhtml+xml ») only to browsers whose Accept headers announce that they support it (and also to the W3C XHTML validator, which doesn’t send Accept headers; if you think this is wrong, vote for this bug!) and HTML to the others (curiously, Konqueror appears to be in that list!); the negotiation logic is sketched just after this list.
  • Of course, I aggregate RSS 1.0 feeds (from XMLfr and from this blog) to display my latest articles and the XMLfr agenda.
  • More interestingly, I have developed a couple of new OPS generators that fetch from my mailbox the latest mails I have sent to public lists.
  • These generators use my TreeBind Java/XML API to read their configuration inputs.
  • And, of course, an XML/XSLT platform helps a lot to manage the i18n issues (the site is in English and French) and to add goodies such as a site map.
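As promised in the first point above, the content negotiation boils down to something like the following CGI-style Perl sketch. This is my own illustration, not OPS code, and a serious implementation would also honour the q-values of the Accept header:

    #!/usr/bin/perl
    # Choose the media type from the Accept header sent by the client.
    my $accept = $ENV{HTTP_ACCEPT} || '';

    if ($accept =~ m{application/xhtml\+xml}) {
        # The client explicitly accepts XHTML: serve the real thing.
        print "Content-Type: application/xhtml+xml\r\n\r\n";
    } else {
        # Everybody else (including Accept-less clients) gets HTML.
        print "Content-Type: text/html\r\n\r\n";
    }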

That’s been fun; I should have done it before!

Next on my list should be to do the same with XMLfr…

When old good practices become bad

There are some people with whom you just can’t disagree in their domains of expertise.

These people are always precise and accurate, and when you read what one of them writes, you have the feeling that each word has been carefully weighed and is the most accurate that could have been chosen.

In XML land, names that come to mind in that category are (to name a few) James Clark, Rick Jelliffe, Murata Makoto, Jeni Tennison, David Carlisle, Uche Ogbuji and, of course, Michael Kay.

It is quite exceptional to be able to disagree with Michael Kay: his books appear to be 100% bulletproof, and it may seem unbelievable that Joris Gillis dared to write on the xsl-list:

You nearly gave me a heart attack when I encountered the following code in your – in all other aspects excellent – XSLT 2.0 book (3rd edition):…/…

You’ll have guessed that the reason why this happened is that the complaint was not related to XSLT skills; the code that followed is:

<xsl:variable name="table-heading">
        <tr>
                <td><b>Date</b></td>
                <td><b>Home Team</b></td>
                <td><b>Away Team</b></td>
                <td><b>Result</b></td>
        </tr>
</xsl:variable>

Michael Kay apologized:

I think it’s true to say that practices like this were commonplace five years ago when many of these examples were written – they are still commonplace today, but no longer regarded as good practice.

And the thread ended up as a discussion about common sense and good practices:

« Common sense » is after all by definition what the majority of people think at the time – it was common sense back then to use tables, it’s common sense now to avoid them…

This remark is itself common sense, but it is still good food for thought: yesterday’s good practices become today’s poor practices, and it is always worth reconsidering our own.
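To make the shift concrete, here is how the heading quoted above might be written today; this is a hypothetical rewrite of mine, not one proposed on the list, using header cells instead of presentational <b> elements and leaving the bold face to the browser or to a CSS rule such as « th { text-align: left } »:

    <xsl:variable name="table-heading">
            <tr>
                    <th>Date</th>
                    <th>Home Team</th>
                    <th>Away Team</th>
                    <th>Result</th>
            </tr>
    </xsl:variable>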

When I saw Derrick Story’s announcement of O’Reilly Network Homepage beta, I was quite sure that the publisher of Eric Meyer would have taken the opportunity to follow today’s good practices…

Guess what? The W3C HTML validator reports 40 errors on that page and I can’t disagree with that comment posted on their site:

Well. […] 2 different sites to allow for cookies, redirects that went nowhere and all I really wanted to say was « IT’S A TABLE-BASED LAYOUT! ». Good grief.

The transform source effect

Why is exposing a document model so important? Why would that be better than providing import/export capabilities or API access to the document model?

The « view source effect » is often considered one of the reasons why XML is so important: people can learn simply by opening existing documents, and copy/paste the stuff they like into their own documents.

Following this analysis, the view source effect would be one of the main reasons for the success of the web: you can learn just by looking at the source of the pages you consider good examples.

The view source effect is important indeed, but to take it to its full potential, copy/paste needs to be automated and the view source effect needs to become the « transform source effect ».

The ability to transform sources means that you don’t need to fully understand what’s going on to take advantage of markup language formats: you can just do text substitution on a template.

The web is full of examples of the power of the transform source effect: the various templating languages such as PHP, ASP, JSP and many more are nothing more than implementations of the transform source effect.

The « style free stylesheets » that power XMLfr, and that I have described in an XML.com article, are another example of the transform source effect.

How does that relate to desktop publishing formats? Let’s take a simple example to illustrate that point.

Let’s say I am a programmer and I need to deliver an application that takes letter templates and prints them after substituting the names and addresses.

Let’s also imagine that I do not know anything about the specifics of the different word processors and that I want my application to be portable across Microsoft Word, OpenOffice, WordPerfect, AbiWord and Scribus.

Finally, let’s say, for the fun of it, that I do not know anything about XML but that I am an XXX programmer (substitute XXX with whatever programming language you like).

Because all these word processors can read and write their documents as XML, I’ll just write a simple program that substitutes predefined string values included in the documents (let’s call them $name, $address, …) with the contents of variables that I could retrieve, for instance, from a database.

I am sure that you know how to do that with your favorite programming language! In Perl for instance, that would be something like:

#!/usr/bin/perl

$name = 'Mr. Eric van der Vlist';
$address = '22, rue Edgar Faure';
$postcode = 'F75015';
$city = 'Paris';
$country = 'France';

while (<>) {
    # Replace each $name, $address, ... placeholder with the value
    # of the Perl variable of the same name (a symbolic reference).
    s/\$(name|address|postcode|city|country)/${$1}/g;
    print;
}

There is no magic here: I am just replacing occurrences of the string « $name » in the text with the content of the variable $name, which contains « Mr. Eric van der Vlist », occurrences of « $address » with « 22, rue Edgar Faure » and so on, in a plain text document.
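To try this on a real document, say a hypothetical letter.sxw, you only need to get at the XML: OpenOffice 1.x files are zip archives whose main content lives in a file named content.xml, so something like the following should do (the file names, including substitute.pl for the script above, are made up):

    $ unzip letter.sxw content.xml
    $ perl substitute.pl content.xml > content-new.xml

The substituted content.xml can then be zipped back into the document.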

I am leveraging the « transform source effect » to write a simple application that is compatible with any application that enables this effect by exposing its model as plain text.

This application will work with Microsoft Word (using WordML and probably even RTF), OpenOffice, WordPerfect, AbiWord, Scribus and many more.

It will also work with HTML, XHTML, XSL-FO, SVG, DocBook, TEI, plain text, TeX, …

It will work with Quark, but only if we use QXML as an XML format and not as an API.

And it won’t work with InDesign unless there is a way to import/export full InDesign documents in XML…


InDesign and XML: no better than its competition

Someone who had read my blog entry about Quark and XML asked me whether I knew if Adobe had followed the same principles for the XML support in InDesign.

I am not a specialist in this product range and I had no answer to this question.

Anyway, some quick research on Adobe’s web site and on Google makes me think that, even though I find it disappointing that Quark (like so many others) confuses markup languages and APIs, InDesign hasn’t even reached that point yet.

Adobe’s web site describes InDesign’s flexible XML support as:

Enhanced XML import: Import XML files with flexible control using the Structure view and Tags palette. Automatically flow XML into tagged templates or import and interactively place it. Enhanced options give you greater import control and make it easier to achieve the results you want.

Linked XML files: Create a link to an XML file on import, so you can easily update your placed XML content whenever the source XML content is updated.

XML automation: Automatically format XML on import by mapping XML tags to paragraph and character styles in your document, or more easily reuse existing content by mapping text styles to XML tags and then exporting the XML content.

XML table tagging: Easily apply XML tags to InDesign tables. Then import XML content into the tables or export it from them.

These are useful features, detailed in an XML.com article, but they do not expose the complete document model in XML, neither directly nor even through an XML schema of the DOM.

That might be the reason why someone who describes himself as a die-hard InDesign fan commented on this article to say that Adobe InDesign is behind QuarkXPress in terms of XML features.

This comment was written in August 2004 but, as far as I can see on the web, it is still the case today.


An unconventional XML naming convention

I am not a big fan of naming conventions, but I like even less being obliged to follow naming conventions that do not seem to make sense!

One of the issues added by W3C XML Schema is that, in addition to defining names for elements and attributes, you often also have to define names for simple and complex types.

Even though the W3C XML Schema recommendation says that elements, attributes, types, element groups and attribute groups have separate symbol spaces, many people want a means to differentiate these spaces just by looking at the names, and end up using all kinds of verbose suffixes.

The other issue is of course to define which character set and capitalization methods should be used.

It happens that the conventions most of my customers have to follow are the UN/CEFACT XML Naming and Design Rules Version 1.1 (PDF).

Following ebXML and UBL, they state that:

Following the ebXML Architecture Specification and commonly used best practice, Lower Camel Case (LCC) is used for naming attributes and Upper Camel Case (UCC) is used for naming elements and types. Lower Camel Case capitalizes the first character of each word except the first word and compounds the name. Upper Camel Case capitalizes the first character of each word and compounds the name.

I think that these rules do not make sense for a couple of reasons:

  1. There are many circumstances where elements and attributes are interchangeable, and many vocabularies try to minimize the differences in treatment between them. On the contrary, elements and attributes on one hand and types on the other hand are very different kinds of beasts: elements and attributes are physical notions that are visible in instance documents, while types are abstract notions that belong to schemas.
  2. This convention is not consistent with the UML naming conventions defined in the ebXML Technical Architecture Specification, which says that « Class, Interface, Association, Package, State, Use Case, Actor names SHALL use UCC convention (examples: ClassificationNode, Versionable, Active, InsertOrder, Buyer). Attribute, Operation, Role, Stereotype, Instance, Event, Action names SHALL use LCC convention (examples: name, notifySender, resident, orderArrived). » XML elements and attributes are similar to UML object instances, while types are similar to UML classes, and they should follow similar naming conventions.

My preferred naming convention for XML schemas (and the one I am going to follow in the future for projects that are not tied to other conventions) is to use LCC for element and attribute names and UCC for type and group names (or RELAX NG named patterns).

Sticking to this rule gives consistency with the object-oriented world and allows me to get rid of the suffixes used to distinguish between what can be seen in instance documents (elements and attributes) and what belongs to schemas (types, groups or RELAX NG named patterns).
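On a made-up purchase order vocabulary, and assuming the usual xs prefix bound to the W3C XML Schema namespace and an Address type defined elsewhere, the convention looks like this:

    <!-- What shows up in instance documents: Lower Camel Case -->
    <xs:element name="purchaseOrder" type="PurchaseOrder"/>

    <!-- What stays in the schema: Upper Camel Case, no « Type » suffix -->
    <xs:complexType name="PurchaseOrder">
        <xs:sequence>
            <xs:element name="shippingAddress" type="Address"/>
            <xs:element name="billingAddress" type="Address"/>
        </xs:sequence>
        <xs:attribute name="orderDate" type="xs:date"/>
    </xs:complexType>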

Quark’s desperate attempt to keep XML under control

Yesterday, I had the opportunity to read more carefully the press release made back in January 2005 to announce QuarkXPress Markup Language (QXML).

My first guess, before clicking on the link, was that QXML would be an XML vocabulary.

Wrong guess!

QXML appears to be an XML schema of the World Wide Web Consortium (W3C) Document Object Model (DOM).

Although the press release doesn’t give a definition of this term (new to me), its benefits are detailed:

With QXML, the new DOM schema for QuarkXPress, developers can dynamically access and update the content, structure, and style of a QuarkXPress project using a DOM interface. XTensions modules can be more versatile because they can use a project’s complete content, including all formatting, style sheets, hyphenation, and justification specifications.

A PDF white paper further explains:

One of the goals of an open-standard DOM is to produce common methods for accessing, modifying, creating, and deleting content within a DOM schema. If you are familiar with one particular DOM, understanding and working with another DOM is easy because you are already familiar with the common methods and metaphors applicable to all DOMs. This commonality gives the DOM a distinct advantage over traditional C/C++ application programming interfaces (APIs).

I did quite a lot of searching, both on Quark’s web site and on the Internet, and did not see any reference to the possibility of directly using the XML documents, nor any description of the XML vocabulary itself.

To me, it looks like Quark is just missing the point of XML.

XML is about letting anyone read documents in a text editor and write them through print statements…
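This is not a figure of speech: the following (deliberately silly) Perl lines of mine output a perfectly well-formed XML document, with no API or toolkit in sight:

    #!/usr/bin/perl
    # XML through plain print statements.
    print "<?xml version='1.0' encoding='UTF-8'?>\n";
    print "<letter><body>Please document your format!</body></letter>\n";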

Whether you like it or not, you just can’t publish documents in XML and still constrain developers to use your own API, only available to your official partners on your company web site and working only on the platforms and languages that you support!

And since you can’t prevent people from using your XML format directly, you might as well document it…


Good old entities

There is a tendency among XML gurus to deprecate everything in the XML recommendation that is not an element or an attribute. XML constructions such as comments or processing instructions have been deprecated de facto by specifications such as W3C XML Schema, which have reinvented their own element-based replacements.

Many people also think that DTDs are an archaism that should be removed from the spec.

Lacking an SGML culture, I am not a big fan of DTDs, but there are cases where they can be very useful.

I came upon one of these cases this afternoon while implementing multi-level grouping in XSLT 1.0 following the so-called Muenchian method.
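For readers who have never met it, the Muenchian method uses keys to emulate the grouping that XSLT 1.0 lacks. A minimal one-level example of mine, grouping books by author, looks like this:

    <xsl:key name="by-author" match="book" use="author"/>

    <xsl:template match="library">
        <!-- A book is the first of its group when the union of itself and
             the first node returned by its key contains a single node. -->
        <xsl:for-each select="book[count(. | key('by-author', author)[1]) = 1]">
            <xsl:value-of select="author"/>
            <!-- the whole group is available as key('by-author', author) -->
        </xsl:for-each>
    </xsl:template>

With one grouping level, this is manageable; nested levels multiply the concat() calls, as the snippet below shows.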

At the fourth level, I ended up with XPath expressions that looked like these:

        <xsl:when test="key('path4', concat(@name, '¤', ../path/step[1], '¤', ../path/step[2], '¤',
          ../path/step[3], '¤', ../path/step[4]))[../path/step[5]] ">
          <xs:complexType>
            <xsl:if test="key('path4', concat(@name, '¤', ../path/step[1], '¤', ../path/step[2],
              '¤', ../path/step[3], '¤', ../path/step[4]))[../path/step[5][starts-with(., '@')]]">
              <xsl:attribute name="mixed">true</xsl:attribute>
            </xsl:if>
            <xs:sequence>
              <xsl:apply-templates select="key('path4', concat(@name, '¤', ../path/step[1], '¤',
                ../path/step[2], '¤', ../path/step[3], '¤', ../path/step[4]))[ count( . |
                key('path5', concat(@name, '¤', ../path/step[1], '¤', ../path/step[2], '¤',
                ../path/step[3], '¤', ../path/step[4], '¤', ../path/step[5]))[1]  )
                = 1 ]" mode="path5"/>
            </xs:sequence>
            <xsl:apply-templates select="key('path4', concat(@name, '¤', ../path/step[1], '¤',
              ../path/step[2], '¤', ../path/step[3], '¤', ../path/step[4]))[ count( . | key('path5',
              concat(@name, '¤', ../path/step[1], '¤', ../path/step[2], '¤', ../path/step[3], '¤',
              ../path/step[4], '¤', ../path/step[5]))[1]  ) = 1 ]"
              mode="path5Attributes"/>
          </xs:complexType>
        </xsl:when>

Isn’t it cute?

If you have to write such repetitive expressions, XML entities are your friends. Just write:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY path1 "concat(@name, '¤', ../path/step[1])">
<!ENTITY path2 "concat(@name, '¤', ../path/step[1], '¤', ../path/step[2])">
<!ENTITY path3 "concat(@name, '¤', ../path/step[1], '¤', ../path/step[2], '¤', ../path/step[3])">
<!ENTITY path4 "concat(@name, '¤', ../path/step[1], '¤', ../path/step[2], '¤', ../path/step[3], '¤', ../path/step[4])">
<!ENTITY path5 "concat(@name, '¤', ../path/step[1], '¤', ../path/step[2], '¤', ../path/step[3], '¤', ../path/step[4], '¤', ../path/step[5])">
<!ENTITY kCase "key('case', @name)">
<!ENTITY kPath1 "key('path1', &path1;)">
<!ENTITY kPath2 "key('path2', &path2;)">
<!ENTITY kPath3 "key('path3', &path3;)">
<!ENTITY kPath4 "key('path4', &path4;)">
<!ENTITY kPath5 "key('path5', &path5;)">

]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    version="1.0">
    <xsl:import href="excel2model.xsl"/>
    <xsl:output method="xml" indent="yes"/>
    <xsl:key name="case" match="case" use="@name"/>
    <xsl:key name="path1" match="case" use="&path1;"/>
    <xsl:key name="path2" match="case[../path/step[2]]" use="&path2;"/>
    <xsl:key name="path3" match="case[../path/step[3]]" use="&path3;"/>
    <xsl:key name="path4" match="case[../path/step[4]]" use="&path4;"/>
    <xsl:key name="path5" match="case[../path/step[5]]" use="&path5;"/>
            ...

And you’ll be able to simplify the previous snippet to:

                <xsl:when test="&kPath4;[../path/step[5]] ">
                    <xs:complexType>
                        <xsl:if test="&kPath4;[../path/step[5][starts-with(., '@')]]">
                            <xsl:attribute name="mixed">true</xsl:attribute>
                        </xsl:if>
                        <xs:sequence>
                            <xsl:apply-templates select="&kPath4;[ count( . | &kPath5;[1] = 1 ]"
                                  mode="path5"/>
                        </xs:sequence>
                        <xsl:apply-templates select="&kPath4;[ count( . | &kPath5;[1]  ) = 1]"
                                  mode="path5Attributes"/>
                    </xs:complexType>
            </xsl:when>

Doesn’t that look better?

Normalizing Excel’s SpreadsheetML using XSLT

Spreadsheet tables are full of holes and spreadsheet processors such as OpenOffice or Excel have implemented hacks to avoid having to store empty cells.

In the case of Excel, that’s done using ss:Index and ss:MergeAcross attributes.

While these attributes are easy enough to understand, they add a great deal of complexity to XSLT transformations that need to access a specific cell, since you can no longer index your target directly.
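On a made-up row, the compression looks like this:

    <Row>
        <Cell><Data ss:Type="String">a</Data></Cell>
        <Cell ss:Index="4"><Data ss:Type="String">b</Data></Cell>
        <Cell><Data ss:Type="String">c</Data></Cell>
    </Row>

and what we want after normalization is an equivalent row where every cell carries its column explicitly:

    <Row>
        <Cell ss:Index="1"><Data ss:Type="String">a</Data></Cell>
        <Cell ss:Index="4"><Data ss:Type="String">b</Data></Cell>
        <Cell ss:Index="5"><Data ss:Type="String">c</Data></Cell>
    </Row>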

The traditional way to work around this kind of issue is to pre-process the spreadsheet document to get an intermediate result that lets you index your target cells.

Having already encountered this issue with OpenOffice, I needed something to do the same with Excel, when Google led me to a blog entry proposing something similar.

The transformation needed some adaptation to be usable the way I wanted, i.e. as a transformation that does not modify the SpreadsheetML document except for adding an ss:Index attribute to every cell.

Here is the result of this adaptation:

This version is buggy. An updated one is available here

<?xml version="1.0"?>
<!--

Adapted from http://ewbi.blogs.com/develops/2004/12/normalize_excel.html

This product may incorporate intellectual property owned by Microsoft Corporation. The terms
and conditions upon which Microsoft is licensing such intellectual property may be found at
http://msdn.microsoft.com/library/en-us/odcXMLRef/html/odcXMLRefLegalNotice.asp.
-->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="urn:schemas-microsoft-com:office:spreadsheet"
    xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
    <xsl:output method="xml" indent="no" encoding="UTF-8"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="ss:Cell/@ss:Index"/>
    <xsl:template match="ss:Cell">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <xsl:variable name="prevCells" select="preceding-sibling::ss:Cell"/>
            <xsl:attribute name="ss:Index">
                <xsl:choose>
                    <xsl:when test="@ss:Index">
                        <xsl:value-of select="@ss:Index"/>
                    </xsl:when>
                    <xsl:when test="count($prevCells) = 0">
                        <xsl:value-of select="1"/>
                    </xsl:when>
                    <xsl:when test="count($prevCells[@ss:Index]) > 0">
                        <xsl:value-of select="($prevCells[@ss:Index][1]/@ss:Index) +
                            ((count($prevCells) + 1) -
                            (count($prevCells[@ss:Index][1]/preceding-sibling::ss:Cell)
                            + 1)) + sum($prevCells/@ss:MergeAcross)"/>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:value-of select="count($prevCells) + 1 +
                            sum($prevCells/@ss:MergeAcross)"/>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:attribute>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
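Any XSLT 1.0 processor can apply it; with xsltproc, for instance, and assuming the stylesheet is saved as normalize.xsl (a name I am making up):

    $ xsltproc normalize.xsl spreadsheet.xml > normalized.xml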
            
