Dyomedea.com is valid, at last!

There is a French dictum that says that cobblers are the worst shod (curiously, the English equivalent, « the shoemaker’s children are the worst shod », brings children into the picture).

After having spent years telling my customers that they should follow the W3C recommendations, I have finally applied those principles to my own corporate site, http://dyomedea.com/english/!

For those of you who would like to see the difference, the old version now belongs to web.archive.org.

The new site looks very different, but the structure has been kept similar and the old URIs haven’t changed.

Of course, the new site is valid XHTML 1.1 and CSS 2.0, free from layout tables and, as it should be, powered by XML.

In addition to the classics (GNU/Linux, Apache, …), the site is powered by the new beta version of Orbeon PresentationServer.

This version brings a lot of fancy features such as Ajax-based XForms support (which I am not using here) and out-of-the-box support for XHTML (previous versions had to be tweaked to generate valid XHTML).

I am using this product (somewhat overpowered for the needs of this site) because I like it (that’s a good reason, isn’t it?) and also to generate dynamic pages, which has some benefits even for a relatively static site such as this one:

  • I send XHTML (as application/xhtml+xml) to browsers whose Accept headers announce that they support it (and also to the W3C XHTML validator, which doesn’t send Accept headers; if you think this is wrong, vote for this bug!) and HTML to the others (curiously, Konqueror, which shouldn’t be, appears to be in that list!); a sketch of this negotiation follows the list.
  • Of course, I aggregate RSS 1.0 feeds (from XMLfr and from this blog) to display my latest articles and the XMLfr agenda.
  • More interestingly, I have developed a couple of new OPS generators that fetch from my mailbox the latest mails I have sent to public lists.
  • These generators use my TreeBind Java/XML API to read their configuration inputs.
  • And, of course, an XML/XSLT platform helps a lot in managing i18n issues (the site is in English and French) and in adding goodies such as a site map.
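
Here is a minimal sketch of that content negotiation, written as a plain Java servlet: it only illustrates the principle (the real site does this inside an OPS pipeline, not a hand-written servlet) and it ignores Accept header q-values:

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class NegotiatingServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        // Send application/xhtml+xml only to user agents whose Accept
        // header announces that they support it.
        String accept = request.getHeader("Accept");
        if (accept != null && accept.indexOf("application/xhtml+xml") >= 0) {
            response.setContentType("application/xhtml+xml");
        } else {
            // Browsers that do not announce XHTML support get plain HTML.
            response.setContentType("text/html");
        }
        // ... serialize the page accordingly here ...
    }
}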

That’s been fun; I should have done it before!

Next on my list should be to do the same with XMLfr…

When old good practices become bad

There are some people with whom you just can’t disagree in their domains of expertise.

These people are always precise and accurate, and when you read what one of them writes, you get the feeling that each of his words has been carefully weighed and is the most accurate that could have been chosen.

In XML land, names that come to mind in that category are (to name a few) James Clark, Rick Jelliffe, Murata Makoto, Jeni Tennison, David Carlisle, Uche Ogbuji and, of course, Michael Kay.

It is quite exceptional that one can disagree with Michael Kay: his books appear to be 100% bulletproof, and it can seem unbelievable that Joris Gillis could dare to write on the xsl-list:

You nearly gave me a heart attack when I encountered the following code in your – in all other aspects excellent – XSLT 2.0 book (3rd edition):…/…

You’ll have guessed that the reason why this happened is that the complaint was not related to XSLT skills; the code that followed is:

<xsl:variable name="table-heading">
        <tr>
                <td><b>Date</b></td>
                <td><b>Home Team</b></td>
                <td><b>Away Team</b></td>
                <td><b>Result</b></td>
        </tr>
</xsl:variable>

Michael Kay apologized:

I think it’s true to say that practices like this were commonplace five years ago when many of these examples were written – they are still commonplace today, but no longer regarded as good practice.

And the thread ended up as a discussion about common sense and good practices:

« Common sense » is after all by definition what the majority of people think at the time – it was common sense back then to use tables, it’s common sense now to avoid them…

This thinking is itself common sense but still good food for thought: yesterday’s good practices become today’s poor practices, and it’s always worth reconsidering our practices.

When I saw Derrick Story’s announcement of the O’Reilly Network Homepage beta, I was quite sure that Eric Meyer’s publisher would have taken the opportunity to follow today’s good practices…

Guess what? The W3C HTML validator reports 40 errors on that page and I can’t disagree with that comment posted on their site:

Well. […] 2 different sites to allow for cookies, redirects that went nowhere and all I really wanted to say was « IT’S A TABLE-BASED LAYOUT! ». Good grief.

The transform source effect

Why is exposing a document model so important? Why would that be better than providing import/export capabilities or API access to the document model?

The « view source effect » is often considered one of the reasons why XML is so important: people can learn just by opening existing documents and copy/pasting the stuff they like into their own documents.

Following this analysis, the view source effect would be one of the main reasons for the success of the web: you can learn just by looking at the source of the pages you consider good examples.

The view source effect is important indeed, but to reach its full potential, copy/paste needs to be automated and the view source effect to become the « transform source effect ».

The ability to transform sources means that you don’t need to fully understand what’s going on to take advantage of markup language formats: you can just do text substitution on a template.

The web is full of examples of the power of the transform source effect: the various templating languages such as PHP, ASP, JSP and many more are nothing more than implementations of the transform source effect.

The « style free stylesheets » which power XMLfr and that I have described in an XML.com article are another example of the transform source effect.
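
To give an idea of how this works, here is a minimal sketch of a style free stylesheet (the placeholder namespace, file name and document structure are made up for the example, not those of the XMLfr stylesheets): the XHTML page itself is the source document, an identity template copies it untouched, and placeholder elements are replaced with editorial content.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ph="http://example.org/placeholder">
    <!-- Copy the XHTML template as-is... -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <!-- ...except for the placeholders, which are replaced by content
         pulled from a separate document. -->
    <xsl:template match="ph:article-title">
        <xsl:value-of select="document('content.xml')/article/title"/>
    </xsl:template>
</xsl:stylesheet>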

How does that relate to desktop publishing formats? Let’s take a simple example to illustrate that point.

Let’s say I am a programmer and I need to deliver an application that takes model letters and prints them after having changed the names and addresses.

Let’s also imagine that I do not know anything of the specifics of the different word processors and that I want my application to be portable across Microsoft Word, Open Office, WordPerfect, AbiWord and Scribus.

Finally, let’s say, for the fun of it, that I do not know anything about XML but that I am an XXX programmer (substitute XXX with whatever programming language you like).

Because all these word processors can read and write their documents as XML, I’ll just write a simple program that substitutes predefined string values embedded in the documents (let’s call them $name, $address, …) with the contents of variables that I could retrieve, for instance, from a database.

I am sure that you know how to do that with your favorite programming language! In Perl for instance, that would be something like:

#!/usr/bin/perl

$name = 'Mr. Eric van der Vlist';
$address = '22, rue Edgar Faure';
$postcode = 'F75015';
$city = 'Paris';
$country = 'France';

while (<>) {
    # Replace each $name, $address, ... placeholder with the value of the
    # variable of the same name (a symbolic reference, hence no "use strict").
    s/\$(name|address|postcode|city|country)/${$1}/g;
    print;
}

There is no magic here: I am just replacing, in a plain text document, occurrences of the string « $name » with the contents of the variable $name (« Mr. Eric van der Vlist »), occurrences of « $address » with « 22, rue Edgar Faure » and so on.

I am leveraging the « transform source effect » to write a simple application that is compatible with any application that enables this effect by exposing its model as plain text.

This application will work with Microsoft Word (using WordML and probably even RTF), OpenOffice, WordPerfect, AbiWord, Scribus and many more.

It will also work with HTML, XHTML, XSL-FO, SVG, DocBook, TEI, plain text, TeX, …

It will work with Quark, but only if we use QXML as an XML format and not as an API.

And it won’t work with InDesign unless there is a way to import/export full InDesign documents in XML…


InDesign and XML: no better than its competition

Someone who had read my blog entry about Quark and XML asked me whether I knew if Adobe had followed the same principles for XML support in InDesign.

I am not a specialist in this product range and I had no answer to this question.

Anyway, some quick research on Adobe’s web site and on Google makes me think that, even though I find it disappointing that Quark (like so many others) confuses markup languages and APIs, InDesign hasn’t even reached that point yet.

Adobe’s web site describes InDesign’s flexible XML support as:

Enhanced XML import: Import XML files with flexible control using the Structure view and Tags palette. Automatically flow XML into tagged templates or import and interactively place it. Enhanced options give you greater import control and make it easier to achieve the results you want.

Linked XML files: Create a link to an XML file on import, so you can easily update your placed XML content whenever the source XML content is updated.

XML automation: Automatically format XML on import by mapping XML tags to paragraph and character styles in your document, or more easily reuse existing content by mapping text styles to XML tags and then exporting the XML content.

XML table tagging: Easily apply XML tags to InDesign tables. Then import XML content into the tables or export it from them.

These are useful features, detailed in an XML.com article, but they do not expose the complete document model in XML, either directly or even through an XML schema of the DOM.

That might be the reason why someone who describes himself as a die-hard InDesign fan commented on this article to say that Adobe InDesign is behind QuarkXPress in terms of XML features.

That comment was written in August 2004 but, as far as I can see on the web, this is still the case today.


An unconventional XML naming convention

I am not a big fan of naming conventions but I don’t like to be obliged to follow naming conventions that do not seem to make sense!

One of the issues added by W3C XML Schema is that, in addition to defining names for elements and attributes, you often have to define names for simple and complex types.

Even though the W3C XML Schema recommendation says that elements, attributes, types, element groups and attribute groups have separate symbol spaces, many people want a way to differentiate these spaces just by looking at the names, and end up using all kinds of verbose suffixes.

The other issue is of course to define which character set and capitalization methods should be used.

It happens that the conventions most of my customers have to follow are the UN/CEFACT XML Naming and Design Rules Version 1.1 (PDF).

Following ebXML and UBL, they state that:

Following the ebXML Architecture Specification and commonly used best practice, Lower Camel Case (LCC) is used for naming attributes and Upper Camel Case (UCC) is used for naming elements and types. Lower Camel Case capitalizes the first character of each word except the first word and compounds the name. Upper Camel Case capitalizes the first character of each word and compounds the name.

I think that these rules do not make sense for a couple of reasons:

  1. There are many circumstances where elements and attributes are interchangeable, and many vocabularies try to minimize the differences of treatment between elements and attributes. On the contrary, elements and attributes on one hand and types on the other hand are very different kinds of beasts: elements and attributes are physical notions that are visible in instance documents, while types are abstract notions that belong to schemas.
  2. This convention is not coherent with the UML naming conventions defined in the ebXML Technical Architecture Specification, which says that Class, Interface, Association, Package, State, Use Case and Actor names SHALL use the UCC convention (examples: ClassificationNode, Versionable, Active, InsertOrder, Buyer) while Attribute, Operation, Role, Stereotype, Instance, Event and Action names SHALL use the LCC convention (examples: name, notifySender, resident, orderArrived). XML elements and attributes are similar to UML object instances, while types are similar to UML classes, and they should follow similar naming conventions.

My preferred naming convention for XML schemas (and the one I am going to follow in the future for projects that are not tied to other conventions) is to use LCC for element and attribute names and UCC for type and group names (or RELAX NG named patterns).

Sticking to this rule will give consistency with the Object Oriented world and allow me to get rid of suffixes to distinguish between what can be seen in the instance documents (elements and attributes) and what belongs to schemas (types, groups or RELAX NG named patterns).
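
As a minimal illustration (the names here are made up), a W3C XML Schema following this convention would look like:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- UCC for the type name... -->
    <xs:complexType name="PostalAddress">
        <xs:sequence>
            <!-- ...LCC for element names... -->
            <xs:element name="streetName" type="xs:string"/>
            <xs:element name="cityName" type="xs:string"/>
        </xs:sequence>
        <!-- ...and LCC for attribute names. -->
        <xs:attribute name="countryCode" type="xs:string"/>
    </xs:complexType>
    <xs:element name="postalAddress" type="PostalAddress"/>
</xs:schema>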

Quark’s desperate attempt to keep XML under control

Yesterday, I had the opportunity to read more carefully the press release made back in January 2005 to announce QuarkXPress Markup Language (QXML).

My first guess, before clicking on the link, was that QXML would be an XML vocabulary.

Wrong guess!

QXML appears to be an XML schema of the World Wide Web Consortium (W3C) Document Object Model (DOM).

Although the press release doesn’t give a definition of this term (new to me), its benefits are detailed:

With QXML, the new DOM schema for QuarkXPress, developers can dynamically access and update the content, structure, and style of a QuarkXPress project using a DOM interface. XTensions modules can be more versatile because they can use a project’s complete content, including all formatting, style sheets, hyphenation, and justification specifications.

A PDF white paper further explains:

One of the goals of an open-standard DOM is to produce common methods for accessing, modifying, creating, and deleting content within a DOM schema. If you are familiar with one particular DOM, understanding and working with another DOM is easy because you are already familiar with the common methods and metaphors applicable to all DOMs. This commonality gives the DOM a distinct advantage over traditional C/C++ application programming interfaces (APIs).

I did quite a lot of searching, both on Quark’s web site and on the Internet, and did not see any reference to the possibility of directly using the XML documents or any description of the XML vocabulary itself.

To me, it looks like Quark is just missing the point of XML.

XML is about letting anyone read documents in a text editor and write them through print statements…

Whether you like it or not, you just can’t publish documents in XML and still constrain developers to use your own API, only available to your official partners on your company web site and working only on the platforms and languages that you support!

And since you can’t prevent people from using your XML format directly, you’d better document it…


Good old entities

There is a tendency, among XML gurus, to deprecate everything in the XML recommendation that is not an element or an attribute, and XML constructs such as comments or processing instructions have been de facto deprecated by specifications such as W3C XML Schema, which have reinvented their own element-based replacements.

Many people also think that DTDs are an archaism that should be removed from the spec.

Lacking an SGML culture, I am not a big fan of DTDs, but there are cases where they can be very useful.

I came upon one of these cases this afternoon while implementing multi-level grouping in XSLT 1.0 following the so-called Muenchian method.
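
As a reminder, the Muenchian method emulates grouping in XSLT 1.0 by indexing nodes with a key and keeping, for each group, only the first node returned by that key. A minimal single-level sketch (with made-up names) looks like this:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:key name="products-by-category" match="product" use="@category"/>
    <xsl:template match="catalog">
        <groups>
            <!-- Keep only the first product of each category... -->
            <xsl:for-each select="product[count(. | key('products-by-category', @category)[1]) = 1]">
                <group category="{@category}">
                    <!-- ...then retrieve the whole group through the key. -->
                    <xsl:copy-of select="key('products-by-category', @category)"/>
                </group>
            </xsl:for-each>
        </groups>
    </xsl:template>
</xsl:stylesheet>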

At the fourth level, I ended up with XPath expressions that looked like this:

        <xsl:when test="key('path4', concat(@name, '¤', ../path/step[1], '¤', ../path/step[2], '¤',
          ../path/step[3], '¤', ../path/step[4]))[../path/step[5]] ">
          <xs:complexType>
            <xsl:if test="key('path4', concat(@name, '¤', ../path/step[1], '¤', ../path/step[2],
              '¤', ../path/step[3], '¤', ../path/step[4]))[../path/step[5][starts-with(., '@')]]">
              <xsl:attribute name="mixed">true</xsl:attribute>
            </xsl:if>
            <xs:sequence>
              <xsl:apply-templates select="key('path4', concat(@name, '¤', ../path/step[1], '¤',
                ../path/step[2], '¤', ../path/step[3], '¤', ../path/step[4]))[ count( . |
                key('path5', concat(@name, '¤', ../path/step[1], '¤', ../path/step[2], '¤',
                ../path/step[3], '¤', ../path/step[4], '¤', ../path/step[4]))[1]  )
                = 1 ]" mode="path5"/>
            </xs:sequence>
            <xsl:apply-templates select="key('path4', concat(@name, '¤', ../path/step[1], '¤',
              ../path/step[2], '¤', ../path/step[3], '¤', ../path/step[4]))[ count( . | key('path5',
              concat(@name, '¤', ../path/step[1], '¤', ../path/step[2], '¤', ../path/step[3], '¤',
              ../path/step[4], '¤', ../path/step[4]))[1]  ) = 1                             ]"
              mode="path5Attributes"/>
          </xs:complexType>
        </xsl:when>

Isn’t it cute?

If you have to write such repetitive expressions, XML entities are your friends. Just write:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY path1 "concat(@name, '¤', ../path/step[1])">
<!ENTITY path2 "concat(@name, '¤', ../path/step[1], '¤', ../path/step[2])">
<!ENTITY path3 "concat(@name, '¤', ../path/step[1], '¤', ../path/step[2], '¤', ../path/step[3])">
<!ENTITY path4 "concat(@name, '¤', ../path/step[1], '¤', ../path/step[2], '¤', ../path/step[3], '¤', ../path/step[4])">
<!ENTITY path5 "concat(@name, '¤', ../path/step[1], '¤', ../path/step[2], '¤', ../path/step[3], '¤', ../path/step[4], '¤', ../path/step[4])">
<!ENTITY kCase "key('case', @name)">
<!ENTITY kPath1 "key('path1', &path1;)">
<!ENTITY kPath2 "key('path2', &path2;)">
<!ENTITY kPath3 "key('path3', &path3;)">
<!ENTITY kPath4 "key('path4', &path4;)">
<!ENTITY kPath5 "key('path5', &path5;)">

]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    version="1.0">
    <xsl:import href="excel2model.xsl"/>
    <xsl:output media-type="xml" indent="yes"/>
    <xsl:key name="case" match="case" use="@name"/>
    <xsl:key name="path1" match="case" use="&path1;"/>
    <xsl:key name="path2" match="case[../path/step[2]]" use="&path2;"/>
    <xsl:key name="path3" match="case[../path/step[3]]" use="&path3;"/>
    <xsl:key name="path4" match="case[../path/step[4]]" use="&path4;"/>
    <xsl:key name="path5" match="case[../path/step[5]]" use="&path5;"/>
            ...

And you’ll be able to simplify the previous snippet to:

                <xsl:when test="&kPath4;[../path/step[5]] ">
                    <xs:complexType>
                        <xsl:if test="&kPath4;[../path/step[5][starts-with(., '@')]]">
                            <xsl:attribute name="mixed">true</xsl:attribute>
                        </xsl:if>
                        <xs:sequence>
                            <xsl:apply-templates select="&kPath4;[ count( . | &kPath5;[1] ) = 1 ]"
                                  mode="path5"/>
                        </xs:sequence>
                        <xsl:apply-templates select="&kPath4;[ count( . | &kPath5;[1]  ) = 1]"
                                  mode="path5Attributes"/>
                    </xs:complexType>
            </xsl:when>

Doesn’t that look better?

Normalizing Excel’s SpreadsheetML using XSLT

Spreadsheet tables are full of holes and spreadsheet processors such as OpenOffice or Excel have implemented hacks to avoid having to store empty cells.

In the case of Excel, that’s done using ss:Index and ss:MergeAcross attributes.
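
For instance (a made-up sketch of the convention, not a complete document), a row where only columns 1 and 5 contain data can be stored as:

<Row xmlns="urn:schemas-microsoft-com:office:spreadsheet"
    xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
    <Cell><Data ss:Type="String">first column</Data></Cell>
    <!-- ss:Index jumps over the three empty cells: this cell is column 5 -->
    <Cell ss:Index="5"><Data ss:Type="String">fifth column</Data></Cell>
</Row>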

While these attributes are easy enough to understand, they add a great deal of complexity to XSLT transformations that need to access a specific cell, since you can no longer index your target directly.

The traditional way to work around this kind of issue is to pre-process your spreadsheet document to get an intermediary result that lets you index your target cells.

Having already encountered this issue with OpenOffice, I needed something to do the same with Excel, when Google led me to a blog entry proposing a transformation doing something similar.

The transformation needed some adaptation to be usable the way I wanted, i.e. as a transformation that does not modify your SpreadsheetML document except for adding an ss:Index attribute to every cell.

Here is the result of this adaptation:

This version is buggy. An updated one is available here

<?xml version="1.0"?>
<!--

Adapted from http://ewbi.blogs.com/develops/2004/12/normalize_excel.html

This product may incorporate intellectual property owned by Microsoft Corporation. The terms
and conditions upon which Microsoft is licensing such intellectual property may be found at
http://msdn.microsoft.com/library/en-us/odcXMLRef/html/odcXMLRefLegalNotice.asp.
-->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="urn:schemas-microsoft-com:office:spreadsheet"
    xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
    <xsl:output method="xml" indent="no" encoding="UTF-8"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="ss:Cell/@ss:Index"/>
    <xsl:template match="ss:Cell">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <xsl:variable name="prevCells" select="preceding-sibling::ss:Cell"/>
            <xsl:attribute name="ss:Index">
                <xsl:choose>
                    <xsl:when test="@ss:Index">
                        <xsl:value-of select="@ss:Index"/>
                    </xsl:when>
                    <xsl:when test="count($prevCells) = 0">
                        <xsl:value-of select="1"/>
                    </xsl:when>
                    <xsl:when test="count($prevCells[@ss:Index]) > 0">
                        <xsl:value-of select="($prevCells[@ss:Index][1]/@ss:Index) +
                            ((count($prevCells) + 1) -
                            (count($prevCells[@ss:Index][1]/preceding-sibling::ss:Cell)
                            + 1)) + sum($prevCells/@ss:MergeAcross)"/>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:value-of select="count($prevCells) + 1 +
                            sum($prevCells/@ss:MergeAcross)"/>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:attribute>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

TreeBind goes RDF

TreeBind can be seen as yet another open source XML <-> Java object data binding framework.

The two major design decisions that differentiate TreeBind from other similar frameworks are that:

  1. TreeBind has been designed to work with existing classes through Java introspection and doesn’t rely on XML schemas.
  2. Its architecture is not specific to XML and TreeBind can be used to bind any source to any sink, assuming they can be browsed and built following the paradigm of trees (this is the reason why we have chosen this name).

Another difference from other frameworks is that TreeBind has been sponsored by one of my customers (INSEE) and that I am its author…

The reason why we started this project is that we did not find any framework meeting the two requirements I have mentioned, and I am now bringing TreeBind a step forward by designing an RDF binding.

I have sent an email with the design decisions I am considering for this RDF binding to the TreeBind mailing list and I include a copy below for your convenience:


Hi,

I am currently using TreeBind on an RDF/XML vocabulary.

Of course, an RDF/XML document is a well-formed XML document and I could use the current XML bindings to read and write RDF/XML documents.

However, these bindings focus on the actual XML syntax used to serialize the document. They don’t see the RDF graph behind that syntax and are sensitive to the « style » used in the XML document.

For instance, these two documents produce very similar triples:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://ns.treebind.org/example/">
    <book>
        <title>RELAX NG</title>
        <written-by>
            <author>
                <fname>Eric</fname>
                <lname>van der Vlist</lname>
            </author>
        </written-by>
    </book>
</rdf:RDF>

and

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://ns.treebind.org/example/">
    <book>
        <title>RELAX NG</title>
        <written-by rdf:resource="#vdv"/>
    </book>
    <author rdf:ID="vdv">
        <fname>Eric</fname>
        <lname>van der Vlist</lname>
    </author>
</rdf:RDF>

but the XML bindings will generate a quite different set of objects.

The solution to this problem is to create RDF bindings that will sit on top of an RDF parser to pour the content of the RDF model into a set of objects.

The overall architecture of TreeBind has been designed with this kind of extension in mind and that should be easy enough.

That being said, design decisions need to be made to define these RDF bindings and I’d like to discuss them in this forum.

RDF/XML isn’t so much an XML vocabulary in the common meaning of the term as a set of binding rules that bind an XML tree into a graph.

These binding rules introduce conventions that are sometimes different from what we are used to doing in « raw » XML documents.

In raw XML, we would probably have written the previous example as:

<?xml version="1.0" encoding="UTF-8"?>
<book xmlns="http://ns.treebind.org/example/">
    <title>RELAX NG</title>
    <author>
        <fname>Eric</fname>
        <lname>van der Vlist</lname>
    </author>
</book>

The XML bindings would pour that content into a set of objects using the following algorithm (a hypothetical sketch of the method lookup follows the list):

  • find a class that matches the XML expanded name {http://ns.treebind.org/example/}book and create an object from that class.
  • try to find a method such as addTitle or setTitle with a string parameter on this book object and call that method with the string « RELAX NG ».
  • find a class that matches the XML expanded name {http://ns.treebind.org/example/}author and create an object from that class.
  • try to find a method such as addFname or setFname with a string parameter on this author object and call that method with the string « Eric ».
  • try to find a method such as addLname or setLname with a string parameter on this author object and call that method with the string « van der Vlist ».
  • try to find a method such as addAuthor or setAuthor taking an author parameter on the book object and call that method with the author object.
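
Here is a hypothetical sketch of that method lookup (the class and method names are illustrative, not TreeBind’s actual API, and a real implementation would also need to match supertypes of the parameter):

import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class SetterLookup {
    // Try addXxx first, then setXxx, where Xxx is the capitalized
    // local name of the element being bound.
    public static void bind(Object parent, String name, Object child)
            throws IllegalAccessException, InvocationTargetException {
        String suffix = Character.toUpperCase(name.charAt(0)) + name.substring(1);
        for (String prefix : new String[] {"add", "set"}) {
            try {
                Method method = parent.getClass().getMethod(prefix + suffix, child.getClass());
                method.invoke(parent, child);
                return;
            } catch (NoSuchMethodException e) {
                // No such method: try the next prefix.
            }
        }
        throw new IllegalArgumentException("No add/set method found for " + name);
    }
}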

We see that there is a difference between the way simple type and complex type elements are treated.

For a simple type element (such as « title », « fname » and « lname »), the name of the element is used to determine the method to call and the parameter type is always string.

For a complex type element (such as author), the name of the element is used both to determine the method to call and the class of the object that needs to be created. The parameter type is this class.

This is because when we write in XML there is an implicit expectation that « author » is used both as a complex object and as a verb.

Unless instructed otherwise, RDF doesn’t allow these implicit shortcuts and an XML element is either a predicate or an object. That’s why we have added a « written-by » element in our RDF example:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://ns.treebind.org/example/">
    <book>
        <title>RELAX NG</title>
        <written-by>
            <author>
                <fname>Eric</fname>
                <lname>van der Vlist</lname>
            </author>
        </written-by>
    </book>
</rdf:RDF>

The first design decision we have to make is to decide how we will treat that « written-by » element.

To have everything in hand to make a decision, let’s also look at the triples for that example:

rapper: Parsing file book1.rdf
_:genid1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ns.treebind.org/example/book> .
_:genid1 <http://ns.treebind.org/example/title> "RELAX NG" .
_:genid2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ns.treebind.org/example/author> .
_:genid2 <http://ns.treebind.org/example/fname> "Eric" .
_:genid2 <http://ns.treebind.org/example/lname> "van der Vlist" .
_:genid1 <http://ns.treebind.org/example/written-by> _:genid2 .
rapper: Parsing returned 6 statements

Two of these triples define element types:

_:genid1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ns.treebind.org/example/book> .

and

_:genid2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ns.treebind.org/example/author> .

I propose to use these statements to determine which classes must be used to create the objects. So far, that’s pretty similar to what we’re doing in XML.

Then, we have triples that assign literals to our objects:

_:genid1 <http://ns.treebind.org/example/title> "RELAX NG" .
_:genid2 <http://ns.treebind.org/example/fname> "Eric" .
_:genid2 <http://ns.treebind.org/example/lname> "van der Vlist" .

We can use the predicates of these triples (<http://ns.treebind.org/example/title>, <http://ns.treebind.org/example/fname>, <http://ns.treebind.org/example/lname>) to determine the names of the setter methods to call to add the corresponding information to the objects. Again, that’s exactly what we do in XML.

Finally, we have a statement that links two objects together:

_:genid1 <http://ns.treebind.org/example/written-by> _:genid2 .

I think that it is quite natural to use the predicate (<http://ns.treebind.org/example/written-by>) to determine the setter method that needs to be called on the book object to set the author object.

This is different from what we would have done in XML: in XML, since there is a written-by element, we would have created a « written-by » object, added the author object to the written-by object and added the written-by object to the book object.

Does that difference make sense?

I think it does, but the downside is that a simple document like this one:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://ns.treebind.org/example/">
    <book>
        <title>RELAX NG</title>
        <written-by>
            <author>
                <fname>Eric</fname>
                <lname>van der Vlist</lname>
            </author>
        </written-by>
    </book>
</rdf:RDF>

will give a quite different set of objects depending on which binding (XML or RDF) is used.

That seems to be the price to pay to try to get as close as possible to the RDF model.

What do you think?

Earlier, I mentioned that RDF can be told to accept documents with « shortcuts ». What I had in mind is:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://ns.treebind.org/example/">
    <book>
        <title>RELAX NG</title>
        <author rdf:parseType="Resource">
            <fname>Eric</fname>
            <lname>van der Vlist</lname>
        </author>
    </book>
</rdf:RDF>

Here, we have used an rdf:parseType="Resource" attribute to specify that the author element is a resource.

The triples generated from this document are:

rapper: Parsing file book3.rdf
_:genid1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ns.treebind.org/example/book> .
_:genid1 <http://ns.treebind.org/example/title> "RELAX NG" .
_:genid2 <http://ns.treebind.org/example/fname> "Eric" .
_:genid2 <http://ns.treebind.org/example/lname> "van der Vlist" .
_:genid1 <http://ns.treebind.org/example/author> _:genid2 .
rapper: Parsing returned 5 statements

The model is pretty similar except that one triple is missing (we now have 5 triples instead of 6).

The triple that is missing is the one that gave the type of the author element:

_:genid2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ns.treebind.org/example/author> .

The other difference is that <http://ns.treebind.org/example/author> is now a predicate.

In this situation, where we don’t have a type for the resource that a predicate links to, I propose that we follow the rule we use in XML and use the predicate to determine both the setter method and the class of the object to create to hold the resource.

What do you think? Does that make sense?

Thanks,

Eric


Thanks for your comments, either on this blog or (preferably) on the TreeBind mailing list.