A short URI on Amazon.com for our book

I am impressed! I wasn’t aware that such things existed… Amazon.com has been kind enough to give us a short URI for our book Professional Web 2.0 Programming. This URI is http://www.amazon.com/web2-0thebook/.

This is indeed a short and cool URI and if, being a cool URI, it doesn’t change, our book will always remain THE Web 2.0 book for Amazon.com!

However, as I pasted it into my Web browser, I noticed that this short and cool URI was immediately replaced by http://www.amazon.com/Professional-Web-Programming-Eric-Vlist/dp/0470087889/.

Being suspicious, I tried:

vdv@grosbill:/tmp $ curl -D - -A  "Mozilla/4.0"  http://www.amazon.com/web2-0thebook/
HTTP/1.1 301 Moved Permanently
Date: Fri, 17 Nov 2006 21:05:22 GMT
Server: Server
Set-Cookie: skin=; domain=.amazon.com; path=/; expires=Wed, 01-Aug-01 12:00:00 GMT
Location: http://www.amazon.com/Professional-Web-Programming-Eric-Vlist/dp/0470087889/
Vary: User-Agent
Content-Length: 0
Content-Type: text/plain
nnCoection: close

A 301 HTTP response code! This code is meant for obsolete URIs, as section 10.3.2 of RFC 2616 (HTTP/1.1) explains:

10.3.2 301 Moved Permanently

The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs. Clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server, where possible. This response is cacheable unless indicated otherwise. The new permanent URI SHOULD be given by the Location field in the response. Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s).

Amazon.com has given us an obsolete URI! I would much rather have had a 302 (Found), which seems to describe our situation exactly, a 303 (See Other) or even a 307 (Temporary Redirect), since none of these codes carries the meaning of a URI that should no longer be used.
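To make the distinction concrete, here is a small Python sketch that encodes the redirect codes mentioned above and the one property that matters here. The table is my own summary of RFC 2616, section 10.3, and the `should_relink` helper is a hypothetical name used for illustration: only 301 asks clients to replace the URI they requested.

```python
# Redirection codes from RFC 2616 (HTTP/1.1), section 10.3, reduced to the
# property discussed here: does the server ask clients to stop using the
# URI they requested and switch to the one in the Location header?
REDIRECTS = {
    301: ("Moved Permanently", True),    # future references SHOULD use the new URI
    302: ("Found", False),               # keep using the Request-URI
    303: ("See Other", False),           # Location is not a substitute for the original
    307: ("Temporary Redirect", False),  # keep using the Request-URI
}


def should_relink(status):
    """Return True if a client with link-editing capabilities should replace
    its stored link with the Location URI returned with this status code."""
    _reason, permanent = REDIRECTS.get(status, (None, False))
    return permanent


for code, (reason, _) in sorted(REDIRECTS.items()):
    print(code, reason, "re-link" if should_relink(code) else "keep original URI")
```

With any of the last three codes, a bookmark or a link editor would keep pointing at the short URI, which is what a short URI is for.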

I also wondered whether this URI could be used with Amazon partner tags, so I tried:

vdv@grosbill:/tmp $ curl -D - -A "Mozilla/4.0" "http://www.amazon.com/web2-0thebook/?tag=<mytag>"
HTTP/1.1 301 Moved Permanently
Date: Fri, 17 Nov 2006 21:07:28 GMT
Server: Server
Set-Cookie: skin=; domain=.amazon.com; path=/; expires=Wed, 01-Aug-01 12:00:00 GMT
Location: http://www.amazon.com/Professional-Web-Programming-Eric-Vlist/dp/0470087889/
Vary: User-Agent
Content-Length: 0
Content-Type: text/plain
nnCoection: close

There might be other solutions, but the answer seems to be « no »: the query string is stripped during the redirect and the cookie that is set doesn’t carry this information either. This means that when we use this short URI, we are not treated as an Amazon.com affiliated site.
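For comparison, a short-URI service that wanted to keep partner tags would only need to carry the query string over to the Location header. The following is a hypothetical Python sketch, not Amazon's implementation: the `SHORT_URIS` mapping and the `redirect_for` function are my own names, and only `urllib.parse` from the standard library is used. It answers with a 302 so that the short URI stays the reference.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping from short paths to canonical product URIs.
SHORT_URIS = {
    "/web2-0thebook/":
        "http://www.amazon.com/Professional-Web-Programming-Eric-Vlist/dp/0470087889/",
}


def redirect_for(request_uri):
    """Return (status, location) for a short URI, or (404, None).

    The query string of the request (for instance ?tag=...) is appended
    to the target so that it survives the redirect."""
    request = urlsplit(request_uri)
    target = SHORT_URIS.get(request.path)
    if target is None:
        return 404, None
    t = urlsplit(target)
    # Merge the request query with any query already present on the target.
    query = "&".join(q for q in (t.query, request.query) if q)
    location = urlunsplit((t.scheme, t.netloc, t.path, query, t.fragment))
    return 302, location  # 302 Found: clients keep using the short URI


print(redirect_for("/web2-0thebook/?tag=mytag"))
```

Keeping the tag in the Location URI is the simplest option; setting it in a cookie during the redirect would be another.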

Anyway, I shouldn’t be that picky! Thanks Amazon.com, this is a short and cool URI even if its implementation could be improved :) …

Web 2.0, professional… and Fun!

« Phew! », said Danny Ayers; « relieved », said Erik Bruchez. Our upcoming Web 2.0 book is written, and it has been hard work but also fun.

This book is a long story, almost as long as my interest in Web 2.0…

Although I have long been a Web and XML expert, the marketing around Web 2.0 kept me away for a while, and I didn’t bother to look at what is behind the smoke until December 2005, when the business networking initiative sparklingPoint Networking invited me to present my analysis of Web 2.0.

I have published my presentation both in English on my blog and in French on XMLfr. The French version has rapidly become the most read article on XMLfr and one of the reference definitions of Web 2.0 in French.

These documents go through the so-called social and technical layers of Web 2.0 and note that yes, Web 2.0 is nothing but a new term to designate the Web as it was meant to be. Does that make the word pointless? I don’t think so. Nobody can deny that the web is changing, and finding a word to name this new web is most useful.

One of the things that really struck me was the number of technologies involved in Web 2.0 applications. Web 2.0 is anything but a coherent platform! There is nothing wrong with using a whole set of technologies, except that you need to keep sight of the big picture. This runs against another trend, which is to specialise people and resources.

Most of the technologies that constitute the Web 2.0 technology stack, from (X)HTML to databases through CSS, Javascript or HTTP, are getting more complex every day, and most of the books available focus on only one of these technologies. This means not only that you need to buy a full bookshelf if you want to cover the Web 2.0 stack, but also that most of these books barely cover how these technologies can be used together.

Professional Web 2.0 Programming was born from this analysis and tries to give the big picture by presenting the whole Web 2.0 stack.

Erik kindly wrote that « this book owes everything to Eric van der Vlist, who provided the vision, outline, and much more over those last few months ». I’d rather say that I alone am responsible for the inconsistencies and gaps that you might find in the outline, and that the writers (including me) deserve the credit for the good things that you’ll find in their chapters!

This book also owes a lot to the friends who helped me write the outline before I submitted it to publishers, to Jim Minatel, our Senior Acquisitions Editor, who believed in the project from the very beginning, to Sara Shlaer, our editor, and to Micah Dubinko, our tech reviewer.

Both the book and the writing experience turned out to be quite different from what I had expected.

When I first submitted my outline to several publishers, I had planned to write this book, like my previous books, alone and over a period of twelve to eighteen months. While Jim Minatel was quite happy with my original outline, he was less than thrilled by my schedule! WROX has a lot of experience with multi-authored books, and he gently convinced me to build a team to shorten the schedule.

The team was built from a mix of authors that I knew and authors who had previously written for WROX, and this turned out to be a good decision: a more homogeneous team would probably not have been able to provide the diversity of points of view that you will find in this book.

The authors were spread across Silicon Valley, the UK, Italy, Switzerland and France, and I had anticipated that communication could be an issue. To facilitate the writing, I set up a bunch of Internet-based goodies, including a mailing list, an IRC channel, a wiki, a Subversion repository to share our prose and code samples, and an external web site.

This proved to work very efficiently, and Jim wrote: « What I really love about working with this gang of authors […] is how relentless they are about communicating and collaborating with each other. »

The mailing list was used heavily throughout the project and was critical to keeping the whole team in touch. The wiki was very helpful for finalizing the outline. One of my regrets is that we didn’t use it to edit the whole book… My previous book (RELAX NG) was edited on a wiki and transformed into DocBook before publication, and I found that very handy. I am pretty confident that we could have used the same method for this book, but that would have broken too many WROX policies, so we didn’t go that way. Most writers used the Subversion repository more as a backup than as a real repository, but here again it was not aligned with WROX policies. We used IRC only once, to settle a controversial issue on the outline.

The writing itself went very smoothly, with everyone doing their best to stay on schedule. Joe Fawcett wrote in his blog: « One of the great things about writing is how much you learn. It’s easy to pick [up] a passing knowledge of a subject but when [you] have to write about it and provide working code examples then you really need to burrow down and learn. » I couldn’t agree more, and I have been surprised once again to see how much you learn by writing!

The main difficulty with this book is that, since its goal is to give the « big picture », it needs to be consistent across chapters, which is always difficult for multi-authored books. Also, since the schedule was very tight, we couldn’t afford to spend too much time building a very detailed outline. Keeping things consistent was the job of Sara and Micah, and they’ve done a good job checking the cross references, redundancies and other inconsistencies.

The other difficulty is that the target audience is « professional developers with no prior experience of Web 2.0 », and deciding what the prerequisites for this book should be was quite subjective. On one hand, we would like this book to be a central place where people can find most of what they need to write Web 2.0 applications. On the other hand, we couldn’t afford to introduce each of the technologies from scratch. We’ve done our best to guess what most of you already know about web technologies at large, but the result may sometimes seem arbitrary. For instance, we’ve presented HTTP from scratch because we think that this is an area where there are still a lot of misconceptions, but we’ve assumed that our readers are already somewhat familiar with HTML. Your reviews will tell us whether we need to change this in future editions!

Even though I knew that Sara and Micah were carefully tracking inconsistencies, I was rather anxious to see what the book would look like as a whole, and as soon as enough chapters were written and I had some extra time, I started reviewing it as a whole. I am rather happy with the result and think that this should be a useful resource for web developers. My only regret is the number of programming languages we’ve covered.

On the server side, there is no single programming language of choice for writing Web 2.0 applications, and my intent was that the book should not only be as language-agnostic as possible but also provide examples in as many programming languages as possible. I think that the book can be considered agnostic, but the examples don’t use as many languages as I would have liked: Java, C# and PHP account for most of the examples, and Python, Perl or Ruby users may feel frustrated.

Still, I really believe that they can easily understand the examples in this book. A great way to follow the explanations given in the book is of course to try the examples. An even healthier exercise is to translate these examples into your favorite programming language. If you do so, I strongly encourage you to post these translations on the forum dedicated to this book on the WROX web site.

Writing this book has been fun, and I hope that reading it will be an enjoyable experience too!

YUI and XHTML

Update: Good news: Matt Sweeney from Yahoo! answered that they are in the process of adding XHTML support. This issue should thus soon be history!

That’s probably well known, but I was surprised to see that the Yahoo! UI Library doesn’t support XHTML, or rather that it only supports XHTML documents that pretend to be HTML!

If you use YUI with XHTML documents served as they should be, with an application/xhtml+xml media type, you’ll see strange errors pop up. This is because YUI uses the HTML DOM innerHTML property to insert HTML elements whose names are uppercase and HTML entities that are not predefined in XHTML.
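Both halves of the problem are easy to reproduce outside the browser. An XML parser that has not read a DTD only knows the five predefined entities (`&lt;`, `&gt;`, `&amp;`, `&apos;`, `&quot;`), so `&nbsp;` is an error, and element names are case-sensitive, so `DIV` is not `div`. This Python sketch uses the standard `xml.dom.minidom` module as a stand-in for a browser's XML parser; it illustrates the XML rules, not YUI's code, and `parses_as_xml` is a hypothetical helper.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parses_as_xml(fragment):
    """Return True if the fragment is well-formed XML (no DTD is loaded)."""
    try:
        minidom.parseString(fragment)
        return True
    except ExpatError:
        return False


print(parses_as_xml("<p>Tom &amp; Jerry</p>"))  # True: predefined entity
print(parses_as_xml("<p>a&nbsp;b</p>"))         # False: undefined entity
print(parses_as_xml("<DIV>text</div>"))         # False: names are case-sensitive

# Even when it parses, an uppercase DIV is a different element from div.
doc = minidom.parseString('<div xmlns="http://www.w3.org/1999/xhtml"><DIV/></div>')
print(len(doc.getElementsByTagName("div")))     # 1: only the outer element
```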

It appears that the issue has already been mentioned on the ydn-javascript mailing list that supports YUI, and a fix was even proposed in July by Laurens Holst.

For whatever reason, the YUI team, which is usually pretty responsive to bug reports, doesn’t seem to be interested in this issue, and there has been no answer to this suggestion.

The only answer (so far) to my own post on the same subject came from another user and is interesting to read:

I think you’re confused. No versions of IE (even IE7) support this, for example. http://www.w3.org/People/mimasa/test/xhtml/media-types/results

In other words, I must be wrong if I care about a web standard that isn’t implemented in Internet Explorer!

I wouldn’t go as far as Karl Dubost, who has decided to serve all his pages with an application/xhtml+xml media type (link in French) without worrying about upsetting Internet Explorer users, whom he kindly advises to use another browser, but it is quite easy to set up a web server so that XHTML pages are served with the right media type to compliant browsers and as text/html to other browsers…

Easy, except that YUI then stops working, and that you would run into similar issues with other scripts such as those from Google Ads.
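For the server-side part, here is a minimal Python sketch of that negotiation. It is a simplification under stated assumptions: it only looks for `application/xhtml+xml` in the `Accept` request header, honours an explicit `q=0`, and deliberately ignores wildcards such as `*/*`, since browsers that cannot render XHTML may still send them. The function names are hypothetical; in practice the web server's own content negotiation would more likely be used. The `Vary: Accept` header tells caches that the response depends on `Accept`.

```python
def choose_media_type(accept_header):
    """Pick the media type for an XHTML page from the request's Accept header.

    Clients that explicitly accept application/xhtml+xml with a non-zero
    quality value get it; everybody else (including clients sending no
    Accept header, or only */*) gets text/html."""
    if not accept_header:
        return "text/html"
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        return "application/xhtml+xml" if q > 0 else "text/html"
    return "text/html"


def response_headers(accept_header):
    """Headers for the negotiated response."""
    media_type = choose_media_type(accept_header)
    return {"Content-Type": media_type + "; charset=utf-8", "Vary": "Accept"}


print(response_headers("application/xhtml+xml,text/html;q=0.9,*/*;q=0.5"))
print(response_headers("image/gif, image/jpeg, */*"))
```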

What I find most disappointing is that YUI seems to get pretty much everything right except for this « detail ». For instance, I very much like the way YUI lets you produce clean XHTML code that displays correctly in any browser (including text-based browsers) and animate this page for recent graphical browsers. This means that, with sufficient care, you can use YUI to produce accessible applications, which happens to be an issue for most Web 2.0 applications.

Let’s hope that the YUI team will change their mind and decide that, after all, conforming to the standard isn’t optional!

XHTML 2.0 and HTML 5: The figures

This post has been updated to take into account a mail from Björn Höhrmann with a heads-up about missing elements in the XHTML 2.0 list of elements.

The future of (X)HTML appears to be searching its way between two conflicting visions:

I have posted my views on the subject on XML-DEV and was surprised by the answer from Björn Höhrmann. The server hosting XML-DEV and its archives is currently down, but you can see this answer in Google’s cache.

What I found most surprising are his statistics: « XHTML 2 increases the element count by 50% compared to XHTML 1.0 Strict, and by 10% compared to HTML 2.0, HTML 3.2, HTML 4.01, and XHTML 1.1 combined, including the Frameset and Transitional variants. »

Other chapters of our upcoming Web 2.0 book kept me too busy to double-check these figures, but we have decided to mention this debate in Chapter 5, so I really needed to analyse these statistics in more detail.

My sources for this exercise are:

The data concerning XHTML 2.0 consolidates the list of XHTML 2.0 elements included in the Working Draft with the RELAX NG schema and the W3C XML Schema for XForms. This is needed because the list of elements in the Working Draft is simplified and leaves out the XForms and Ruby sub-elements (see my mail to the HTML Working Group for more details). Many thanks to Björn Höhrmann for pointing that out.

By scraping these pages, I have extracted a consolidated list of elements, shown in the following table: each cell gives the module to which the element belongs in the corresponding (X)HTML version, or « Deprecated » if the element is deprecated:

Element HTML 4.01 XHTML 1.1 XHTML 2.0 HTML 5
a Core Hypertext Hypertext Phrase
abbr Core Text Text Phrase
access Access
acronym Core Text
action XForms
address Core Text Structural Sections
alert XForms
applet Deprecated Deprecated
area Core Client-side Image Map
article Sections
aside Sections
b Core Presentation
base Core Base Document metadata
basefont Deprecated Deprecated
bdo Core Bi-directional Text Phrase
big Core Presentation
bind XForms
blockcode Structural
blockquote Core Text Structural Sections
body Core Structure Document Sections
br Core Text Phrase
button Core Forms
caption Core Tables Tables
case XForms
center Deprecated Deprecated
choices XForms
cite Core Text Text Phrase
code Core Text Text Phrase
col Core Tables Tables
colgroup Core Tables Tables
command Interactive
copy XForms
datagrid Interactive
dd Core List List Lists
del Core Edit Edits
delete XForms
details Interactive
dfn Core Text Text Phrase
di List
dir Deprecated Deprecated
dispatch XForms
div Core Text Structural
dl Core List List Lists
dt Core List List Lists
em Core Text Text Phrase
ev:listener XML Events
event-source Server-sent DOM events
extension XForms
fieldset Core Forms
filename XForms
font Deprecated Deprecated
footer Sections
form Core Forms
frame Frames Frames
frameset Frames Frames
group XForms
h Structural
h1 Core Text Structural Sections
h2 Core Text Structural Sections
h3 Core Text Structural Sections
h4 Core Text Structural Sections
h5 Core Text Structural Sections
h6 Core Text Structural Sections
handler Handler
head Core Structure Document Document metadata
header Sections
help XForms
hint XForms
hr Core Presentation Paragraphs
html Core Structure Document HTML documents and document fragments
i Core Presentation Phrase
iframe Core Iframe
img Core Image Image content[TBW]
input Core Forms XForms
ins Core Edit Edits
insert XForms
instance XForms
isindex Deprecated Deprecated
item XForms
itemset XForms
kbd Core Text Text Phrase
l Text
label Core Forms List
legend Core Forms
li Core List List Lists
link Core Link Metainformation Document metadata
load XForms
m Phrase
map Core Client-side Image Map
mediatype XForms
menu Deprecated Deprecated Interactive
message XForms
meta Core Metainformation Metainformation Document metadata
meter Phrase
model XForms
nav Sections
nl List
noframes Frames Frames
noscript Core Scripting Scripting
object Core Object Object
ol Core List List Lists
optgroup Core Forms
option Core Forms
output XForms
p Core Text Structural Paragraphs
param Core Object Object
pre Core Text Structural Preformatted text
progress Phrase
q Core Text Text Phrase
range XForms
rb Ruby
rbc Ruby
rebuild XForms
recalculate XForms
refresh XForms
repeat XForms
reset XForms
revalidate XForms
rp Ruby
rt Ruby
rtc Ruby
ruby Ruby
s Deprecated Deprecated
samp Core Text Text Phrase
script Core Scripting Scripting
secret XForms
section Structural Sections
select Core Forms XForms
select1 XForms
send XForms
separator Structural
setfocus XForms
setindex XForms
setvalue XForms
small Core Presentation Phrase
span Core Text Text Phrase
standby Object
strike Deprecated Deprecated
strong Core Text Text Phrase
style Core Style Sheet Style Sheet Document metadata
sub Core Presentation Text Phrase
submission XForms
submit XForms
summary Tables
sup Core Presentation Text Phrase
switch XForms
t Phrase
table Core Tables Tables
tbody Core Tables Tables
td Core Tables Tables
textarea Core Forms XForms
tfoot Core Tables Tables
th Core Tables Tables
thead Core Tables Tables
title Core Structure Document Document metadata
toggle XForms
tr Core Tables Tables
trigger XForms
tt Core Presentation
u Deprecated Deprecated
ul Core List List Lists
upload XForms
value XForms
var Core Text Text Phrase

The total numbers of elements are:

HTML 4.01 XHTML 1.1 XHTML 2.0 HTML 5
Number of elements 91 91 115 63

Now, it should be noted that we are not comparing apples to apples: HTML 4.01 and XHTML 1.x include a number of deprecated elements that shouldn’t be used. They also include frame elements that have been taken out of XHTML 2.0, to be defined in the XFrames specification, and that are not part of HTML 5 either. It seems fair to remove all these elements from our numbers, which gives:

HTML 4.01 XHTML 1.1 XHTML 2.0 HTML 5
Number of non deprecated elements 81 81 115 63
Number of non deprecated non frames elements 78 78 115 63

These figures confirm the increase of almost 50% between HTML 4.01 or XHTML 1.1 and XHTML 2.0 (from 78 to 115 elements) mentioned by Björn Höhrmann, and it is worth looking at where the increase comes from. If you look at the different modules in this table, you’ll see that whereas HTML 4.01 and XHTML 1.1 include 10 elements from their Forms module, XHTML 2.0 includes 46 XForms elements. The increase in the number of elements comes almost entirely from the XHTML 2.0 XForms support (36 of the 37 additional elements), and the number of elements in the other modules is practically unchanged (69 versus 68).

Furthermore, to compare with HTML 5, you also need to remove the table elements, which are not yet defined in HTML 5, and the figures become quite different:

HTML 4.01 XHTML 1.1 XHTML 2.0 HTML 5
Number of non deprecated non frames elements 78 78 115 63
Number of Forms or XForms elements 10 10 46 0
Number of non deprecated non frames non forms elements 68 68 69 63
Number of tables elements 10 10 11 0
Number of non deprecated non frames non forms non tables elements 58 58 58 63
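The arithmetic behind these tables can be replayed in a few lines of Python. The per-version counts below are those given or implied by the tables above (for instance, 10 deprecated elements is 91 minus 81); the script only derives the successive rows and the ratio between XHTML 2.0 and HTML 4.01.

```python
VERSIONS = ["HTML 4.01", "XHTML 1.1", "XHTML 2.0", "HTML 5"]

# Counts derived from the tables above, per version.
TOTAL      = dict(zip(VERSIONS, [91, 91, 115, 63]))
DEPRECATED = dict(zip(VERSIONS, [10, 10, 0, 0]))
FRAMES     = dict(zip(VERSIONS, [3, 3, 0, 0]))
FORMS      = dict(zip(VERSIONS, [10, 10, 46, 0]))   # Forms or XForms
TABLES     = dict(zip(VERSIONS, [10, 10, 11, 0]))


def derived_rows(version):
    """Counts after removing deprecated and frame elements, then forms,
    then tables, as in the successive rows of the table above."""
    no_frames = TOTAL[version] - DEPRECATED[version] - FRAMES[version]
    no_forms = no_frames - FORMS[version]
    no_tables = no_forms - TABLES[version]
    return no_frames, no_forms, no_tables


for v in VERSIONS:
    print(v, *derived_rows(v))

# Relative increase from HTML 4.01 (78 elements) to XHTML 2.0 (115 elements).
increase = derived_rows("XHTML 2.0")[0] / derived_rows("HTML 4.01")[0] - 1
print(f"XHTML 2.0 vs HTML 4.01: +{increase:.0%}")   # +47%
```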

In other words, the debate over whether XHTML 2.0 is a simplification can be split into two different points:

  • The number of elements for the classical, non forms related features is the same in HTML 4.01, XHTML 1.1 and XHTML 2.0.
  • The replacement of the Forms module by XForms represents a complete paradigm change that undeniably leads to more complexity and to an increase in the number of elements.

The last line also shows an actual increase in the number of elements between HTML 4.01 or XHTML 1.1 and HTML 5 (from 58 to 63). If you look at the overall table, you’ll notice that this increase comes from the addition of quite a number of new elements, partly compensated by the removal of elements considered either almost redundant (for instance, acronym has been removed and people are advised to use abbr for both acronyms and abbreviations) or not very useful.

Of course, element counts are not 100% representative of the complexity of a vocabulary, but they give a good indication, and the figures given by Björn did deserve some further analysis.

PS: I have sent an answer to XML-DEV that should find its way there once their server is up again.

PPS: I recommend reading Björn Höhrmann’s mails to the www-html@w3.org mailing list as a complement to this blog entry:

Bitten by text/html for XHTML documents

The W3C « XHTML media types » note mentions that:

XHTML documents served as ‘text/html’ will not be processed as XML [XML10], e.g. well-formedness errors may not be detected by user agents. Also be aware that HTML rules will be applied for DOM and style sheets (see C.11 and C.13 of [XHTML1] respectively).

I have been bitten by this rule while developing the « Hello World » application that will illustrate the first chapter of our upcoming Web 2.0 book.

In this sample application, I am using Javascript to fill in XHTML elements that act as placeholders, and I noticed that updating an element could sometimes erase its following siblings:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Repro case</title>
<script type="text/javascript">

function init()  {
	// here, we still have a p[@id='bar'] element
	alert("bar: " + document.getElementById("bar"));
	document.getElementById("foo").innerHTML="foo";
	// but now, the p[@id='bar'] element has disappeared...
	alert("bar: " + document.getElementById("bar"));
}

</script>
</head>
<body onload="init()">
    <div id="foo"/>
    <p id="bar"/>
</body>
</html>

One of the things that I found most surprising is that the three browsers I was testing (Firefox, Opera and Mozilla) showed the same « bug ».

It took me a while to understand that the behaviour is dictated by the media type associated with the document: when the media type is « text/html », the document is interpreted as HTML despite its XML declaration, and the trailing slash in the div start tag is ignored. The document body seen by the browser is thus equivalent to:

<body onload="init()">
    <div id="foo">
      <p id="bar"></p>
    </div>
</body>

The p element which is a following sibling of the div element in XML becomes a child of the div element in HTML mode!
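The difference between the two readings can be sketched in Python with the standard library. `xml.dom.minidom` follows XML rules; for the HTML side, the `HtmlModeTree` class below is a deliberately simplified, hypothetical model built on `html.parser`: it ignores the slash of a non-void start tag, as browsers do in text/html mode, but it does not reproduce the rest of a browser's HTML parsing rules (implied end tags, for instance).

```python
from html.parser import HTMLParser
from xml.dom import minidom

SNIPPET = '<body><div id="foo"/><p id="bar"/></body>'

# Elements declared EMPTY in HTML 4.01 (the only ones where "/>" is harmless).
VOID = {"area", "base", "basefont", "br", "col", "frame", "hr", "img",
        "input", "isindex", "link", "meta", "param"}


def xml_parent_of(markup, element_id):
    """Parent element name of the element with this id, using XML rules."""
    document = minidom.parseString(markup)
    for node in document.getElementsByTagName("*"):
        if node.getAttribute("id") == element_id:
            return node.parentNode.tagName
    return None


class HtmlModeTree(HTMLParser):
    """Very simplified HTML tree builder: the "/" of <div/> is ignored, so a
    non-void element stays open. Implied end tags are not modelled."""

    def __init__(self):
        super().__init__()
        self.open_elements = []
        self.parents = {}   # id -> name of the parent element

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id")
        if element_id and self.open_elements:
            self.parents[element_id] = self.open_elements[-1]
        if tag not in VOID:
            self.open_elements.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Treat <div id="foo"/> exactly like <div id="foo">.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Close everything up to and including the matching open element.
        if tag in self.open_elements:
            while self.open_elements.pop() != tag:
                pass


def html_parent_of(markup, element_id):
    """Parent element name of the element with this id, using HTML-like rules."""
    parser = HtmlModeTree()
    parser.feed(markup)
    parser.close()
    return parser.parents.get(element_id)


print(xml_parent_of(SNIPPET, "bar"))   # body
print(html_parent_of(SNIPPET, "bar"))  # div
```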

In Firefox or Opera, the clean way to fix that would be to send the proper media type (application/xhtml+xml) but unfortunately Internet Explorer doesn’t support it.

A workaround is to avoid empty-element tags in XHTML; a comment can be included if you want to make sure that no badly behaved editor will minimise your document:

<body onload="init()">
    <div id="foo"><!-- --></div>
    <p id="bar"><!-- --></p>
</body>
            

Note that this isn’t necessary for the p element but that it doesn’t do any harm and looks more consistent.
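When the XHTML is generated rather than written by hand, the same precaution can be taken at serialization time: keep the minimized form only for elements that are empty in HTML (such as br or img) and write an explicit end tag for every other element, which is also what Appendix C of the XHTML 1.0 recommendation advises. This is a minimal Python sketch with the standard `xml.etree.ElementTree` module; the `to_compatible_xhtml` name is hypothetical, and the sketch is limited to namespace-free fragments and drops comments.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Elements declared EMPTY in HTML 4.01: only these may use the <x /> form.
VOID = {"area", "base", "basefont", "br", "col", "frame", "hr", "img",
        "input", "isindex", "link", "meta", "param"}


def to_compatible_xhtml(elem):
    """Serialize an element so that HTML parsers build the same tree as XML ones."""
    attrs = "".join(f" {name}={quoteattr(value)}" for name, value in elem.attrib.items())
    if elem.tag in VOID and not elem.text and len(elem) == 0:
        out = f"<{elem.tag}{attrs} />"
    else:
        inner = escape(elem.text or "")
        inner += "".join(to_compatible_xhtml(child) for child in elem)
        out = f"<{elem.tag}{attrs}>{inner}</{elem.tag}>"
    return out + escape(elem.tail or "")


fragment = ET.fromstring('<body><div id="foo"/><br/><p id="bar"/></body>')
print(to_compatible_xhtml(fragment))
```

For the body used above, this writes `<div id="foo"></div>` and `<p id="bar"></p>` with explicit end tags while keeping `<br />` minimized.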

Machine translation and the survival of dinosaurs

In 1998, I was in charge of second-level European support at Sybase, and quite proud that we were an official partner of the World Cup. Given how visible the event was, we were all on a war footing and on call 24/7.

To lighten the mood, I had the odd idea of having the home page of Sybase.com translated by the search engine that dominated the market at the time, AltaVista, which did not even have its own domain name yet!

I haven’t managed to find the original text on web.archive.org, but I remember our hilarity at a translation in which millions of « ventilateurs » (electric fans) crowded in to see the « allumettes » (matchsticks) of the « tasse du monde » (a world teacup).

Eight years later, with the « tasse du monde » back again and the « ventilateurs » once more going wild over the « allumettes », I had the idea of checking what progress machine translation engines had made by asking them to translate the sentence « Millions of fans follow each match of the World cup ».

At AltaVista / Babel Fish, the « ventilateurs » are still passionate about the « allumettes » of the « tasse du monde »: « Les millions de ventilateurs suivent chaque allumette de la tasse du monde ».

Google spares us the matchsticks but doesn’t do much better with the rest: « Les millions de ventilateurs suivent chaque match de la tasse du monde ».

These tests are enough to dent the confidence with which I readily claim that natural language analysis technologies are making great progress!

Remember: in ’98, Linux was barely talked about, our computers ran Windows 95, we explored Web 0.9 with Netscape 4 or IE 4, and we were starting to tremble about the Y2K bug…

Could machine translation software be the only twentieth-century dinosaurs to have survived all these upheavals?

RELAX NG and W3C XML Schema compared (continued)

A lot of comparisons have already been published on this topic, but there are still plenty of misunderstandings when comparing W3C XML Schema’s so-called object-oriented features with RELAX NG patterns.

Many people complain that RELAX NG supports neither complex type derivation nor substitution groups.

There are two ways to look at these features:

  1. If you focus on validation, these are ways to define sets of valid instance fragments.
  2. If you focus on modeling, these are ways to define design patterns and declare to potential applications what kind of relations exist between definitions.

RELAX NG (and DSDL in general) focuses on validation, and its built-in features provide equivalents to W3C XML Schema features in terms of validation only.

Let’s see what this means with a simple example.

Derivation by extension

W3C XML Schema:

    <xs:complexType name="BaseType">
        <xs:sequence>
            <xs:element name="FirstName" type="xs:token"/>
            <xs:element name="LastName" type="xs:token"/>
            <xs:element name="Mail" type="xs:token" minOccurs="0"/>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="ExtendedType">
        <xs:complexContent>
            <xs:extension base="BaseType">
                <xs:sequence>
                    <xs:element name="Password" type="xs:token"/>
                </xs:sequence>
            </xs:extension>
        </xs:complexContent>
    </xs:complexType>

The equivalent schema in RELAX NG is (compact syntax):

BaseType =
  element FirstName { xsd:token },
  element LastName { xsd:token },
  element Mail { xsd:token }?

ExtendedType =
  BaseType,
  element Password { xsd:token }
            

Or (XML syntax):

  <define name="BaseType">
    <element name="FirstName">
      <data type="token"/>
    </element>
    <element name="LastName">
      <data type="token"/>
    </element>
    <optional>
      <element name="Mail">
        <data type="token"/>
      </element>
    </optional>
  </define>

  <define name="ExtendedType">
    <ref name="BaseType"/>
    <element name="Password">
      <data type="token"/>
    </element>
  </define> 

A derivation by extension translates into RELAX NG as a new pattern that adds content after a reference to the base pattern.
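From the validation point of view, a content model is just the set of child-element sequences it accepts. For these small, finite models the sets can be written out by hand, which makes the extension visible: every sequence allowed by `ExtendedType` is a `BaseType` sequence followed by `Password`. This Python sketch is a toy model under that assumption (text content and datatypes are ignored), not the way a RELAX NG validator is implemented; the names are illustrative.

```python
# Toy model: a content model is the finite set of child-element name
# sequences it accepts (text content and datatypes are ignored).
BASE_TYPE = {
    ("FirstName", "LastName"),
    ("FirstName", "LastName", "Mail"),   # Mail is optional
}

# Extension: a reference to the base pattern followed by the new content.
EXTENDED_TYPE = {sequence + ("Password",) for sequence in BASE_TYPE}


def valid(children, content_model):
    """True if the list of child element names matches the content model."""
    return tuple(children) in content_model


print(valid(["FirstName", "LastName", "Mail", "Password"], EXTENDED_TYPE))  # True
print(valid(["FirstName", "LastName", "Password", "Mail"], EXTENDED_TYPE))  # False
```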

Derivation by restriction

W3C XML Schema:

    <xs:complexType name="RestrictedType">
        <xs:complexContent>
            <xs:restriction base="BaseType">
                <xs:sequence>
                    <xs:element name="FirstName" type="xs:token"/>
                    <xs:element name="LastName" type="xs:token"/>
                </xs:sequence>
            </xs:restriction>
        </xs:complexContent>
    </xs:complexType>

The equivalent schema in RELAX NG is (compact syntax):

RestrictedType =
  element FirstName { xsd:token },
  element LastName { xsd:token }
            

Or (XML syntax):

  <define name="RestrictedType">
    <element name="FirstName">
      <data type="token"/>
    </element>
    <element name="LastName">
      <data type="token"/>
    </element>
  </define> 

A derivation by restriction translates into RELAX NG as a new pattern whose definition is a restriction of the base pattern.
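Seen as sets of accepted child-element sequences (a toy model that ignores text and datatypes), a restriction is correct when it accepts nothing that the base type doesn’t accept. W3C XML Schema requires derived-by-restriction types to satisfy this kind of constraint, while RELAX NG doesn’t check it. With finite models written out by hand, as in this illustrative Python sketch, the check is a subset test:

```python
# Same toy model: content models as finite sets of child-element sequences.
BASE_TYPE = {
    ("FirstName", "LastName"),
    ("FirstName", "LastName", "Mail"),
}
RESTRICTED_TYPE = {("FirstName", "LastName")}           # Mail no longer allowed
NOT_A_RESTRICTION = {("FirstName", "LastName", "Password")}


def is_restriction(derived, base):
    """A derived model restricts its base if every sequence it accepts
    is also accepted by the base, i.e. if it is a subset."""
    return derived <= base


print(is_restriction(RESTRICTED_TYPE, BASE_TYPE))     # True
print(is_restriction(NOT_A_RESTRICTION, BASE_TYPE))   # False
```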

Substitution groups

W3C XML Schema:

    <xs:element name="Root">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="Head"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

    <xs:element name="Head" type="BaseType"/>

    <xs:element name="Restricted" type="RestrictedType" substitutionGroup="Head"/>

    <xs:element name="Extended" type="ExtendedType" substitutionGroup="Head"/>

The equivalent schema in RELAX NG is (compact syntax):

Head = element Head { BaseType }
Head |= element Restricted { RestrictedType }
Head |= element Extended { ExtendedType }
start = element Root { Head  }
            

Or (XML syntax):

  <define name="Head">
    <element name="Head">
      <ref name="BaseType"/>
    </element>
  </define>
  <define name="Head" combine="choice">
    <element name="Restricted">
      <ref name="RestrictedType"/>
    </element>
  </define>
  <define name="Head" combine="choice">
    <element name="Extended">
      <ref name="ExtendedType"/>
    </element>
  </define>
  <start>
    <element name="Root">
      <ref name="Head"/>
    </element>
  </start> 

A substitution group translates into RELAX NG as a choice that combines the definition of the head of the substitution group with the definitions of its members.
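In the same toy model (content models as sets of child-element sequences, text and datatypes ignored), the combined `Head` pattern is a choice between three element definitions: `Root` accepts exactly one child, which may be `Head`, `Restricted` or `Extended`, each checked against its own content model. This illustrative Python sketch validates small instances parsed with `xml.etree.ElementTree`; the dictionary representation and the `valid_root` name are my own.

```python
import xml.etree.ElementTree as ET

BASE_TYPE = {("FirstName", "LastName"), ("FirstName", "LastName", "Mail")}
RESTRICTED_TYPE = {("FirstName", "LastName")}
EXTENDED_TYPE = {sequence + ("Password",) for sequence in BASE_TYPE}

# Head = element Head {...}, |= element Restricted {...}, |= element Extended {...}:
# a choice between element definitions, each with its own content model.
HEAD = {
    "Head": BASE_TYPE,
    "Restricted": RESTRICTED_TYPE,
    "Extended": EXTENDED_TYPE,
}


def valid_root(xml_text):
    """Check a document against: start = element Root { Head }."""
    root = ET.fromstring(xml_text)
    if root.tag != "Root" or len(root) != 1:
        return False
    member = root[0]
    content_model = HEAD.get(member.tag)
    if content_model is None:
        return False
    return tuple(child.tag for child in member) in content_model


print(valid_root("<Root><Extended><FirstName>a</FirstName>"
                 "<LastName>b</LastName><Password>c</Password></Extended></Root>"))
```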

What did we miss?

These schemas can be considered equivalent because they validate the same set of instance documents (with the difference that the RELAX NG schemas do not allow xsi attributes).

The main difference is that the relation between the base and derived types and between the members of the substitution group is made explicit in W3C XML Schema and is implicit in RELAX NG.

For the derivation by extension and substitution groups, the design patterns used in RELAX NG (content added after a reference for an extension, and combination by choice of an element definition) could be considered characteristic enough for tools to detect them automatically.

For the derivation by restriction, there isn’t much in the RELAX NG schema that could inform a tool that RestrictedType is a restriction of BaseType.

To make these relations or design patterns explicit, it is very easy to add annotations.

A complete schema with annotations for all three design patterns could be (compact syntax):

namespace oo = "http://ns.xmlschemata.org/object-orientation/"

BaseType =
  element FirstName { xsd:token },
  element LastName { xsd:token },
  element Mail { xsd:token }?

[ oo:extends = "BaseType" ]
ExtendedType =
  BaseType,
  element Password { xsd:token }

[ oo:restricts = "BaseType" ]
RestrictedType =
  element FirstName { xsd:token },
  element LastName { xsd:token }

Head = element Head { BaseType }

[ oo:substitutionGroup = "Head" ]
Head |= element Restricted { RestrictedType }

[ oo:substitutionGroup = "Head" ]
Head |= element Extended { ExtendedType }

start = element Root { Head }

            

Or (XML syntax):

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns:oo="http://ns.xmlschemata.org/object-orientation/"
  xmlns="http://relaxng.org/ns/structure/1.0"
  datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <define name="BaseType">
    <element name="FirstName">
      <data type="token"/>
    </element>
    <element name="LastName">
      <data type="token"/>
    </element>
    <optional>
      <element name="Mail">
        <data type="token"/>
      </element>
    </optional>
  </define>
  <define name="ExtendedType" oo:extends="BaseType">
    <ref name="BaseType"/>
    <element name="Password">
      <data type="token"/>
    </element>
  </define>
  <define name="RestrictedType" oo:restricts="BaseType">
    <element name="FirstName">
      <data type="token"/>
    </element>
    <element name="LastName">
      <data type="token"/>
    </element>
  </define>
  <define name="Head">
    <element name="Head">
      <ref name="BaseType"/>
    </element>
  </define>
  <define name="Head" combine="choice" oo:substitutionGroup="Head">
    <element name="Restricted">
      <ref name="RestrictedType"/>
    </element>
  </define>
  <define name="Head" combine="choice" oo:substitutionGroup="Head">
    <element name="Extended">
      <ref name="ExtendedType"/>
    </element>
  </define>
  <start>
    <element name="Root">
      <ref name="Head"/>
    </element>
  </start>
</grammar>

            

Like any annotations, these would be ignored by RELAX NG processors. However, they could be used by tools that need to understand the relations between type and element definitions (such as binding tools). These tools could also enforce the rules defined by W3C XML Schema and check that restrictions are actual restrictions (a number of papers have been published explaining how this can be implemented).
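Here is a minimal Python sketch of such a tool, assuming the `oo` annotation namespace used above. It collects the `oo:extends`, `oo:restricts` and `oo:substitutionGroup` annotations and runs a deliberately naive restriction check with three rules. Every element of the restricting type must exist in the base type. No required element may become optional. Every element that is dropped must have been optional in the base type. This is far weaker than the W3C XML Schema derivation rules, because it ignores ordering, nested content, datatypes and `ref`s. The sample is a shortened copy of the annotated schema, with datatypes omitted.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"
OO = "{http://ns.xmlschemata.org/object-orientation/}"

# Shortened copy of the annotated schema above (datatypes omitted).
SCHEMA = """<grammar xmlns="http://relaxng.org/ns/structure/1.0"
    xmlns:oo="http://ns.xmlschemata.org/object-orientation/">
  <define name="BaseType">
    <element name="FirstName"><text/></element>
    <element name="LastName"><text/></element>
    <optional><element name="Mail"><text/></element></optional>
  </define>
  <define name="ExtendedType" oo:extends="BaseType">
    <ref name="BaseType"/>
    <element name="Password"><text/></element>
  </define>
  <define name="RestrictedType" oo:restricts="BaseType">
    <element name="FirstName"><text/></element>
    <element name="LastName"><text/></element>
  </define>
  <define name="Head"><element name="Head"><ref name="BaseType"/></element></define>
  <define name="Head" combine="choice" oo:substitutionGroup="Head">
    <element name="Restricted"><ref name="RestrictedType"/></element>
  </define>
  <start><element name="Root"><ref name="Head"/></element></start>
</grammar>"""


def read_annotations(root):
    """Return (define, relation, target) triples found in oo: annotations."""
    triples = []
    for define in root.iter(RNG + "define"):
        for relation in ("extends", "restricts", "substitutionGroup"):
            target = define.get(OO + relation)
            if target is not None:
                triples.append((define.get("name"), relation, target))
    return triples


def top_level_elements(define):
    """Map each top-level element name of a define to whether it is optional."""
    elements = {}
    for child in define:
        if child.tag == RNG + "element":
            elements[child.get("name")] = False
        elif child.tag == RNG + "optional":
            for element in child.findall(RNG + "element"):
                elements[element.get("name")] = True
    return elements


def is_naive_restriction(root, derived, base):
    """Very rough check that `derived` only removes optional content from `base`."""
    defines = {d.get("name"): d for d in root.iter(RNG + "define")
               if d.get("combine") is None}
    new = top_level_elements(defines[derived])
    old = top_level_elements(defines[base])
    if not set(new) <= set(old):
        return False  # the derived type adds elements
    if any(new[name] and not old[name] for name in new):
        return False  # a required element became optional
    # Every element dropped from the base type must have been optional there.
    return all(old[name] for name in set(old) - set(new))


root = ET.fromstring(SCHEMA)
for define, relation, target in read_annotations(root):
    if relation == "restricts":
        print(define, "restricts", target, is_naive_restriction(root, define, target))
```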

It should also be noted that annotations can be used to identify design patterns other than those implemented by W3C XML Schema.

References

This post is a consolidation of mails sent on the XML-DEV mailing list: [1] [2] [thread]

Client-side XSLT brings life to static HTML pages and microformats

I am running all kinds of tests for the multimedia chapter of our upcoming Web 2.0 book. As is often the case when I am writing, this is sparking a number of strange ideas.

I was exploring the similarities between playlists, podcasts and SMIL animation when it occurred to me that it might be interesting to see what can be done with microformats.

The relEnclosure proposal still needs some polishing. For instance, it mentions that Atom requires a length on enclosures but does not define a way to express this length. With that caveat, the result would be something like:

      <div class="hfeed">
         <h1>SVG en quinze points</h1>
         <div class="hentry">
            <h2 class="hentry-title">
               <a
                  href="http://xmlfr.org/documentations/articles/i040130-0001/01%20-%20C'est%20base%20sur%20XML.mp3"
                  rel="bookmark" title="...">C'est basé sur XML</a>
            </h2>
            <p class="hentry-content">By <address class="vcard author fn">Antoine Quint</address> -
                  <abbr class="updated" title="2004-01-30T00:00:00">2004-01-30T00:00:00</abbr>
            </p>
            <p>[<a
                  href="http://xmlfr.org/documentations/articles/i040130-0001/01%20-%20C'est%20base%20sur%20XML.mp3"
                  rel="enclosure">download</a>] (<span class="htype">audio/mpeg</span>, <span
                  class="hLength">231469</span> bytes).</p>
         </div>
 .
 .
 .
      </div>        

[hatom.xhtml]

I am not a microformat expert, and I was surprised to find that this document is actually much harder to write than the corresponding Atom document. It probably contains lots of errors. If you spot one, please report it in a comment.

This is nice, but probably not what users would expect from a Web 2.0 application. For one thing, this page is static and lacks all the bells and whistles of a Web 2.0 application. For instance, we might want to use one of the techniques described by Mark Huckvale to play the audio in the web page itself.

To do so, we would need to modify the document, and entries could become:

                 <div class="hentry">
                        <h2 class="hentry-title">
                              <a
                                    href="http://xmlfr.org/documentations/articles/i040130-0001/01%20-%20C'est%20base%20sur%20XML.mp3"
                                    rel="bookmark" title="...">C'est basé sur XML</a>
                        </h2>
                        <p class="hentry-content">By
                              <address class="vcard author fn">Antoine Quint</address> - <abbr
                                    class="updated" title="2004-01-30T00:00:00"
                              >2004-01-30T00:00:00</abbr>
                        </p>
                        <p>[<a
                                    href="javascript:play(&#34;http://xmlfr.org/documentations/articles/i040130-0001/01%20-%20C'est%20base%20sur%20XML.mp3&#34;);"
                                    rel="enclosure">play</a>] (<span class="htype"
                              >audio/mpeg</span>, <span class="hLength">231469</span> bytes).</p>
                  </div>
            

[hatom-decorated.xhtml]

This is not very different, but the links with rel="enclosure" have been replaced by a call to a JavaScript function. That is enough to lose the semantics of the microformat, since we obfuscate the enclosure’s URL.
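To make the loss concrete, here is a hypothetical consumer written in Python. It collects the URLs of links whose `rel` contains `enclosure`. As an assumption of this sketch, it keeps only `http` and `https` URLs. Given the static entry, it finds the MP3 file. Given the decorated entry, it finds nothing usable. The two fragments are trimmed versions of the entries above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

XHTML = "{http://www.w3.org/1999/xhtml}"
MP3 = ("http://xmlfr.org/documentations/articles/i040130-0001/"
       "01%20-%20C'est%20base%20sur%20XML.mp3")

# Trimmed versions of the static and decorated entries.
STATIC = ('<div xmlns="http://www.w3.org/1999/xhtml" class="hentry"><p>[<a href="'
          + MP3 + '" rel="enclosure">download</a>]</p></div>')
DECORATED = ('<div xmlns="http://www.w3.org/1999/xhtml" class="hentry"><p>[<a href="'
             'javascript:play(&#34;' + MP3 + '&#34;);" rel="enclosure">play</a>]</p></div>')


def enclosures(fragment):
    """Return the dereferenceable URLs of rel="enclosure" links."""
    root = ET.fromstring(fragment)
    urls = []
    for link in root.iter(XHTML + "a"):
        href = link.get("href", "")
        is_enclosure = "enclosure" in link.get("rel", "").split()
        if is_enclosure and urlparse(href).scheme in ("http", "https"):
            urls.append(href)
    return urls


print(enclosures(STATIC))     # a list holding the MP3 URL
print(enclosures(DECORATED))  # [] since the URL is hidden inside JavaScript
```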

We thus have a situation where the document that we want to serve is different from the document that we want to display client side. That’s a typical use case for client-side XSLT.

The trick is to write a simple transformation that makes the static page dynamic:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml" xmlns:x="http://www.w3.org/1999/xhtml" version="1.0"
    exclude-result-prefixes="x">
    <xsl:output method="xml" encoding="UTF-8" indent="yes" cdata-section-elements="x:style x:script"/>
    <xsl:strip-space elements="*"/>
    <xsl:preserve-space elements="x:script x:style"/>

    <!-- Identity template: copy everything that is not matched below. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Add the player's CSS and the play() function to the head. -->
    <xsl:template match="x:head">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
            <style type="text/css"><![CDATA[

#player {
    padding: 10px;
    background-color: gray;
    position: fixed;
    top: 20px;
    right: 10px;
}

            ]]></style>
            <script type="text/javascript"><![CDATA[

function play(surl) {
  document.getElementById("player").innerHTML =
    '<embed src="' + surl + '" hidden="false" autostart="true" loop="false"/>';
}

            ]]></script>
        </xsl:copy>
    </xsl:template>

    <!-- Append the div in which the media player pops up. -->
    <xsl:template match="x:body">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
            <div id="player">A media player<br/>will pop-up here.</div>
        </xsl:copy>
    </xsl:template>

    <!-- Replace each enclosure URL by a call to play(). -->
    <xsl:template match="x:a[@rel='enclosure']/@href">
        <xsl:attribute name="href">
            <xsl:text>javascript:play("</xsl:text>
            <xsl:value-of select="."/>
            <xsl:text>");</xsl:text>
        </xsl:attribute>
    </xsl:template>

    <!-- Relabel the enclosure links: "download" becomes "play". -->
    <xsl:template match="x:a[@rel='enclosure']/text()">
        <xsl:text>play</xsl:text>
    </xsl:template>

</xsl:stylesheet>

            

[decorateMf.xsl]
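If you want to check the rewriting logic outside a browser, here is a rough Python equivalent of the enclosure templates and the player div of decorateMf.xsl. It is an illustration, not a replacement. It skips the CSS and script added to the head. Unlike the XSLT `text()` template, it also simply replaces the whole link text. The sample page and its URL are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/1999/xhtml}"
ET.register_namespace("", "http://www.w3.org/1999/xhtml")


def decorate(document):
    """Rough Python port of part of decorateMf.xsl (enclosure links and player div)."""
    root = ET.fromstring(document)
    for link in root.iter(NS + "a"):
        if link.get("rel") == "enclosure":
            # Same rewrite as the @href template: wrap the URL in a play() call.
            link.set("href", 'javascript:play("%s");' % link.get("href"))
            # Same effect as the text() template for links containing only text.
            link.text = "play"
    body = root.find(NS + "body")
    if body is not None:
        # Same markup as the one appended by the x:body template.
        player = ET.SubElement(body, NS + "div", id="player")
        player.text = "A media player"
        ET.SubElement(player, NS + "br").tail = "will pop-up here."
    return root


# Hypothetical page with a single enclosure link.
page = ('<html xmlns="http://www.w3.org/1999/xhtml"><head/><body><p>[<a '
        'href="http://example.org/episode.mp3" rel="enclosure">download</a>]</p>'
        '</body></html>')
result = decorate(page)
print(ET.tostring(result, encoding="unicode"))
```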

Then add an xml-stylesheet processing instruction to the static (microformat) page:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="decorateMf.xsl" type="text/xsl"?>
<html xmlns="http://www.w3.org/1999/xhtml">
.
.
.
</html>
            

This works fine for me (GNU/Linux Ubuntu, Firefox 1.5), and the mplayer plug-in pops up nicely in the player div when I click on one of the « play » links. However, it would need a bit of polishing to work in other browsers:

  • The page crashes Opera 9.0. I have filed a bug report, and their tech support has already contacted me and is working on the issue.
  • The XSLT output method needs to be changed to HTML to work in Internet Explorer (otherwise the result is displayed as an XML document). Furthermore, IE inserts the embed element as text in the player div, so you might need to use a proper DOM method to insert the embed element as a DOM node.

[Try it!]

There are probably a number of other (easier?) solutions for the specific problem I have solved here. However, this is an interesting pattern to apply in situations where you want to serve a clean document that needs to be altered to display nicely in a browser.

XSLT has sometimes been described as a « semantic firewall » that removes the semantics of XML documents to keep only their presentation. I like to think of this technique as a semantic « anti-firewall » or « tunnel » that keeps the semantics of XML documents intact until the very last stage, just before they hit the browser’s rendering engine…

Too many SVG profiles

Our upcoming Web 2.0 book is giving me the opportunity to take a closer look at the state of SVG.

After all the announcements of native SVG support in browsers, I expected SVG to be really easy to display with my new Ubuntu Dapper distribution.

The first thing I tested was displaying, in Firefox, the clock that animates the front page of XMLfr [download].

First test, first disappointment: the text « Réalisé en SVG » doesn’t show up in Firefox. This text is displayed along a path using a textPath element, which Firefox doesn’t support.

Beginning to wonder whether all this would be as easy as I had thought, I developed a sample document showing the relations between the tags in the RSS channel of the book site.

I wanted to show how much animation can be done declaratively, without a single line of JavaScript, so I used the « set » element.

Second test, second disappointment: this simply did not work.

Thinking that I needed to do more exhaustive tests, I decided to install the Adobe SVG plug-in. Fortunately, this is quite easy if you switch off the native SVG support in Firefox using about:config, as explained on Mozillazine. A very cool feature is that you can switch between native and plug-in support by clicking on the « svg.enabled » option, without having to restart the browser.

After more tests, helped by the very useful mouseEvents SVG sample, I came to the conclusion that no implementation, including the Adobe SVG plug-in, handles the « mouseover » and « mouseout » events correctly. I therefore switched to « mousedown » and « mouseup » instead.

The result is an SVG document which (I think) is perfectly valid but works only with the Adobe SVG plug-in:

[download]

This SVG document works fine with the Adobe plug-in but doesn’t work with any of the other implementations that I have tested. It almost works with Opera 9.0b2, but this implementation doesn’t seem to support « set » elements on groups. If I move the « set » elements to the individual shapes, I can get it working with Opera.
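That workaround can be automated. The Python sketch below copies every `set` element found directly in a `g` onto each shape that is a direct child of that group, then removes it from the group. The list of shape elements is an assumption of this sketch. The rewrite is only equivalent when `begin` names an explicit event source, such as `tag.mousedown` (`tag` is a hypothetical id here). A bare `begin="mousedown"` would change meaning once the animation targets the shape instead of the group. Duplicated `id` attributes on copied `set` elements are not handled.

```python
import copy
import xml.etree.ElementTree as ET

SVG = "{http://www.w3.org/2000/svg}"
# Assumed list of basic shapes that can receive the animation.
SHAPES = {"rect", "circle", "ellipse", "line", "polyline", "polygon", "path", "text"}


def push_set_to_shapes(root):
    """Move <set> children of each <g> onto the group's direct shape children."""
    for group in list(root.iter(SVG + "g")):  # snapshot before mutating the tree
        animations = group.findall(SVG + "set")
        shapes = [child for child in group
                  if child.tag.startswith(SVG) and child.tag[len(SVG):] in SHAPES]
        if not animations or not shapes:
            continue
        for animation in animations:
            group.remove(animation)
            for shape in shapes:
                shape.append(copy.deepcopy(animation))
    return root


ET.register_namespace("", "http://www.w3.org/2000/svg")
doc = ET.fromstring(
    '<svg xmlns="http://www.w3.org/2000/svg"><g visibility="hidden">'
    '<set attributeName="visibility" to="visible" begin="tag.mousedown"/>'
    '<rect width="10" height="10"/><text x="0" y="20">link</text></g></svg>')
print(ET.tostring(push_set_to_shapes(doc), encoding="unicode"))
```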

The full test report is below.

SVG clock [download]:

  • Firefox 1.5.0.4 (native mode): no support for « textPath »; the text doesn’t show.
  • Adobe SVG Viewer 3.01 beta 3: OK.
  • Opera 9.0b2: OK.
  • Konqueror 3.5.2: no animation.
  • Amaya 8.5: the document is reported as not well-formed!
  • X-Smiles 1.0alpha1: no animation.

Tags [download]:

  • Firefox 1.5.0.4 (native mode): no support for « set »; no links are displayed and nothing happens when you click on a tag.
  • Adobe SVG Viewer 3.01 beta 3: OK.
  • Opera 9.0b2: no support for « set »; no links are displayed and nothing happens when you click on a tag.
  • Konqueror 3.5.2: no support for « set » nor for the visibility attributes; all the links are always displayed. Furthermore, the browser crashes after a while when this document is left open.
  • Amaya 8.5: no support for « set » nor for the visibility attributes; all the links are always displayed. The « text-anchor: middle » property doesn’t work either.
  • X-Smiles 1.0alpha1: crashes when there is a DOCTYPE declaration in the SVG document. When the DOCTYPE is removed, it throws an exception, « Simple duration is not defined for this animation element », probably because the set elements do not have durations.
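One way to avoid such silent failures is a quick pre-flight check. This Python sketch encodes only the element-level problems from the report above (`textPath` and `set`). For each viewer, it lists the reportedly unsupported elements that a document uses. It cannot predict crashes, DOCTYPE issues or property problems such as `text-anchor`. The Opera entry is also coarse, since Opera only seemed to have trouble with `set` on groups. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

SVG = "{http://www.w3.org/2000/svg}"

# Element-level gaps taken from the test report above (not exhaustive).
REPORTED_UNSUPPORTED = {
    "Firefox 1.5.0.4 (native mode)": {"textPath", "set"},
    "Opera 9.0b2": {"set"},  # the report suggests the problem is set on groups
    "Konqueror 3.5.2": {"set"},
    "Amaya 8.5": {"set"},
}


def audit(svg_text):
    """Return {viewer: sorted reportedly unsupported element names used}."""
    root = ET.fromstring(svg_text)
    used = {el.tag[len(SVG):] for el in root.iter() if el.tag.startswith(SVG)}
    report = {}
    for viewer, missing in REPORTED_UNSUPPORTED.items():
        problems = used & missing
        if problems:
            report[viewer] = sorted(problems)
    return report


# Hypothetical document using both textPath and set.
sample = ('<svg xmlns="http://www.w3.org/2000/svg" '
          'xmlns:xlink="http://www.w3.org/1999/xlink">'
          '<path id="arc" d="M10,80 Q95,10 180,80"/>'
          '<text><textPath xlink:href="#arc">Réalisé en SVG</textPath></text>'
          '<set attributeName="visibility" to="visible" begin="click"/></svg>')
for viewer, problems in audit(sample).items():
    print(viewer + ":", ", ".join(problems))
```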

Also, the media type « image/svg+xml » seems to be a problem for the Adobe plug-in in Firefox, although, curiously, not systematically.

I could probably get this sample working on most of these implementations by switching to Javascript animation and carefully testing against each of them, but is that something we really want to do again?

Wasn’t SVG supposed to be interoperable? On the contrary, the current situation reminds me of the worst period of the browser wars, even though I have no doubt that this time there are no political reasons behind it.

Mozilla explains that Firefox SVG is a subset of SVG 1.1, but not any of the official profiles (Tiny, Basic, Full).

Other implementations probably have similar policies, and I can understand their reasons. However, I wonder whether these partial implementations hurt SVG more than they help.

What they have in common is that, except for Konqueror when it dumps core, they all fail silently when they encounter a feature they do not support. This leaves users with the feeling that the document they are viewing is bogus.

When users whose browser has no SVG support find an SVG document, they are invited to load a plug-in. When their browser has one of these partial implementations, they just move away.

Web 2.0 at XML Prague

This coming weekend, I’ll have the pleasure of being at XML Prague, a small and friendly XML conference in a wonderful city.

This year, I’ll set aside my usual XML schema languages expert hat to speak on two topics:

  • An experiment in defining an RDF/XML Query By Example language. This presentation describes a very cool project that I am developing for one of my customers (INSEE) and that I also presented at Extreme Markup Languages last year. It fits this year’s XML Prague theme well, which is « XML Native Databases and Querying XML ».
  • Web 2.0: myth and reality, a presentation derived from the blog entry with the same title. One could probably argue that Web 2.0 is about making a web that can be queried, but this talk will probably still be perceived as more off-topic. I hope it will be well received anyway, and I look forward to delivering it in Prague.

XML Prague 2005 was also an opportunity to see Prague, which I hadn’t seen since… 1981… (so many things had changed that I could hardly recognize the city). It was also a chance to meet many members of an active and creative Eastern European XML community. I had often exchanged emails with them but had rarely had the opportunity to meet them face to face.

I have no doubt XML Prague 2006 will be as much fun as the previous edition.