The influence of microformats on style-free stylesheets

It’s been a while, almost six years, since I wrote my Style-free XSLT Style Sheets piece for XML.com, but this simple technique remains one of my favorites.

It was not only my first article published on XML.com but also the subject of my first talk at an IDEAlliance XML conference, and it’s fair to say that it has been instrumental in launching my career as an « international XML guru ».

Despite all that, this technique remains my favorite because of its efficiency. I use it over and over, to generate (X)HTML but also many other XML vocabularies, as different as OpenOffice documents and W3C XML Schemas. The more complex the vocabulary to generate, the more reasons you have to keep it outside your XSLT transformations and the more efficient style-free stylesheets are.

Style-free stylesheets have become a reflex for me, and it was without even thinking about it that I wrote a style-free stylesheet to power the web site of our upcoming Web 2.0 book.

In my ancient XML.com paper, I used specific, non-XHTML elements:

        <td width="75%" bgcolor="Aqua">
            <insert-body/>
        </td>

That works fine, but your layout documents are no longer valid XHTML and they don’t display like target documents in a browser.

Why not follow the microformats approach and use regular XHTML elements with specific class attributes instead:

        <div id="planet">
            <h1>Planet Web 2.0 the book</h1>
            <p>Aggregated content relevant to this book.</p>
            <div class="fromRss"/>
             .../...
        </div>           

In this case, the XSLT transformation replaces the content of any element with a class attribute containing the token « fromRss » with the formatted output of the RSS feed (a minimal sketch of the corresponding template follows the example below). This has the additional benefit that I can leave mock-up content in place to make the layout look like a final document:

        <div id="planet">
            <h1>Planet Web 2.0 the book</h1>
            <p>Aggregated content relevant to this book.</p>
            <div class="fromRss">
                <ul>
                    <li>
                        <div>
                            <h2>
                                <a
                                    href="http://www.orbeon.com/blog/2006/06/02/about-json-and-poor-marketing-strategies/"
                                    title="XForms Everywhere » About JSON and poor marketing strategies"
                                    >XForms Everywhere » About JSON and poor marketing
                                strategies</a>
                            </h2>
                        </div>
                    </li>
                </ul>
            </div>
            <p>
                <a href="http://del.icio.us/rss/tag/web2.0thebook" title="RSS feed (on del.icio.us)">
                    <img src="feed-icon-24x24.png" alt="RSS feed"/>
                </a> (on <a href="http://del.icio.us/" title="del.icio.us">del.icio.us</a>)</p>
        </div>
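
For the record, the stylesheet side of this is nothing fancy. Here is a minimal sketch of the kind of template involved (the feed location, the « rss » mode and the item formatting are illustrative; this is not the exact stylesheet powering the site): an identity rule copies the layout as is, and a second rule replaces the content of any element carrying the « fromRss » token.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

    <!-- Identity rule: the layout document is copied as is. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Any element whose class attribute contains the token "fromRss"
         keeps its tag and attributes, but its mock-up content is replaced
         by the formatted feed ('feed.rss' is a placeholder location). -->
    <xsl:template match="*[contains(concat(' ', normalize-space(@class), ' '), ' fromRss ')]">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <ul>
                <xsl:apply-templates select="document('feed.rss')/rss/channel/item" mode="rss"/>
            </ul>
        </xsl:copy>
    </xsl:template>

    <!-- Format one RSS item as a list entry. -->
    <xsl:template match="item" mode="rss">
        <li>
            <h2>
                <a href="{link}" title="{title}">
                    <xsl:value-of select="title"/>
                </a>
            </h2>
        </li>
    </xsl:template>

</xsl:stylesheet>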

What I like with simple ideas is that they always leave room for reuse and improvements (complex ideas on the other hand seem to only leave room for more complexity).

Web 2.0 the book

One of the reasons I have been too busy to blog these days is the project to write a comprehensive book about Web 2.0 technologies.

If Web 2.0 is about using the web as a platform, this platform is far from being homogeneous. On the contrary, it is made of a number of very different pieces of technology, from CSS to web server configuration through XML, Javascript, server side programming, HTML, …

I believe that integrating these technologies is one of the main challenges of Web 2.0 developers and I am always surprised if not frightened to see that people tend to get more and more specialized. Too many CSS gurus do not know the first thing about XML, too many XML gurus don’t know how to spell HTTP, too many Java programmers don’t want to know Javascript. And, no, knowing everything about Ajax isn’t enough to write a Web 2.0 application.

In defense of these hyper-specialists, I have also found that most of the available resources, both online and in print, are even more heavily specialized than their authors, and that even if you could read a book on each of these technologies you’d find it difficult to get the big picture and understand how they can be used together.

The goal of this book is to fill that gap and be a useful resource for all the Web 2.0 developers who do not want to stay confined to their highly specialized domain, as well as for project managers who need to grasp the Web 2.0 big picture.

This is an ambitious project on which I started to work in December 2005.

The first phase has been to define the book outline with the helpful contribution of many friends.

The second has been to find a publisher. O’Reilly, the publisher of my two previous books, happens to be also one of the co-inventors of the term « Web 2.0 », and that makes them very nervous about Web 2.0 book projects.

Jim Minatel from Wiley was immediately convinced by the outline and the book will be published in the Wrox Professional Series.

I had initially planned to write the book all by myself, but it would have taken me at least one year to complete the work and Jim wasn’t thrilled by the idea of waiting until 2007 to get this book in print.

The third step has been to find the team to write the book and the lucky authors are:

Micah Dubinko is tech editing the book and Sara Shlaer is our Development Editor.

We then had to split the work between authors. The exercise has been easier than expected. Being in a position to arbitrate the choice, I found it fair to pick the chapters left over by the other authors, and this leaves me with chapters that will require a lot of research on my part. This is fine since I like learning new things when I write, but it also means more hard work.

This is my first co-authored book and I think that one of the challenges of such books is to keep the whole content coherent. This is especially true for a book whose goal is to give « the big picture » and to explain how different technologies play together.

To facilitate the communication between authors, I have set up a series of internal resources (wiki, mailing list, subversion repository). It’s still too early to say if that will really help but the first results are encouraging.

More recently, I have also set up a public site (http://web2.0thebook.org/) that presents the book and aggregates relevant content. I hope that all these resources will help us to feel and act as a team rather than a set of individual authors.

The « real » work has finally started and we now have the first versions of our first chapters progressing through the Wiley review system.

It’s interesting to see the differences between processes and rules from different publishers. To me, a book was a book, and I hadn’t anticipated so many differences, not only in the tools being used but also in style guidelines.

The first chapter I have written is about Web Services and that’s been a good opportunity to revisit the analysis I had done in 2004 for the ZDNet Web Services Convention [papers (in French)].

From a Web 2.0 developer perspective, I think that the main point is to publish Web Services that are perfectly integrated into the Web architecture, and that means being as RESTful as possible.

I have been happy to see that WSDL 2.0 appears to be making some progress in its support of REST services, even though it’s still not perfect. I have posted a mail with some of my findings to the Web Services Description Working Group comment list and they have split these comments into three issues on their official issue list ([CR052] [CR053] [CR054]).

I hope they can take these issues into account, even if that means updating my chapter!

Some resources I have found most helpful while I was writing this chapter are:

It’s been fun so far and I look forward to seeing this book « for real ».

Validating microformats

This blog entry follows up on Norm Walsh’s essay on the same subject.

The first thing I want to react to isn’t the claim that RELAX NG isn’t suitable for this task, but the reason given for why this is the case.

Norm says that « there’s just no way to express a pattern that matches an attribute that contains some token » and this assertion isn’t true.

Let’s take the same hReview sample and see what happens when we try to define a RELAX NG schema:

<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>Review</title>
    </head>
    <body>
        <div class="hreview">
            <span><span class="rating">5</span> out of 5 stars</span>
            <h4 class="summary">Crepes on Cole is awesome</h4>
            <span class="reviewer vcard">Reviewer: <span class="fn">Tantek</span> -
                <abbr class="dtreviewed" title="20050418T2300-0700">April 18, 2005</abbr></span>
            <div class="description item vcard"><p>
                <span class="fn org">Crepes on Cole</span> is one of the best little
                creperies in <span class="adr"><span class="locality">San Francisco</span></span>.
                Excellent food and service. Plenty of tables in a variety of sizes
                for parties large and small.  Window seating makes for excellent
                people watching to/from the N-Judah which stops right outside.
                I've had many fun social gatherings here, as well as gotten
                plenty of work done thanks to neighborhood WiFi.
            </p></div>
            <p>Visit date: <span>April 2005</span></p>
            <p>Food eaten: <span>Florentine crepe</span></p>
        </div>
    </body>
</html>

To define an element whose « class » attribute is « type », we would write:

element * {
    attribute class { "type" }
    .../...
}

To define an element whose « class » attribute contains the token « type », we will use the same principle together with a W3C XML Schema pattern facet:

element * {
    attribute class {
        xsd:token { pattern = "(.+\s)?type(\s.+)?" }
    }
}

The regular expression expresses the fact that we want class attributes made of an optional sequence of characters followed by a whitespace character, then the token « type », then an optional whitespace character followed by any characters.

It correctly catches values such as « type », « foo type », « foo type bar », « type bar » and rejects values such as « anytype ».

The next tricky thing to express when validating microformats is that you want to allow an element at any depth.

For instance, if you’re expecting a « type » tag, you’ll accept:

<span class="type">foo</span>

But also:

<div>
   <p>Type: <span class="type">foo</span></p>
</div>

To do so with RELAX NG, you’ll recursively say that you want either a « type » tag or any other element containing a « type » tag.

The « any other element » will include an optional « class » attribute whose value doesn’t contain the token « type », but even that isn’t an issue with RELAX NG, and the definition could look something like this:

hreview.type =
    element * {
        anyOtherAttribute,
        mixed {
            (attribute class {
                 xsd:token { pattern = "(.+\s)?type(\s.+)?" }
             },
             anyElement)
            | (attribute class {
                   xsd:token - xsd:token { pattern = "(.+\s)?type(\s.+)?" }
               }?,
               hreview.type)
        }
}

This looks complex and quite ugly, but we wouldn’t have to write such schemas by hand. I like Norm’s idea of writing a simple RELAX NG schema where classes are replaced by element names, and this definition has been generated by an XSLT transformation from his own definition, which is:

hreview.type = element type { text }

So far, so good. Let’s see where the real blockers are.

The first thing that is quite ugly to validate is the flexibility that allows siblings to be nested.

In the hReview schema, « reviewer » and « dtreviewed » are defined as siblings:

hreview.hreview =
  element hreview {
    text
    & hreview.version?
    & hreview.summary?
    & hreview.type?
    & hreview.item
    & hreview.reviewer?
    & hreview.dtreviewed?
    & hreview.rating?
    & hreview.description?
}

In an XML document, we would expect to see them at the same level, as direct children of the « hreview » element.

In the microformats world, this can be the case, but one can also be a descendant of the other, which is the case in our example:

<span class="reviewer vcard">Reviewer: <span class="fn">Tantek</span> -
<abbr class="dtreviewed" title="20050418T2300-0700">April 18, 2005</abbr></span>

To express that, we would have to say that the content of « hreview » is one of the many combinations in which each sub-element is either a sibling or a descendant of the others.

I haven’t tried to see if that would be feasible (we’ll see that there is another blocker that makes the question academic) but that would be a real mess to generate.

The second and probably most important blocker is the restriction related to interleave: as stated in my RELAX NG book, « Elements combined through interleave must not overlap between name classes. »

This restriction hits us hard here, since our name classes do overlap and we are combining the different sub-patterns through interleave (see the definition of hreview.hreview above if you’re not convinced).

There are very few workarounds for this restriction:

  • Replacing interleave by an ordered group isn’t an option: microformats are about flexibility, and imposing an order between the sub-components is most probably out of the question.
  • Replacing interleave by a « zeroOrMore/choice » combination means that we would lose any control over the number of occurrences of each sub-component (we could get ten ratings and no items), and this control is one of the few things that this validation catches!

To me, this restriction is the real blocker and means that it isn’t practical to use RELAX NG to validate microformat instances directly.

Of course, we can transform these instances into plain XML as shown by Norm Walsh, but I don’t like this solution very much for a reason he hasn’t mentioned: when such a validation raises errors, the errors refer to a context within the transformed document, which would be tough for users to understand, and making the link between this context and the original document could be complex.

As an alternative, let’s see what we could do with Schematron.

To set a rule context to a specific tag, we can write:

<rule context="*[contains(concat(' ', normalize-space(@class), ' '), ' hreview ')]">

We are no longer working on datatypes and need to apply the normalization by hand (thus the use of « normalize-space() »). On the other hand, we can freely use functions, and by adding a leading and trailing space we can make sure that the « hreview » token is matched if and only if the result of this manipulation contains the token preceded and followed by a space.

Within this context, we can check the number of occurrences of each sub-pattern using more or less the same principle:

      <rule context="*[contains(concat(' ', normalize-space(@class), ' '), ' hreview ')]">
         <report test="count(.//*[contains(concat(' ', normalize-space(@class), ' '), ' version ')]) &gt; 1">A  "version" tag is duplicated.</report>
         <report test="count(.//*[contains(concat(' ', normalize-space(@class), ' '), ' summary ')]) &gt; 1">A  "summary" tag is duplicated.</report>
         <report test="count(.//*[contains(concat(' ', normalize-space(@class), ' '), ' type ')]) &gt; 1">A  "type" tag is duplicated.</report>
         <assert test=".//*[contains(concat(' ', normalize-space(@class), ' '), ' item ')]">A mandatory "item" tag is missing.</assert>
         <report test="count(.//*[contains(concat(' ', normalize-space(@class), ' '), ' item ')]) &gt; 1">A  "item" tag is duplicated.</report>
         <report test="count(.//*[contains(concat(' ', normalize-space(@class), ' '), ' reviewer ')]) &gt; 1">A  "reviewer" tag is duplicated.</report>
         <report test="count(.//*[contains(concat(' ', normalize-space(@class), ' '), ' dtreviewed ')]) &gt; 1">A  "dtreviewed" tag is duplicated.</report>
         <report test="count(.//*[contains(concat(' ', normalize-space(@class), ' '), ' rating ')]) &gt; 1">A  "rating" tag is duplicated.</report>
         <report test="count(.//*[contains(concat(' ', normalize-space(@class), ' '), ' description ')]) &gt; 1">A  "description" tag is duplicated.</report>
     </rule>

Note that the use of the descendant axis (« // ») means that we correctly handle cases where siblings are nested.

Norm Walsh mentions that this can be tedious to write and that you need to define tests for what is allowed and also for what is forbidden.

That’s perfectly right, but here again you don’t have to write this schema by hand, and I have written an XSLT transformation that transforms his RELAX NG schema into the following Schematron schema:

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.ascc.net/xml/schematron">
   <pattern name="hreview.hreview">
      <rule context="*[contains(concat(' ', normalize-space(@class), ' '), ' hreview ')]">
         <report test="count(.//*[contains(concat(' ', normalize-space(@class), ' '), ' version ')]) &gt; 1">A  "version" tag is duplicated.</report>
         <report test="count(.//*[contains(concat(' ', normalize-space(@class), ' '), ' summary ')]) &gt; 1">A  "summary" tag is duplicated.</report>
         <report test="count(.//*[contains(concat(' ', normalize-space(@class), ' '), ' type ')]) &gt; 1">A  "type" tag is duplicated.</report>
         <assert test=".//*[contains(concat(' ', normalize-space(@class), ' '), ' item ')]">A mandatory "item" tag is missing.</assert>
         <report test="count(.//*[contains(concat(' ', normalize-space(@class), ' '), ' item ')]) &gt; 1">A  "item" tag is duplicated.</report>
         <report test="count(.//*[contains(concat(' ', normalize-space(@class), ' '), ' reviewer ')]) &gt; 1">A  "reviewer" tag is duplicated.</report>
         <report test="count(.//*[contains(concat(' ', normalize-space(@class), ' '), ' dtreviewed ')]) &gt; 1">A  "dtreviewed" tag is duplicated.</report>
         <report test="count(.//*[contains(concat(' ', normalize-space(@class), ' '), ' rating ')]) &gt; 1">A  "rating" tag is duplicated.</report>
         <report test="count(.//*[contains(concat(' ', normalize-space(@class), ' '), ' description ')]) &gt; 1">A  "description" tag is duplicated.</report>
      </rule>
   </pattern>
   <pattern name="hreview.version">
      <rule context="*[contains(concat(' ', normalize-space(@class), ' '), ' version ')]">
         <assert test="ancestor::*[contains(concat(' ', normalize-space(@class), ' '), ' hreview ')]">version not allowed here.</assert>
      </rule>
   </pattern>
   <pattern name="hreview.summary">
      <rule context="*[contains(concat(' ', normalize-space(@class), ' '), ' summary ')]">
         <assert test="ancestor::*[contains(concat(' ', normalize-space(@class), ' '), ' hreview ')]">summary not allowed here.</assert>
      </rule>
   </pattern>
   <pattern name="hreview.type">
      <rule context="*[contains(concat(' ', normalize-space(@class), ' '), ' type ')]">
         <assert test="ancestor::*[contains(concat(' ', normalize-space(@class), ' '), ' hreview ')]">type not allowed here.</assert>
      </rule>
   </pattern>
   <pattern name="hreview.item">
      <rule context="*[contains(concat(' ', normalize-space(@class), ' '), ' item ')]">
         <assert test="ancestor::*[contains(concat(' ', normalize-space(@class), ' '), ' hreview ')]">item not allowed here.</assert>
      </rule>
   </pattern>
   <pattern name="hreview.fn">
      <rule context="*[contains(concat(' ', normalize-space(@class), ' '), ' fn ')]">
         <assert test="ancestor::*[contains(concat(' ', normalize-space(@class), ' '), ' item ')]">fn not allowed here.</assert>
      </rule>
   </pattern>
   <pattern name="hreview.url">
      <rule context="*[contains(concat(' ', normalize-space(@class), ' '), ' url ')]">
         <assert test="ancestor::*[contains(concat(' ', normalize-space(@class), ' '), ' item ')]">url not allowed here.</assert>
      </rule>
   </pattern>
   <pattern name="hreview.photo">
      <rule context="*[contains(concat(' ', normalize-space(@class), ' '), ' photo ')]">
         <assert test="ancestor::*[contains(concat(' ', normalize-space(@class), ' '), ' item ')]">photo not allowed here.</assert>
      </rule>
   </pattern>
   <pattern name="hreview.reviewer">
      <rule context="*[contains(concat(' ', normalize-space(@class), ' '), ' reviewer ')]">
         <assert test="ancestor::*[contains(concat(' ', normalize-space(@class), ' '), ' hreview ')]">reviewer not allowed here.</assert>
      </rule>
   </pattern>
   <pattern name="hreview.dtreviewed">
      <rule context="*[contains(concat(' ', normalize-space(@class), ' '), ' dtreviewed ')]">
         <assert test="ancestor::*[contains(concat(' ', normalize-space(@class), ' '), ' hreview ')]">dtreviewed not allowed here.</assert>
      </rule>
   </pattern>
   <pattern name="hreview.rating">
      <rule context="*[contains(concat(' ', normalize-space(@class), ' '), ' rating ')]">
         <assert test="ancestor::*[contains(concat(' ', normalize-space(@class), ' '), ' hreview ')]">rating not allowed here.</assert>
      </rule>
   </pattern>
   <pattern name="hreview.description">
      <rule context="*[contains(concat(' ', normalize-space(@class), ' '), ' description ')]">
         <assert test="ancestor::*[contains(concat(' ', normalize-space(@class), ' '), ' hreview ')]">description not allowed here.</assert>
      </rule>
   </pattern>
</schema>

A couple of notes on this schema:

  • A class attribute can contain several tokens and a single element can match several rules. Since Schematron checks only the first matching rule in each pattern, each definition is in its own pattern.
  • In this example, I have added a test that each tag is found within the context where it is expected. This test reports an error on the sample at the first occurrence of « fn », because this occurrence belongs to another microformat (vCard) which is combined with hReview in this example. It should be possible to switch this test off, and that could be done using Schematron phases.

Apart from that, I think that this could become a very practical solution. The idea would thus be:

  • Define a schema for a microformat using RELAX NG to describe its logical structure. This would probably lead to defining a language subset and conventions to convey information such as « which attribute is used » and would become a kind of « microschema ».
  • Transform this microschema into a Schematron schema (a sketch of what such a transformation could look like follows this list).
  • Use this schema to validate instance documents.
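
To give an idea of what the second step could look like, here is a minimal sketch of such a microschema-to-Schematron transformation. It assumes the microschema uses the XML syntax of RELAX NG, with one define per tag and element names standing for class tokens, and it only generates the « not allowed here » patterns shown above (the cardinality reports would be generated along the same lines); it is not the actual transformation used for this post.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes one RELAX NG define per tag, element names
     standing for class tokens; generates the "not allowed here"
     patterns for every tag referenced by another definition. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rng="http://relaxng.org/ns/structure/1.0"
    xmlns="http://www.ascc.net/xml/schematron">

   <xsl:template match="/">
      <schema>
         <xsl:apply-templates select="//rng:define[rng:element]"/>
      </schema>
   </xsl:template>

   <xsl:template match="rng:define">
      <xsl:variable name="tag" select="rng:element/@name"/>
      <!-- The tag of the definition that references this one, if any. -->
      <xsl:variable name="parent"
          select="//rng:define[.//rng:ref/@name = current()/@name]/rng:element/@name"/>
      <xsl:if test="$parent">
         <pattern name="{@name}">
            <rule context="*[contains(concat(' ', normalize-space(@class), ' '), ' {$tag} ')]">
               <assert test="ancestor::*[contains(concat(' ', normalize-space(@class), ' '), ' {$parent} ')]"><xsl:value-of select="$tag"/> not allowed here.</assert>
            </rule>
         </pattern>
      </xsl:if>
   </xsl:template>

</xsl:stylesheet>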

What I find interesting is that the same RELAX NG microschema could be used, as shown by Norm Walsh, to feed a transformation applied to instance documents before validation, or transformed into a schema that validates the instance documents directly, and I am pretty sure that these schemas could have many other uses.

TreeBind is about making Java as agile as it can be

TreeBind seems to be getting more visible:

Last week I attended several SD West sessions that gave me interesting ideas for TreeBind:

  • Under the rather misleading title « Enterprise Java for Elvis », Cay Horstmann presented some good stuff coming with EJB 3.0. I have been impressed by the fact that POJOs (Plain Old Java Objects) can now be used and that most of the persistence configuration can be done through Java annotations. That’s something we could use within TreeBind: annotations are available through reflection and could be used to convey serialization information such as the relative order of sub-elements and whether a property should be written as an element or an attribute.
  • Allen Holub gave his very enlightening presentation « Everything You Know is Wrong: Extends and Get/Set Methods are Evil », during which he explained why classes should expose behaviors rather than properties. When you think about it, that seems obvious enough, but it still helps when someone such as Allen Holub explains it! The exceptions are serialization and deserialization, where classes need to expose their internals (Allen Holub says that the language should take care of that, but that’s not yet the case with Java). Even in that case, he favours specific « importer » and « exporter » classes over getters and setters, and that’s an option that could be used by TreeBind too (the current version relies on getters and setters).
  • Rick Wayne had proposed « Railin’ on AJAX » and I was looking forward to seeing what was behind the Ruby on Rails buzz. How is that related to TreeBind? One of the lessons learned from Ruby on Rails is the « DRY » (Don’t Repeat Yourself) principle and the way Rails generates classes from the database. It would be easy enough for TreeBind to generate the Java classes corresponding to an XML document. This could be done by TreeBind itself, through a TreeBind sink which would write Java source files. Annotations could be added to the document to describe cardinalities and datatypes, and the XML document would be used the way schemas are used by JAXB. Using XML instances as schemas? Doesn’t that ring a bell? That’s exactly what Examplotron is about!

What’s the common thread between all that?

The initial motivation is still there: to make binding as transparent and lightweight as possible and Java as agile as it can be!

Web 2.0 and enterprises 1.0

Dare Obasanjo and Uche Ogbuji have published three blog posts ([dare], [uche1], [uche2]) that nicely illustrate the gap between enterprise computing and web computing.

This phenomenon isn’t new: in the nineties, the same gap existed between the « serious » computing advocated by most CIOs and the client/server developments that we recommended (I was working at Sybase at the time), which were often taken on by other teams (sometimes by the users themselves).

CIOs eventually came around, but the recent progress of the so-called Web 2.0 is such that there is little point today (outside a few uncommon application niches) in developing anything other than web applications.

The implications are deeper than they appear.

On the technical side, and that’s the point of the posts I am quoting, what justification can there be for using technologies other than those behind the success of giants such as Google, Yahoo or Amazon?

How can you justify the complexity and cost of the architectures that characterize enterprise computing to develop web applications whose technical constraints will, in the vast majority of cases, be much weaker than those of these giants?

On the contrary, enterprises should embrace the architectures based on Open Source software and scripting languages used by the big web sites!

But it is perhaps on the usage side that the biggest gains can be made.

The so-called « social » side of Web 2.0 manages to make the web collaborative and to turn its users into actors.

Isn’t that a major challenge within enterprises?

Many enterprises stumble over the lack of user buy-in when trying to deploy costly knowledge management systems.

Web 2.0, on the contrary, succeeds in getting its users to participate, whether to write documents (Wikipedia), classify resources (del.icio.us and dmoz), share photos (Flickr), report news (digg and wikinews), make themselves known (blogs), build social networks (LinkedIn, Viaduc, 6nergies, …), provide technical support (newsgroups, forums and mailing lists), develop software in a distributed way (SourceForge, Savannah, …), exchange intellectual services (Amazon Mechanical Turk, Google Answers, Yahoo! Answers), …

The use of Web 2.0 applications in the enterprise is just getting started, mostly thanks to wikis, which are beginning to earn their stripes.

Yet enterprises have everything to gain by applying internally the recipes that work so well on the web!

The possibilities are unlimited, and enterprise 2.0 will no doubt use an internal Wikipedia to edit its documentation, a del.icio.us clone to classify its internal and external resources, a LinkedIn look-alike to manage relationships between its employees, a derivative of Amazon Mechanical Turk to channel the internal or external questions it receives, …

This is a subject dear to my heart. Contact me if you would like to discuss how all this could apply to your own company.

W3C Internationalization « Tag » Set

2006-02-22: The Internationalization Tag Set Working Group has published an updated Working Draft of the Internationalization Tag Set (ITS). Organized by data categories, this set of elements and attributes supports the internationalization and localization of schemas and documents. Implementations are provided for DTDs, XML Schema and Relax NG, and for existing vocabularies like XHTML, DocBook and OpenDocument. Visit the Internationalization home page.

(Copied from the W3C News Archive)

I had missed the previous version of this document and I have been very impressed and pleased while (quickly) reading it.

Among the good things, I’d mention:

  • Flexibility: ITS can be used within the documents to localize, within the schemas that describe these documents or standalone.
  • Schema agnosticism: ITS can be used with DTDs, W3C XML Schema and RELAX NG (I don’t see why the list has been limited to these three, but, at least, RELAX NG is explicitly mentioned).
  • No QNames: more precisely, ITS has been wise enough to avoid using namespace declarations for its QNames.

Among the things that could be improved, I have found (and reported):

  • The word « tag » in the name itself, « Internationalization Tag Set »: we spend our time explaining that XML is about trees and that tags are only syntactic sugar to mark the beginning and the end of elements, and I wouldn’t have expected to see this word in the name of a W3C specification! [bug 2922]
  • The fact that the same element names are used in schemas and instance documents: schemas with XML syntaxes are also instances, and ITS could be used to localize the schemas themselves instead of localizing the instances described by these schemas. Unfortunately, doing so would lead to confusion since the ITS element names would be the same for both usages. [bug 2923]
  • The list of schema languages could be left open. [bug 2924]

Publishing GPG or PGP public keys considered harmful?

In a previous post, I expressed the common thinking that digitally signed emails would be a strong spam stopper.

I still think that a more general usage of electronic signatures would be really effective in fighting spammers, but it recently occurred to me that, at least before we reach that stage, publishing one’s public key can be considered… harmful!

A system such as GPG/PGP relies on the fact that public keys, used to check signatures, are not only public but easy to find, and you typically publish them both on your web site and on public key servers.

At the same time, these public keys can be used to encrypt messages that you want to send to their owners.

This encryption is typically « end to end »: the message is encrypted by the sender’s mail user agent and decrypted by the recipient’s mail user agent with the recipient’s private key, and nobody, human or software, can read the content of the message in between.

While this is really great for preserving your privacy, it also means that neither anti-spam nor anti-virus software can read the content of encrypted emails without knowing the recipient’s private key, and that pretty much eliminates any server-side shielding.

Keeping your public key private would eliminate most of the benefit of signing your mails, but if you make your public key public, you’d better be very careful when reading encrypted emails, especially when they are not signed!

Edd Dumbill on XTech 2006

Last year Edd Dumbill, XTech Conference Chair, was kind enough to answer my questions about the 2005 edition of the conference previously known as « XML Europe ». We’re renewing the experience, taking the opportunity to look back at last year’s edition and to figure out what XTech 2006 should look like.

vdV: You mention in your blog the success of XTech 2005, and that’s an appreciation shared by many attendees (including myself). Can you elaborate, for those who missed XTech 2005, on what makes you say that it was a success?

Edd: What I was particularly pleased with was the way we adapted the conference topic areas to reflect the changing technology landscape.

With Firefox and Opera, web browser technology matters a lot more now, but there was no forum to discuss it. We provided one, and some good dialog was opened up between developers, users and standards bodies.

But, to sum up how I know the conference was successful: because everybody who went told me that they had a good and profitable time!

vdV: You said during our previous interview that two new tracks which « aren’t strictly about XML topics at all » were introduced last year (Browser Technology and Open Data) to reflect the fact that « XML broadens out beyond traditional core topics ». Have these tracks met their goal of attracting a new audience?

Edd: Yes, I’m most excited about them. As I said before, the browser track really worked at getting people talking. The Open Data track was also very exciting: we heard a lot from people out there in the real world providing public data services.

The thing is that people in these « new » audiences work closely with the existing XML technologists anyway. It didn’t make sense to talk about XML and leave SVG, XHTML and XUL out in the cold: these are just as much document technologies as DocBook!

One thing that highlighted this for me was that I heard from a long-time SGML and then XML conference attendee that XTech’s subject matter was the most interesting they’d seen in years.

vdV: Did the two « older » tracks (Core Technologies and Applications) hold their own against these two new tracks, and would you qualify them as successful too?

Edd: Yes, I would! XTech is still a very important home for leaders in the core of XML technology. Yet also I think there’s always a need to change to adapt to the priorities of the conference attendees. One thing I want to do this year is to freshen the Applications track to reflect the rapidly changing landscape in which web applications are now being constructed. As well as covering the use of XML vocabularies and its technologies, I think the frameworks such as Rails, Cocoon, Orbeon and Django are important topics.

vdV: What would you like to do better in 2006?

Edd: As I’ve mentioned above, I think the Applications track can and will be better. I’d like also for there to be increased access to the conference for people such as designers and information architects. The technology discussed at XTech often directly affects these people, but there’s not always much dialogue between the technologists and the users. I’d love to foster more understanding and collaboration in that way.

vdV: You mention in your blog and in the CFP that there will be panel discussions for each track. How do you see these panel discussions?

Edd: Based on feedback from 2005’s conference, I would like the chance for people to discuss the important issues of the day in their field. For instance, how should XML implementors choose between XQuery and XSLT2, or how can organisations safely manage exposing their data as a web service? There’s no simple answer to these questions, and discussions will foster greater understanding, and maybe bring some previously unknown insights to those responsible for steering the technology.

vdV: The description of the tracks for XTech 2006 looks very similar to its predecessor’s. Does that mean that this will be a replay of XTech 2005?

Edd: Yes, but even more so! In fact, XTech 2005 was really a « web 2.0 » conference even before people put a name to what was happening. In 2006 I want to build on last year’s success and provide continuity.

vdV: In last year’s description, the semantic web had its own bullet point in the « Open Data » track and this year it’s sharing a bullet point with tagging and annotation. Does that mean that tagging and annotation can be seen as an alternative to the semantic web? Doesn’t the semantic web deserve its own track?

Edd: The Semantic Web as a more formal sphere already has many conferences of its own. While XTech definitely wants to cover semantic web, it doesn’t want to get carried away with the complicated academic corners of the topic, but more see where semantic web technologies can be directly used today.

Also, I see the potential for semantic web technologies to pervade all areas that XTech covers. RDF for instance, is a « core technology ». RSS and FOAF are « applications » of RDF. RDF is used in browsers such as Mozilla. And RDF is used to describe metadata in the Creative Commons, relevant to « open data ». So why shut it off on its own? I’d far rather see ideas from semantic web spread throughout the conference.

vdV: In your blog, you’ve defended the choice of the tagline « Building Web 2.0 » quoting Paul Graham and saying that the Web 2.0 is a handy label for « The Web as it was meant to be used ». Why have you not chosen « Building the web as it was meant to be » as a tagline, then?

Edd: Because we decided on the tagline earlier! I’ll save « the web as it was meant to be » for next year :)

vdV: What struck me with this definition is that XML, Web Services and the Semantic Web are also attempts to build the Web as it was meant to be. What’s different with the Web 2.0?

Isn’t « building the web as it was meant to be » an impossible quest and why should the Web 2.0 be more successful than the previous attempts?

Edd: Two questions at once; I’ll answer both together. I think the « Web 2.0 » name includes and builds on XML, Web Services and Semantic Web. But it also brings in the attitude of data sharing, community and the read/write web. Together, those things connote the web as it was intended by Berners-Lee: a two-way medium for both computers and humans.

Rather than an « attempt », I think « Web 2.0 » is a description of the latest evolution of web technologies. But I think it’s an important one, as we’re seeing a change in the notions of what makes a useful web service, and a validation of the core ideas of the web (such as REST) which the rush to make profit in « Web 1.0 » ignored.

vdV: In your blog, you said that you’re « particularly interested in getting more in about databases, frameworks like Ruby on Rails, tagging and search ». By databases, do you mean XML databases? Can you explain why you find these points particularly interesting?

Edd: I mean all databases. Databases are now core to most web applications and many web sites. They’re growing features to directly support web and XML applications, whether they’re true « XML databases » or not. A little bit of extra knowledge about the database side of things can make a great difference when creating your application.

XTech is a forum for web and XML developers, the vast majority of whom will use a database as part of their systems. Therefore, we should have the database developers and vendors there to talk as well.

vdV: One of the good things last year was the wireless coverage. Will there be wireless coverage this year too?

Edd: Absolutely.

vdV: What is your worst memory of XTech 2005?

Edd: I don’t remember bad things :)

vdV: What is your best memory of XTech 2005?

Edd: For me, getting so many of the Mozilla developers out there (I think there were around 25+ Mozilla folk in all). Their participation really got the browser track off to a great start.

References:

Non-content-based antispam sucks

My provider has recently changed the IP address of one of my servers and my logs are flooded with messages such as:

Dec  7 08:21:57 gwnormandy postfix/smtp[22362]: connect to mx00.schlund.de[212.227.15.134]: server refused to talk to me: 421 Mails from this IP temporarily refused: Dynamic IP Addresses See: http://www.sorbs.net/lookup.shtml?213.41.184.90   (port 25)
Dec  7 08:21:57 gwnormandy postfix/smtp[22339]: connect to mx01.schlund.de[212.227.15.150]: server refused to talk to me: 421 Mails from this IP temporarily refused: Dynamic IP Addresses See: http://www.sorbs.net/lookup.shtml?213.41.184.90   (port 25)
Dec  7 08:21:57 gwnormandy postfix/smtp[22334]: connect to mx01.kundenserver.de[212.227.15.150]: server refused to talk to me: 421 Mails from this IP temporarily refused: Dynamic IP Addresses See: http://www.sorbs.net/lookup.shtml?213.41.184.90   (port 25)
Dec  7 08:21:57 gwnormandy postfix/smtp[22414]: connect to mx00.1and1.com[217.160.230.12]: server refused to talk to me: 421 Mails from this IP temporarily refused: Dynamic IP Addresses See: http://www.sorbs.net/lookup.shtml?213.41.184.90   (port 25)

Of course, I am trying to get this solved by sorbs.net (in this case that should be possible, since this is a fixed IP), but the incident reminds me why I think we shouldn’t use « technical » or « non-content-based » antispam even when it happens to be effective.

The basic idea of most if not all antispam software is to distinguish between what looks like spam and what looks like a normal message.

To implement this, we’ve got three main approaches that can be combined:

  • Content-based algorithms look at the content of the messages and use statistical methods to distinguish between « spam » and « ham » (non-spam).
  • List-based algorithms work with white and black lists to allow or deny mails, usually based on the sender’s address.
  • Technical algorithms look at the mail headers to reject the most common practices used by spammers.

The problem with these technical algorithms is that the common practices used by spammers are not always practices that violate the standards, nor even practices that should be considered bad practices!

Let’s take the case of the sorbs.net database that identifies dynamic IP addresses.

I would argue that sending a mail from a dynamic IP address is a good practice and that asking people to use their ISP mail servers when they don’t want to is a bad practice.

I personally consider my mail too important and sensitive to be outsourced to my ISP!

That’s the case when I am at home, where I prefer to set up my own SMTP servers to take care of delivering my mails rather than using the SMTP servers of my ISP.

When I am using my own servers, I know from my logs whether and when the SMTP servers of my recipients receive and queue the mails I am sending.

Also, I want to be able to manage mailing lists without having to ask anyone.

And that’s even more the case when I am travelling and using an occasional ISP that I barely know and don’t know if I can trust.

We use lots of these ISPs when we connect through WiFi hotspots, and here again I much prefer to send my mails from the SMTP server that runs on my laptop than through an unknown ISP.

Furthermore, that means that I don’t have to change the configuration of my mailer.

Content-based antispam filters have their flaws too (they need training and are very inefficient with mails containing only pictures), but they don’t produce false positives the way technical antispam filters do when they reject my mails because I send them from dynamic IP addresses.

That’s the reason why I have uninstalled SpamAssassin and replaced it with SpamBayes on my own systems.

Now, the thing that really puzzles me about antispam is that we have a technical solution that could eradicate spam and we just seem to ignore it.

If everyone signed their mails with a PGP key, I could reject (or moderate) all the emails that are not signed.

Spammers would have to choose between signing their mails and being identified (meaning they could be sued) or not signing them and getting their mails trashed.

Now, the problem is that because so few people sign their mails, I can’t afford to ignore unsigned mails, and because PGP signatures are not handled correctly by many mailers and mailing list servers, most people (including me) don’t sign their mails.

The question is why doesn’t that change? Is this just a question of usages? Or is the community as a whole just not motivated to shut the spam down?

Web 2.0: myth and reality

Web 2.0 is both a new buzzword and real progress. In this article, I’ll try to separate the myth from the reality.

Note 

This article is a translation of the article published in French on XMLfr and presented at sparklingPoint.

This version integrates, in a very “Web 2.0 fashion”, a lot of comments from XMLfr editors and sparklingPoint participants, and I’d like to thank them for their contributions.

Definition 

The first difficulty when we want to form an opinion about Web 2.0 is to delimit its perimeter.

When you need to say if an application is XML or not, that’s quite easy: the application is an XML application if and only if it conforms to the XML 1.0 (or 1.1) recommendation. 

That’s not so easy for Web 2.0 since Web 2.0 is not a standard but a set of practices. 

In that sense, Web 2.0 can be compared to REST (Representational State Transfer) which is also a set of practices.

Fair enough, you will say, but it’s easy to tell whether an application is RESTful. Why would that be different with Web 2.0?

REST is a concept that is clearly described in a single document: Roy Fielding’s thesis which gives a precise definition of what REST is.

On the contrary, Web 2.0 is a blurred concept that aggregates a number of trends, and everyone seems to have their own definition of Web 2.0, as you can see from the number of articles describing what Web 2.0 is.

If we really need to define Web 2.0, I’ll take two definitions. 

The first one is the one given by the French version of Wikipedia:

Web 2.0 is a term often used to describe what is perceived as an important transition of the World Wide Web, from a collection of web sites to a computing platform providing web applications to users. The proponents of this vision believe that the services of Web 2.0 will come to replace traditional office applications.

This article also gives a history of the term:

The term was coined by Dale Dougherty of O’Reilly Media during a brainstorming session with MediaLive International to develop ideas for a conference that they could jointly host. Dougherty suggested that the Web was in a renaissance, with changing rules and evolving business models.

And it goes on by giving a series of examples that illustrate the difference between good old “Web 1.0” and Web 2.0:

DoubleClick was Web 1.0; Google AdSense is Web 2.0. Ofoto is Web 1.0; Flickr is Web 2.0.

Google, which launched AdSense in 2003, was doing Web 2.0 without knowing it, one year before the term was coined in 2004!

Technical layer

Let’s focus on the technical side of Web 2.0 first.

One of the characteristics of Web 2.0 is to be available to today’s users using reasonably recent versions of any browser. That’s one of the reasons why Mike Shaver said in his opening keynote at XTech 2005 that “Web 2.0 isn’t a big bang but a series of small bangs”.

Restricted by the set of installed browsers, Web 2.0 has no choice but to rely on technologies that can be qualified as “mature”:

  • HTML (or XHTML pretending to be HTML since Internet Explorer doesn’t accept XHTML documents declared as such) –the last version of HTML was published in 1999.
  • A subset of CSS 2.0 supported by Internet Explorer –CSS 2.0 was published in 1998.
  • Javascript –a technology introduced by Netscape in its browser in 1995.
  • XML –published in 1998.
  • Atom or RSS syndication –RSS was created by Netscape in 1999.
  • The HTTP protocol –the latest HTTP version was published in 1999.
  • URIs –published in 1998.
  • REST –a thesis published in 2000.
  • Web Services –XML-RPC APIs for Javascript were already available in 2000.

The usage of XML over HTTP in asynchronous mode has been given the name “Ajax”.

Web 2.0 appears to be the full appropriation by web developers of mature technologies to achieve a better user experience.

If it’s a revolution, this is a revolution in the way to use these technologies together, not a revolution in the technologies themselves.

Office applications

Can these old technologies really replace office applications? Is Web 2.0 about rewriting MS Office in Javascript and could that run in a browser?

Probably not, if the rule were to keep the same paradigm with the same level of features.

We often quote the famous “80/20” rule according to which 80% of the features require only 20% of the development effort, and sensible applications should focus on those 80% of the features.

Office applications crossed the 80/20 borderline years ago and invented a new kind of 80/20 rule: 80% of the users probably use less than 20% of the features.

I think that a Web 2.0 application focusing on the genuine 80/20 rule for a restricted application or group of users would be tough competition for traditional office applications.

This seems to be the case for applications such as Google Maps (which could compete with GIS applications at the low end of the market) or some of the new WYSIWYG text editing applications that flourish on the web.

A motivation that may push users to adopt these web applications is the attractiveness of systems that help us manage our data.

This is the case of Gmail, Flickr, del.icio.us or LinkedIn, to name a few: while these applications relieve us of the burden of the technical management of our data, they also give us remote access from any device connected to the internet.

What is seen today as a significant advantage for managing our mails, pictures, bookmarks or contacts could be seen in the future as a significant advantage for managing our office documents.

Social layer

If the French version of Wikipedia has the benefit of being concise, it is slightly out of date and doesn’t describe the second layer of Web 2.0, further developed during the second Web 2.0 conference in October 2005.

The English version of Wikipedia adds the following examples to the list of Web 1.0/Web 2.0 sites:

Britannica Online (1.0) / Wikipedia (2.0), personal sites (1.0) / blogging (2.0), content management systems (1.0) / wikis (2.0), directories (taxonomy) (1.0) / tagging (« folksonomy ») (2.0)

These examples are interesting because technically speaking, Wikipedia, blogs, wikis or folksonomies are mostly Web 1.0.

They illustrate what Paul Graham is calling Web 2.0 “democracy”.

Web 2.0 democracy is the idea that to “lead the web to its full potential” (as the W3C tagline says), the technical layer of the internet must be complemented by a human network formed by its users to produce, maintain and improve its content.

There is nothing new here either and I remember Edd Dumbill launching WriteTheWeb in 2000, “a community news site dedicated to encouraging the development of the read/write web” because the “tide is turning” and the web is no longer a one way web.

This social effect was also the guiding theme of Tim O’Reilly’s keynote session at OSCON 2004, one year before it became the social layer of Web 2.0.

Another definition

With a technical and a social layer, isn’t Web 2.0 becoming a shapeless bag in which we group anything that looks new on the web?

We can see in the technical layer a consequence of the social layer, the technical layer being needed to provide the interactivity required by the social layer.

This analysis would exclude from Web 2.0 applications such as Google Maps, which have no social aspect but are often quoted as typical examples of Web 2.0.

Paul Graham tries to find common trends between these layers in the second definition that I’ll propose in this article:

Web 2.0 means using the web the way it’s meant to be used. The « trends » we’re seeing now are simply the inherent nature of the web emerging from under the broken models that got imposed on it during the Bubble. 

This second definition reminds me of other taglines and buzzwords heard over the past years:

  • The W3C tagline is “Leading the Web to Its Full Potential”. Ironically, Web 2.0 is happening, technically based on many technologies specified by the W3C, without the W3C… It is very tempting to interpret the recent announcement of a “Rich Web Clients Activity” as an attempt to catch a running train.
  • Web Services are an attempt to make the web available to applications, which it was meant to be since the early days of Web 1.0.
  • The Semantic Web -which seems to have completely missed the Web 2.0 train- is the second generation of the web seen by the inventor of Web 1.0. 
  • REST is the description of web applications using the web as it is meant to be used.
  • XML is “SGML on the web” which was possible with HTTP as it was meant to be used. 
  • … 

Here again, Web 2.0 appears to be the continuation of the “little big bangs” of the web.

Technical issues

In maths, continuous isn’t the same as differentiable and in technology too, continuous evolutions can change direction.

Technical evolutions are often a consequence of changes in priorities that lead to these changes of direction.

The priorities of client/server applications that we developed in the 90’s were:

  • the speed of the user interfaces,
  • their quality,
  • their transactional behaviour,
  • security.

They’ve been swept aside by web applications, whose priorities are:

  • a universal addressing system,
  • universal access,
  • global fault tolerance: when a computer stops, some services might stop working but the web as a whole isn’t affected,
  • scalability (web applications support more users than client/server ones dreamed to support),
  • a relatively coherent user interface that enables sharing services through URIs,
  • open standards.

Web 2.0 is taking back some of the priorities of client/server applications, and one needs to be careful that these priorities are met without compromising what makes the strength of the web.

Technically speaking, we are lucky enough to have best practices formalized in REST, and Web 2.0 developers should be careful to design RESTful exchanges between browsers and servers to take full advantage of the web.

Ergonomic issues

Web 2.0 applications run in web browsers and they should make sure that users can keep their Web 1.0 habits, especially with respect to URIs (including the ability to create bookmarks, send URIs by mail and use their back and forward buttons).

Let’s take a simple example to illustrate the point.

Have you noticed that Google, presented as a leading-edge Web 2.0 company, is stubbornly Web 1.0 in its core business, the search engine itself?

It is easy to imagine what a naïve Web 2.0 search engine might look like.

That might start with a search page similar to the current Google Suggest: when you start typing your query terms, the service suggests possible completions of your terms.

When you sent the query, the page wouldn’t move. Some animation could keep you waiting, even if that’s usually not necessary with a high-speed connection to Google. The query would be sent and the results brought back asynchronously; then the list of matches would be displayed in the same page.

The user experience would be fast and smooth, but there are enough drawbacks with this scenario that Google doesn’t seem to find it worth trying:

  • The URI in the address bar would stay the same: users would have no way to bookmark a search result or to copy and past it to send to a friend.
  • Back and forward buttons would not work as expected.
  • These result pages would not be accessible to crawlers.

The web developer who would implement this Web 2.0 application should take care to provide good workarounds for each of these drawbacks. This is certainly possible, but that requires some effort.

Falling into these traps would be really counter-productive for Web 2.0 since, as we have seen, ergonomic benefits are what justify this evolution toward a web that is easier to use.

Development

The last point on which one must be careful when developing Web 2.0 applications is development tools.

The flow of press releases from software vendors announcing development tools for Ajax-based applications may put an end to this problem, but Web 2.0 often means developing complex scripts that are subject to interoperability issues between browsers.

Does that mean that Web 2.0 should ignore declarative definitions of user interfaces (such as in XForms, XUL or XAML), or even the 4GLs that were invented for client/server applications in the early 90s?

A way to avoid this regression is to use a framework that hides most of the Javascript development.

Catching up with the popular “Ruby on Rails”, web publishing frameworks are beginning to offer Web 2.0 extensions.

This is the case of Cocoon, whose new version 2.1.8 includes support for Ajax, but also of Orbeon PresentationServer, which includes in its version 3.0 fully transparent support for Ajax through its XForms engine.

This feature makes it possible to write user interfaces in standard XForms (without a single line of Javascript) and to deploy these applications on today’s browsers, the system using Ajax interactions between browsers and servers to implement XForms.

Published in 2003, XForms is only two years old, way too young to be part of the Web 2.0 technical stack… Orbeon PresentationServer is a nifty way to use XForms before it can join the other Web 2.0 technologies!

Business model

What about the business model?

The definition of Paul Graham, for whom Web 2.0 is a web rid of the bad practices of the internet bubble, is interesting when you know that some analysts believe that a Web 2.0 bubble is on its way.

This is the case of Rob Hof (Business Week), who deploys a two-step argument:

1) “It costs a whole lot less to fund companies to revenue these days”, which Joe Kraus (JotSpot) explains by the facts that:

  • “Hardware is 100X cheaper”,  
  • “Infrastructure software is free”, 
  • “Access to Global Labor Markets”, 
  • Internet marketing is cheap and efficient for niche markets. 

2) Even though venture capital investment seems to stay level, cheaper costs mean that many more companies are being funded with the same level of investment. Furthermore, cheaper costs also mean that more companies can be funded outside VC funds.

Rob Hof also remarks that many Web 2.0 startups are created with no other business model than being sold in the short term.

Even if it is composed of smaller bubbles, a Web 2.0 bubble might be on the way…

Here again, the golden rule is to learn from the Web 1.0 experience.

Data Lock-In Era

If we need a solid business model for Web 2.0, what can it be?

One of the answers to this question was in the Tim O’Reilly keynote at OSCON 2004 that I have already mentioned.

Giving his views on the history of computer technologies since their beginning, Tim O’Reilly showed how this history can be split into three eras:

  • During the “Hardware Lock-In” era, computer manufacturers ruled the market.
  • Then came the “Software Lock-In” era dominated by software vendors.
  • We are now entering the “Data Lock-In” era.

In this new era, illustrated by the success of sites such as Google, Amazon, or eBay, the dominant actors are companies that can gather more data than their competitors, and their main asset is the content given or lent to them by their users for free.

When you outsource your mail to Google, publish a review or even buy something on Amazon, upload your pictures to Flickr or add a bookmark to del.icio.us, you tie yourself to that site and trade a service for their use of your data.

A number of people are speaking out against what François Joseph de Kermadec calls the “fake freedom” given by Web 2.0.

To guard against this fake freedom, users should be careful:

  • to trade data against real services,
  • to look into the terms of use of each site to know which rights they grant in exchange for these services,
  • to demand technical means, based on open standards, to get their data back.

So what?

What are the conclusions of this long article?

Web 2.0 is a term to qualify a new web that is emerging right now.

This web will use the technologies that we already know in creative ways to develop a collaborative “two-way web”.

Like any other evolution, Web 2.0 comes with a series of risks: technical, ergonomic, financial and threats against our privacy.

Beyond the marketing buzzword, Web 2.0 is a fabulous bubble of new ideas, practices and usages.

The fact that its shape is still so blurred shows that everything is still open and that personal initiatives are still important.

The Web 2.0 message is a message of hope!

References