The transform source effect

Why is exposing a document model so important? Why would that be better than providing import/export capabilities or API access to the document model?

The « view source effect » is often cited as one of the reasons why XML is so important: people can learn by opening existing documents and copy and paste what they like into their own documents.

Following this analysis, the view source effect would be one of the main reasons for the success of the web: you can learn just by looking at the source of the pages you consider good examples.

The view source effect is important indeed, but to reach its full potential, copy/paste needs to be automated and the view source effect needs to become the « transform source effect ».

The ability to transform sources means that you don’t need to fully understand what’s going on to take advantage of markup language formats: you can just do text substitution on a template.

The web is full of examples of the power of the transform source effect: the various templating languages such as PHP, ASP, JSP and many more are nothing more than implementations of the transform source effect.

The « style free stylesheets » that power XMLfr, which I have described in an XML.com article, are another example of the transform source effect.

How does that relate to desktop publishing formats? Let’s take a simple example to illustrate that point.

Let’s say I am a programmer and I need to deliver an application that takes letter templates and prints them after having changed the names and addresses.

Let’s also imagine that I do not know anything about the specifics of the different word processors and that I want my application to be portable across Microsoft Word, OpenOffice, WordPerfect, AbiWord and Scribus.

Finally, let’s say, for fun, that I do not know anything about XML but that I am an XXX programmer (substitute XXX with whatever programming language you like).

Because all these word processors can read and write their documents as XML, I’ll just write a simple program that will substitute predefined string values included in the documents (let’s call them $name, $address, …) with the contents of variables that I could retrieve, for instance, from a database.

I am sure that you know how to do that with your favorite programming language! In Perl for instance, that would be something like:

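#!/usr/bin/perl

# Values to merge into the template; in a real application they
# could come from a database.
$name = 'Mr. Eric van der Vlist';
$address = '22, rue Edgar Faure';
$postcode = 'F75015';
$city = 'Paris';
$country = 'France';

# Read the document line by line (files named on the command line,
# or standard input) and print each line after substitution.
while (<>) {
	# ${$1} is a symbolic reference to the package variable named by
	# the match, so this only works without "use strict".
	s/\$(name|address|postcode|city|country)/${$1}/g;
	print;
}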

There is no magic here: I am just replacing occurrences of the string « $name » in the text with the contents of the variable $name, which is « Mr. Eric van der Vlist », occurrences of « $address » with « 22, rue Edgar Faure » and so on in a plain text document.
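For readers whose favorite language is not Perl, here is a minimal sketch of the same substitution in Python. The function name fill_template and the FIELDS dictionary are names chosen for this illustration, and the values are simply the ones used in the Perl script. A dictionary lookup stands in for Perl's symbolic references; the behaviour is otherwise meant to be the same.

```python
import re

# Illustrative values, identical to those in the Perl script; a real
# application could load them from a database instead.
FIELDS = {
    "name": "Mr. Eric van der Vlist",
    "address": "22, rue Edgar Faure",
    "postcode": "F75015",
    "city": "Paris",
    "country": "France",
}


def fill_template(text, fields):
    """Replace each $key placeholder in text with fields[key].

    Only keys present in fields are recognised; any other "$" in the
    text is left untouched, as in the Perl regular expression.
    """
    if not fields:
        return text
    # Builds a pattern such as \$(name|address|postcode|city|country)
    keys = "|".join(re.escape(key) for key in fields)
    pattern = re.compile(r"\$(" + keys + r")")
    return pattern.sub(lambda match: fields[match.group(1)], text)


letter = "Dear $name,\n$address\n$postcode $city\n$country\n"
print(fill_template(letter, FIELDS))
```

The function works on any string, so it can be applied line by line to a file or to a whole document read into memory.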

I am leveraging the « transform source effect » to write a simple application that is compatible with any application that enables this effect by exposing its model as plain text.

This application will work with Microsoft Word (using WordML and probably even RTF), OpenOffice, WordPerfect, AbiWord, Scribus and many more.

It will also work with HTML, XHTML, XSL-FO, SVG, DocBook, TEI, plain text, TeX, …
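To illustrate how little the substitution cares about the target format, the sketch below runs one function over a plain text, an HTML and an SVG fragment. Everything in it is an assumption made for the example: the function name, the escape_markup flag, the sample values and the tiny templates. The flag covers a case the Perl script above does not handle: in XML-based formats, a value containing « & » or « < » has to be escaped, otherwise the resulting document is no longer well-formed.

```python
import re
from xml.sax.saxutils import escape


def fill_template(text, fields, escape_markup=False):
    """Replace $key placeholders; optionally escape &, < and > in values."""
    if not fields:
        return text
    keys = "|".join(re.escape(key) for key in fields)
    pattern = re.compile(r"\$(" + keys + r")")

    def replacement(match):
        value = fields[match.group(1)]
        # Escaping is only wanted when the target is a markup format.
        return escape(value) if escape_markup else value

    return pattern.sub(replacement, text)


# Hypothetical values; the ampersand shows why escaping matters.
fields = {"name": "Vlist & Co", "city": "Paris"}

# Hypothetical templates: (content, whether the format is markup).
templates = {
    "letter.txt": ("Dear $name, welcome to $city.", False),
    "letter.html": ("<p>Dear <b>$name</b>, welcome to $city.</p>", True),
    "letter.svg": ('<text x="10" y="20">$name, $city</text>', True),
}

for filename, (template, is_markup) in templates.items():
    print(filename, "->", fill_template(template, fields, is_markup))
```

The program never parses the documents: it treats every format as text, which is exactly what makes it portable across them.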

It will work with Quark, but only if we use QXML as an XML format and not as an API.

And it won’t work with InDesign unless there is a way to import/export full InDesign documents in XML…

InDesign and XML: no better than its competition

Someone who had read my blog entry about Quark and XML asked me if I knew whether Adobe had followed the same principles for the support of XML in InDesign.

I am not a specialist in this product range and I had no answer to this question.

Anyway, some quick research on Adobe’s web site and on Google makes me think that, even though I find it disappointing that Quark (like so many others) confuses markup languages with APIs, InDesign hasn’t even reached this point yet.

Adobe’s web site describes InDesign’s flexible XML support as:

Enhanced XML import: Import XML files with flexible control using the Structure view and Tags palette. Automatically flow XML into tagged templates or import and interactively place it. Enhanced options give you greater import control and make it easier to achieve the results you want.

Linked XML files: Create a link to an XML file on import, so you can easily update your placed XML content whenever the source XML content is updated.

XML automation: Automatically format XML on import by mapping XML tags to paragraph and character styles in your document, or more easily reuse existing content by mapping text styles to XML tags and then exporting the XML content.

XML table tagging: Easily apply XML tags to InDesign tables. Then import XML content into the tables or export it from them.

These are useful features that are detailed in an XML.com article, but they do not expose the complete document model in XML, either directly or even through an XML Schema of the DOM.

That might be the reason why someone who describes himself as a die-hard InDesign fan commented on this article to say that Adobe InDesign is behind QuarkXPress in terms of XML features.

This comment was written in August 2004 but, as far as I can see on the web, this is still the case today.
