Lawyers shouldn’t edit XML documents

One of my customers has found out that the DTD published by Sun to validate property files in JSE 1.5.0 is not well formed!

The javadoc explains:

Note that the system URI (http://java.sun.com/dtd/properties.dtd) is not accessed when exporting or importing properties; it merely serves as a string to uniquely identify the DTD, which is:

<?xml version="1.0" encoding="UTF-8"?>

<!-- DTD for properties -->

<!ELEMENT properties ( comment?, entry* ) >

<!ATTLIST properties version CDATA #FIXED "1.0">

<!ELEMENT comment (#PCDATA) >

<!ELEMENT entry (#PCDATA) >

<!ATTLIST entry key CDATA #REQUIRED>
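Here is where that system URI shows up in practice. The sketch below assumes Java 7 or later and uses a made-up class name, `PropertiesRoundTrip`. It writes a `Properties` object with `storeToXML` and reads it back with `loadFromXML`. The generated XML should carry a `DOCTYPE` whose system identifier is http://java.sun.com/dtd/properties.dtd.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesRoundTrip {

    // Serializes the properties with storeToXML (UTF-8 by default) and returns the XML text.
    static String toXml(Properties props) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        props.storeToXML(out, "demo");
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Parses XML produced by storeToXML back into a Properties object.
    static Properties fromXml(String xml) throws IOException {
        Properties props = new Properties();
        props.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.setProperty("greeting", "hello");
        String xml = toXml(props);
        // The DOCTYPE line in this output names the Sun system URI.
        System.out.println(xml);
        System.out.println(fromXml(xml).getProperty("greeting"));
    }
}
```

According to the javadoc quoted above, this round trip is not supposed to fetch that URI, so it should work offline.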

Reducing the system URI to a mere identifier is a simplification that can cause problems when you parse your document: XML parsers are free to load DTDs even if you specify standalone="yes" in your XML declaration and even if you run them in non-validating mode.

In that case, a system URI that points to a DTD that is not well formed means that, depending on your parser and on the options you pass it at parse time, you may or may not get a well-formedness error.
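The sketch below shows one way to observe this with the JDK's built-in parser, which is derived from Xerces. It assumes Java 8 or later. The class `DtdLoadingDemo` and its helper `parses` are hypothetical names, and the `load-external-dtd` feature URI is specific to Xerces-based parsers. An `EntityResolver` serves an in-memory copy of the published DTD, so nothing is fetched from the network. The same non-validating parse is then run twice, once with external DTD loading switched on and once with it switched off.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DtdLoadingDemo {

    static final String SYSTEM_ID = "http://java.sun.com/dtd/properties.dtd";

    // In-memory copy of the DTD as published at SYSTEM_ID: the comment comes
    // before the <?xml ...?> declaration, so the DTD is not well formed.
    static final String PUBLISHED_DTD =
        "<!--\n   Copyright 2005 Sun Microsystems, Inc.  All rights reserved.\n  -->\n"
        + "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!ELEMENT properties ( comment?, entry* ) >\n"
        + "<!ATTLIST properties version CDATA #FIXED \"1.0\">\n"
        + "<!ELEMENT comment (#PCDATA) >\n"
        + "<!ELEMENT entry (#PCDATA) >\n"
        + "<!ATTLIST entry key CDATA #REQUIRED>\n";

    // A small properties document that references the DTD by its system URI.
    static final String DOC =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE properties SYSTEM \"" + SYSTEM_ID + "\">\n"
        + "<properties><entry key=\"greeting\">hello</entry></properties>\n";

    // Returns true if DOC parses without a fatal error, false otherwise.
    static boolean parses(boolean loadExternalDtd) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // non-validating mode (also the default)
        // Xerces-specific feature; the JDK's built-in parser recognizes it.
        factory.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd",
            loadExternalDtd);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Serve the published DTD from memory instead of going to the network.
        builder.setEntityResolver((publicId, systemId) ->
            SYSTEM_ID.equals(systemId)
                ? new InputSource(new StringReader(PUBLISHED_DTD))
                : null);
        // DefaultHandler rethrows fatal errors and stays silent on the rest.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(DOC)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("DTD loaded:  " + parses(true));
        System.out.println("DTD skipped: " + parses(false));
    }
}
```

With the JDK's default parser, the first call should hit a fatal error on the misplaced `<?xml`. Because that declaration is no longer at the very start of the entity, the parser reads it as a processing instruction with a reserved target. The second call should succeed because the DTD is never read. Other parsers or other settings may behave differently, and that inconsistency is exactly the problem.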

Interestingly, the DTD listed above and borrowed from the javadoc is well formed.

The DTD published at http://java.sun.com/dtd/properties.dtd appears to have been modified to:

<!--
   Copyright 2005 Sun Microsystems, Inc.  All rights reserved.
  -->

<?xml version="1.0" encoding="UTF-8"?>

<!-- DTD for properties -->

<!ELEMENT properties ( comment?, entry* ) >

<!ATTLIST properties version CDATA #FIXED "1.0">

<!ELEMENT comment (#PCDATA) >

<!ELEMENT entry (#PCDATA) >

<!ATTLIST entry key CDATA #REQUIRED>

See what has happened? Someone has probably insisted that they should add a copyright statement at the beginning of each of their documents, forgetting that XML forbids comments before the XML declaration…
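The fix is only a matter of ordering. The declaration, which the XML specification calls a text declaration when it appears in a DTD, has to come first, and the copyright comment can follow it. The sketch below assumes Java 8 or later and uses hypothetical names (`DtdOrderCheck`, `wellFormed`, `FIXED`). It serves each variant of the DTD from memory through an `EntityResolver` and reports whether a small properties document that references it parses.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DtdOrderCheck {

    static final String SYSTEM_ID = "http://java.sun.com/dtd/properties.dtd";
    static final String TEXT_DECL = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
    static final String COPYRIGHT =
        "<!--\n   Copyright 2005 Sun Microsystems, Inc.  All rights reserved.\n  -->\n";
    static final String DECLS =
        "<!ELEMENT properties ( comment?, entry* ) >\n"
        + "<!ATTLIST properties version CDATA #FIXED \"1.0\">\n"
        + "<!ELEMENT comment (#PCDATA) >\n"
        + "<!ELEMENT entry (#PCDATA) >\n"
        + "<!ATTLIST entry key CDATA #REQUIRED>\n";

    // As published: comment first, then the declaration (not well formed).
    static final String PUBLISHED = COPYRIGHT + TEXT_DECL + DECLS;
    // Hypothetical fix: declaration first, copyright comment right after it.
    static final String FIXED = TEXT_DECL + COPYRIGHT + DECLS;

    // Parses a small document whose DOCTYPE points at SYSTEM_ID, serving the
    // given DTD text from memory; returns false if a fatal error is reported.
    static boolean wellFormed(String dtd) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) ->
            SYSTEM_ID.equals(systemId) ? new InputSource(new StringReader(dtd)) : null);
        builder.setErrorHandler(new DefaultHandler()); // rethrows fatal errors
        String doc = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE properties SYSTEM \"" + SYSTEM_ID + "\">\n"
            + "<properties><entry key=\"k\">v</entry></properties>\n";
        try {
            builder.parse(new InputSource(new StringReader(doc)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("published: " + wellFormed(PUBLISHED));
        System.out.println("fixed:     " + wellFormed(FIXED));
    }
}
```

With the JDK's built-in parser, `PUBLISHED` should be rejected and `FIXED` accepted. Both variants contain the same declarations and the same comment.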

We shouldn’t let lawyers edit XML documents!

5 thoughts on “Lawyers shouldn’t edit XML documents”

  1. Hmmm… Which processing instruction do you mean?

    Technically speaking, <?xml version="1.0" encoding="UTF-8"?> isn't considered a PI but an XML declaration, and it's not only allowed in a DTD but also considered good practice.

    The error is the comment which is forbidden before the XML declaration in both XML documents and DTDs…

  2. A DTD isn’t a valid XML document. The problem is that it includes the XML processing instruction when it shouldn’t.

    Apart from anything else, an XML document must contain a single well-formed root element, which is not present in a DTD. Additionally, none of the declarations are valid XML content.

  3. But that lawyer was fanatical about validation and knew better than to introduce a feature for well-formedness that would fail the stupid test: every feature introduced to support well-formedness, such as the optional prolog, tends to fail the stupid test.

    So maybe stupid people shouldn’t edit XML.
