One of my customers has discovered that the DTD published by Sun to validate properties files in JSE 1.5.0 is not well-formed!
The javadoc explains:
Note that the system URI (http://java.sun.com/dtd/properties.dtd) is not accessed when exporting or importing properties; it merely serves as a string to uniquely identify the DTD, which is:
<?xml version="1.0" encoding="UTF-8"?>
<!-- DTD for properties -->
<!ELEMENT properties ( comment?, entry* ) >
<!ATTLIST properties version CDATA #FIXED "1.0">
<!ELEMENT comment (#PCDATA) >
<!ELEMENT entry (#PCDATA) >
<!ATTLIST entry key CDATA #REQUIRED>
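In other words, the system URI works as a name, not as a location. java.util.Properties knows that; a generic XML parser does not, unless you tell it. The sketch below illustrates one way to do so; it is not something the javadoc prescribes. It assumes a modern JDK, writes a Properties object with storeToXML (whose DOCTYPE uses the Sun URI), then reads it back with a plain SAX parser whose resolveEntity maps that URI to an in-memory copy of the javadoc DTD. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.util.Properties;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: treats the Sun system URI as an identifier only.
public class PropertiesDtdAsIdentifier {
    static final String SUN_SYSTEM_ID = "http://java.sun.com/dtd/properties.dtd";

    // Well-formed DTD as given in the java.util.Properties javadoc.
    static final String JAVADOC_DTD =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!-- DTD for properties -->\n"
        + "<!ELEMENT properties ( comment?, entry* ) >\n"
        + "<!ATTLIST properties version CDATA #FIXED \"1.0\">\n"
        + "<!ELEMENT comment (#PCDATA) >\n"
        + "<!ELEMENT entry (#PCDATA) >\n"
        + "<!ATTLIST entry key CDATA #REQUIRED>\n";

    /** Stores props as XML, then counts its entry elements with a generic SAX parser. */
    static int countEntries(Properties props) throws Exception {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        props.storeToXML(buffer, null); // the DOCTYPE written here uses SUN_SYSTEM_ID
        final int[] entries = {0};

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                if (SUN_SYSTEM_ID.equals(systemId)) {
                    // Treat the URI as a name: serve the local copy, never fetch it.
                    return new InputSource(new StringReader(JAVADOC_DTD));
                }
                return null; // any other system ID: default resolution
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("entry".equals(qName)) {
                    entries[0]++;
                }
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(buffer.toByteArray()), handler);
        return entries[0];
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("greeting", "hello");
        props.setProperty("farewell", "goodbye");
        System.out.println("entries read: " + countEntries(props));
    }
}
```

Without the resolver, the same parser would try to fetch the URI, and would then depend on whatever happens to be published there.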
Reducing the system URI to a mere identifier is a simplification that can cause problems when you parse your document: XML parsers are free to load the external DTD even if you specify standalone="yes" in your XML declaration, and even if you run them in non-validating mode.
In that case, a system URI pointing to a DTD that is not well-formed means that, depending on your parser and on the options you pass it at parse time, you may or may not get a well-formedness error.
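Here is a minimal sketch of that dependence on parse-time options. It assumes the SAX parser bundled with a modern JDK, which is derived from Xerces and recognises the Xerces feature http://apache.org/xml/features/nonvalidating/load-external-dtd. The DTD text is the version currently published at the Sun URI (shown further down), served from memory by a resolveEntity override so that nothing is fetched over the network; the class name is hypothetical. With the feature on, the parser reads the DTD and stops with a fatal error; with it off, the same document parses cleanly.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: same document, same DTD, different parser option.
public class DtdLoadingOption {
    static final String SYSTEM_ID = "http://java.sun.com/dtd/properties.dtd";

    // The DTD as published, with a comment in front of the XML declaration.
    static final String PUBLISHED_DTD =
        "<!-- Copyright 2005 Sun Microsystems, Inc. All rights reserved. -->\n"
        + "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!ELEMENT properties ( comment?, entry* ) >\n"
        + "<!ATTLIST properties version CDATA #FIXED \"1.0\">\n"
        + "<!ELEMENT comment (#PCDATA) >\n"
        + "<!ELEMENT entry (#PCDATA) >\n"
        + "<!ATTLIST entry key CDATA #REQUIRED>\n";

    static final String DOCUMENT =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE properties SYSTEM \"" + SYSTEM_ID + "\">\n"
        + "<properties><entry key=\"a\">1</entry></properties>\n";

    /** Non-validating parse; returns false if the parser reports a fatal error. */
    static boolean parses(boolean loadExternalDtd) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        // Xerces feature, also recognised by the parser bundled with the JDK.
        factory.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd", loadExternalDtd);
        SAXParser parser = factory.newSAXParser();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Serve the published DTD from memory instead of the network.
                InputSource source = new InputSource(new StringReader(PUBLISHED_DTD));
                source.setSystemId(systemId);
                return source;
            }
        };
        try {
            parser.parse(new InputSource(new StringReader(DOCUMENT)), handler);
            return true;
        } catch (SAXException e) {
            System.out.println("Fatal error: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("load-external-dtd=true  -> parses: " + parses(true));
        System.out.println("load-external-dtd=false -> parses: " + parses(false));
    }
}
```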
Interestingly, the DTD listed above, borrowed from the javadoc, is well-formed.
The DTD published at http://java.sun.com/dtd/properties.dtd appears to have been modified to:
<!-- Copyright 2005 Sun Microsystems, Inc. All rights reserved. -->
<?xml version="1.0" encoding="UTF-8"?>
<!-- DTD for properties -->
<!ELEMENT properties ( comment?, entry* ) >
<!ATTLIST properties version CDATA #FIXED "1.0">
<!ELEMENT comment (#PCDATA) >
<!ELEMENT entry (#PCDATA) >
<!ATTLIST entry key CDATA #REQUIRED>
See what happened? Someone probably insisted on adding a copyright statement at the beginning of each of their documents, forgetting that XML forbids comments before the XML declaration…
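The rule is easy to check. Once the comment comes first, the parser no longer sees <?xml ...?> as a declaration but as a processing instruction whose target is "xml", and that target is reserved: it is excluded from the PITarget production, so the DTD is not well-formed. In this sketch (hypothetical class name, default non-validating SAXParserFactory from a modern JDK, DTD served from memory through resolveEntity), the published ordering is rejected while the same DTD with the comment moved after the declaration is accepted.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: only the position of the comment changes.
public class CommentPlacement {
    static final String TEXT_DECL = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
    static final String COPYRIGHT =
        "<!-- Copyright 2005 Sun Microsystems, Inc. All rights reserved. -->\n";
    static final String MARKUP_DECLS =
        "<!ELEMENT properties ( comment?, entry* ) >\n"
        + "<!ATTLIST properties version CDATA #FIXED \"1.0\">\n"
        + "<!ELEMENT comment (#PCDATA) >\n"
        + "<!ELEMENT entry (#PCDATA) >\n"
        + "<!ATTLIST entry key CDATA #REQUIRED>\n";

    // Ordering found at the published URI: comment before the declaration.
    static final String COMMENT_FIRST = COPYRIGHT + TEXT_DECL + MARKUP_DECLS;
    // Same content with the declaration moved to the very start.
    static final String DECLARATION_FIRST = TEXT_DECL + COPYRIGHT + MARKUP_DECLS;

    /** Parses a small document whose external DTD is dtdText; false on a fatal error. */
    static boolean acceptsDtd(String dtdText) throws Exception {
        String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
            + "<properties/>\n";
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Answer every external entity request with the DTD under test.
                return new InputSource(new StringReader(dtdText));
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(doc)), handler);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("comment first:     " + acceptsDtd(COMMENT_FIRST));
        System.out.println("declaration first: " + acceptsDtd(DECLARATION_FIRST));
    }
}
```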
We shouldn’t let lawyers edit XML documents!
5 Comments
Then again, SGML was invented by a lawyer.
John, that might be the reason!
But that lawyer was fanatical about validation and knew better than to introduce a feature for well-formedness that would fail the stupid test: every feature introduced to support well-formedness, such as the optional prolog, tends to fail the stupid test.
So maybe stupid people shouldn’t edit XML.
A DTD isn’t a valid XML document. The problem is that it includes the XML processing instruction when it shouldn’t.
Apart from anything else, an XML document must contain a single well-formed root element, which is not present in a DTD. Additionally, none of the declarations are valid XML content.
Hmmm… Which processing instruction do you mean?
Technically speaking, <?xml version="1.0" encoding="UTF-8"?> isn't considered a PI but an XML declaration (strictly, a text declaration when it appears in an external DTD), and it's not only allowed in a DTD but also considered good practice.
The error is the comment, which is forbidden before the XML declaration in both XML documents and DTDs…