RELAX NG and W3C XML Schema compared (continued)

Many comparisons have already been published on this topic, but there is still plenty of misunderstanding when the so-called object-oriented features of W3C XML Schema are compared with RELAX NG patterns.

Many people complain that RELAX NG supports neither complex type derivation nor substitution groups.

There are two ways to look at these features:

  1. If you focus on validation, these are ways to define sets of valid instance fragments.
  2. If you focus on modeling, these are ways to define design patterns and declare to potential applications what kind of relations exist between definitions.

RELAX NG (and DSDL in general) focuses on validation, and its built-in features provide equivalents to these W3C XML Schema features in terms of validation only.

Let’s see what this means with a simple example.

Derivation by extension

W3C XML Schema:

    <xs:complexType name="BaseType">
        <xs:sequence>
            <xs:element name="FirstName" type="xs:token"/>
            <xs:element name="LastName" type="xs:token"/>
            <xs:element name="Mail" type="xs:token" minOccurs="0"/>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="ExtendedType">
        <xs:complexContent>
            <xs:extension base="BaseType">
                <xs:sequence>
                    <xs:element name="Password" type="xs:token"/>
                </xs:sequence>
            </xs:extension>
        </xs:complexContent>
    </xs:complexType>

The equivalent schema in RELAX NG is (compact syntax):

BaseType =
  element FirstName { xsd:token },
  element LastName { xsd:token },
  element Mail { xsd:token }?

ExtendedType =
  BaseType,
  element Password { xsd:token }
            

Or (XML syntax):

  <define name="BaseType">
    <element name="FirstName">
      <data type="token"/>
    </element>
    <element name="LastName">
      <data type="token"/>
    </element>
    <optional>
      <element name="Mail">
        <data type="token"/>
      </element>
    </optional>
  </define>

  <define name="ExtendedType">
    <ref name="BaseType"/>
    <element name="Password">
      <data type="token"/>
    </element>
  </define> 

In RELAX NG, a derivation by extension translates into a new pattern that adds content after a reference to the base pattern.
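To make the effect of that reference concrete, here is a minimal Python sketch that follows the `<ref>` and lists the child elements ExtendedType ends up allowing. It only uses the standard library; the helper name `expand` is made up for this illustration, and it handles just the three constructs used in this example (`ref`, `optional`, `element`), not RELAX NG in general.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

# The two definitions from the XML syntax above, wrapped in a grammar element.
SCHEMA = """
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <define name="BaseType">
    <element name="FirstName"><data type="token"/></element>
    <element name="LastName"><data type="token"/></element>
    <optional><element name="Mail"><data type="token"/></element></optional>
  </define>
  <define name="ExtendedType">
    <ref name="BaseType"/>
    <element name="Password"><data type="token"/></element>
  </define>
</grammar>
"""


def expand(pattern, defines, optional=False):
    """Yield (element name, is_optional) pairs, following <ref> elements.

    Hypothetical helper: only ref, optional and element are understood.
    """
    for child in pattern:
        if child.tag == RNG + "ref":
            yield from expand(defines[child.get("name")], defines, optional)
        elif child.tag == RNG + "optional":
            yield from expand(child, defines, True)
        elif child.tag == RNG + "element":
            yield child.get("name"), optional


grammar = ET.fromstring(SCHEMA)
defines = {d.get("name"): d for d in grammar.findall(RNG + "define")}
extended = list(expand(defines["ExtendedType"], defines))

for name, is_optional in extended:
    print(name + ("?" if is_optional else ""))
```

The printed list is the content of BaseType followed by Password, which is the same content model that the xs:extension produces.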

Derivation by restriction

W3C XML Schema:

    <xs:complexType name="RestrictedType">
        <xs:complexContent>
            <xs:restriction base="BaseType">
                <xs:sequence>
                    <xs:element name="FirstName" type="xs:token"/>
                    <xs:element name="LastName" type="xs:token"/>
                </xs:sequence>
            </xs:restriction>
        </xs:complexContent>
    </xs:complexType>

The equivalent schema in RELAX NG is (compact syntax):

RestrictedType =
  element FirstName { xsd:token },
  element LastName { xsd:token }
            

Or (XML syntax):

  <define name="RestrictedType">
    <element name="FirstName">
      <data type="token"/>
    </element>
    <element name="LastName">
      <data type="token"/>
    </element>
  </define> 

In RELAX NG, a derivation by restriction translates into a new pattern whose definition is a restriction of the base pattern, written without any reference to it.

Substitution groups

W3C XML Schema:

    <xs:element name="Root">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="Head"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

    <xs:element name="Head" type="BaseType"/>

    <xs:element name="Restricted" type="RestrictedType" substitutionGroup="Head"/>

    <xs:element name="Extended" type="ExtendedType" substitutionGroup="Head"/>

The equivalent schema in RELAX NG is (compact syntax):

Head = element Head { BaseType }
Head |= element Restricted { RestrictedType }
Head |= element Extended { ExtendedType }
start = element Root { Head }
            

Or (XML syntax):

  <define name="Head">
    <element name="Head">
      <ref name="BaseType"/>
    </element>
  </define>
  <define name="Head" combine="choice">
    <element name="Restricted">
      <ref name="RestrictedType"/>
    </element>
  </define>
  <define name="Head" combine="choice">
    <element name="Extended">
      <ref name="ExtendedType"/>
    </element>
  </define>
  <start>
    <element name="Root">
      <ref name="Head"/>
    </element>
  </start> 

In RELAX NG, a substitution group translates into a combination by choice of the definition of the head of the substitution group with the definitions of the group members.

What did we miss?

These schemas can be considered equivalent because they validate the same set of instance documents (except that the RELAX NG schemas do not allow attributes from the XML Schema instance namespace, such as xsi:type).

The main difference is that the relation between the base and derived types and between the members of the substitution group is made explicit in W3C XML Schema and is implicit in RELAX NG.

For derivation by extension and for substitution groups, the design patterns used in RELAX NG (content added after a reference for an extension, and combination by choice of element definitions for a substitution group) are arguably characteristic enough for tools to detect them automatically.

For derivation by restriction, there isn’t much in the RELAX NG schema that could tell a tool that RestrictedType is a restriction of BaseType.

Annotations are an easy way to make these relations or design patterns explicit.
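As a rough illustration of what such automatic detection could look like, the following Python sketch scans the unannotated schema (XML syntax) with the standard library. The function names `find_extensions` and `find_substitution_groups` and the heuristics behind them are assumptions made for this example, not features of any existing RELAX NG tool.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

# The unannotated schema from the sections above, in XML syntax.
SCHEMA = """
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <define name="BaseType">
    <element name="FirstName"><data type="token"/></element>
    <element name="LastName"><data type="token"/></element>
    <optional><element name="Mail"><data type="token"/></element></optional>
  </define>
  <define name="ExtendedType">
    <ref name="BaseType"/>
    <element name="Password"><data type="token"/></element>
  </define>
  <define name="RestrictedType">
    <element name="FirstName"><data type="token"/></element>
    <element name="LastName"><data type="token"/></element>
  </define>
  <define name="Head">
    <element name="Head"><ref name="BaseType"/></element>
  </define>
  <define name="Head" combine="choice">
    <element name="Restricted"><ref name="RestrictedType"/></element>
  </define>
  <define name="Head" combine="choice">
    <element name="Extended"><ref name="ExtendedType"/></element>
  </define>
  <start><element name="Root"><ref name="Head"/></element></start>
</grammar>
"""


def find_extensions(grammar):
    """Heuristic: a define that starts with a <ref> and adds content after it."""
    found = {}
    for define in grammar.findall(RNG + "define"):
        children = list(define)
        if len(children) > 1 and children[0].tag == RNG + "ref":
            found[define.get("name")] = children[0].get("name")
    return found


def find_substitution_groups(grammar):
    """Heuristic: element definitions added to a pattern with combine="choice"."""
    groups = {}
    for define in grammar.findall(RNG + "define"):
        if define.get("combine") == "choice":
            for element in define.findall(RNG + "element"):
                groups.setdefault(define.get("name"), []).append(element.get("name"))
    return groups


grammar = ET.fromstring(SCHEMA)
extensions = find_extensions(grammar)
groups = find_substitution_groups(grammar)
print(extensions)
print(groups)
```

Neither heuristic reports anything for RestrictedType: its definition repeats FirstName and LastName without pointing back to BaseType, so a tool has nothing to go on.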

A complete schema with annotations for all three design patterns could be (compact syntax):

namespace oo = "http://ns.xmlschemata.org/object-orientation/"

BaseType =
  element FirstName { xsd:token },
  element LastName { xsd:token },
  element Mail { xsd:token }?

[ oo:extends = "BaseType" ]
ExtendedType =
  BaseType,
  element Password { xsd:token }

[ oo:restricts = "BaseType" ]
RestrictedType =
  element FirstName { xsd:token },
  element LastName { xsd:token }

Head = element Head { BaseType }

[ oo:substitutionGroup = "Head" ]
Head |= element Restricted { RestrictedType }

[ oo:substitutionGroup = "Head" ]
Head |= element Extended { ExtendedType }

start = element Root { Head }

            

Or (XML syntax):

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns:oo="http://ns.xmlschemata.org/object-orientation/"
  xmlns="http://relaxng.org/ns/structure/1.0"
  datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <define name="BaseType">
    <element name="FirstName">
      <data type="token"/>
    </element>
    <element name="LastName">
      <data type="token"/>
    </element>
    <optional>
      <element name="Mail">
        <data type="token"/>
      </element>
    </optional>
  </define>
  <define name="ExtendedType" oo:extends="BaseType">
    <ref name="BaseType"/>
    <element name="Password">
      <data type="token"/>
    </element>
  </define>
  <define name="RestrictedType" oo:restricts="BaseType">
    <element name="FirstName">
      <data type="token"/>
    </element>
    <element name="LastName">
      <data type="token"/>
    </element>
  </define>
  <define name="Head">
    <element name="Head">
      <ref name="BaseType"/>
    </element>
  </define>
  <define name="Head" combine="choice" oo:substitutionGroup="Head">
    <element name="Restricted">
      <ref name="RestrictedType"/>
    </element>
  </define>
  <define name="Head" combine="choice" oo:substitutionGroup="Head">
    <element name="Extended">
      <ref name="ExtendedType"/>
    </element>
  </define>
  <start>
    <element name="Root">
      <ref name="Head"/>
    </element>
  </start>
</grammar>

            

These annotations would be ignored by RELAX NG processors (like any annotation), but they can be used by tools that need to understand the relations between type and element definitions, such as binding tools. These tools could also enforce the rules defined by W3C XML Schema and check that restrictions are actual restrictions (a number of papers have been published explaining how this can be implemented).

It should also be noted that annotations can be used to identify design patterns other than those implemented by W3C XML Schema.
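As a sketch of how a tool might consume these annotations, the following Python code reads the oo: attributes from the annotated schema (XML syntax) using only the standard library. The function name `read_relations` and the shape of its result are assumptions made for this example; the annotation namespace is the one used in the schema above.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"
OO = "{http://ns.xmlschemata.org/object-orientation/}"

# The annotated schema from above, with element bodies written on one line.
SCHEMA = """
<grammar xmlns:oo="http://ns.xmlschemata.org/object-orientation/"
         xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <define name="BaseType">
    <element name="FirstName"><data type="token"/></element>
    <element name="LastName"><data type="token"/></element>
    <optional><element name="Mail"><data type="token"/></element></optional>
  </define>
  <define name="ExtendedType" oo:extends="BaseType">
    <ref name="BaseType"/>
    <element name="Password"><data type="token"/></element>
  </define>
  <define name="RestrictedType" oo:restricts="BaseType">
    <element name="FirstName"><data type="token"/></element>
    <element name="LastName"><data type="token"/></element>
  </define>
  <define name="Head">
    <element name="Head"><ref name="BaseType"/></element>
  </define>
  <define name="Head" combine="choice" oo:substitutionGroup="Head">
    <element name="Restricted"><ref name="RestrictedType"/></element>
  </define>
  <define name="Head" combine="choice" oo:substitutionGroup="Head">
    <element name="Extended"><ref name="ExtendedType"/></element>
  </define>
  <start><element name="Root"><ref name="Head"/></element></start>
</grammar>
"""


def read_relations(grammar):
    """Return (subject, relation, target) triples found in oo: annotations."""
    relations = []
    for define in grammar.findall(RNG + "define"):
        for attr, target in define.attrib.items():
            if not attr.startswith(OO):
                continue  # plain RELAX NG attribute such as name or combine
            kind = attr[len(OO):]
            if kind == "substitutionGroup":
                # The group member is the element, not the named pattern.
                subject = define.find(RNG + "element").get("name")
            else:
                subject = define.get("name")
            relations.append((subject, kind, target))
    return relations


relations = read_relations(ET.fromstring(SCHEMA))
for subject, kind, target in relations:
    print(subject, kind, target)
```

A binding tool could use these triples to build a class hierarchy, and a stricter tool could take each "restricts" triple as the starting point for checking that the restriction really is one.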

References

This post is a consolidation of mails sent on the XML-DEV mailing list: [1] [2] [thread]

7 thoughts on “RELAX NG and W3C XML Schema compared (continued)”

  1. Hello,

    I am discovering your blog with great interest. Could you shed some light on the following passage from your book: “This use of the namespace prefixes, common to W3C XML Schema and XSLT, is very controversial, since it creates a dependency between W3C XML Schema (considered an application) and the prefixes chosen for the namespaces.”?

    Thank you!

    Emmanuel

  2. Hello Emmanuel,

    I mean that this creates a dependency between the markup and the content, which seems bad to me from an architectural point of view and causes a number of problems (it becomes impossible to interpret or copy the content without analyzing or copying information from the markup).

    This is also what motivates the W3C “CURIE” specification (currently being standardized): http://www.w3.org/TR/curie/

    Eric