This post has been updated to take into account a mail from Björn Höhrmann with a heads-up about missing elements in the XHTML 2.0 list of elements.
The future of (X)HTML appears to be caught between two conflicting visions:
- The W3C and its XHTML 2.0 Working Drafts.
- The WHATWG and its HTML 5 counter-proposal.
I have posted my views on the subject on XML-DEV and was surprised by the answer from Björn Höhrmann. The server hosting XML-DEV and its archives is currently down, but you can see this answer in Google’s cache.
What surprised me most are his statistics: « XHTML 2 increases the element count by 50% compared to XHTML 1.0 Strict, and by 10% compared to HTML 2.0, HTML 3.2, HTML 4.01, and XHTML 1.1 combined, including the Frameset and Transitional variants. »
Other chapters of our upcoming Web 2.0 book kept me too busy to double-check these figures, but we have decided to mention this debate in Chapter 5, so I really needed to analyse these statistics in more detail.
My sources for this exercise are:
- HTML 4.01 Index of elements
- XHTML 1.1 Abstract Modules
- XHTML 2.0 List Of Elements
- XHTML 2.0 RELAX NG schema
- XForms 1.0 W3C XML Schema
- Web Applications 1.0 (aka HTML 5)
The data concerning XHTML 2.0 is a consolidation of the list of XHTML 2.0 elements included in the Working Draft, the RELAX NG schema and the W3C XML Schema for XForms. This is needed because the list of elements is a simplified list in which XForms and Ruby sub-elements are not included (see my mail to the HTML Working Group for more details). Many thanks to Björn Höhrmann for pointing that out.
By scraping these pages, I have extracted a consolidated list of elements, shown in the following table. Each cell gives the module to which the element belongs in the corresponding (X)HTML version, or the mention « deprecated » if the element is deprecated; an empty cell means the element is not part of that version:
| Element | HTML 4.01 | XHTML 1.1 | XHTML 2.0 | HTML 5 |
|---|---|---|---|---|
| a | Core | Hypertext | Hypertext | Phrase |
| abbr | Core | Text | Text | Phrase |
| access | | | Access | |
| acronym | Core | Text | | |
| action | | | XForms | |
| address | Core | Text | Structural | Sections |
| alert | | | XForms | |
| applet | Deprecated | Deprecated | | |
| area | Core | Client-side Image Map | | |
| article | | | | Sections |
| aside | | | | Sections |
| b | Core | Presentation | | |
| base | Core | Base | | Document metadata |
| basefont | Deprecated | Deprecated | | |
| bdo | Core | Bi-directional Text | | Phrase |
| big | Core | Presentation | | |
| bind | | | XForms | |
| blockcode | | | Structural | |
| blockquote | Core | Text | Structural | Sections |
| body | Core | Structure | Document | Sections |
| br | Core | Text | | Phrase |
| button | Core | Forms | | |
| caption | Core | Tables | Tables | |
| case | | | XForms | |
| center | Deprecated | Deprecated | | |
| choices | | | XForms | |
| cite | Core | Text | Text | Phrase |
| code | Core | Text | Text | Phrase |
| col | Core | Tables | Tables | |
| colgroup | Core | Tables | Tables | |
| command | | | | Interactive |
| copy | | | XForms | |
| datagrid | | | | Interactive |
| dd | Core | List | List | Lists |
| del | Core | Edit | | Edits |
| delete | | | XForms | |
| details | | | | Interactive |
| dfn | Core | Text | Text | Phrase |
| di | | | List | |
| dir | Deprecated | Deprecated | | |
| dispatch | | | XForms | |
| div | Core | Text | Structural | |
| dl | Core | List | List | Lists |
| dt | Core | List | List | Lists |
| em | Core | Text | Text | Phrase |
| ev:listener | | | XML Events | |
| event-source | | | | Server-sent DOM events |
| extension | | | XForms | |
| fieldset | Core | Forms | | |
| filename | | | XForms | |
| font | Deprecated | Deprecated | | |
| footer | | | | Sections |
| form | Core | Forms | | |
| frame | Frames | Frames | | |
| frameset | Frames | Frames | | |
| group | | | XForms | |
| h | | | Structural | |
| h1 | Core | Text | Structural | Sections |
| h2 | Core | Text | Structural | Sections |
| h3 | Core | Text | Structural | Sections |
| h4 | Core | Text | Structural | Sections |
| h5 | Core | Text | Structural | Sections |
| h6 | Core | Text | Structural | Sections |
| handler | | | Handler | |
| head | Core | Structure | Document | Document metadata |
| header | | | | Sections |
| help | | | XForms | |
| hint | | | XForms | |
| hr | Core | Presentation | | Paragraphs |
| html | Core | Structure | Document | HTML documents and document fragments |
| i | Core | Presentation | | Phrase |
| iframe | Core | Iframe | | |
| img | Core | Image | Image | content[TBW] |
| input | Core | Forms | XForms | |
| ins | Core | Edit | | Edits |
| insert | | | XForms | |
| instance | | | XForms | |
| isindex | Deprecated | Deprecated | | |
| item | | | XForms | |
| itemset | | | XForms | |
| kbd | Core | Text | Text | Phrase |
| l | | | Text | |
| label | Core | Forms | List | |
| legend | Core | Forms | | |
| li | Core | List | List | Lists |
| link | Core | Link | Metainformation | Document metadata |
| load | | | XForms | |
| m | | | | Phrase |
| map | Core | Client-side Image Map | | |
| mediatype | | | XForms | |
| menu | Deprecated | Deprecated | | Interactive |
| message | | | XForms | |
| meta | Core | Metainformation | Metainformation | Document metadata |
| meter | | | | Phrase |
| model | | | XForms | |
| nav | | | | Sections |
| nl | | | List | |
| noframes | Frames | Frames | | |
| noscript | Core | Scripting | | Scripting |
| object | Core | Object | Object | |
| ol | Core | List | List | Lists |
| optgroup | Core | Forms | | |
| option | Core | Forms | | |
| output | | | XForms | |
| p | Core | Text | Structural | Paragraphs |
| param | Core | Object | Object | |
| pre | Core | Text | Structural | Preformatted text |
| progress | | | | Phrase |
| q | Core | Text | Text | Phrase |
| range | | | XForms | |
| rb | | | Ruby | |
| rbc | | | Ruby | |
| rebuild | | | XForms | |
| recalculate | | | XForms | |
| refresh | | | XForms | |
| repeat | | | XForms | |
| reset | | | XForms | |
| revalidate | | | XForms | |
| rp | | | Ruby | |
| rt | | | Ruby | |
| rtc | | | Ruby | |
| ruby | | | Ruby | |
| s | Deprecated | Deprecated | | |
| samp | Core | Text | Text | Phrase |
| script | Core | Scripting | | Scripting |
| secret | | | XForms | |
| section | | | Structural | Sections |
| select | Core | Forms | XForms | |
| select1 | | | XForms | |
| send | | | XForms | |
| separator | | | Structural | |
| setfocus | | | XForms | |
| setindex | | | XForms | |
| setvalue | | | XForms | |
| small | Core | Presentation | | Phrase |
| span | Core | Text | Text | Phrase |
| standby | | | Object | |
| strike | Deprecated | Deprecated | | |
| strong | Core | Text | Text | Phrase |
| style | Core | Style Sheet | Style Sheet | Document metadata |
| sub | Core | Presentation | Text | Phrase |
| submission | | | XForms | |
| submit | | | XForms | |
| summary | | | Tables | |
| sup | Core | Presentation | Text | Phrase |
| switch | | | XForms | |
| t | | | | Phrase |
| table | Core | Tables | Tables | |
| tbody | Core | Tables | Tables | |
| td | Core | Tables | Tables | |
| textarea | Core | Forms | XForms | |
| tfoot | Core | Tables | Tables | |
| th | Core | Tables | Tables | |
| thead | Core | Tables | Tables | |
| title | Core | Structure | Document | Document metadata |
| toggle | | | XForms | |
| tr | Core | Tables | Tables | |
| trigger | | | XForms | |
| tt | Core | Presentation | | |
| u | Deprecated | Deprecated | | |
| ul | Core | List | List | Lists |
| upload | | | XForms | |
| value | | | XForms | |
| var | Core | Text | Text | Phrase |
The total numbers of elements are:
| | HTML 4.01 | XHTML 1.1 | XHTML 2.0 | HTML 5 |
|---|---|---|---|---|
| Number of elements | 91 | 91 | 115 | 63 |
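For readers who want to redo the exercise, the consolidation step can be sketched roughly as follows. This is only an illustration: the `sample` data is a handful of rows copied from the big table rather than the scraped pages, and the names `sample` and `consolidate` are made up for this sketch.

```python
# Hypothetical sample: element -> module, per (X)HTML version.
# Only a few rows from the consolidated table, for illustration.
sample = {
    "HTML 4.01": {"a": "Core", "acronym": "Core", "applet": "Deprecated"},
    "XHTML 1.1": {"a": "Hypertext", "acronym": "Text", "applet": "Deprecated"},
    "XHTML 2.0": {"a": "Hypertext", "access": "Access", "section": "Structural"},
    "HTML 5": {"a": "Phrase", "article": "Sections", "section": "Sections"},
}


def consolidate(per_version):
    """Merge per-version element maps into one row per element.

    A missing element is represented by an empty string in that column.
    """
    versions = list(per_version)
    elements = sorted(set().union(*per_version.values()))
    return {
        name: [per_version[v].get(name, "") for v in versions]
        for name in elements
    }


table = consolidate(sample)
# Number of elements defined by each version in this sample.
counts = {version: len(elements) for version, elements in sample.items()}

for name, modules in table.items():
    print("| " + " | ".join([name, *modules]) + " |")
print(counts)
```

With the full per-version lists in place of `sample`, `counts` would give the totals shown in the table above.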
Note, however, that we are not comparing apples to apples: HTML 4.01 and XHTML 1.x include a number of deprecated elements that shouldn’t be used. They also include frames elements, which have been taken out of XHTML 2.0 to be defined in the XFrames specification and are not part of HTML 5 either. It seems fair to remove all these elements from our numbers, which gives:
| | HTML 4.01 | XHTML 1.1 | XHTML 2.0 | HTML 5 |
|---|---|---|---|---|
| Number of non deprecated elements | 81 | 81 | 115 | 63 |
| Number of non deprecated non frames elements | 78 | 78 | 115 | 63 |
These figures confirm the increase of almost 50% between HTML 4.01 or XHTML 1.1 and XHTML 2.0 mentioned by Björn Höhrmann, and it is worth looking at where the increase comes from. If you look at the different modules in this table, you’ll see that whereas HTML 4.01 and XHTML 1.1 include 10 elements from their Forms module, XHTML 2.0 includes 46 XForms elements. The increase in the number of elements comes almost entirely from the XHTML 2.0 XForms support: outside forms, the count barely moves (68 elements versus 69).
Furthermore, to compare with HTML 5, you also need to remove the table elements, which are not yet defined in HTML 5, and the figures then look quite different:
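The arithmetic behind the « almost 50% » figure is easy to check. The sketch below takes the totals from the tables above and the deprecated and frames element names from the HTML 4.01 column of the big table; the variable names are mine.

```python
# Totals from the "Number of elements" table.
TOTAL = {"HTML 4.01": 91, "XHTML 1.1": 91, "XHTML 2.0": 115, "HTML 5": 63}

# Elements marked "Deprecated" or "Frames" in the HTML 4.01 / XHTML 1.1 columns.
DEPRECATED = {"applet", "basefont", "center", "dir", "font",
              "isindex", "menu", "s", "strike", "u"}
FRAMES = {"frame", "frameset", "noframes"}

# HTML 4.01 (or XHTML 1.1) without deprecated and frames elements.
legacy = TOTAL["HTML 4.01"] - len(DEPRECATED) - len(FRAMES)
xhtml2 = TOTAL["XHTML 2.0"]

# Relative increase from the cleaned-up legacy count to XHTML 2.0.
increase = (xhtml2 - legacy) / legacy * 100

print(legacy)              # 78
print(f"{increase:.1f}%")  # 47.4%
```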
| | HTML 4.01 | XHTML 1.1 | XHTML 2.0 | HTML 5 |
|---|---|---|---|---|
| Number of non deprecated non frames elements | 78 | 78 | 115 | 63 |
| Number of Forms or XForms elements | 10 | 10 | 46 | 0 |
| Number of non deprecated non frames non forms elements | 68 | 68 | 69 | 63 |
| Number of tables elements | 10 | 10 | 11 | 0 |
| Number of non deprecated non frames non forms non tables elements | 58 | 58 | 58 | 63 |
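The last row is simply the result of successive subtractions, which can be reproduced with a few lines. The figures are those of the table above; the `Counts` class and its field names are hypothetical, introduced only for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    non_deprecated_non_frames: int  # starting point for the comparison
    forms: int                      # Forms or XForms elements
    tables: int                     # tables elements

    @property
    def remaining(self) -> int:
        """Elements left once forms and tables are taken out."""
        return self.non_deprecated_non_frames - self.forms - self.tables


figures = {
    "HTML 4.01": Counts(78, 10, 10),
    "XHTML 1.1": Counts(78, 10, 10),
    "XHTML 2.0": Counts(115, 46, 11),
    "HTML 5": Counts(63, 0, 0),
}

for version, counts in figures.items():
    print(f"{version}: {counts.remaining}")
```

This prints 58 for HTML 4.01, XHTML 1.1 and XHTML 2.0, and 63 for HTML 5.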
In other words, the debate of whether XHTML 2.0 is a simplification can be split into two different points:
- The number of elements for the classical, non forms related features is the same in HTML 4.01, XHTML 1.1 and XHTML 2.0.
- The replacement of the Forms module by XForms represents a complete paradigm change that undeniably leads to more complexity and an increase in the number of elements.
The last line shows that there is an actual increase in the number of elements between HTML 4.01 or XHTML 1.1 and HTML 5. If you look at the overall table, you’ll notice that this increase is due to the addition of quite a number of new elements, only partly compensated by the removal of elements considered either near duplicates (for instance, acronym has been removed and people are advised to use abbr for both acronyms and abbreviations) or not very useful.
Of course, numbers of elements are not 100% representative of the complexity of a vocabulary, but they give a good indication, and the figures given by Björn did deserve some further analysis.
PS: I have sent an answer to XML-DEV that may find its way there once their server is up again.
PPS: I recommend reading Björn Höhrmann’s mails to the www-html@w3.org mailing list as a complement to this blog entry: