XHTML 2.0 and HTML 5: The figures

This post has been updated to take into account a mail from Björn Höhrmann with a heads-up about missing elements in the XHTML 2.0 list of elements.

The future of (X)HTML appears to be searching its way between two conflicting visions:

I have posted my views on the subject on XML-DEV and have been surprised by the answer from Björn Höhrmann. The server hosting XML-DEV and its archives is currently down but you can see this answer in Google’s cache.

What I found most surprising were his statistics: “XHTML 2 increases the element count by 50% compared to XHTML 1.0 Strict, and by 10% compared to HTML 2.0, HTML 3.2, HTML 4.01, and XHTML 1.1 combined, including the Frameset and Transitional variants.”

Other chapters of our upcoming Web 2.0 book kept me too busy to double-check these figures, but since we have decided to mention this debate in Chapter 5, I really needed to analyse these statistics in more detail.

My sources for this exercise are:

The XHTML 2.0 data consolidates the list of XHTML 2.0 elements included in the Working Draft, the RELAX NG schema and the W3C XML Schema for XForms. This consolidation is needed because the Working Draft's list is a simplified one that does not include the XForms and Ruby sub-elements (see my mail to the HTML Working Group for more details). Many thanks to Björn Höhrmann for pointing that out.

By scraping these pages, I have extracted a consolidated list of elements, shown in the following table. Each cell gives the module to which the element belongs in the corresponding (X)HTML version, or the mention “Deprecated” if the element is deprecated:

Element HTML 4.01 XHTML 1.1 XHTML 2.0 HTML 5
a Core Hypertext Hypertext Phrase
abbr Core Text Text Phrase
access Access
acronym Core Text
action XForms
address Core Text Structural Sections
alert XForms
applet Deprecated Deprecated
area Core Client-side Image Map
article Sections
aside Sections
b Core Presentation
base Core Base Document metadata
basefont Deprecated Deprecated
bdo Core Bi-directional Text Phrase
big Core Presentation
bind XForms
blockcode Structural
blockquote Core Text Structural Sections
body Core Structure Document Sections
br Core Text Phrase
button Core Forms
caption Core Tables Tables
case XForms
center Deprecated Deprecated
choices XForms
cite Core Text Text Phrase
code Core Text Text Phrase
col Core Tables Tables
colgroup Core Tables Tables
command Interactive
copy XForms
datagrid Interactive
dd Core List List Lists
del Core Edit Edits
delete XForms
details Interactive
dfn Core Text Text Phrase
di List
dir Deprecated Deprecated
dispatch XForms
div Core Text Structural
dl Core List List Lists
dt Core List List Lists
em Core Text Text Phrase
ev:listener XML Events
event-source Server-sent DOM events
extension XForms
fieldset Core Forms
filename XForms
font Deprecated Deprecated
footer Sections
form Core Forms
frame Frames Frames
frameset Frames Frames
group XForms
h Structural
h1 Core Text Structural Sections
h2 Core Text Structural Sections
h3 Core Text Structural Sections
h4 Core Text Structural Sections
h5 Core Text Structural Sections
h6 Core Text Structural Sections
handler Handler
head Core Structure Document Document metadata
header Sections
help XForms
hint XForms
hr Core Presentation Paragraphs
html Core Structure Document HTML documents and document fragments
i Core Presentation Phrase
iframe Core Iframe
img Core Image Image content[TBW]
input Core Forms XForms
ins Core Edit Edits
insert XForms
instance XForms
isindex Deprecated Deprecated
item XForms
itemset XForms
kbd Core Text Text Phrase
l Text
label Core Forms List
legend Core Forms
li Core List List Lists
link Core Link Metainformation Document metadata
load XForms
m Phrase
map Core Client-side Image Map
mediatype XForms
menu Deprecated Deprecated Interactive
message XForms
meta Core Metainformation Metainformation Document metadata
meter Phrase
model XForms
nav Sections
nl List
noframes Frames Frames
noscript Core Scripting Scripting
object Core Object Object
ol Core List List Lists
optgroup Core Forms
option Core Forms
output XForms
p Core Text Structural Paragraphs
param Core Object Object
pre Core Text Structural Preformatted text
progress Phrase
q Core Text Text Phrase
range XForms
rb Ruby
rbc Ruby
rebuild XForms
recalculate XForms
refresh XForms
repeat XForms
reset XForms
revalidate XForms
rp Ruby
rt Ruby
rtc Ruby
ruby Ruby
s Deprecated Deprecated
samp Core Text Text Phrase
script Core Scripting Scripting
secret XForms
section Structural Sections
select Core Forms XForms
select1 XForms
send XForms
separator Structural
setfocus XForms
setindex XForms
setvalue XForms
small Core Presentation Phrase
span Core Text Text Phrase
standby Object
strike Deprecated Deprecated
strong Core Text Text Phrase
style Core Style Sheet Style Sheet Document metadata
sub Core Presentation Text Phrase
submission XForms
submit XForms
summary Tables
sup Core Presentation Text Phrase
switch XForms
t Phrase
table Core Tables Tables
tbody Core Tables Tables
td Core Tables Tables
textarea Core Forms XForms
tfoot Core Tables Tables
th Core Tables Tables
thead Core Tables Tables
title Core Structure Document Document metadata
toggle XForms
tr Core Tables Tables
trigger XForms
tt Core Presentation
u Deprecated Deprecated
ul Core List List Lists
upload XForms
value XForms
var Core Text Text Phrase

The total numbers of elements are:

Count               HTML 4.01  XHTML 1.1  XHTML 2.0  HTML 5
Number of elements  91         91         115        63

Now, it should be noted that we are not comparing apples to apples: HTML 4.01 and XHTML 1.x include a number of deprecated elements that shouldn’t be used. They also include frames elements, which have been moved out of XHTML 2.0 into the XFrames specification and are not part of HTML 5 either. It seems fair to remove all these elements from our numbers, which gives:

Count                                         HTML 4.01  XHTML 1.1  XHTML 2.0  HTML 5
Number of non deprecated elements             81         81         115        63
Number of non deprecated non frames elements  78         78         115        63

These figures confirm the increase of almost 50% between HTML 4.01 or XHTML 1.1 and XHTML 2.0 mentioned by Björn Höhrmann, and it is worth looking at where this increase comes from. If you look at the different modules in this table, you’ll see that whereas HTML 4.01 and XHTML 1.1 include 10 elements in their Forms module, XHTML 2.0 includes 46 XForms elements. The increase in the number of elements thus comes almost entirely from the XForms support in XHTML 2.0, and the number of elements in the other modules barely changes.
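As a quick check of the arithmetic behind these percentages, here is a minimal Python sketch. It uses only the counts quoted in this post; the variable and function names are mine and purely illustrative.

```python
# Counts quoted in this post (non deprecated, non frames elements).
html401_total = 78   # same figure for XHTML 1.1
xhtml2_total = 115

# Forms-related elements: Forms module (HTML 4.01 / XHTML 1.1) vs XForms (XHTML 2.0).
html401_forms = 10
xhtml2_forms = 46


def percent_increase(old: int, new: int) -> float:
    """Relative growth from old to new, in percent."""
    return 100.0 * (new - old) / old


overall_increase = percent_increase(html401_total, xhtml2_total)
total_delta = xhtml2_total - html401_total   # extra elements overall
forms_delta = xhtml2_forms - html401_forms   # extra elements that are forms-related

print(f"Overall increase: {overall_increase:.1f}%")
print(f"Extra elements: {total_delta}, of which forms-related: {forms_delta}")
```

Comparing the two deltas shows how much of the growth is attributable to forms handling alone.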

Furthermore, to compare with HTML 5, you also need to remove the table elements, which are not yet defined in HTML 5, and the figures then look quite different:

Count                                                              HTML 4.01  XHTML 1.1  XHTML 2.0  HTML 5
Number of non deprecated non frames elements                       78         78         115        63
Number of Forms or XForms elements                                 10         10         46         0
Number of non deprecated non frames non forms elements             68         68         69         63
Number of tables elements                                          10         10         11         0
Number of non deprecated non frames non forms non tables elements  58         58         58         63
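The rows of this last table follow from one another by simple subtraction. A small Python sketch makes the chain explicit; the figures are those of the table, while the names are illustrative only.

```python
VERSIONS = ("HTML 4.01", "XHTML 1.1", "XHTML 2.0", "HTML 5")

# Figures copied from the table above, one value per version.
start = (78, 78, 115, 63)    # first row of the table
forms = (10, 10, 46, 0)      # Forms or XForms elements
tables = (10, 10, 11, 0)     # tables elements


def subtract(counts, removed):
    """Element-wise difference between two rows of counts."""
    return tuple(c - r for c, r in zip(counts, removed))


without_forms = subtract(start, forms)
without_forms_or_tables = subtract(without_forms, tables)

for name, a, b in zip(VERSIONS, without_forms, without_forms_or_tables):
    print(f"{name:10} non forms: {a:3}   non forms, non tables: {b:3}")
```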

In other words, the debate of whether XHTML 2.0 is a simplification can be split into two different points:

  • The number of elements for the classical, non forms related features is the same in HTML 4.01, XHTML 1.1 and XHTML 2.0.
  • The replacement of the Forms module by XForms represents a complete paradigm change that undeniably leads to more complexity and an increase in the number of elements.

The last line shows that there is an actual increase in the number of elements between HTML 4.01 or XHTML 1.1 and HTML 5. If you look at the overall table, you’ll notice that this increase is due to the addition of quite a number of new elements, only partly compensated by the removal of elements that have been considered either near duplicates (for instance, acronym has been removed and people are advised to use abbr for both acronyms and abbreviations) or not very useful.

Of course, the number of elements is not 100% representative of the complexity of a vocabulary, but it gives a good indication, and the figures given by Björn did deserve some further analysis.

PS: I have sent an answer to XML-DEV that should find its way to the list when their server is up again.

PPS: I recommend reading Björn Höhrmann’s mails to the www-html@w3.org mailing list as a complement to this blog entry:
