Extensible HyperText Markup Language

X/HTML 5 Versus XHTML 2

The competition to become the next markup language for the Web is heating up. This article takes a look at what's cool and what's uncool about the competing technologies.

Background

Although HTML 4 and XHTML 1 have served us well to date, these specs still have their faults. To meet user demand for rich Web-based applications, to generate better search results, and to create a more accessible Web for people of all abilities using all types of devices, these specs need to be updated or replaced.

There are two specifications vying to become the successor to HTML 4 and XHTML 1. They are XHTML 2.0 and Web Applications 1.0, more commonly referred to as X/HTML 5. These specifications take different approaches and will have different outcomes in terms of the future development of markup languages.

XHTML 2 is a bold step forward intended to create an architecture that will become the host language to many other W3C technologies already in use, or in the works. XHTML 2 is based solely on XML, a technology that most believe will enable the Web to reach its full potential. XHTML 2 is driven by how markup should be used, rather than by how markup is currently used.

X/HTML 5 is an extension of HTML 4 and XHTML 1. It is an incremental step forward rather than a grand leap forward in the style of XHTML 2. Working within the confines of HTML 4 and XHTML 1, X/HTML 5 has devised clever solutions to address some of their faults. X/HTML 5 can also be served as either HTML or XML. So, unlike XHTML 2, X/HTML 5 is influenced by the current state of the art (Web browser technology, etc.) and by how markup is currently used.

Both X/HTML 5 and XHTML 2 are at the stage of working drafts. Both specifications are expected to change, and several years will likely pass before they become recommendations. This article comments on the working drafts in force as of February 2007.

XHTML 2

What's Cool About XHTML 2

Navigation Lists

Navigation lists are designed to create navigation menus. Navigation lists are defined using an nl element and must contain a label element that contains the title for the list. For example:

  <nl>
    <label>You are here:</label>
    <li href="/">Home</li>
    <li href="/products/">Products</li>
    <li href="/products/widget/">Widget</li>
    <li>Features</li>
  </nl>

Navigation lists are cool!

Enhancement To Definition Lists

Definition lists (dl element) define a term (dt element) and a definition (dd element). One term can have multiple definitions and multiple terms can have the same definition. XHTML 2 introduces the ability to group terms and definitions using the di element. This will help clarify the relationship between a term and its definition(s) and generate markup that is easier to read. For example:

  <dl>
    <di>
      <dt>center</dt>
      <dt>centre</dt>
      <dd>a building dedicated to a particular activity</dd>
      <dd>a point equidistant from its ends</dd>
    </di>
    <di>
      <dt>key</dt>
      <dd>metal device used to open a lock</dd>
      <dd>pitch of the voice</dd>
    </di>
  </dl>

This enhancement to definition lists is cool!

Any Element Can Be A Hyperlink

An href attribute can be added to any element to transform it into a hyperlink. For example:

  <q href="http://en.wikipedia.org/wiki/Neil_Armstrong">That's one small step for man, one giant leap for mankind</q>

This is very cool!

acronym Is Gone

Many content authors are confused as to how the acronym element should be used. XHTML 2 will use the abbr element to represent all types of abbreviations, including acronyms.
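
As a rough sketch (the abbreviations and their expansions are our own examples, not taken from the spec), an acronym and an ordinary abbreviation would both be marked up the same way:

```html
<p>
  <abbr title="North Atlantic Treaty Organization">NATO</abbr> was founded in 1949,
  long before the <abbr title="World Wide Web Consortium">W3C</abbr>.
</p>
```

Authors no longer have to decide whether a given abbreviation is pronounced as a word before choosing an element.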

Cool!

b, i, small, big, tt, font and basefont Are Gone

XHTML 2 says goodbye to these elements, which are used strictly for formatting. The font element in particular has been misused in the past and has discouraged content authors from using appropriate markup.

Beyond cool!

iframe Is Gone

The iframe element, which has always caused problems for users of assistive technologies, will not be missed.

Cool!

A New Headings Construct

Content headings are the most important constructs when it comes to making Web pages accessible. Yet virtually no one uses headings correctly, because numbered headings constructs (h1 to h6 elements) are difficult to visualize for most people, and are almost impossible to author correctly using WYSIWYG editors. Physically, numbered headings are linear constructs (sibling elements) that are used to logically organize data into a hierarchy. So, in the following example, you have to make an effort to visualize the hierarchical structure of the content.

  <h1>...</h1>
  <p>...</p>
  <h2>...</h2>
  <p>...</p>
  <h2>...</h2>
  <p>...</p>
  <h3>...</h3>
  <p>...</p>
  <h4>...</h4>
  <p>...</p>
  <h3>...</h3>
  <p>...</p>
  <h2>...</h2>
  <p>...</p>

By contrast, the new heading construct, using the h element along with the grouping element section, makes the hierarchical relationship infinitely easier to grasp:

  <h>...</h>
  <p>...</p>
  <section>
    <h>...</h>
    <p>...</p>
    <h>...</h>
    <p>...</p>
    <section>
      <h>...</h>
      <p>...</p>
      <section>
        <h>...</h>
        <p>...</p>
      </section>
      <h>...</h>
      <p>...</p>
    </section>
    <h>...</h>
    <p>...</p>
  </section>

The h element is very cool!

Enhancement To Writing Computer Code Examples

The blockcode element can be used instead of pre and code to write blocks of computer code. For example:

  <blockcode>
  function get_random_name() {
      $rand_name = "";
      for ($i = 1; $i &lt;= 8; $i++) {
          $rand_name .= chr(rand(97, 122));
      }
      return $rand_name;
  }
  </blockcode>

This is cool!

hr Replaced By separator

The name of the hr element, "horizontal rule", has always caused problems for content authors and tool vendors. The name implies that it is a horizontal line, when in fact it was intended to be used to separate one part of a document from another. Using separator resolves these misunderstandings.
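
A minimal sketch of how this might look, assuming separator is written as an empty element between two blocks of content (the paragraph text is made up for illustration):

```html
<p>...and that was the last anyone saw of the widget.</p>
<separator />
<p>Three weeks later, a parcel arrived at the factory.</p>
```

How the break is rendered (a line, extra white space, an ornament) would be left to the style sheet rather than implied by the element name.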

This is cool!

del And ins Replaced By edit Attribute

The edit attribute does a much better job than the del and ins elements in indicating where content has changed. It can be applied in the following manner:

  <p>This is <span edit="deleted">cool</span><span edit="inserted">way cool</span>!</p>

Ability To Add Additional Semantics To Existing Elements

The role attribute can add new semantics and metadata to existing elements, helping search engines and assistive technologies better process Web pages. The following example shows how it would be possible to indicate that the contents of a given navigation list should be used as breadcrumbs.

  <nl role="breadcrumbs">
    <label>You are here:</label>
    <li href="/">Home</li>
    <li href="/products/">Products</li>
    <li href="/products/widget/">Widget</li>
    <li>Features</li>
  </nl>

Using the role attribute in this way is technically known as embedding RDF in XHTML. This makes XHTML 2 very extensible and may well be the single most important tool in bringing the Web closer to realizing its full potential.

The role attribute is tremendously cool!

What's Uncool About XHTML 2

The a Element Is Still Around

Because you can use the href attribute on any element, the a element is in fact no longer needed. Keeping this element in the spec will only confuse content authors. There is a precedent: in HTML 4 and XHTML 1, the id attribute can be used to make any element into an anchor. For example:

  <h2 id="introduction">Introduction</h2>

Yet most content authors still use the a element for anchors. For example:

  <h2><a name="introduction">Introduction</a></h2>

Keeping the a element is uncool!

The img Element Is Still Around

In XHTML 2, the object element can do everything that the img element can. According to the spec, the img element is retained in order to ease the transition to XHTML 2, but what it will actually do is confuse content authors. The retained img element is also no longer an empty element, but can contain alternate content. For example:

  <img src="W3C.png">W3C</img>

If an element in XHTML 2 has the same name as an element in HTML 4 or XHTML 1, but behaves differently, this is likely to be a source of confusion and debate.

Keeping the img element is uncool!

Support For Numbered Headings

Because the h element is a better approach to creating headings, numbered headings are not necessary. Supporting both the h element and numbered headings will only confuse content authors.

Numbered headings are very uncool!

The Closed Nature Of The Group Developing XHTML 2

Very little is made public about the XHTML 2 group that is developing what could become the next markup language of the Web. Guys, this is not skunk works for some secret weapon. Let the sun shine in!

X/HTML 5

What's Cool About X/HTML 5

The Idea Of Sectioning Elements

X/HTML 5 introduces new elements that partition Web page content into sections. These partitions should help search engines and assistive technologies to better process content. Using these new elements could make markup more readable.
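
For illustration, a page might be partitioned along these lines. This is only a sketch based on element names in the working draft (header, nav, article, aside and footer); the content and link targets are invented, and the draft may change:

```html
<body>
  <header>
    <h1>Widget World</h1>
  </header>
  <nav>
    <ul>
      <li><a href="/">Home</a></li>
      <li><a href="/products/">Products</a></li>
    </ul>
  </nav>
  <article>
    <h1>Introducing The Widget</h1>
    <p>...</p>
    <aside>
      <p>...</p>
    </aside>
  </article>
  <footer>
    <p>...</p>
  </footer>
</body>
```

A search engine or screen reader could, in principle, skip straight to the article and ignore the navigation and footer.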

The idea of sectioning content is cool! But see below for why the implementation of sectioning elements is uncool.

dialog Element

The dialog element represents a conversation. It contains dt elements which identify the speaker, and dd elements which represent the speakers' quotes. For example:

  <dialog>
    <dt>Costello</dt>
    <dd>Look, you gotta first baseman?</dd>
    <dt>Abbott</dt>
    <dd>Certainly.</dd>
    <dt>Costello</dt>
    <dd>Who's playing first?</dd>
    <dt>Abbott</dt>
    <dd>That's right.</dd>
    <dt>Costello</dt>
    <dd>When you pay off the first baseman every month, who gets the money?</dd>
    <dt>Abbott</dt>
    <dd>Every dollar of it.</dd>
  </dialog>

This is cool!

figure Element

In print publications (textbooks, newspapers, magazines, etc.) media objects (photos, illustrations, graphs, etc.) are usually accompanied by a caption. Web markup languages lacked the construct to generate these until now. The figure element with a child legend element can be used to caption images. For example:

  <figure>
    <legend>Credit: Media Inc., 2007</legend>
    <img src="smith.jpg" alt="Photo: J. Smith" />
  </figure>

This is very cool!

m Element

The m element represents a run of text that has been marked or highlighted. This is useful when Web pages are created dynamically in response to a keyword search, because each occurrence of the keyword can be identified in the page using the m element. For example, in response to a user search on the keyword "snow", a Web page can be generated with content modified like this:

  <p>A <m>snow</m>man is a man-like sculpture constructed out of <m>snow</m>.</p>

This is cool!

Enhancements To input Element

The input element is enhanced to support email, url, date-related, time-related, and numeric data types. This means more validation can occur on the client instead of on the server.
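
A sketch of what such a form might look like. The type values follow the data types named above; the field names, the form action and the min, max and required attributes are assumptions made for illustration, and the details may change as the draft evolves:

```html
<form action="/register" method="post">
  <p><label>Email: <input type="email" name="email" required="required" /></label></p>
  <p><label>Home page: <input type="url" name="homepage" /></label></p>
  <p><label>Date of birth: <input type="date" name="dob" /></label></p>
  <p><label>Quantity: <input type="number" name="quantity" min="1" max="10" /></label></p>
  <p><input type="submit" value="Register" /></p>
</form>
```

A user agent that understands these types could reject a malformed email address or an out-of-range quantity before the form is ever submitted. Server-side validation would still be needed, since older browsers are likely to treat unknown types as plain text fields.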

Cool!

Open Process

The development process for X/HTML 5 is more open than for XHTML 2. Everyone is welcome to participate on the X/HTML 5 mailing list.

Open processes are cool!

What's Uncool About X/HTML 5

Implementation Of Sectioning Elements

The idea behind sectioning elements is great, but the way X/HTML 5 implements it is cumbersome. Some of the explanations in the spec leave you even more confused. For example:

"The aside element represents a section of a page that consists of content that is tangentially related to the content around the aside element, and which could be considered separate from that content. Such sections are often represented as sidebars in printed typography."

Wouldn't a div element with a role attribute be more extensible and easier to implement?

Another sectioning element being proposed is nav, representing a section of a page that links to other pages. Do we really need a nav element? The nl construct from XHTML 2 can do this better.

The implementation of sectioning elements is uncool and should be improved.

HTML 4 And XHTML 1 Faults Are Perpetuated Into A Future Spec

Because X/HTML 5 attempts to be backwards compatible, many of the faults of HTML 4 and XHTML 1 will be perpetuated into X/HTML 5. Specs don't need to be backwards compatible. Instead, the better solution is that user-agents should be backwards compatible, by supporting multiple specs.

Continuing support for HTML 4 and XHTML 1 faults, such as numbered headings and the i, b, small, iframe and font elements, is not cool at all!

X/HTML 5 Does Not Comply With The X/HTML 5 Charter

X/HTML 5 aims to be backwards compatible with HTML 4 and XHTML 1. Yet elements such as big, acronym, u and tt don't seem to be part of the spec, while other elements like i and small have had their semantics redefined. For example, the HTML 4.01 spec defines i and small like this:

i: Renders as italic text style.

small: Renders text in a "small" font.

In X/HTML 5, i and small have new meanings:

The i element represents a span of text in an alternate voice or mood, or otherwise offset from the normal prose, such as a taxonomic designation, a technical term, an idiomatic phrase from another language, a thought, a ship name, or some other prose whose typical typographic presentation is italicized.

The small element represents small print (part of a document often describing legal restrictions, such as copyrights or other disadvantages), or other side comments.

Redefining what the i and small elements mean breaks backwards compatibility with HTML 4 and XHTML 1, because backwards compatibility means that an HTML 5 user agent must interpret an HTML 4 document in the same way that an HTML 4 user agent would. So if HTML 5 is to claim backwards compatibility, a construct that was meaningless in HTML 4 should also be meaningless in HTML 5.

Not staying true to its own charter objectives is uncool!

What, The font Element Is Supported?

Yes, X/HTML 5 supports the font element if content is authored using a WYSIWYG editor. What is the rationale for this? Why would WYSIWYG editors get an exemption?

This is so uncool!

WYSIWYG Signature

Documents created by WYSIWYG editors must include the following WYSIWYG signature in the head element:

  <meta name="generator" content="(WYSIWYG editor)" />

or

  <meta name="generator" content="Sample Editor 1.0 (WYSIWYG editor)" />

What is the reasoning for this? Is this some kind of mark of shame? Is this supposed to signify to user-agents that they should expect bad markup to follow because the markup has been generated by a WYSIWYG editor? And what if only a part of the document was created by a WYSIWYG editor?

This is baffling and way too uncool!

Support For Predefined Class Names

Predefined class names are CSS class names that are reserved and may have semantic meaning to X/HTML 5 user agents. The class name "copyright" is a predefined class name in the following example:

  <p class="copyright">...</p>

Some other predefined class names are "error", "example", "issue", "note", "search" and "warning". To complicate matters, some predefined class names can be used with some elements but not with others. For example, the class name "copyright" can only be used with p and span. The class name "error" can only be used with p, section, span and strong.

One problem with predefined class names is that this means nothing:

  <p class="important">

...while this is supposed to mean something:

  <p class="copyright">

Overloading the class attribute makes it very difficult to interpret the meaning of the construct. For example, what does this mean:

  <p class="important copyright issue">

Predefined class names also limit authors' ability to freely use class names. Also, what happens if an author uses a non-predefined class name now and at some later date that class name becomes predefined? Will this change the meaning of the content that the author previously authored?

Since XHTML 2 has a much better solution in its role attribute, predefined class names in X/HTML 5 are very uncool!

HTML 5 Versus XHTML 5

In an attempt to finally resolve the HTML versus XHTML debate, the X/HTML 5 spec actually makes the issue harder to understand. Indeed, the spec says "generally speaking, authors are discouraged from trying to use XML on the Web", even though W3C continues to herald XML as the future of the Web. This is exceptionally confusing and exceptionally uncool!

A Too Hasty Process

X/HTML 5 is a reaction to the slow progress made by W3C in delivering a replacement to HTML 4 and XHTML 1. As a result, the process of developing X/HTML 5 seems rushed, and many feel the spec came out of nowhere and is being fast tracked. Even some of the stakeholders directly involved feel the timelines and milestones for developing the spec are completely unrealistic.

Competition To Be The Next Markup Language

Both X/HTML 5 and XHTML 2 are competing to replace HTML 4 and XHTML 1. Even at this early stage of development, some browser vendors have already stated their preference for one spec over the other. As a result of the haste and closed nature of deliberations, this issue is starting to polarize the Web standards community. As the two specs progress, more development and marketing dollars will be invested into one spec than the other, and all the ingredients are in place for a standards war.

Since every one of us is a stakeholder in this process because the Web belongs to everyone, only an honest and open debate can ensure that the best spec emerges as the winner.

Notes

  • For readability, "HTML 4.x/XHTML 1.x" has been shortened to read "HTML 4 and XHTML 1".

Further Reading