Saturday, July 4, 2009

W3C, you ignorant slut!

[With apologies to Jane Curtin]

You know.... some people just don't get it. Most likely, most of the time, I am one of those people. This week, I get a pass. Because the management at the W3C have taken the cake, as it were. They have grabbed up all the "just don't get it" supply there is. The rest of us, for this week, can do whatever we want and still be as right as rain.

What did they do to achieve this? Well... They lost sight of their goals. They basically forgot that there was a plan that was going to take the web from HTML 3.2 to extensible grammars and follow-your-nose semantic magic. They forgot that there was a path to a web that was not just connected, but also accessible and meaningful. In a word, they gave in to the seductive siren call of HTML5.

(Disclaimer. I have been involved in the HTML and XHTML activities at the W3C since 1996. I am the lead editor for most of the XHTML specifications, and I have great passion for the X in XHTML - extensibility.)

What's wrong with HTML5? Nothing. Everything. Parts. Depends on what problem you are trying to solve. IMHO, at its core, HTML5 is just a really, really bad idea. The primary design principle for this language is "codify everything in use on the net, everywhere, no matter how broken, as long as Hixie has seen it at least once and thinks it is useful". How can that possibly be helpful (to anyone other than Hixie or Google)? I mean, sure... if you were writing a guide for the next browser manufacturer to come in and create a new browser that would be able to handle every broken web page on the planet, this would be a useful tool. But that's not a standard. That's an implementor's guide. There are between 5 and 15 actual user agent implementors in the world. There are millions of web content authors. How is it that the 15 (I'm feeling magnanimous) are more important than the millions? I don't know. Let's ask TimBL - father of the web and master of all things W3C.

Oh. Wait. We can't. Tim recently got a promotion. Someone coughed up a bunch of money so he could form the World Wide Web Foundation (ironically, an organization that can't put up a web page that is valid!). He's off playing in a new sandbox. But I'm sure he hasn't forgotten us. After all, at the W3C the Director has absolute authority. Nothing can start or finish in the W3C without his approval. Oh Tim! Where are you when we need you most?

Or do we need you? It was under your leadership that this whole mess got started. It was you who decided to irreparably damage the brand(s) of the W3C by ceding control of the web to the WHATWG. What were you thinking? I assume you were under pressure from the browser vendors. I assume those 4 out of your ~400 members were saying "hey, we don't want to implement XML-based semantic web. It's haaaaaard (insert whine here)".

Well, guess what? It is hard. So What(WG)? The W3C has a clear mandate from its members. From its advisory committee. And that mandate is spelled out pretty well in the W3C's mission statement: "To lead the World Wide Web to its full potential by developing protocols and guidelines that ensure long-term growth for the Web." In what way is locking the web into a browser-developer controlled, non-extensible, non-XML language "ensur(ing) long-term growth for the Web?"

In my opinion, it's not. Instead, it is shackling the web content developers (like me) into the tag-soup architecture of the 90's. There is nothing about HTML5 that represents long-term growth. Nothing that represents industry consensus about how the structure of web content should mature so that it is accessible to the handicapped. Nothing that makes it easier to mark up content with its semantics in an extensible way. Nothing that allows the use of long-agreed-upon W3C Recommendations.

Actually, that's the saddest part of this whole story. The W3C is an organization that has spent many years developing "Recommendations" (read "standards" when I say that) that support its core architecture. XML, XML Namespaces, XHTML, MathML, SVG, RDF, OWL, etc. All of these are designed to work together to support the long-term vision of the organization - one that promotes dynamic extensibility of the "web" by different groups at different times. The HTML5 activity ignores this fundamental guiding principle of the W3C. Instead, the HTML5 activity seems to believe that if it isn't written down in their specification, it doesn't exist. And if it was written down elsewhere, but not in a way that is absolutely perfect according to the arbitrary and capricious rules of the HTML5 editor, then it needs to be re-written, solidified, and while they are at it changed in ways the original authors never intended (see their redefinition of what a URL is or their relegation of the definition of rel attribute values to the WHATWG). Or worse yet, replaced completely by something competing and incompatible (e.g., RDFa vs the much maligned microdata).

So, I was wrong. We do need TimBL, or someone in the W3C management to stand up and say "bullshit! This is wrong. The work that is going on in the HTML5 activity is inconsistent with the W3C goals for the web." The web community needs leadership with vision, not blinders. It needs an eye toward the future, not a detailed, Hubble-esque view of the distant past. Oh Sir Tim, where are you when we need you most?

29 comments:

Herko said...

thanks for giving us your side of the coin! This helps getting a clear picture of what is actually happening, and what the consequences are!

Joseph Karr said...

Bravo! Well written. Not the quickie hack job that I did at http://tinyurl.com/l25mmm but one that lovingly details where the process went wrong and just how far afield the process is right now.

dorian taylor said...

Thank you.

Jeff Schiller said...

For distributed extensibility, what's wrong with XHTML5?

Hixie said...

For what it's worth, while I obviously do support the HTML5 work, I've also been supportive of the XHTML2 work and was quite surprised to hear the announcement. I think it's a shame that the XHTML2WG isn't going to be renewed, and I wish that XHTML2 would be given more resources, not fewer.

I would encourage you to consider the same kind of approach that HTML5 took originally when the W3C said "no" to HTML5 — start an external group, get the momentum behind it, form a community to support the work, and demonstrate to the W3C that it can work. That's how we got the W3C to support HTML5 after they initially said it was a bad idea.

halindrome said...

Jeff,

(X)HTML5 does not really permit the extensibility that is at the heart of XML-based languages. Instead, it is just a serialization of the markup that is permitted to be in HTML5. Sure, the XML serialization allows you to specify XML namespaces, but there is no concept of having the incorporation of other grammars in other namespaces become part of the environment. There are two namespaces that are special-cased today - MathML and SVG. Beyond that, nothing is supported. At least, as far as I know. I would be giddy to be proven wrong.

halindrome said...

Hixie,

Thanks for your comments. While I am confident there is support in the community for a modular, extensible grammar that is part of the XML tool chain, it is really too early to say if such an activity should or even could happen outside of the W3C.

The W3C owns the trademark XHTML and the copyrights to the documents. It is expressly against the W3C licensing rules to create new versions of their specifications (e.g., HTML5 was a violation of the license before it was moved into the fold.)

If there is sufficient support, and there is a grant of rights from the W3C, then sure - there could be an external activity. And I would be there in spades. I'm a true believer.

TallTed said...

Well put. One quibble.

Your closing line threw me off a bit, and may well do so to others. Perhaps you'll adjust it?

The full quote -- "O Romeo, Romeo! wherefore art thou Romeo?
Deny thy father and refuse thy name;
Or, if thou wilt not, be but sworn my love,
And I'll no longer be a Capulet." -- may help reveal that "wherefore" doesn't mean "where," but "why" (and less about "Romeo" than about "Montague"...).

halindrome said...

TallTed,

A colleague pointed this out to me... and I considered editing it. Actually, I think I will edit it - so all credit to you and Ian. Thanks!

Oh, and apologies to Mr. Nitardy, my 11th grade Shakespeare teacher. I can't believe I messed this one up! It's only been 30 years. He often said to me "You should always speak less than you know. In your case, silence!"

Jeff Schiller said...

SVG and MathML are special-cased to allow the text/html serialization to also use them.

Since XHTML5 is XML with Namespaces, to my mind there's nothing wrong with coming up with a grammar of new elements that can be used with XHTML5.

I've always been of the mind that it's the responsibility of the extending language to fully define the integration. Frankly, I fail to see how you could specify extensibility otherwise.

Maybe you could give an example of how you would like to see "the incorporation of other grammars in other namespaces" specified that doesn't involve another spec making those details clear?

And what do you mean by "environment"? Is that the DOM?

Hixie said...

XHTML5 permits the same XML-based extensibility that is permitted in XHTML1.x.

Regarding the licensing, with HTML5 when it was outside the W3C we sidestepped the issue by not calling it "HTML" (it was "Web Applications 1.0") and not reusing any of the text from HTML4 (it's all new text).

I see that that would be harder with XHTML2. I would definitely ask the W3C if they're willing to license the text. With HTML5, we're trying to get the W3C to accept licensing the spec under the MIT license, precisely so that if this ever happens we don't get blocked (amongst other reasons).

halindrome said...

Jeff,

Absolutely it is the responsibility of the other spec to make the rules of the incorporation clear - including possibly where in the content model the new elements could be inserted. See the W3C ITS Recommendation for a good example of how this has been done (http://www.w3.org/TR/its/).

I disagree with your conclusion that in XHTML5 there is some support for just integrating additional elements from other namespaces. While it might be permissible to do so, such content would be meaningless if the user agents that accept the content don't know how to interpret it. To apply CSS rules, for example, or to apply the behavior of attributes from the default namespace.

In XHTML M12N, XHTML 1.1, and XHTML 2, for example, there were core "global" attributes that have well known behavior. A conforming user agent would apply this behavior when encountering these attributes regardless of the provenance of the element they are on. So I could introduce an element shane:foo and put an @href on it and the user agent would know that the contents of that element are a "clickable" link. In theory, anyway.
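
To make the shane:foo idea concrete, here is a minimal Python sketch of a namespace-agnostic processor that treats any element carrying an href attribute as a link. The namespace URI http://example.org/shane and the function name find_links are invented for illustration. The sketch does not describe how any real user agent behaves; it only shows what "the attribute carries the behavior, whatever element it sits on" could look like to a generic XML tool.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SHANE_NS = "http://example.org/shane"  # hypothetical namespace, for illustration only

DOC = f"""<html xmlns="{XHTML_NS}" xmlns:shane="{SHANE_NS}">
  <body>
    <p><a href="http://www.w3.org/">W3C</a></p>
    <shane:foo href="http://www.w3.org/TR/its/">ITS</shane:foo>
  </body>
</html>"""


def find_links(xml_text):
    """Return (qualified tag, href) for every element that has an href attribute.

    The element's own namespace is deliberately ignored: the unqualified
    href attribute alone marks the element as a link.
    """
    root = ET.fromstring(xml_text)
    links = []
    for element in root.iter():  # document order, all namespaces
        href = element.get("href")
        if href is not None:
            links.append((element.tag, href))
    return links


links = find_links(DOC)
for tag, href in links:
    print(tag, "->", href)
```

Both the XHTML a element and the foreign shane:foo element end up in the list, which is the behavior being asked of a conforming user agent here.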

As to environment, I really meant the collection of behavior and presentation and semantics that the user agent knows about. The guiding principle behind the semweb is that it is possible to follow-your-nose to get from a term to its underlying meaning; and that meaning would be expressed in a machine readable way.

As you introduce new structures into a document you are authoring via markup language extensions, you need some way for the interpreter of that document to *grok* the extensions. If those extensions are semantic, you have RDFa (for example). If they are presentational, you have CSS. If they are behavioral, you have XLink, XPath, XInclude, and XHTML global attributes. It is a rich collection of tools from which you can convey meaning. That was the promise of XML, XHTML, and the semweb in general. I do not believe that HTML5 / XHTML5 will deliver on this at all.

Again, I would be really pleased to be proven wrong. In the interim, I will continue to work with the tools I have that I know work already to achieve my markup extension goals: XHTML M12N, CSS, and assistive technology such as Uniquity to tie it all together.

halindrome said...

Hixie,

XHTML 1.* languages themselves do not permit extensibility except that they are layered atop XHTML M12N. The M12N architecture and rules are what defines the ways in which extensions can be compatibly authored and plugged into *new* XHTML Family markup languages. I do not believe that XHTML5 has any similar capability.

Moreover, XHTML M12N, the XHTML Metainformation Module, and CURIEs provide a clean architecture of extensibility for the semantics of documents. While the microdata section of the current HTML5 draft goes some way toward supporting similar extensibility, it does not do so in a way that is as consistent with the direction the (RDF, semweb) community is going.
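
For anyone who has not met CURIEs: a CURIE is a compact prefix:reference string that expands to a full IRI by concatenating the IRI bound to the prefix with the reference. The Python sketch below is a simplified illustration of that expansion, not an implementation of the full CURIE syntax. The function name expand_curie and the choice of prefixes are assumptions made for the example.

```python
# Prefix bindings chosen for illustration; a real document declares its own.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand_curie(curie, prefixes):
    """Expand 'prefix:reference' (or the bracketed safe form) to a full IRI."""
    # A "safe CURIE" is wrapped in square brackets; strip them first.
    if curie.startswith("[") and curie.endswith("]"):
        curie = curie[1:-1]
    prefix, separator, reference = curie.partition(":")
    if not separator or prefix not in prefixes:
        raise ValueError(f"no prefix mapping for {curie!r}")
    # Expansion is plain string concatenation of the prefix IRI and the reference.
    return prefixes[prefix] + reference


print(expand_curie("dc:title", PREFIXES))
print(expand_curie("[foaf:name]", PREFIXES))
```

Because the prefix table lives in the document rather than in the language specification, new vocabularies can be introduced without changing the host language.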

So no, I do not agree that XHTML5 has the same extension capabilities as the XHTML Family has today and would have had going forward. However, I would love to be proven wrong.

(Being wrong used to bother me. After years of marriage however, I got used to it.)

Hixie said...

Yeah, XHTML5 doesn't have the M12N-level of extension mechanisms; only basic XML namespaces support.

Dan Morrill said...

"Moreover, XHTML M12N, the XHTML Metainformation Module, and CURIEs provide a clean architecture of extensibility for the semantics of documents."

That sentence is why XHTML failed.

Are you seriously kidding me? As a web developer, I know what HTML is, and I can very easily grok what HTML5 is. Apparently, to grasp XHTML I need to be able to expand M12N, intuit what a Metainformation Module is, and divine via dark eldritch techniques the meaning of CURIEs.

MyID.config.php said...

Dan Morrill:

"Are you seriously kidding me? As a web developer, I know what HTML is, and I can very easily grok what HTML5 is. Apparently, to grasp XHTML I need to be able to expand M12N, intuit what a Metainformation Module is, and divine via dark eldritch techniques the meaning of CURIEs."

Shane was not addressing that comment to the average web developer. He was addressing that comment to the only author, and therefore leading architect of the next generation of HTML, who damn well better know what Shane is talking about.

You need to read comments within context.

Shelley said...

Sorry, last comment was from me. Google doesn't play well with OpenID.

jdeisenberg said...

Thank you; I'm 100% in agreement with you on this. My biggest concern with HTML 5 is that it does a wonderful job of describing how the web is today. With the exception of <canvas>, HTML 5 is, to a large extent, backwards-looking; XML is forwards-looking.

The lack of a formal schema ("The WHATWG does not prescribe an implementation strategy for conformance checkers and does not endorse schema languages.") is also very bothersome.

jax said...

(Disclaimer. I have been involved in the HTML and XHTML activities at the W3C since 1996. I am the lead editor for most of the XHTML specifications, and I have great passion for the X in XHTML - extensibility.)
Disclosure. You've been the editor who has done most of the actual work instead of just talking, at least during my stint.

It may be a reasonable complaint that HTML5 isn't orthogonal enough, that it tries to do everything. But playing the numbers game is disingenuous:

There are between 5 and 15 actual user agent implementors in the world. There are millions of web content authors. How is it that the 15 (I'm feeling magnanimous) are more important than the millions? ... It was you who decided to irreparably damage the brand(s) of the W3C by ceding control of the web to the WHATWG. What were you thinking? I assume you were under pressure from the browser vendors. I assume those 4 out of your ~400 members were saying "hey, we don't want to implement XML-based semantic web.

Neither the browser vendors nor the W3C control the Web, never have and never will. But I think the browser vendors and the community around HTML5 are far more representative of the Web at large than the half-dozen in the renamed HTML Working Group and the few thousands (I'm feeling magnanimous) that care about XML-based semantic web. In the choice between the semantic web and having borders with rounded corners, I am in no doubt what choice most would make. The latter could possibly solve their problem, the former could not.

That I think is why (X)HTML5 is succeeding and XHTML2 is not. HTML5 solves existing problems, XHTML2 was looking for problems to solve.

Philosophically I think the HTML group made a grievous error. You too bought into the W3C group think that HTML was bad, a tag soup to be, if not solved, then circumvented. The W3C was war-scarred from the browser wars, but that prevented you from seeing the runaway success that HTML has been and still is. This format like no other in history has taken over the world. While I too have had my Purity of Essence moments, the problems with HTML are trivial compared with its potential.

The serialisation of HTML I see as a lesser issue, one or the other will likely win out in the long run, both have advantages and disadvantages.

HTML5 does bypass modularization. While I can see the benefits of cutting the spec into pieces (my phone, for one, runs out of battery trying to download the thing), a brutal question is: What has XHTML M12N achieved? If (X)HTML5 could be modularised, what would the real benefits be?

halindrome said...

jax,

Thanks for your comments. You asked if there would be a benefit to modularizing HTML5. It isn't modularization per se that is of benefit. It is the mindset that it implies. One in which the language has extension points and the language interpreters are required to allow those extensions.

If HTML5 were designed so that its content model were dynamically extensible, the advantage would be that other groups, like the RDFa task force or the WAI-ARIA group, would have a clear mechanism for adding their semantic markup to the environment. As it stands, in order to make changes like that to HTML5 you need to get them incorporated into the bloated monstrosity that already kills the battery on your phone. Surely that is not a long term recipe for success.

porneL said...

"locking the web into a browser-developer controlled, non-extensible, non-XML language "ensur(ing) long-term growth for the Web?"

HTML 5 defines XML serialisation, makes text/html support optional, and makes it easier to migrate from text/html to XML (polyglot documents, DOM consistency principle).

Authors are free to use XML (those HTML5-controlling vendors have implemented it).

halindrome said...

porneL,

Actually, no - the browser vendors have not implemented the generic XML model that is needed. While some user agents have native XML parsers, it is not possible to author documents in HTML5 as defined today that are isomorphic between the HTML serialization and the XML serialization.

As to "polyglot" - again, I have to disagree. There is nothing in the current HTML5 draft that permits the use of rich compound documents that bring in multiple namespaces. There is also nothing that makes it possible to define extensions to the language in either of its forms. If there were, we wouldn't be having this conversation.

XHTML M12N permits these things and more. That extensibility is at the core of XHTML and the XHTML Family of modules and markup languages. XHTML conforming processors will happily adapt to new XHTML family markup languages, so I can sit in my little walled garden and add elements to my private markup language and it will still. just. work. That's the whole point. And HTML5 misses it. Completely.

Lachlan Hunt said...

halindrome wrote:
"As to "polyglot" - again, I have to disagree. There is nothing in the current HTML5 draft that permits the use of rich compound documents that bring in multiple namespaces."

I think you've misunderstood the term polyglot document as it has come to be used in relation to (X)HTML5. It doesn't refer to the "use of rich compound documents that bring in multiple namespaces".

It refers to a document that syntactically conforms with the syntax requirements for both the HTML and XHTML serialisations. Conceptually, it's similar to writing an XHTML 1.0 document that complies with the Appendix C guidelines, except that the additional syntax required for well-formed XHTML, that otherwise isn't essential for HTML (like xmlns attributes, trailing slashes, etc.), is considered conforming, though meaningless, in the HTML syntax.
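
A rough way to see the idea in practice is to feed the same text to an XML parser and to an HTML tokenizer and compare the elements each one reports. The Python sketch below does that with the standard library. It is an approximation under stated assumptions: html.parser is a lenient tokenizer rather than the HTML5 parsing algorithm, the helper names are invented for the example, and agreement between the two parsers is a necessary sign of a polyglot document, not proof of conformance.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Polyglot</title></head>
<body><p>Line one<br />line two</p></body>
</html>"""


class TagCollector(HTMLParser):
    """Record the name of every start tag the HTML tokenizer sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Also reached for <br />, via the default handle_startendtag.
        self.tags.append(tag)


def html_tags(text):
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags


def xml_tags(text):
    root = ET.fromstring(text)  # raises ParseError if not well-formed
    # Drop the "{namespace}" part so names are comparable with the HTML side.
    return [element.tag.split("}", 1)[-1] for element in root.iter()]


def looks_polyglot(text):
    """True if the text is well-formed XML and both parsers see the same elements."""
    try:
        return xml_tags(text) == html_tags(text)
    except ET.ParseError:
        return False


print(looks_polyglot(DOC))
print(looks_polyglot("<p>one<br>two</p>"))
```

The second call fails the check because the unclosed br is acceptable to the HTML tokenizer but is not well-formed XML.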

halindrome said...

Lachlan,

Thanks for the clarification. You're right, I was unaware of that term. Interesting use of the word.

In that case, I think it is great that HTML5 is codifying support for XML-like syntax even in documents delivered as "text/html". Of course, since this has worked in user agents since XHTML 1.0 was first published...

As to "Appendix C" - a reminder that Appendix C has been updated in the recently refreshed note on XHTML Media Types. We have received some comments on that note, and plan on updating it again in the near future.

Larry said...

Please do not start a separate group, as Hixie has advocated.

Having two separate groups working on HTML, and taking it in different directions, is not benign, it is harmful. It was harmful when WhatWG started, and it would be harmful now. It was a shame that the browser vendor W3C members who started WhatWG chose -- instead of working within W3C process -- to instead form a separate group.

And splintering a separate "XHTML" group at this point would not be useful, it would be disruptive.

I see W3C management as having made the best they could out of a bad situation, and hope that the greater community can find a way of working together. The only viable path we have now is to start with the current state of HTML5.

The requirements that led to choosing well-formed XML and XHTML modularization were not well understood, and the needs of the communities those directions served not given adequate weight because of that.

I would like to work to bring more order to the chaos, focusing on actually reviewing the technical content without polemics, and establishing a path toward restricting authoring behavior in a way that will produce (X)HTML that is useful in broader contexts and workflows. It does require some cooperation of browser vendors to tie "new features" to "better content", and their understanding of the long-term value of resisting the tyranny of legacy content.

halindrome said...

Larry,

I agree with your sentiments. I hope that the XHTML-as-part-of-the-XML-toolchain folks can find a way to work within the W3C going forward. It is certainly my preferred venue.

Regardless of where we work, I think that it would be great if you would come and help. I think it would be great if everyone who cares would come and help. Without community involvement, it's just a bunch of academics and self-interested vendors in a circle-jerk.

And to all of you who choose NOT to get involved... it's very hard to complain later if you didn't at least try to fix it in the first place!

jax said...

The extension points of HTML are an interesting and important issue. I wouldn't say that HTML5 as it currently stands seems perfect. For instance, your example of it locking down the link types doesn't seem warranted.

This isn't an architectural weakness, but a design decision that can be reevaluated, at least until CR time.

I made my first stab at discussing the extensibility of HTML, but it became apparent that I will need to make more stabs and probably some wild swings as well.

It would be useful to compare the current extensibility mechanisms of XHTML2 and HTML5. It isn't certain that HTML5 comes out the loser.

suit said...

just created a "Stop HTML 5" facebook group - feel free to join:
http://www.facebook.com/group.php?gid=172453693668