Showing posts with label W3C. Show all posts

Wednesday, July 17, 2013

RDFa is (still) the best way to improve your SEO

The web is getting better and better.  Search is getting smarter.  Google's 'rich snippets' and Facebook's Open Graph Protocol are great examples of how the industry is helping web developers make their content more machine-readable.  Industry-standard 'vocabularies' like FOAF, Dublin Core Terms, GoodRelations, and Schema.org are helping developers ensure that their content just makes sense.  And all of those technologies are largely based on RDF, and are designed to support other technologies that use it.

What is RDF?  RDF stands for "Resource Description Framework", and an excellent overview of it is in the RDF Primer the W3C has published.  If you don't want to dig that deeply, just know that RDF is a method for describing in a very standard way the relationship between x and y.  Moreover, it can also describe the relationship between y and z.  And so on and so on, until you get all the way down to some well-known, standardized 'term' that has absolute meaning to something like the Google search engine.

How is this useful?  Well, it means in part that I can say something like "Shane McCarron is the author of this article".  And, because the concept of an author is a well-known, standardized 'term' that has an absolute meaning, and because this article has a permanent URI, anything that can understand RDF will immediately *know* this.  Further, if I also say that "Shane McCarron" is identified by some other URI (e.g., http://blog.halindrome.com), RDF-aware processors will automatically associate the author with that URI.
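To make that a little more concrete, here is a minimal Python sketch of how those statements might look once a processor has extracted them.  It is only an illustration: the article URI, the "_:author" label, and the helper names are assumptions I made up for the example, and a real RDF toolkit would do far more than this.  The dc:creator, foaf:name, and foaf:homepage URIs are the published ones.

```python
# Minimal sketch: RDF statements modelled as (subject, predicate, object) tuples.
DC_CREATOR = "http://purl.org/dc/terms/creator"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_HOMEPAGE = "http://xmlns.com/foaf/0.1/homepage"

ARTICLE = "http://example.com/articles/rdfa-seo"  # hypothetical permanent URI
AUTHOR = "_:author"  # hypothetical blank-node label standing in for the person

triples = [
    (ARTICLE, DC_CREATOR, AUTHOR),                          # x is related to y
    (AUTHOR, FOAF_NAME, "Shane McCarron"),                  # y is related to z
    (AUTHOR, FOAF_HOMEPAGE, "http://blog.halindrome.com"),
]


def objects(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def describe_creators(resource):
    """Follow dc:creator from a resource and collect what is known about each creator."""
    return [
        {"name": objects(c, FOAF_NAME), "homepage": objects(c, FOAF_HOMEPAGE)}
        for c in objects(resource, DC_CREATOR)
    ]


print(describe_creators(ARTICLE))
```

The point is only that each statement is a small subject/predicate/object fact, and that following one fact to the next is how a processor gets from "this article" to "this person's homepage".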

That's all great, I hear you saying.  But how do I, as a web developer, *tell* Google things about my web pages?  Enter RDFa.  RDFa stands for "RDF in attributes" (some people might argue about that, but I was in the room and that's what I remember).  It is a way to embed RDF information right into your web page in a pretty straightforward way.  And, if you do it right, it means that search engines and social networks like Google and Facebook will know *more* about your web site and its information than they will know about your less-savvy competitors who don't bother to put this data in.

There are a lot of great resources out on the web that can help you get your annotations right.  I have included a list of some at the end of this article.  However, just to get you started, here is a simple example.

Let's say you have a picture you want to include on your web site.  You want people to be able to use that picture, but you want them to know what it is a picture of and who took it.  You could say something like:

<span about="#myPicture">
<img id="myPicture" 
     alt="Cabin on the lake"
     src="http://www.example.com/images/picture.png">
<span property="dc:title">Cabin on the lake</span> by
<span property="dc:creator">Shane McCarron</span>
</span>

In that example we use terms from the "Dublin Core" vocabulary.  These are well-known terms.  Any knowledge engine will know exactly what you are talking about.  If you wanted to be more explicit about who the creator was, you might extend that definition like:

<span about="#myPicture">
<img id="myPicture"
     alt="Cabin on the lake"
     src="http://www.example.com/images/picture.png">
<span property="dc:title">Cabin on the lake</span> by
<span rel="dc:creator"
      typeof="foaf:Person">
   <span rel="foaf:homepage"
         href="http://blog.halindrome.com"
         property="foaf:name">Shane McCarron</span>
</span>
</span>

Now we have also mixed in some terms from the "Friend of a Friend" vocabulary to say that the dc:creator we talked about before is a "Person" who has a "homepage" and a "name".  These are also well-known terms and will help the knowledge engines make better inferences about who "Shane McCarron" is.  If there are other things out there for which a 'foaf:Person' with those attributes is listed as the 'dc:creator', they were likely created by the same "Shane McCarron".  
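As a rough illustration of that last inference, here is a small Python sketch that groups works by the homepage of their creator.  It leans on the fact that FOAF declares foaf:homepage to be an inverse functional property, so two 'foaf:Person' nodes sharing a homepage are taken to be the same person.  The resource URIs, blank-node labels, and the function name are hypothetical; this is one simple way to picture what a knowledge engine might do, not a description of how any particular engine works.

```python
from collections import defaultdict

DC_CREATOR = "http://purl.org/dc/terms/creator"
FOAF_HOMEPAGE = "http://xmlns.com/foaf/0.1/homepage"

# Hypothetical statements harvested from two different pages.
PICTURE = "http://www.example.com/gallery#myPicture"
ESSAY = "http://www.example.org/essays/lake-life"

triples = [
    (PICTURE, DC_CREATOR, "_:p1"),
    ("_:p1", FOAF_HOMEPAGE, "http://blog.halindrome.com"),
    (ESSAY, DC_CREATOR, "_:p2"),
    ("_:p2", FOAF_HOMEPAGE, "http://blog.halindrome.com"),
]


def works_by_homepage(statements):
    """Group created works by their creator's foaf:homepage."""
    # Map each person node to its homepage.
    homepage_of = {s: o for s, p, o in statements if p == FOAF_HOMEPAGE}
    grouped = defaultdict(list)
    for s, p, o in statements:
        # Only creators with a known homepage can be matched up this way.
        if p == DC_CREATOR and o in homepage_of:
            grouped[homepage_of[o]].append(s)
    return dict(grouped)


print(works_by_homepage(triples))
```

Even though the two pages use different anonymous nodes for the creator, the shared homepage lets both works end up under the same person.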

There are LOTS of ways to use RDFa in your web pages and articles.  I plan to write more on this in the coming weeks.  For now, here are some resources that can help you get started.


Next up: How the W3C is using RDFa to help ensure its own documents are well annotated.


RDFa is (still) the best way to improve your SEO by Shane McCarron

Monday, July 13, 2009

I've still got the greatest enthusiasm and confidence in the mission

(part 2 in my continuing exploration of what went wrong with XHTML at the W3C)

Okay... it's a week later, and I have a little distance from the original event. The XHTML 2 Working Group had its regular meeting on Wednesday, just as we have for the last many, many years. At that meeting, we continued to make progress on resolving issues so that we can update some of the existing Recommendations and move other documents to Note status (see our Drafts page for what is being worked on).

While we were doing that, we of course were whining a little about the announcement. Mostly because the working group was neither consulted nor informed. Everyone in the group had learned about it from the press, not from the W3C. Yet another example of the masterful mismanagement. Don't get me wrong - we all had heard inklings, but there had been no decision made that we knew about. The FAQ that was published was produced without consulting the working group either. So basically we learned about our future work by reading that document too. Unbelievable.

Despite these (typical) events, I've still got the greatest enthusiasm and confidence in the mission (Dave). No, seriously. I do. The W3C is the worst form of standards production except all the others that have been tried (apologies to Winston Churchill). The model on which the W3C is built is one that makes sense if applied correctly. Get motivated, funded professionals who are experts in their field together and ask them to achieve consensus on the codification of some technology. Then show their work to a broader collection of experts and ensure it makes sense and integrates with the overall architecture. Assuming it does, call it a "standard" and get people to support and use it.

This model is simple and clean. It probably even works when there are relatively few of these groups of experts working on a cohesive set of deliverables (The Open Group is a great example of where this has succeeded by keeping the focus tight). Where it seems to fall down is when the keepers of the architecture lose control. At its outset, the keeper of the W3C vision was TimBL. Sure, he had minions to do his bidding, but in general they were mouthpieces for Tim. As the work of the W3C got more and more complex, the responsibility for the vision was passed on to various groups who were charged with maintaining their bit.

I desperately want to believe there was a long term strategy for the web motivating all the work the W3C's Advisory Committee, Technical Architecture Group, Advisory Board, HTML Coordination Group, etc. were chartering all these years. But I think that by pushing the responsibility further and further down the stack and at the same time getting distracted by other external activities like the WHATWG and the new World Wide Web Foundation, that strategy got miscommunicated or diluted or just lost. In any event, we now have a serious problem.

What's the problem? The organization with the primary responsibility for taking the web forward has two competing sets of activities. There's the browser-centric work - this includes HTML5, CSS, and the Rich Web Client Activity (HTML DOM stuff, Widgets, XMLHttpRequest, etc.). Then there's the web-centric work - this includes XML, XPath, XInclude, XML Schema, RDF, OWL, etc. And while these sets of activities could be designed to dovetail together, the browser-centric work seems to be ignoring the rest of the work.

I have seen some people argue that the W3C's focus on the semantic web and the XML tool chain has neglected even the most basic maintenance of its (wildly successful) previous deliverable - HTML. I think I can safely say that this is true. Moreover, I am one of the people who helped make it true. The (former) HTML Working Group had responsibility for maintaining HTML 4, and we elected not to update it. It was too much work, and we were focusing upon XHTML, XHTML M12N, XForms, XML Events, etc. We had some members who volunteered to help process incoming comments on HTML 4 and produce errata, but in the end it never seemed to happen. So yeah, I and the rest of the (former) HTML Working Group are culpable.

Thank goodness Google and the browser vendors came to our rescue! (yes, that was sarcasm). Now we have swung completely the other direction. Rather than focusing upon the future, we are clarifying the past. Oh, and while we are at it, introducing new untested concepts into the specification, sometimes despite there being standard alternatives already deployed. Does this bother anyone else? Surely, just as it is a mistake to lose sight of the past, it is a mistake to forget about the (envisioned, architected, long planned for) future?

HTML5 is here to stay. I get that. But that doesn't mean we have to continue to repeat our mistakes. Ignoring the HTML4 specification was a mistake. Ignoring the XHTML specification(s) is also a mistake. Pretending that the "XHTML5" part of HTML5 somehow continues the evolution of XHTML as part of the XML toolchain is a gigantic mistake. HTML5 has no extensibility model. It has no way to incorporate other public or private grammars into the content model. It has no model to define and connect RDF grammars that would expand the semantics of the language. It has no behavioral rules that describe how user agents must behave that will permit this extensibility going forward.

Right now, today, we need to find the strength to say "no! This is not good enough!". We need to ensure that the extensibility that is the cornerstone of the W3C's efforts to define the future of the web is not removed. Because we all know what happens when you remove a cornerstone, right?

Saturday, July 4, 2009

W3C, you ignorant slut!

[With apologies to Jane Curtin]

You know.... some people just don't get it. Most likely, most of the time, I am one of those people. This week, I get a pass. Because the management at the W3C have taken the cake, as it were. They have grabbed up all the "just don't get it" supply there is. The rest of us, for this week, can do whatever we want and still be as right as rain.

What did they do to achieve this? Well... They lost sight of their goals. They basically forgot that there was a plan that was going to take the web from HTML 3.2 to extensible grammars and follow-your-nose semantic magic. They forgot that there was a path to a web that was not just connected, but also accessible and meaningful. In a word, they gave in to the seductive siren call of HTML5.

(Disclaimer. I have been involved in the HTML and XHTML activities at the W3C since 1996. I am the lead editor for most of the XHTML specifications, and I have great passion for the X in XHTML - extensibility.)

What's wrong with HTML5? Nothing. Everything. Parts. Depends on what problem you are trying to solve. IMHO, at its core, HTML5 is just a really, really bad idea. The primary design principle for this language is "codify everything in use on the net, everywhere, no matter how broken, as long as Hixie has seen it at least once and thinks it is useful". How can that possibly be helpful (to anyone other than Hixie or Google)? I mean, sure... if you were writing a guide for the next browser manufacturer to come in and create a new browser that would be able to handle every broken web page on the planet, this would be a useful tool. But that's not a standard. That's an implementor's guide. There are between 5 and 15 actual user agent implementors in the world. There are millions of web content authors. How is it that the 15 (I'm feeling magnanimous) are more important than the millions? I don't know. Let's ask TimBL - father of the web and master of all things W3C.

Oh. Wait. We can't. Tim recently got a promotion. Someone coughed up a bunch of money so he could form the World Wide Web Foundation (ironically, an organization that can't put up a web page that is valid!). He's off playing in a new sandbox. But I'm sure he hasn't forgotten us. After all, at the W3C the Director has absolute authority. Nothing can start or finish in the W3C without his approval. Oh Tim! Where are you when we need you most?

Or do we need you? It was under your leadership that this whole mess got started. It was you who decided to irreparably damage the brand(s) of the W3C by ceding control of the web to the WHATWG. What were you thinking? I assume you were under pressure from the browser vendors. I assume those 4 out of your ~400 members were saying "hey, we don't want to implement XML-based semantic web. It's haaaaaard (insert whine here)".

Well, guess what? It is hard. So What(WG)? The W3C has a clear mandate from its members. From its advisory committee. And that mandate is spelled out pretty well in the W3C's mission statement: "To lead the World Wide Web to its full potential by developing protocols and guidelines that ensure long-term growth for the Web." In what way is locking the web into a browser-developer controlled, non-extensible, non-XML language "ensur(ing) long-term growth for the Web?"

In my opinion, it's not. Instead, it is shackling the web content developers (like me) into the tag-soup architecture of the '90s. There is nothing about HTML5 that represents long-term growth. Nothing that represents industry consensus about how the structure of web content should mature so that it is accessible to the handicapped. Nothing that makes it easier to mark up content with its semantics in an extensible way. Nothing that allows the use of long-agreed-upon W3C Recommendations.

Actually, that's the saddest part of this whole story. The W3C is an organization that has spent many years developing "Recommendations" (read "standards" when I say that) that support its core architecture. XML, XML Namespaces, XHTML, MathML, SVG, RDF, OWL, etc. All of these are designed to work together to support the long term vision of the organization - one that promotes dynamic extensibility of the "web" by different groups at different times. The HTML5 activity ignores this fundamental guiding principle of the W3C. Instead, the HTML5 activity seems to believe that if it isn't written down in their specification, it doesn't exist. And if it was written down elsewhere, but not in a way that is absolutely perfect according to the arbitrary and capricious rules of the HTML5 editor, then it needs to be re-written, solidified, and while they are at it changed in ways the original authors never intended (see their redefinition of what a URL is or their relegation of the definition of rel attribute values to the WHATWG). Or worse yet, replaced completely by something competing and incompatible (e.g., RDFa vs the much-maligned microdata).

So, I was wrong. We do need TimBL, or someone in the W3C management to stand up and say "bullshit! This is wrong. The work that is going on in the HTML5 activity is inconsistent with the W3C goals for the web." The web community needs leadership with vision, not blinders. It needs an eye toward the future, not a detailed, Hubble-esque view of the distant past. Oh Sir Tim, where are you when we need you most?