Wednesday, July 17, 2013

RDFa is (still) the best way to improve your SEO

The web is getting better and better.  Search is getting smarter.  Google's 'rich snippets' and Facebook's Open Graph Protocol are great examples of how the industry is helping web developers make their content more machine-readable.  Industry-standard 'vocabularies' like FOAF, Dublin Core Terms, Good Relations, and Schema.org are helping developers ensure that their content just makes sense.  All of those technologies are largely based upon RDF, and are designed to support other technologies that use it.

What is RDF?  RDF stands for "Resource Description Framework", and an excellent overview of it is in the RDF Primer the W3C has published.  If you don't want to dig that deeply, just know that RDF is a method for describing in a very standard way the relationship between x and y.  Moreover, it can also describe the relationship between y and z.  And so on and so on, until you get all the way down to some well-known, standardized 'term' that has absolute meaning to something like the Google search engine.

How is this useful?  Well, it means in part that I can say something like "Shane McCarron is the author of this article".  And, because the concept of an author is a well-known, standardized 'term' that has an absolute meaning, and because this article has a permanent URI, anything that can understand RDF will immediately *know* this.  Further, if I also say that "Shane McCarron" is identified by some other URI (e.g., http://blog.halindrome.com), RDF-aware processors will automatically associate the author with that URI.

That's all great, I hear you saying.  But how do I, as a web developer, *tell* Google things about my web pages?  Enter RDFa.  RDFa stands for "RDF in attributes" (some people might argue about that, but I was in the room and that's what I remember).  It is a way to embed RDF information right into your web page in a pretty straightforward way.  And, if you do it right, it means that search engines like Google, and platforms like Facebook, will know *more* about your web site and its information than they will know about your less-savvy competitors who don't bother to put this data in.
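As a rough sketch of the "Shane McCarron is the author of this article" statement from earlier, the markup below is one way it might be written.  It assumes an RDFa 1.1 processor (the prefix attribute) and the Dublin Core Terms namespace.  The empty about attribute is shorthand for "the page you are reading", and the wrapping elements are only illustrative.

```html
<!-- Sketch only: assumes RDFa 1.1 and the Dublin Core Terms namespace. -->
<!-- about="" makes the current page the thing being described. -->
<p about="" prefix="dc: http://purl.org/dc/terms/">
  Written by <span property="dc:creator">Shane McCarron</span>
</p>

<!-- Variant: name the author by URI instead of by a text string. -->
<p about="" prefix="dc: http://purl.org/dc/terms/">
  Written by
  <a rel="dc:creator" href="http://blog.halindrome.com">Shane McCarron</a>
</p>
```

The first form records the creator as plain text.  The second records the creator as a resource identified by a URI, which is what lets a processor connect this page to other statements about that same URI.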

There are a lot of great resources out on the web that can help you get your annotations right.  I have included a list of some at the end of this article.  However, just to get you started, here is a simple example.

Let's say you have a picture you want to include on your web site.  You want people to be able to use that picture, but you want them to know what it is a picture of and who took it.  You could say something like:

<span about="#myPicture">
<img id="myPicture" 
     alt="Cabin on the lake"
     src="http://www.example.com/images/picture.png">
<span property="dc:title">Cabin on the lake</span> by
<span property="dc:creator">Shane McCarron</span>
</span>

In that example we use terms from the "Dublin Core" vocabulary.  These are well-known terms.  Any knowledge engine will know exactly what you are talking about.  If you wanted to be more explicit about who the creator was, you might extend that definition like:

<span about="#myPicture">
<img id="myPicture"
     alt="Cabin on the lake"
     src="http://www.example.com/images/picture.png">
<span property="dc:title">Cabin on the lake</span> by
<span rel="dc:creator"
      typeof="foaf:Person">
   <span rel="foaf:homepage"
         href="http://blog.halindrome.com"
         property="foaf:name">Shane McCarron</span>
</span>
</span>

Now we have also mixed in some terms from the "Friend of a Friend" vocabulary to say that the dc:creator we talked about before is a "Person" who has a "homepage" and a "name".  These are also well-known terms and will help the knowledge engines make better inferences about who "Shane McCarron" is.  If there are other things out there for which a 'foaf:Person' with those attributes is listed as the 'dc:creator', they were likely created by the same "Shane McCarron".  

There are LOTS of ways to use RDFa in your web pages and articles.  I plan to write more on this in the coming weeks.  For now, here are some resources that can help you get started.


Next up: How the W3C is using RDFa to help ensure its own documents are well annotated.


RDFa is (still) the best way to improve your SEO by Shane McCarron

Thursday, July 12, 2012

The Role Attribute Module - years in the making

Way back in the olden days when we were developing XHTML Modularization and then XHTML2, the group recognized that there were lots of pieces of XHTML2 that were generally useful and that it should be possible to publish them separately.  The concept was that since these were modules they would be assembled by host language authors as needed, and then at some point pulled together into XHTML2.  Obviously XHTML2 never came to be, but a lot of the pieces did get worked on and pulled into other work (e.g., RDFa, Role, all sorts of stuff in HTML5 although no one will ever admit that).


Today I wanted to talk about the Role Attribute.  This piece of XHTML2 is a deceptively simple attribute.  Its purpose is to help content authors label an element with its role or roles within a document.  Why do elements have roles?  Well - that's complicated.  First, some (more) history...

Sometimes W3C working groups have face to face meetings.  At one such meeting of the XHTML working group (at the AOL headquarters in Virginia I think) a group member was wearing their liaison hat between the XHTML working group and the group responsible for accessibility.  They REALLY wanted to be able to have sections of the document labeled so that assistive technologies (ATs) could more easily help people with various disabilities use the web more efficiently.  In particular, things like navigation areas, headers, footers, content, sidebars...  and of course controls.  The accessibility liaison thought it was important that we define a base collection of roles, but also that it be possible to dynamically extend that collection of roles easily. The group thought this was a fine idea, and added the role attribute to XHTML2.  The base collection has been modified over time - you can see its current form in the Vocabulary Document.

Fast forward to today.  The W3C has today published the Role Attribute as a Candidate Recommendation.  A lot has happened in the W3C community since we started work on this simple attribute, but the basic form of the Role Attribute and its capabilities has not changed much at all.  There is an attribute named 'role'.  It takes a list of zero or more values.  Each value is either a pre-defined TERM (one of the things in that vocabulary document), a URI, or a CURIE (a CURIE is a compact URI - a concept defined by RDFa Core).  Assistive Technologies can use the values of role, in conjunction with other information from the ARIA Attributes, to more-or-less automatically make pages more accessible.  RDFa processors can use the information in role attributes to automatically learn more about the semantics of a page.
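To make the three kinds of value concrete, here is a small sketch.  The terms (banner, navigation, main, complementary, contentinfo) are ones I believe are in the vocabulary document.  The example.com URI, the ex prefix, and the sidebar and tagcloud roles are hypothetical and exist only for illustration.  How a CURIE prefix gets declared depends on the host language; this sketch assumes RDFa's prefix attribute.

```html
<!-- Pre-defined TERMs from the vocabulary document -->
<div role="banner">Site name and logo</div>
<ul role="navigation">
  <li><a href="/">Home</a></li>
  <li><a href="/archive">Archive</a></li>
</ul>
<div role="main">Article text goes here.</div>
<div role="contentinfo">Copyright and contact details</div>

<!-- A list of two values: a TERM plus a full URI (hypothetical role) -->
<div role="complementary http://example.com/roles#sidebar">Related links</div>

<!-- A CURIE: the ex prefix and the tagcloud role are hypothetical -->
<div prefix="ex: http://example.com/roles#" role="ex:tagcloud">Tags</div>
```

A tool that does not recognize a particular value can simply ignore it, so adding a more specific role next to a well-known term should be harmless.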


This specification is stable now.  The attribute works in all modern user agents now.  Its values are interpreted by many assistive technologies already.  It is already supported by some RDFa processors, with more on the way.  Even though the W3C says people shouldn't rely upon stuff in Candidate Recommendations because they might not be fully cooked yet, I say go for it.  Role works and it can't hurt.  RDFa works well, and is processed by popular search engines like Google and Bing.  Telling the browser (and any Assistive Technologies that might be looking at the browser) what role each part of your web page plays isn't just polite, it might help someone use your site more effectively!

Tuesday, January 31, 2012

W3C Publishes Last Call versions of RDFa Core 1.1 and XHTML+RDFa 1.1

Today the W3C published new versions of RDFa Core and XHTML+RDFa. These versions are the result of 10 months of work by the W3C RDF Web Applications Working Group, and are expected to be in their nearly-final form. You can see the full announcement at http://www.w3.org/blog/SW/2012/01/31/new-rdfa-drafts-published/

You have three weeks to look these over and raise comments.  Otherwise, forever hold your peace.  We look forward to your input!

Wednesday, January 4, 2012

W3C Publishes First Draft of Media Accessibility Requirements

The W3C's Protocols and Formats working group has been working hard to accumulate the requirements for media accessibility.  A few months ago I took over as editor of the document.  That document has now been released as a 'First Public Working Draft' to get feedback from the community.  From the abstract:

This document aggregates the accessibility requirements of users with disabilities that the W3C HTML5 Accessibility Task Force has collected with respect to audio and video on the Web.
It first provides an introduction to the needs of users with disabilities in relation to audio and video.
Then it explains what alternative content technologies have been developed to help such users gain access to the content of audio and video.
A third section explains how these content technologies fit in the larger picture of accessibility, both technically within a Web user agent and from a production process point of view.
This document is most explicitly not a collection of baseline user agent or authoring tool requirements. It is important to recognize that not all user agents (nor all authoring tools) will support all the features discussed in this document. Rather, this document attempts to supply a comprehensive collection of user requirements needed to support media accessibility in the context of HTML5. As such, it should be expected that this document will continue to develop for some time.

Please take a look.

Thursday, December 22, 2011

RDFa 1.1-related specs nearing last call

After ages of development work and negotiation, the W3C's RDF Web Applications Working Group is finally almost done with RDFa 1.1. This seemingly simple, incremental change to RDFa 1.0 has taken about a year longer than I expected it to, but the work is solid and has a lot of community support. We recently published new working drafts of RDFa Core 1.1, XHTML+RDFa 1.1, RDFa Lite 1.1, and the RDFa Primer. We are looking for feedback on these over the coming weeks, with a plan to progress to Last Call in mid-January.

P.S. Yes, I forgot that I had this blog I should be updating. My bad!

P.P.S. Hooray for the Oxford Comma!

Friday, December 17, 2010

Final XHTML2 Working Group documents published

After a couple of weeks of furious editing and publishing, the XHTML2 work is FINALLY done! You can see most of the gory details at the W3C. What that article does NOT say, however, is that the working group also completed its work on the RelaxNG implementations of XHTML and XHTML Modularization. You can find that work as a free-standing document.

Unfortunately, one thing that did NOT get updated is the XHTML Media Types note. There are various corrections to this that were pending, but in the end it was too difficult to get it published before the end of the year. In the near future, I plan to publish those changes independently (and maintain the document as guidance to people who want to write XHTML and have it just work in current user agents). You can see the latest draft of this document if you are interested.

Sunday, November 28, 2010

W3C XHTML2 Activity finally winding down

It's been a long road. I first started working with the W3C and its XHTML2 (then HTML) Working Group on 27 August 1998. At the time, I didn't know that it would become the work of 12+ years. Sure, I had been involved in standards for a long time. I started with the IEEE POSIX activity before it was even called POSIX (anyone remember IEEEIX?) - sometime in 1985. That hobby spiraled into a career, and has served me very, very well.

So, it is with some regret that I look to the end of my work on XHTML at the W3C. We did a lot of good work. Some of that work has been overcome by events, of course. This industry never sits still. But, for the record, here are some of the important things this activity developed and delivered:
  • HTML 4.01 - last updated in 1999, but still the basis of most of the web.
  • XHTML 1.0 - the first baby step toward a well-formed, valid web. At its inception, we were all convinced that XML would rule the world, and HTML needed to be based upon XML if it was to survive. We were a little bit wrong.
  • XHTML Media Types - a Note that explained how to deliver the new XHTML documents to legacy user agents. Still relevant and widely used today. We have a small update for it that might still get published - you can see it here.
  • XHTML Modularization - a set of building blocks and rules that language designers could use. M12N is the basis for many activities within and outside of the W3C. It continues to be used all over the place. It started out being used only for XML DTDs, but was expanded to XML Schema (finally published recently, but complete for many years). In the next weeks we will release a final installment of this, XHTML Modularization for RelaxNG.
  • XML Events - a declarative way to define events and bind them to elements and observers. Its first version was published by the XHTML activity. It has now been taken over by the XForms activity. I hope that they will get XML Events 2 out the door at some point.
  • XHTML 1.1 - a tight, XML-centric version of XHTML based upon XHTML Modularization. We published an update to this last week, but it is a stable grammar that can be used anywhere. It is also the basis for many extended XHTML Grammars. The update last week makes it possible to validate using XML Schema, and also to use the 'lang' attribute to improve use of XHTML 1.1 documents by assistive technologies.
  • XHTML Basic - a version of XHTML targeted at the mobile community.
  • XHTML Print - a version of XHTML targeted at rendering consistently on printers.
  • XForms - an independent activity, but one that started within the XHTML activity.
  • XHTML+RDFa - another independent activity. Originally a joint task force of the XHTML2 and Semantic Web working groups. The latest version builds upon this early work, but continues to take advantage of XHTML Modularization for its definitions.
  • CURIEs - a compact expression syntax for URIs. Used by RDFa, but also potentially by other specifications that need to readily reference resources without using long URIs in attributes.
  • The Role Attribute module - an independent module to add a role attribute. Useful for accessibility, but also for general semantic notation. This work has been taken on by the Protocols and Formats Working Group.
  • The Access module - an independent module to add an access element. This element would allow binding of 'keys' and events to elements. The original module has no owner, but the general work has been picked up by the Protocols and Formats Working Group.
  • XHTML 2 - a sweeping revision of XHTML. This work was never completed, but will be published in its current state as a Note.
  • XHTML Modularization 2.0 - an update to the modularization framework to accompany XHTML 2.
  • XFrames - an improvement on traditional HTML framesets.
  • XHTML 1.2 - a version of XHTML that added the role attribute, the access element, RDFa, and ARIA. This was never a formal deliverable, but was a logical extension of the work.
Wow. I have never typed that all out before! Along the way we developed an entire publication infrastructure, including our own internal markup language (xhtmlspec) for annotating sources. I am proud to have served with my colleagues on this activity. I think we did solid work. While some of this work will never come to fruition, most of it was and will continue to be used throughout the internet every day.

I will of course continue to work with this community going forward. I remain active in the RDFa and Protocols and Formats activities. I hope to assist ISO in its publication of the RelaxNG Modularization framework. And I am keeping an eye out for the next interesting 12 year project. And no, it's not HTML5!