March 18, 2005, 12:33 AM ET
Microformats could describe online news intelligently
A lot of the buzz at South by Southwest was about the concept of microformats, which are lightweight, informal standards for adding metadata to Web pages by using existing XHTML elements. Tantek Çelik and Eric Meyer both spoke enthusiastically about the idea, and I'm grateful I had a chance to speak with both personally.
A good example is XFN, a way of identifying human relationships within a Web page's code by putting a rel attribute on <a> tags. (I've written about XFN previously.) For instance, I'm friends with Simon Willison, so I put rel="friend" in the link to his Web site, and services such as rubhub do cool things with the aggregated data.
It works because it's easy for humans to understand, is easy to implement, uses existing infrastructure (XHTML) and solves a small, specific problem.
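To make the "easy to implement" point concrete, here is a minimal sketch of how a tool might pull XFN relationships out of a page, using Python's standard html.parser module. The sample markup, the example.org URL and the RelLinkParser name are all made up for illustration; this is not how rubhub or any other service actually does it.

```python
from html.parser import HTMLParser


class RelLinkParser(HTMLParser):
    """Collect (href, rel values) pairs from <a> tags that carry a rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        rel = attr_map.get("rel")
        href = attr_map.get("href")
        if rel and href:
            # rel is a space-separated list, so one link can carry several values
            self.links.append((href, rel.split()))


# Hypothetical page fragment; the URLs are placeholders.
SAMPLE = (
    '<p>My friend <a href="http://example.org/simon/" rel="friend">Simon</a> '
    'and <a href="http://example.org/other/">someone else</a>.</p>'
)

parser = RelLinkParser()
parser.feed(SAMPLE)
friends = [href for href, rels in parser.links if "friend" in rels]
print(friends)
```

Only the link that carries rel="friend" ends up in the list; the plain link is ignored.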
A few other microformats have been invented so far. Some examples:
- hCalendar -- A way to designate calendar information within a Web page
- hCard -- A way to designate contact information, as in a vCard
- VoteLinks -- A way to designate whether you agree or disagree with something you're linking to
Why spend the time adding that metadata to your Web pages? Because it makes it easy for automated tools to aggregate information, and it creates a bunch of interesting possibilities.
I love the idea of microformats, and, as I'm involved in the online-news industry, I'm naturally interested in their possible applications in a journalism context. Here are a few ideas I've been bouncing around; I'd love to see what people think.
Background-story relationships
News sites -- well, decent ones, anyway -- often link stories to previous coverage on the same issue. But there's no reliable way to automate the detection of previous coverage. I was thinking a rel="background-story" attribute for link tags could work.
A rel="background-story" could be used on internal or external links -- in the latter case, it'd be used if a newspaper is following up on the work of another news outlet.
I brought this up with Tantek at South by Southwest, and he suggested that <link rel="prev"> might be a better way of marking it up. The problem I see with that is that it implies a linear "previous" relationship, whereas a news story generally has more than one background story. It's true that some series of news stories are linear, but it's probably not a good idea to pigeonhole all news stories in this way. Journalism describes life, and life isn't linear.
(That said, <link rel="prev"> could, and probably should, be used for multi-part news stories, such as those in a series.)
Possible applications: Automated visualization tools that create news "trees"; much more intelligent news automation by aggregators such as Google News and Topix.net.
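Here is a rough sketch of the news "tree" idea, assuming stories mark their links to earlier coverage with the proposed rel="background-story" on <a> tags. The PAGES dictionary stands in for fetched pages, and its paths, along with the build_tree and background_links names, are invented for this example.

```python
from html.parser import HTMLParser


class BackgroundParser(HTMLParser):
    """Collect hrefs of <a> tags marked rel="background-story"."""

    def __init__(self):
        super().__init__()
        self.background = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        rels = (attr_map.get("rel") or "").split()
        if tag == "a" and "background-story" in rels and attr_map.get("href"):
            self.background.append(attr_map["href"])


def background_links(html):
    parser = BackgroundParser()
    parser.feed(html)
    return parser.background


def build_tree(url, pages, ancestors=()):
    """Return a nested dict of the background stories reachable from url."""
    tree = {}
    for target in background_links(pages.get(url, "")):
        # Skip links that point back up the current branch, to avoid cycles.
        if target != url and target not in ancestors:
            tree[target] = build_tree(target, pages, ancestors + (url,))
    return tree


# Hypothetical site: story 3 follows up on stories 1 and 2; story 2 follows up on story 1.
PAGES = {
    "/story-3": '<a rel="background-story" href="/story-1">first report</a> '
                '<a rel="background-story" href="/story-2">second report</a>',
    "/story-2": '<a rel="background-story" href="/story-1">first report</a>',
    "/story-1": "<p>Original report.</p>",
}

tree = build_tree("/story-3", PAGES)
print(tree)
```

Story 3 has two background stories at the same level, which is the non-linear shape that a single "previous" link can't express.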
Story reaction relationships
It'd be beneficial to designate relationships between a news story and opinion pieces commenting on that news story. The news story itself could include rel="opinion-reaction" in its link to the opinion page, and the opinion page could include rev="opinion-reaction" in its link back to the facts. (The rev attribute reverses the direction: it describes the current page's relationship to the linked page, so here it means "this page is an opinion reaction to that one.")
Possible applications: If news aggregators used this, they'd be less likely to publish "opinion" content as news, which would solve a significant problem many journalists have with automated news sites such as Google News.
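As a sketch of how an aggregator might use this, the code below treats a page as opinion if it carries a rev="opinion-reaction" link. It accepts the attribute on either <a> or <link> tags, since the proposal doesn't settle on one. The is_opinion function and the sample paths are assumptions made for the example.

```python
from html.parser import HTMLParser


class ReactionParser(HTMLParser):
    """Separate rel and rev uses of the proposed "opinion-reaction" value."""

    def __init__(self):
        super().__init__()
        self.reactions = []  # rel: opinion pieces that react to this page
        self.reacts_to = []  # rev: news stories this page reacts to

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        if "opinion-reaction" in (attr_map.get("rel") or "").split():
            self.reactions.append(href)
        if "opinion-reaction" in (attr_map.get("rev") or "").split():
            self.reacts_to.append(href)


def is_opinion(html):
    """Guess that a page is opinion if it declares itself a reaction to another page."""
    parser = ReactionParser()
    parser.feed(html)
    return bool(parser.reacts_to)


# Hypothetical fragments from a news story and from an opinion piece.
NEWS = '<link rel="opinion-reaction" href="/opinion/editorial">'
OPINION = '<a rev="opinion-reaction" href="/news/story">the original story</a>'

print(is_opinion(NEWS), is_opinion(OPINION))
```

A real aggregator would want more signals than this one attribute, but it shows how little parsing the markup demands.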
Reporter relationships
It's rather surprising to think about, but there's no good way to designate the reporter(s) of a news story in a machine-readable format. Yes, there's <meta name="Author">, but that could as easily be set to "Chicago Tribune" as to "Mike Royko." (In a newspaper, who's the "author"? The reporter? The publication? The editors/publishers?)
I propose rel="reporter", which would go on the link to the reporter's bio page -- on news sites that give their reporters bio pages, of course.
Possible applications: Web-wide aggregation of content reported by a particular reporter. More intelligent parsing of reporter information by news aggregators.
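A minimal sketch of the aggregation idea, assuming each story links to its reporter's bio page with the proposed rel="reporter". Stories are grouped by bio URL, which serves as the reporter's identifier. The page paths and the stories_by_reporter name are made up for the example.

```python
from collections import defaultdict
from html.parser import HTMLParser


class ReporterParser(HTMLParser):
    """Collect bio-page hrefs from <a> tags marked rel="reporter"."""

    def __init__(self):
        super().__init__()
        self.reporters = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        rels = (attr_map.get("rel") or "").split()
        if tag == "a" and "reporter" in rels and attr_map.get("href"):
            self.reporters.append(attr_map["href"])


def stories_by_reporter(pages):
    """Map each reporter bio URL to the list of story URLs that credit it."""
    index = defaultdict(list)
    for url, html in pages.items():
        parser = ReporterParser()
        parser.feed(html)
        for bio in parser.reporters:
            index[bio].append(url)
    return dict(index)


# Hypothetical stories; the second one has two reporters.
PAGES = {
    "/news/council-vote": 'By <a rel="reporter" href="/staff/jane">Jane</a>',
    "/news/school-budget": 'By <a rel="reporter" href="/staff/jane">Jane</a> '
                           'and <a rel="reporter" href="/staff/sam">Sam</a>',
}

index = stories_by_reporter(PAGES)
print(index)
```

Because the key is the bio URL rather than a name string, two reporters with the same name at different papers would stay separate.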
Factual relationships
This one gets a little abstract. It'd be fascinating to mark up every standalone "fact" in a news article and link it to a page of "proof" or "support." The link would have rel="proof", and automated agents could traverse the Web to build a gigantic "proof structure," tracing this fact back to an earlier fact, which would in turn be traced back to a previous fact. It's obvious this metadata would be incredibly expensive to maintain. But maybe news organizations should start thinking about creating infrastructure to gather this type of data.
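The traversal itself is the easy part. Here is a sketch of an agent walking rel="proof" links breadth-first, assuming the links have already been extracted into a dictionary that maps each page to the pages it cites as proof. The PROOF_LINKS data and the proof_chain name are invented for illustration.

```python
from collections import deque


def proof_chain(start, proof_links):
    """Walk rel="proof" links breadth-first; return supporting pages in visit order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in proof_links.get(page, []):
            if target not in seen:  # visit each supporting page only once
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


# Hypothetical, already-extracted proof links: page -> pages it cites as support.
PROOF_LINKS = {
    "/news/budget-story": ["/docs/council-minutes"],
    "/docs/council-minutes": ["/docs/budget-report"],
}

chain = proof_chain("/news/budget-story", PROOF_LINKS)
print(chain)
```

The hard and expensive part remains what the paragraph above says it is: getting people to create and maintain the links in the first place.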
