Web site shows your page's metadata

Written by Adrian Holovaty on March 10, 2004

Like many Web-development bloggers, I've written about the importance of writing valid and semantically rich HTML -- code that's heavy on self-description.

It's generally frustrating to evangelize semantic markup, though, because its advantages aren't immediately apparent. Why should people take time to put <q> tags around quotes if Web browsers don't really do anything useful with them? The main point of semantic markup is to make documents easily understood by computers -- but computers don't seem to be doing anything exciting with the markup yet, on a large scale.

Well, now there's this thing, the W3C's Semantic data extractor. Pop in a URL, and it'll attempt to glean as much semantic information from the page as it can. It's a great way of visualizing what sort of data a computer can extract from a Web page.
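To make "what a computer can extract" concrete, here is a minimal Python sketch of the general idea. It is not the W3C tool's implementation (the class name SemanticSketch and the extract() helper are made up for illustration), and it only looks at a handful of elements: the title, named meta tags, headings and <q> quotes.

```python
from html.parser import HTMLParser


class SemanticSketch(HTMLParser):
    """Collects a few self-describing pieces of an HTML page.

    Hypothetical helper for illustration only; it covers just
    <title>, <meta name=...>, <h1>-<h6> and <q>.
    """

    TEXT_TAGS = {"title", "h1", "h2", "h3", "h4", "h5", "h6", "q"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}      # lowercased meta name -> content
        self.outline = []   # (heading level, heading text)
        self.quotes = []    # text of each <q> element
        self._stack = []    # currently open text-bearing tags and their buffers

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if attrs.get("name") and attrs.get("content") is not None:
                self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag in self.TEXT_TAGS:
            self._stack.append((tag, []))

    def handle_data(self, data):
        # Text belongs to every open element we are tracking,
        # so a <q> inside a heading counts toward both.
        for _, buf in self._stack:
            buf.append(data)

    def handle_endtag(self, tag):
        if not self._stack or self._stack[-1][0] != tag:
            return
        _, buf = self._stack.pop()
        text = " ".join("".join(buf).split())  # collapse whitespace
        if tag == "title":
            self.title = text
        elif tag == "q":
            self.quotes.append(text)
        else:
            self.outline.append((int(tag[1]), text))


def extract(html):
    """Parse an HTML string and return the populated parser."""
    parser = SemanticSketch()
    parser.feed(html)
    parser.close()
    return parser
```

Feed extract() the source of a page, then read .title, .meta, .outline and .quotes off the result. A page built from generic, meaning-free tags gives it almost nothing to report, which is the whole argument for semantic markup.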

What an outstanding idea. Developers should know about this.

Comments

Posted by Mike P. on March 10, 2004, at 9:01 a.m.:

Great link. I had come across this a few weeks ago and somehow lost the bookmark (too many browsers on the go!).

"but computers don't seem to be doing anything exciting with the markup yet, on a large scale"

For an example, check out the link I posted in my blog, which spells out another advantage of using proper semantic markup.

Posted by Pete Prodoehl on March 10, 2004, at 4:33 p.m.:

What might also be of interest is a tool that clearly displays the outline of a document (as the W3C Validator does) and explains that the words within <hn> tags have more meaning to search engines and other software than and tags.

Again, the basic idea of "here's what a computer sees as important" with the extra push of "here's what search engines see as important."

Posted by thomas on March 25, 2004, at 11 p.m.:

It's also much easier to read '<q>Stuff here.</q>' than something like '<span class="quotething">Stuff here.</span>', and it isn't nearly so hard on the eyes. But there's also the fact that search engines can (or have the ability to) use it to understand pages better, as can aural browsers, etc. I have not used one, but my guess is that they say "quote" preceding text in <q> tags, etc. Also, it's much cleaner/easier to style tags when it's 'q{margin-left:10px;}' instead of 'span.quotething{margin-left:10px;}'.

Posted by Scott Johnson on April 5, 2004, at 10:52 p.m.:

That's a great link. I just wish I could make it work with my site. I ran holovaty.com through it and saw the nice output that this site gives, but my site is a No Go. The error given is something like this:

Are you using this properly?

Choked processing http://www.w3.org/2002/08/extract-semantic.xsl transforming http://cgi.w3.org/cgi-bin/tidy-if?docAddr=http%3A%2F%2Fspeed.insane.com%2F

...

org.xml.sax.SAXParseException: character not allowed

at com.jclark.xml.sax.Driver.parse(Driver.java)

...

Not very useful error output. For me, this is a tool that doesn't work.