Tagging quotes in a news story

Written by Adrian Holovaty on December 1, 2003

During the recent Medill Storytelling Symposium at Northwestern University, somebody incidentally mentioned the tendency for readers to skim through news articles and stop only to read direct quotes. It's no surprise why: Quotes are often the most colorful parts of a story, and disillusioned readers might think direct quotes are the only pieces of a story not infused with a reporter's bias.

That got me thinking. If some readers read only the quotes in newspaper stories, why not make that easier on them?

Wouldn't it be cool if...

  • ...Online news stories had an option to "highlight all quotes," which would, for example, subtly gray out everything that wasn't a quotation? That'd guide the quote-skimmers' eyes to the content they really wanted, while maintaining context.
  • ...There were an "All recent quotes by Mayor Smith" page? Sounds valuable to readers and reporters alike. Heck, I'm sure Mayor Smith herself would find it useful.
  • ...There were an "All quotes in today's newspaper" page? With links to full articles, of course, for context.

If news providers tagged and fielded their quotes somehow, all this would be possible. HTML already provides the <q> tag for marking up quotations, but the problem is that someone has to insert the tags in the first place. That's no big deal if a single person is in charge of editing content -- Mark Pilgrim's Posts by Quotation archive succeeds because he puts a lot of effort into marking up blog entries -- but changing the workflow of a multi-person online-news production staff is significantly more difficult. And the extra time it'd take for a Web producer to tag all the quotes in a story manually just wouldn't be worth it.

So is this a pie-in-the-sky idea? Of course not! I think technology is the answer. It seems to me 80 percent of quotes in news articles are in exactly the same quote-citation-quote format:

"They've got a building down in New York City," said Arlo Guthrie. "It's called Whitehall Street, where you walk in, you get injected, inspected, detected, infected, neglected and selected."

Begin quote mark; text; end quote mark; "said"; source name; begin quote mark; text; end quote mark. Looks very parsable.
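
As a rough illustration of how parsable that format is, here is a minimal Python sketch built around one regular expression. It is only a sketch under stated assumptions: it expects straight double quotes, the literal verb "said," and a source name with no periods in it (so "Mr. Smith" would not match). The names QUOTE_PATTERN and find_quotes are made up for this example.

```python
import re

# Assumptions (not from the article): straight double quotes, the verb
# "said", and a source name that contains no period or quote mark.
QUOTE_PATTERN = re.compile(
    r'"(?P<first>[^"]+)"'         # begin quote mark; text; end quote mark
    r'\s+said\s+'                 # "said"
    r'(?P<source>[^."]+)\.\s+'    # source name, ended by a period
    r'"(?P<second>[^"]+)"'        # begin quote mark; text; end quote mark
)


def find_quotes(text):
    """Return a list of (source, [first_part, second_part]) tuples."""
    results = []
    for match in QUOTE_PATTERN.finditer(text):
        source = match.group("source").strip()
        parts = [match.group("first"), match.group("second")]
        results.append((source, parts))
    return results


story = (
    '"They\'ve got a building down in New York City," said Arlo Guthrie. '
    '"It\'s called Whitehall Street, where you walk in, you get injected, '
    'inspected, detected, infected, neglected and selected."'
)

for source, parts in find_quotes(story):
    print(source, parts)
```

Note that the first captured part keeps its trailing comma, so anything that stitches the two halves back together would have to deal with that punctuation.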

Of course, the other 20 percent of quotes aren't as nicely formatted. But I'm not so sure that automating the tagging of quotes in a news article is impossible. I think I might try it.
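
For what it's worth, a first attempt at the automated tagging might look something like the Python sketch below. It handles only the quote-said-source-quote format and wraps each half of the quotation in a <q> element. Storing the speaker's name in the title attribute is an assumption made for illustration, as are the names PATTERN and tag_quotes; the sketch also assumes plain-text input with straight double quotes and a source name with no periods.

```python
import html
import re

# Hypothetical pattern for the quote-said-source-quote format only.
PATTERN = re.compile(
    r'"(?P<first>[^"]+)"'
    r'(?P<attrib>\s+said\s+(?P<source>[^."]+)\.\s+)'
    r'"(?P<second>[^"]+)"'
)


def tag_quotes(text):
    """Wrap both halves of each matched quotation in <q> elements."""
    # Escape &, < and > first, but leave quote marks alone so the
    # pattern can still find them.
    escaped = html.escape(text, quote=False)

    def wrap(match):
        source = match.group("source").strip()

        def q(part):
            # Assumption: the speaker's name goes in the title attribute.
            return '<q title="{}">{}</q>'.format(source, part)

        # The literal quote marks are dropped; browsers that support
        # <q> are expected to supply their own.
        return q(match.group("first")) + match.group("attrib") + q(match.group("second"))

    return PATTERN.sub(wrap, escaped)


print(tag_quotes('"We won," said Mayor Smith. "It was close."'))
```

Text that doesn't match the pattern passes through untouched (apart from escaping), so the other 20 percent of quotes would simply stay untagged rather than get tagged wrongly.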


Posted by mini-d on December 1, 2003, at 10:23 a.m.:

Not to mention you can use the cite="...", title="..." and lang="..." attributes to make the quote richer than a simple "...".

I've talked with John Gruber about adding an option to replace "..." with <q></q> quotes...

Posted by Craig on December 1, 2003, at 7:34 p.m.:

There are two hurdles (both surmountable) with this great idea:

The first is that Internet Explorer doesn't place quotation marks around <q> as Mozilla and Opera do (this can be fixed with JavaScript).

The second involves punctuation inside the quotes. In the example you cited, there's a trailing comma before the attribution which could produce a grammatically strange result:

"They've got a building down in New York City, It's called Whitehall Street..."

You may need to add another parameter:

Begin quote mark; text; punctuation; end quote mark; "said"; source name; begin quote mark; text; punctuation; end quote mark; punctuation.

Also, you'll probably find it's more common to see: source name; "said";

Good luck with this and please keep us posted about how this goes (even if it isn't implemented).

Posted by Dave Moreman on December 4, 2003, at 3:06 p.m.:


Some stuff here on how a text to speech program detects quotes...


Posted by Joe Clark on December 20, 2003, at 6:06 a.m.:

As I explain in my book, if you're using HTML entities for opening and closing quotation marks, you know with certainty where they begin and end. (Ambiguities arise with British single quotes.) You don't even need to use the largely hypothetical q element.
