Monday, August 26, 2002
By Paul Ford
An idea for the Semantic Web.
A friend, trying to follow the news, wrote: “it gets tough and frustrating to try to follow an international story for more than a couple days, unless it's enormous.” She's right. For a politics or news junkie, it's a pain. One day, it's near-nuclear war in Pakistan and India on the front page; the next day it's a baby tiger born at the zoo.
I have a way to solve the problem. Or rather, I will now dictate my ideas to an uncaring world, for the fun of it. Here's what happens: someone sets up a small organization that assigns permanent URLs to every major news event. The URLs look like this: http://newspurl.org/pakistan-india-nuclear, http://newspurl.org/kyoto-treaty, or http://newspurl.org/us-china-trade. Simple stuff. The URLs don't have to be incredibly granular or complicated.
Then, whenever anyone publishes a story on a topic to the Web, they include a bit of metadata in their web page indicating that this page is covering a Newspurl-identified story. From Newspurl, a fairly simple web crawler can go out, chew through the world's news sites, and update a master list of which site has which article. If the crawler is thoughtful, it can figure out the date of the article, its author, and its source publication. Sites like DayPop and BlogDex do that sort of thing now.
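Here is a rough sketch of the part of the crawler that reads a page once it has been fetched. Nothing above says what the metadata actually looks like, so the `<meta name="newspurl" ...>` tag is an assumption made up for illustration, as are the `author` and `date` tag names and the `extract_story_record` function. Only Python's standard library is used.

```python
from html.parser import HTMLParser

# The Newspurl site is an idea, not a real service.
NEWSPURL_PREFIX = "http://newspurl.org/"


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from the <meta> tags of one page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        if name and content:
            self.meta[name.lower()] = content.strip()


def extract_story_record(page_url, html):
    """Return one master-list entry for a page, or None if it is untagged.

    Assumes a hypothetical <meta name="newspurl" content="..."> tag.
    """
    collector = MetaTagCollector()
    collector.feed(html)
    topic = collector.meta.get("newspurl")
    if not topic or not topic.startswith(NEWSPURL_PREFIX):
        return None
    return {
        "topic": topic,
        "url": page_url,
        # Assumed tag names; a real page might not carry either one.
        "date": collector.meta.get("date"),
        "author": collector.meta.get("author"),
    }
```

A page with no Newspurl tag simply yields `None`, so the crawler can skip it and move on. Fetching pages, politeness rules, and guessing a date when no tag is present are all left out.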
So now, when you want to follow a story, you go to Newspurl.org and look it up in their ever-growing database of stories. And there it is, organized by date. Or even better, as you read a story on some news site, your own browser picks up the metadata inside the page and puts it on a list of the stories you are following. Perhaps there's even a discussion forum of some sort there on Newspurl.
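The "organized by date" part could be as small as the sketch below. It assumes each crawled article is a dictionary with `topic`, `url`, and `date` keys, and that dates are ISO strings like `2002-08-26` so that plain string sorting puts them in order. Both the record shape and the `build_index` name are invented for illustration.

```python
from collections import defaultdict


def build_index(records):
    """Group article records by Newspurl topic, oldest article first.

    Assumes ISO-format date strings; undated articles go to the end.
    """
    index = defaultdict(list)
    for record in records:
        index[record["topic"]].append(record)
    for articles in index.values():
        articles.sort(key=lambda r: (r.get("date") is None, r.get("date") or ""))
    return dict(index)
```

Looking up a story is then a dictionary lookup on its Newspurl, and the same structure would serve a browser keeping a personal list of followed stories.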
Could it work? Technically, sure, no problem, if people started using Newspurl URIs encoded as metadata in their Web pages and someone set up the Newspurl site. There's no huge barrier to it, and it wouldn't take a big staff. Right now, Yahoo! does it with their news stories, but it's done by hand. With this approach, much of the finding would be automated. The work would be in making sure that people weren't creating duplicate Newspurls, or spamming the system. A simple “flag for review” system would kill most of that, though.
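One cheap way to feed that "flag for review" queue is to compare the words in a proposed Newspurl against the ones already assigned. The sketch below is only one possible heuristic, not a description of any existing system: it scores word overlap (Jaccard similarity) between slugs, and the function names and the 0.6 threshold are arbitrary choices for illustration.

```python
def slug_words(slug):
    """Split a slug like 'pakistan-india-nuclear' into a set of words."""
    return frozenset(word for word in slug.lower().split("-") if word)


def likely_duplicates(new_slug, existing_slugs, threshold=0.6):
    """Return (slug, score) pairs that a human reviewer should look at.

    The score is shared words divided by total distinct words.
    The threshold is a guess, not a tuned value.
    """
    new_words = slug_words(new_slug)
    flagged = []
    for slug in existing_slugs:
        old_words = slug_words(slug)
        union = new_words | old_words
        if not union:
            continue
        score = len(new_words & old_words) / len(union)
        if score >= threshold:
            flagged.append((slug, score))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

With this, a proposed `india-pakistan-nuclear` would be flagged against the existing `pakistan-india-nuclear`, since the two contain the same words in a different order. It would miss synonyms and would do nothing about spam, which is where the human reviewers come in.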
And of course you'd have to convince news organizations to post metadata about their pages. So, what's in it for them? Increased traffic, for the cost of a tiny piece of bandwidth and a few extra minutes of an editor's time per story. Now, in truth, I doubt the New York Times would get too excited about the project. They're the big kid, and the traffic from Newspurls will be small, especially for the first few years. That's okay - volunteers could probably be called on to sort the Times online content into Newspurls if needed.
So the Times might be a hard sell. But for the smaller players, the concept would work in their favor. Let's pretend there's an Independent Asian Economic Policy Institute that publishes an online newsletter. In 2010, when China is threatening to drop nuclear weapons on Taiwan, IAEPI's monthly audience goes from 1,200 professors to 150,000 nervous folks wondering what happens when the world's most prolific semiconductor plants vaporize. Because they snapped to and published content with a hook into http://newspurl.org/china-taiwan-nuclear, IAEPI is suddenly listed right below the Washington Post as a source.
That's all I have to say on the matter. Since I have thousands of Semantic Web ideas, I'm going to try to keep a record of them, try to sort out my thoughts where people can yell at me.