
The Library of Google

>> http://azeem.azhar.co.uk/archives/000261.php
>> http://blog.mediacooperative.com/mt-comments.cgi?entry_id=3379

Google's been very friendly towards the blog community, as evidenced most recently by their plans (stated at the Supernova conference) to provide a SOAP interface for requesting that specific pages be re-spidered.

I'd like to see them extend DMOZ to accept trackback pings for articles. The blogging community would promptly have to extend the DMOZ categories, I suspect. What are the chances of FOAP being listed, for instance? With the ability to ping multiple categories, you could perhaps escape at least some of the shoehorn effect of forcing each post into a single category.
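The ping side of this is already well specified. Under the TrackBack specification, a ping is an HTTP POST with a form-encoded body (`url` is required; `title`, `excerpt` and `blog_name` are optional), and the receiver answers with a small XML document whose `<error>` element is `0` on success. The sketch below is a minimal illustration that assumes hypothetical per-category DMOZ ping URLs; the `dmoz.example` endpoints and all helper names are invented. It only builds the request bodies and parses a reply, so it runs without touching the network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical per-category TrackBack endpoints, invented for illustration.
CATEGORY_PING_URLS = {
    "Science/Social_Sciences": "http://dmoz.example/tb/Science/Social_Sciences",
    "Computers/Internet/Weblogs": "http://dmoz.example/tb/Computers/Internet/Weblogs",
}


def build_ping_body(url, title="", excerpt="", blog_name=""):
    """Form-encoded TrackBack body; POST it as application/x-www-form-urlencoded."""
    fields = {"url": url, "title": title, "excerpt": excerpt, "blog_name": blog_name}
    # Leave out optional fields that were not supplied.
    return urlencode({k: v for k, v in fields.items() if v})


def plan_pings(categories, **fields):
    """One (endpoint, body) pair per category, so a post isn't shoehorned into one."""
    body = build_ping_body(**fields)
    return [(CATEGORY_PING_URLS[c], body) for c in categories]


def ping_succeeded(response_xml):
    """TrackBack replies look like <response><error>0</error></response>; 0 is success."""
    root = ET.fromstring(response_xml)
    return (root.findtext("error") or "1").strip() == "0"


if __name__ == "__main__":
    for endpoint, body in plan_pings(
        list(CATEGORY_PING_URLS),
        url="http://example.org/archives/library-of-google.html",
        title="The Library of Google",
        blog_name="Example weblog",
    ):
        print(endpoint, body)
```

The same body goes to every category endpoint; that repetition is what would let one post sit in several places in the directory at once. Actually delivering each body is a single POST per endpoint.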

There's another side effect as well, one that Google would likely benefit from. The DMOZ-categorized trackback pings would form the equivalent of a dynamic, online scientific journal. You could publish the top fifty most popular links in a newsletter -- electronic or paper -- each month, while retaining all indexed content on the web.
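Picking the "top fifty" needs a working definition of popularity. Assuming it means the number of categorized pings that cite a link in a given month (an assumption; other measures would do), the monthly list is just a per-month tally. The `(date, url)` input shape and the `monthly_top_links` name are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def monthly_top_links(pings, n=50):
    """Tally pings per (year, month) and keep the n most-cited links in each month.

    pings: iterable of (date_received, linked_url) pairs (hypothetical input shape).
    """
    per_month = defaultdict(Counter)
    for received, url in pings:
        per_month[(received.year, received.month)][url] += 1
    return {month: tally.most_common(n) for month, tally in per_month.items()}


if __name__ == "__main__":
    sample = [
        (date(2002, 11, 3), "http://example.org/a"),
        (date(2002, 11, 9), "http://example.org/b"),
        (date(2002, 11, 20), "http://example.org/a"),
        (date(2002, 12, 1), "http://example.org/b"),
    ]
    for month, links in sorted(monthly_top_links(sample).items()):
        print(month, links)
```

Because every ping stays indexed, the monthly list is a view over the archive rather than a replacement for it.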

It's like crossing Daypop, Science News, and Trackback.

The common taxonomy problem is solved here as well; I find the DMOZ categories quite usable, and I observed that they're peer-modifiable -- which means we can add all the categories we need to integrate the blogging/smartmob community into Google's database.

I would die for the ability to do topical searches within the available blog posts; I've found with NNW (NetNewsWire) that I can't locate old articles, and I'd rather see that solved by Google (masters of the search engine) than by my client program.

The educational community would suddenly find that they have an incredibly useful resource: the ability to easily research a hundred thousand peer-reviewed articles, indexed by a sane categorization system.

Then DMOZ categories become the call numbers of the net.

Comments

"The common taxonomy problem is solved here as well; I find the DMOZ categories quite usable, and I observed that they're peer-modifiable"

This breaks. A peer-modifiable taxonomy cannot be a common one, because of the passage of time. There will always be breakage when a new category is created: anything categorised before the new category existed that might have gone into it will be wrongly indexed, sometimes radically.

A periodic review of an article's categorization within the taxonomy would address this issue, I suppose. Go back and check, from time to time, to see if an article's categorizations are still accurate.

You could apply the idea behind the Distributed Proofreaders project, which has shown that people, given a very small amount of easy work to do, will readily donate a minute of their time to you.

Tag each article's categorization with the date of its latest review. The longer an article goes without being reviewed, the more important it becomes to review its categorization.
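One way to sketch the review-date idea, assuming each article record carries its categories and a `last_reviewed` stamp (the `Article` class and the helper names are hypothetical), is to rank articles purely by days since their last review:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    url: str
    categories: list
    last_reviewed: date  # stamped each time someone reviews the categorization


def days_unreviewed(article, today):
    """Age of the categorization, in days since its latest review."""
    return (today - article.last_reviewed).days


def review_order(articles, today):
    """Most overdue first: the longer since the last review, the higher the priority."""
    return sorted(articles, key=lambda a: days_unreviewed(a, today), reverse=True)


def record_review(article, new_categories, today):
    """A reviewer confirms or changes the categories, and the date is re-stamped."""
    article.categories = list(new_categories)
    article.last_reviewed = today


if __name__ == "__main__":
    today = date(2002, 12, 1)
    queue = [
        Article("http://example.org/b", ["Weblogs"], date(2002, 6, 1)),
        Article("http://example.org/a", ["Science"], date(2002, 1, 1)),
    ]
    for art in review_order(queue, today):
        print(art.url, days_unreviewed(art, today))
```

Ranking linearly by age is the simplest reading of "the longer it goes, the more important"; a real system might cap or weight it differently.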

Now do as Distributed Proofreaders, distributed.net and SETI do: assign work packets to people, a page at a time. Weight the choices towards pages that have only recently gone out of date, if you like, but make sure a few of the long-overdue ones go out as well.
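The weighting could look like the sketch below. It assumes a fixed review interval after which a categorization counts as out of date, and a second threshold past which it counts as long-term out of date. Both thresholds, the 20% long-term share, and the function names are hypothetical, illustrative values, not taken from any of the projects named above.

```python
import random
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=180)  # hypothetical: out of date after this long
LONG_OVERDUE = timedelta(days=365)     # hypothetical: "long-term" once this far past due


def split_stale(last_reviewed, today):
    """Partition page ids into (recently out of date, long-term out of date)."""
    recent, long_term = [], []
    for page, reviewed in last_reviewed.items():
        overdue = today - (reviewed + REVIEW_INTERVAL)
        if overdue < timedelta(0):
            continue  # still within its review interval
        (long_term if overdue > LONG_OVERDUE else recent).append(page)
    return recent, long_term


def next_page(last_reviewed, today, long_term_share=0.2, rng=random):
    """Hand out one page: usually a recently stale one, sometimes a long-overdue one."""
    recent, long_term = split_stale(last_reviewed, today)
    if not recent and not long_term:
        return None  # nothing needs review
    use_long = bool(long_term) and (not recent or rng.random() < long_term_share)
    return rng.choice(long_term if use_long else recent)


if __name__ == "__main__":
    pages = {
        "fresh": date(2002, 12, 1),
        "recent": date(2002, 5, 1),
        "old": date(2000, 1, 1),
    }
    print(next_page(pages, date(2003, 1, 1)))
```

Keeping a small fixed share for long-overdue pages means they keep trickling out even when the recent backlog never empties.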

Give reviewers the option to return a page unreviewed, though; there are some things I won't be able to categorize effectively, and I'd rather skip them than be unsure.
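Skipping can be handled by putting the page back in the queue unreviewed and remembering who declined it, so the same volunteer isn't handed it again. A minimal sketch with a hypothetical `ReviewQueue` class:

```python
from collections import deque


class ReviewQueue:
    """Hypothetical work queue: pages go out one at a time; skipped pages go back."""

    def __init__(self, pages):
        self.pending = deque(pages)
        self.checked_out = {}  # page -> reviewer currently holding it
        self.skipped_by = {}   # page -> set of reviewers who declined it
        self.reviewed = {}     # page -> categories assigned on review

    def checkout(self, reviewer):
        """Give the reviewer the next page they have not already declined."""
        for _ in range(len(self.pending)):
            page = self.pending.popleft()
            if reviewer in self.skipped_by.get(page, set()):
                self.pending.append(page)  # this reviewer passed on it before
                continue
            self.checked_out[page] = reviewer
            return page
        return None

    def skip(self, page):
        """Return a page unreviewed; it goes to the back of the queue."""
        reviewer = self.checked_out.pop(page)
        self.skipped_by.setdefault(page, set()).add(reviewer)
        self.pending.append(page)

    def complete(self, page, categories):
        """Record the reviewer's categorization for the page."""
        self.checked_out.pop(page)
        self.reviewed[page] = list(categories)


if __name__ == "__main__":
    q = ReviewQueue(["page-1", "page-2"])
    first = q.checkout("alice")
    q.skip(first)
    print(q.checkout("alice"), q.checkout("bob"))
```

A skip costs nothing but a trip back through the queue, which keeps unsure reviewers from guessing.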

You can't have static categorization and dynamic categories; the categorizations will have to be dynamic, peer-reviewed and kept up-to-date with the categories, and that's a lot of effort.

I'd be a reviewer if there were an easy interface for it. The proofreading people got it right, at least.
