Why is there no universal video news search?

A few sites summarize and search print and online news stories; Google News and Topix are the obvious leaders. But there's nothing comparable for television video, let alone audio.

The Vanderbilt Television News Archive is the best model. It not only offers search but also lets users browse any month of news coverage since August 1968.

There's one catch: it only indexes what it has archived, and it only archives what it can index. In assessing the state of television archiving in 2005, Jeff Ubois learned that VTNA takes 5 human hours to catalog one hour of news. Because cataloging is so labor-intensive, the collection is limited to the three network broadcasts, plus CNN since 1995 and Fox News since 2004. It does not have MSNBC, C-SPAN, PBS, the newsmagazines or local TV stations, though it does widen its trawl during special events such as wars and presidential conventions and campaigns.

The Moving Image Collections (MIC) program was launched by the Library of Congress in 2002 to unify the nation's film and video archives. It was developed by Rutgers University, Georgia Tech and the University of Washington. Its core project is the MIC Union Catalog, designed as a gateway to other online catalogs. Still, the catalog aggregates only 14 collections, and only one of them comes from a commercial news organization: CNN. A search for Dick Cheney brings up a paltry 10 items.

So we're struck by two obvious questions. Why isn't more video news data catalogued in the MIC format for the Union Catalog to use? And isn't there indexed information at the source that can be leveraged?

Granted, the major news aggregators, Google News and Topix, do not leverage source indexing at all; they use their own AI algorithms to cluster news items and power search. The same approach may prove useful for video (and audio). EveryZing, a spinoff of BBN (famous for its pioneering work in both acoustics and computer networks), has been applying speech recognition technology to Internet audio and video. But this corpus currently lacks most broadcast news, unless an Internet viewer somewhere has ripped it onto a video-sharing service.


Update, August 28th: I was curious enough to dig further, and I wrote a 2,200-word article for PBS MediaShift, "The Tangled State of Archived News Footage Online."