Intellipedia Oversight

I had some questions for the Intellipedia project, the wiki-based, open-editable encyclopedia for use in the U.S. intelligence community.

I’m a bit late in responding to Clive Thompson’s article Open Source Spying in the New York Times Magazine from December. Naturally the subtitle “Could blogs and wikis prevent the next 9/11?” caught my eye, given my work in teasing out the claims of technology boosters who say they have solved the larger problems of information retrieval. Case in point? It’s not just the ability to find information; I was able to find the article by giving Google the search terms Thompson Times Magazine. I also needed to evaluate the quality of that article, and whether any information was out of date. That problem is hard (i.e., not yet automated). The blogs, by themselves, don’t do anything at all to solve it. The favored blog search engine, Technorati, lists 341 blog posts linking back. How do I find the needle in that haystack? (Wikipedia was at least helpful by suggesting related sources for Intellipedia.) I discussed the general problem of a blog community as an echo chamber at length in the New Gatekeepers series of two years ago.

Nonetheless, Thompson ran with the blogs-and-wiki ball. Some of the folks he spoke to had pitched it upwards in the CIA, so that drove the narrative. “You can almost imagine how 9/11 might have played out differently.” Thompson casually mentioned the Phoenix Memo, the tracking of Pentagon hijackers Nawaf al-Hazmi and Khalid Al-Mihdhar, and imagined various blogging going on to make sense of it all. Would that it were so easy. The Office of Inspector General in the Department of Justice produced a 371-page report in November 2004 (released, redacted, in June 2005) that walks step by step through how the FBI had handled this information within the counter-terrorism intelligence community. The conclusions faulted operating procedures at the time, as well as information management problems. Most of the conjectures about the magic assistance of blogs presume that either (a) employees were willing to bypass the legal “wall” between the Agency and the Bureau or (b) the Bureau had agents with spare time to do analysis unrelated to their jobs.

That said, it doesn’t cost anything to try. Hence, Intellipedia.

Thompson’s article brought out some top supporters of the Intellipedia project. Ironically, some of the early praise for it matches the praise given to its now-maligned predecessor, Intelink. In 1998, after Frederick Thomas Martin, a former NSA official, published the book Top Secret Intranet, a little sunlight came in on the service. The Washington Post reported in 1998: “Now, with Intelink, documents are posted instantaneously. Also, analysts at different agencies are starting to produce intelligence reports collaboratively over the network.” Sound familiar?

Back in the 1990s AltaVista was the cutting-edge search engine, but it was surpassed by a company that’s a little more familiar now. In fact, a couple of months before the Times article, Government Computer News reported that the Director of National Intelligence had installed the Google appliance on Intelink back in 2003 (calling it “Oogle” for some reason). Also, there are several search technologies being funded by In-Q-Tel, the CIA’s venture capital fund. I saw a demo last year from Attensity, which, unlike Google, does natural-language searches. I was surprised Thompson’s article didn’t mention these.

Still, Intellipedia intrigues me, enough to put together a list of questions, for a couple of reasons. First, I figure that in a few years the GAO or a Congressional oversight committee will have a look at the intelligence community’s knowledge systems, and they are going to be less interested in the warm-and-fuzzies of the wiki culture and more interested in quantitative metrics.

Second, these sorts of questions might be useful for anyone considering an enterprise knowledge management system where a wiki plays a part. A wiki is just a Content Management System (CMS) that does a few things really well, but cuts out a lot of advanced features that eventually become necessary.

Granted, this is something that somebody at Google is more concerned about than I am (in fact, they have a job opening for a product marketing manager for just that). I’m just a private citizen.

I don’t need the answers right now, but I’ll email the questions to Calvin Andrus at the CIA, and also to Congressman Tierney’s office, asking him to forward them to the House Select Committee on Intelligence. None of these questions come close to betraying operational intelligence. They are all questions about system aspects. And maybe I’m barely scratching the surface here.

  1. There are 28,000 articles. (Intelink had 440,000 as early as 1998.) How many are people, places, or things? Of the people, how many are U.S. citizens, and how many are foreign nationals? I understand that by statute the FBI can have files on American citizens, but the CIA cannot; so would citizens not be listed?

  2. How many of the articles define external things (what one expects in an encyclopedia), and how many articles are about native information (meeting minutes, etc.)? Is any native information entered here first, or is it entered into departmental servers?

  3. The report mentions 3,600 contributors. What fraction of the intelligence community does this represent? (Some of the articles from 1998 suggested 50,000 or 300,000.) How many are contributors, and how many are editors?

  4. What percent of edits are factual, grammatical, or analytical?

  5. The articles cite Intellipedia being updated 80 times in reaction to the East River plane crash of Cory Lidle. Was this the best way to disseminate live news? Does it accommodate mobile users as well?

  6. How many documents do analysts review in a week? Does Intellipedia add to the glut or reduce it?

  7. Has the emailing of document attachments been reduced?

  8. Are there plans to retire legacy document management systems?

  9. One can measure the success of a system by counting how many times questions are answered by a search, and how many times they are not. Has that been done?

  10. The industry-leading content management systems, such as Stellent and Vignette, include features to help audit who accesses what. That’s handy from an auditing point of view, to prevent abuses. Is this built into Intellipedia?

  11. Wikipedia tends not to practice “latching” versions (stamping a version with an ID so that it can be referred to directly), though perhaps other wiki setups do. Has Intellipedia done this? For example, did the NIE on Nigeria get a 2006-Final stamp?

  12. The FBI response to the OIG report promised that “the FBI is applying XML data standards and metadata tagging to facilitate the exchange of information with the intelligence community.” (Granted, the same document from 2004 promised the deployment of the Virtual Case File system, which was subsequently canned.)

  13. Simple bookmark sharing/tagging systems are much more efficient at organizing outside content than blogging or wikis. Has the DNI deployed that?

  14. Wikipedia does not integrate with shared bookmarking systems. That may be Jimbo Wales’s prerogative, though it would be highly useful in a heterogeneous environment. Does Intellipedia do that?

  15. It would seem straightforward (though not trivial) to have the Secret version of an article pull up the Unclassified version of an article, and similarly, have the Top Secret/SCI level pull up the other two. Is that done/planned? Would this layered access architecture have use outside the intelligence community?

  16. Suppose an analyst searches for information which exists in a document for which he doesn’t have clearance. Is he informed that the document exists? Is there some special request to be made?

  17. Reading various documents released from the intelligence community, it appears that information is redacted at the release time. Are documents ever marked up at the creation time to indicate various security clearances? That would make the release of documents to lower security clearances automatic.

  18. Wikipedia doesn’t really manage documents containing questions to be asked about the topic (such as what you’re reading). Generally questions are strewn throughout the Talk page. Ideally one would design a “task list” attached to an article, of information to find/confirm, or otherwise the necessary caveats would be part of the prose.
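
To make question 15 a little more concrete, here is a minimal sketch in Python of what a layered view might look like: a reader cleared at a given level sees that level’s version of an article plus every version below it. This is purely illustrative. The names (Level, Article, view) and the three-tier ordering are my own assumptions, not anything known about how Intellipedia is actually built.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical classification tiers, ordered lowest to highest."""
    UNCLASSIFIED = 0
    SECRET = 1
    TOP_SECRET_SCI = 2


@dataclass
class Article:
    title: str
    # One body of text per classification level; a level may be absent.
    versions: dict = field(default_factory=dict)

    def view(self, clearance: Level) -> list:
        """Return (level, text) pairs the reader may see, highest level first."""
        visible = [
            (level, text)
            for level, text in self.versions.items()
            if level <= clearance
        ]
        return sorted(visible, key=lambda pair: pair[0], reverse=True)


# Made-up example data, for illustration only.
article = Article(
    title="Example topic",
    versions={
        Level.UNCLASSIFIED: "Public summary.",
        Level.SECRET: "Secret-level detail.",
        Level.TOP_SECRET_SCI: "Compartmented detail.",
    },
)

# A Secret-cleared reader gets the Secret and Unclassified versions only.
for level, text in article.view(Level.SECRET):
    print(level.name, "->", text)
```

The filtering itself is the easy part. The hard parts, such as keeping the three versions in step as each is edited and deciding who may move text between levels, are exactly what the question is asking about.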