This article takes a brief look at basic web findability guidelines, how federal guidelines in the United States address them, and how the FCC, an agency that commonly deals with Internet issues, fails some basic tests of those guidelines.
Introduction
As an information architect, I keep in mind a common list of guidelines for making public documents available on the web so that users can easily find them:
- Publish documents in commonly recognized formats (HTML, PDF, etc.) with descriptive metadata (title, author, publication date, keywords, etc.)
- Use metadata descriptive languages like RDF/DublinCore or SiteMaps to represent the whole collection of documents.
- Enable natural hyperlinking among documents, so that a document points to the document it is responding to and, in turn, that document links to all the documents that respond to it.
- Enable browsing and searching, preferably with a search engine that allows for faceted search.
- Make documents accessible to public search engines, such as Google (i.e., use the Robots Exclusion standard judiciously).
Granted, the first and fifth items suffice for most publishers; as search engine maven Danny Sullivan told USA Today in 2003, “It’s gotten to the point where people think if it’s not in Google, it doesn’t exist.” Google prefers that documents be organized in sitemaps, but of course it continues to crawl unstructured documents.
With those in mind, we are interested in how well public agencies follow these common guidelines.
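To make the sitemap idea from the second guideline concrete: a sitemap is an XML listing of the URLs in a collection. The following is a minimal Python sketch, not a definitive implementation. The `build_sitemap` helper and the example.gov URLs and dates are hypothetical, and only the `loc` and `lastmod` elements of the sitemaps.org protocol are shown.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML for an iterable of (url, last_modified) pairs.

    build_sitemap is a hypothetical helper, for illustration only.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical documents; example.gov is a placeholder host.
sitemap_xml = build_sitemap([
    ("http://example.gov/docs/report.html", "2008-01-10"),
    ("http://example.gov/docs/report.pdf", "2008-01-10"),
])
print(sitemap_xml)
```

A publisher would typically serve the resulting file from the site root so that crawlers can discover the whole collection in one request.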
Federal Guidelines
The E-Government Act of 2002 provided some basic standards for public access to electronic information and the design of agency websites; it also established the Interagency Committee on Government Information to develop such guidelines. The ICGI recommended the formation of a Web Content Managers Advisory Council, which maintains the guidelines at USA.gov/WebContent. Here is how the current federal guidelines map to the suggestions above.
- The Guidelines call for the use of appropriate file formats: they prefer HTML, XML, and then PDF over proprietary formats that require licensing of software. (“Presenting documents in open, industry standard formats allows every person with a browser to read the documents.”)
- The Guidelines suggest DublinCore for metadata (“It is true that commercial searches like Google and Yahoo no longer use metadata since many web developers were trying to fool search engines with meta tags unrelated to their content and services. However, this could change in the future as search engines explore different ways of categorizing and cataloging search results.”), and separately, they advocate sitemaps, which are used by the public search engines.
- The Guidelines’ recommendations on linking go only as far as suggesting that agencies provide a linking policy, which governs how links to external sites are employed. They do not explicitly call for documents to be “naturally” linked to each other as appropriate. Such connective links are sometimes managed by CMS software; the Advisory Council is still developing CMS guidelines.
- The Guidelines require that each agency website provide a search function on the prominent points of entry (“By December 31, 2005, this search function should, to the extent practicable and necessary to achieve intended purposes, permit searching of all files intended for public use on the website, display search results in order of relevancy to search criteria, and provide response times appropriately equivalent to industry best practices.”) Additionally, the Guidelines suggest best practices for implementing search.
- The Guidelines call for accommodation of public search engines. They cite OMB M-06-06: “when disseminating information to the public-at-large, publish your information directly to the Internet. This procedure exposes information to freely available and other search functions and adequately organizes and categorizes your information.” The Guidelines further spell out: “If you are disallowing search engine crawlers, you are not exposing information to search engines, and therefore not complying with this guidance.”
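As an illustration of the DublinCore recommendation in the second item above, descriptive metadata can be embedded in an HTML page's head as `meta` elements with a `DC.` prefix. The sketch below assumes that common convention; the `dublin_core_meta` helper is hypothetical, and the field values are illustrative ones drawn from the document discussed later in this article, not markup the FCC actually publishes.

```python
from html import escape

# Namespace URI for the Dublin Core element set.
DC_SCHEMA = "http://purl.org/dc/elements/1.1/"


def dublin_core_meta(fields):
    """Render a dict of Dublin Core elements as HTML head tags.

    dublin_core_meta is a hypothetical helper, for illustration only.
    """
    lines = ['<link rel="schema.DC" href="%s">' % DC_SCHEMA]
    for element, value in fields.items():
        # Escape the value so quotes and ampersands stay valid in HTML.
        content = escape(value, quote=True)
        lines.append('<meta name="DC.%s" content="%s">' % (element, content))
    return "\n".join(lines)


# Illustrative values only.
tags = dublin_core_meta({
    "title": "Appropriate Framework for Broadband Access "
             "to the Internet over Wireline Facilities",
    "creator": "Federal Communications Commission",
    "date": "2005",
    "identifier": "FCC-05-150",
})
print(tags)
```

Even if the large search engines ignore these tags, an agency's own search function or catalog could index them.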
Searching for an FCC Document
Recently we came across a PDF document hosted on the PublicKnowledge.org website: a public comment from NBC Universal responding to the FCC’s request for comment on Broadband Industry Practices. The footnotes include a reference to an earlier document, “Appropriate Framework for Broadband Access to the Internet over Wireline Facilities,” published in 2005. We would like to find that document.
A search on the FCC website brings up no obvious match. The first result is an index page of Commissioner Michael Copps Statements 2002. It does in fact link to his statement on the 2002 version of the Appropriate Framework document, but that page does not link directly to the document he is commenting upon. (Following the subsequent hits doesn’t get us any closer to the document. A search on the FCC EDOCS form brings up nothing.)
A Google search for the document title brings up as its first hit one of the many public responses to it; the second hit, from Cybertelecom, is what we are looking for. It provides the press release for the Appropriate Framework document and numerous related documents, along with a hyperlink to the document itself: Appropriate Framework for Broadband Access to the Internet over Wireline Facilities.
Note that the URL is on the server hraunfoss.fcc.gov. What is hraunfoss? Further research shows that another popular document server under fcc.gov is named fjallfoss. A Google search on these two terms quickly points us to a common theme: they are Icelandic waterfalls. This is not so unusual; many Internet servers have employed “pet” names beyond the generic www (the original site of Yahoo! was akebono.stanford.edu, named after the Hawaiian sumo wrestler). What is striking is that both servers, through their respective robots.txt files, appear to exclude search engines! So no documents appear in a common search:
http://hraunfoss.fcc.gov/robots.txt | http://www.google.com/search?q=site:hraunfoss.fcc.gov |
http://fjallfoss.fcc.gov/robots.txt | http://www.google.com/search?q=site:fjallfoss.fcc.gov |
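To show how such an exclusion takes effect, here is a small Python sketch using the standard library's `urllib.robotparser`. The actual contents of the two FCC robots.txt files are not reproduced here; the sketch assumes a blanket disallow rule of the kind that would explain the empty search results, and the document path is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_lines, url, agent="Googlebot"):
    """Report whether the given robots.txt lines let `agent` fetch `url`.

    is_crawlable is a hypothetical helper, for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# Assumed rules: a blanket exclusion, and by contrast an open policy.
BLOCK_ALL = ["User-agent: *", "Disallow: /"]
ALLOW_ALL = ["User-agent: *", "Disallow:"]

# Hypothetical path; only the host name comes from the article.
doc_url = "http://hraunfoss.fcc.gov/some/document.pdf"

blocked = is_crawlable(BLOCK_ALL, doc_url)
allowed = is_crawlable(ALLOW_ALL, doc_url)
print(blocked, allowed)
```

Under the first rule set a well-behaved crawler skips every document on the server, which is the situation the OMB guidance quoted above warns against.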
[A later search for the document on the FCC server brings up as the third link the EDOCS index page for this document, FCC-05-150.]
Organizing FCC Documents
On the larger point, the FCC has asked for comment regarding Broadband Industry Practices. In theory, the FCC has organized the responses in some cohesive manner, and if it has, there is no reason why the general public couldn’t view them on the FCC website in the same manner. There should be a definitive FCC page on this topic (there appears to be none), it should list all of the direct responses, and each secondary response to those primary responses should be linked. There is nothing of the sort. The general public has one simple remedy: use a public search engine. Thus the curious citizen, otherwise uninitiated in the labyrinthine nature of federal documents, has to rely on the very sort of vendors that the FCC could one day be regulating.
There’s a further irony here. We have found Cybertelecom.org to be a very good website for organizing public information surrounding federal Internet law & policy. It is run as a volunteer project by Robert Cannon, who is also the Senior Counsel for Internet Law in the FCC’s Office of Strategic Planning and Policy Analysis. Thus we have an FCC employee (and other public volunteers) managing information in perhaps a better way than the FCC’s own website does.
Such is the beauty of the Internet, of course. The people best able to organize information around a topic may not be the same people officially responsible for doing so. But if the federal guidelines are to be followed, it’s the official websites which ought to be setting the bar for volunteer efforts to aim for — and not the other way around.
[UPDATE, Jan 14th: Cannon wrote me back, pointing out some of the particulars about FCC documents. I don’t doubt that any intelligent researcher can figure it out. It’s just that it’s not as intuitively linked as the Library of Congress THOMAS project for federal legislation.]