Architecture: How is your content structured?

This piece proposes a series of questions to ask when analyzing online content. The questions are meant to encourage the development of standards and software for qualifying online content so that it can be analyzed automatically. I am tempted to call this discipline media architecture, and I will check with others in the field about whether the term is appropriate.


Most media studies rely on indirect observations, such as what people write in their “Nielsen Diaries” about when they watch a certain television program. These records must themselves be processed through some form of semantic analysis, an academic term for interpreting text or speech on the basis of what is being said. Such studies are time-intensive, expensive, and subject to the researchers’ own bias.

Today, technology allows more direct, systematic measurement. To continue the television example, the TiVo network knows exactly what its viewers are watching, and whether they are watching intently enough to hit the pause and rewind buttons.

We are now at a crossroads in media architecture. Thanks to the popularization of blogs and the RSS syndication standard, online content is now largely made available through publishing software. This software makes it easy for authors and publishers to qualify their articles and posts. There is no need to limit the analysis to blogs, popular as they may be, since most online textual content is suitable for analysis, whether it comes from newspapers, magazines, or database-backed knowledge bases. (See the comparative studies index for more context.)

Consider Harold Lasswell’s classic formulation of communications theory: Who says What to Whom with What Effect? We’ve taken the first half of this statement to consider criteria for measuring the construction of content, and the second half to consider criteria for measuring the reception of that content by the audience.


To my knowledge, there has been very little research into how online content is constructed. In the Online Political Writers Scorecard, I was able to use a computed measure of how frequently each site posted, but had to resort to detailed observation to figure out the rest.

  • FREQUENCY. How frequently does the author post?
  • LENGTH. What’s the average number of words in a post/article?
  • SOURCES. How many sources does the piece cite? It is easy to account for hyperlinks, but we would need authors to add a special tag to represent offline sources as well as direct sources (interviews).
  • SOURCE RELIANCE. How much of a piece is quoted from other sources, and how much is original? I set off any quote longer than two lines in a <p class="quote">. Some bloggers use similar conventions, but there is no standard for word processing (or HTML editing) software to implement.
  • AGENDA. What is the agenda of the author? The leading newspaper columnists are rated according to their apparent partisan agendas on the Lying In Ponds website. This is due to Ken Waight’s incomparable semantic analysis of these columns. That approach may not scale to the more numerous and more frequent blog posts. Nonetheless, if bloggers are famously open about having a bias, why shouldn’t they qualify each post based on the agenda they are promoting?
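
As a rough illustration of how the first four measures could be computed automatically, here is a minimal Python sketch. It assumes each post is available as an HTML fragment with a publication date, that sources appear as ordinary hyperlinks, and that long quotes are marked with the <p class="quote"> convention described above. The names Post, PostScanner, and construction_metrics are hypothetical, and offline sources and agenda are left out because they would need tags that do not yet exist.

```python
from dataclasses import dataclass
from datetime import date
from html.parser import HTMLParser


@dataclass
class Post:
    """Hypothetical record for one post: its date and its HTML body."""
    published: date
    html: str


class PostScanner(HTMLParser):
    """Counts words, hyperlinks, and words inside <p class="quote">."""

    def __init__(self):
        super().__init__()
        self.words = 0
        self.quoted_words = 0
        self.links = 0
        self._in_quote = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links += 1  # SOURCES: only hyperlinks are visible here
        if tag == "p" and attrs.get("class") == "quote":
            self._in_quote = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_quote = False

    def handle_data(self, data):
        count = len(data.split())
        self.words += count
        if self._in_quote:
            self.quoted_words += count


def construction_metrics(posts):
    """Return frequency, length, sources, and source reliance for a site."""
    if not posts:
        raise ValueError("need at least one post")
    total_words = total_quoted = total_links = 0
    for post in posts:
        scanner = PostScanner()
        scanner.feed(post.html)
        scanner.close()
        total_words += scanner.words
        total_quoted += scanner.quoted_words
        total_links += scanner.links
    dates = [post.published for post in posts]
    span_days = (max(dates) - min(dates)).days + 1  # inclusive span
    return {
        "posts_per_week": 7 * len(posts) / span_days,                      # FREQUENCY
        "avg_words": total_words / len(posts),                             # LENGTH
        "avg_links": total_links / len(posts),                             # SOURCES
        "quoted_share": total_quoted / total_words if total_words else 0.0,  # SOURCE RELIANCE
    }


if __name__ == "__main__":
    sample_posts = [
        Post(date(2005, 3, 1), '<p>Read <a href="http://example.com/a">this report</a> today.</p>'),
        Post(date(2005, 3, 7), '<p>My take.</p><p class="quote">Four quoted words here.</p>'),
    ]
    print(construction_metrics(sample_posts))
```

The quoted share is only as reliable as the markup: a blogger who quotes inline, without the class, would look entirely original to this script, which is exactly why a shared convention matters.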

This is a start, but the data is likely to be noisy until online writers begin qualifying their posts as different types of content. Periodicals have long done this by establishing departments. The Civ structure requires it, and specifies different story types: pointers, thoughts, questions, first person accounts, proposals, reviews, analyses, definitions. The first two types most resemble what is commonly thought of as the “blog post” (high frequency, short length), while the later types move toward styles closer to journalism and research (longer, more sources, and so on).


Most of the research on reception has focused on how many hits a site gets and how many hyperlinks point to that site. Both are collected by the Truth Laid Bear website, whose data has been used in academic papers such as Drezner and Farrell’s The Power and Politics of Blogs (July 2004). That paper also conducted surveys to study readership, as did the Pew Research Center’s The State of Blogging report (January 2005). More recently, I analyzed data from Technorati to evaluate the relative popularity of some of the thought leaders in social media, and argued that we need to introduce more rigorous metrics. These could be some systematic measures:

  • READERSHIP. What exactly are influential people reading? Bloglines is in an excellent position to track this. It records not only how many people subscribe to a particular “channel”, but can also track how much subscribers actually read.
  • FEEDBACK. What exactly are people saying in response? Researchers have long depended on semantic analysis to determine what is being said in Internet forums. But this is unnecessary if users have the means to qualify their responses, which the Civ structure provides through ViewPoints. Using ViewPoints (see the bottom of this page), users can rate the main article on diverse criteria such as the viability of the ideas, the validity of the facts, the quality of the argument, and the appropriateness of it all.
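
Once responses are qualified this way, summarizing them becomes simple arithmetic rather than semantic analysis. The sketch below is only an illustration: the four criterion names come from the list above, but the dictionary layout, the numeric scale, and the function name summarize_ratings are my assumptions, not a description of how ViewPoints actually stores its data.

```python
from collections import defaultdict
from statistics import mean

# Criteria named in the text; the data layout and numeric scale are assumptions.
CRITERIA = ("viability", "validity", "quality", "appropriateness")


def summarize_ratings(responses):
    """Average each criterion over the responses that rated it.

    Each response is a dict mapping a criterion name to a numeric score.
    Readers may skip criteria, so unrated criteria are simply omitted.
    """
    scores = defaultdict(list)
    for response in responses:
        for criterion, score in response.items():
            if criterion not in CRITERIA:
                raise ValueError(f"unknown criterion: {criterion}")
            scores[criterion].append(score)
    return {c: mean(scores[c]) for c in CRITERIA if scores[c]}


if __name__ == "__main__":
    sample_responses = [
        {"viability": 4, "validity": 5},
        {"viability": 2, "quality": 3},
    ]
    print(summarize_ratings(sample_responses))
```

A plain average is the simplest possible summary; a real study would probably also want the number of raters per criterion, since one reader's score should not carry the same weight as fifty.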


As people consider the continued evolution of the technologies and standards of online publishing, they should be mindful of how this can benefit analytical research. Similarly, the readers and producers of online content should applaud the democratization of media research. Instead of being reserved for research institutes that take a long time to analyze only a select group of sites, the necessary analysis can be run by anybody on an as-needed basis. The only price to pay is for users to embrace the tools for qualifying content, and the practice of using them as well.