Genre Classifications for News Content

Newspapers have sections; magazines have departments; weblogs have neither. All of these publishing forms carry content of interest to readers, yet none use the same name to describe the essential nature of that content.

The newspaper section conveys the broad subject or topic of a given article. Yet it is the magazine department that best describes the article’s essential nature: its genre.

How essential is the genre? Let us provide some examples first: reporting, on-the-scene reporting, analysis, opinion pieces, editorials, satire, humor, box scores, wedding announcements, obituaries, advice columns, explainers, reviews, responses, Q&A, chat transcripts, interviews, press releases. The genre provides the frame of expectation for the reader.

Pioneering webzines like Slate and Salon carried the concept of the genre forward into the online era. Yet magazine departments often use idiosyncratic names, and newspapers inconsistently supply the genre for an article. Most human readers are able to guess the genre from the form. But a machine often can’t, and that matters because an increasing amount of Internet news content is being read, parsed, and repackaged by computer algorithms.

The Problems

The following specific problems arise when publishers do not specify the genre.

  1. The most common machine readers, news aggregators, must make guesses. Many of the leading aggregators use topical clustering to automatically classify news stories around a current news topic; Google and Topix figure out the location (geocode) with some degree of effectiveness. But none come anywhere close to guessing the genre. Thus news aggregators can show, at best, either a linear list or a “conversational tree” (e.g., Megite, BlogPulse Conversation Tracker). In addition, Google maintains its News search separately from its Blog search.
  2. Search engines provide the user with a summary of articles matching the user’s search criteria. Very few supply the genre in the results, let alone allow users to specify it in their search (see also Faceted Archives).
  3. Rating services like NewsTrust expect people to rate articles of different genres. NewsTrust lets the submitter pick the genre from a brief list, as it’s not supplied by the publisher. At a recent strategy session with NewsTrust, several people argued that the genre should affect how people rate the articles.

Technical Barriers

  1. RSS is simple — it was deliberately designed to be so.
  2. RDF metadata standards are diverse, but aggregators have not commonly adopted any one of them.
  3. CMS tools do not generally support a common metadata standard.
  4. Writers and editors don’t have the wherewithal to manually code standard metadata (folksonomy advocacy has partly chilled enthusiasm for controlled vocabularies).

Potential Interested Parties

(In italics are parties I have contacted.)

  • General News Aggregators: Google, Yahoo, Topix, BlogPulse
  • Niche News Aggregators: Daylife, Megite, Techmeme, Technorati, Feedster
  • News Search Brokers: Google, HighBeam, FindArticles, Dow Jones Factiva
  • Wire Services: AP, UPI, Reuters, AFP, NYT, Dow Jones
  • News industry organizations: International Press Telecommunications Council (IPTC), Newspaper Association of America (NAA)
  • CMS producers: Vignette, Drupal, MovableType, WordPress

Potential Standards

There are a number of potential standards in use: 

  • IPTC NewsCodes Genre – The IPTC standards are used by the newspaper industry, primarily for wire service syndication. It is unclear whether there is interest in other publishing formats.
  • PRISM Genre – PRISM stands for the Publishing Requirements for Industry Standard Metadata, a standard of the International Digital Enterprise Alliance (IDEAlliance). It sees strong use among magazines, trade publishers, and photography publishers. Version 2.0 of the standard was released in February 2008. The Nature Publishing Group authored a PRISM module for RSS in 2004.
  • Google News SiteMaps schema – Google is evolving the Sitemaps for Google News schema, but hasn’t been clear on how open this process will be.
  • Microformats — Unclear whether genre is covered.
  • Structured Blogging — defunct?
  • RSS Extensions wiki — A collection of RSS extensions, none of which cover genre. (It has been spam-riddled for the past year.)
  • Reuters Calais — An ontology for business reporting. Does not include genres, but otherwise is an example of an ontology that benefitted from a well-publicized launch in early 2008.
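
To make concrete what any of these standards would give a machine reader, here is a minimal sketch of an aggregator reading a genre straight from a feed instead of guessing it. Everything specific in it is an assumption for illustration: the `g:genre` element, the `http://example.org/ns/genre` namespace, and the sample feed are all made up, and a real feed would use whichever vocabulary (PRISM, NewsCodes, or another) the publisher adopted.

```python
import xml.etree.ElementTree as ET

# Assumption: a placeholder namespace, not the URI of any real standard.
GENRE_NS = "http://example.org/ns/genre"

# Hypothetical RSS 2.0 feed; the third item carries no genre at all.
FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:g="http://example.org/ns/genre">
  <channel>
    <title>Example Daily</title>
    <link>http://example.org/</link>
    <description>Hypothetical feed</description>
    <item>
      <title>City council approves budget</title>
      <link>http://example.org/1</link>
      <g:genre>analysis</g:genre>
    </item>
    <item>
      <title>Why the budget is wrong</title>
      <link>http://example.org/2</link>
      <g:genre>opinion</g:genre>
    </item>
    <item>
      <title>Untitled wire copy</title>
      <link>http://example.org/3</link>
    </item>
  </channel>
</rss>"""


def read_genres(feed_xml):
    """Return (title, genre) pairs, using 'unspecified' when no genre is supplied."""
    root = ET.fromstring(feed_xml)
    pairs = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        # Namespaced lookup: {namespace-uri}localname
        genre = item.findtext("{%s}genre" % GENRE_NS)
        pairs.append((title, genre or "unspecified"))
    return pairs


if __name__ == "__main__":
    for title, genre in read_genres(FEED):
        print(genre, "-", title)
```

The "unspecified" fallback is the case this article is about: when the publisher leaves the genre out, the aggregator is back to guessing.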

PRISM and NewsCodes Compared

There are 59 genre tags in PRISM and 44 in NewsCodes. Between the two of them, they share only 10, and half of those have different names in each (these are in bold below). NewsCodes allows spaces and is generally capitalized; PRISM always begins with a lowercase letter and joins its words in camelCase.

The PRISM website says this about IPTC-NewsML:

NewsML [IPTC-NEWSML] is a specification from the International Press Telecommunications Council (IPTC) aimed at the transmission of news stories and the automation of newswire services. PRISM focuses on describing content and how it may be reused. While there is some overlap between the two standards, PRISM and NewsML are largely complementary. PRISM’s controlled vocabularies have been specified in such a way that they can be used in NewsML. PRISM profile one compliance permits the incorporation of PRISM elements into NewsML, should the IPTC elect to do so. The PRISM Working Group and the IPTC are working together to investigate a common format and metadata vocabulary to satisfy the needs of the members of both organizations.

PRISM

 

 

 

abstract
acknowledgement
adaptation
advertisement
advertorial
analysis
authorBio
autobiography
bibliography
biography
blogEntry
brief
chronology
classifiedAd
column 
correction
cover
coverStory
coverPackage
electionResults
eventsCalendar
essay
excerpt
fashionShoot
feature
featurePackage
financialStatement
homePage
index
insideCover
interactiveContent
interview
legalDocument
letters
masthead
newsBulletin
notice
obituary
opinion
photoEssay
poem
poll
pressRelease
productDescription
profile
quotation
ranking
recipe
reprint
response
review
schedule
sectionTableOfContents
sidebar
stockQuote
tableOfContents
transcript
webliography
wireStory

IPTC NewsCodes

Actuality
Advice
almanac
Analysis
Anniversary
Archive material
Background
Current
Curtain Raiser
Daybook
Exclusive
Feature
Fixture
Forecast
From the Scene 
History
horoscope
Interview
Music
Obituary
Opinion
Polls and Surveys
Press Release
Press-Digest
Profile
Program
Question and Answer Session
Quote
Raw Sound
Response to a Question
Results Listings and Statistics
Retrospective
Review 
Scener
Side bar and supporting information
Special Report
Summary
Synopsis
Text only
Transcript and Verbatim
Update
Voicer
Wrap
Wrapup

 

It’s unfortunate how little overlap there is; many categories in one could equally apply to the other.
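
The naming difference alone (camelCase in PRISM, capitalized words with spaces in NewsCodes) is easy to bridge mechanically. The sketch below is only an illustration under stated assumptions: the two sample lists are a handful of tags copied from the lists above, not the full vocabularies, and the `normalize` and `shared_genres` helpers are names I made up, not part of either standard.

```python
import re

# Small samples copied from the two lists above; not the full vocabularies.
PRISM_SAMPLE = [
    "analysis", "interview", "obituary", "opinion", "pressRelease",
    "profile", "review", "blogEntry", "sidebar", "transcript",
]
NEWSCODES_SAMPLE = [
    "Analysis", "Interview", "Obituary", "Opinion", "Press Release",
    "Profile", "Review", "Side bar and supporting information",
    "Transcript and Verbatim", "Scener",
]


def normalize(label):
    """Split camelCase, lowercase, and collapse whitespace."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", label)
    return " ".join(spaced.lower().split())


def shared_genres(first, second):
    """Return the sorted labels that match exactly after normalization."""
    return sorted({normalize(x) for x in first} & {normalize(x) for x in second})


if __name__ == "__main__":
    print(shared_genres(PRISM_SAMPLE, NEWSCODES_SAMPLE))
```

Normalizing case and spacing only catches tags that already use the same words. Near matches such as sidebar and “Side bar and supporting information” would still need a hand-built mapping, which is where a common taxonomy would help.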

The inclusion of “blogEntry” within PRISM is helpful, to a point, but it is too vague a term. Any blog post that is a review, feature, or interview should be marked as such. Certainly, blogs have popularized new forms in public media: the diary, the link, even the rant. These terms should be included in any common genre taxonomy for online publishing.

Indeed… by Jon Garfunkel
