The Search, for News: an interjection to John Battelle’s Book

Why do we read what we read?

This question interests me as a reader; it must interest John Battelle even more as a publisher. Battelle was a cofounding editor of Wired and founder of The Industry Standard; he now helms the online publishing venture Federated Media. Two years ago, he wrote The Search, about Google and its impact on business and culture. At the close of the book he appealed to his readers to send him corrections and updates to the book: “I am committed to updating this work at the Searchblog site.”

Two years afterwards, I humbly submit to the author an interjection into The Search.

Battelle had considered the above question in a blog post on October 11, 2004, and he adapted it for the book (pp. 173-175). He and a colleague had wondered why they hadn’t read anything from the Wall Street Journal or the Economist in quite some time. I have excerpted four of his points below, adding in brackets what appears solely in the book, and omitting a few minor grammatical transformations. (Battelle’s editor appears to have had a fancy for substituting other words for “blogosphere.”) Here they are:

  1. “[Because they are fearful of losing revenue as a result of search,] both require paid subscriptions, and therefore, both do not support deep linking [that drives news stories to the top of search results at Google and its brethren]. In other words, both are nearly impossible to find if you get your daily dose of news, analysis and opinion from the blogosphere [Internet].”

  2. “Take the plunge and allow deep linking [– let others on the Web link to your stuff. (The Journal, to its credit, has begun a limited implementation of this idea)]. Notice I did not say abandon paid registration, in fact, I support it.”

  3. “Even if I did read the Journal’s feeds, I wouldn’t refer to them in any posts of mine, as my readers and community can’t read what I read. More and more, I find that if I can’t share something (i.e. can’t point to something), it’s not worth my time.” [not in the book]

  4. “I’d be willing to wager that the benefit of allowing the blogosphere [world] to link [point] to you will more than make up for potential lost subscribers… In short, if a reader finds him or herself pointed to the Journal on a regular basis, that reader knows that by subscribing to the Journal, he or she would be more in the know. … I think allowing deep linking will drive subscription sales, rather than attenuate them.”

The first two statements contradict each other. It is simply not the case that deep linking is incompatible with paid subscriptions; Battelle acknowledges as much in the second statement. Part of the problem here may be some confusion as to what deep linking is and is not. Let me offer some definitions:

  • Deep Links: These are hyperlinks that point directly to articles (and not the front page). This is unremarkable, except that a decade ago, ecommerce website operators like eBay and Ticketmaster had prevented competitors from linking directly to their internal pages. But this point is mostly moot today. Most content, particularly from news publications, is accessible via its direct links.

  • Freely Accessible Links: The links are viewable by whoever clicks on them, subscriber or otherwise.

  • Permanent Links: These are links that do not change over time. The creator of the web, Tim Berners-Lee, originally figured that content could change addresses, and that in fact is one of the great flexibilities of the web (versus other proposed hypertext systems). Still, in 1999 he clearly articulated that links should stay fairly constant: “It is the duty of a Webmaster to allocate URIs which you will be able to stand by in 2 years, in 20 years, in 200 years. This needs thought, and organization, and commitment.” What he didn’t articulate then, but surely anticipated, was that many tools would rely on the URL as a unique identifier. (URI stands for Uniform Resource Identifier; for all practical purposes it is the same as the URL, in which the L stands for Locator.)

Whether people choose to make a hyperlink to a resource depends on a combination of all three. They need access to the document in the first place to be able to link to it. But they also need to be confident that the link is permanent, and won’t “rot” over time. Many news publications have offered current content as static files in one location, and archived content via a dynamic application elsewhere. When the current content “expired” it was moved into the dynamic system, which had different links. By doing so, they made the mistake of breaking permanent links. (As a result, a healthy number of publishers have migrated to “seamless” link architectures.)

But it’s not even clear that link rot alone can explain the reluctance to link to those articles. Battelle said as much in statement #3 above: he had avoided linking to the Journal since “readers can’t read what I read.” He probably meant that many of them don’t also have subscriptions. In conventional blogging, the link is perceived as fundamental, and thus, if no link, no blog post. But an example illustrates the fallacy. A book is typically more reviewed, edited and considered than a blog post. Since there is often no possible “hyperlink” to the content of a book, by this logic there would have to be a bias against pointing to books. Yet bloggers cite books, conferences, people and other sources not strictly on the web all of the time, so there’s no rational reason they can’t cite a physical newspaper. For example, Battelle is currently reading Steven Pinker’s The Language Instinct, and in a blog post hyperlinks to its Wikipedia page.

It is probable that Battelle, like many bloggers, has two channels: external web links that he discovers or that people pass him, and original scoops from his own experiences. [The article you’re reading is published on a website which generally eschews the blogger tendencies, and mostly publishes essays pulling from multiple sources: online and off; old, and new.] And the WSJ does not make the first channel, perhaps because the people he reads aren’t reading it. There are exceptions: in 2004 a reader sent him a link to a WSJ article by Kara Swisher, so he blogged it. It was about wikis.


Now to address Battelle’s wager in #4. Restated, he said that a high quality publication can draw new readers – and mint enough as subscribers – by getting regular links from the blogosphere.

My series on TimesSelect, which took several weeks of research, may help answer his wager. TimesSelect was the two-year experiment by the New York Times to charge for the Op-Ed and other content. It also marked, though this is less well known, the adoption of seamless links, so that people could link to articles without fear that the links would rot. (Incidentally, Battelle was not critical of TimesSelect.)

The first question is whether the deep/permanent links alone were sufficient to encourage linking. Many bloggers lamented pointing to it, and some even went as far as pledging to boycott it. Some felt that they didn’t want to play a part in helping “market” the NYT. No one seriously kept the boycott. In the end, most bloggers figured that linking to sources befitted the integrity of the web; it enabled applications like Google and Technorati and BlogPulse (and any others) to produce intelligence from link analysis.

The next question is how many. In my research, I lacked the dexterity to count links precisely, so I counted the number of name references; I stated that this was likely commensurate with links (e.g., references to “Frank Rich” show weekly spikes). Comparing the two years of TimesSelect to the two years prior, I found that the references to the 7 Times Op-Ed columnists went up by a factor of 8; a sample community of fifty pundits overall grew by a factor of 10. (I’ll repeat here that these growth rates are due as much to the growth of Google’s blog index in that time.) Measured against the sample as a whole, that demonstrates a 20% drop in linkability. Then again, the readership drop, as estimated by last month, was 45%. Thus, bloggers who were fans of Thomas Friedman, David Brooks et al. actually helped their favorite columnists stay linked in the blogosphere – this is fairly close to Battelle’s prediction.
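To make the arithmetic explicit, here is a minimal sketch in Python. The two growth factors come from the paragraph above; the function name and the choice to treat the fifty-pundit sample as the baseline are assumptions made for illustration, not a method stated in the original research.

```python
def relative_drop(observed_growth: float, baseline_growth: float) -> float:
    """Return how far observed growth falls short of the baseline, as a fraction.

    Both arguments are growth factors (e.g. 8.0 means "grew eightfold").
    """
    return 1.0 - observed_growth / baseline_growth


# Growth factors quoted in the text.
columnist_growth = 8.0   # references to the 7 Times Op-Ed columnists
pundit_growth = 10.0     # references to the sample of fifty pundits (assumed baseline)

linkability_drop = relative_drop(columnist_growth, pundit_growth)
print(f"Drop in linkability relative to the sample: {linkability_drop:.0%}")
```

Under that framing, the columnists grew at eight tenths of the rate of the comparison group, which is where the 20% figure comes from. It is a relative shortfall against a rising baseline, not an absolute decline in references.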

The last question is whether this brought in any new subscribers. To meet this test, the typical political bloggers would have started to think along these lines: “all the top bloggers are linking to the Times, I should get a subscription, so I can, too.” Thus the TimesSelect subscribers would have increased over time. I’ve been unable to get any data from the NYT, but I suppose that this wasn’t the case, since the service was canceled. In addition, most bloggers were more apt to begrudge the paper. That said, there would be much better data from other publications that have had a dual open/pay business model over the last decade.

There’s one little problem with Battelle’s wager. He assumed that most of the valuable links were coming from the blogosphere. There’s another gatekeeper he should well know about.


Let’s re-read the first statement above, in full, as it was in the book, and, for clarity, enumerate each clause:

[1] Because they are fearful of losing revenue [2] as a result of search, [3] both require paid subscriptions, and therefore, [4] neither supports deep linking that [5] drives news stories to the top of search results at Google and its brethren.

Battelle’s original statement in the blog was clauses 3 & 4. We already demonstrated that these were incongruent. The only clauses that make sense together are 1 & 3. Online publications have wanted to retain revenue and thus require paid subscriptions; these decisions were made before Google was even born. But we haven’t yet dealt with 5: what gets to the top of search results at Google. It’s a vast simplification to combine Google web search with Google News. The latter presents breaking news stories as they hit the wire: there are no outside links to them. The factors that drive news stories to the top of Google News are based on an entirely different ranking altogether.

What are those factors? If it’s the lack of support for deep links, Battelle neglected to provide any example of that (as a subscriber, he could simply have produced a link on his blog). It turns out that what shows up in Google News is understood by far fewer people. NewsKnife, a service of Industry Standard Computing (a New Zealand software development firm, which once sent a press release to Battelle about the service), has done diligent work in collecting and analyzing the data of what shows up. It reports [sub req’d] that in the first 10 months of 2007, the Times had 478 references on the front page of Google News, the most of any news source, with 137 being the first-listed source. The Wall Street Journal, by contrast, had 68 references, and only 2 listed as the first-listed source. The Journal was ranked #38 on this scale (after Gulf News of the UAE and Reuters Canada). The two first-listed source mentions for the WSJ are two more than they had in 2006.

What explains this disparity? In April 2005, Barry Fox of New Scientist revealed that Google had filed for a US Patent and a world patent on the algorithm used in Google News. It does not reference Patent #6,285,999, “Method for node ranking in a linked database,” otherwise known as the PageRank algorithm, assigned to Stanford University and licensed to Google; it does not reference any other patent. The algorithm was based on creating a source rank value separate from PageRank, calculated “based at least in part on one or more” of the following:

  1. a number of articles produced by the identified source during a first time period,
  2. an average length of an article produced by the identified source,
  3. an amount of important coverage that the identified source produces in a second time period,
  4. a breaking news score,
  5. network traffic to the identified source,
  6. a human opinion of the identified source,
  7. circulation statistics of the identified source,
  8. a size of a staff associated with the identified source,
  9. a number of bureaus associated with the identified source,
  10. a number of original named entities in a group of articles associated with the identified source,
  11. a breadth of coverage by the identified source,
  12. a number of different countries from which traffic to the identified source originates,
  13. and a writing style used by the identified source
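The patent language says only that the source rank is based on “one or more” of these factors; it does not say how they are combined. Purely as an illustration of what such a score could look like, here is a sketch in Python that takes a weighted average over whichever factors are available. The factor keys, the weights, the 0-to-1 normalization and the weighted-average formula are all hypothetical, not taken from the patent or from Google.

```python
# Hypothetical weights for a handful of the listed factors. The patent
# excerpt gives no weights; these numbers are invented for illustration.
HYPOTHETICAL_WEIGHTS = {
    "article_count": 1.0,     # factor 1
    "breaking_news": 2.0,     # factor 4
    "network_traffic": 2.0,   # factor 5
    "circulation": 1.0,       # factor 7
    "staff_size": 1.0,        # factor 8
}


def source_rank(metrics, weights=HYPOTHETICAL_WEIGHTS):
    """Weighted average of the factors present in `metrics`.

    `metrics` maps factor names to values already normalized to 0..1
    (an assumption). Factors that are missing are skipped, mirroring
    the "one or more of the following" wording.
    """
    used = {name: w for name, w in weights.items() if name in metrics}
    total_weight = sum(used.values())
    if total_weight == 0:
        return 0.0
    return sum(metrics[name] * w for name, w in used.items()) / total_weight


# Two made-up sources with similar traffic but different breaking-news scores.
source_a = {"network_traffic": 0.6, "breaking_news": 0.9, "article_count": 0.8}
source_b = {"network_traffic": 0.6, "breaking_news": 0.1, "article_count": 0.8}
print(source_rank(source_a) > source_rank(source_b))
```

The point of the sketch is only that two sources with comparable traffic can land far apart once other factors enter the score, which is why traffic alone cannot settle the question.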

It’s possible that the WSJ‘s rankings have been depressed by factor 5. But according to Compete, the Journal’s website is on par with the websites of wire service Bloomberg News and the Houston Chronicle, ranked #5 and #6, respectively. What other possible factor can explain why the Journal is so low? Has Google been discriminating against paid content?

Battelle did make note of Google News in The Search; specifically, he discussed the implications of its introduction to the Chinese market. It was banned by the Chinese government for two weeks until Google removed the headlines from blocked sources. Google, in their public statements, felt that “aside from the politics” the presentation of unreachable headlines led to a “serious user experience problem.” Thus, deletion was the better part of valor. Battelle asked, “isn’t it better to know that something exists, even if it is blocked, than to not know it at all?” (p. 206) Yet, he had no qualms with Google News not bothering to list the Wall Street Journal or the Economist – which he had subscriptions to, after all. He never asked anybody; he just assumed that it was due to the lack of deep links.

Another bit of development in summer 2005 was missed by Battelle in his book. The forums on Search Engine Watch first noticed Google News’s “First Click Free” program. (To date, Battelle has not mentioned it on his own blog.) Editor Danny Sullivan later explained in March 2007, “I did have several off-the-record conversations with Google about this. The main thing that came out that I can report was that Google really felt most users should see what their spiders saw WITHOUT having to register or pay for access.” First-click free is a stealthy whip that Google carries to allow free access to paid content; it enables any diligent reader to read a publication for free, simply by clicking through Google News.

Recently, the Wall Street Journal agreed to the first click free program (a Google watchdog noticed it on July 30th of this year). Thus, via Google News, you can read articles from the Wall Street Journal online for free. You also get the first click free through Congoo, and last month, as Battelle noted, through Digg. In other words, readers get the first-click free through Digg, but not from J. Random Blogger.
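How a publisher decides which clicks are free is not described in any of the sources cited here. One plausible mechanism is a check on the referring site, sketched below in Python. The hostnames, the function name and the referrer-based rule itself are assumptions for illustration only, not the Journal’s or Google’s actual implementation.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical list of partner sites whose click-throughs are let in for free.
FREE_REFERRER_HOSTS = {"news.google.com", "digg.com", "congoo.com"}


def allow_free_view(referrer: Optional[str], is_subscriber: bool) -> bool:
    """Decide whether to show the full article (assumed referrer-based rule)."""
    if is_subscriber:
        return True
    if not referrer:
        return False
    host = (urlparse(referrer).hostname or "").lower()
    # Accept the partner host itself or any of its subdomains.
    return any(host == h or host.endswith("." + h) for h in FREE_REFERRER_HOSTS)


# A click from Google News gets through; a click from an ordinary blog does not.
print(allow_free_view("https://news.google.com/news?q=wsj", False))
print(allow_free_view("http://jrandomblogger.example/2007/12/post", False))
```

Under a rule like this, the same deep link behaves differently depending on where the reader clicked it, which is exactly the asymmetry between Digg and J. Random Blogger.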

This raises the question of a real wager, one that is currently being undertaken by the purveyors of paid content. Does joining the first click free program add subscriptions or cannibalize them? Do the ad views of those first clicks make up for lost subscribers? For the Journal, one quarter of monthly visits come from a search engine (Compete’s numbers show 17% from Google), and thus make up some fraction of revenue. How many subscriptions have been lost is known only to the publishers. Anybody who wishes to write the next book on Google needs to produce the hard numbers on this.


Shortly after Battelle’s original blog post, Mark Glaser did pose the deep link theory to managing editor Bill Grueskin in an article in the Online Journalism Review. Battelle’s response, in part:

The Journal currently makes one story a day (ie, one story to all, not one per) available to bloggers, a practice I find a bit imperious – it misses the point of this multi-faceted conversation – the power is in the tail, not the head, and the tail needs more than one story to power it.

Putting aside the head and tail power (these have not been debunked enough), this quote still should be parsed. Then, as now, the Journal makes all its stories available to its readers. Some of them may choose to blog about the stories and link to them. Bloggers have always had the choice whether to subscribe or not. In May 2005, a blogger named Dave Friedman found a story in the WSJ which he linked to even though it was, as he noted, “ironically, a paid subscription link only.” The irony was that the article was about the announcement from the Times to start charging for content. Friedman wrote, without a shred of his own irony, “In order to remain influential in the great political debates of the day, editorialists, pundits, bloggers, and any other interested party needs to be present in the great cauldron of ideas and debate that is the blogosphere.” But clearly nothing was preventing the blogger from bringing a piece of the Wall Street Journal into the “great cauldron of ideas.” Nothing was preventing any WSJ reporter from responding directly in any blog on the Internet.

Certainly, just because stories can be blogged doesn’t mean they will be. It’s a numbers game as to what gets amplified and blogged. The more readers you have who are bloggers, the more your articles are blogged. The higher your placement is in Google News, the more likely you will reach new readers. But these alone don’t drive the infectiousness of a story. Thomas Friedman gets four times the references that Nicholas Kristof does (though they appear on the same Op-Ed page with roughly the same frequency); I suppose this is due in part to how they cover foreign affairs. Friedman writes of aspiration; Kristof writes of desperation. The blogosphere rewards the former.

It’s not clear that a better Google News placement will make the Journal and the Economist instant blog favorites. The leading Times stories convey a sense of moralizing agitation, while the typical Journal story conveys a sense of detached comfort. The former is more grist for the bloggers. Take tomorrow’s leading story on the Mitchell report on the “Steroid Era” in Major League Baseball. NYT: “Steroid Report Implicates Top Players.” WSJ’s lede: “The Mitchell report is unlikely to affect baseball in the long term, experts say.” Yawn. Nothing to blog here. Rupert Murdoch may end up making the online version free, and this will increase the pool of blogger-readers. But it may take stylistic changes as well to get the stories actually blogged and “in the conversation.”

Update, February 21, 2008: I’ve invited Battelle again to respond to this; he was busy with something else the last time I contacted him. Also, the WSJ has held off on the plans for making the online version free. Maybe they read this article. I don’t know.