Notes from the Gilbane Conference on Content Management

Here’s a write-up of what I learned in the sessions at the Gilbane Conference on Content Management in San Francisco two weeks ago.

It is not a complete account, for a number of reasons. First, I was unable to break the laws of physics and attend every session. Second, I didn’t take notes as detailed as I should have, though this exercise should encourage me once again to do so. Third, I’m reserving some information for the entity that sponsored my attendance at the conference: my employer.

Lastly, what I learned does not even form a complete picture. I’m partly over-reliant on my own prejudices. And I am still reconciling my understanding with that of the communities of the Content Management Professionals (which had a big presence at the conference) and the Association for Information and Image Management (which did not; its conference starts tomorrow in Philadelphia).

I belabor the obvious above only because, in theory, with a few hundred people attending a conference, one might think that enough people would post online that it could all be aggregated constructively (see my previous post, Conference Markup). The best we have today is this linear list of 25 posts compiled by Frank Gilbane.

Among the things I missed was Seth Gottlieb’s comment: "This year, I heard several people say that people had over-bought content management solutions and were having difficulty adopting them due to complexity and poor usability." That’s a key point to have missed. That’s data I’d pay to see aggregated, compiled, and verified.

Tutorial

I began the conference at Bill Trippe’s DITA tutorial. I perhaps would have benefited more from Tony Byrne’s introduction to CMSes, but Bill turned out to be an excellent session leader. Then again, I probably could have covered the material in half the time. But learning an XML document type is not merely about learning the schema; it’s about learning the history and the governance behind it. In short: DITA is for technical documentation; it was developed at IBM; it’s been successfully handed off to OASIS.

Part of the demo showed the XMetaL editor, which supports WYSIWYG editing of XML documents. I hadn’t been aware of the progress in editors, so I made sure to look for more of that in the exhibit hall.

Enterprise Search Track

ES-2 "New Search Technologies": Francois Bourdoncle gave a humdrum talk about his product, Exalead. It only became interesting when, during the questions, he followed up with a demo. Compare searching for BPM via Exalead vs. via Google. Lacking an algorithm like PageRank, Exalead doesn’t show the most popular choices. But it does show some related search terms quite clearly; many point to software (Business Process Management), while a few relate to music (Beats Per Minute).

David Bean followed with a look at search technologies. He described a spectrum: the search we’re used to uses statistical methods to extract merely conceptual information about documents. At the other end of the spectrum are searches that determine factual data — who did what to whom, with what, and when? That requires linguistic analysis.

He then introduced Attensity and demonstrated its graphical UI for linguistic decomposition. That means it builds sentence trees as you type – subject-verb-object. If you feed the system hundreds of thousands of sentences from news stories, it displays which subjects performed which actions to or with which objects. Even before he said so, it was clear that this work was of high interest to the intelligence community.

This matched a type of search that Bean had in mind when he introduced his talk: "I don’t know what I’m looking for, but I know it when I see it."
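The payoff Bean described – seeing which subjects performed which actions across a corpus – is easy to sketch once the hard linguistic step is done. Here’s a toy example in Python, with made-up triples standing in for the output of a real linguistic analyzer (this is my own illustration, not Attensity’s actual method):

```python
from collections import Counter

# Hypothetical pre-extracted subject-verb-object triples; in a system like
# the one demonstrated, these would come from linguistic analysis of
# hundreds of thousands of news sentences.
triples = [
    ("Acme Corp", "acquired", "Widget Inc"),
    ("Acme Corp", "acquired", "Gadget LLC"),
    ("Acme Corp", "hired", "J. Smith"),
    ("Widget Inc", "sued", "Gadget LLC"),
]

# Tally which subjects performed which actions.
actions = Counter((s, v) for s, v, _ in triples)

for (subject, verb), count in actions.most_common():
    objects = [o for s, v, o in triples if (s, v) == (subject, verb)]
    print(f"{subject} {verb} ({count}x): {', '.join(objects)}")
```

The interesting analysis lives entirely in the aggregation; the linguistic decomposition that produces the triples is the genuinely hard part.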

One of the attendees asked whether one or both of the tools would shed its complex interface and use a simple layout "like Google." But, as I told Francois afterwards, when people say "complex," the defense should be that it’s "intricate." A golf cart’s dashboard is much simpler than a conventional automobile’s, but who would toss out the car’s dashboard for something simpler? We’re in a strange Google-über-alles moment, but that shouldn’t cloud our thinking. (See this 2005 IEEE Distributed Systems article in which I was quoted.)

Both search tools, in fact, were doing more than trying to find the most likely piece of information. Instead they were enabling the user to do some simple cluster analysis, to see patterns in the data. This is also called "data mining," and it is best done on structured data. Web documents remain, as the lowest common denominator, mostly unstructured.

"The Analysts Debate" tracks

There was actually no such track; this heading combines the keynote with another session on Tuesday morning. I didn’t take good notes at all on these – shame on me. I mostly found myself listening for key phrases that I wanted to react to.

The keynote address was actually a superstar panel: Tony Byrne of CMS Watch, Dan Farber of ZDNet, Gene Gable, Bill Rosenblatt of DRM Watch, and Charlene Li of Forrester. I was familiar with some of Charlene’s writings. She expressed some frustration trying to set up WordPress, so I gleefully emailed that point over to the Drupal folks.

I followed with session CT-1, for more analysts debating. What perked my ears up near the end was a question of where blogs and wikis fit into CMSes. One analyst offered the dichotomy of formal vs. informal, but that clearly didn’t work. So I noted that I’ve written about this and have shared it with the blogger community (to some extent), and then clarified that the key difference is that most business content – technical documentation, policies, code, etc. – is normative: we are always seeking to improve it. Blog content is often narrative, in that one person writes it and then moves on. All five analysts picked up a pen and wrote down what I said. (See Normative and Narrative.)

Rolling ahead on that, I asked where anyone would fit email, as it hadn’t really come up during the conference. The answer came back that it couldn’t be ignored, for compliance reasons. But what I was looking for was a suggestion that companies consider the "death of email" that groupware entrepreneur Ray Ozzie and others had called for years ago.

Content Management Case Studies

I attended these as I was curious what I’d need to do as we seek a solution at our own company.

CM-1 "What to Watch Out for When Starting a New CMS Project": This was the first session. Scott Wolfe gave a monotone delivery of boilerplate project-management advice. Much to my delight, he included perhaps the only project-management anecdote ever to inspire a tragicomic movie: the Army’s development of the Bradley Fighting Vehicle (The Pentagon Wars). Rahel Bailie spoke about some more specific issues of helping users adjust to change.

CT-4 "Introducing CM Technology from IT’s Perspective": These were a couple of talks by Brad Raasch, CIO of John Deere, and Garry Beatty of the city of Boise. Both were solid, giving high-level overviews of the steps they took in their projects. What I missed, looking back, were specific numbers: the size of teams, the size of budgets, the hidden issues that came up, and so on.

CTW Keynote "Enterprise Panel on Best Practices & Implementation Strategies": Pat Tiernan of Hewlett-Packard and Marc McQuain of Vindico Retail spoke. I listened more closely to Tiernan. His presentation rattled off some hard numbers which any CIO ought to appreciate: 2 million versions of documents reduced to 50,000; 18 repositories to 3 and soon to 1.

CM-8 "Business Issues: Compliance & Security": It was the last session, and I worried that everyone’s energy would be completely sapped, but it was the best-executed session of the conference. It started humbly with presentations by three vendor CTOs showcasing their technology. Afterwards, Linda Burton pulled the fifteen of us in attendance in to sit around a couple of tables for discussion. Among all of the presenters I’d heard, David Parry of McLaren was stellar in providing crisp examples. He offered an illustration of a typical company that hadn’t gotten around to implementing content management: the ‘G-Drive’ was their document repository, and multiple "final" copies of the same document abounded. The worst offender was a customer who had 180 versions of the same document. (And if that doesn’t get you worked up, you’re not in the right industry!)

I followed up on my questions about email from the day before: how does one deal with email, where so much knowledge resides? I floated a formulation: can we describe MS Word and MS Exchange as the chief impediments to securing, processing, and organizing information? The gathered attendees liked it.


Overall, the lessons were clear. Get executive sponsorship. Work with all of the users to understand their processes. Understand the processes and structures of the information. Though I wish I’d heard more horror stories to pass along.

Blog-Wiki Track

This is an area I have some expertise on, so I went to three panels to hear the presenters and keep ’em honest. I had met Bill Trippe and Bill Cava earlier (and also had no doubts about their integrity!), so I skipped their panel, and caught up with them afterwards.

BW-1 "Enterprise Blogs and Wikis"

Thierry Barselou gave a description of how his company, Ipsen Pharmaceuticals, was using "enterprise blogging." They set up a pilot project to collect competitive intelligence. Curiously, they added in a somewhat intricate workflow to control what content was seen by whom.

Rod Boothby of Ernst & Young described his group’s experiment with internal blogging. His presentation ran to 77 slides, four times Gilbane’s suggested limit. He slipped some weak points in there; I challenged some during the questions, and I’ll go through more of them here. One was the premise that companies thrive on innovation, and innovation thrives on blogging. Sounds good to me, but what about Edison’s rejoinder that genius requires 99% perspiration? And he never quite explained why the blog specifically.

All he said was that it would be smart to copy Google. Part of what we’re supposed to copy is Google officially letting its employees spend 20% of their time on projects of their own interest. Hmm. Are they blogging, or programming, or doing other research? Boothby later said that Google employed the "Wisdom of Crowds," which made a complete hash of both concepts. In the after-panel discussion, Boothby conceded my point about Surowiecki’s book: that crowds can sometimes lead to the undesired outcome of information cascades.

The heart of Boothby’s presentation was this: every person, project, and concept related to his research group had a page, which was expected to be updated. Come to think of it, that sounds more like a wiki. Or it sounds like the approach Tim Berners-Lee took when he set up the web server at CERN a dozen years ago, before anyone had coined the terms. Boothby even explained that they didn’t refer to it as a blog, just as "pages."

This session once again demonstrated the frequent lack of linguistic discipline among Web 2.0 adherents. It’s a "blog" for the purposes of a panel presentation, but they call it something different to the executives.

But Boothby is a blogger, and he’s got to support the movement. Check out this pseudo-graph, the continuum from IM to email to blogs. I asked him whether he might draw a line out from the blogs point into the future. After all, one has to keep innovating.

As it turned out, the E&Y system also had some intricate access controls built in. How did they do it? "Oh, we hacked Movable Type," Boothby explained.

As Bill Cava remarked afterwards, it’s much easier to take ECM software and have it blog than to take typical blog software and build an ECM around it. You’re either going to pay up front for functionality, or pay afterwards in adding all the features that are supposed to be there. Granted, as an open source fan, I personally prefer the second way. This is a complex argument to make, and I can’t make it all here. Ultimately it boils down to how solid your platform’s architecture is.

In the end, the questions that "enterprise blogging" wants to address are: how do you get information to the right people now, and how do you organize it for later retrieval? Those are the same challenges that ECM systems are solving. The main takeaway is not the technology of blogging, but its liberation ideology.

BW-2 "RSS in the Enterprise"

Greg Reinacker, CTO and founder of NewsGator, gave an overview of RSS. Charlie Wood gave a quick presentation of industry accolades for the technology, and some of his firm’s consulting successes in setting up RSS with clients. I didn’t take good notes, as I was preparing my stumper questions.

I asked: if there can be a feed from any person or topic, and any person or topic may say something potentially interesting, how does this scale, from a usability perspective? This is an intuition I have, and it’s one that just about anybody who’s read blogs has shared with me (more people are relying on AI-based aggregators now). I asked them how they coped. Greg said that he had 150 feeds, but doesn’t follow them all. I think that proved my point.

Also, I asked how his company was dealing with the RSS governance confusion, which I’ve been writing a story on. Greg gamely advised to "pick one." That they are very similar to each other, and that they are all underspecified, makes it difficult to choose.

I haven’t chosen one myself – but this is where one would expect the founder of a company in this space to show some leadership. Nonetheless, he gave an answer that was good enough for the forum he was addressing. And perhaps good enough for me to finish my investigative series on RSS.

My new friend Susan, who sat in on the talk, asked me in the exhibit hall afterwards: "What does RSS look like to a user?" One of the vendors showed her.

BW-3 "Improving the Effectiveness of Business Blogs"

I very much looked forward to this panel. Salim Ismail had founded PubSub and has been instrumental in the Structured Blogging effort – both essential pieces of the continuing effort to fix RSS.

Pub/Sub, he explained in the talk, was the third stage in the evolution of information exchange. Start with point-to-point messaging (email and IM). There’s no way to multicast information effectively unless you post to a place where others will find it – otherwise known as request/response (HTTP). This protocol is fine for the web, but it’s been ungainly for RSS, which was never properly designed for many-to-many distribution; it uses too much bandwidth. A better model from distributed software is called Publish/Subscribe (Pub/Sub for short), where content is published to a central system and then pushed out to users according to their subscriptions.
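The distinction is easy to sketch in code. Below is a minimal in-memory publish/subscribe broker of my own devising (the class and method names are illustrative, not any vendor’s API): instead of every reader polling a feed URL, content is pushed once to the broker, which fans it out to all subscribers.

```python
from collections import defaultdict

class Broker:
    """Minimal publish/subscribe broker: content is pushed out to
    subscribers by topic, instead of each subscriber polling for it."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, item):
        # One publish fans out to every subscription on the topic.
        for callback in self.subscribers[topic]:
            callback(item)

# Usage: two readers subscribe once; a post is delivered to both
# without either of them polling.
broker = Broker()
inbox_a, inbox_b = [], []
broker.subscribe("cms-news", inbox_a.append)
broker.subscribe("cms-news", inbox_b.append)
broker.publish("cms-news", "Gilbane conference wrap-up posted")
```

The bandwidth argument falls out of the structure: the publisher sends each item once, rather than serving the whole feed to every polling client on every refresh.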

(Incidentally, I hadn’t caught the news until after the conference that Salim had left PubSub just two months before).

Theresa Regli gave a brilliant talk comparing and contrasting taxonomies and folksonomies. There have been popular keynote polemics against taxonomies, and this was not one. I have no glosses to add to her talk; it simply presented the full picture of how an organization needs to negotiate between a controlled vocabulary and the new terms that users come up with for indexing.

In the questions, I drove another wedge into RSS, one that I felt would be of specific interest to the enterprise users in the audience: RSS includes no element to indicate whether content is new, modified, or deleted. Of course, RSS is underspecified, Salim answered. At present, you have to create different feeds to represent each of these, or just extend RSS on your own. Soon we’ll have a whole ecosystem of ECM vendors and aggregator makers (e.g., NewsGator, PubSub), each with its own RSS+ (excuse me, Atom+) implementing separate syndication features.
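To make the "extend RSS on your own" option concrete, here is a sketch of how a feed producer might mark an item as new, modified, or deleted with a namespaced extension element. The `cms` namespace and the `changeType` element are hypothetical – my invention for illustration, not any standard:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for the extension element; not a real standard.
CMS_NS = "http://example.com/ns/cms-syndication"
ET.register_namespace("cms", CMS_NS)

# Build a single RSS <item> carrying the extension.
item = ET.Element("item")
ET.SubElement(item, "title").text = "Security policy v7"
ET.SubElement(item, "link").text = "http://example.com/policies/security"
# The extension: flag whether this item is new, modified, or deleted.
ET.SubElement(item, f"{{{CMS_NS}}}changeType").text = "modified"

xml = ET.tostring(item, encoding="unicode")
print(xml)
```

Any aggregator that doesn’t recognize the namespace simply ignores the element, which is what makes this kind of per-vendor extension both easy to ship and hard to standardize.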

Here’s a growing wiki page of RSS extensions. There are so many standards; which do you choose? Who will take the lead? Businesses like NewsGator, PubSub, Microsoft, Google? Or industry consortia like AIIM, PRISM, IPTC?

Bill Cava pressed Salim on some of the big advantages of Structured Blogging. Salim stressed the part about letting people post their data where they want (e.g., their own blog) rather than on sites like Amazon or Epinions. Oddly, he didn’t go into the data-aggregation possibilities.

Conclusions


Much of the talk in the background of the blog/wiki track– and in posts like this afterwards– was how the big vendors have been missing the boat on blogs and wikis. The criticisms often focus on the marketing messages. I’d be more impressed if the content management leaders got together and discussed RSS/Atom, whether it works well enough, and what’s really needed for enterprise syndication.

Maybe that’s happened; I don’t know. All I know is that over the next few days, the trade association of content management, the AIIM, is meeting in Philadelphia, while the blog-friendly Syndicate crowd is meeting in New York.

I’m not going to ask vendors whether they support "blogging." I’m going to ask how easily they let users create casual content, and how well that content can be indexed for others to find (blog tools don’t care as much about the latter as they do about the former). I’m not going to ask whether they do "wiki." I can simply ask whether they support in-form editing, clean URL aliasing, and collaborative editing (Drupal does all three, which is why I set it up internally at my company). Beyond that, I’d be curious how easily the workflow and access control lists can be customized, since blogs and wikis fundamentally care about neither.

Ironically, at the Beyond Broadcasting conference on public media at the Berkman Center this past Saturday, I was in a workshop which utterly blew my mind. The participants were invited to sketch out their own content process workflows. Now, presumably, this is something that companies do on their own, rather than at conferences. But it seemed like such a terrific exercise to do in an open setting.