Explaining my Fondness for Explicit Content

A little while back I was invited to participate in the Collective Intelligence FOO Camp, held at the Googleplex and organized by Hal Varian, Tom Malone, Tim O’Reilly, and Gary Flake. If you’re wondering what the whole thing was about, so were we attendees. Our closing session was devoted to trying to define exactly what ‘collective intelligence’ is. The most popular explanation came from Kim Rachmeler: "The network knows what the nodes do not." In a piece of brilliant showmanship, she also offered the near-haiku "The nodes know nothing. The nodes know all. Both are true."

I volunteered to host a discussion on "What does corporate America think of collective intelligence?" We were happily belaboring the topic when Tim O’Reilly walked in, listened for a while, then essentially stated that we were barking up the wrong tree by focusing on blogs, wikis, tags, prediction markets, and the other standard tools.

He said that we should be concentrating on implicit, not explicit, user-generated information. And he offered a bet that implicit would turn out to be the more valuable of the two types. Much subsequent discussion considered whether this was a false dichotomy, but after reflection I don’t believe that it is. To explore the issue, let me start by offering my own definitions of the two:

Explicit user-generated information is information that people knowingly and deliberately generate by contributing to online platforms. Examples of explicit information include a blog post or comment, a wiki edit, a vote or rating, a trade in a prediction market, a link, and a tag. 

Implicit user-generated information is information that people unknowingly generate as they work online. It’s the digital fingerprints or traces that people leave as they follow links, look at content, consider one product then buy another, etc. This data can be aggregated to show what’s popular, what’s related, who has a good reputation, etc. My impression is that the collection and analysis of implicit online information grew out of Web analytics (clickstream data) and eCommerce recommendations (customers who bought [shopped for] this also bought [ended up buying] this). I find these recommendations tremendously valuable, and they’re entirely implicit.
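
To make the aggregation step concrete, here is a minimal Python sketch of a "customers who bought this also bought" tally. It is only an illustration under my own assumptions: the function name `also_bought` and the sample baskets are hypothetical, and real recommendation systems are considerably more sophisticated than simple co-occurrence counting.

```python
from collections import Counter, defaultdict
from itertools import permutations


def also_bought(baskets):
    """Map each item to a Counter of items that shared a basket with it."""
    related = defaultdict(Counter)
    for basket in baskets:
        # set() ensures a repeated item in one basket is counted once
        for item, other in permutations(set(basket), 2):
            related[item][other] += 1
    return related


# Hypothetical purchase histories; no shopper set out to "recommend" anything.
baskets = [
    ["camera", "memory card", "tripod"],
    ["camera", "memory card"],
    ["camera", "case"],
    ["tripod", "case"],
]

related = also_bought(baskets)
print(related["camera"].most_common(1))  # [('memory card', 2)]
```

None of the individual shoppers contributed a recommendation, yet one falls out of the aggregate, which is what makes this kind of information implicit.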

Another type of implicit information is the aggregation of individuals’ explicit contributions. Two of the best-known examples of this are Google’s PageRank algorithm and tag clouds like those at del.icio.us and Flickr. As I wrote earlier, people create links and tags largely out of self-interest, but these activities have substantial group-level benefits; they reveal the overall structure of online content and so help everyone navigate and find information efficiently. Tools like PageRank and tag clouds turn online content into an emergent system —  one in which structure clearly exists and changes over time, but that structure can’t be inferred from examining the work of any single actor, and the actors themselves are unaware of the overall structure (just as is the case with an ant colony, one of the classic examples of an emergent system). 
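
As a rough illustration of how individual explicit contributions roll up into group-level structure, the sketch below turns per-user tags into relative tag-cloud weights. The names `tag_weights` and `user_tags` and the sample data are my own assumptions; this is not how del.icio.us or Flickr actually compute their clouds, and PageRank is a far more involved calculation.

```python
from collections import Counter


def tag_weights(user_tags):
    """Scale each tag by how many users applied it, relative to the top tag."""
    counts = Counter(
        tag for tags in user_tags.values() for tag in set(tags)
    )
    top = max(counts.values(), default=1)  # default avoids an error on empty input
    return {tag: count / top for tag, count in counts.items()}


# Hypothetical users, each tagging for their own purposes.
user_tags = {
    "ann": ["python", "web"],
    "bob": ["python", "design"],
    "cho": ["python", "web"],
    "dee": ["travel"],
}

weights = tag_weights(user_tags)
for tag, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{tag}: {weight:.2f}")
```

Each user sees only their own tags; the weights exist only at the level of the group, which is the sense in which the cloud is emergent.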

The concept of emergence suggests a quick ‘sniff test’ for whether a given piece of digital information should be considered explicit or implicit. If it’s shown to the people who generated it, would they say "Oh, yeah —  I knew that" or would they say "I had no idea!"? If the former, it’s explicit. If the latter, it’s implicit. 

I also want to emphasize a few other distinctions related to user-generated content that might be relevant for decision makers:

  • Individual-level contributions (blog posts, tags, shopping cart additions) vs. group-level ones (wiki edits, trades in a prediction market). The difference here is that others are directly affected by the latter type, and so are probably more likely to make their own contribution in response.
  • Above-the-flow contributions vs. in-the-flow ones. Again, the latter are more likely.
  • Altruistic contributions (edits to another workgroup’s wiki) vs. self-interested ones (trades in a prediction market, which are intended to increase the value of an individual’s portfolio). Here again, the latter seem more likely.
  • Deliberate actions (rate, vote, trade, post)  vs. passive ones (click, browse). Same story.
  • Currently private (emails) vs. invisible (clicks) vs. public (comments). Users can’t really complain about the last category being made visible, and they probably won’t complain too much about the middle category, as long as it’s anonymized. But technologies that analyze currently private information in hopes of making or suggesting connections might be trouble. I’ve heard of a few corporate efforts to analyze employees’ email traffic in order to say something like "You seem interested in protein folding / ISO 9000 certification / declining CD sales / whatever. We know of other people in the company who are interested in the same thing. Would you like an introduction to them?" I appreciate the intent behind such efforts, but wonder how they’ll be received. Many people consider their email boxes to be private (I know I do) and might not like the thought of their employer peering into them, even with the best of intentions. At the same time, though, many of us (myself included) don’t mind the thought of Google scanning our emails in order to serve us ads, so the situation is fluid.

So was O’Reilly right that implicit is more valuable? During our discussions at the CI FOO, John Riedl pointed out that because implicit information is typically so much more voluminous, it can be more valuable in aggregate. But I think that even if Tim is right, his wager is of more academic than practical interest.

This is because no matter which side of the bet you come down on, the smart move is to encourage explicit contributions. Doing so will lead to more implicit content in two ways. First, as Riedl pointed out, a huge amount of implicit content will be generated as a byproduct of the explicit content; think of all the possible ways to look at Wikipedia article creation and editors. Second, more online content of any form means more browsing and passive consumption. This browsing yields another body of clickstream-ish implicit content, for example all of Wikipedia’s page views.

So if you’re a believer in the power of explicit user-generated content, encourage it. If, on the other hand, you’re a believer in the power of implicit information, encourage explicit user-generated content, because that’s the best way to get what you really want.

What have you and your organization learned from explicit and/or implicit information that you would not have known otherwise? Leave a comment, please, and let us know.