Implicit over explicit

There are two types of metadata (data about data) floating around out there: implicit and explicit.

Explicit metadata is data that someone attached to an object because they were asked to do so in order to charitably improve the system.

Implicit metadata is data that someone attached to an object in the natural course of interacting with that object.

Actions that create explicit metadata include:

  • Rating a video on YouTube.
  • Rating a song in your music player.
  • Digging a website on Digg.

Actions that create implicit metadata include:

  • Watching a video on YouTube.
  • Buying a product on Amazon.
  • Skipping past a song in your music player as soon as it gets annoying.

The massively important, and often overlooked, thing about implicit metadata is that it’s generally trustworthy. It’s like the results of a double-blind scientific study. Explicit metadata, on the other hand, while often useful, is always in doubt. It’s like the results of an exit poll during an election. People lie. People are stupid. People are remarkably un-self-aware. Going the explicit route exposes you to all of these problems. It’s icky.

Let’s say you wanted to know which users on Yahoo! Answers are experts in a certain topic. Which of these two approaches should you take?
1) Ask each user what they’re an expert in. (Explicit)
2) Look at all of the user’s answers and see what topics they’ve actually gotten a lot of best answers in. (Implicit)
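Approach 2 for the experts question can be sketched in a few lines. This is only an illustration: the record layout (user, topic, was_best_answer), the sample data, and the min_best threshold are all made up here, and it says nothing about how Yahoo! Answers actually stores or scores anything.

```python
from collections import Counter, defaultdict

# Hypothetical answer records: (user, topic, was_best_answer)
answers = [
    ("alice", "python", True),
    ("alice", "python", True),
    ("alice", "cooking", False),
    ("bob", "python", False),
    ("bob", "cooking", True),
    ("bob", "cooking", True),
    ("carol", "python", True),
]


def experts_by_topic(answers, min_best=2):
    """Map each topic to the users with at least `min_best` best answers in it."""
    best_counts = defaultdict(Counter)
    for user, topic, was_best in answers:
        if was_best:
            best_counts[topic][user] += 1
    return {
        topic: [user for user, n in counts.most_common() if n >= min_best]
        for topic, counts in best_counts.items()
    }


experts = experts_by_topic(answers)
print(experts)
```

Nobody was asked to declare anything. The expertise falls out of answers that other people already picked as best.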

Let’s say you wanted to be able to tell which websites are interesting and which ones are not. Which of these two approaches should you take?
1) Ask people to give a thumbs up to websites that they like. (Explicit)
2) Look at which websites people have chosen to link to from other websites. (Implicit)

The correct answer in both cases is, without a doubt, 2. In the second example, 2 is the PageRank approach that spawned the massively successful 800-pound gorilla that is Google. 1 is the approach that spawned the interesting but relatively tiny and easily spammed Digg.
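To make the link-based idea concrete, here is a toy version of the textbook PageRank iteration. It is a simplified sketch, not Google's production algorithm. The link graph, the damping value of 0.85, and the function name link_rank are assumptions for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Toy power-iteration PageRank over a dict of page -> outbound links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outbound links spreads its rank over everyone.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical link graph: each key links to the sites in its list.
links = {
    "a.example": ["c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}
ranks = link_rank(links)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(page, round(score, 3))
```

Each link is an implicit vote cast by someone who was just building their own page, and a vote from a highly ranked page counts for more than one from an obscure page.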

After PageRank, my other favorite example of the excellent use of implicit metadata is Flickr’s “interestingness”. As the site grew, identifying high-quality pictures in the system became a more important problem. To figure out which pictures to highlight and which photographers to drive attention to, they had two choices:
1) Ask people which pictures they think are cool with a rating system.
2) Figure it out algorithmically based on already available information.

They chose 2, and it’s awesome. Just like PageRank, nobody knows exactly how, but it just works. It’s like magic. Interesting pictures just bubble up to the top. It’s nearly impossible to game, despite the incentives to do so (ego, money, etc.). They can change the rules anytime they want and they have a ton of information to work with (number of comments, who commented, number of views, who viewed, what camera was used, which websites link to it, time to first comment, etc.). If they had chosen to rely instead on explicit ratings, I think Flickr would be a spammy, porn-filled, lower-class venue today.
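Flickr’s real formula isn’t public, so the following is purely a guess at the general shape: a weighted sum over signals that already exist. The signal names, the weights, and the sample photos are all invented for the sketch.

```python
# Hypothetical weights; Flickr's actual formula is not public.
WEIGHTS = {"comments": 3.0, "views": 0.01, "inbound_links": 5.0}


def interestingness(signals, weights=WEIGHTS):
    """Weighted sum of implicit signals; a missing signal counts as zero."""
    return sum(weights[name] * signals.get(name, 0) for name in weights)


# Made-up photos with made-up activity counts.
photos = {
    "sunset": {"comments": 12, "views": 900, "inbound_links": 2},
    "blurry_cat": {"comments": 1, "views": 1500, "inbound_links": 0},
}
ranked = sorted(photos, key=lambda p: interestingness(photos[p]), reverse=True)
print(ranked)
```

Because the weights live in one place, the site can change the rules anytime without asking users to do anything differently.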

The approach they opted for is one that any social site is capable of taking. You just have to generate and track the right kind of metadata.

Don’t ask your users to tag stuff for the benefit of the community. Instead, make tagging a way for them to organize their own stuff. Don’t give people a rating system. Instead, give them tools to save and share their favorite stuff.

Then you just have to sit back and watch which items they actually save and share. Each action is an implicit thumbs up.
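Here is a minimal sketch of that watching step, assuming a hypothetical event log of (user, action, item) tuples. The action names and the choice to count each user’s save or share only once per item are assumptions, not a prescription.

```python
from collections import Counter

# Hypothetical event log: (user, action, item)
events = [
    ("alice", "save", "photo1"),
    ("alice", "share", "photo1"),
    ("bob", "save", "photo1"),
    ("bob", "view", "photo2"),
    ("carol", "save", "photo2"),
    ("carol", "save", "photo2"),  # repeat of the same action, counted once
]

# Actions treated as an implicit thumbs up (an assumption for this sketch).
IMPLICIT_VOTES = {"save", "share"}


def implicit_scores(events):
    """Count one implicit vote per distinct (user, action, item)."""
    seen = set()
    scores = Counter()
    for user, action, item in events:
        key = (user, action, item)
        if action in IMPLICIT_VOTES and key not in seen:
            seen.add(key)
            scores[item] += 1
    return scores


scores = implicit_scores(events)
print(scores.most_common())
```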

The beauty of it is that instead of giving your users another chore (rating), you’ve improved their experience by giving them new, useful tools. In an effort to gather good implicit metadata, you’re often forced to improve the user experience. It makes you go for the win-win. It’s the right way to do things.

