Database Follies

I’ve been noodling for a while on a topic that’s been vexing me – the website Metacritic (www.metacritic.com).  Or, more specifically, the way the website decides whether movies and music have been favorably reviewed.

For those of you unfamiliar, Metacritic provides a cross-section of reviews from a carefully-screened group of critics for the latest releases in film, TV, music and games. Metacritic uses black-box Metascores to combine all of the individual critic scores into an overall grade for each item, so users can gauge the critical consensus at a glance.  It’s actually a pretty interesting site – I’ve been known to pass many an hour reading reviews for movies I will never see and for music to which I will never listen.  For instance, while pulling up the website to write this post, I read an article titled “Ashton Kutcher’s Worst Movies.”  And I can never get that time back.  I’ll save you the trouble – it’s all of them!

Some Basic Math

I’ve noticed this trend for a while, but here are just the facts from a recent look at the website:

  • In music, under “upcoming and recent releases,” there were 248 albums listed.  Let me break down the numbers a couple of different ways:
    • The average Metascore was 72.
    • Using the Metacritic classification system, 89% of albums were classified as having “generally favorable reviews”, 11% were classified as “mixed or average reviews,” and 1 album (not 1% – 1 album) was listed as receiving “generally unfavorable reviews.”
  • In movies, under “now in theaters,” there were 158 movies listed.  The numbers here tell a somewhat different story:
    • The average Metascore was 57.
    • 44% of the movies were classified as having “generally favorable reviews,” 41% were classified as “mixed or average reviews,” and 16% received “generally unfavorable reviews.”
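
For anyone who wants to redo this kind of tally, here is a minimal Python sketch of the arithmetic. Two things in it are assumptions rather than anything taken from the site data above: the score cutoffs (61 and up for “generally favorable,” 40 to 60 for “mixed or average,” below 40 for “generally unfavorable”) and the sample scores, which are made up for illustration and are not the actual albums or movies.

```python
from collections import Counter

# Assumed cutoffs (not confirmed here): 61+ favorable, 40-60 mixed, below 40 unfavorable.
def classify(score):
    if score >= 61:
        return "generally favorable"
    if score >= 40:
        return "mixed or average"
    return "generally unfavorable"

def summarize(scores):
    """Return the mean score and the percentage share of each band."""
    counts = Counter(classify(s) for s in scores)
    mean = sum(scores) / len(scores)
    shares = {band: 100 * n / len(scores) for band, n in counts.items()}
    return mean, shares

# Made-up scores for illustration only.
sample_scores = [85, 78, 72, 70, 66, 64, 58, 45, 81, 35]
mean, shares = summarize(sample_scores)
print(f"average: {mean:.1f}")
for band, pct in shares.items():
    print(f"{band}: {pct:.0f}%")
```

Swap in a real list of Metascores and the same two numbers, the average and the share in each band, fall out.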

Random Questions

I look at this admittedly somewhat non-scientific database, and I can’t help but feel the results lack some face validity.  Is it fair to look at just movies and music overall?  Do I need sub-categories?  How do I create norms for each sub-category, and will they change over time?  If I have the wrong sub-categories, are my results still valid?  What about everything that isn’t even in the database, particularly music (I think the movie reviews are more comprehensive)?  Would that information make the album I want to buy next look better or worse?  If the scores are so high, how is there any meaningful discrimination?

Superfluous Observations

  • Justin Bieber got better reviews than Band of Horses.  Are you kidding me?  This just confirms what Elvis Costello said: “Writing about music is like dancing about architecture; it’s a really stupid thing to want to do.”
  • Apparently the idea of illusory superiority has invaded the arts, where now all the women are strong, all the men are good looking, and all the children are above average.

A More Robust Approach

What I’ve just described in terms of Metacritic is very much like what we do in market research with the way we leverage static databases.  It’s also not unlike the listener reviews that appear on iTunes, Amazon, and others.  Everything gets inflated because of an inherent selection bias: the people giving ratings (whether consumer or professional) tend to rate the things they already believe they would like, or they rate generously because they want to be nice.
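
A toy Python sketch shows how that selection bias inflates an average. The listeners, their scores, and the rule that only people who expect to like an album bother to review it are all hypothetical, chosen just to make the effect visible.

```python
# Hypothetical listeners: the score each would give if forced to rate the album,
# and whether they expected to like it (assume only those people write reviews).
listeners = [
    {"score": 9, "expected_to_like": True},
    {"score": 8, "expected_to_like": True},
    {"score": 7, "expected_to_like": True},
    {"score": 5, "expected_to_like": False},
    {"score": 4, "expected_to_like": False},
    {"score": 3, "expected_to_like": False},
]

def mean(values):
    return sum(values) / len(values)

# Average across the whole population versus only the self-selected reviewers.
everyone = mean([p["score"] for p in listeners])
reviewers_only = mean([p["score"] for p in listeners if p["expected_to_like"]])

print(f"average if everyone rated: {everyone:.1f}")
print(f"average among self-selected reviewers: {reviewers_only:.1f}")
```

The album is identical in both cases; only the set of people doing the rating changes.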

Forget movies for a second; I’m going to focus on music.  Imagine if, instead of compiling a composite of ratings, Metacritic or a similar website took consumers through the following process:

  • What are you listening to now, how often do you listen to it, and how often do you think you’d listen to this new album?
  • How relevant and differentiated do you find this new album, in the context of what you currently listen to?
  • If this album were surrounded by these other choices, would you pick it or something else?

Seems far-fetched, but it’s a simple trial/repeat-type model, just like what we do in CPG.  Whether in CPG or other areas of life, the question isn’t whether you like something – it’s whether it would replace what you are currently doing.  I think Italy looks lovely, but I’d rather go to China (my example used to be Egypt, but, uh, not so much today).  Not buying that new Paul McCartney album, no matter how it scores in the database (a 63, in case you were wondering, classified as having generally favorable reviews), because I don’t want an album called “Kisses on the Bottom” showing up on any playlist.  Anywhere.  Ever.
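
To make the contrast between liking and choosing concrete, here is a small Python sketch. Everything in it is hypothetical: the respondents, the 1-to-10 liking scale, the cutoff of 7 for “likes it,” and the item labels. It illustrates the idea and is not a real trial/repeat model.

```python
# Hypothetical responses: each person rates the new album on its own (1-10),
# then picks one album from a set that includes what they already play.
responses = [
    {"rating": 8, "choice": "current favorite"},
    {"rating": 7, "choice": "new album"},
    {"rating": 9, "choice": "current favorite"},
    {"rating": 6, "choice": "other new release"},
    {"rating": 8, "choice": "current favorite"},
]

LIKE_CUTOFF = 7  # assumed threshold for counting a rating as "likes it"

def share_liking(rows, cutoff=LIKE_CUTOFF):
    """Share of respondents whose standalone rating meets the cutoff."""
    return sum(r["rating"] >= cutoff for r in rows) / len(rows)

def share_choosing(rows, item="new album"):
    """Share of respondents who pick the item over the alternatives."""
    return sum(r["choice"] == item for r in rows) / len(rows)

print(f"like it in isolation: {share_liking(responses):.0%}")
print(f"pick it over the alternatives: {share_choosing(responses):.0%}")
```

In this made-up sample most people say they like the album, but few would actually pick it over what they already listen to, which is the gap a static ratings database cannot show.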

Current Behavior, Competitive Clutter

This was a meandering way of making a point, but the static database problem runs rampant in Market Research if you look for it.  It is inherent in any rating where we ask people for a monadic evaluation without understanding current behavior and competitive context.  It’s not just some arcane research issue; it cuts to the essence of how consumers interact with products.

Next time you go to conduct a Market Research study, ask yourself these two questions.  (1)  Did I start by understanding current consumer behavior?  (2)  Did I force consumers to make a choice within a competitive context, by comparing to their current behavior?  If the answer to either of these questions is “no,” you might end up with the Metacritic problem.  Unless it turns out you truly do live in Lake Wobegon, and all your ideas are above average.