Andreas Ahlborn, Blogger

March 4, 2013



Showing M&Ms that agree to disagree


When I was 10 years old I started to systematically catalogue and categorize all my media. I think it started as a valve to ease the pressure of everyday school life: being constantly judged by teachers and parents, I wanted something to judge myself.

I started with comics: I would have a small collection of favourite comic books that I graded A, a bigger one graded B, and so on. Over the years my methods got more refined and I dissected the media into their different parts: I would judge the penciling and coloring apart from the writing, and I developed crude formulas that weighted the “value” of a comic book, like: 50% weight for the penciling, 30% for the writing, 10% for the coloring and an additional 10% for the cover. When I reached my teens I developed this system further with friends and applied it to all kinds of things: music, books, movies. Our decision process was very heated and we often adapted our value categories, because it couldn't be that a John Cougar Mellencamp record was numerically superior to any Springsteen record. And yes, we rated girls, too (we didn't get further in differentiation than face/body), years before social networks did it.
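The weighted formula for comics can be written down in a few lines. This is only a minimal sketch: the weights come from the paragraph above, while the 0-100 grading scale, the function name and the sample grades are assumptions made up for illustration.

```python
# Weights taken from the comic-book formula described above.
WEIGHTS = {"penciling": 0.50, "writing": 0.30, "coloring": 0.10, "cover": 0.10}


def comic_value(grades):
    """Return the weighted average of per-category grades.

    `grades` maps each category in WEIGHTS to a grade on a
    hypothetical 0-100 scale (the scale is an assumption).
    """
    return sum(WEIGHTS[part] * grades[part] for part in WEIGHTS)


# Made-up grades for an imaginary comic book, for illustration only.
example = {"penciling": 90, "writing": 70, "coloring": 60, "cover": 80}
print(round(comic_value(example), 1))
```

Changing a single weight is enough to flip a ranking between two records or comics, which is exactly the kind of adjustment we argued about.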

Without being aware of it, we (like millions of other teens worldwide who are obsessed with numbers) anticipated Metacritic and Facebook.

Were we so naive as to assume that the numbers we subjectively contributed to the discussion actually measured something objectively? I hope not. They were always intended to explain to ourselves how we arrived at our verdicts about the good or bad quality we perceived in certain artworks. It was not scientific, and we could never have imagined at that time that our verdicts would get “more objective” the more we trained them. Our tools for evaluating artwork improved with our knowledge, sure. You can call our decision process more informed after we had heard 1000 records, compared with the beginning, when we considered Mike Oldfield and Chris de Burgh the most valuable assets of modern pop music. It was also very obvious that our numbers changed all the time.

If the 10,000-hours theory is even remotely true, I hate to break it to the official critics: in this day and age practically every young person has clocked that much time gaming, listening to records and watching films by her or his 25th birthday. We are all involuntary experts when it comes to mass media.

Facts & Figures

I won't lie to you, it took me some years to actually discover the fundamental flaw that lies at the core of Metacritic. Up until last week I never noticed that Metacritic in fact runs a double-standard system. I would never have thought any institution could get away with fiddling its numbers so obviously.

What is wrong with this picture?


And what explanation does Metacritic give to justify this treatment?

Why is the breakdown of green, yellow, and red scores different for games?

The reason for this special treatment for games has to do with the games publications themselves. Virtually all of the publications we use as sources for game reviews (a) assign scores on a 0-100 scale (or equivalent) to their reviews, and (b) are very explicit about what those scores mean. And these publications are almost unanimous in indicating that scores below 50 indicate a negative review, while it usually takes a score in the upper 70s or higher to indicate that the game is unequivocally good. This is markedly different from movies, TV or music, where a score of, say, 3 stars out of 5 (which translates to a 60 out of 100 on our site) can still indicate that a movie is worth seeing or an album is worth buying. Thus, we had to adjust our color-coding for games to account for the different meaning of games scores compared to scores for music, movies and TV

I don't get it.

A site that is dedicated to merging ratings from different systems is unable to do it properly precisely when the medium in question uses the same basic 100-point scale as the site itself?

If the overwhelming majority of game critics agree that only the upper 5 values of their scoring system are used, meaning they are not as draconian as, for example, movie critics in damning a bad movie, why wouldn't Metacritic adjust its ratings according to that information? It has no problem mapping multiple different star and thumb ratings to its needs, so why not do the same with the grades of game magazines? Further research uncovered this article (“Understanding review scores in the meta critic age”). It basically states what another article on Gamasutra also hinted at recently: the gaming industry is far more fixated on the Metacritic score than any other.

Developers at big studios may somehow be punished by their employers for mediocre scores, and game magazines might run into trouble with advertisers if they underrate a blockbuster game. So far, so bad.

Imagine a music magazine that rates all its records with a three-star rating system, 0/1/2/3 meaning bad/mediocre/good/excellent, and then states that all genres (classical/pop/jazz) are treated alike, except heavy metal, where 0/1/2/3 means inaudible/what?/I-can-hear-something/that’s-better. Its argument: since heavy metal fans are to an overwhelming degree near-deaf, they can't really appreciate anything other than degrees of volume.

Now imagine a college that categorically grades all its foreign students better than its native students, to counter the disadvantage that English is not their first language. The foreign students end up with nearly the same average grade as the native students, but every headhunter secretly knows that a Chinese B translates to an English C. Thus the bias towards native students that might be inherent in the school system is only clouded, not abolished.

In the end, game critics will run into problems with this “hype creep”. In fact, if you read the reviews in most of the big magazines, you wonder whether the critic played the same game he slapped a grade on. Instead of speaking directly to the audience, they tend more and more to encode their language in politically correct terms. It's a lot like the business world, where a boss writes in a reference about your work performance: “He did his best”, but for you to actually be considered for a new job the reference should say: “He constantly excelled in his performance”. “He did his best” is manager-speak for “tried hard but failed”.

What outcome would we expect if we forecast how, over a large number of data samples, the averaged reviews would mirror this “correction”? Games should end up with a higher numerical Metacritic rating than movies, TV and music, but with a similar color-coded Metacritic rating when every red number counts as -1, yellow as 0 and green as +1, which emphasizes only the positive/negative outcome.
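The counting scheme just described (red = -1, yellow = 0, green = +1) can be sketched as follows. The color cutoffs in the code are an assumption: the text does not state them, and they are filled in from what Metacritic's published color bands were understood to be (green from 75 and yellow from 50 for games; green from 61 and yellow from 40 for movies, TV and music), so they should be checked against the site. The function names and the sample scores are made up for illustration.

```python
# ASSUMED lower bounds of the color bands; not stated in the article text.
BANDS = {
    "games": {"green": 75, "yellow": 50},
    "other": {"green": 61, "yellow": 40},  # movies, TV, music
}


def color_value(score, medium):
    """Map a 0-100 metascore to -1 (red), 0 (yellow) or +1 (green)."""
    band = BANDS["games" if medium == "games" else "other"]
    if score >= band["green"]:
        return 1
    if score >= band["yellow"]:
        return 0
    return -1


def tally(scores, medium):
    """Count how many scores fall into each color for one medium."""
    counts = {-1: 0, 0: 0, 1: 0}
    for score in scores:
        counts[color_value(score, medium)] += 1
    return counts


# Hypothetical scores, only to show the effect of the two sets of bands.
sample = [80, 70, 45]
print(tally(sample, "games"))
print(tally(sample, "movies"))
```

Under these assumed bands the same number, 70, counts as yellow for a game but green for a movie, which is the double standard in a nutshell.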

Here is a sample of 11800 games vs. 8400 movies: (Highscores overall/Alltime).


The problem with the games sample could be that it is "polluted" because it includes all platforms (mobile/consoles/PC), so the result could be distorted: successful (critically well-received) games and franchises tend to get wider distribution across more platforms than unsuccessful ones, which unfairly multiplies their influence. (After all, we don't get to count the 3D, 48 fps and normal versions of The Hobbit as three films, which would be comparable.)


So at this point I'm not sure what this all means.

Are games actually of higher quality on average than movies? (Meaning that while only every fourth movie ends up being “good”, nearly every third game does.)

Are game critics more easily persuaded (bribed) into favorable scores, as some sources suggest?

Or do they simply have lower quality standards?

Are gamers per se expected to have higher trash-tolerance than movie-goers?

Despite the fact that the Moviegoer-crowd largely overlaps with the Gameplayer-crowd?

Am I being paranoid or is something rotten in the meta of critic?



Many comments on this article indicate that its intention, which is not to contribute to any kind of conspiracy theory about the hidden reasons why Metacritic is "really" treating games differently, doesn't seem to have come across. That might be entirely my fault, since English is not my first language. So I will add an example of how we could get rid of the double-standard system on the one hand while taking into account, on the other, that game magazines have written themselves into the corner of an arms race (score inflation), thus killing two angry birds with one slingshot.

It simply proposes an arithmetical recalibration that can be done with third-grade math.

Taken from a comment below:

I am well aware that the core of the problem may lie in the fact that, since games use the "same" measure (100) as Metacritic, it's simply more convenient to adopt the score than to "recalibrate" it. Recalibration could be done by cutting the unnecessary appendix (for example, if the worst game ever created gets at least 20 points for the effort, then the 100-point game score system is effectively an 80-point one) and then dividing the remaining range equally. The best way, in my opinion, would be like this:

If we take 100 as the upper limit and 62.5 as the average, pure-yellow, mediocre game, then the lowest score any game could get would be 25 (62.5 sits exactly halfway between 25 and 100). We have 75 game units that have to be mapped onto Metacritic's 100, so 1 "gamecritic" point corresponds to 1.33 Metacritic points.


The new TR has at the moment a score of 85.
1. We have to subtract the 25 points that every game critic is used to counting in as an unspoken bonus.
-> TR now has an adjusted score of 60.
2. Now we have to multiply the 60 by 1.33:
60 x 1.33 = 79.8, which is roughly 80.

We now have effectively eliminated the need for a double-standard system.
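The two steps above can be sketched as a small function. It is a minimal illustration, not anything Metacritic offers: the function name, the default parameters and the clamping of out-of-range scores are assumptions added here, and the code uses the exact factor 100/75 instead of the rounded 1.33.

```python
def recalibrate(game_score, floor=25.0, ceiling=100.0):
    """Stretch a game score from [floor, ceiling] onto a full 0-100 scale.

    floor=25 is the "unspoken bonus" proposed in the text. Scores
    outside the range are clamped first (an assumption of this sketch).
    """
    clamped = max(floor, min(ceiling, game_score))
    # Step 1: subtract the bonus. Step 2: scale 75 game units up to 100.
    return (clamped - floor) * 100.0 / (ceiling - floor)


print(recalibrate(85))    # the TR example: (85 - 25) * 100 / 75
print(recalibrate(62.5))  # the "pure yellow" mediocre game
```

With the exact factor the TR example comes out at 80 rather than 79.8, and the mediocre 62.5 lands on 50, the middle of the recalibrated scale.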
