Game Developers and Game Reviewers in Ancient India. Not much has changed. Creative Commons copyright by Nagarjun.

If you take a casual glance around the media landscape you would be quite justified in thinking that the world of game reviewing is thriving.  There are lots of videogame publications in print, online and (tenuously) on TV, with lots of opinionating being directed at a lot of pixels across a wide variety of platforms.

In reality, video game reviewing is a disaster zone that is helping to ensure a steady supply of mediocre games that are enthusiastically embraced by a player population with frighteningly low expectations and a shallow fixation on gaming technology.

Is Anybody Listening to Me?
Setting aside the issue of quality for a moment, videogame reviewing’s major challenge is that as a cultural form it is almost completely ghettoized. If you are a hardcore gamer you know where to go to get your reviews. But if you are a non-gamer or even a casual gamer, the likelihood that you will ever come across a game review in some other forum not explicitly devoted to videogames is virtually nil. I’m not going to spend a lot of time talking about this issue at this stage; suffice it to say that this is a very different situation from that of any other cultural form. You can be completely uninterested in reading anything but your morning paper, but as you drop crumbs and slop coffee on your newsprint (or iPad) there is a pretty good chance you will nevertheless stumble upon a review for a movie, a TV show, a book, a play, a sculptural exhibit, the local flower show, the high school science fair, but, in most cases, you won’t come across a game review.

Of course, the fact that videogame reviewing is firmly ensconced in the ghetto could have everything to do with the fact that most videogame reviews aren’t like the reviews for other creative endeavours. Pop cultural critic Chuck Klosterman highlighted this in a 2006 piece for Esquire, where he argued that a major problem for the world of videogames, and a significant factor influencing their acceptance by our culture, is that we have a plethora of game reviewers but no actual game critics. In essence Klosterman argues that most game reviewing today is little more than consumer advice: narrowly focused description of the game in question, emphasizing consumerist criteria (how much gameplay will you get out of this?) and leading only toward a recommendation as to whether or not the game is worth buying. By and large Klosterman was correct at the time, and his assertion still describes the bulk of the reviewing scene. Videogame reviews therefore look strikingly different from even the mediocre criticism of other creative products that appears in newspapers and magazines.

And this is where game reviews contribute to their own isolation from the larger creative and entertainment discourse.  Because what is striking about every game review is how isolated the world of gaming is made to appear. Games are consistently treated as if they were created by people in a sealed biodome out in the desert and consumed by players in their own similar facilities. There is no acknowledgement that developers may have been influenced by cultural trends or that the games themselves might feed back through players into that culture, influencing our thinking about events, ideas, the human condition, anything.

You could, of course, argue that this is because games don’t in fact do any of these things but this blog is premised on not believing that easy argument.  It is rather the case that if you aren’t actively looking for something you don’t generally see it. Reviews of other media have their own sets of issues, of course. But even the writer of the most pedestrian book review, dedicated to simple plot and character description, often demonstrates at least a tacit awareness that the book is a cultural artifact shaped by an author’s awareness (or denial) of the world around them.

Every game is special in its own special way
The problem with game reviews is not just that they aren’t critical in the more comprehensive sense adopted by Klosterman, but that they aren’t even particularly critical in the narrow sense of being negative. Most games released every year, even those that sink without trace, are granted generally favorable reviews. This situation is exacerbated or, arguably, even ensured, by the fixation of game reviewers on numerical scoring. Whatever the form of the review (text or the increasingly popular video review) it is always accompanied by numerical scores, usually across several categories, tallied finally into an overall score, almost always out of ten. The appeal of this is the same appeal numbers have always had: the appearance of objectivity. The truth is also the same one numbers have always possessed: that appearance of objectivity simply masks different forms of subjectivity.

The big problem with this practice is that when you have such a scoring system it is extremely difficult, for a variety of human-nature reasons, to give something a really low score. You can try this yourself with a group of friends. Watch a pretty bad movie and ask people to score it across different criteria out of ten and see what happens. I would be surprised if the movie scores below a five. Even people who are otherwise quite comfortable being a complete and utter bastard in other areas of their life seem inherently to want to grant things some redeeming value. For this reason they will often compensate for low scores in most categories with an inflated score in another (“this game sucked in every conceivable way but I give the graphics an 8!”).

To see how this works in practice, consider Doom 3 (2004), a game I consider to be the Madonna of the videogame world: derivative, massively overrated at the time of its release and still overrated today. Consider IGN’s review of the game, which, to be fair, with its 8.9 rating comes across as moderately restrained compared with the assessment by some other reviewers who clearly needed to cut back on the Red Bull. At the time the site used five criteria: presentation, graphics, sound, gameplay and lasting appeal. The game scored 10, 10, 9.5, 8.0 and 8.5 in these areas. We could quibble about these scores (I would hotly dispute them) and even raise questions about the definition of the categories (graphics and sound are clearly part of presentation so what exactly is that category?) but these things are all fairly typical of the gaming press who by and large seem to have come up with their scoring systems in the wake of an epic kegger.

In 2004 Gamespot (which gave Doom 3 what was at the time a ball-busting 8.5) used the following categories: graphics, sound, gameplay, value, and reviewer’s tilt. What the hell is a reviewer’s tilt? Gamespot’s own description is not particularly helpful: “While this component is not intended to be directly representative of our overall experience with a game, we use it to influence the overall rating one way or another, based on our overall experience. For instance, a technically impressive game that’s highly unoriginal may be tilted low, while a game with a truly outstanding story but an unremarkable presentation may be tilted high.” Hmm. So this category isn’t intended to describe the overall experience with the game but is instead designed to describe the overall experience of the game. Riiiight. Many publications have (or used to have; Gamespot ditched the categories and adopted a single holistic score in 2007) a similar “fudge” category whose sole purpose seems to be to compensate for inadequacies in the other categories. Or rather, for inadequate application of those categories by the reviewer. After all, wouldn’t scores in the other categories already reflect, say, a technically impressive (high graphics and sound scores) but mediocre (low gameplay and value scores) game?

The problem with any scoring system is that you actually need people with the balls to apply it. Here, for example, is the potted version of the reviewer’s reasoning for the 8.0 score in gameplay: “Run to the next room and shoot stuff. Go to the next room and do the same thing. Wouldn’t be any sort of problem if the enemies were more interesting to fight against.” So, let’s get this straight. In a game whose whole reason for being is to be a plotless run-and-gun experience, the core gameplay is repetitive and the enemies are not challenging. And you give it an eight!? Shouldn’t these be failure criteria? Why doesn’t this earn the game a score of 3? Or 2? To be fair, the reviewer announces themselves to be a moron. . .er, sorry, person of questionable judgment, in the first couple of sentences of the review: “DOOM 3 is a great game. Not necessarily for the gameplay aspects, but for the fact that my eyes and ears never went a moment without being completely entertained.” The point of a game is the gameplay. If I want a great audiovisual experience I will rent a movie. If the gameplay sucks, the overall game review should indicate that.

“But dude, that would just, like, totally harsh the mellow!” Yes, that would be the point of being a critic. Your job as a critic is not to blow smoke up the arse of developers. Small wonder, then, that many of my students seem to harbor the tacit suspicion that game companies are paying for favorable reviews. While there is little evidence of this, game reviewers may in fact still be pulling their punches out of the worry that their jobs (and supplies of free games) will disappear if they write negative reviews. A groundless concern? It is difficult to say; there is a lot of lore out there that negative game scores hurt a company’s bottom line, and that some companies take action to avoid negative reviews, but given the difficulty (noted above) of prising any reliable information out of game companies it is hard to determine the facts with any certainty.

But what this one example indicates—and sadly, it is entirely representative of review practices as a whole in the world of games—is that the entire system is not only poorly implemented but multiply broken.

  1. Many publications that use a category system don’t give any evidence of having thought out those categories in any meaningful sense.
  2. The categories themselves rarely appear to be weighted. The IGN example (and while Doom 3 is an older game IGN is still using this same system) is particularly egregious. The stereotype that even a lot of gamers have about themselves is that flashy graphics will trump good gameplay. By in effect granting graphics and sound a similar weighting to actual gameplay, review sites like IGN perpetuate that stereotype.
  3. The scoring categories for many review sites are skewed toward the upper end of the spectrum mainly by language that so negatively loads the range for 5 and below. Most sites define the 5-6 realm as mediocre or as games with significant problems. This ensures that the merely competent game is virtually assured of a score of 7 or above (I discuss this more below).
  4. All review sites suffer from a lack of quality control. This is not to say they should be ruthlessly trying to eliminate subjectivity, which is often the fantasy scenario invoked by legions of players who complain when a reviewer gives their favorite game a less-than-stellar review. Rather the point is that if you are going to have a ratings system, reviewers who don’t use it should be slapped. With regard to the Doom 3 example, IGN’s own scoring criteria describe a score in the 8.0 to 8.5 range as “great.” By the reviewer’s own admission, Doom 3 should have received somewhere in the 6.0 to 6.5 range (defined as “Okay. . .while this game is passable, it’s probably only worth a rental”) if you were being charitable, or more likely 5.0-5.5 (“Mediocre. This game is on the cusp of being bad.”).
  5. The review criteria are radically incomplete when compared with reviews of other media. The one category that should be there for every game review is Innovation (or Creativity, or, better still, Originality). In reviews of many other media originality and innovation are usually implicit but often explicit concerns. Are the characters in this novel singular or derivative? Does this film tell us anything new? The complete absence of originality as a scoring category reflects its complete absence as a meaningful concern in the more extended reviews. This in turn reflects the bad faith that is game reviewing’s collective guilty secret. If originality were a criterion, let alone a weighted criterion, most games that are released every year would get the crushingly mediocre scores that they so richly deserve.
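Point 2 above is easy to see with a little arithmetic. Here is a minimal sketch, using IGN’s published Doom 3 category scores, of how much an equal-weight average flatters a game with superb production values and mediocre gameplay. The weights in the second calculation are hypothetical (no review site publishes a weighting); they simply illustrate what happens when gameplay is made to count for more than graphics and sound:

```python
# IGN's published Doom 3 (2004) category scores.
scores = {
    "presentation": 10.0,
    "graphics": 10.0,
    "sound": 9.5,
    "gameplay": 8.0,
    "lasting appeal": 8.5,
}

# Equal weighting: every category counts the same.
equal_mean = sum(scores.values()) / len(scores)
print(equal_mean)  # -> 9.2

# Hypothetical weighting that makes gameplay dominant
# (weights sum to 1.0; the exact numbers are illustrative).
weights = {
    "presentation": 0.10,
    "graphics": 0.10,
    "sound": 0.10,
    "gameplay": 0.45,
    "lasting appeal": 0.25,
}
weighted_mean = sum(scores[c] * weights[c] for c in scores)
print(weighted_mean)  # roughly 8.675
```

Even this modest reweighting pulls the game down more than half a point, out of “great” territory and toward the merely good, which is exactly the judgment the reviewer’s own gameplay commentary supports.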

Instead, the result of the current system is that the vast majority of games receive generally favorable scores, in the 7-9 range. You would be justified in concluding that the world is awash in innovative, competently developed and well executed games. Anyone who has actually played any games knows this to be far from the truth. The scores have far more to do with the fact that the scoring categories are skewed heavily toward the positive end. I’ve already mentioned IGN’s scoring criteria for the 5 and 6 ranges. Now let’s have a look at those from Gamespot. A game that falls in the 7.0 to 7.9 range is described this way: “A game within this range is good overall, and likely worth playing by fans of the particular genre or by those otherwise interested. While its strengths outweigh its weaknesses, a game that falls in this range tends to have noticeable faults.” A game in the 6.0 to 6.9 range has “certain good qualities but significant problems as well. These games may well be worth playing, but you should approach them with caution.”

Imagine yourself as a game reviewer confronted with a strikingly unambitious, thoroughly derivative game. Notice how your scoring criteria tend to mention “problems.” In the absence of any interest in originality, problems are inevitably going to be defined as technical issues: glitches, clipping, voices out of sync, etc. You look at the derivative shooter you’ve just finished playing. There’s no way you would describe it as having “significant problems.” Moreover, it hardly rises (or sinks) to the level of a game that should be “approached with caution.” So suddenly we are into 7 territory. But look at the description of that category for a second. A game that scores 7.9, the equivalent of 79%, can still have “noticeable faults.” So these scoring criteria accurately predict what we do in fact see: a lot of ordinary, flawed games that nevertheless get scores that would get them into an Ivy League college.

This situation explains why so many of Ben “Yahtzee” Croshaw’s reviews for The Escapist are so excoriating. Sure, going negative in a big way is part of his schtick. But the major reason he finds so many games wanting is that he is virtually alone in applying the criterion no one else is willing to apply: originality. Once you are willing to go down that road, you find that Yahtzee is, in most cases, strikingly accurate in his judgements: most games fail abysmally in this area, constituting re-skinned retreads of the great game ideas of yesteryear. Almost no one else is prepared to go down that road, however, because it would mean that developers, reviewers and players would all have to face up to some hard truths about the products they are making, professing to love, and pretending to critique.

Will this be on the test?
This still doesn’t explain why the numerical scales that game reviewers employ are so skewed, or why the scoring rubrics are constructed in such a way that they push reviewers toward awarding high scores to mediocre games.

Earlier, I asserted that numbers have an objective appearance that masks a deeply subjective nature. What I mean by that is that numbers can function almost poetically, which is to say imagistically: we see them and they come with a variety of connotations. Consider: if I am right about stuff in general 75% of the time, I would consider that a pretty good result (it is actually closer to 95%, but let that go). On the other hand, being right 50% of the time is basically to admit to guesswork. But if you have been weaned on baseball, you know that batting 500 is pretty damn good. Batting 750 for an extended period of time puts you in the Heavenly Hall of Fame. So numbers convey a certain image.

I can’t help noticing how similar the skewed reviewing categories and inflated scores are to the kinds of scoring routinely employed in US education (both high school and college). I’m continually amazed at the bizarre range of scoring mechanisms employed by my academic colleagues, and this is even before we get into the completely unwarranted practice of “curving” grades. Pedagogically, grading curves are indefensible. If you have a lot of people passing your class, either you are doing a bang-up job as a teacher or the material is too easy. If you have hardly anyone passing your class then a) maybe your students really haven’t mastered that material and should be apprised of that fact sooner rather than later, or b) you suck as a teacher, or c) the material is too hard. Curving grades, of course, nicely avoids having to do any of the hard work of self-scrutiny and pedagogical retooling. It also ensures that nearly all of your fee-paying customers (whom in the old days we called students) will still be fee-paying customers the following semester.

But where the world of education intersects with the world of game reviewing is in the arbitrariness of the scoring system and the problem of grade inflation.  I’m always fascinated by the angst that is produced when I ask my students to design one of their own assignments, and then evaluate it, which of course means that they have to set up a grading rubric.  I start hearing comments like, “But a B has to be a 79, because it is in all my other classes.”  Usually one of their classmates will respond with “But in my high school a B is. . .” or “Over in the business school a B is. . .”  What is happening here is students beginning to confront the fact that they have taken these arbitrary scoring systems as objective fact: the idea that there is an essential “B-ness” out there in the universe that people have researched/intuited and for which they have divined an appropriate numerical representation.  In reality, of course, any grading/scoring system’s purpose is simply to establish a consistent and replicable system.  Most of my students begin to understand this; some can’t, or won’t.

What is striking, however, is the way in which so many of the scoring systems that I hear about for other classes assign A grades for what seem to me to be pretty low scores: an 85, etc. The reason, of course, is obvious. It allows John or Jane to rush home to Mommy and Daddy, their report card still moist from the sweat of hard work (i.e. cramming the night before), and to be hailed as a genius. It allows people who do pretty well in a course to come away thinking that they are exceptional. You can see the obvious connection with the world of games. We’ve constructed a similar system that reassures game developers that they are doing an exceptional job when they present us with the merely ordinary. It is hard to say whether we have the game reviewing scoring systems we do because of the influence of the US education system over the last couple of decades, or whether both are part of a larger cultural shift that has grown increasingly uncomfortable with the idea that excellence is really, really rare.

By now, the obvious answer will probably have occurred to many of you.  If there are some demonstrable problems with individual game reviewing sites, isn’t the answer simple?  Why don’t we turn to the power of the Crowd and worship at the altar of the great Aggregator?

Which brings us, inevitably, to Metacritic.