Star Wars: The Good, the Bad and the Ugly

There’s often talk about separating the Star Wars movies into three tiers, which gives you a relative, coarse-grained ranking of the seven films: better, mid-range and worse. Or perhaps we should call them “the good, the bad and the ugly”.

As luck would have it, my academic background is in computer science, with my honours research partly focusing on data mining. The other day it occurred to me that we can easily apply something called k-means clustering, which basically partitions data points into groups based on how close together they are, to determine these tiers with some modicum of objectivity. In this analysis, I’m going to use Rotten Tomatoes to estimate the quality of each film.
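To make the idea concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not the code behind this analysis: the function name, the choice of starting centres and the six points are all assumptions made up for illustration, and the points are not the films' actual scores.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: assign each point to its nearest centre, then
    move each centre to the mean of its members, until nothing changes."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centres are too
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centre if a cluster ends up empty
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Made-up (critics %, audience %) points, NOT real Rotten Tomatoes figures.
points = [(90, 95), (85, 92), (70, 96), (65, 66), (40, 60), (42, 57)]
# Assumed starting centres: three of the points themselves.
labels, centres = kmeans(points, [points[0], points[3], points[4]])
print(labels)
print(centres)
```

The final centres play the role of the “prototypical” scores for each tier. In practice k-means is usually run from several random starting points, since the result can depend on where the centres begin.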

People argue that Rotten Tomatoes is needlessly simplistic, but I personally see it as a form of Monte Carlo integration: it estimates a film’s quality by seeing how many critics and audience members actually think it is generally “good”. It’s like estimating the area of a circle by scattering random samples over a square drawn around it and seeing what fraction of them fall inside the circle. (You can get surprisingly close to the real area of a circle with only 300 samples, for what it’s worth, which gives you an idea, by analogy, of how good the Rotten Tomatoes scores are at estimating quality.)
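The circle analogy is easy to try for yourself. The sketch below is only an illustration (the function name and the seed are my own choices): it scatters points over the square that encloses a circle and scales the square's area by the share of points that land inside.

```python
import math
import random

def estimate_circle_area(n_samples, radius=1.0, seed=None):
    """Estimate a circle's area from the share of random points that land inside it."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        # Draw a point uniformly from the square that encloses the circle.
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:
            inside += 1
    square_area = (2 * radius) ** 2
    return square_area * inside / n_samples

estimate = estimate_circle_area(300, seed=42)
print(f"estimate: {estimate:.3f}, true area: {math.pi:.3f}")
```

With 300 samples and a unit circle, the standard error of the estimate works out to a little under 0.1, so a typical run lands within a few percent of the true area.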

With Rotten Tomatoes, you get two estimated scores: one from critics (with a mean of around 65%, or around 60% for “top critics”) and one from audiences (with a mean of around 70%). This gives us two dimensions to work with. Just to be conservative, I’ve decided to only use “top critics” (because they’re less likely to engage in click-bait) and apply additive smoothing in order to account for the small sample sizes; the audience score I left as-is.
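For anyone unfamiliar with additive smoothing, here is a rough sketch of the idea. The pseudocount of 1 (plain Laplace smoothing) and the review counts are assumptions for illustration only, not the exact values used for the plot.

```python
def smoothed_score(fresh, total, alpha=1.0):
    """Additively smoothed share of positive reviews.

    There are two outcomes (fresh or rotten), so alpha is added once to the
    numerator and twice to the denominator. alpha=1.0 is an assumed value.
    """
    return (fresh + alpha) / (total + 2 * alpha)

# Hypothetical counts: both samples have a raw score of 90%.
print(f"{smoothed_score(9, 10):.1%}")    # small sample, pulled noticeably towards 50%
print(f"{smoothed_score(90, 100):.1%}")  # larger sample, barely moves
```

The effect is that a film reviewed by only a handful of top critics can't reach an extreme score as easily as one with a large pool of reviews.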

So… plotting the seven films and partitioning them into three clusters gives the following result:


(The overall percentage score is calculated as the relative “distance” from getting 0% from both critics and audiences versus a perfect score from both.)

We thus end up with three tiers of Star Wars films, each with “prototypical” scores based on the centre of each respective cluster:

The Good (green cluster)

Includes: Star Wars: A New Hope, The Empire Strikes Back, Return of the Jedi, The Force Awakens

Prototypical critics score: 80%
Prototypical audience score: 94%
Overall prototypical score: 87%

The Bad (blue cluster)

Includes: Revenge of the Sith

Prototypical critics score: 66%
Prototypical audience score: 65%
Overall prototypical score: 66%

The Ugly (red cluster)

Includes: The Phantom Menace, Attack of the Clones

Prototypical critics score: 40%
Prototypical audience score: 59%
Overall prototypical score: 50%
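
One reading of the “relative distance” calculation that reproduces the three overall figures above is the straight-line distance from (0%, 0%), taken as a percentage of the distance from (0%, 0%) to (100%, 100%). Treat this as a sketch of that interpretation; the function name is my own.

```python
import math

def overall_score(critics, audience):
    """Distance from (0, 0) as a percentage of the distance from (0, 0) to (100, 100)."""
    return 100 * math.hypot(critics, audience) / math.hypot(100, 100)

# Prototypical critics and audience scores of the three tiers listed above.
for tier, critics, audience in [("Good", 80, 94), ("Bad", 66, 65), ("Ugly", 40, 59)]:
    print(tier, round(overall_score(critics, audience)))
```

Note that this is not quite the same as averaging the two scores: the distance-based figure is always at least as high as the plain average, and rises above it the further apart the critics and audience scores are.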

In particular, The Force Awakens is essentially the prototypically “good” Star Wars movie in terms of overall response, slotting comfortably into the space occupied by the original trilogy. For a popular re-evaluation to trigger it falling into the same cluster as Revenge of the Sith, the audience score would need to fall to 78%—in other words, over 28,000 new negative audience scores (i.e. three stars or less) need to be registered, with no further positive scores, just to move The Force Awakens into the realm of mid-tier Star Wars offerings.
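
The arithmetic behind that kind of estimate is straightforward: if only negative ratings arrive, the number of positive ratings stays fixed while the total grows. The sketch below shows the calculation; the function name and the rating counts are placeholders of my own, not actual Rotten Tomatoes totals.

```python
import math

def negatives_needed(positive, total, target_share):
    """Smallest number of new negative ratings (with no new positive ones)
    that brings the positive share down to target_share or lower."""
    # We need positive / (total + x) <= target_share, so solve for x.
    return max(0, math.ceil(positive / target_share - total))

# Placeholder counts: 900 positive ratings out of 1,000, aiming for 75%.
print(negatives_needed(900, 1000, 0.75))
```

Dropping from 90% to 78% this way takes new negative ratings equal to roughly 15% of all the ratings already registered (0.90 / 0.78 is about 1.154).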

But the overall picture that emerges is even more interesting. Revenge of the Sith is definitely an outlier, neither as “bad” as the first two prequels nor as “good” as the original trilogy, and amongst fans who prefer a more nuanced understanding of the saga, that’s usually how things shake out. Reducing the analysis to two clusters, however, puts Sith squarely in the “bad” pile—that’s typically how the general public see these movies.

In any case, barring some radical re-evaluation, The Force Awakens is a return to form, shifting the balance such that there are now more “good” Star Wars films than “bad” by any reasonable measure. Now let’s see where Rogue One falls…

Addendum: For your further edification, here’s a dendrogram showing exactly how the clusters break down hierarchically. In terms of critical/audience response, you have two broad categories that branch off until you end up with the “best” (Star Wars: A New Hope and The Empire Strikes Back) and the “worst” (The Phantom Menace and Attack of the Clones). The Force Awakens is the closest yet that we’ve come to seeing the quality of the first two films.
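
For the curious, a dendrogram comes from agglomerative (bottom-up) clustering: start with every film in its own cluster and keep merging the two closest clusters. The sketch below is illustrative only. It assumes average linkage, which may not match the linkage used for the dendrogram, and the points are made up rather than the films' real scores.

```python
import math
from itertools import combinations

def agglomerate(points):
    """Repeatedly merge the two closest clusters; return the merge history."""
    def avg_dist(a, b):
        # Average linkage (an assumption): mean distance over all cross-cluster pairs.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    clusters = [(i,) for i in range(len(points))]  # each cluster is a tuple of point indices
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        merges.append((a, b, avg_dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Made-up (critics %, audience %) points, NOT real Rotten Tomatoes figures.
points = [(90, 95), (85, 92), (70, 96), (65, 66), (40, 60), (42, 57)]
for a, b, height in agglomerate(points):
    print(a, "+", b, f"at distance {height:.1f}")
```

Each merge corresponds to one join in the dendrogram, and the merge distance is the height at which the two branches meet.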


In order for The Force Awakens to end up on “the other team”, the audience score needs to drop to an incredible 48% from its current score of 90%. The Phantom Menace currently sits at 60%; Attack of the Clones sits at 58%.

However, something interesting happens when the audience score for The Force Awakens dips below 77%: Revenge of the Sith suddenly switches sides, looking more like a genetic relative of the more respected Star Wars movies and less like Episodes I and II. Not that this is likely to occur, mind you, but it’s more likely than a sudden swing where more than half of the audience dislike the new film.

What we can say, then, is that we can pretty much rule out a re-evaluation whereby The Force Awakens is lumped in with the prequels (or worse). As a long shot, it might instead force a favourable re-evaluation of Revenge of the Sith, assuming that audiences sour on the new film over time.

