[This is an ongoing series concerning the development of a new bad movie metric actively being researched by Patrick. This is installment #1]
The MPAA Rating Factor
To quote a famous bad movie researcher:
[The major flaw with the BMeTric is that it] is ever changing. As the current movie vote/rating data changes so does the baseline. And as a movie’s vote/rating changes its BMeTric changes.
Patrick, Distinguished Professor of Bad Movie Science and Technology, badmovietwins.com (February 2016)
This flaw also makes calculating the BMeTric of a film prior to release impossible due to its reliance on temporal data. Specifically, the number of votes and the rating on IMDb are non-existent prior to release and unreliable until the vote count reaches a relative steady-state. This project hopes to remedy this difficulty by forming a time-independent BMeTric, or BMeTric Live!
To start, it is important to identify some of the parameters we might look at in a time-independent BMeTric. A good starting point is looking at the available data from omdbapi. The parameters from that are: title, year, rating, runtime, genre, release date, director, writer, cast, Metacritic score, IMDb rating, IMDb votes, poster, plot, language, country, awards, tomatoMeter, RT reviews, RT fresh reviews, RT rotten reviews, RT consensus, RT userMeter, RT userRating, RT userReviews, box office gross, production company.
Most of these can be immediately eliminated. Specifically, because they are temporal: Metacritic score, tomatoMeter, IMDb rating, IMDb votes, awards, RT reviews, RT fresh reviews, RT rotten reviews, RT userMeter, RT userRating, RT userReviews, RT consensus, and box office gross are all gone. Analysis of the title, plot, or poster is also rather difficult, so throw those guys out for now. The year is rather useless since the metric will only be applied to upcoming releases, and country and language go as well because we tend to watch US releases in English exclusively. That leaves a rather svelte initial set: MPAA rating, runtime, genre, release date, director, writer, cast, and production company. That list honestly looks really good to me, kind of exactly what I would hope to incorporate into a time-independent BMeTric.
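The winnowing above can be sketched as a simple filter. This is just my own illustration, not actual BMIT code, and the snake_case field names are assumptions standing in for the omdbapi response fields:

```python
# Hypothetical omdbapi-style field list (names are assumptions).
OMDB_FIELDS = [
    "title", "year", "rated", "runtime", "genre", "released", "director",
    "writer", "actors", "metascore", "imdb_rating", "imdb_votes", "poster",
    "plot", "language", "country", "awards", "tomato_meter", "box_office",
    "production",
]

# Temporal fields (only meaningful after release) get dropped, as do the
# hard-to-analyze (title, plot, poster) and unhelpful (year, country,
# language) ones.
TEMPORAL = {"metascore", "imdb_rating", "imdb_votes", "awards",
            "tomato_meter", "box_office"}
HARD_OR_USELESS = {"title", "plot", "poster", "year", "country", "language"}

TIME_INDEPENDENT = [f for f in OMDB_FIELDS
                    if f not in TEMPORAL | HARD_OR_USELESS]
print(TIME_INDEPENDENT)
# ['rated', 'runtime', 'genre', 'released', 'director', 'writer', 'actors', 'production']
```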
So let’s quickly look at the first one just to see what we can see: MPAA rating. First and foremost: only consider movies with an MPAA rating of G, PG, PG-13, or R. Simple rule. To quickly point out: G, PG, PG-13, and R cover all movies we’d likely consider for the time-independent BMeTric. And as far as backtesting and fitting are concerned, all movies (except maybe Showgirls?) also fit into those four major categories. There might be some bias because PG-13 was introduced in 1984, but besides the rare few borderline cases (Gremlins, Temple of Doom, etc.) I personally don’t think it will affect the data all that much. Films prior to 1980 are rarely considered by BMT, so any excluded because they don’t have an MPAA rating shouldn’t throw things off too much.
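The simple rule amounts to a one-line filter. A minimal sketch, assuming each film is a dict with a `rated` key (my assumption, mirroring the omdbapi `Rated` field):

```python
MAJOR_RATINGS = {"G", "PG", "PG-13", "R"}

def keep(movie):
    """Keep only films whose MPAA rating is one of the four major buckets."""
    return movie.get("rated") in MAJOR_RATINGS

# Toy examples: Showgirls (NC-17) falls outside the rule, Gremlins stays.
films = [{"title": "Showgirls", "rated": "NC-17"},
         {"title": "Gremlins", "rated": "PG"}]
print([f["title"] for f in films if keep(f)])  # ['Gremlins']
```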
So to start, a hypothesis: I think G will have a lower BMeTric in general and PG-13 will have a higher BMeTric in general. This is because I think G-rated films will just generally have fewer votes, and PG-13 rated films will cover films appealing to a “wide base” of viewers (more votes and lower ratings).
Initial results: Here, all I wanted to look at is basically the mean, median, and major quantiles (25th and 75th) of the rated groups relative to the wider population (all rated films). Box and whisker is pretty standard (although I went sans-whisker, which would typically mark the 5th and 95th percentiles. For those interested, that is because the BMeTric is constrained to be between 0 and 100 and is more exponentially shaped than normally shaped, so the 5th percentile is pretty much zero and the 95th is around 85 for every rating, which just makes the graph look dumb while providing no information), and guess what?
Totally nailed it! If this isn’t obvious from this incredibly information-dense figure, long story short: G-rated is lower, PG-13 is higher, and the other two are enigmatic. I put error bars on the mean (the dot), and then error bars on the median (the red line) and the 25th and 75th percentiles (bottom and top of the box respectively), all via bootstrap (although the data isn’t much different using a central limit theorem approach, I checked; the bootstrap was just easier for subsequently getting error bars on the factors). As a first pass we can generate a factor for each rating:
| Rating | Mean Factor | 25th Percentile Factor | Median Factor | 75th Percentile Factor |
|---|---|---|---|---|
Pretty much in line with what would be expected. Note that PG and R are a little mixed up. That is basically because R-rated movies tend to have a more narrow distribution (higher 25th percentile, lower 75th percentile), and PG-rated movies are the opposite. It is something I want to make sure to account for in the future if possible, so I listed all of them. If I were to make a metric right now, though, I would take the easy way out and just adjust by the mean factor, that’s it. I was also surprised at how definitive it all was. The error bars kind of leave no doubt: PG-13 rated movies have a higher BMeTric in general, and G-rated movies have a lower BMeTric in general. Nailing it all day over at BMIT (natch)!
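For the curious, the mean factor plus its bootstrap error bar can be sketched as follows. This is my own reading of the procedure, not the actual BMIT code, and the samples are toy data shaped vaguely like the BMeTric (exponential-ish, clipped to [0, 100]):

```python
import numpy as np

def mean_factor(group, population):
    """First-pass factor: the group's mean BMeTric over the population mean."""
    return np.mean(group) / np.mean(population)

def bootstrap_factor_ci(group, population, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample both samples with replacement,
    recompute the factor each time, then take the alpha/2 and
    1 - alpha/2 quantiles of the resampled factors."""
    rng = np.random.default_rng(seed)
    group = np.asarray(group)
    population = np.asarray(population)
    boots = [
        mean_factor(rng.choice(group, group.size, replace=True),
                    rng.choice(population, population.size, replace=True))
        for _ in range(n_boot)
    ]
    return np.quantile(boots, [alpha / 2, 1 - alpha / 2])

# Toy BMeTric-like samples (not real data).
rng = np.random.default_rng(1)
population = rng.exponential(scale=20, size=2000).clip(0, 100)
pg13 = rng.exponential(scale=26, size=400).clip(0, 100)

factor = mean_factor(pg13, population)
lo, hi = bootstrap_factor_ci(pg13, population)
print(f"PG-13 mean factor {factor:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```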
What contributed to the upgrades and downgrades? Looking to the votes and ratings individually:
- IMDb Rating / log(Votes): 6.34 / 3.93 (Total)
- IMDb Rating / log(Votes): 6.61 / 3.81 (G)
- IMDb Rating / log(Votes): 6.37 / 3.85 (PG)
- IMDb Rating / log(Votes): 6.28 / 4.14 (PG-13)
- IMDb Rating / log(Votes): 6.33 / 3.88 (R)
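Per-rating summaries like the ones above come down to a quick group-by. A minimal sketch with toy records (the ratings and vote counts here are illustrative, not the real aggregates):

```python
import math
from collections import defaultdict

# Toy (rated, rating, votes) records — illustrative only.
records = [
    ("G", 6.6, 6_500),
    ("PG-13", 6.3, 14_000),
    ("PG-13", 6.2, 13_500),
    ("R", 6.4, 7_500),
]

# Group films by MPAA rating, keeping (rating, log10 votes) per film.
groups = defaultdict(list)
for rated, rating, votes in records:
    groups[rated].append((rating, math.log10(votes)))

# Average rating and log(votes) within each group.
summary = {}
for rated, rows in sorted(groups.items()):
    mean_rating = sum(r for r, _ in rows) / len(rows)
    mean_log_votes = sum(v for _, v in rows) / len(rows)
    summary[rated] = (mean_rating, mean_log_votes)
    print(f"{rated:>5}: rating {mean_rating:.2f} / log(votes) {mean_log_votes:.2f}")
```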
Basically, G-rated films have the highest ratings and lowest number of votes, and PG-13 movies tend to have the highest number of votes and lowest ratings. I guess people don’t like to just go onto IMDb and slam kids’ movies? Otherwise pretty much exactly my hypothesis. Interesting stuff. I’m also kind of impressed with the BMeTric’s ability to cut through the noise. Look at the rating/votes numbers and then the BMeTric numbers, and it seems to me that the BMeTric is pretty good at not only reducing the dimensionality of the data, but its combination of the two also amplifies the differences between groups. It is nice.
How might we use this to help determine BMT Live!? To give an example, maybe we need to choose between two movies. On April 29, 2016, for example, maybe you are struggling to choose between the Garry Marshall instant classic Mother’s Day and the video game adaptation Ratchet and Clank. You don’t know anything about the films except that Mother’s Day is PG-13 and Ratchet and Clank is PG. This analysis would suggest that Mother’s Day would be the better bet (and I would agree). But I’m also willing to bet that this won’t end up being a hugely important factor; how we mix these factors will come at the end of the study. I am liking the path we are on at the moment, though; I am a bit more confident that a metric with some value might come of this analysis.
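As a parting sketch, the decision rule in that example could look like this. The factor values below are hypothetical placeholders (only their ordering, PG-13 above PG, reflects the analysis; the real fitted numbers would come from the table):

```python
# Hypothetical mean factors per rating — placeholder numbers, not the
# real fitted values; only the PG-13 > PG ordering is taken from the
# analysis above.
MEAN_FACTOR = {"G": 0.7, "PG": 1.0, "PG-13": 1.2, "R": 1.0}

def pick(candidates):
    """Given (title, MPAA rating) pairs, prefer the film whose rating
    bucket carries the higher mean BMeTric factor."""
    return max(candidates, key=lambda c: MEAN_FACTOR[c[1]])[0]

print(pick([("Mother's Day", "PG-13"), ("Ratchet and Clank", "PG")]))
# Mother's Day
```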