Bad Movie Twins Bad Movie Data Analysis #2

This is the second in a series in which I've been breaking down wide-release bad movie data through time. The previous installment can be found here. The conclusion there was that one should split Rotten Tomatoes data at around 1998, since the data before and after 1998 (when Rotten Tomatoes was established) behave quite differently. At the end I suggested I wanted to start looking at more recent trends in bad movie releases. This analysis focuses on whether studios have become better at recognizing bad films and either not releasing them (or releasing them to Video on Demand (VOD), which to us is equivalent) or dumping them into the classic bad movie dump months (January, February, August, etc.).

Once again, let's briefly describe the data set. I collected from Box Office Mojo every film released to more than 600 theaters ("wide" according to Box Office Mojo), as that is one of our qualifying metrics. I had collected the Wikipedia, IMDb, and Rotten Tomatoes links for these films prior to the previous analysis. This analysis ended up being the first step in thinking about a model based on this data, a model that could eventually tell us things like "Here is a set of eight films being released this February; which are most likely to be bad, should we watch one of them, or should we wait until March?" It could also tell us a "fair yield" for a year's worth of bad movies, and eventually help identify VOD films which should qualify according to their properties (e.g. "In 2010 this film would have been released to 2000 theaters, but in 2020 it is released successfully to VOD").
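As a rough sketch of what that qualifying filter looks like in practice (the file name and column names here are hypothetical stand-ins for the scraped Box Office Mojo data):

import pandas as pd

# Hypothetical scrape of Box Office Mojo release data; the column names are illustrative.
releases = pd.read_csv("box_office_mojo_releases.csv")  # title, year, month, max_theaters, ...

# "Wide" release per Box Office Mojo: shown in more than 600 theaters.
wide = releases[releases["max_theaters"] > 600].copy()
print(f"{len(wide)} wide releases out of {len(releases)} total")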

Initially I was curious about whether there was an identifiable trend (outside of general yearly trends) in bad movie releases by month. My initial prior was: I think it makes sense that classic bad movie dump months like January are getting worse in general, and that previous semi-dump months like February and March are getting better, thus reinforcing the monthly differences we've seen previously. This was based on the recent discovery that zero wide-release films received less than forty percent on Rotten Tomatoes during June and July 2018, which makes it extremely likely that 2018 will become the first year since the establishment of Rotten Tomatoes without at least 52 wide-release films at 40% or lower on Rotten Tomatoes (a requirement for the continued existence of BMT for all eternity, naturally).

The first question to ask, though, is the one above: are we just seeing fewer bad movies recently? Are these films just being released to VOD (or not released at all)? I split the releases by Rotten Tomatoes score to get a sense of how the groups have been changing in the last 20 years:

RawYearData

So the answer to whether there have actually been a lot fewer bad releases recently is no, I think; the number of bad releases in the past ten years has been rather stable. But there are a few crazy things of note in this data. First, in 2007 over 50 wide-release films got below 20%, which is insane. The collapse of the bad movie industry coincides with the financial collapse, which I don't think is a coincidence; I think studios making films like Redline with ill-begotten fortunes went out of business in 2008 and simply have not come back. Second, the number of films with Rotten Tomatoes scores above 80% has ballooned. I think that is a case of multiple compounding factors, namely: (1) the MCU and other franchises are now consistently releasing good-to-great films multiple times a year; (2) independent films are getting more consistent wide releases; and (3) Rotten Tomatoes has become bigger and in general the largest films in a year are getting more and better reviews (as we saw in the last analysis). Regardless, we can use these films-per-year numbers to produce adjusted films per year (in order to prevent general year-to-year trends from obfuscating the monthly trends I'm interested in):

badMovieAdjustmentFactorSplit
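The exact adjustment isn't spelled out above, so here is a minimal sketch of the kind of thing I mean: normalize each year's count in a Rotten Tomatoes bucket by that bucket's average year, so that long-term growth or decline doesn't masquerade as a monthly trend. The DataFrame and column names are hypothetical.

import pandas as pd

# films: one row per wide release, with hypothetical columns
#   year, month, rt_bucket   (rt_bucket is the TomatoMeter category, e.g. "<20%", "20-40%", ...)
films = pd.read_csv("wide_releases_with_rt.csv")

# Count releases per year in each Rotten Tomatoes bucket.
per_year = films.groupby(["year", "rt_bucket"]).size().rename("n").reset_index()

# Adjustment factor (my assumption): a year's count in a bucket relative to that
# bucket's average year, so a busy year doesn't read as a string of busy months.
bucket_mean = per_year.groupby("rt_bucket")["n"].transform("mean")
per_year["adj_factor"] = per_year["n"] / bucket_mean

# Monthly counts divided by the year/bucket factor give adjusted films per month.
per_month = films.groupby(["year", "month", "rt_bucket"]).size().rename("n_month").reset_index()
per_month = per_month.merge(per_year[["year", "rt_bucket", "adj_factor"]], on=["year", "rt_bucket"])
per_month["n_adj"] = per_month["n_month"] / per_month["adj_factor"]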

Easy enough. I’m generally interested in three things. First, the average number of films released in a given month in each Rotten Tomatoes category. Second, the trend in these same numbers. And finally, the trend in the bad movie share for a given month. With these three plots I think we can get a clear picture of the traditional bad movie dump months, the trend in those months, and the trend in our bad movie probability in order to better inform our BMT Live! choices in the future.

totalFilmSplit

I just wanted to get an idea of traditionally good and bad months. This is the average number of films released across all 20 years in each TomatoMeter category. The dotted lines are the average films released across all five categories, and if you trace a line along the category values you can get a general sense of whether a month tends to release good or bad films. Notably, January, February, April, and (very slightly) August generally release bad films. November and December are the big months for good films. So how has this been changing (adjusting for general yearly trends)?

adjTotalFilmPercentile

Here it is quite interesting. Most months are in general a wash; May through September in particular don't have much of a trend. But it looks like January is getting worse, April is getting better, October is getting slightly worse, November is getting a lot better, and December is getting a lot worse. Perhaps November is becoming the main month where Oscar films are being pushed, and December is starting to clear out for larger fish (namely Star Wars), leaving bad Christmas kids' films. April getting better could also be a product of the summer season creeping earlier, with more summer films getting released in February and March (a la the MCU). Note that these trends are formed using the yearly adjustment factors in the second plot. Interestingly this is getting mighty close to how one would form a Rotten Tomatoes score model, so … that could be coming down the pipe.

All interesting and good things to know. Finally, since it is most important to know which months might be good for BMT, I also plotted the "share" of bad movies (the percentage of a year's worth of bad movies, <40% on Rotten Tomatoes, released in a given month) with a trend line:

adjBadFraction
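For the record, here is roughly how the share and its trend line could be computed; again the DataFrame and column names are hypothetical, and a linear fit is just one reasonable choice for the trend.

import numpy as np
import pandas as pd

films = pd.read_csv("wide_releases_with_rt.csv")   # hypothetical columns: year, month, rt_score
bad = films[films["rt_score"] < 40]

# Share of each year's bad movies that came out in each month (rows sum to 1.0).
counts = bad.groupby(["year", "month"]).size().unstack(fill_value=0)
share = counts.div(counts.sum(axis=1), axis=0)

# Simple linear trend (slope per year) in each month's share.
slopes = {month: np.polyfit(share.index.values, share[month].values, 1)[0]
          for month in share.columns}
print(sorted(slopes.items(), key=lambda kv: kv[1]))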

This reinforces some of the things said above: January is, somehow, getting worse, with about 12% of the year's bad movies now released in that month; and April and November are both getting much, much better in general. Other trends are a little less clear when you look at it this way; specifically, with all of the noise it is pretty unclear whether October and December are actually getting worse or not. July is almost definitely a mirage: literally zero bad wide-release films came out in July this year, so that +41% is going to take a huge hit if I recalculate next year.

All of this is super interesting. If I were to try and fashion some rules, they would be:

(1) For the first BMT Live!, try and get a good January film and just run with it; it is very likely to be the best bet.
(2) January-March and August-October are prime time for bad movies, and we might want to consider doing two good Lives in each of those spans if/when they become available.
(3) It seems likely that April-July are going to be very dry forevermore, so it shouldn't be surprising when 2018 repeats itself (missing the Spring BMT Live! because nothing became available); see rule number 2.

All good guidelines. In a single sentence: We have to get a little loosey goosey with our BMT Live!s because bad movies do seem to be released predominantly during certain months, and the trend seems to be reinforcing itself.

BMT:CSI:SVU (We’re the Special Victims) #3: Razzie Prep, A Look Back

This BMT:CSI:SVU was written around October 1, 2015 during the beginning of preparations for the Razzies. It is always difficult to determine which movies are more important to watch in theaters or right as they come out on DVD, so this short study was just an initial look at how we might connect the BMeTric to real Razzie results.

The problem the Bad Movie Twins face every year during Razzie preparations is the difficult choice of which movies are bad enough and big enough to earn the almost-meaningless dishonor of being nominated for a Razzie. As voting members we take our duty far more seriously than we should. So how best to determine which movies, prior to nominations, deserve our attention? That is where this comes in.

Alright, to start, the most important point during Razzie prep is the moment the prenominations arrive. That is when you actually know what smaller group of movies you are dealing with (as opposed to the ~600 movies released to theaters in a given year, it is whittled down to around 30). I'll have to go to the Wayback Machine (thanks, Internet Archive) to determine vote/rating counts on January 1 of a given year of study, because that is roughly when prenominations are known.

The method: get the BMeTric for "all" released movies based on IMDb votes and ratings approximated for 1/1/2015 via the Internet Archive (a simple linear extrapolation from the two nearest archived points around that date). Then separate out the movies prenominated, nominated, and the winners of the 2015 Razzies and do a side-by-side ranking based on how well they did in the Razzies versus our BMeTric.

In order to do this I also needed to define a Razzie Score. I decided that the scores for all movies in a given year should sum to 100, split into three equal parts: 33.3 spread across all the winners, 33.3 across all the nominees, and 33.3 across all the prenominees. In 2015 there were 108 prenominees, 45 nominees, and 9 winners (I counted combinations, like Cameron Diaz being nominated for both The Other Woman and Sex Tape, as 0.5 wins/nominations/prenominations for each of those movies). So a win was worth 3.7, a nomination 0.74, and a prenomination 0.3083. I'll adjust this in the future if it doesn't seem to work, but there is far too little data to really make a real model I think. Here are the results for 2015:

Razzie Analysis-1
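For concreteness, a minimal sketch of that scoring scheme is below; the movie lists are tiny placeholders (there were 108/45/9 credits in 2015), and fractional credit for shared nominations is handled by passing 0.5 instead of 1.0.

from collections import Counter

def razzie_scores(prenominees, nominees, winners):
    """Each tier gets 33.3 points split evenly among its (possibly fractional) credits."""
    scores = Counter()
    for tier in (prenominees, nominees, winners):
        total_credit = sum(credit for _, credit in tier)
        if total_credit == 0:
            continue
        for movie, credit in tier:
            scores[movie] += 33.3 * credit / total_credit
    return scores

# Hypothetical usage, with the Cameron Diaz split credit from above:
prenoms = [("Saving Christmas", 1.0), ("The Other Woman", 0.5), ("Sex Tape", 0.5)]
noms = [("Saving Christmas", 1.0), ("The Other Woman", 0.5), ("Sex Tape", 0.5)]
wins = [("Saving Christmas", 1.0)]
print(razzie_scores(prenoms, noms, wins).most_common())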

So I have two main takeaways, which really is one big takeaway. First, note the over-performers (movies that scored high on the Razzie Score and lower on the BMeTric): Saving Christmas, Transformers 4, TMNT, A Million Ways to Die in the West, and Expendables 3, mainly. These are all what I call "easy targets": Kirk Cameron, Michael Bay, Megan Fox, Seth MacFarlane, Sly Stallone. Having a big target attached boosts a film's score in the Razzie voters' eyes. On the flip side, look at the unnominated list: those are the unnominated movies with a BMeTric over 25. The yellows highlight horror films and the greens are Christian films. First, we need to stay away from horror films, Jesus Cristo. But to get back on track, basically all of those films are low budget, and low budget really means: no big targets!

So really there is one big thing that gets you that Razzie Score: targets … BMTargets. I'll leave it there. Where I'll want to look in the future is perhaps a Predicted Razzie Score. This involves two things. Mainly I'll have to determine BMTargets, and how they contribute to the score. Also, I'll need to actually work on the time-independent BMeTric to get a popularity rating without knowing the vote/rating count ahead of time (obviously very important). Once I have those I think I'll be able to determine with … accuracy is a strong word. But I think I might be able to identify "likely" Razzie targets.

BMT:CSI:SVU: We’re the Special Victims #2

This is a continuation of the long-term IMDb data analysis using the Internet Archive. Thanks Internet Archive! You can see part one of this series here. Cheers.

'Ello everyone. A few months back (or a few days with regards to this website) I tried to solve the BMysTery of the mysterious inflection point in IMDb data. Don't know what I mean? The short rundown is that a lot of movies seem to have two slopes, one for growth before 2011 and one for after. The previous post explored that and came to a (I think) reasonable conclusion. So what is all this about then? Well, I have a ton of data just lying around and something kicked up and itched my brain. Time for the long story.

You guys know Material Girls right? Hilary and Haylie Duff vehicle, pretty big deal. Well, every time we do a preview for a movie we generate trajectories for both IMDb rating and votes through time. Usually this results in a scream of “WHY?! Why has the rating of this terrible film gone up over time?!” And typically it was left there, because hey, people have different tastes, and maybe it is just kind of a trait of the data. But then Material Girls!

MaterialGirls_RV

First, holy moley, that 2011 inflection. Even the rating has an inflection! This was a huge red flag for me. Second, the rating jumps 2.5 points! That is patently absurd. Through all of this I couldn't help but think maybe … it was related to this recent blog post by FiveThirtyEight. But then I was looking through some of my very old programs and stumbled onto a very prescient comment:

#Look at that variance! Awesome, basically regression to the mean.
#Movies are superlative when they come out
#End up regressing both up and down to the mean

So that’s what this (short) entry will look at: The regression to the mean in IMDb ratings. Something I clearly knew about literally 7 months ago then managed to forget pretty much instantaneously … yeah, I’m an idiot.

First start with a plot of all of the rating data I’ve got:

Ratings

Nonsense. But you can kind of see that things condense as time goes by. It is all easier, though, if you plot the rating change (over ten years) against the initial rating of the movie. I've included a regression, and Material Girls is marked with a blue square:

RatingsPlot
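A minimal sketch of that regression (the file and column names are my own stand-ins for the interpolated Internet Archive data):

import numpy as np
import pandas as pd

# One row per movie, with its approximated IMDb rating on New Year's Day 2006 and 2015.
df = pd.read_csv("imdb_rating_trajectories.csv")   # hypothetical columns: title, rating_2006, rating_2015

initial = df["rating_2006"].to_numpy()
change = df["rating_2015"].to_numpy() - initial

# A negative slope is the signature of regression to the mean: low initial ratings
# drift up, high initial ratings drift down.
slope, intercept = np.polyfit(initial, change, 1)
crossover = -intercept / slope   # initial rating with roughly zero expected change
print(f"slope = {slope:.2f}, crossover rating = {crossover:.1f}")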

Nice. Pretty much the entirety of the crazy jump in ratings is explained by regression to the mean. Just look at Material Girls. And funny enough the rating at which it crosses over, 6.0, is kind of the cut off point for bad movies as well, which is fun.

It is interesting, especially looking at the first plot: the rating doesn’t just regress by some exponential, it pretty much follows the voting trajectory. But … yeah, they aren’t that correlated:

RatingsVotesPlot

The rating can't move without votes, so the fact that it follows the vote trajectory through time is, I think, just a consequence of that inherent underlying connection. And I think that'll just about do it. The regression to the mean is interesting, but probably hard to utilize for good at this point. It could be used in tandem with a vote-count trajectory predictor to try and predict vote/rating trajectories into the past. But predicting votes is the rub, and that I've found rather difficult.

But I declare this BMysTery closed! It wasn’t that hard, I mean, I apparently knew the answer seven months ago, but yeah, bad movie IMDb ratings tend to go up (and the opposite for good movies) over time. It isn’t people waking up and realizing movies are better than their rating, it is just regression to the mean. And Material Girls probably wasn’t brigaded by guys.

BMT:CSI:SVU: We’re the Special Victims #1

Editor’s Note: This analysis was initially performed in October of 2015. The plots have been cleaned up and updated (so that I don’t look like I’m incompetent). At the time we were basically just starting in on the use of historical data to explore the evolution of bad movies through time, an ongoing project. We are, of course, indebted to the Internet Archive which has been diligently collecting this data for years. Cheers.

Welcome to BMT:CSI:SVU (we're the special victims). This section uses high-tech forensic science (not really) to solve the mysteries of BMT … you could even call this a BMysTery. For the first in this series I naturally decided to present my boldest analysis to date. It asked the question: what really has happened to the voting and ratings through time on IMDb? The initial idea was to start in on trying to predict ultimate vote counts based on an initial vote trajectory … I got waylaid a bit. I started by taking every movie listed on OMDb with a release year of 2005 (658 movies); I wanted a solid 10 years of samples. I went to the Internet Archive and took 20 vote/rating samples (the two nearest archived pages on either side of each new year from 2006 to 2015) for every movie that had a valid page (a vote count greater than 5) prior to January 1, 2006 and after January 1, 2015. I finally just linearly approximated vote/rating pairs for each New Year's Day from 2006 to 2015.

The resulting data set had 471 movies (yeah … that took about 10,000 page calls, sorry Internet Archive), each with approximate vote/rating pairs at 10 data points (New Year's Day from 2006 to 2015).
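A minimal sketch of that interpolation step (the function and the snapshot tuples are my own illustration, not the original script):

from datetime import date

def interp_at(target, before, after):
    """Linearly interpolate a (votes, rating) pair at `target` from two archived
    snapshots, each given as (date, votes, rating)."""
    (d0, v0, r0), (d1, v1, r1) = before, after
    t = (target - d0).days / (d1 - d0).days
    return v0 + t * (v1 - v0), r0 + t * (r1 - r0)

# Hypothetical snapshots straddling New Year's Day 2010 for one movie:
snap_before = (date(2009, 12, 14), 41200, 5.6)
snap_after = (date(2010, 1, 20), 43900, 5.7)
votes, rating = interp_at(date(2010, 1, 1), snap_before, snap_after)
print(round(votes), round(rating, 2))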

RatingTime

The rating plot isn't that interesting; it shows that the ratings have dropped a little over time without much of a trend. This doesn't jibe with a lot of the individual plots I generated previously, which suggested rather strongly that the rating tends to increase as more votes are cast. Instead I ended up finding that there isn't a correlation between how the rating changes and the current rating or number of votes, something to be investigated further in the future I think.

If you normalize the voting trajectories by the number of votes on New Year's Day 2015, though, you get a more interesting result.

VotesNorm

Basically, it looks like the samples split into two groups: movies that gained most of their votes after 2010 and those that gained most of their votes before 2010. This is in fact a trend: there is an odd anomaly in IMDb data whereby movies seem to have an inflection point sometime in 2011. It can be seen more easily using the sum of all the votes; in 2011 the total vote count suddenly starts climbing much faster:

VotesMean

Say what? That doesn't quite jibe with what we saw in the trajectories before. But it is true: a bunch of movies either show the 2011 inflection (red) or appear to have leveled off since initial release (green, a much more expected trend). Here are the top and bottom 10, ranked by deviation from a linear trendline:

VotesExtrema


So in order to quantify the difference between the two trajectories I note that the mean normalized trajectory is roughly linear:

VotesNormMean

By subtracting this mean trajectory, the normalized and corrected vote-count trajectories now start and end at zero. If you then sum across a normalized and corrected trajectory, a normal trajectory will have a positive value and one with the 2011 inflection will have a negative value. I called this value S (for sum, inventive I know).
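In code, S is about as simple as it sounds; here is a sketch assuming the trajectories are stored as a (movies x years) array already normalized by each movie's 2015 vote count:

import numpy as np

def inflection_score(traj):
    """traj: array of shape (n_movies, 10), each row a vote trajectory normalized
    so its 2015 value is 1. Returns S per movie: positive for trajectories that
    level off early, negative for the 2011-inflection shape."""
    corrected = traj - traj.mean(axis=0)   # subtract the (roughly linear) mean trajectory
    return corrected.sum(axis=1)

# Toy example: one early-leveling movie and one late-gaining movie.
traj = np.array([
    [0.50, 0.70, 0.80, 0.85, 0.90, 0.93, 0.95, 0.97, 0.99, 1.00],   # concave, "normal"
    [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.50, 0.70, 0.90, 1.00],   # convex, 2011 inflection
])
print(inflection_score(traj))   # first value positive, second negative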

The thing I thought was interesting: if you then plot the S value against log(votes) from IMDb, you see a rather strong correlation between the two:

SCorrelation

Against the rating it is a bit more unclear. And while I won't get into the nitty gritty (mutual information, distance correlation, and partial correlations all support what I'm about to say, by the way; the Pearson correlation is reported above the plot and the data appears linear, so it is probably sufficient), basically I would say rather confidently that whether the 2011 inflection is present or not is strongly linked to the popularity (number of IMDb votes) of the movie in question. Specifically, more popular movies are more likely to have the inflection.
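Those checks are all off-the-shelf; a rough sketch is below, using synthetic stand-in values for S and the vote counts (dcor is a third-party package for distance correlation).

import numpy as np
from scipy.stats import pearsonr
from sklearn.feature_selection import mutual_info_regression
import dcor

# Stand-in values only; in the real analysis S and votes come from the trajectories above.
rng = np.random.default_rng(0)
votes = rng.lognormal(mean=9, sigma=2, size=471)
S = -0.8 * np.log10(votes) + rng.normal(0, 1, size=471)

x = np.log10(votes)
r, p = pearsonr(x, S)
dcorr = dcor.distance_correlation(x, S)
mi = mutual_info_regression(x.reshape(-1, 1), S)[0]
print(f"Pearson r = {r:.2f} (p = {p:.1e}), distance corr = {dcorr:.2f}, MI = {mi:.2f}")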

This result is probably a strong indicator that a previously held belief about the 2011 inflection is true: the inflection has to do with IMDb expanding its smartphone/internet presence and seeing a sudden influx of new users in 2011. Why? Because these new users are more likely to vote on the wildly popular movies than on something like Crispin Glover's directorial debut. So for movies released prior to 2011, the most popular movies are much, much more likely to see gains (and thus the inflection).

An alternative theory would perhaps be the international angle: as the international user base grows, those users are also much more likely to vote on the wildly popular movies (which are more likely to be available in foreign languages and released internationally). There are two reasons I think this is less likely. First, the inflection is seen in both international and US vote statistics (scraped from the much less robust Internet Archive data set of the IMDb ratings pages, and normalized by the maximum value in the windowed year average):

NationalVotes

Indeed, looking at the percentage of votes from international users, the (proportional) increase is actually rather linear, with no inflection:

InternationalPercent

Second, I think there would be a lot more foreign-language outliers in that case. If, say, users from Hong Kong increased, then those movies (with a much smaller number of votes) would have also seen the inflection. But in general I don't think that was true (although I haven't looked too closely, I think I would have noticed it).

So that's it. I declare this BMysTery closed! I think it is definitely due to a sudden influx of new users, probably driven by the widespread adoption of smartphones and the development of the IMDb app. I should point out it still could be bots, because bots might try to fake out IMDb's automated purging algorithms by voting on (likely) popular movies. But I don't really see why that is more likely than my conclusion, which I think makes total sense. Case closed I say!

Jamie’s Peer Review

I agree, particularly since you can find that around that time is when the IMDb apps became available. In June of 2010 (less than a year before the inflection) the Android app launched in conjunction with an IMDb Everywhere initiative, in which the company made a concerted effort to expand its presence on mobile devices. The only thing that is a little curious is that the inflection seems pretty exact (I would wonder what kind of distribution we are talking about for the inflection point. Is it always the same point? Or are we seeing the inflection as a range starting around June 2010?). It would be weird if the initiative started in June 2010 and showed no effect for half a year before a dramatic effect hit all at one time. That would still raise the question of what specifically caused the dramatic effect… just curious. Probably still related to the initiative though.

BMeTric Live! #1: MPAA Ratings

[This is an ongoing series concerning the development of a new bad movie metric actively being researched by Patrick. This is installment #1]

The MPAA Rating Factor

To quote a famous bad movie researcher:

[The major flaw with the BMeTric is that it] is ever changing. As the current movie vote/rating data changes so does the baseline. And as a movie’s vote/rating changes its BMeTric changes.

Patrick, Distinguished Professor of Bad Movie Science and Technology, badmovietwins.com (February 2016)

This flaw also makes calculating the BMeTric of a film prior to release impossible due to its reliance on temporal data. Specifically, the number of votes and the rating on IMDb are non-existent prior to release and unreliable until the vote count reaches a relative steady state. This project hopes to remedy this difficulty by forming a time-independent BMeTric, or BMeTric Live!

To start, it is important to identify some of the parameters we might look at in a time-independent BMeTric. A good starting point is looking at the available data from omdbapi. The parameters from that are: title, year, rating, runtime, genre, release date, director, writer, cast, Metacritic score, IMDb rating, IMDb votes, poster, plot, language, country, awards, tomatoMeter, RT reviews, RT fresh reviews, RT rotten reviews, RT consensus, RT userMeter, RT userRating, RT userReviews, box office gross, production company.

Most of these can be immediately eliminated because they are temporal: Metacritic score, tomatoMeter, IMDb rating, IMDb votes, awards, RT reviews, RT fresh reviews, RT rotten reviews, RT userMeter, RT userRating, RT userReviews, RT consensus, and box office gross are all gone. Analysis of the title, plot, or poster is also rather difficult, so throw those out for now. The year is rather useless since the metric will only be applied to upcoming releases, and country and language go as well because we tend to watch US releases in English exclusively. That leaves a rather svelte initial set: MPAA rating, runtime, genre, release date, director, writer, cast, and production company. That list honestly looks really good to me, kind of exactly what I would hope to incorporate into a time-independent BMeTric.

So let's quickly look at the first one just to see what we can see: MPAA rating. First and foremost: only consider movies with an MPAA rating of G, PG, PG-13, or R. Simple rule. To quickly point out: G, PG, PG-13, and R cover all movies we'd likely consider for the time-independent BMeTric, and as far as backtesting and fitting are concerned all movies (except maybe Showgirls?) also fit into those four major categories. There might be some bias because PG-13 was introduced in 1984, but besides the rare few borderline cases (Gremlins, Temple of Doom, etc.) I personally don't think it will affect the data all that much. Films prior to 1980 are rarely considered by BMT, so any excluded because they don't have an MPAA rating shouldn't throw things off too much.

So to start, a hypothesis: I think G will have a lower BMeTric in general and PG-13 will have a higher BMeTric in general. This is because I think G-rated films will just generally have fewer votes, and PG-13 rated films will cover films appealing to a "wide base" of viewers (more votes and lower ratings).

Initial results: here, all I wanted to look at is the mean, median, and major quantiles (25th and 75th) of the rated groups relative to the wider population (all rated films). Box and whisker is pretty standard, although I went sans-whisker (whiskers are typically the 5th and 95th percentiles; for those interested, it is because the BMeTric is constrained to be between 0 and 100 and is shaped more exponentially than normally, so the 5th percentile is pretty much zero and the 95th is around 85 for every rating, which just makes the graph look dumb while providing no information), and guess what?

RatingAnalysis

Totally nailed it! If this isn't obvious from this incredibly information-dense figure, long story short: G-rated is lower, PG-13 is higher, and the other two are enigmatic. I put error bars on the mean (the dot), the median (the red line), and the 25th and 75th percentiles (bottom and top of the box respectively), all via bootstrap (the data isn't much different using a central limit theorem approach, I checked; the bootstrap was just easier for subsequently getting error bars on the factors). As a first pass we can generate a factor for each rating:

RatingFactors

Rating | Mean Factor | 25th Percentile Factor | Median Factor | 75th Percentile Factor
G      | 0.73        | 0.43                   | 0.54          | 0.64
PG     | 1.00        | 0.67                   | 0.83          | 1.01
PG-13  | 1.26        | 1.25                   | 1.35          | 1.34
R      | 0.91        | 1.07                   | 0.97          | 0.90

Pretty much in line with what would be expected. Note that PG and R are a little mixed up. That is basically because R-rated movies tend to have a narrower distribution (higher 25th percentile, lower 75th percentile) and PG-rated movies the opposite. It is something I want to make sure to account for in the future if possible, so I listed all of the factors. If I were to make a metric right now, though, I would take the easy way out and just adjust the mean, that's it. I was also surprised at how definitive it all was. The error bars kind of leave no doubt: PG-13 rated movies have a higher BMeTric in general, and G-rated movies have a lower BMeTric in general. Nailing it all day over at BMIT (natch)!
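The factor definition isn't written out above, but a minimal sketch, assuming each factor is simply the rating group's statistic divided by the same statistic over the whole rated population, with bootstrap error bars, would look something like this (file and column names are hypothetical):

import numpy as np
import pandas as pd

df = pd.read_csv("bmetric_by_film.csv")             # hypothetical columns: title, mpaa, bmetric
df = df[df["mpaa"].isin(["G", "PG", "PG-13", "R"])]

def stats(x):
    """Mean, 25th percentile, median, 75th percentile."""
    return np.array([x.mean(), np.percentile(x, 25), np.median(x), np.percentile(x, 75)])

pop = stats(df["bmetric"].to_numpy())
rng = np.random.default_rng(42)

for rating, grp in df.groupby("mpaa"):
    vals = grp["bmetric"].to_numpy()
    factors = stats(vals) / pop                     # mean / 25th / median / 75th factors
    boot = np.array([stats(rng.choice(vals, size=len(vals), replace=True)) / pop
                     for _ in range(2000)])         # bootstrap standard errors on the factors
    print(rating, np.round(factors, 2), "+/-", np.round(boot.std(axis=0), 2))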

What contributed to the upgrades and downgrades? Looking to the votes and ratings individually:

  • IMDB Rating / log(Votes): 6.34 / 3.93 (Total)
  • IMDB Rating / log(Votes): 6.61 / 3.81 (G)
  • IMDB Rating / log(Votes): 6.37 / 3.85 (PG)
  • IMDB Rating / log(Votes): 6.28 / 4.14 (PG-13)
  • IMDB Rating / log(Votes): 6.33 / 3.88 (R)

Basically, G-rated films have the highest ratings and the lowest number of votes, and PG-13 movies tend to have the highest number of votes and the lowest ratings. I guess people don't like to just go onto IMDb and slam kids' movies? Otherwise pretty much exactly my hypothesis. Interesting stuff. I'm also kind of impressed with the BMeTric's ability to cut through the noise. Look at the rating/votes numbers and then the BMeTric numbers: it seems to me that the BMeTric not only reduces the dimensionality of the data, it also amplifies the differences between the groups. It is nice.

How might we use this to help determine a BMT Live!? To give an example, maybe we need to choose between two movies. On April 29, 2016, for example, maybe you are struggling to choose between the Garry Marshall instant classic Mother's Day and the video game adaptation Ratchet and Clank. You don't know anything about the films except that Mother's Day is PG-13 and Ratchet and Clank is PG. This analysis would suggest that Mother's Day would be the better bet (and I would agree). But I'm also willing to bet that this won't end up being a hugely important factor; how we mix these factors will come at the end of the study. I am liking the path we are on at the moment, though, and I am a bit more confident that a metric with some value might come of this analysis.