BMT:CSI:SVU: We’re the Special Victims #1

Editor’s Note: This analysis was initially performed in October of 2015. The plots have been cleaned up and updated (so that I don’t look like I’m incompetent). At the time we were basically just starting in on the use of historical data to explore the evolution of bad movies through time, an ongoing project. We are, of course, indebted to the Internet Archive which has been diligently collecting this data for years. Cheers.

Welcome to BMT:CSI:SVU (we’re the special victims). This section uses high-tech forensic science (not really) to solve the mysteries of BMT … you could even call this a BMysTery. For the first in this series I naturally decided to present my boldest analysis to date. It asks the question: what has really happened to the voting and ratings on IMDb through time? The initial idea was to start trying to predict ultimate vote counts based on an initial vote trajectory … I got waylaid a bit.

I started by taking every movie listed on OMDb with a release year of 2005 (658 movies), since I wanted a solid 10 years of samples. I then went to the Internet Archive and took 20 vote/rating samples (the two nearest archived pages on either side of each New Year’s Day from 2006 to 2015) for every movie that had a valid page (a vote count greater than 5) both prior to January 1, 2006 and after January 1, 2015. Finally, I linearly interpolated a vote/rating pair for each New Year’s Day from 2006 to 2015.
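For the curious, the interpolation step can be sketched roughly like this. This is a minimal sketch and not the original script: the function name `interpolate_at` and the snapshot values are made up for illustration, and it assumes each archived page has already been boiled down to a (date, votes, rating) tuple.

```python
from datetime import date


def interpolate_at(target, before, after):
    """Linearly interpolate a (votes, rating) pair at the date `target`.

    `before` and `after` are (date, votes, rating) snapshots taken on
    either side of `target` (hypothetical structure, for illustration).
    """
    d0, v0, r0 = before
    d1, v1, r1 = after
    span = (d1 - d0).days
    if span == 0:
        # Both snapshots fall on the same day: nothing to interpolate.
        return float(v0), float(r0)
    # Fraction of the way from the first snapshot to the second.
    w = (target - d0).days / span
    return v0 + w * (v1 - v0), r0 + w * (r1 - r0)


# Made-up snapshots 30 days before and 30 days after New Year's Day 2010.
before = (date(2009, 12, 2), 1000, 5.0)
after = (date(2010, 1, 31), 1600, 4.4)
votes, rating = interpolate_at(date(2010, 1, 1), before, after)
```

Since the target date here sits exactly halfway between the two snapshots, the estimate is just the midpoint of each pair. When the snapshots are unevenly spaced, the closer one gets more weight.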

The resulting data set had 471 movies (yeah … I made about 10,000 page calls, sorry Internet Archive), each with approximate vote/rating pairs for 10 data points (New Year’s Day from 2006 to 2015).

RatingTime

The rating plot isn’t that interesting: it shows that ratings have dropped a little over time without much of a trend. This doesn’t jibe with a lot of the individual plots I generated previously, which suggested rather strongly that the rating tends to increase as more votes are cast. Instead, I found no correlation between how the rating changes and the current rating or number of votes, something to be investigated further in the future I think.

If you normalize the voting trajectories by the number of votes on New Year’s Day 2015, though, you get a more interesting result.

VotesNorm

Basically, it looks like the samples are split into two groups: movies that gained most of their votes after 2010 and those that gained most of their votes before 2010. This is in fact a trend: there is an odd anomaly in IMDb data whereby movies seem to have an inflection point sometime in 2011. This is easier to see using the sum of all the votes, where the total all of a sudden starts to increase in 2011:

VotesMean

Say what? That doesn’t quite jibe with what we saw in the trajectories before. But it is true: a bunch of movies either have the 2011 inflection visible (red) or appear to have leveled off since their initial release (green, a much more expected trend). Here are the top and bottom 10 ranked by deviation from a linear trendline:

VotesExtrema

So, to quantify the difference between the two kinds of trajectory, I note that the mean normalized trajectory is roughly linear:

VotesNormMean

By correcting for this linear trend, the normalized and corrected vote count trajectories now go from zero to zero. If you sum across each normalized and corrected trajectory, then normal trajectories will have a positive value and those with the 2011 inflection will have a negative value. I called this value S (for sum, inventive I know).
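Here is a rough sketch of how an S value like this could be computed. Big assumption: to get the "zero to zero" property, this version subtracts the straight line joining each trajectory's first and last points, which may not be exactly the correction used for the plots. The function name `s_value` and the two example trajectories are invented for illustration.

```python
def s_value(votes):
    """Sum of a normalized, linearly corrected vote trajectory.

    `votes` is a list of yearly vote counts (e.g. New Year's Day 2006
    through 2015). Assumption: the linear correction is the straight
    line from the first to the last normalized point.
    """
    final = votes[-1]
    # Normalize by the final vote count so every trajectory ends at 1.
    norm = [v / final for v in votes]
    n = len(norm)
    # Straight line joining the first and last normalized points.
    line = [norm[0] + (norm[-1] - norm[0]) * i / (n - 1) for i in range(n)]
    # The corrected trajectory starts and ends at zero.
    corrected = [x - y for x, y in zip(norm, line)]
    return sum(corrected)


# Made-up trajectories: one that levels off, one that surges late.
leveled_off = [600, 800, 900, 950, 975, 985, 990, 995, 998, 1000]
late_surge = [100, 120, 140, 160, 180, 200, 400, 600, 800, 1000]

s_leveled = s_value(leveled_off)  # sits above the line, so positive
s_surge = s_value(late_surge)     # sits below the line, so negative
```

A movie that picks up most of its votes early bulges above the line and gets a positive S, while a movie with a late surge sags below it and gets a negative S, matching the sign convention described above.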

The thing I thought was interesting is that if you then plot the S value against the log(votes) from IMDb, you see a rather strong correlation between the two:

SCorrelation

Against the rating it is a bit more unclear. I won’t get into the nitty gritty: the Pearson correlation is reported above the plot, and since the data does appear linear this is probably sufficient (mutual information, distance correlation, and partial correlations all support what I’m about to say, by the way). Basically, I would say rather confidently that whether the 2011 inflection is present or not is strongly linked to the popularity (number of IMDb votes) of the movie in question. Specifically, more popular movies are more likely to have the inflection.
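For reference, the Pearson correlation mentioned above can be computed by hand in a few lines. This is only a sketch: the vote counts and S values below are invented to show the mechanics and are not the data behind the plot, and the base-10 log is an assumption (the choice of base does not change the Pearson value).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented example values, NOT the real data set.
votes_2015 = [150, 900, 12000, 85000, 400000]
s_values = [1.8, 1.1, 0.2, -0.9, -1.6]

log_votes = [math.log10(v) for v in votes_2015]
r = pearson(log_votes, s_values)
```

In this toy example the more popular movies were given more negative S values on purpose, so `r` comes out negative, which is the direction you would expect if popular movies are the ones showing the inflection.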

This result is probably the strongest indicator that a previously held belief about the 2011 inflection is true: the inflection has to do with IMDb expanding its smartphone/internet presence and seeing a sudden influx of new users in 2011. Why? Because these new users are more likely to vote on movies that were wildly popular to begin with than on something like Crispin Glover’s directorial debut. So for movies released prior to 2011, the most popular movies are much more likely to see gains (and thus the inflection).

An alternative theory would perhaps be the international angle. As the international user base grows, those users are also much more likely to vote on the wildly popular movies (which are more likely to be available in foreign languages and released internationally). There are two reasons I think this is less likely. First, the inflection is seen in both international and US vote statistics (scraped from the much less robust Internet Archive data set of the IMDb ratings pages, and normalized by the maximum value in the windowed year average):

NationalVotes

Indeed, looking at the percentage of votes from international users, the (proportional) increase is rather linear in reality, with no inflection:

InternationalPercent

Second, I think there would be a lot more foreign language outliers in that case. If the number of users from, say, Hong Kong had increased, then Hong Kong movies (with a much smaller number of votes) would have also seen the inflection. But in general I don’t think that was true (I haven’t looked too closely, but I think I would have noticed).

So that’s it. I declare this BMysTery closed! I think it is definitely due to a sudden influx of new users, probably driven by the widespread adoption of smartphones and the development of the IMDb app. I should point out it still could be bots, because bots might try to fake out IMDb’s automated purging algorithms by voting on (likely) popular movies. But I don’t really see why that is more likely than my conclusion, which I think makes total sense. Case closed I say!

Jamie’s Peer Review

I agree, particularly since the IMDb apps became available right around that time. In June of 2010 (less than a year before the inflection) the Android app launched in conjunction with an IMDb Everywhere initiative, in which the company made a concerted effort to expand its presence on mobile devices. The only thing that is a little curious is that the inflection seems pretty exact. I would wonder what kind of distribution we are talking about for the inflection point: is it always the same point, or are we seeing the inflection as a range starting around June 2010? It would be weird if the initiative started in June 2010 and showed no effect for half a year before producing a dramatic effect all at one time. That would still raise the question of what specifically caused the dramatic effect … just curious. Probably still related to the initiative though.
