Predicting Best Picture Nominees and Winners
Background:
- The Academy Awards, or the Oscars, are an annual American awards ceremony honoring achievements in the film industry
- Awards are given in various categories, with probably the most prestigious being the award for Best Picture
- Given its prestige, Best Picture nominations and awards provide tremendous exposure
- A Best Picture nomination alone has been estimated to increase box office revenue by ~$6.9 million
So, given some data about movies, can we predict which ones will receive Best Picture nominations, and which of these will win?
Findings:
Short Answer: No
Long Answer: No, not really
- My best model is 98% accurate at predicting Best Picture winner / nominee status
- BUT… it does so by predicting that no movie is ever nominated (much less picking a winner), yielding a 0% true positive rate
- Essentially, non-nominated movies outnumber nominees so heavily that the model cannot be forced to make positive predictions without severely overfitting the data
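To illustrate how a model can be 98% accurate while having a 0% true positive rate, here is a minimal sketch. The counts (1,000 movies, 20 of them nominated) are hypothetical numbers chosen only to reproduce the 98% figure; they are not taken from the actual data set, and `accuracy_and_tpr` is a helper name made up for this example.

```python
def accuracy_and_tpr(y_true, y_pred):
    """Return (accuracy, true positive rate) for binary 0/1 labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    actual_pos = sum(y_true)
    accuracy = correct / len(y_true)
    tpr = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, tpr


# Hypothetical data: 1,000 movies, 20 of which were nominated (2%).
y_true = [1] * 20 + [0] * 980

# A "model" that never predicts a nomination.
y_pred = [0] * 1000

accuracy, tpr = accuracy_and_tpr(y_true, y_pred)
print(f"accuracy = {accuracy:.2%}, true positive rate = {tpr:.2%}")
# prints: accuracy = 98.00%, true positive rate = 0.00%
```

Accuracy rewards the model for every correctly rejected movie, so on data this imbalanced it says almost nothing about whether nominees are being found.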
Nevertheless, I was able to identify some factors that predict Best Picture nomination success. In order of predictive power, these are:
- Director(s) - Directors with a history of consistently directing Best Picture nominees (e.g. Martin Scorsese) strongly improve a movie's odds
- Stars (Leading actors / actresses) - Similarly, stars with a history of strong Oscar-level performances (e.g. Jack Nicholson) also strongly improve a movie's chances
- Writer(s) - The same holds true for writers
- Genre - Westerns, biopics, war movies, and romances are more likely to be nominated, whereas animated movies are less likely
- Country - Foreign (non-US) movies are disadvantaged
- Language - Similarly, foreign-language movies are also disadvantaged
- Critical Opinion - Critical reception matters only as a gatekeeper that punishes poorly rated movies; once a movie passes a certain threshold and is deemed "good enough", more positive critical reception has very little effect
- Public Opinion - The same holds true for reception by the general public
- Number of Nominees in Year - Since 2009, when the number of nominees was increased, a movie stands a slightly higher chance of being nominated
- Month of Release - Movies released late in the year also have a very slight advantage
Report:
Data:
Technologies:
- Python
- BeautifulSoup
- Selenium
- Pickle
- Pandas
- Statsmodels
- scikit-learn
- MATLAB
- UNIX
- GitHub
- Excel
- PowerPoint
Methodology:
- Scrape and clean IMDB list of movies by year for 1990 - 2014 (~120k movies)
- Scrape Wikipedia for list of all Best Picture nominees and winners, clean this data, and merge with the above
- Scrape individual IMDB movie pages for detailed data on each movie, clean this data, and merge with the above
- Scrape Google search results for a list of frequently mentioned directors, actors, etc. to use as boolean variables for each movie
- Convert categorical variables to boolean variable(s)
- Encode Best Picture status as our dependent variable (Win = 10 points, Nominated = 5 points, Not-Nominated = 0 points)
- Split data into a training and testing set
- Oversample Best Picture nominees and winners slightly so that they make up a larger portion of our training set
- Regress Best Picture status vs. all features in our data set, examine outputs, and compare predictions to actual outcomes in our test data
- Use backward elimination and stepwise regression to arrive at the best possible model
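The data preparation steps above (boolean encoding, scoring, splitting, and oversampling) can be sketched roughly as follows. This is a standard-library illustration under stated assumptions, not the code from the repository: the toy `movies` list, the column names, the 25% test fraction, and the oversampling `factor` are all hypothetical, and the regression and model selection steps are left out.

```python
import random

# Scoring scheme from the methodology: Win = 10, Nominated = 5, Not-Nominated = 0.
STATUS_POINTS = {"win": 10, "nominated": 5, "none": 0}


def one_hot(rows, column):
    """Replace a categorical column with one boolean column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = row[column] == cat
        encoded.append(new_row)
    return encoded


def add_score(rows):
    """Attach the numeric dependent variable derived from Best Picture status."""
    return [{**row, "score": STATUS_POINTS[row["status"]]} for row in rows]


def train_test_split(rows, test_fraction, rng):
    """Shuffle a copy of the rows and return (train, test)."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def oversample(train, factor, rng):
    """Append randomly drawn copies of nominee/winner rows.

    With factor=2 the number of positive rows in the result is doubled.
    """
    positives = [r for r in train if r["score"] > 0]
    if not positives:
        return train[:]
    extra = rng.choices(positives, k=len(positives) * (factor - 1))
    return train + extra


# Hypothetical toy data: 200 movies, 1 winner, 4 other nominees.
rng = random.Random(0)
genres = ["drama", "western", "animation"]
movies = [
    {
        "title": f"movie_{i}",
        "genre": genres[i % 3],
        "status": "win" if i == 0 else "nominated" if i < 5 else "none",
    }
    for i in range(200)
]

rows = add_score(one_hot(movies, "genre"))
train, test = train_test_split(rows, test_fraction=0.25, rng=rng)
balanced = oversample(train, factor=2, rng=rng)

share_before = sum(r["score"] > 0 for r in train) / len(train)
share_after = sum(r["score"] > 0 for r in balanced) / len(balanced)
print(f"positive share in training set: {share_before:.1%} -> {share_after:.1%}")
```

Oversampling is applied only after the split and only to the training rows, so duplicated nominees never leak into the test set used to compare predictions with actual outcomes.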
Code:
For more details, check out my GitHub repository.
Written on October 11, 2015