EDA on Popular Non-English Movies

Prompt: A company wants to break into the movie market with original content. Recommend how to do so and what the best approach would be, backed by research and exploratory data analysis (EDA).

Our Approach:

Many viewers today watch more than just American or English-language movies. Foreign, non-English films are quite popular and, in our view, worth investing in.

Hypothesis:

Our thoughts going in are as follows:

  • The growth of the US Spanish-speaking population should show up as rising popularity of Spanish-language movies.
  • With the world experiencing a tech boom, and with so much interest in (and so many conspiracy theories about) tech and AI, there may be increased interest in the sci-fi genre.

Data Collection:

For this project, we collected our data from TMDb (The Movie Database) using their API and from IMDb (Internet Movie Database) using web scraping. We first pulled the Top Rated movies from the TMDb API, then used the movie ids from that list to build a second dataset with additional fields from the Movie Details endpoint, and finally merged the two datasets on movie id.
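The TMDb part of that pipeline can be sketched roughly as follows. This is a minimal sketch, not the exact code we ran: it assumes the TMDb v3 endpoints `/movie/top_rated` and `/movie/{movie_id}` with an `api_key` query parameter, and the helper names (`get_json`, `collect_movies`) and the selected columns are illustrative choices. The IMDb scraping step is not shown.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

import pandas as pd

BASE_URL = "https://api.themoviedb.org/3"  # assumed TMDb v3 base URL


def get_json(path, api_key, **params):
    """Fetch one TMDb endpoint and decode its JSON body."""
    query = urlencode({"api_key": api_key, **params})
    with urlopen(f"{BASE_URL}{path}?{query}") as resp:
        return json.load(resp)


def collect_movies(api_key, pages=1, fetch=get_json):
    """Pull Top Rated pages, look up details per movie id, and merge on id.

    `fetch` is injectable so the function can be exercised without network access.
    """
    # Step 1: Top Rated listing (one request per page).
    top_rated = []
    for page in range(1, pages + 1):
        top_rated.extend(fetch("/movie/top_rated", api_key, page=page)["results"])
    top_df = pd.DataFrame(top_rated)[
        ["id", "title", "original_language", "vote_average", "release_date"]
    ]

    # Step 2: Movie Details for every id found in step 1.
    details = [fetch(f"/movie/{movie_id}", api_key) for movie_id in top_df["id"]]
    details_df = pd.DataFrame(details)[["id", "budget", "revenue", "runtime"]]

    # Step 3: merge the two datasets on the shared movie id.
    return top_df.merge(details_df, on="id", how="left")
```

Calling `collect_movies("YOUR_API_KEY", pages=5)` would return one row per Top Rated movie with the detail columns attached. A left merge is used here so that a movie is kept even if its details lookup returns missing fields.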

Data Cleaning:

In the data we collected, it was hard to distinguish between American and non-American movies because many movies had multiple countries involved in production. We therefore decided to split the data by whether or not the movie's language was English.
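As a rough illustration of that split, the sketch below flags each movie by language instead of by country. The rows are made-up placeholders, and the column names `original_language` and `production_countries` are assumed to follow TMDb's field names; our actual cleaning code may have differed.

```python
import pandas as pd

# Placeholder rows: production countries overlap, but each movie has one language code.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "original_language": ["en", "ja", "es", "en"],
    "production_countries": [["US", "GB"], ["JP"], ["ES", "MX"], ["US"]],
})

# A single boolean flag avoids having to assign co-productions to one country.
movies["is_english"] = movies["original_language"].eq("en")

english = movies[movies["is_english"]]
non_english = movies[~movies["is_english"]]
```

Splitting on language gives every movie exactly one label, which country of production could not do for co-productions.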

Data visualization and EDA:

We examined the data from two angles: trends by the language of the movies and trends by the genre of the movies. In both cases, we grouped the movies by decade to see whether any trends emerged over time.
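The decade grouping can be sketched as below for the language angle. The values are invented for illustration only and are not results from our dataset; the column names are assumed to match TMDb's fields.

```python
import pandas as pd

# Illustrative values only, not taken from the real dataset.
movies = pd.DataFrame({
    "original_language": ["ja", "ja", "ja", "es", "zh", "es"],
    "release_date": ["1988-04-16", "2001-07-20", "2004-11-20",
                     "2006-10-11", "2000-07-06", "1999-04-16"],
    "vote_average": [8.0, 8.5, 8.1, 7.8, 7.4, 7.0],
    "revenue": [30e6, 270e6, 230e6, 80e6, 210e6, 60e6],
})

# Derive the decade from the release year, e.g. 2004 -> 2000.
movies["year"] = pd.to_datetime(movies["release_date"]).dt.year
movies["decade"] = (movies["year"] // 10) * 10

# Average rating and revenue, plus movie count, per decade and language.
by_decade = (
    movies.groupby(["decade", "original_language"])
    .agg(
        avg_rating=("vote_average", "mean"),
        avg_revenue=("revenue", "mean"),
        n_movies=("vote_average", "size"),
    )
    .reset_index()
)
```

The same grouping could be applied to genre by swapping `original_language` for a genre column. Because a movie can carry several genres, that column would first need one row per movie and genre pair.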

Language trends/visualizations
Genre trends/visualizations

Conclusion/Final Recommendations:

Chinese and Japanese movies are among the highest rated and most profitable non-English movies. The Sci-Fi and Action genres have the highest revenue and profits, and both have been trending upward over the last 30 years. Lastly, Animated movies, while not the highest in profits, have higher-than-average revenue and profits along with the second-highest average rating among all genres.

Data Science student at Flatiron School