EDA Analysis on Popular Non-English Movies

An EDA project with Eric B

Zawar Ahmed
Jul 27, 2020

Prompt: A company wants to break into the movie market with original content. Recommend how to do so and what the best approach would be, backed by research and EDA (Exploratory Data Analysis).

Our Approach:

Audiences today watch far more than American or English-language movies. Foreign, non-English films are popular and, we believe, worth investing in.

However, deciding which foreign content to make is less clear. In this project we explore how non-English movies have performed over time and see if we can narrow down the choices by looking at language and genre trends.

Hypothesis:

Our thoughts going in are as follows:

  • China’s economic boom should be reflected in Chinese movie profits.
  • Growth of the US Spanish-speaking population should be noticeable in Spanish movie popularity.
  • With the world experiencing a tech boom, and with so much interest in (and so many conspiracy theories about) tech and AI, there may be increased interest in the Sci-Fi genre.

We’ll see later in our EDA whether we were correct.

Data Collection:

For this project, we collected our data from TMDb (The Movie Database) using their API and from IMDb (Internet Movie Database) using web scraping. We first pulled the Top Rated movies from the TMDb API, then used the movie ids from that list to build a second dataset with additional fields from the Movie Details endpoint, and merged the two.
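As a rough illustration of the two-step TMDb pull described above, here is a minimal sketch using only the Python standard library. It assumes the TMDb v3 endpoints `/movie/top_rated` and `/movie/{id}` with an `api_key` query parameter; the function names `fetch_json` and `collect_movies` are hypothetical and are not the code we actually ran.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.themoviedb.org/3"


def fetch_json(path, params):
    """GET a TMDb endpoint and decode the JSON response body."""
    url = f"{API_ROOT}{path}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def collect_movies(api_key, pages, fetch=fetch_json):
    """Pull Top Rated pages, then fetch Movie Details for each movie id."""
    movies = []
    for page in range(1, pages + 1):
        listing = fetch("/movie/top_rated", {"api_key": api_key, "page": page})
        for item in listing["results"]:
            details = fetch(f"/movie/{item['id']}", {"api_key": api_key})
            # Merge the list entry with its details; details win on key clashes.
            movies.append({**item, **details})
    return movies
```

Passing `fetch` as a parameter keeps the network call swappable, so the merging logic can be checked offline with a stub before spending API requests.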

The TMDb data lacked one field we thought would be useful in our analysis, namely gross worldwide revenue, so we scraped that figure from IMDb and merged it into our dataset.
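The merge of scraped revenue into the TMDb records can be sketched as a simple left join on a shared id. This is an assumption-laden illustration: it supposes each TMDb record carries an `imdb_id` field and that the scraped figures sit in a dict keyed by that id; `add_worldwide_gross` and `gross_worldwide` are hypothetical names.

```python
def add_worldwide_gross(movies, gross_by_imdb_id):
    """Left-join scraped worldwide gross onto movie records by IMDb id.

    Movies with no scraped figure keep a gross_worldwide of None, so no
    row is silently dropped by the merge.
    """
    merged = []
    for movie in movies:
        row = dict(movie)  # copy so the input records stay untouched
        row["gross_worldwide"] = gross_by_imdb_id.get(movie.get("imdb_id"))
        merged.append(row)
    return merged
```

Keeping unmatched movies (rather than dropping them) makes it easy to count how many titles are still missing revenue after the scrape.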

Data Cleaning:

Based on the data we had collected, it was hard to distinguish between American and non-American movies because many movies had multiple countries involved in production. We therefore decided to split the data by whether the movie was in English or not.

Before we scraped the data from IMDb, we removed all movies whose original language was English. We also removed all movies released before 1990 so we could focus on the last three decades and see how things changed as non-English movies became more widespread and available.

After we cleaned the data and dropped the groups mentioned above, we had 1,010 movies left in our dataset.
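The two filters described in this section can be sketched as follows. The sketch assumes each record has an `original_language` code (with `"en"` for English) and a `release_date` string starting with a four-digit year, as TMDb records do; the function name is hypothetical.

```python
def keep_recent_non_english(movies, first_year=1990):
    """Keep movies that are not originally in English and are from first_year on."""
    kept = []
    for movie in movies:
        release = movie.get("release_date") or ""
        # Skip records with a missing or malformed release date.
        if len(release) < 4 or not release[:4].isdigit():
            continue
        year = int(release[:4])
        if movie.get("original_language") != "en" and year >= first_year:
            kept.append(movie)
    return kept
```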

Data visualization and EDA:

We looked at the data from two angles: trends by movie language and trends by movie genre. In both cases we grouped the movies by decade to see how the numbers changed from one decade to the next.
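To make the decade grouping concrete, here is a minimal sketch that averages profit per language and decade. It assumes profit is defined as revenue minus budget, which is our simplification for illustration, and that records carry `original_language`, `release_date`, `revenue`, and `budget` fields; the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def decade_of(release_date):
    """Map a 'YYYY-MM-DD' date string to its decade, e.g. 1997 -> 1990."""
    return int(release_date[:4]) // 10 * 10


def mean_profit_by_language_decade(movies):
    """Average profit (assumed: revenue - budget) per (language, decade)."""
    groups = defaultdict(list)
    for movie in movies:
        key = (movie["original_language"], decade_of(movie["release_date"]))
        groups[key].append(movie["revenue"] - movie["budget"])
    return {key: mean(profits) for key, profits in groups.items()}
```

A negative average for a group, such as a language losing money in one decade, would show up directly in the returned values and is worth checking against the number of movies in that group.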

Language trends/visualizations

With regard to language, we found that Chinese and Japanese movies were the top earners over the past 10 years. We also found that Chinese movies seem to be on an upward trend, with their profits rising each decade.

There were a couple of other interesting finds. There was a spike in the success of Spanish films during the 2000s. Also, French movies appear to have lost money on average during the 1990s, but this would need to be looked into further to determine the cause and whether it is an accurate representation.

Genre trends/visualizations

There were a lot of genres, so the ones we’re going to focus on have been circled in either red or green. The one genre circled in red, War movies, has high ratings but also high budgets with low revenue and profits. We would not recommend investing in this genre: over the past two decades War movies cost more to make on average than most other genres, yet their revenue and profits dropped between the 2000s and 2010s. In the last decade, they had the lowest average revenue and the second-lowest average profits of all genres.

The top two profit-earning genres are Sci-Fi and Action, and both are on an upward trend in profit and revenue over the last 30 years. Sci-Fi and Action movies have low ratings relative to other genres, but since there is no significant difference in average ratings across genres, this is not a major concern.

Animated movies, meanwhile, have higher-than-average revenue, ratings, and profits, so they seem to be doing well across all areas.
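The per-genre averages discussed above can be sketched as below. Two assumptions are baked in: a movie tagged with several genres counts once toward each of them, and profit is again taken as revenue minus budget. The field names (`genres` as a list of dicts with a `name`, plus `vote_average`) follow the TMDb record layout; `genre_averages` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean


def genre_averages(movies):
    """Average revenue, profit, and rating per genre.

    A movie with several genres contributes to every one of them.
    """
    buckets = defaultdict(lambda: {"revenue": [], "profit": [], "rating": []})
    for movie in movies:
        for genre in movie["genres"]:
            bucket = buckets[genre["name"]]
            bucket["revenue"].append(movie["revenue"])
            bucket["profit"].append(movie["revenue"] - movie["budget"])
            bucket["rating"].append(movie["vote_average"])
    return {
        name: {metric: mean(values) for metric, values in bucket.items()}
        for name, bucket in buckets.items()
    }
```

Running this once per decade (after filtering the records by decade) would give the kind of decade-over-decade genre comparison described in this section.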

Conclusion/Final Recommendations:

Chinese and Japanese movies are among the highest rated and most profitable non-English movies. The Sci-Fi and Action genres have the highest revenue and profits and have been trending upward over the last 30 years. Lastly, Animated movies, while not the highest in profits, have higher-than-average revenue and profits along with the second-highest average ratings among all genres.

So we would recommend investing in or making Chinese or Japanese movies in the Sci-Fi, Action, and Animated genres.
