When starting a new data science project, there are a few things to keep in mind. I’m going to use this blog to map out the process and the steps I plan to take to get through this project.


The first thing you need to decide on is the reason for doing the project. Just having an idea is not enough, because the purpose and objective keep you focused and play a part in the decisions you make at each step of the way. Are you looking for trends, is it going to be…


Video games are being developed and released all the time. Whether they are made by big or small companies, there are video games out there for all kinds of people.

However, when a video game is being developed or is about to be released, it would be helpful to know what affects its sales so that those sales can be maximized.

Big Questions

Some of the big questions I’ll be trying to answer through this regression analysis are:

Are there noticeable trends or patterns in video game sales?

What factors have the biggest impact on video…

Task: Create a model that can predict the tags or categories that a restaurant should have using user reviews.

Purpose: This project attempts to build a model that predicts the tags (or labels/categories) of a restaurant from its user reviews. If successful, the model could help apply relevant tags to restaurants or be used to update the tags periodically. This would give users more accurate search results, so they would not miss a restaurant because of missing tags.


Data Gathering & Data Prep

I collected data from the Yelp Dataset, which can be found on the…

If you have not heard of linters, don’t worry, you are not alone. I took a data science course and did not know about them until afterward, when a friend mentioned they were a possible interview topic. So even though I have not used one, I thought I should at least understand what they are and why they are useful.

What is a Linter?

A linter is a tool that looks over code and flags “programming errors, bugs, stylistic errors, and suspicious constructs” (Wikipedia). As the nature and syntax of languages vary, there are various linters that focus on certain areas for the…
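To make the idea concrete, here is a minimal sketch of one lint check, written with Python's standard `ast` module. It only flags imports that are never used; the function name `find_unused_imports` and the `SAMPLE` snippet are made up for illustration, and real linters perform many more checks than this.

```python
import ast

def find_unused_imports(source):
    """Return a warning for every name that is imported but never referenced."""
    tree = ast.parse(source)
    imported = {}   # bound name -> line number of the import
    used = set()    # every plain name referenced anywhere in the code
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds the top-level name `a`
                name = alias.asname or alias.name.split(".")[0]
                imported[name] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported[alias.asname or alias.name] = node.lineno
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return [
        f"line {line}: '{name}' imported but unused"
        for name, line in sorted(imported.items(), key=lambda item: item[1])
        if name not in used
    ]

# Hypothetical snippet to check: `os` is imported but never used.
SAMPLE = """import os
import sys

print(sys.argv)
"""

warnings = find_unused_imports(SAMPLE)
for warning in warnings:
    print(warning)
```

The sketch never runs the code it inspects. It only reads the structure of the source, which is how a linter can flag a problem before the program is executed.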


Our project is an anime recommendation system. Recently everyone has been at home due to the coronavirus, but if you’re like me and really enjoy anime, finding a new one to watch is a job of its own. Finding a new anime is not hard in itself, but randomly picking one to watch, when a series can range from 7 to 1000 episodes, can lead to some bad results. …


Consider the following: You have an idea and want to start a Kickstarter campaign to raise money in order to make your idea a reality. You’re a bit hesitant though because you don’t want it to end in failure.

So I did a project that aims to determine which factors play an important role in the success of a Kickstarter campaign, so that someone can get an idea of the outcome before launching.

Big Questions

  • What kinds of trends are there in the success of campaigns?
  • What are the most important factors in the success of a campaign?
  • What…

An EDA project with Eric B

Prompt: A company wants to break into the movie market with original content. Using research and EDA (Exploratory Data Analysis), come up with recommendations on how to do so and what the best approach would be.

Our Approach:

Nowadays a lot of people do not watch only American or English-language movies. Foreign and non-English films are quite popular and are worth investing in.

However, deciding what foreign content should be made might not be as clear. …

If you’re learning about data science, you’ve almost certainly heard of neural networks, but you may not know much more than the name. In this blog, I’m going to talk a bit about the history of neural networks and how they work, as a brief introduction for those just starting to look into them.


It may be hard to believe, but the idea of neural networks was first proposed in 1944 by two University of Chicago professors, Warren McCulloch and Walter Pitts. However, at the time they were only capable of doing calculations that a digital computer could do. We will talk more…

Once you start learning about modeling and prediction in data science, you will be introduced to bias, variance, and loss functions. If you’re familiar with them, you know the general idea: high variance is a sign of overfitting and high bias is a sign of underfitting. A loss function measures how far a model’s predictions are from the true values, and for some loss functions, such as squared error, the expected loss can be broken down into bias and variance terms. We’re going to look at some of the math and stats behind bias and variance and how they apply to a couple of loss functions.
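As a small illustration of how bias and variance relate to a loss function, the sketch below checks numerically that mean squared error equals bias squared plus variance when the target is a fixed value. Everything in it is an assumption made for the example: the constants, the `decompose` helper, and the deliberately biased "shrunken sample mean" estimator are not from any particular model.

```python
import random
import statistics

random.seed(0)        # fixed seed so the run is repeatable

TRUE_MEAN = 5.0       # hypothetical quantity we are trying to estimate
NOISE_SD = 2.0        # spread of the simulated observations
SAMPLE_SIZE = 10      # observations per simulated dataset
N_TRIALS = 2000       # number of simulated datasets

def decompose(shrink):
    """Estimate TRUE_MEAN as shrink * sample_mean on many simulated datasets.

    Returns (mse, bias_squared, variance) of the estimator across datasets.
    """
    estimates = []
    for _ in range(N_TRIALS):
        sample = [random.gauss(TRUE_MEAN, NOISE_SD) for _ in range(SAMPLE_SIZE)]
        estimates.append(shrink * statistics.fmean(sample))
    mse = statistics.fmean([(e - TRUE_MEAN) ** 2 for e in estimates])
    bias_squared = (statistics.fmean(estimates) - TRUE_MEAN) ** 2
    variance = statistics.pvariance(estimates)
    return mse, bias_squared, variance

# shrink = 1.0 is the plain sample mean; shrink = 0.5 pulls estimates toward 0.
results = {shrink: decompose(shrink) for shrink in (1.0, 0.5)}
for shrink, (mse, bias_squared, variance) in results.items():
    print(f"shrink={shrink}: mse={mse:.3f} "
          f"bias^2={bias_squared:.3f} variance={variance:.3f}")
```

Shrinking the estimate lowers its variance but raises its bias, and in both cases the two terms add up to the mean squared error. That trade-off is the same one behind underfitting and overfitting.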

Overfit and Underfit

If your model is overfit, it…

Recently, I learned about building linear regression models and found that there is a large variety of models one can use. One subset of these, the regularization embedded methods, includes the LASSO, Elastic Net, and Ridge Regression. For now I’m going to give a basic comparison of the LASSO and Ridge Regression models.

Regularization Embedded Models

Embedded methods are models that learn which features contribute most to the accuracy of the model while the model is being fit. The methods we are talking about today regularize the model by adding additional constraints on the model to aim toward lowering the…
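To give a feel for how the two penalties differ, here is a minimal sketch for the simplest possible case: one feature and no intercept, where both models have a closed-form solution. The helper names, the toy data, and the penalty strength `LAMBDA` are assumptions chosen for illustration; in practice you would fit these models with a library rather than by hand.

```python
import math

def ridge_coef(x, y, lam):
    """Minimize sum((y - b*x)^2) + lam * b^2 for a single coefficient b."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)

def lasso_coef(x, y, lam):
    """Minimize sum((y - b*x)^2) + lam * |b| for a single coefficient b.

    The solution is a soft-thresholded version of the least-squares fit.
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    shrunk = max(abs(sxy) - lam / 2.0, 0.0)
    return math.copysign(shrunk, sxy) / sxx

# Hypothetical toy data: one target strongly related to x, one barely related.
x = [1.0, 2.0, 3.0, 4.0]
y_strong = [2.1, 3.9, 6.2, 7.8]
y_weak = [0.1, -0.1, 0.2, 0.0]
LAMBDA = 2.0

ridge_strong = ridge_coef(x, y_strong, LAMBDA)
lasso_strong = lasso_coef(x, y_strong, LAMBDA)
ridge_weak = ridge_coef(x, y_weak, LAMBDA)
lasso_weak = lasso_coef(x, y_weak, LAMBDA)

print("strong feature:", ridge_strong, lasso_strong)
print("weak feature:  ", ridge_weak, lasso_weak)
```

For the strongly related target, both methods keep a coefficient close to the unpenalized fit. For the weakly related one, Ridge shrinks the coefficient toward zero but leaves it nonzero, while the LASSO sets it to exactly zero, which is why the LASSO can act as a feature selector.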

Zawar Ahmed

Data Science student at Flatiron School
