Try this: open your browser history and look at all the web pages you’ve visited in the last 30 days. If you’re an IT employee in India today, you’ll find hundreds, if not thousands, of links from this period, from the latest movie trailer to online programming tutorials. You’ll realise that in the digital era, data is easy to obtain: we leave digital footprints with nearly every activity we undertake. By analysing this data intelligently, we can understand the world around us faster and better. In this blog post, we’ll show you how professionals have done so, through our collection of the top 10 data science projects from around the world.
Top Data Science Projects to Learn From
While exploring the role of a data scientist in an earlier blog post, we argued that the field sits at the intersection of programming, mathematics/statistics and domain knowledge. A data scientist is one part statistician, one part business analyst and one part data engineer. Today, you’ll see how these roles come together in real-life data science projects.
#1 Data Science Projects: Customer segmentation
One of the fundamental ways in which businesses understand their customers is segmentation. Typically, customers are segmented by demographics, psychographics, sales behaviour and similar attributes so that they can be targeted with the right products and offers. At a very large scale, say for an FMCG company like Unilever or a retailer like Walmart, performing customer segmentation manually becomes impractical, which is why it is an exemplary data science use case.
In her project, data scientist Rebecca Yiu uses R, principal component analysis (PCA) and K-means clustering to perform market segmentation for a fictional company. Learn about her project here.
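To make the PCA plus K-means combination concrete, here is a minimal Python sketch using only NumPy. It is not Rebecca Yiu’s R code: the customer features (annual spend, visits per month, age) and the two synthetic customer groups are hypothetical, PCA is computed directly via singular value decomposition, and K-means uses a simple farthest-first initialisation so the toy example behaves predictably.

```python
import numpy as np

def pca(X, n_components):
    """Standardise the features, then project onto the top principal components via SVD."""
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_std, full_matrices=False)
    return X_std @ Vt[:n_components].T

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = np.random.default_rng(seed)
    # Farthest-first initialisation: start from a random point, then repeatedly
    # pick the point farthest from all centroids chosen so far.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        nearest = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[nearest.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers: [annual spend, visits per month, age], two synthetic groups.
rng = np.random.default_rng(42)
budget = rng.normal([200, 2, 55], [30, 0.5, 5], size=(50, 3))
frequent = rng.normal([1500, 12, 28], [200, 2, 4], size=(50, 3))
customers = np.vstack([budget, frequent])

components = pca(customers, n_components=2)
labels, _ = kmeans(components, k=2)
print(np.bincount(labels))
```

In practice, libraries such as scikit-learn provide tuned `PCA` and `KMeans` implementations; the hand-rolled version simply shows the two steps: compress correlated features, then group customers by distance in the reduced space.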
#2 Data Science Projects: Uber’s pickup analysis
“Is Uber Making NYC Rush-Hour Traffic Worse?” was one of the four questions answered by FiveThirtyEight, a data-driven news website now owned by ABC. For this project, FiveThirtyEight obtained Uber’s ride-share data and analysed it to understand patterns of ridership, how Uber interacts with public transport and how it affects taxis. They then wrote detailed news stories supported by this analysis.
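Ridership analysis of this kind usually starts with simple aggregations. The Python sketch below uses hypothetical pickup timestamps and an assumed definition of weekday rush hours (pickups starting between 7:00 and 9:59 or between 16:00 and 18:59) to count pickups per hour and compute the share that falls in rush hours. It illustrates the approach only; it is not FiveThirtyEight’s methodology.

```python
from collections import Counter
from datetime import datetime

# Hypothetical pickup records (timestamp strings) standing in for trip-level ride data.
pickups = [
    "2014-04-01 08:15:00", "2014-04-01 08:40:00", "2014-04-01 12:05:00",
    "2014-04-01 17:30:00", "2014-04-01 18:10:00", "2014-04-05 09:00:00",
    "2014-04-05 23:45:00",
]

# Assumed rush-hour windows: hours 7-9 and 16-18, weekdays only.
RUSH_HOURS = set(range(7, 10)) | set(range(16, 19))

def is_rush_hour(ts: datetime) -> bool:
    """True if the pickup starts on a weekday (Mon-Fri) inside a rush-hour window."""
    return ts.weekday() < 5 and ts.hour in RUSH_HOURS

parsed = [datetime.strptime(p, "%Y-%m-%d %H:%M:%S") for p in pickups]
by_hour = Counter(ts.hour for ts in parsed)
rush_share = sum(is_rush_hour(ts) for ts in parsed) / len(parsed)

print(sorted(by_hour.items()))
print(f"Share of pickups in weekday rush hours: {rush_share:.0%}")
```

With real data, the same hourly counts could be compared against subway ridership or taxi pickups to ask how the services interact.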
#3 Data Science Projects: Web traffic forecasting using time series
Two years ago, Google hosted a Kaggle competition to forecast future web traffic for about 145,000 Wikipedia articles. Accurate forecasts of this kind are useful for planning server capacity, optimizing network and infrastructure resources and preventing outages. Over a thousand teams participated in the contest. One team combined techniques ranging from traditional moving averages and ARIMA-based models to recurrent neural networks and Google DeepMind’s WaveNet to make their predictions. They have published the details of their project, which you can read here.
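Moving averages, the simplest technique on that list, make a useful baseline before trying ARIMA or neural networks. Below is a minimal Python sketch that forecasts by repeating the mean of the last few days and scores the result with SMAPE (symmetric mean absolute percentage error), a scale-free metric that lets pages with very different traffic levels be compared. The page-view numbers are hypothetical, and this is not the team’s published code.

```python
def moving_average_forecast(history, window, horizon):
    """Forecast `horizon` future values as the mean of the last `window` observations."""
    if len(history) < window:
        raise ValueError("history is shorter than the averaging window")
    level = sum(history[-window:]) / window
    return [level] * horizon

def smape(actual, forecast):
    """Symmetric mean absolute percentage error in percent; 0/0 terms count as 0."""
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        total += 0.0 if denom == 0 else abs(f - a) / denom
    return 100 * total / len(actual)

# Hypothetical daily page views for one article; hold out the last 3 days.
views = [120, 135, 128, 140, 150, 145, 160, 158, 170, 165]
train, holdout = views[:-3], views[-3:]

forecast = moving_average_forecast(train, window=3, horizon=len(holdout))
print([round(f, 2) for f in forecast], round(smape(holdout, forecast), 2))
```

A more sophisticated model is only worth its complexity if it beats a baseline like this on held-out data.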
#4 Data Science Projects: Predicting restaurant success
Data scientist Michail Alifierakis used Yelp data to build his “Restaurant Success Model”, which estimates how likely restaurants are to succeed or fail. This is a valuable data science use case for lenders and investors, helping them make more profitable financial decisions. You can read the details of his project here, access the Yelp data here and find the project code on GitHub here.
#5 Data Science Projects: Detecting fake news
How can you tell whether the news you’re reading is fake? Data scientist Johnny Wales has one solution: unslanted.net/newsbot/. In his blog post, he details the data sets he used, explains how he wrangled the data and describes how he built models using logistic regression, decision trees and other methods. Read about this project here.
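As a rough illustration of the logistic-regression approach mentioned above, here is a minimal NumPy sketch that turns headlines into bag-of-words counts and fits a logistic regression by gradient descent. The six labelled headlines are hypothetical toy data, and the feature choice and training settings are assumptions; this is not Johnny Wales’s data set or model.

```python
import numpy as np

# Hypothetical labelled headlines: 1 = fake, 0 = genuine.
headlines = [
    ("shocking miracle cure doctors hate", 1),
    ("you won't believe this shocking secret", 1),
    ("miracle diet secret revealed", 1),
    ("parliament passes annual budget", 0),
    ("central bank holds interest rates", 0),
    ("city council approves new budget", 0),
]

def tokenize(text):
    return text.lower().split()

vocab = sorted({w for text, _ in headlines for w in tokenize(text)})
index = {w: i for i, w in enumerate(vocab)}

def vectorize(text):
    """Bag-of-words counts over the training vocabulary; unknown words are ignored."""
    x = np.zeros(len(vocab))
    for w in tokenize(text):
        if w in index:
            x[index[w]] += 1
    return x

X = np.array([vectorize(t) for t, _ in headlines])
y = np.array([label for _, label in headlines], dtype=float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Logistic regression fitted by batch gradient descent on the average log-loss.
w, b, lr = np.zeros(X.shape[1]), 0.0, 0.5
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= lr * X.T @ (p - y) / len(y)
    b -= lr * (p - y).mean()

def prob_fake(text):
    """Model's estimated probability that a headline is fake."""
    return float(sigmoid(vectorize(text) @ w + b))

print(round(prob_fake("shocking secret cure"), 2), round(prob_fake("council passes budget"), 2))
```

A real detector would need thousands of labelled articles, richer features (for example TF-IDF weights or source metadata) and evaluation on held-out data.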
#6 Identifying dog breeds
Another interesting Kaggle competition was the Dog Breed Challenge, which requires you to apply computer vision to a large image data set to accurately identify a dog’s breed. Data scientist Connor Shorten explains how he did it, using image classification methods and tools such as Python and Keras, in his article here.
#7 Comparing book prices between Amazon and Flipkart
Adarsh S built scrapers to collect data on 800 books from Amazon and Flipkart in order to compare prices and identify which site is generally more cost-efficient. He then carried out further analyses, which he outlines here. You can access the data set here and conduct your own analysis too!
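Once prices have been scraped, the core comparison is straightforward. This Python sketch assumes each record is a (title, Amazon price, Flipkart price) tuple in rupees; the four books and their prices are hypothetical placeholders, not values from Adarsh S’s data set.

```python
from statistics import mean

# Hypothetical scraped records: (title, Amazon price in INR, Flipkart price in INR).
books = [
    ("Book A", 399.0, 379.0),
    ("Book B", 550.0, 575.0),
    ("Book C", 299.0, 299.0),
    ("Book D", 820.0, 790.0),
]

def cheaper_site(amazon: float, flipkart: float) -> str:
    """Name the site with the lower price for one book, or 'Tie' if equal."""
    if amazon < flipkart:
        return "Amazon"
    if flipkart < amazon:
        return "Flipkart"
    return "Tie"

# Count how often each site offers the lower price.
wins = {"Amazon": 0, "Flipkart": 0, "Tie": 0}
for _, amazon, flipkart in books:
    wins[cheaper_site(amazon, flipkart)] += 1

# Positive mean difference means Flipkart is cheaper on average.
avg_diff = mean(amazon - flipkart for _, amazon, flipkart in books)
print(wins, avg_diff)
```

Looking at both the win count and the average difference matters: one site can be cheaper for most titles while the other offers bigger discounts on a few expensive ones.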
#8 Visualising climate change
Data visualization and presentation are an integral part of data science, because they help engage the audience and tell the story with impact. In this project, data scientist Giannis Tolios visualises changes in global mean temperatures, as well as the rise in CO2 concentrations in the atmosphere. Read about his project here.
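Temperature charts are commonly drawn as anomalies, that is, differences from a baseline-period average, often smoothed with a moving average. The Python sketch below prepares such a series; the yearly temperatures and the 2000 to 2002 baseline are hypothetical, and the result could then be handed to a plotting library such as matplotlib. It is not Giannis Tolios’s code.

```python
def anomalies(temps_by_year, baseline_start, baseline_end):
    """Express each year's temperature as its difference from the baseline-period mean."""
    baseline = [t for year, t in temps_by_year.items() if baseline_start <= year <= baseline_end]
    reference = sum(baseline) / len(baseline)
    return {year: t - reference for year, t in temps_by_year.items()}

def rolling_mean(values, window):
    """Trailing moving average; the output is window - 1 items shorter than the input."""
    return [sum(values[i - window + 1 : i + 1]) / window for i in range(window - 1, len(values))]

# Hypothetical annual global mean temperatures in degrees Celsius (not real measurements).
temps = {2000: 14.4, 2001: 14.5, 2002: 14.6, 2003: 14.6, 2004: 14.5, 2005: 14.8}

anom = anomalies(temps, baseline_start=2000, baseline_end=2002)
years = sorted(anom)
smoothed = rolling_mean([anom[y] for y in years], window=3)

# Each smoothed value is labelled with the last year of its 3-year window.
for year, value in zip(years[2:], smoothed):
    print(year, round(value, 2))
```

Plotting anomalies rather than absolute temperatures puts the trend, not the baseline level, at the centre of the chart.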
#9 Predictive policing
Law enforcement agencies around the world are using data science to forecast crimes and prevent them wherever possible. Recently, the Delhi Police began using the Crime Mapping Analytics and Predictive System (CMAPS), software that combines real-time data from the police helpline with satellite imagery from ISRO to produce cluster maps of crime hotspots. While the real-time data used by the Delhi Police isn’t publicly available, you can build your own data science projects with the historical data published by the National Crime Records Bureau (NCRB).
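One simple way to start a hotspot analysis is to bin incident locations into a grid and rank the busiest cells. The Python sketch below does this with hypothetical coordinates and an assumed cell size of 0.01 degrees; it is not how CMAPS works, and NCRB data is typically aggregated by state or district rather than published as point locations, so a real project would need to adapt the binning.

```python
import math
from collections import Counter

# Hypothetical incident locations (latitude, longitude); not real Delhi Police or NCRB data.
incidents = [
    (28.6315, 77.2167), (28.6321, 77.2174), (28.6309, 77.2159),
    (28.5535, 77.2588), (28.5541, 77.2579),
    (28.7041, 77.1025),
]

CELL_DEG = 0.01  # assumed grid resolution, roughly 1 km at Delhi's latitude

def cell_of(lat, lon, size=CELL_DEG):
    """Map a coordinate to the integer (row, column) index of its grid cell."""
    return (math.floor(lat / size), math.floor(lon / size))

# Count incidents per cell and report the two busiest cells as hotspots.
counts = Counter(cell_of(lat, lon) for lat, lon in incidents)
hotspots = counts.most_common(2)
print(hotspots)
```

Density-based clustering methods such as DBSCAN are a common next step, since fixed grids can split a single hotspot across cell boundaries.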
#10 Data Science Projects: Democratizing data science at Uber
One of the key challenges in data science is that even basic predictions and forecasts often seem to require the skills of a mathematician or statistician. Uber’s data science platform tackles this by automating forecasting with pre-built algorithms and tools, so that anyone on the team who has data can get predictions. Watch Franziska Bell, Uber’s Director of Data Science, talk about their data science platform here.
From basic tasks like data cleansing and wrangling to complex applications like building Uber’s platform, there is a world of opportunity out there for an aspiring data scientist. The top data science projects we’ve listed are just a cross-section of the possibilities that will open up for you. To prepare for them and build a strong foundation, consider Springboard’s online program in data science: it takes a 1:1 mentoring-led, project-driven, career-focused approach and comes with a job guarantee.