Think about the last movie you watched on any OTT platform. Did you go there to watch exactly that movie? Or did you scroll through the platform and pick one that was recommended? Most of us use one recommender system or another every day, across movie watching, online shopping, social networking, news publications, etc. With the latest machine learning and deep learning techniques, you too can build a recommender system. Joveo data scientist Batul Bombaywala demonstrates how.

Batul started her career as a data analyst before transitioning to a data scientist role at Myntra Jabong. She has over four years of experience in data science teams across startups and is also a Springboard mentor. Today, she shows how to build a recommender system with Python using the collaborative filtering technique.

What is a Recommender System?

A recommender system is a form of information filtering system that predicts how likely a user is to prefer an item and makes recommendations accordingly. For example, Netflix recommends movies you are likely to enjoy, Amazon recommends products you might need, Facebook suggests possible friends, etc.

In fact, Amazon claims that 70% of its sales come through recommendations. Recommendations are a powerful tool for platform owners to build visibility for their products, cross-sell, upsell and increase overall revenue. A recommender system is one of those data science use cases that has a direct impact on a company’s sales.

Types of Recommender Systems

Let’s look at the various types of recommender systems. Broadly, there are three kinds.

1. Recommender System: Recommend most popular item

As the name suggests, this method recommends the items that are bought most often, the movies that are watched most often, etc. It uses ‘item popularity’ as the single feature for recommending options.
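To make this concrete, here is a minimal sketch of a popularity-based recommender. The watch log and the `most_popular` helper are invented for illustration and are not taken from the tutorial.

```python
from collections import Counter

# Hypothetical watch log of (user_id, movie_title) pairs, invented for illustration.
watch_log = [
    ("u1", "Inception"),
    ("u2", "Inception"),
    ("u3", "Inception"),
    ("u1", "Up"),
    ("u2", "Up"),
    ("u3", "Heat"),
]

def most_popular(log, n=2):
    """Return the n most-watched titles, ignoring who the user is."""
    counts = Counter(title for _, title in log)
    return [title for title, _ in counts.most_common(n)]

top_titles = most_popular(watch_log)
print(top_titles)  # ['Inception', 'Up']
```

Every user gets the same list, which is why this approach is simple but not personalised.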

2. Recommender System: Building a classifier

This is a parametric method, which here means that it relies on a specific set of features to filter information and make recommendations. In the movie example, these can be ‘item features’ like genre, actors and language, as well as ‘user features’ like location and preferred language. While this method mostly works, it is limited by the features available.

3. Recommender System: Recommendation algorithm

This is a system where the algorithm takes into account multiple factors to present a recommendation. Primarily, there are two kinds of recommendation algorithms:

Content filtering: This algorithm uses keywords that describe an item and the user’s preference to present recommendations. For example, if a user has watched one movie, it recommends movies with similar features such as genre, language, length etc.

Collaborative filtering: This algorithm predicts one user’s behaviour based on the preferences of other similar users. For instance, you might have seen the ‘people who bought this also bought’ section on e-commerce platforms. That is collaborative filtering.

Movie Recommendation System Dataset

Now, let us look at how to apply a collaborative filtering algorithm to make movie recommendations using the MovieLens dataset, which has over 20 million movie ratings and tags.

The ratings in this dataset can be arranged as a matrix with users as rows and items as columns. The values in the matrix are ratings. Ratings can be explicit, like the number of stars given by a user, or implicit, like how long the user watched a particular movie.
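As a rough illustration of the user-item matrix, the sketch below pivots a handful of made-up explicit ratings with pandas. The column names `userId`, `movieId` and `rating` are assumed to mirror the MovieLens ratings file; the values themselves are invented.

```python
import pandas as pd

# Hypothetical explicit ratings in long format (one row per user-movie pair).
ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 20],
    "rating":  [4.0, 5.0, 3.0, 2.0, 4.5],
})

# Rows become users, columns become movies, and each cell holds a rating.
# Movies a user has not rated appear as NaN.
user_item = ratings.pivot_table(index="userId", columns="movieId", values="rating")
print(user_item)
```

Most cells in a real user-item matrix are empty, because each user rates only a small fraction of the catalogue.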

Steps in Collaborative Filtering

While designing a collaborative filtering algorithm, there are three fundamental steps.

  1. Finding similar users
  2. Finding the ratings those users gave to the items being considered for the target user
  3. Evaluating the system

There are two major techniques to do this: memory-based and model-based. Let’s see how to use the memory-based technique for movie recommendation.

The memory-based technique uses all the ratings in the matrix to predict ratings for the target user. Cosine similarity between users is a better measure here than Euclidean distance. Once you’ve identified similar users, use an average of the top users’ ratings to inform the recommendation. But note that the most similar user might be far closer to the target user than the 10th or the 50th one. So, it is best to calculate a weighted average, with each user weighted by their similarity to the target user, while making recommendations.
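The idea of a similarity-weighted average can be sketched in a few lines of NumPy. This is a simplified illustration, not the code from the tutorial: the rating matrix `R` is invented, unrated movies are stored as 0 (a shortcut that also affects the cosine score), and the helper names `cosine_similarity` and `predict_rating` are hypothetical.

```python
import numpy as np

# Hypothetical user-item matrix: rows are users, columns are movies, 0 = not rated.
R = np.array([
    [5.0, 4.0, 0.0, 1.0],   # target user (has not rated movie 2)
    [5.0, 5.0, 4.0, 1.0],   # similar tastes
    [4.0, 4.0, 5.0, 2.0],   # similar tastes
    [1.0, 1.0, 2.0, 5.0],   # different tastes
])

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_rating(R, target, item, k=2):
    """Similarity-weighted average of the top-k neighbours' ratings for `item`."""
    # Only users who actually rated the item can contribute.
    candidates = [u for u in range(R.shape[0]) if u != target and R[u, item] > 0]
    sims = {u: cosine_similarity(R[target], R[u]) for u in candidates}
    top = sorted(sims, key=sims.get, reverse=True)[:k]
    weights = np.array([sims[u] for u in top])
    neighbour_ratings = np.array([R[u, item] for u in top])
    return float(weights @ neighbour_ratings / weights.sum())

predicted = predict_rating(R, target=0, item=2)
print(round(predicted, 2))
```

Because the two nearest neighbours rated the movie 4 and 5, the prediction lands between those values, pulled towards the rating of the more similar user.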

Below is a simplified process for doing this.

  1. Begin by cleaning the data. 
    • In this dataset, there are dashes that would be good to remove. 
    • Identify unique movies that have been rated.
    • Remove users who have rated fewer than 55 movies. This will help reduce data volume as well as improve the quality of data.
  2. Merge datasets
    • In this case, we’ll use both genres and tags. So, combine the two datasets for metadata. 
  3. Create vectors
    • Create vectors using TF-IDF. Term frequency-inverse document frequency (TF-IDF) is a technique that gives more weight to rare words and less weight to very common words, such as prepositions and other stop words. This helps improve recommendation accuracy.
  4. Perform dimension reduction
    • This dataset gives what we call a sparse matrix. This means that, out of the 3 lakh (300,000) or so items, a user might have seen only 100 movies. For the rest, the rating will be empty. So, perform singular value decomposition (SVD) for dimension reduction.
    • This will identify the components that capture the most variance.
  5. Run algorithms for the target film
    • This can be content filtering, collaborative filtering or a hybrid one. 
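The steps above can be sketched end to end with pandas and scikit-learn. Treat this as a toy illustration under stated assumptions, not the analysis from the video: the `movies` and `tags` tables are invented stand-ins with one tag string per movie, the user-filtering step is left out, the last step uses content filtering only, and `similar_movies` is a hypothetical helper.

```python
import pandas as pd
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-ins for the movie and tag tables; all values are invented.
movies = pd.DataFrame({
    "movieId": [1, 2, 3, 4],
    "title": ["Space Quest", "Star Voyage", "Love in Paris", "Paris Nights"],
    "genres": ["Sci-Fi|Adventure", "Sci-Fi|Action", "Romance|Drama", "Romance|Comedy"],
})
tags = pd.DataFrame({
    "movieId": [1, 2, 3, 4],
    "tag": ["space aliens", "space battle", "love paris", "paris love"],
})

# Step 1: clean. Drop dashes (so "Sci-Fi" stays one token) and split genres on "|".
movies["genres"] = (
    movies["genres"]
    .str.replace("-", "", regex=False)
    .str.replace("|", " ", regex=False)
)

# Step 2: merge genres and tags into one metadata string per movie.
merged = movies.merge(tags, on="movieId", how="left")
merged["metadata"] = merged["genres"] + " " + merged["tag"].fillna("")

# Step 3: build TF-IDF vectors, one row per movie.
tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(merged["metadata"])

# Step 4: reduce dimensions with truncated SVD, which accepts sparse input.
svd = TruncatedSVD(n_components=2, random_state=0)
latent = svd.fit_transform(tfidf_matrix)

# Step 5: content filtering for a target film via cosine similarity.
similarity = cosine_similarity(latent)

def similar_movies(title, n=1):
    """Return the n titles closest to `title` in the reduced space."""
    idx = merged.index[merged["title"] == title][0]
    order = similarity[idx].argsort()[::-1]
    return [merged.loc[i, "title"] for i in order if i != idx][:n]

result = similar_movies("Space Quest")
print(result)
```

On real data, the number of SVD components would be far larger than 2 and is something to tune rather than fix in advance.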

To see a clear demonstration of this process of building a recommender system with Python, watch Batul’s tutorial on YouTube. To access the analysis in the video, fill out this form. Natural language processing (NLP) is one of the many use cases for data science, a fast-growing field. If you’re looking to transition into a data science career, consider Springboard’s Data Science Career Track. It offers 1:1 mentorship, career coaching, hands-on projects and a job guarantee.