Recommending Audiobooks for Drivers and Commuters Stuck in Traffic in Metro Manila

Frequent Itemset Mining (FIM) and ML Modelling Using PySpark

Crisanto Chua, Jeddahlyn Gacera

Abstract

To get from one place to another, a person must travel by private car or public transport; commuting is an essential part of being a member of society. For most individuals, especially those living in Metro Manila, the problem is that worsening traffic makes it take too long to reach the destination. An average of 1 hour and 6 minutes is lost each day to the commute. This unproductive time can be regained by listening to audiobooks while driving or riding public transport. This study presents a different approach to recommending books for conversion to audio format, based on textual reviews. The text reviews were collected from the Books product category of the Amazon Customer Reviews Dataset, available in the Registry of Open Data on AWS (https://registry.opendata.aws/amazon-reviews/), and were used to predict whether a book would be a candidate for conversion to an audiobook. Four machine learning algorithms were compared on this task. The Gradient Boosted Tree Classifier was the best model, with a PR-AUC of 91.7%. Frequent Itemset Mining (FIM) was then used to analyze which books are most popular. The predictive model, together with the insights gathered from FIM, will provide future audiobook service companies in the Philippines with an efficient system for qualifying new books for their subscribers.

Introduction

Metro Manila is one of the most congested cities in the world. Its population of 13 million is squeezed into just 620 square kilometers, a density of roughly 21,000 inhabitants per square kilometer. This has led to a traffic problem so severe that one expat described Metro Manila as “unlivable”.

According to a traffic study conducted by Boston Consulting Group (BCG) for the ride-hailing platform Uber in 2016, the average Filipino is stuck in traffic for an hour and six minutes (66 minutes) a day, equivalent to 16 days of lost productive time per year (Pena, 2017). This translates to a staggering 100,000 pesos in lost income per year. The traffic situation will be difficult to solve given the limited resources available, but what can be done in the short term is to give people stuck in traffic options for being productive. One suggested way to make up for this time “lost” in traffic is to listen to audiobooks that help commuters expand their knowledge and so make their commute more productive.

With the advancement of technology and increased internet bandwidth, more and more people have access to information on the go. Digital media now makes it possible for individuals to easily listen or read on their mobile devices in transit, which has led to an increase in the consumption of e-books and audiobooks. The problem with using mobile devices to read e-books for an extended period is that it can cause health problems such as stress and neck pain (Shimray, 2015).

Of particular relevance to our study is the difference between learning by reading and learning by listening. It has been shown that comprehension does not depend on modality: individuals recalled an equal amount of information regardless of whether they listened to an audiobook or read it on an electronic tablet (Rogowsky, 2016). Moreover, brain activity recorded via MRI has shown similar patterns whether the input is spoken or written (Deniz, 2019).

Audiobooks have now given individuals the opportunity to learn while being on the move with minimal or no loss of comprehension compared to reading. The demand for audiobooks in Metro Manila exists and this study presents a new way of recommending books suitable for conversion to audio.

Methodology

The following steps were taken for this study:

  1. Collection and description of the Amazon Customer Reviews Dataset from the Registry of Open Data on AWS
  2. Data preparation and selection
    • Text data feature extraction
    • Binning the ratings
  3. Machine learning models
  4. Performing Frequent Itemset Mining (FIM)
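
Two of the steps above, binning the ratings and Frequent Itemset Mining, can be illustrated with a minimal sketch. It is written in plain Python as a stand-in so that it runs without a Spark cluster; the study itself used PySpark, which ships an FP-Growth implementation (pyspark.ml.fpm.FPGrowth), though this summary does not say which algorithm or settings were used. The 4-star cut-off, the sample reviews, the definition of a basket as the books one customer rated positively, and the function names bin_rating and frequent_itemsets are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def bin_rating(stars, threshold=4):
    """Map a 1-5 star rating to a binary label (1 = positive).

    The threshold of 4 stars is an assumption, not taken from the study.
    """
    return 1 if stars >= threshold else 0


def frequent_itemsets(baskets, min_support, max_size=2):
    """Return {itemset: support} for itemsets of up to max_size items.

    Support is the fraction of baskets that contain the itemset. This is a
    brute-force count suitable for toy data only, not an FP-Growth
    implementation.
    """
    counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    n = len(baskets)
    return {combo: c / n for combo, c in counts.items() if c / n >= min_support}


# Hypothetical reviews: (customer_id, book_title, star_rating)
reviews = [
    ("c1", "Book A", 5), ("c1", "Book B", 4), ("c1", "Book C", 2),
    ("c2", "Book A", 4), ("c2", "Book B", 5),
    ("c3", "Book A", 5), ("c3", "Book C", 1),
    ("c4", "Book B", 3),
]

# Step "Binning the ratings": one binary label per review
labels = [bin_rating(stars) for _, _, stars in reviews]

# Step "Performing FIM": a basket is the set of books one customer rated positively
baskets_by_customer = {}
for customer, title, stars in reviews:
    if bin_rating(stars) == 1:
        baskets_by_customer.setdefault(customer, set()).add(title)
baskets = list(baskets_by_customer.values())

itemsets = frequent_itemsets(baskets, min_support=0.5)
for itemset, support in sorted(itemsets.items()):
    print(itemset, round(support, 2))
```

With real data, the minimum support would need tuning: a high value returns only a few bestsellers, while a low value makes the number of candidate itemsets grow quickly, which is one reason distributed implementations are used at this scale.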

If you wish to have a copy of the technical paper for this project, kindly contact us via e-mail or LinkedIn.