Insurance companies now record telematics data (x, y coordinates over time) to understand driver behaviour and calculate the premium each driver must pay.
The problem statement was to define a signature for each driver from their telematics data, so that, given the data for a trip, you can classify whether or not that trip belongs to the driver.
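To make the idea of a signature more concrete, here is a minimal sketch of turning one trip's raw coordinates into a few summary features. It is only an illustration, not the feature set used in this project: the function names `trip_speeds` and `trip_features`, the chosen features, and the fixed sampling interval `dt` are all assumptions.

```python
import math

def trip_speeds(points, dt=1.0):
    """Speed between consecutive (x, y) samples.

    Assumes samples are evenly spaced dt seconds apart (an assumption,
    not something stated about the dataset).
    """
    return [
        math.hypot(x2 - x1, y2 - y1) / dt
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]

def trip_features(points, dt=1.0):
    """Summarise a trip as a small feature dictionary (hypothetical features)."""
    speeds = trip_speeds(points, dt)
    if not speeds:
        # A trip with fewer than two samples has no measurable movement.
        return {"mean_speed": 0.0, "max_speed": 0.0, "distance": 0.0}
    return {
        "mean_speed": sum(speeds) / len(speeds),
        "max_speed": max(speeds),
        "distance": sum(speeds) * dt,  # total path length
    }

# Example: three samples, each step covering a 3-4-5 triangle.
features = trip_features([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)])
```

Feature vectors like this, computed per trip, could then be fed to a classifier that decides whether a trip matches a given driver.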
We did this as a class project for Advanced Machine Learning at USF. We were learning Spark and wanted to take an existing problem and adapt its solution to Spark. We chose this Kaggle competition because the dataset seemed very interesting. The competition had already closed, so we had access to well-thought-out solutions. This was important because we wanted to understand best practices before adapting a solution to Spark.