These are mainly the assignments from week 9 since we didn't get through all the material.
Use the lecture's Jupyter Notebook as a starting point.
Anomaly Detection
Classification Methods
- Head over to the Oslo City Bike webpage and download their 2016 data.
- Load the data into Spark and parse the timestamps of the starts of the trips.
- We'll now repeat the steps from the lecture.
- Calculate the per-day-and-hour trip counts.
- For each day, construct a feature vector containing the trip counts for each hour.
- Using the elbow method, find a good \(k\), such that dividing the days in \(k\) clusters seems to make sense.
- Produce the elbow plot along the way.
- Cluster the days according to the \(k\) you chose.
- Which weekdays occur most commonly in your clusters? Make a plot.
- Which months occur most commonly in your clusters? Make a plot.
- Calculate the per-cluster z-values and find the most anomalous days (setting e.g. \(z_max\) = 3).