Who are your neighbours? Classification with KNN

KNN, or K-Nearest Neighbours, is a non-parametric supervised learning classification algorithm: the closer an observation is to the members of a class, the more likely it is to belong to that class. It uses proximity to decide which group an observation point belongs to. How do you measure proximity? You may remember Pythagoras' theorem from your school days: the distance between two points is the square root of the sum of the squares of the sides. That is one way to define proximity, but we could define it differently. Given two points (`x_1`, `x_2`, `x_3`) and (`y_1`, `y_2`, `y_3`):

Euclidean distance: `sqrt((y_1-x_1)^2 + (y_2-x_2)^2 + (y_3-x_3)^2)`

Manhattan distance: `|y_1-x_1| + |y_2-x_2| + |y_3-x_3|`

Chebyshev distance: `max(|y_1-x_1|, |y_2-x_2|, |y_3-x_3|)`

Each of these definitions has its pros and cons. For our implementation, I will be using Euclidean distance. However, if you wish to reduce the computational complexity, you might want to try Manhattan or Chebyshev distance.
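
To make this concrete, here is a minimal sketch of the idea in plain Python. The function names (`euclidean`, `manhattan`, `chebyshev`, `knn_predict`) and the toy data are my own illustration, not taken from any particular library:

```python
import math
from collections import Counter

def euclidean(a, b):
    """Square root of the sum of squared differences (Pythagoras)."""
    return math.sqrt(sum((y - x) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute differences along each axis."""
    return sum(abs(y - x) for x, y in zip(a, b))

def chebyshev(a, b):
    """Largest absolute difference along any single axis."""
    return max(abs(y - x) for x, y in zip(a, b))

def knn_predict(train, labels, query, k=3, distance=euclidean):
    """Label a query point by majority vote among its k nearest neighbours."""
    nearest = sorted(range(len(train)), key=lambda i: distance(train[i], query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters in 2-D.
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, (2, 2), k=3))  # -> "a"
```

Swapping `distance=euclidean` for `manhattan` or `chebyshev` changes only how proximity is measured; the voting logic stays the same, which is why the cheaper metrics are an easy drop-in when distance computation dominates.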