Posts

Showing posts with the label Classification

It's Probably Correct - Classifying with Naïve Bayes

Naïve Bayes is a categorical, probabilistic, supervised classification algorithm. You may already be familiar with the term supervised classification, so I will not repeat it here. Naïve Bayes doesn't require numerical values; it relies on categories or labels only. It is probabilistic because it uses probabilities to calculate the classification, and it calculates those classification probabilities from the given records. The more consistent the data (repeatable patterns), the stronger the probability of the classification. Note: The implementation and example here follow closely from Learn Data Mining Through Excel by Hong Zhou. Great book!

Bayes' Theorem

Naïve Bayes is based on Bayes' theorem, most famously written as: `P(y|x) = (P(x|y)*P(y)) / P(x)` where: `P(y|x)` is the probability of `y` given `x`, `P(x|y)` is the probability of `x` given `y`, and `P(y)` and `P(x)` are the probabilities of `y` and `x` respectively. With multiple independent variables `(x_1, x_2, ..., x_n)` the equation becomes `P(y|x_1, ..., x_n) = (P(x_1|y) * P(x_2|y) * ... * P(x_n|y) * P(y)) / P(x_1, ..., x_n)`.
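To make the counting concrete, here is a minimal Python sketch of categorical Naïve Bayes (the post itself builds this in Excel following Zhou's book; the function names below are my own illustration). Because `P(x_1, ..., x_n)` is the same for every class, the sketch simply drops the denominator and compares numerators:

```python
from collections import Counter, defaultdict

def nb_fit(rows, labels):
    """Count class frequencies and, per class, the frequency of each
    feature value - these counts are all Naive Bayes needs."""
    priors = Counter(labels)        # class counts, for P(y)
    cond = defaultdict(Counter)     # cond[(i, y)][value] = count of value in column i for class y
    for row, y in zip(rows, labels):
        for i, value in enumerate(row):
            cond[(i, y)][value] += 1
    return priors, cond, len(labels)

def nb_predict(row, priors, cond, n):
    """Pick the class maximising P(y) * P(x_1|y) * ... * P(x_n|y)."""
    best_class, best_score = None, -1.0
    for y, n_y in priors.items():
        score = n_y / n                            # P(y)
        for i, value in enumerate(row):
            score *= cond[(i, y)][value] / n_y     # P(x_i|y); 0 if value never seen with y
        if score > best_score:
            best_class, best_score = y, score
    return best_class

# Tiny example: predict play/no-play from (outlook, wind)
rows = [("sunny", "weak"), ("sunny", "strong"), ("rain", "weak"), ("rain", "strong")]
labels = ["yes", "no", "yes", "no"]
model = nb_fit(rows, labels)
print(nb_predict(("sunny", "weak"), *model))       # -> yes
```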

Who are your neighbours? Classification with KNN

The closer an observation is to a class, the more likely it belongs to that class. KNN, or K-Nearest Neighbours, is a non-parametric supervised learning classification algorithm. It uses proximity to decide the grouping of an observation point. How do you measure proximity? You may remember Pythagoras' theorem from your school days: the distance between two points is the square root of the sum of the squares of the sides. That would be one way to define proximity, but we could define it differently. Given two points (`x_1`, `x_2`, `x_3`) and (`y_1`, `y_2`, `y_3`):

Euclidean distance: `sqrt( (y_1-x_1)^2 + (y_2-x_2)^2 + (y_3-x_3)^2 )`
Manhattan distance: `|y_1-x_1| + |y_2-x_2| + |y_3-x_3|`
Chebyshev distance: `max(|y_1-x_1|, |y_2-x_2|, |y_3-x_3|)`

Each of these definitions has its pros and cons. For our implementation, I will be using Euclidean distance. However, if you wish to reduce the computational complexity, you might want to try Manhattan or Chebyshev distance instead.
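For illustration, here is a short Python sketch of the three distance measures and a KNN classifier built on the Euclidean one (the post's actual implementation uses Excel LAMBDA; the names here are mine):

```python
import math
from collections import Counter

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))

def knn_classify(points, labels, query, k=3, dist=euclidean):
    """Label the query with the majority class among its k nearest points."""
    neighbours = sorted(zip(points, labels), key=lambda pl: dist(pl[0], query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# Tiny example: two clusters on a line
points = [(1.0,), (1.2,), (0.8,), (5.0,), (5.3,)]
labels = ["low", "low", "low", "high", "high"]
print(knn_classify(points, labels, (1.1,)))   # -> low
```

Swapping `dist=manhattan` or `dist=chebyshev` into the call is all it takes to trade the Euclidean measure for cheaper arithmetic.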

Linear Discriminant Analysis (LDA) - Using Linear Regression for Classification

Linear Discriminant Analysis (LDA) uses linear regression for the supervised classification of data. Essentially, you assign each class a numerical value, then use the linear regression method to calculate the projection of your observations onto the assigned numerical values, and finally calculate the thresholds that distinguish between classes. In other words, LDA attempts to find the best linear function that separates your data points into distinct classes. The diagram above illustrates this idea.

Implementing LDA using LAMBDA

Steps in implementing LDA's Fit (sketched in code below):
1. Find the distinct classes and assign each an arbitrary value - UNIQUE and SEQUENCE.
2. Designate each observation with the arbitrarily assigned value of its class - XLOOKUP.
3. Find the linear regression coefficients for these observations - dcrML.Linear.Fit.
4. Project each observation onto the linear regression - dcrML.Linear.Predict.
5. Find the threshold of each class - classCutOff - from the spread of each class's projected values.
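Here is a minimal Python/NumPy sketch of those five steps (the post implements them with Excel's UNIQUE, SEQUENCE, XLOOKUP and the dcrML LAMBDA library; the midpoint cut-off rule below is one simple reading of classCutOff, not necessarily the book's exact formula):

```python
import numpy as np

def lda_fit(X, labels):
    classes = sorted(set(labels))                               # 1. distinct classes...
    values = {c: float(i + 1) for i, c in enumerate(classes)}   #    ...assigned values 1, 2, ...
    y = np.array([values[c] for c in labels])                   # 2. target value per observation
    A = np.column_stack([X, np.ones(len(X))])                   #    add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)                # 3. regression coefficients
    proj = A @ coef                                             # 4. project each observation
    means = [proj[y == values[c]].mean() for c in classes]
    cutoffs = [(a + b) / 2 for a, b in zip(means, means[1:])]   # 5. threshold between classes
    return coef, classes, cutoffs

def lda_predict(x, coef, classes, cutoffs):
    p = float(np.append(x, 1.0) @ coef)       # project the new observation
    for cls, cut in zip(classes, cutoffs):
        if p < cut:
            return cls
    return classes[-1]

# Tiny example: one feature, two classes
X = np.array([[1.0], [1.5], [4.0], [4.5]])
labels = ["A", "A", "B", "B"]
model = lda_fit(X, labels)
print(lda_predict(np.array([1.2]), *model))   # -> A
```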

Excuse me. Some Terminology: Classification vs Clustering vs Regression

This is a short post to describe some terms used in data mining. Classification arranges data into classes/categories using a labeled dataset. Clustering separates an unlabeled dataset into clusters/groups of similar objects. Regression develops a model to predict continuous numerical values. Classification is a supervised learning algorithm, while clustering is an unsupervised algorithm. Regression is also considered supervised learning because the model is trained using both the input features and the output labels, which can be numerical values. Supervised means we rely on labeled training data; unsupervised means unlabeled training data. That's all for now from DC-DEN!