Posts

Showing posts from June, 2024

It's Probably Correct - Classifying with Naïve Bayes

Naïve Bayes is a categorical, probabilistic, supervised classification method. You may already be familiar with the term supervised classification, so I will not repeat it here. Naïve Bayes doesn't require numerical values; it relies on categories or labels only. It is probabilistic because it uses probabilities, estimated from the given records, to choose the classification. The more consistent the data (repeatable patterns), the stronger the probability of the classification.

Note: The implementation and example here follow closely from Learn Data Mining Through Excel by Hong Zhou. Great book!

Bayes Theorem

Naïve Bayes is based on Bayes' theorem, most famously written as below:

`P(y|x) = (P(x|y)*P(y)) / (P(x))`

where:

`P(y|x)` is the probability of `y` given `x`
`P(x|y)` is the probability of `x` given `y`
`P(y)` and `P(x)` are the probabilities of `y` and `x` respectively.

With multiple independent variables `(x_1, x_2, ..., x_n)` the equation would be