Data Mining or Machine Learning
So far I have covered a number of statistical tests using Excel LAMBDA. I chose Excel LAMBDA because it is ubiquitous and has an undemanding learning curve.
While there are more statistical inference tests, I only covered those that I commonly use. If, however, you think there are other common ones, please let me know. I would be interested as well.
Data Mining or Machine Learning
When I started data analysis, the term data mining made sense. The techniques used in data mining are intended to identify patterns within a data set. The problem came when I searched for more on a data mining topic: the same topics kept popping up under machine learning.
Machine learning is the process of computers learning, through algorithms, in a way that mimics human learning. To accomplish this, machine learning uses data mining techniques, because the process requires identifying patterns.
While there is a difference between data mining and machine learning, do not be surprised by the overlap, or if you start wondering whether the two are synonymous. The hype around machine learning has overshadowed data mining. Do not waste your time arguing which is purer. It is more important to understand the techniques described.
For me, moving to data mining feels like a natural progression in data analysis after statistics. Data mining is about uncovering patterns within a data set. I will go through topics like:
- Linear regression
- K-Means clustering
- Linear discriminant analysis
- K-Nearest neighbours
- Naive Bayes classification
In data analysis, descriptive analysis describes the given data, while predictive analysis suggests or projects the behaviour.
However, these categories cannot be used to clearly demarcate the topics. For example, the linear equation from the linear regression method can be used to describe the behaviour of the data set. Yet the same linear equation can be used to predict values for data belonging to that data set. It is not worth wasting time arguing which technique belongs to which category. It is more important to understand the purpose of these techniques and when they can be used.
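To make the linear regression point concrete, here is a minimal sketch in Python rather than Excel. The data values and the function name fit_line are made up for illustration and do not come from the book or my earlier blogs. It fits a line by ordinary least squares, then uses the same equation once to describe the data and once to predict a value.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, chosen so that y = 2x + 1 exactly
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)

# Descriptive use: the slope summarises how y changes per unit of x
print(f"y changes by {slope} for every unit increase in x")

# Predictive use: the same equation projects y for an x we have not observed,
# assuming it follows the same pattern as the rest of the data set
predicted = slope * 6 + intercept
print(f"predicted y at x = 6: {predicted}")
```

The equation is identical in both uses. Only the question being asked of it changes, which is why the descriptive and predictive labels blur.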
In the next few blogs I will be going through the data mining topics listed above. I will base the implementations on the book Learn Data Mining Through Excel by Hong Zhou.
For now, I want to say data mining is coming to DC-DEN!
NOTE: For data mining, I will switch to camel notation to save space, contrary to the advice for snake notation in my earlier blog.