Which Machine Learning Algorithm To Use?

Terminologies

We learnt a few machine learning terms and algorithms on this blog.

Supervised learning relies on labelled training data. It is task driven: there is a known target to predict.

Unsupervised learning uses unlabelled training data. It is data driven: the aim is to find patterns in the data itself.

Classification arranges data into classes/categories using a labelled dataset.

Regression develops a model to predict continuous numerical values.

Clustering separates an unlabelled dataset into clusters/groups of similar objects.

Classification is a supervised learning task, while clustering is an unsupervised one. Regression is also considered supervised learning because the model is trained on both the input features and the output labels, which in this case are numerical values.
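To make the difference concrete, here is a minimal sketch of all three tasks using only the Python standard library. The toy data and the function names (fit_nearest_centroid, fit_line, cluster_two_groups) are invented for illustration, and these simple methods stand in for whatever algorithm you would really use. Note that only the clustering function never sees a label.

```python
from statistics import mean

# Toy data, invented purely for illustration.
heights_cm = [150.0, 155.0, 160.0, 180.0, 185.0, 190.0]   # input feature
size_labels = ["small", "small", "small",
               "large", "large", "large"]                # categorical labels
weights_kg = [50.0, 55.0, 60.0, 80.0, 85.0, 90.0]         # numerical labels


def fit_nearest_centroid(xs, labels):
    """Classification (supervised): learn one mean per class from labelled data."""
    return {
        label: mean(x for x, lab in zip(xs, labels) if lab == label)
        for label in set(labels)
    }


def predict_class(centroids, x):
    """Assign x to the class whose learned mean is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def fit_line(xs, ys):
    """Regression (supervised): least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


def cluster_two_groups(xs, iterations=10):
    """Clustering (unsupervised): 1-D k-means with k=2, no labels involved.

    Assumes xs contains at least two distinct values and iterations >= 1.
    """
    c0, c1 = min(xs), max(xs)  # start the two centres at the extremes
    for _ in range(iterations):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = mean(g0), mean(g1)
    return g0, g1


centroids = fit_nearest_centroid(heights_cm, size_labels)
print(predict_class(centroids, 158.0))    # predicted category for a new input
print(fit_line(heights_cm, weights_kg))   # (slope, intercept) for predicting a number
print(cluster_two_groups(heights_cm))     # two groups found without any labels
```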

Two other unsupervised approaches are worth mentioning: association, which identifies underlying relationships, and dimension reduction, which reduces the number of dimensions/features to make calculations simpler. I did not cover any methods for association or dimension reduction, but Association Analysis and Principal Component Analysis (PCA) are examples of these respectively.
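For a feel of what dimension reduction does, here is a rough sketch of PCA in the smallest possible setting: reducing two features to one. The function name pca_2d_to_1d and the sample points are made up for this example, and it relies on the closed-form direction of the main axis of a 2x2 covariance matrix rather than a general eigenvalue solver, so treat it as an illustration and not as a general PCA implementation.

```python
import math
from statistics import mean


def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component.

    Assumes at least two points. Returns the unit direction of the
    component and the 1-D coordinate of each point along it.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_bar, y_bar = mean(xs), mean(ys)
    n = len(points)

    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    syy = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points) / (n - 1)

    # Angle of the eigenvector with the largest eigenvalue (2x2 symmetric case).
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))

    # Centre each point and keep only its coordinate along that direction.
    projected = [
        (x - x_bar) * direction[0] + (y - y_bar) * direction[1]
        for x, y in points
    ]
    return direction, projected


# Two strongly related features collapse to a single feature.
direction, projected = pca_2d_to_1d([(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)])
print(direction)
print(projected)
```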

Mapping

Which method you should choose depends on defining the problem correctly.

  1. What should the output be?
  2. Which machine learning task to use?
  3. What makes the results successful?

Once you have defined the problem correctly, the mapping of machine learning methods below becomes useful.

Use this map on your DC-DEN!
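The same idea can be written down as a tiny lookup, using only the five tasks named in the Terminologies section. This is a simplified sketch: the function suggest_task and the output_kind keywords are hypothetical names chosen for this example, and a real decision would also weigh the third question (what makes the results successful).

```python
# (has_labels, output_kind) -> machine learning task
TASK_MAP = {
    (True, "category"): "classification",
    (True, "number"): "regression",
    (False, "groups"): "clustering",
    (False, "relationships"): "association",
    (False, "fewer_features"): "dimension reduction",
}


def suggest_task(has_labels, output_kind):
    """Suggest a task from two answers about the problem.

    has_labels:  True if the training data comes with known outputs.
    output_kind: what the output should be, one of the keywords in TASK_MAP.
    """
    try:
        return TASK_MAP[(has_labels, output_kind)]
    except KeyError:
        raise ValueError(
            f"No task mapped for has_labels={has_labels!r}, "
            f"output_kind={output_kind!r}; revisit the problem definition."
        ) from None


print(suggest_task(True, "number"))
print(suggest_task(False, "groups"))
```

A combination that is not in the table raises a ValueError, which is usually a hint that the problem has not been defined clearly yet.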
