Posts

Showing posts from January, 2024

Linear Discriminant Analysis LDA - Using Linear regression for classification

Image
Linear Discriminant Analysis (LDA) uses linear regression to classify labelled data. You assign each class a numerical value, then use linear regression to project your observations onto those assigned values. Finally, you calculate the thresholds that distinguish between classes. In essence, LDA attempts to find the best linear function that separates your data points into distinct classes. The above diagram illustrates this idea.

Implementing LDA using LAMBDA

Fit

Steps in implementing LDA's Fit:
1. Find the distinct classes and assign each an arbitrary value - UNIQUE and SEQUENCE.
2. Label each observation with the value assigned to its class - XLOOKUP.
3. Find the linear regression coefficients for these observations - dcrML.Linear.Fit.
4. Project each observation onto the linear regression - dcrML.Linear.Predict.
5. Find the threshold of each class - classCutOff  from the spread of each re
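The Fit steps above can be sketched outside Excel to make the data flow easier to follow. This is a minimal Python/NumPy illustration, not the post's LAMBDA code: the names lda_fit and lda_predict are hypothetical, and because the excerpt's description of the threshold rule is cut off, the sketch assumes each cut-off is the midpoint between the mean projections of adjacent classes.

```python
import numpy as np

def lda_fit(X, labels):
    """Regression-based classifier following the five Fit steps (sketch)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Step 1: distinct classes, each given an arbitrary code 1..k
    classes = np.unique(labels)
    codes = np.arange(1, len(classes) + 1)
    # Step 2: look up the code for every observation
    y = codes[np.searchsorted(classes, labels)].astype(float)
    # Step 3: least-squares regression coefficients (with intercept column)
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    # Step 4: project each observation onto the regression line
    proj = A @ coef
    # Step 5: thresholds. ASSUMPTION: midpoint between the mean
    # projections of neighbouring classes (the source rule is truncated).
    means = np.array([proj[y == c].mean() for c in codes])
    order = np.argsort(means)
    cutoffs = (means[order][:-1] + means[order][1:]) / 2
    return coef, classes[order], cutoffs

def lda_predict(X, coef, ordered_classes, cutoffs):
    """Project new observations and pick the class between the cut-offs."""
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])
    proj = A @ coef
    return ordered_classes[np.searchsorted(cutoffs, proj)]

# Usage: two well-separated classes on a single feature
X_train = [[0], [1], [2], [10], [11], [12]]
y_train = ["a", "a", "a", "b", "b", "b"]
model = lda_fit(X_train, y_train)
print(lda_predict([[1.5], [10.5]], *model))
```

With only two classes the arbitrary codes do not matter much. With three or more, the codes impose an ordering on the classes, so the result can depend on which value each class happens to receive.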

KMeans clustering - Finding your centre

Image
KMeans clustering is a method to partition your observations into k clusters, with k centroids that describe the centre of each cluster. A new observation belongs to the cluster whose centroid it is closest to. The diagram above illustrates the k-means clustering concept.

The KMeans approach starts by deciding how many clusters you want. You then estimate where the centroid of each cluster might be located. The distance from each observation to each centroid is calculated, and each observation is re-assigned to the closest centroid. For each new cluster, we calculate a new centroid by averaging the cluster's data feature by feature. We repeat this cycle until no further refinement is achieved. Since Excel LAMBDA does not have iterative loops, a recursive approach will be used.

Implementing KMeans clustering in LAMBDA

With k-means clustering we implement Predict  before  Fit .

Predict

Predict takes a list of observations array and a list of centr
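The cycle described above (assign to the nearest centroid, average, repeat) can be sketched in Python/NumPy. This is an illustration under assumptions, not the post's LAMBDA implementation: the names kmeans_predict and kmeans_fit are hypothetical, the distance is assumed to be Euclidean, an empty cluster is assumed to keep its previous centroid, and max_depth is an added safety limit. Recursion is used rather than a loop to mirror the approach the post describes.

```python
import numpy as np

def kmeans_predict(X, centroids):
    """Return, for each observation, the index of its closest centroid."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Squared Euclidean distance from every observation to every centroid
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans_fit(X, centroids, max_depth=100):
    """Recursively refine centroids until they stop moving."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Re-cluster each observation to its closest centroid
    assign = kmeans_predict(X, centroids)
    # New centroid = per-feature mean of the cluster members
    # (ASSUMPTION: an empty cluster keeps its old centroid)
    new = np.array([
        X[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
        for j in range(len(centroids))
    ])
    # Stop when nothing changed or the depth limit is reached
    if max_depth <= 1 or np.allclose(new, centroids):
        return new
    return kmeans_fit(X, new, max_depth - 1)

# Usage: two obvious groups, starting from rough centroid guesses
X = [[0, 0], [0, 1], [10, 10], [10, 11]]
centres = kmeans_fit(X, [[0, 0], [10, 10]])
print(centres)
print(kmeans_predict([[1, 1], [9, 9]], centres))
```

Writing Predict first keeps Fit short, because each recursive step is just one Predict call followed by an average.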

Excuse me. Some Terminologies: Classification vs Clustering vs Regression

This is a short post to describe some terms used in data mining.

Classification arranges data into classes/categories using a labelled dataset. Clustering separates an unlabelled dataset into clusters/groups of similar objects. Regression develops a model to predict continuous numerical values.

Classification is a supervised learning task, while clustering is an unsupervised one. Regression is also considered supervised learning because the model is trained using both the input features and the output labels, which in this case are numerical values. Supervised means we rely on labelled training data. Unsupervised means the training data is unlabelled.

That's all for now from DC-DEN!