Posts

Showing posts with the label Regression

Excuse me. Some Terminologies: Classification vs Clustering vs Regression

This is a short post to describe some terms used in data mining. Classification arranges data into classes/categories using a labeled dataset. Clustering separates an unlabeled dataset into clusters/groups of similar objects. Regression develops a model to predict continuous numerical values.  Classification is a supervised learning algorithm, while Clustering is an unsupervised algorithm. Regression is considered supervised learning because the model is trained using both the input features and output labels - which can be numerical values. Supervised means we rely on labelled training data. Unsupervised means unlabeled training data. That's all for now from DC-DEN !

Linear Regression: Why you should reinvent Excel's LINEST?

Image
In the previous article on  Linear Regression , I mentioned Excel's LINEST function. But if you tried using the returned coefficients, you may notice something peculiar. The order of the returned linear coefficients is in the reverse order of the input data. LINEST documents: The equation for the line is: `y = m_1x_1 + m_2x_2 + ... + m_nx_n+ b` if there are multiple ranges of x-values, where the dependent y-values are a function of the independent x-values. The m-values are coefficients corresponding to each x-value, and b is a constant value. Note that y, x, and m can be vectors. The array that the LINEST function returns is `{m_n, m_(n-1), ..., m_1, b}`.  The input is in the order 1st, 2nd, 3rd, ... but the returned coefficients are in the reverse. And if you were to use the coefficients to predict y for a given `x_1, x_2, x_3, ...`, you would either swap the x-s around or the coefficients around. This isn't intuitive. For this reason you should reinvent LINEST . The inten