Posts

Showing posts from November, 2023

What is Linear Regression?

Linear regression models the relationship between a dependent variable and one or more independent variables. The goal is to predict the value of the dependent variable from the values of the independent variables. It does this by finding the line that best fits the given data, which for the formulas here means the line with the smallest sum of squared errors.

Simple and Multiple Linear Regression

Simple linear regression is the case of only one independent variable. The equation is written as:

`y = b + mx`

Multiple linear regression has two or more independent variables, and the equation is written as:

`y = b + m_1x_1 + m_2x_2 + m_3x_3 + ...`

where:

`y` is the dependent variable
`x_i` are the independent variables
`m_i` are the coefficients for the corresponding `x_i` variables
`b` is a constant, sometimes known as the intercept or offset

`b` also has a special property: it is the y-intercept, the value of `y` when `x_i = 0` for all `i`.

For simple linear regression, `m` and `b` are calculated using the formulas:

`m = (sum (x - barx)(y - bary)) / (sum (x - barx)^2)`

`b = bary - m barx`

where `barx` and `bary` are the means of the `x` and `y` values.
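To check the two formulas for `m` and `b` by hand, here is a minimal Python sketch that applies them directly. It is only an illustration, not part of the spreadsheet functions in this blog; the function name `fit_simple_linear_regression` and the sample data are assumptions made up for the example.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (m, b) for the least-squares line y = b + m*x.

    Hypothetical helper for illustration; it applies the slope and
    intercept formulas for simple linear regression directly.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    # Means of x and y (barx and bary in the formulas).
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)

    # Numerator and denominator of the slope formula.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    m = sxy / sxx          # slope
    b = y_bar - m * x_bar  # intercept
    return m, b


# Made-up sample data lying exactly on the line y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
m, b = fit_simple_linear_regression(xs, ys)
print(m, b)  # prints: 2.0 1.0
```

Because the sample points lie exactly on a line, the fit recovers that line's slope and intercept. With noisy data the same formulas give the line that minimises the sum of squared vertical distances to the points.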

Supporting Functions for DC-ML

I will be using some tools to support my data mining functions. I will put them here for your reference.

SelectData

This function filters a set of data by rows. By default, 4 of every 5 rows are selected as training data and every 5th row is used as validation data. The split is based on the worksheet row number: a row goes to training when `MOD(ROW(array), ratioTotal)` is less than `ratioTrain`, and to validation otherwise.

dcrML.Help.SelectData

    =LAMBDA(array, selectTrain, [headers], [ratioTrain], [ratioValidate],
        LET(
            ratioTrain, IF(ISOMITTED(ratioTrain), 4, ratioTrain),
            ratioValidate, IF(ISOMITTED(ratioValidate), 1, ratioValidate),
            selectTrain, IF(ISOMITTED(selectTrain), TRUE, selectTrain),
            ratioTotal, ratioTrain + ratioValidate,
            selected, IF(selectTrain,
                FILTER(array, MOD(ROW(array), ratioTotal) < ratioTrain),
                FILTER(array, MOD(ROW(array), ratioTotal) >= ratioTrain)
            ),
            IF(ISOMITTED(headers),
                selected,
                VSTACK(headers, selected)
            )
        )
    )

GetHeaders

This function is overloaded. If dataHeaders are provided, it returns them. However, if none are provided, it returns sequential headers: "F