Correlation Coefficient Matrix
Recently I wanted to find the correlation coefficient between multiple groups of data.
A correlation coefficient shows the relationship between two groups of data. A positive correlation means as one variable increase the other increases, while a negative correlation means as one variable increases the other variable decreases. The correlation of variable with itself is always 1. A zero correlation indicates no relationship. Correlation coefficient is always between -1 and 1.
Excel provides the CORREL function for this. It is fairly easy to use as shown by its syntax:
CORREL(array1, array2)
A correlation matrix is a table showing the correlation between pairs of variables. You could put the coefficients in a table like the one shown below.
Implementing a Correlation Matrix function
I thought of using LAMBDA instead. Such correlation matrix would be n x n table, where n is the number of data groups; number of columns from the data array.
numCols, COLUMNS(array),
We can make use of the MAKEARRAY function to build the n x n table, with a nested LAMBDA function calling CORREL.
correlArray, MAKEARRAY(numCols, numCols, LAMBDA(r,c, CORREL(CHOOSECOLS(array,r),CHOOSECOLS(array,c)) )),
If we implemented this, we will see the diagonal is always 1, and the matrix is symmetric about the diagonal. This is because CORREL(a, a) = 1 and CORREL(a, b) = CORREL(b, a).
We can reduce the number of calls to CORREL by checking the row and column values within the nested LAMBDA function.
correlArray, MAKEARRAY(numCols, numCols, LAMBDA(r,c, IF(r > c, CORREL(CHOOSECOLS(array,r),CHOOSECOLS(array,c)), IF(r = c, 1, "") ) )),
Putting it Together
With this in mind we put the function together with some minor cosmetic enhancement shown here below:
dcrCorrelation =LAMBDA(data_in_columns, [headers_in_row], [show_details], LET( array, data_in_columns, numCols, COLUMNS(array), correlArray, MAKEARRAY(numCols, numCols, LAMBDA(r,c, IF(r > c, CORREL(CHOOSECOLS(array,r),CHOOSECOLS(array,c)), IF(r = c, 1, "") ) )), headers_in_row, IF(ISOMITTED(headers_in_row), "Attribute " & SEQUENCE(1,numCols), headers_in_row), IF(show_details, HSTACK( VSTACK("Correlation", TRANSPOSE(headers_in_row)), VSTACK(headers_in_row, correlArray) ), correlArray ) ) )
Let's see this in action.
The function is very straightforward. We only need to provide the data in columns. If we provide the headers, this will appear in the table.The final result is the correlation coefficient matrix as shown below.You can then follow up with some formatting and colours.
Try it in your DC-DEN!
Comments
Post a Comment