Posts

Showing posts with the label LAMBDA

Recursion: LAMBDA calling LAMBDA

Image
Excel Lambda functions can be used to create custom, reusable functions by giving them a friendly name. This means a custom Lambda function could call itself. A recursive function is a function that calls itself and self-moves towards an exit condition. Without the exit condition, the function loops continuously unless externally terminated. Why use recursion? Some problems can be expressed in an elegant recursive structure. For example the n-th factorial is defined as: `n! = n (n-1) (n-2) ... 1` with the termination condition: `0! = 1` Recursion example using Lambda Before we start, Excel provides the FACT function to calculate the factorial of a number. You can use this to compare the results of your own. This is an example of a recursive function MyFactorial =LAMBDA(n, IF(n = 0, 1, n * MyFactorial(n-1) ) ) The IF function is used to determine the termination condition. If the termination condition has not been reached, decrement the variable by 1 and call the functi

Kruskal-Wallis Test - Comparing multiple non-parametric data sets

Image
Previously I showed one-way  ANOVA  for comparing samples means. It assumes the samples are normally distributed. For non-parametric samples, we have Kruskal-Wallis test. Kruskal-Wallis the alternative to one-way ANOVA when your samples do not follow the normal distribution. Distribution 2 samples > 2 samples Normal Two Sample Mean Test ANOVA Non-parametric Mann-Whitney Kruskal-Wallis How does Kruskal-Wallis test work? Kruskal-Wallis test is used to test if there is any statistically significant differences between the medians of three or more independent (unrelated) groups. It is similar to Mann-Whitney test in that it sorts the data altogether, ranks them and then calculates the test statistics. But where Mann-Whitney uses NORM.S.DIST to calculate the test statistics, Kruskal-Wallis uses CHISQ.DIST instead. Implementing Kruskal Wallis in LAMBDA Step 1 : Group data together We can group the data together using

ANOVA - Comparing averages of multiple data sets

Image
In an earlier post, we used the Two Sample Mean Test to compare the mean of two samples (duh). If you have 3 samples you could use the same test, and compare between pairs of samples, i.e. sample 1 & sample 2, sample 2 & sample 3, and sample 1 & sample 3. The problem arises when you have many more sample sets. Running this test means keeping track of the permutations. This approach is wastefully repetitive and you are likely to make mistakes. Also, sometimes, you only want to know if there is at least one sample that is different from the rest. Scientific or medical break throughs are rare. In most experiments and trials, there will be no change/difference. A single test that tells you that there is no difference between the samples allows you to move on.  Is there a better method to compare the mean of multiple samples? Enter ANOVA Analysis Of Variance (ANOVA) is a statistical method used to compare the means between samples. It tells you if there's any significant d

When it is not Normal... The Mann-Whitney Test

Image
The Mann-Whitney Test Mann-Whitney test helps you compare two sets of data when they are not normally distributed. I would use Mann-Whitney test only after I confirm using the  Anderson-Darling test . I should add that I run a box-plot first before running the AD test. I used to think that Mann-Whitney compares the medians of two data sets. But in the process of implementing Mann-Whitney in Excel LAMBDA, I found out that Mann-Whitney test compares the mean ranks, in doing so, it determines if the two data set came from the same population. Step 1: Group the two samples together and sort the data in ascending order, but retain their sample origin. Step 2: Rank the combined data. Step 3: Separate the data back into the two samples. Sum the ranks for each sample. Step 4: Compute the test statistics U `U_1 = R_1 - (n_1(n_1+1))/2` and  `U_2 = R_2 - (n_2(n_2+1))/2` Step 5: Choose the small U to calculate the equivalent z-statistics. `z = (U - m_U) / sigma_U` where `m_U` and `sigma_U` are the

Is Your Data Normal? The Shapiro-Wilk Test

Image
  Shapiro-Wilk and Anderson Darling Tests The Shapiro-Wilk test is another statistical test to assess the normality of a dataset. Just as Anderson-Darling test , both are goodness-of-fit test. Why would we need another normality test? Let me put some points forward: The Shapiro-Wilk SW test is specifically a test for normality. The Anderson-Darling AD test can be applied to other distributions, not just normal distribution. That said, my AD implementation  was specific for normal distribution. AD test is more sensitive to deviations in the tails of the distribution, while SW test is sensitive to deviations across the entire distribution. Both AD and SW can be used to calculate p-values, but the p-value will not be comparable. This means there can be instances where AD and SW disagree . From implementation perspective, I find AD test easier to write than the SW test (a lot more steps), but AD test was much more difficult to understand compared to SW test. So, which test should you use:

Is Your Data Normal? The Anderson-Darling Test

Image
Have a look at the Excel Histogram chart above based on a given set of data. Does it look normally distributed? How could you find out? Well, you could calculate the kurtosis and skewness, e.g. using  Descriptive Statistics , or generate a BoxPlot. But still, how can you know with certainty? Descriptive Statistics showing Kurtosis and Skewness BoxPlot of Sample Data Why is Checking for Normality Important? In previous blogs on One Sample Mean Test and Two Sample Mean Test , I assumed the given data is normally distributed. If the sample is normally distributed then choosing a parametric test like t-test is applicable since we are using the mean (central tendency) and standard deviation (spread) to calculate the t-statistics . Assuming normal distribution is a simplification. It is therefore only to justify using parametric tests. However, mean and standard deviation should not be used to describe a sample that is not normally distr

Extending Two Sample Mean Test for Arrays

Image
In the previous blog, I implemented a Two Sample Mean hypothesis test with Excel LAMBDA. In this blog, I will create a new function that takes in two sample data arrays directly, calculate the necessary statistics, and then reuse the Two Sample Mean function to perform the hypothesis test. This approach will ensure the same result. Parameter Changes In the earlier implementation, the dcrMean.Two.TTest input parameters are: dcrMean.Two.TTest =LAMBDA(sample_mean_1, sample_stdev_1, sample_size_1, sample_mean_2, sample_stdev_2, sample_size_2, [tail], [show_details], To take in two arrays, I will create a new function dcrMean.Two.TTest.Array with input parameters as follows: dcrMean.Two.TTest.Array =LAMBDA(array_1, array_2, [tail], [show_details], Within this new function, the mean and standard deviation of the two input arrays will be pre-calculated before passing the values into the hypothesis test. The implementation will be like this: =LAMBDA(array_1, array_2, [tail], [show_

Creating Two Sample Mean Test with LAMBDA

Image
You received glass window samples from two different vendors. You wish to check if the thickness of the samples are the same. Based on the samples, you obtain the following data: Vendor Sample Sample 1 Sample 2 Mean (cm) 2.0 2.3 Standard Deviation (cm) 0.2 0.3 Number of Samples 15 20 At first glance sample 2 is 0.3cm thicker than sample 1. But are their thickness significantly different? To make a comparison you would need a perform a Two Sample Mean test . Two Sample Mean Test A two sample mean test compares two sample distribution means against each other. It differs from the one sample mean test that compares against a target value. We write the Null Hypothesis as the means of sample 1 and of sample 2 are equal. `H_0: mu_1 = mu_2` And the Alternative Hypothesis as the mean of sample 1 and of sample 2 are not equal. `H_1: mu_1 != mu_2` We could also test

Extending One Sample Mean Test for Arrays

Image
In the previous blog, I implemented a One Sample Mean hypothesis test in Excel LAMBDA. When given an array of sample data, we firstly calculate the sample mean, standard deviation and size, before passing these into the function. The function compares a sample mean against an expected population mean. In this blog, we will create a new function to take the sample data array directly, calculate the necessary statistics, and then reuse the One Sample Mean function to perform the hypothesis test. This extension ensures the same result regardless of the approach used. Parameter Changes In the earlier One Sample Mean dcrMean.One.TTest the input parameters were defined as: dcrMean.One.TTest =LAMBDA(expected_mean, sample_mean, sample_stdev, sample_size, [tail], [show_details], To implement One Sample Mean test for array,  dcrMean.One.TTest.Array  input parameter will take the entire data array: dcrMean.One.TTest.Array =LAMBDA(expected_mean, array, [tail], [show_details], Within this

Creating One Sample Mean Test with LAMBDA

Image
Suppose you visited a village. The village elders claims to possess a secret elixir that makes their children grower taller! You are aware of the national 12-year old  average height is 150cm. And you could measure the village's 12-year old children's height. Plotting the village's children's height gives you a distribution like in the graph above. At first glance the village's children does seem taller than the national average. Is the village's children really taller? Is the distribution significantly different? One Sample Mean Test A one sample mean test compares a sample distribution's mean against a target value. It is similar to the one proportion test but the sample data is continuous, giving a normal distribution curve instead of a yes/no binary result. We write the Null Hypothesis as the sample mean equal to the population mean. `H_0: mu = mu_0` And the Alternative Hypothesis as the sample mean not equal to the population mean. `H_1: mu != mu_0` We

Creating Two Proportion Test with LAMBDA

Image
What is Two Proportion Test Two Proportion Test compares one proportion to determine if it is equal to, lesser than or greater than another proportion. Unlike One Proportion Test , the size of the second proportion can have an effect on the results. Suppose you gave flavour X ice-cream to two groups of people. In the first group 65 of 100 people liked flavour X, and in the second group 90 of 130 people liked. Did this two groups responded similarly? We write our hypothesis statement as the "Did the two groups responded similiarly to flavour X ice-cream?" The Null and Alternative Hypothesis is written as: `H_0: p_1 = p_2` `H_1: p_1 != p_2` where `p_1` is proportion from group 1 `p_2` is proportion from group 2 Test Statistics The equation of a two proportion test statistics is written as `z = (p_1 - p_2) / sqrt( hat p(1 - hat p)(1/n_1 + 1 / n_2) )` where the pooled variance `hat p = (o_1 + o_2)/(n_1 + n_2)` and `p_1 = o_1 / n_1` is the proportion from group 1 `p_2 = o_2 / n_2

Creating One Proportion Test with LAMBDA

Image
What is One Proportion Test The One Proportion Test compares a sample proportion against a target proportion. For example, you observe that you get 6 heads when you flip a coin 10 times (sample). You expect with a fair coin, the probability of getting heads is 0.5 or 50% of the time (target). The One Proportion Test allows you to assess if the flipped coin is a fair coin based on the observations. The test proves this by statistically comparing the observed proportion of heads to number of flipping 0.6, is equal to the expected proportion 0.5. We write the Null Hypothesis as the observed proportion is equal to the hypothesised proportion. `H_0: p = p_0` And the Alternative Hypothesis is that the observed proportion is not equal to the hypothesised proportion (two-tailed test). `H_1: p != p_0` We could also test if the observed proportion is greater than the hypothesised proportion (left tail test). `H_0: p >= p_0` Or if the observed proportion is less than the hypothesised proporti

Formula Name Manager Limit: ~2084 Characters

Image
While trying to implement a LAMBDA function I found myself unable to paste my formula into the Formula Name Manager. My first thought was that the LAMBDA has a character limit. But further investigation showed it was related to the Formula Name Manager, specifically the Refers to: field. Also it is not  related to the Name or the Comment field. Tweaking my formula the maximum number of characters is about 2084. Whitespaces are counted. This includes carriage returns and tabs. So, when implementing LAMBDA with the Formula Name Manager, do remember to keep your formula as concise as possible, but don't be too sloppy with variable names and indentation. That said, go edit your LAMBDA in your DC-DEN! Update: The Comment:  field limit is 256 characters.

Descriptive Statistics summary with LAMBDA

Image
One function in Excel that I find useful is the Descriptive Statistics from the Analysis Toolpak Add-In. At a glance I see a summary of the sample data. However I also find it cumbersome to use. It requires you to add-in Analysis Toolpak - not loaded by default. I wonder why Microsoft choses not to pre-load in Excel even though it has been available since Excel 2000 (maybe even earlier?). It launches a dialog. Recently, this becomes an issue as it unavailable in Excel Web. It returns a static result, not an auto recalculating function, i.e. the result is not updated if and when the data is updated. Of the 3 reasons above, I find the last most annoying. While field/laboratory sample data does not change after collection, but input data from external sources can be changing frequently. Any report based on external sources would have to be painstakingly updated manually. So in this blog, I will create a Descriptive Statistics function with LAMBDA that would recalculate automatically just