Creating Two Sample Mean Test with LAMBDA

You received glass window samples from two different vendors. You wish to check if the thickness of the samples are the same. Based on the samples, you obtain the following data:

Vendor Sample	Sample 1	Sample 2
Mean (cm)	2.0	2.3
Standard Deviation (cm)	0.2	0.3
Number of Samples	15	20

At first glance sample 2 is 0.3cm thicker than sample 1. But are their thickness significantly different? To make a comparison you would need a perform a Two Sample Mean test.

Two Sample Mean Test

A two sample mean test compares two sample distribution means against each other. It differs from the one sample mean test that compares against a target value.

We write the Null Hypothesis as the means of sample 1 and of sample 2 are equal.

`H_0: mu_1 = mu_2`

And the Alternative Hypothesis as the mean of sample 1 and of sample 2 are not equal.

`H_1: mu_1 != mu_2`

We could also test if the mean of sample 1 is greater than or equal to that of sample 2 (left tail test).

`H_0: mu_1 >= mu_2`

Or if the mean of sample 1 is less than or equal to that of sample 2 (right tail test).

`H_0: mu_1 <= mu_2`

Test Statistics

To make the comparison, we will assume that:

The two samples are normally distributed
The two samples are independent
The sample have unequal variances

NOTE: There are cases where the measurements are not independent, and where variances are the same. But for this implementation we will assume neither is the case.

With these assumptions, the t-statistics is given by the formula:

`t = (mu_1 - mu_2) / (sqrt ( s_1^2/n_1 + s_2^2/n_2) )`

where

`t` = referred as t statistics

`mu_1, mu_2` = sample 1 and 2 averages

`s_1, s_2` = sample 1 and 2 standard deviations

`n_1, n_2` = sample 1 and 2 sizes

Intermediate Calculations in LAMBDA

To implement the LAMBDA, I use some intermediate variables.

      numerator, sample_mean_1 - sample_mean_2,
      denominator, SQRT( sample_stdev_1^2/sample_size_1 + sample_stdev_2^2/sample_size_2),
      tStatistics, numerator / denominator,
      df, sample_size_1 + sample_size_2 - 1,

To calculate the p-values, I will use T distribution formula T.DIST from Excel similar to that in One Sample Mean Test.

Putting it Together

The final result will be LAMBDA looking like this:

=LAMBDA(sample_mean_1, sample_stdev_1, sample_size_1, sample_mean_2, sample_stdev_2, sample_size_2, [tail], [show_details],
    LET(tail, IF(ISOMITTED(tail), 0, tail),
      show_details, IF(ISOMITTED(show_details), FALSE, show_details),
      numerator, sample_mean_1 - sample_mean_2,
      denominator, SQRT( sample_stdev_1^2/sample_size_1 + sample_stdev_2^2/sample_size_2),
      tStatistics, numerator / denominator,
      df, sample_size_1 + sample_size_2 - 1,
      pvalueLeftTail, T.DIST(tStatistics, df, TRUE),
      pvalueRightTail, T.DIST.RT(tStatistics, df),
      pvalueTwoTail, 2 * IF(pvalueLeftTail < pvalueRightTail, pvalueLeftTail, pvalueRightTail),
      pvalue, SWITCH(tail, 0, pvalueTwoTail, -1, pvalueLeftTail, 1, pvalueRightTail, "Tail:Left Tail=-1, Two Tail=0, Right Tail=1"),
      details, HSTACK(
        VSTACK(
          "Sample Mean Mu1",
          "Sample StdDev SD1",
          "Sample Size N1",
          "Sample Mean Mu2",
          "Sample StdDev SD2",
          "Sample Size N2",
          "t-statistics",
          "p-value Left Tail Test Mu1 > Mu2",
          "p-value Right Tail Test Mu1 < Mu2",
          "p-value Two Tail Test Mu1 = Mu2"
        ),
        VSTACK(
          sample_mean_1,
          sample_stdev_1,
          sample_size_1,
          sample_mean_2,
          sample_stdev_2,
          sample_size_2,
          tStatistics,
          pvalueLeftTail,
          pvalueRightTail,
          pvalueTwoTail
        )
      ),
      IF(show_details, IFNA(details,""), pvalue)
    )
)

NOTE: After writing a few LAMBDAs, I realised that if a parameter is omitted, the default value is 0 or FALSE. The checking ISOMITTED is unnecessary unless you wish to set to some other value.

Two Sample Mean Test In Action

Let's have a few screenshots of how this LAMBDA formula works. Say you have keyed in the values as below:

Then you call the LAMBDA formula like this:

Then LAMBDA returns the following results:

From the results we see the p-value is 0.001 which is less than 0.05. It is improbable that the glass thickness from Sample 1 and Sample 2 are the same. We reject the null hypothesis and conclude that the glass thickness from Sample 1 and from Sample 2 are different `mu_1 != mu_2`.

Incidentally the right tail test returns a p-value of 0.999 strongly supporting that the glass thickness of Sample 1 is less than that of Sample 2 `mu_1 < mu_2`

In my next blog post, I will extend this, and create a LAMBDA formula that accepts two arrays without imposing on the user to calculate the mean and standard deviation.

If you are game, you could consider implementing cases when the samples are not independent or when they have equal variances. The t-statistics formula will be slightly different, but other than that the rest of the codes should remain the same. Have a go and implement this in your DC-DEN!

Search This Blog

DC-Den