Posts

Out-of-the-Box Part 4 - Variance Testing with F.TEST

Image
We wrap up our "Out of the Box" series with the  F.TEST  function, used to check if the variances of two samples are equal. Variance is simply the square of the standard deviation `sigma^2`, which measures how spread out data is around the mean. Since variance and standard deviation are mathematically linked, we use the terms interchangeably. The F.TEST formula is simple: F.TEST(array1, array2) where array1 and array2  are your data ranges, which don’t need to be the same size. Visualizing Variance with Box Plots In the previous post, we looked at box plots for five sample distributions. From the box plots, you might think that samples A, B, and D have the same spread , while C and E are different . However, these box plots only show quartiles (25% to 75% of the data) and may not accurately reflect variance. Using the F.TEST formula, we find that while sample A's variance differs from sample B's, sample C's variance is similar to B's. As it turns out sample...

Happy New Year 2025!

Image
🎉Happy new year, 2025! 🎊 It’s been a stressful and tumultuous year for me, but I’m grateful to have made it through one of the most challenging times of my life. While I’m not completely out of the woods, I thank God for protecting me and keeping me safe in him. Looking forward, I plan to continue writing about Excel and LAMBDA, not only for my own growth but to explore niche topics that interest me. Off the blog, I’ve been diving deeper into numerical analysis with Excel. Once I’m satisfied with my approach, I will share my insights that hopefully could be valuable to others. Wishing you all a Happy New Year filled with peace. God bless!

Out-of-the-Box Part 3 - Two Sample Mean Testing with T.TEST

Image
Excel's  T.TEST  function is commonly used to compare the means of two sample data sets. According to its documentation , the function is designed to: ...to determine whether two samples are likely to have come from the same two underlying populations that have the same mean . In order words, T.TEST  tests whether the average (mean) values of two samples are significantly different. However, it does not assess whether one sample's average is specifically greater than or less than the other. If your goal is to test if the mean of one sample is greater than or lesser than the other, T.TEST may not be the right tool. Instead, you might consider alternatives like  DC-DEN's two sample mean test , which is specifically designed to compare means in a directional (greater, lesser or equal) context. Understanding the T.TEST formula in Excel The T.TEST function in Excel compares the means of two data sets to determine if they are statistically different. The formula is as fo...

Out-of-the-Box Part 2 - One Sample Mean Testing with Z.TEST

Image
Previously we saw the One Sample Mean test implemented using LAMBDA  by calculating the test statistics. Excel provides something similar using the  Z.TEST  function. The Z.TEST function compares the average of a sample if it is statistically less than or equal to a test value. The formula is stated as: Z.TEST(array, x, [sigma]) where array is the range of data to be tested against x x is the value of test. sigma is optional. If the population standard deviation is known, put it here. Otherwise leave it empty, and Excel will use the sample standard deviation instead. In effect you are testing to see if the average of the array is less than or equal to x `bar(array) < x` Comparing for Less Than or Equal To Observe the box plot of 5 sample distributions above. We want to check if the distributions averages are less than or equal to the value of 50 . We see samples A and B less than 50, sample C around 50, and samples D and E more than 50. NOTE: Where possible do box...

Out-of-the-Box Part 1 - Proportion Testing with CHISQ.TEST

Image
So Excel Lambda is great! You can create custom functions that are not included in Excel. And we have seen in this blog how you could implement various statistical hypothesis testing .  But then what are those Excel TEST functions for? What hypothesis testing can I do with these Out-of-the-box TEST functions? In the next few posts, I will describe some common hypothesis testing you could do: Proportion Test with  CHISQ.TEST One Sample Mean Test with  Z.TEST Two Sample Mean Test with  T.TEST Variance Test with  F.TEST Excel's Chi Squared Test for One Proportion Testing In Excel's documentation, CHISQ.TEST is described as a test for independence . It does this by doing a goodness-to-fit test on how well the data matches the expected. This means we can also use it for one proportion testing. To use CHISQ.TEST we will need to compare the sample proportion against the expected proportion. But unlike DC-Den's One Proportion Testing  where we only need to spe...

Which Machine Learning Algorithm To Use?

Image
Terminologies We learnt a few machine learning  terminologies  and algorithms in this blog. Supervised  means we rely on labelled training data. It is task driven to identify a goal. Unsupervised  means unlabeled training data. It is data driven to identify a pattern. Classification arranges data into classes/categories using a labeled dataset. Regression develops a model to predict continuous numerical values. Clustering  separates an unlabeled dataset into clusters/groups of similar objects. Classification is a supervised learning algorithm, while Clustering is an unsupervised algorithm. Regression is considered supervised learning because the model is trained using both the input features and output labels - which can be numerical values. I will mention here that two other unsupervised approaches are:  Association , to identify underlying relationships, and Dimension Reduction , to reduce the number dimensions/features to make calculations simpler. I d...

It's Probably Correct - Classifying with Naïve Bayes

Image
Naïve Bayes is a categorial probabilistic supervised classification. You may already be familiar with the terminology  supervised classification , so I will not repeat it here. Naïve Bayes doesn't require numerical values. It relies on categories or labels only. It is probabilistic because it uses probabilities to calculate the classification. It calculates the classification probabilities based on given records. The more consistent the data (repeatable patterns), the stronger the probability of the classification.  Note : This implementation and example here follows closely from Learn Data Mining Through Excel by Hong Zhou . Great book! Bayes Theorem Naïve Bayes is based on Bayes theorem most famously written as below: `P(y|x) = (P(x|y)*P(y)) / (P(x))` where: `P(y|x)` is the probability of `y` given `x` `P(x|y)` probability of `x` given `y` `P(y)` and `P(x)` are the probabilities of y and x respectively. With multi-independent variables `(x_1, x_2, ..., x_n)` the equation wou...