DC Racoon's Statistical Journey with Excel

Microsoft Excel is a fantastic and useful tool for a range of purposes. You can use it as a spreadsheet, for project tracking, data graphing, or as a visualisation tool. You can pull data from files, websites, or databases. Data can be sliced, spliced and analysed. You can build static and dynamic dashboards with graphs, and publish them to the Internet. You can automate repeated tasks and write scripts for custom tasks.

Recently I had the time and the opportunity to learn Excel's statistical sample testing functions. I will blog and share with you on using Excel for simple statistical analysis. With Microsoft 365, Excel introduced the LAMBDA function with which you can create custom formula.

But you may ask, why Excel? Isn't other statistical software out there much better?

Case for Excel

Readily Available

Due to Microsoft Office dominance in the corporate world, Excel is available in most organisations. I have not been in any company that did not have Excel in their desktops or laptops.

This isn't to say that ALL companies have Excel installed on their desktops and laptops. But 95% of the time, you will have Excel readily available.

Builds on Existing Knowledge

The learning curve into Excel is gentle. You won't have to spend extra time learning simple tasks (e.g. reading input data or converting to correct unit). The time you spend learning new found skills is reusable in other tasks when using Excel again. In the end you are building on existing knowledge.

Professional statistical tools have a wide range of functions. But these software require you to learn their interface peculiarities on how data is displayed, stored and read. If you move to a different organization, they may have a different preferred tool. And you would need to learn that.

Great for Simple Statistics

Excel comes with many built-in statistical functions. You could easily whip up descriptive information, perform hypothesis testing, and build simple regression graphs. The existing tools gets you started for most frequent usage (averages, size, range, etc.).

The spreadsheet allows verification as you step by step build calculation towards  

Case Against Excel

Limited Statistical Functions

Excel does not provide Anderson Darling test, Mann-Whitney test, etc. You could do all this in Excel, but it would involve a number of steps to accomplish. Advanced statistical functions are no provided.

You can build custom Excel functions with Visual Basic for Applications VBA. These can be used as a macro  enabled file in your spreadsheet (.xlsm) or packaged as an add-in (.xlam). In my blog, I will show you how I implement statistical functions using Excel LAMBDA.

But if you want everything pre-package, you could consider third party Statistical Analysis Add-Ins for Excel.


Limited Statistical Graphing Tools

Excel does not provide extensive graphing tools. You won't get nice Box Plots. You can't get quick regression analysis plots. Most graphs and charts provided are suitable for corporate usage. Heavier scientific and engineering tools are not available.

In this aspect, Excel is clearly not suitable. The only silver-lining is that, if you are using Excel in corporate, most people would not understand those heavy scientific graphs.

Data Size and Other Constraints

Excel has data limitations. Most notably is each sheet can only support up to 1,048,576 rows by 16,384 columns. Honestly, if you have too many records, you might want to reconsider how you do your data sampling. Statistics is getting information from a smaller sample to understand the population. Getting data from the entire population may be too difficult or too expensive.

There is also limitation imposed by available resources on your computer. While this seems obvious, enterprise statistical analysis could leverage on data center or cloud resources to process large amount of data and very quickly.

If you find yourself in this position, you might want to consider Microsoft Power BI. The great thing is that what you learn in Excel can bring over to Power BI.

Conclusion

Knowing Microsoft Excel is good a investment. Statistical knowledge with this tool will stay relevant as Excel is generally available in most organisations. Specialize statistical software may be required for comprehensive analysis - a luxury some may not have. Excel provides functions to address common challenges that you may face.

Until next time, happy playing with Excel in your DC-Den.

Comments