Shuffling your data

🂡🂢🂣🂤🂥

In DC-ML's Supporting Functions, I created an Excel LAMBDA function, SelectData, to split data into training and testing datasets. This assumes the data was collected in random order. If it wasn't, shuffling before splitting can increase confidence in the results. In this blog, I’ll show how to shuffle your dataset.

Why Shuffle your data

Splitting data helps guard against overfitting, because the test dataset is used to evaluate model accuracy on unseen data. Shuffling keeps the model from learning patterns that come only from the order of the rows, which could introduce bias. With the row order randomized, the model tends to generalize better to unseen data, which can improve performance and accuracy.

ShuffleData

The ShuffleData function is simple. It generates a random array (randArray) with one value per row of the dataset, then uses SORTBY to sort the dataset by that array.

=LAMBDA(array, [headers],
    LET(
        randArray, RANDARRAY(ROWS(array), 1),
        sortArray, SORTBY(array, randArray),
        IF(ISOMITTED(headers), sortArray, VS...
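
To try the sort-by-random-key idea on its own, here is a minimal sketch of the same two steps written inline, without the LAMBDA wrapper or the optional headers argument. The range A2:C11 is a hypothetical 10-row, 3-column dataset used only for illustration; replace it with your own data range.

```excel
=LET(
    data, A2:C11,
    keys, RANDARRAY(ROWS(data), 1),
    SORTBY(data, keys)
)
```

Because SORTBY moves whole rows, the values within each row stay together and only the row order changes. Note that RANDARRAY is volatile, so the shuffled order changes every time the worksheet recalculates. If you need a fixed shuffle, for example so the training and testing split stays reproducible, copy the result and paste it back as values.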