Splitting Datasets: From 0 to Hero
Whenever you work with a large amount of data or build any kind of model, whether a predictive model, a classification model, or anything else, you need to know these terms in order to evaluate the model properly.
BONUS: I have talked about a concept called data wrangling here in this post, and you might like it!
First, let’s see why we need to split the data at all. The point is that we want to evaluate our model’s performance.
Why split the dataset in the first place?
The main purpose of splitting data into three different sets is to avoid overfitting: paying attention to minor details and noise that are not necessary, which only optimizes accuracy on the training dataset. We need a model that performs well on data it has never seen (unknown data points); this ability is called generalization.
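To make the idea concrete, here is a minimal sketch of a three-way split using only Python's standard library. The set names (train, validation, test), the 60/20/20 ratio, and the helper name `three_way_split` are assumptions chosen for illustration; they do not come from any particular library, and in practice you might use a ready-made utility instead.

```python
import random


def three_way_split(rows, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle rows and cut them into train, validation, and test lists.

    Hypothetical helper for illustration; the fractions are assumptions.
    """
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")

    shuffled = list(rows)                   # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible

    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)

    test = shuffled[:n_test]                # held back until the very end
    val = shuffled[n_test:n_test + n_val]   # used to compare or tune models
    train = shuffled[n_test + n_val:]       # the only part the model learns from
    return train, val, test


rows = list(range(100))  # stand-in for 100 data points
train, val, test = three_way_split(rows)
print(len(train), len(val), len(test))
```

Shuffling before cutting matters when the data is stored in some order (by date or by class, for example), because otherwise one of the sets could end up with a very different mix of points than the others.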
But what is overfitting?
Okay, I won’t go into the mathematical functions here; instead, I will explain it in very simple layman’s terms.
Overfitting happens when your model learns all (or most) of the details and noise in the training data to the extent that it negatively impacts the performance of your model on…