Overfitting occurs when a machine learning model is fitted so closely to the dataset it was trained on that it cannot apply what it has learned to new data. As a result, the model makes inaccurate predictions on new data, and a business cannot rely on them.
In other words, a machine learning model is overfitted when it is so specific to the original data that it cannot produce accurate outcomes from new data. When overfitting occurs, the model is only useful when applied to the exact dataset that it was trained on.
The term goodness of fit refers to how closely the predictions a model makes match the actual values. If your model has learned the noise in the training data rather than the underlying signal, it is overfitted and will not be accurate with new datasets.
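To make goodness of fit concrete, here is a minimal sketch of one common measure, the coefficient of determination (R squared). The function name r_squared and the sample values are illustrative assumptions, not part of any particular platform.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1.0 is a perfect fit, 0.0 is no better
    than always predicting the mean of the actual values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    mean_actual = sum(actual) / len(actual)
    # Squared error left over after the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total squared variation of the actual values around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R squared is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Illustrative values only.
actual = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(actual, actual)        # predictions match exactly
baseline = r_squared(actual, [2.5] * 4)    # always predict the mean
print(perfect, baseline)
```

A high score on the training data alone does not rule out overfitting. The same measure has to be computed on data the model has not seen.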
The opposite problem, underfitting, can occur as well. Underfitting is when a model does not have enough features informing it, or is too heavily regularized, so that it is too simple to make meaningful predictions.
For instance, let’s say you create a model to predict whether a loan application is approved, and you train it on a dataset of 50,000 applications. When you test it on that data, it is 99% accurate, but when you try it with new data, it is only 50% accurate. This gap means that the model did not generalize well, and it is a classic example of overfitting.
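The gap described in this example can be reproduced in a small sketch. Everything here is a hypothetical stand-in: the data is synthetic with labels that are pure noise, and the "model" is a nearest-neighbour lookup that simply memorizes its training rows. It is meant only to show how comparing accuracy on training data with accuracy on new data exposes overfitting.

```python
import random


def make_applications(n, rng):
    """Hypothetical synthetic loan data: two random features per row,
    and an approval label that is pure noise (nothing real to learn)."""
    return [((rng.random(), rng.random()), rng.random() < 0.5) for _ in range(n)]


class MemorizingModel:
    """1-nearest-neighbour lookup: it stores every training row, noise included."""

    def fit(self, rows):
        self.rows = list(rows)
        return self

    def predict(self, features):
        nearest = min(
            self.rows,
            key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], features)),
        )
        return nearest[1]


def accuracy(model, rows):
    return sum(model.predict(f) == label for f, label in rows) / len(rows)


rng = random.Random(0)
train_rows = make_applications(500, rng)
new_rows = make_applications(500, rng)

model = MemorizingModel().fit(train_rows)
train_acc = accuracy(model, train_rows)  # the model has seen these rows
new_acc = accuracy(model, new_rows)      # the model has never seen these rows
gap = train_acc - new_acc
print(f"training accuracy: {train_acc:.2f}, new-data accuracy: {new_acc:.2f}")
```

Because the labels carry no signal, the perfect score on the training rows reflects memorization only, and accuracy on new rows falls to roughly chance. A large gap between the two numbers is the warning sign to look for.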
Identifying whether a model is overfitted is an essential part of the machine learning lifecycle and must happen before deploying an AI platform. An overfitted model misrepresents the data and will not be accurate when applied to new inputs, so the problem must be corrected early on.
An overfitted machine learning model can be very misleading: developers may deploy it because its accuracy on the training data makes it look ready to support data-driven decisions. The goal should be to find the underlying trend, not a line that passes through every data point.
When you work with LogicPlum, we provide all the tools you need to protect your automated machine learning platform from overfitting – throughout the entire machine learning lifecycle. We use techniques like training-validation holdout and cross-validation to ensure your business can make accurate, data-driven decisions.
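As an illustration of cross-validation in general (not a description of LogicPlum's implementation), the sketch below splits a dataset into k folds, trains on k-1 of them, and scores on the held-out fold each time. The helper names, the toy threshold model, and the synthetic data are all assumptions made for this example.

```python
import random


def k_fold_splits(n_rows, k, seed=0):
    """Yield (train_idx, val_idx) pairs; every row is validated exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, folds[i]


def fit_threshold(rows):
    """Toy model: choose the cut-off on a single score that best separates
    the training labels (predict 'approved' when score >= cut-off)."""
    best_t, best_acc = None, -1.0
    for t in sorted(x for x, _ in rows):
        acc = sum((x >= t) == y for x, y in rows) / len(rows)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def accuracy(threshold, rows):
    return sum((x >= threshold) == y for x, y in rows) / len(rows)


# Hypothetical data: approved when score >= 0.6, with about 10% of labels flipped.
rng = random.Random(1)
data = []
for _ in range(300):
    x = rng.random()
    y = (x >= 0.6) != (rng.random() < 0.1)
    data.append((x, y))

scores = []
for train_idx, val_idx in k_fold_splits(len(data), k=5):
    threshold = fit_threshold([data[i] for i in train_idx])
    scores.append(accuracy(threshold, [data[i] for i in val_idx]))

mean_score = sum(scores) / len(scores)
print("fold accuracies:", [round(s, 2) for s in scores])
print("mean validation accuracy:", round(mean_score, 2))
```

Each score comes from rows the model did not see during fitting, so the mean is a more honest estimate of performance on new data than accuracy measured on the training set. A single training-validation holdout is the same idea with only one split.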