
Generalization Errors

The end goal of ML is for the trained model to perform well on new cases (in production). Thus we need ways to reduce the generalization error, which decomposes into three components: bias, variance, and irreducible error.

Types of Errors

(For details, see Bias-Variance Tradeoff and Ridge Regression.)

In a generic regression problem, the goal is to model the unknown relationship \(y=f({\bf x})\) between the dependent variable \(y\) and the \(d\) independent variables \({\bf x}=[x_1,\cdots,x_d]^T\) by a regression function \(\hat{f}({\bf x})\), based on a given training set \({\cal D}=\{({\bf x}_n,\,y_n)\mid n=1,\cdots,N\}\) of observed data samples inevitably contaminated by some random noise \(e\).

For a given \({\bf x}\), the observed value \(y=f({\bf x})+e\) is random due to the noise, and the regression model \(\hat{f}({\bf x})\) is also random because it depends on the random training set \({\cal D}\). We assume the random noise \(e\) has zero mean and is independent of \(\hat{f}\).

We measure how well the model \(\hat{f}({\bf x})\) fits the noisy data \(y=f({\bf x})+e\) by the mean squared error (MSE):

\[\begin{align*} E [ (y-\hat{f})^2 ] & = E[(f+e-\hat{f})^2 ] \\ & = E[((f-E\hat{f})+(E\hat{f}-\hat{f})+e)^2 ] \\ & = E[(f-E\hat{f})^2]+E[(E\hat{f}-\hat{f})^2]+E[e^2] \end{align*}\]

All three cross terms vanish: \(f-E\hat{f}\) is deterministic and \(E[E\hat{f}-\hat{f}]=0\), while the terms involving \(e\) drop out because \(E[e]=0\) and \(e\) is independent of \(\hat{f}\).
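The decomposition can be checked numerically with a Monte Carlo sketch. In the toy setup below (the sine target, noise level, query point, and linear model are all assumptions for illustration), we repeatedly draw training sets, fit a straight line, and compare the empirical MSE at a fixed point against the sum of estimated bias-squared, variance, and noise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy problem: true function, noise level, and a query point.
f = lambda x: np.sin(x)
sigma = 0.3          # std of the irreducible noise e
x0 = 1.0             # fixed point at which we decompose the error
n_train, n_trials = 30, 2000

preds = np.empty(n_trials)   # f_hat(x0) across independent training sets
errs = np.empty(n_trials)    # (y - f_hat(x0))^2 on a fresh noisy label

for t in range(n_trials):
    # Draw a fresh training set D = {(x_n, y_n)} with y = f(x) + e.
    x = rng.uniform(0, 2 * np.pi, n_train)
    y = f(x) + rng.normal(0, sigma, n_train)
    # A deliberately simple (biased) model: fit a straight line.
    a, b = np.polyfit(x, y, 1)
    preds[t] = a * x0 + b
    # Evaluate against a new noisy observation at x0.
    errs[t] = (f(x0) + rng.normal(0, sigma) - preds[t]) ** 2

mse = errs.mean()
bias2 = (f(x0) - preds.mean()) ** 2    # E[(f - E f_hat)^2]
variance = preds.var()                 # E[(E f_hat - f_hat)^2]
noise = sigma ** 2                     # E[e^2]

print(mse, bias2 + variance + noise)   # the two should be close
```

With enough trials, `mse` matches `bias2 + variance + noise` up to Monte Carlo error, which is exactly the identity derived above.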

The three terms represent three different types of error:

Irreducible Error

Irreducible error is due to the noisiness of the data itself. The only way to reduce this type of error is to clean up the data (e.g., fix the data source, such as broken sensors, or detect and remove outliers).

Bias Error

Bias may arise because the model fails to capture the underlying patterns in the data set, or because the data set is not representative.

The former happens when the model underfits. The latter is addressed by collecting more (and more representative) data.
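Underfitting shows up as a large error even on the training data itself. A minimal sketch (the quadratic target and noise level are assumed for illustration): a straight line simply cannot represent a parabola, no matter how much data it sees.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed synthetic data from a clearly nonlinear function.
x = rng.uniform(-3, 3, 200)
y = x**2 + rng.normal(0, 0.2, 200)

def train_mse(deg):
    # Fit a polynomial of the given degree and report its training MSE.
    coef = np.polyfit(x, y, deg)
    return np.mean((np.polyval(coef, x) - y) ** 2)

# The line's error stays large even on the data it was trained on --
# the signature of underfitting (high bias); the quadratic fits well.
print(train_mse(1), train_mse(2))
```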

Variance Error

This part is due to the model’s excessive sensitivity to small variations in the training data, which happens when the model overfits. It can also arise when the learning algorithm is not robust to an ill-conditioned matrix; in that case, it can be resolved by ridge regression.
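The ill-conditioning case can be sketched as follows (the two nearly collinear features and the penalty \(\lambda=1\) are assumptions for illustration): ordinary least squares inverts a near-singular \(X^TX\) and its weights are extremely noise-sensitive, whereas the ridge penalty shrinks the solution back toward the true weights.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed setup: two nearly collinear features -> ill-conditioned X^T X.
n = 100
x1 = rng.normal(0, 1, n)
x2 = x1 + rng.normal(0, 1e-4, n)      # almost a copy of x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(0, 0.1, n)   # true weights are (1, 1)

# Ordinary least squares: w = (X^T X)^{-1} X^T y. The near-singular
# X^T X amplifies the noise, so the weights can blow up.
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge: w = (X^T X + lambda I)^{-1} X^T y shrinks the solution,
# taming the variance at the cost of a little bias.
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

The ridge weights stay close to the true \((1,1)\) while the OLS weights depend wildly on the particular noise draw, which is the bias-variance trade-off of regularization in miniature.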