What is Underfitting?
Underfitting is a common problem in machine learning that occurs when a model is too simple to capture the underlying patterns in the data. It is typically associated with high bias and low variance: the model makes strong assumptions about the data, so its predictions are systematically off, yet they change little from one training set to another.
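The standard bias-variance decomposition makes the link to bias precise. Assume y = f(x) + ε, with zero-mean noise of variance σ² that is independent of the model. Treat the learned model \hat{f} as random over training sets. The expected squared error at a point x then splits into three terms:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \sigma^2
$$

An underfit model is dominated by the squared-bias term. Even averaged over many training sets, its predictions stay systematically away from f(x).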
Understanding Underfitting
Underfitting occurs when a model fails to learn sufficiently from the training data. Instead of capturing the complexity of the data, the model remains too simple, which leads to poor performance on both the training and validation datasets.
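One quick way to check for this pattern is to compare training and validation error for a possibly too-simple model against a somewhat more flexible one. The sketch below is built on the following assumptions:

- a synthetic quadratic dataset with a simple hold-out split;
- NumPy's `polyfit`/`polyval` in place of scikit-learn, to keep it minimal;
- hypothetical helper names `mse` and `fit_and_score`.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return float(np.mean((y_true - y_pred) ** 2))

def fit_and_score(degree, x_train, y_train, x_val, y_val):
    """Fit a polynomial of the given degree; return (train_mse, val_mse)."""
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_err = mse(y_train, np.polyval(coeffs, x_train))
    val_err = mse(y_val, np.polyval(coeffs, x_val))
    return train_err, val_err

# Assumed synthetic data: a quadratic signal plus Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
y = x**2 + rng.normal(0, 0.5, 200)

# Points are drawn in random order, so a positional split is a random hold-out split
x_train, y_train = x[:150], y[:150]
x_val, y_val = x[150:], y[150:]

results = {}
for degree in (1, 2):
    results[degree] = fit_and_score(degree, x_train, y_train, x_val, y_val)
    train_err, val_err = results[degree]
    print(f"degree={degree}: train MSE={train_err:.2f}, val MSE={val_err:.2f}")
```

Suppose the degree-1 fit shows high error on both sets and the degree-2 fit brings both down sharply. That pattern points to underfitting. Overfitting would instead show low training error with much higher validation error.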
Characteristics of Underfitting
1. High Training Error: The model performs poorly even on the training dataset, indicating that it has not captured the data's patterns.
2. High Validation Error: Similar to the training error, the model also fails to generalize well to unseen data, resulting in high validation error.
3. Model Simplicity: The model may be too simple (e.g., using a linear regression model for a nonlinear relationship).
Visualizing Underfitting
To better understand underfitting, consider a linear model applied to data that has a quadratic relationship. When the fitted straight line is plotted over the data, it cannot follow the curvature of the data, which results in significant errors.
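The curvature that the straight line misses shows up as a systematic pattern in the residuals. This minimal sketch assumes noise-free data y = x² sampled evenly on [-1, 1]. The best-fit line is essentially flat, and the residuals are positive at both ends and negative in the middle rather than scattered randomly:

```python
import numpy as np

# Assumed noise-free quadratic data on a symmetric interval, for clarity
x = np.linspace(-1, 1, 21)
y = x**2

# Least-squares straight line: y ≈ slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

mid = int(np.argmin(np.abs(x)))  # index of the point closest to x = 0
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print("residual at left end: ", round(float(residuals[0]), 3))
print("residual at center:   ", round(float(residuals[mid]), 3))
print("residual at right end:", round(float(residuals[-1]), 3))
```

A structured residual pattern like this is a common sign that the model family is too simple for the data. A good fit would instead leave random scatter around zero.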
Causes of Underfitting
Underfitting can arise from several factors, including:
- Model Complexity: Choosing a model that is too simplistic for the data's complexity.
- Insufficient Features: Not including enough relevant features in the model.
- Excessive Regularization: Overly penalizing the complexity of the model can lead to underfitting.
Example of Underfitting
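As a first, minimal example, the excessive-regularization cause listed above can be reproduced in isolation. The sketch fits the same features twice with closed-form ridge regression (an L2 penalty): once with no penalty and once with a deliberately huge one. The following are illustrative assumptions:

- the synthetic data;
- the penalty strength `alpha=1e4`;
- the hypothetical helper `ridge_fit`.

The intercept is omitted for brevity.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression: w = (X^T X + alpha * I)^-1 X^T y.
    This sketch has no intercept term and penalizes every weight."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Assumed synthetic data: a quadratic signal plus Gaussian noise
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 100)
y = x**2 + rng.normal(0, 0.5, 100)

# Features [x, x^2] are rich enough to represent the true relationship
X = np.column_stack([x, x**2])

train_mse = {}
for alpha in (0.0, 1e4):
    w = ridge_fit(X, y, alpha)
    train_mse[alpha] = float(np.mean((y - X @ w) ** 2))
    print(f"alpha={alpha:g}: weights={np.round(w, 3)}, train MSE={train_mse[alpha]:.2f}")
```

The features [x, x²] can represent the true curve exactly. Even so, the heavy penalty shrinks the weights toward zero and the training error rises. Here underfitting comes from regularization, not from the choice of features.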
Let's consider a practical example using Python and the scikit-learn library. Here, we will fit a simple linear regression model to a quadratic dataset:
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Generate synthetic quadratic data: y = x^2 plus Gaussian noise
np.random.seed(0)
x = 2 - 3 * np.random.rand(100)
y = x**2 + np.random.randn(100) * 0.5

# Reshape x into a 2D array of shape (n_samples, n_features), as scikit-learn expects
x = x.reshape(-1, 1)

# Fit a linear regression model
model = LinearRegression()
model.fit(x, y)

# Make predictions on the training inputs
predictions = model.predict(x)

# Plot the data and the fitted straight line
plt.scatter(x, y, color='blue', label='Data points')
plt.plot(x, predictions, color='red', label='Underfitting Model')
plt.title('Underfitting Example')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()
```
In this example, the data follow a quadratic relationship, but a straight line cannot follow their curvature. As a result, the linear regression model has high error even on the training data, which is the hallmark of underfitting.
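The gap can be quantified by comparing training error with and without a squared feature, which addresses the "insufficient features" cause listed earlier. This sketch reuses the example's data-generating recipe. Two assumptions keep it brief: it solves ordinary least squares directly with `np.linalg.lstsq` instead of scikit-learn, and `train_mse` is a hypothetical helper.

```python
import numpy as np

# Same data-generating recipe as the scikit-learn example
np.random.seed(0)
x = 2 - 3 * np.random.rand(100)
y = x**2 + np.random.randn(100) * 0.5

def train_mse(X, y):
    """Fit ordinary least squares and return the training mean squared error."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((y - X @ w) ** 2))

ones = np.ones_like(x)
linear_X = np.column_stack([ones, x])           # intercept + x
quadratic_X = np.column_stack([ones, x, x**2])  # intercept + x + x^2

linear_err = train_mse(linear_X, y)
quadratic_err = train_mse(quadratic_X, y)
print(f"linear features:    train MSE = {linear_err:.3f}")
print(f"quadratic features: train MSE = {quadratic_err:.3f}")
```

The quadratic feature set contains the linear one, so its training error can only be lower or equal. A clear drop indicates that the linear model was underfitting rather than just facing noise.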