Classification using MLP
MLPs can also be used for classification tasks. For a binary classification problem, you just need a single output neuron using the logistic activation function: the output will be a number between 0 and 1, which you can interpret as the estimated probability of the positive class. The estimated probability of the negative class is equal to one minus that number.
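As a quick numeric illustration of this interpretation (a minimal sketch, independent of the worked example later in this section), the single output can be read directly as the probability of the positive class:
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = 1.3                       # example pre-activation of the single output neuron
p_positive = sigmoid(z)       # estimated probability of the positive class
p_negative = 1 - p_positive   # estimated probability of the negative class
print(f"P(positive) = {p_positive:.3f}, P(negative) = {p_negative:.3f}")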
MLPs can also easily handle multilabel binary classification tasks. For example, you could have an email classification system that predicts whether each incoming email is ham or spam, and simultaneously predicts whether it is an urgent or nonurgent email. In this case, you would need two output neurons, both using the logistic activation function: the first would output the probability that the email is spam, and the second would output the probability that it is urgent. More generally, you would dedicate one output neuron to each positive class. Note that the output probabilities do not necessarily add up to 1. This lets the model output any combination of labels: you can have nonurgent ham, urgent ham, nonurgent spam, and perhaps even urgent spam (although that would probably be an error).
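A minimal Keras sketch of such a multilabel setup might look like the following; the feature count, layer sizes, and the randomly generated spam/urgency labels are purely illustrative placeholders, not a real email dataset:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

n_features = 20                                   # placeholder feature count
X_train = np.random.rand(100, n_features)         # stand-in email features
y_train = np.random.randint(0, 2, size=(100, 2))  # columns: [is_spam, is_urgent]

multilabel_model = Sequential([
    Dense(32, activation='relu', input_shape=(n_features,)),
    Dense(2, activation='sigmoid')                # one independent sigmoid per label
])
multilabel_model.compile(optimizer='adam', loss='binary_crossentropy')
multilabel_model.fit(X_train, y_train, epochs=5, verbose=0)
probs = multilabel_model.predict(X_train[:1])     # e.g. [[P(spam), P(urgent)]]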
If each instance can belong only to a single class, out of three or more possible classes (e.g., classes 0 through 9 for digit image classification), then you need to have one output neuron per class, and you should use the softmax activation function for the whole output layer (see Figure 10-9). The softmax function will ensure that all the estimated probabilities are between 0 and 1 and that they add up to 1 (which is required if the classes are exclusive). This is called multiclass classification.
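For the digit example, the output layer would look something like the sketch below; the hidden-layer size is illustrative, and with integer class labels 0 through 9 the sparse_categorical_crossentropy loss is the usual pairing in Keras:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

multiclass_model = Sequential([
    Dense(64, activation='relu', input_shape=(784,)),  # e.g. flattened 28x28 digit images
    Dense(10, activation='softmax')                    # one neuron per class, probabilities sum to 1
])
multiclass_model.compile(optimizer='adam',
                         loss='sparse_categorical_crossentropy',  # integer labels 0-9
                         metrics=['accuracy'])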
Regarding the loss function, since we are predicting probability distributions, the cross-entropy loss (also called the log loss) is generally a good choice. Table 10-2 summarizes the typical architecture of a classification MLP.
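As a quick hand-computed illustration of what this loss measures (not part of the example later in this section), for a single instance the log loss is simply the negative log of the probability the model assigned to the true class:
import numpy as np

# Binary case: true label y = 1, predicted P(positive) = 0.9
y, p = 1, 0.9
binary_log_loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))   # about 0.105

# Multiclass case: true class is index 2, softmax output below
probs = np.array([0.1, 0.2, 0.7])
multiclass_log_loss = -np.log(probs[2])                        # about 0.357
print(binary_log_loss, multiclass_log_loss)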
Key Steps in MLP Classification:
- Initialization: Weights and biases are initialized with small random values.
- Forward Propagation: Data is passed through the layers, and the activation function introduces non-linearity, helping the network learn complex relationships.
- Loss Calculation: A loss function, such as cross-entropy for classification, measures how well the model's predictions match the actual labels.
- Backpropagation: The network updates its weights and biases to minimize the loss using optimization techniques like gradient descent (a minimal NumPy sketch of all five steps follows this list).
- Prediction: After training, the network uses forward propagation with the updated weights to predict outcomes on new data.
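To make these steps concrete, here is a deliberately tiny NumPy sketch of a one-hidden-layer MLP trained with plain gradient descent on a made-up toy problem; the data, layer sizes, learning rate, and epoch count are all illustrative assumptions:
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)   # toy binary labels

# 1. Initialization: small random weights, zero biases
W1, b1 = rng.normal(scale=0.1, size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(scale=0.1, size=(4, 1)), np.zeros(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

lr = 0.5
for epoch in range(2000):
    # 2. Forward propagation through hidden and output layers
    h = np.tanh(X @ W1 + b1)          # hidden layer with non-linearity
    p = sigmoid(h @ W2 + b2)          # predicted P(class 1)
    # 3. Loss calculation: binary cross-entropy
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    if epoch % 500 == 0:
        print(f"epoch {epoch}: loss {loss:.3f}")
    # 4. Backpropagation: gradients of the loss, then a gradient descent update
    dz2 = (p - y) / len(X)
    dW2, db2 = h.T @ dz2, dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (1 - h ** 2)
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0)
    W1, b1, W2, b2 = W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2

# 5. Prediction: forward pass with the trained weights
preds = (sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int)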
Although MLPs are highly capable of modeling complex relationships, their performance can be sensitive to hyperparameters such as the number of layers, the number of neurons per layer, and the choice of activation functions. Proper tuning is therefore crucial.
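One simple (and admittedly brute-force) way to get a feel for this sensitivity is to train the same kind of model with a few different hidden-layer sizes and activations and compare test accuracy; the configurations below are arbitrary examples, not recommendations:
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for units in (4, 16, 64):                          # arbitrary example sizes
    for activation in ('relu', 'tanh'):            # arbitrary example activations
        m = Sequential([
            Dense(units, activation=activation, input_shape=(2,)),
            Dense(1, activation='sigmoid')
        ])
        m.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
        m.fit(X_tr, y_tr, epochs=30, batch_size=16, verbose=0)
        _, acc = m.evaluate(X_te, y_te, verbose=0)
        print(f"units={units:>2}, activation={activation}: test accuracy {acc:.2f}")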
Example: Classification using MLP
Let's create a sample dataset. make_moons is a function from sklearn.datasets that generates a synthetic 2D dataset for binary classification: it creates two interleaving crescent-shaped clusters that look like two half-moons, hence the name.
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
X, y = make_moons(n_samples=200, noise=0.2)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Spectral, edgecolors='k')
plt.title("Two Moons Dataset")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
# 1. Generate toy dataset
X, y = make_moons(n_samples=1000, noise=0.2, random_state=42)
# 2. Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# 3. Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# 4. Define the model
model = Sequential([
    Dense(16, activation='relu', input_shape=(2,)),
    Dense(8, activation='relu'),
    Dense(1, activation='sigmoid')
])
# 5. Compile the model
model.compile(optimizer=Adam(learning_rate=0.01), loss='binary_crossentropy', metrics=['accuracy'])
# 6. Train the model
history = model.fit(X_train, y_train, epochs=50, batch_size=16, validation_data=(X_test, y_test), verbose=0)
# 7. Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {accuracy:.2f}")
# 8. Plot decision boundary (for visualization)
def plot_decision_boundary(model, X, y):
    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300),
                         np.linspace(y_min, y_max, 300))
    X_grid = np.c_[xx.ravel(), yy.ravel()]
    X_grid_scaled = scaler.transform(X_grid)
    preds = model.predict(X_grid_scaled)
    preds = preds.reshape(xx.shape)
    plt.contourf(xx, yy, preds, alpha=0.6, cmap=plt.cm.Spectral)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', cmap=plt.cm.Spectral)
    plt.title("Decision Boundary")
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
    plt.show()
plot_decision_boundary(model, X, y)
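Once trained, the model can classify new points just as described in the Prediction step above; remember to apply the same scaler that was fit on the training data. The sample points below are arbitrary:
new_points = np.array([[0.0, 0.5], [1.5, -0.5]])     # arbitrary new samples
probs = model.predict(scaler.transform(new_points))  # P(class 1) for each point
labels = (probs > 0.5).astype(int).ravel()           # threshold at 0.5
print(probs.ravel(), labels)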