Classification using MLP
MLPs can also be used for classification tasks. For a binary classification problem, you just need a single output neuron using the logistic activation function: the output will be a number between 0 and 1, which you can interpret as the estimated probability of the positive class. The estimated probability of the negative class is equal to one minus that number.
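As a quick numeric illustration of this interpretation (a minimal sketch, independent of the worked example later in this section), the single output can be read directly as the probability of the positive class:
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = 1.3                       # example pre-activation of the single output neuron
p_positive = sigmoid(z)       # estimated probability of the positive class
p_negative = 1 - p_positive   # estimated probability of the negative class
print(f"P(positive) = {p_positive:.3f}, P(negative) = {p_negative:.3f}")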
MLPs can also easily handle multilabel binary classification tasks. For example, you could have an email classification system that predicts whether each incoming email is ham or spam, and simultaneously predicts whether it is an urgent or nonurgent email. In this case, you would need two output neurons, both using the logistic activation function: the first would output the probability that the email is spam, and the second would output the probability that it is urgent. More generally, you would dedicate one output neuron to each positive class. Note that the output probabilities do not necessarily add up to 1. This lets the model output any combination of labels: you can have nonurgent ham, urgent ham, nonurgent spam, and perhaps even urgent spam (although that would probably be an error).
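A minimal Keras sketch of such a multilabel setup might look like the following; the feature count, layer sizes, and the randomly generated spam/urgency labels are purely illustrative placeholders, not a real email dataset:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

n_features = 20                                   # placeholder feature count
X_train = np.random.rand(100, n_features)         # stand-in email features
y_train = np.random.randint(0, 2, size=(100, 2))  # columns: [is_spam, is_urgent]

multilabel_model = Sequential([
    Dense(32, activation='relu', input_shape=(n_features,)),
    Dense(2, activation='sigmoid')                # one independent sigmoid per label
])
multilabel_model.compile(optimizer='adam', loss='binary_crossentropy')
multilabel_model.fit(X_train, y_train, epochs=5, verbose=0)
probs = multilabel_model.predict(X_train[:1])     # e.g. [[P(spam), P(urgent)]]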
If each instance can belong only to a single class, out of three or more possible classes (e.g., classes 0 through 9 for digit image classification), then you need to have one output neuron per class, and you should use the softmax activation function for the whole output layer (see Figure 10-9). The softmax function will ensure that all the estimated probabilities are between 0 and 1 and that they add up to 1 (which is required if the classes are exclusive). This is called multiclass classification.
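For the digit example, the output layer would look something like the sketch below; the hidden-layer size is illustrative, and with integer class labels 0 through 9 the sparse_categorical_crossentropy loss is the usual pairing in Keras:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

multiclass_model = Sequential([
    Dense(64, activation='relu', input_shape=(784,)),  # e.g. flattened 28x28 digit images
    Dense(10, activation='softmax')                    # one neuron per class, probabilities sum to 1
])
multiclass_model.compile(optimizer='adam',
                         loss='sparse_categorical_crossentropy',  # integer labels 0-9
                         metrics=['accuracy'])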
Regarding the loss function, since we are predicting probability distributions, the cross-entropy loss (also called the log loss) is generally a good choice. Table 10-2 summarizes the typical architecture of a classification MLP.
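As a quick hand-computed illustration of what this loss measures (not part of the example later in this section), for a single instance the log loss is simply the negative log of the probability the model assigned to the true class:
import numpy as np

# Binary case: true label y = 1, predicted P(positive) = 0.9
y, p = 1, 0.9
binary_log_loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))   # about 0.105

# Multiclass case: true class is index 2, softmax output below
probs = np.array([0.1, 0.2, 0.7])
multiclass_log_loss = -np.log(probs[2])                        # about 0.357
print(binary_log_loss, multiclass_log_loss)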
Key Steps in MLP Classification:
- Initialization: Weights and biases are initialized with small random values.
- Forward Propagation: Data is passed through the layers, and the activation function introduces non-linearity, helping the network learn complex relationships.
- Loss Calculation: A loss function, such as cross-entropy for classification, measures how well the model's predictions match the actual labels.
- Backpropagation: The network updates its weights and biases to minimize the loss using optimization techniques like gradient descent (a minimal NumPy sketch of all five steps follows this list).
- Prediction: After training, the network uses forward propagation with the updated weights to predict outcomes on new data.
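To make these steps concrete, here is a deliberately tiny NumPy sketch of a one-hidden-layer MLP trained with plain gradient descent on a made-up toy problem; the data, layer sizes, learning rate, and epoch count are all illustrative assumptions:
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)   # toy binary labels

# 1. Initialization: small random weights, zero biases
W1, b1 = rng.normal(scale=0.1, size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(scale=0.1, size=(4, 1)), np.zeros(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

lr = 0.5
for epoch in range(2000):
    # 2. Forward propagation through hidden and output layers
    h = np.tanh(X @ W1 + b1)          # hidden layer with non-linearity
    p = sigmoid(h @ W2 + b2)          # predicted P(class 1)
    # 3. Loss calculation: binary cross-entropy
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    if epoch % 500 == 0:
        print(f"epoch {epoch}: loss {loss:.3f}")
    # 4. Backpropagation: gradients of the loss, then a gradient descent update
    dz2 = (p - y) / len(X)
    dW2, db2 = h.T @ dz2, dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (1 - h ** 2)
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0)
    W1, b1, W2, b2 = W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2

# 5. Prediction: forward pass with the trained weights
preds = (sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int)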
Although MLPs are highly capable of modeling complex relationships, their performance can be sensitive to hyperparameters such as the number of layers, the number of neurons per layer, and the choice of activation functions. Proper tuning is therefore crucial.
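One simple (and admittedly brute-force) way to get a feel for this sensitivity is to train the same kind of model with a few different hidden-layer sizes and activations and compare test accuracy; the configurations below are arbitrary examples, not recommendations:
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for units in (4, 16, 64):                          # arbitrary example sizes
    for activation in ('relu', 'tanh'):            # arbitrary example activations
        m = Sequential([
            Dense(units, activation=activation, input_shape=(2,)),
            Dense(1, activation='sigmoid')
        ])
        m.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
        m.fit(X_tr, y_tr, epochs=30, batch_size=16, verbose=0)
        _, acc = m.evaluate(X_te, y_te, verbose=0)
        print(f"units={units:>2}, activation={activation}: test accuracy {acc:.2f}")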
Example: Classification using MLP
Let's create a sample dataset. make_moons is a function from sklearn.datasets that generates a synthetic 2D dataset for binary classification: it creates two interleaving crescent-shaped clusters that look like two half-moons, hence the name.
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
X, y = make_moons(n_samples=200, noise=0.2)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Spectral, edgecolors='k')
plt.title("Two Moons Dataset")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
# 1. Generate toy dataset
X, y = make_moons(n_samples=1000, noise=0.2, random_state=42)
# 2. Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# 3. Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# 4. Define the model
model = Sequential([
    Dense(16, activation='relu', input_shape=(2,)),
    Dense(8, activation='relu'),
    Dense(1, activation='sigmoid')
])
# 5. Compile the model
model.compile(optimizer=Adam(learning_rate=0.01), loss='binary_crossentropy', metrics=['accuracy'])
# 6. Train the model
history = model.fit(X_train, y_train, epochs=50, batch_size=16, validation_data=(X_test, y_test), verbose=0)
# 7. Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {accuracy:.2f}")
# 8. Plot decision boundary (for visualization)
def plot_decision_boundary(model, X, y):
    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300),
                         np.linspace(y_min, y_max, 300))
    X_grid = np.c_[xx.ravel(), yy.ravel()]
    X_grid_scaled = scaler.transform(X_grid)
    preds = model.predict(X_grid_scaled)
    preds = preds.reshape(xx.shape)
    plt.contourf(xx, yy, preds, alpha=0.6, cmap=plt.cm.Spectral)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', cmap=plt.cm.Spectral)
    plt.title("Decision Boundary")
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
    plt.show()
plot_decision_boundary(model, X, y)
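Once trained, the model can classify new points just as described in the Prediction step above; remember to apply the same scaler that was fit on the training data. The sample points below are arbitrary:
new_points = np.array([[0.0, 0.5], [1.5, -0.5]])     # arbitrary new samples
probs = model.predict(scaler.transform(new_points))  # P(class 1) for each point
labels = (probs > 0.5).astype(int).ravel()           # threshold at 0.5
print(probs.ravel(), labels)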