Principal Component Analysis (PCA)



✅ What is PCA?

Principal Component Analysis (PCA) is a dimensionality reduction technique used to:

  • Reduce the number of variables in a dataset,

  • Retain the most important patterns,

  • Remove redundancy (correlation),

  • Improve efficiency in modeling or visualization.

It transforms the original variables into a new set of uncorrelated variables called principal components, ordered by the amount of variance they explain in the data.
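Because the components are both uncorrelated and ordered by explained variance, you can verify these two properties directly. Below is a minimal sketch on synthetic, correlated data (the dataset and variable names are made up for illustration):

import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: three features, two of them strongly correlated (illustrative only)
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
X = np.hstack([
    z + 0.1 * rng.normal(size=(200, 1)),
    2 * z + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1))
])

pca = PCA()
scores = pca.fit_transform(X)

# Variance explained by each component, in decreasing order
print(pca.explained_variance_ratio_)

# Off-diagonal correlations between the new features are (numerically) zero
print(np.round(np.corrcoef(scores, rowvar=False), 3))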


✅ Why Use PCA?

PCA is used when:

  • Your dataset has many correlated features.

  • You want to visualize high-dimensional data in 2D or 3D.

  • You want to speed up machine learning models by reducing the number of input features.

  • You want to denoise or compress data.


✅ Key Concepts

🔹 1. Variance

  • Measures the spread or information in the data.

  • PCA tries to retain directions with maximum variance.

🔹 2. Principal Components

  • New axes/directions formed by linear combinations of original variables.

  • First principal component (PC1) captures the most variance, followed by PC2, etc.

🔹 3. Orthogonality

  • All principal components are uncorrelated (orthogonal).


✅ How PCA Works – Step-by-Step

Let’s break PCA into intuitive steps.

🔸 Step 1: Standardize the Data

  • Center the data (subtract the mean).

  • Scale the data (if required).

$X_{\text{standardized}} = \frac{X - \mu}{\sigma}$

🔸 Step 2: Compute the Covariance Matrix

$\text{Cov}(X) = \frac{1}{n-1} X^T X$

This captures the pairwise relationships (covariances) between variables; here X is the centered data from Step 1.

🔸 Step 3: Compute Eigenvalues and Eigenvectors

  • Use Eigen Decomposition of the covariance matrix.

  • Eigenvectors = directions (principal components)

  • Eigenvalues = amount of variance in each direction

🔸 Step 4: Select the Top k Components

  • Choose the first k components that capture most of the variance (e.g., 95%).

$\text{Explained Variance Ratio} = \frac{\lambda_k}{\sum_{i=1}^{d} \lambda_i}$

🔸 Step 5: Project the Data

  • Transform the data into the new space:

$X_{\text{reduced}} = X \cdot W_k$

where $W_k$ is the matrix whose columns are the top $k$ eigenvectors.
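The five steps above map directly onto a few lines of NumPy. The following is a minimal from-scratch sketch (function and variable names are illustrative; np.linalg.eigh is used because the covariance matrix is symmetric):

import numpy as np

def pca_from_scratch(X, k):
    # Step 1: standardize each column (center, then scale)
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # Step 2: covariance matrix of the standardized data
    cov = np.cov(X_std, rowvar=False)

    # Step 3: eigen decomposition (eigh handles symmetric matrices)
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Sort directions by decreasing variance
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Step 4: keep the top k components and their explained variance ratios
    W_k = eigvecs[:, :k]
    explained = eigvals[:k] / eigvals.sum()

    # Step 5: project the data onto the new axes
    X_reduced = X_std @ W_k
    return X_reduced, W_k, explained

Note that the sign of each eigenvector is arbitrary, so results from different implementations can differ by a sign flip of a component.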

✅ Applications of PCA

Application Area | Purpose
Face Recognition | Reduce pixel data to key features
Genomics         | Reduce thousands of gene expressions
Finance          | Reduce correlated financial indicators
Text Mining      | Dimensionality reduction after TF-IDF
Preprocessing    | Before clustering/classification

✅ Limitations of PCA

  • Assumes linear relationships.

  • Sensitive to scaling (standardization is essential).

  • Requires centered data; not ideal when interpretability of the original features is key.

  • Principal components are linear combinations — may lose physical meaning.


✅ Summary

Feature  | Description
Goal     | Reduce dimensions while preserving variance
Method   | Orthogonal transformation to a new feature space
Based On | Eigen decomposition or SVD
Output   | New uncorrelated features (principal components)
Use Case | Preprocessing, noise reduction, visualization
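As the table notes, the components can be obtained either from the eigen decomposition of the covariance matrix or from the singular value decomposition (SVD) of the centered data (scikit-learn's PCA, for example, is implemented via SVD). A short sketch on made-up random data, checking that the two routes agree up to the arbitrary sign of each component:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
Xc = X - X.mean(axis=0)

# Route 1: eigen decomposition of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Same directions (up to sign), and eigenvalues equal S**2 / (n - 1)
print(np.allclose(np.abs(eigvecs), np.abs(Vt.T)))
print(np.allclose(eigvals, S**2 / (len(Xc) - 1)))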

Toy PCA Example with Dummy Data (2D to 1D Reduction)

🎯 Goal:

Reduce 2D data to 1D using PCA and understand how it works manually.


📌 Step 1: Create Dummy Data

We’ll start with a small dataset of 5 points in 2D:

$X = \begin{bmatrix} 2.5 & 2.4 \\ 0.5 & 0.7 \\ 2.2 & 2.9 \\ 1.9 & 2.2 \\ 3.1 & 3.0 \end{bmatrix}$

Each row represents a sample with two features (like Height and Weight).

📌 Step 2: Center the Data

Subtract the mean of each column (the two features are on similar scales, so no further scaling is needed):

$\text{Mean} = \left[\mu_1, \mu_2\right] = \left[2.04,\ 2.24\right]$

$X_{\text{centered}} = \begin{bmatrix} 0.46 & 0.16 \\ -1.54 & -1.54 \\ 0.16 & 0.66 \\ -0.14 & -0.04 \\ 1.06 & 0.76 \end{bmatrix}$

📌 Step 3: Compute the Covariance Matrix

$\text{Cov}(X) = \frac{1}{n-1} X_{\text{centered}}^T X_{\text{centered}} = \begin{bmatrix} 0.9380 & 0.8405 \\ 0.8405 & 0.8530 \end{bmatrix}$

📌 Step 4: Compute Eigenvalues and Eigenvectors

The eigenvalues of the covariance matrix are:

  • λ₁ ≈ 1.737

  • λ₂ ≈ 0.054

The corresponding eigenvectors (principal components):

$v_1 = \begin{bmatrix} 0.7247 \\ 0.6890 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} -0.6890 \\ 0.7247 \end{bmatrix}$
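These numbers can be checked with NumPy (a small sketch; np.linalg.eigh returns the eigenvalues in ascending order, and the sign of each eigenvector is arbitrary):

import numpy as np

X_centered = np.array([
    [ 0.46,  0.16],
    [-1.54, -1.54],
    [ 0.16,  0.66],
    [-0.14, -0.04],
    [ 1.06,  0.76]
])

cov = np.cov(X_centered, rowvar=False)
print(cov)        # approx. [[0.938, 0.8405], [0.8405, 0.853]]

eigvals, eigvecs = np.linalg.eigh(cov)
print(eigvals)    # approx. [0.054, 1.737]
print(eigvecs)    # columns are the eigenvectors (PC1 is the last column here)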

📌 Step 5: Select Top Component(s)

Since λ₁ ≫ λ₂ (PC1 alone explains 1.737 / (1.737 + 0.054) ≈ 97% of the variance), we retain only PC1.


📌 Step 6: Project Data onto PC1

Let’s compute the projection of the first sample:

$[0.46,\ 0.16] \cdot \begin{bmatrix} 0.7247 \\ 0.6890 \end{bmatrix} = 0.46 \times 0.7247 + 0.16 \times 0.6890 = 0.3334 + 0.1102 = \mathbf{0.4436}$

Repeat this for each row to get the 1D representation.
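Doing this for every row at once is a single matrix product. A small sketch using the centered data and the first eigenvector from above:

import numpy as np

X_centered = np.array([
    [ 0.46,  0.16],
    [-1.54, -1.54],
    [ 0.16,  0.66],
    [-0.14, -0.04],
    [ 1.06,  0.76]
])
v1 = np.array([0.7247, 0.6890])

# Project every centered sample onto PC1
print(X_centered @ v1)   # approx. [ 0.4436, -2.1772,  0.5707, -0.1290,  1.2919]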

📉 Summary of Output

Sample | Original (2D) | Projected (1D)
1      | [2.5, 2.4]    |  0.4436
2      | [0.5, 0.7]    | -2.1772
3      | [2.2, 2.9]    |  0.5707
4      | [1.9, 2.2]    | -0.1290
5      | [3.1, 3.0]    |  1.2919

✅ Interpretation

  • The data originally lived in 2D space.

  • PCA finds the best line that captures the spread of the data.

  • We project the data onto this line → get 1D compressed version.

  • Most of the variation is retained (since λ₁ ≫ λ₂).


📌 Python Code

import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Step 1: Dummy data
X = np.array([
    [2.5, 2.4],
    [0.5, 0.7],
    [2.2, 2.9],
    [1.9, 2.2],
    [3.1, 3.0]
])

# Step 2: Mean-center the data (scikit-learn's PCA also centers internally)
X_centered = X - np.mean(X, axis=0)

# Step 3-5: PCA
pca = PCA(n_components=1)
X_pca = pca.fit_transform(X_centered)

# Step 6: Show original and projected
print("Original Data:\n", X)
print("Projected 1D Data:\n", X_pca)

# Optional: Plot original points and their projections onto the PC1 line
X_approx = pca.inverse_transform(X_pca) + np.mean(X, axis=0)  # back to original coordinates
plt.figure(figsize=(6, 4))
plt.scatter(X[:, 0], X[:, 1], color='blue', label='Original Data')
plt.scatter(X_approx[:, 0], X_approx[:, 1], color='red', marker='x', label='Projection onto PC1')
for i in range(len(X)):
    plt.plot([X[i, 0], X_approx[i, 0]],
             [X[i, 1], X_approx[i, 1]], 'r--', linewidth=1)
plt.title("PCA Projection to 1D")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.axis('equal')
plt.grid(True)
plt.show()

Output
Original Data:
 [[2.5 2.4]
 [0.5 0.7]
 [2.2 2.9]
 [1.9 2.2]
 [3.1 3. ]]
Projected 1D Data:
 [[ 0.44362444]
 [-2.17719404]
 [ 0.57071239]
 [-0.12902465]
 [ 1.29188186]]


📌 Python Code: Visualizing the 1D Projection in 2D Space

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Step 1: Define dummy 2D data
X = np.array([
    [2.5, 2.4],
    [0.5, 0.7],
    [2.2, 2.9],
    [1.9, 2.2],
    [3.1, 3.0]
])

# Step 2: Mean center the data
X_meaned = X - np.mean(X, axis=0)

# Step 3: Apply PCA to reduce to 1D
pca = PCA(n_components=1)
X_1D = pca.fit_transform(X_meaned)
X_projected = pca.inverse_transform(X_1D)

# Step 4: Plotting

plt.figure(figsize=(10, 6))

# Plot original data
plt.scatter(X_meaned[:, 0], X_meaned[:, 1], color='blue', label='Original Data')

# Plot projected data (back in 2D space)
plt.scatter(X_projected[:, 0], X_projected[:, 1], color='red',
            label='Projected (1D -> 2D)', marker='x')

# Draw lines connecting original and projected points
for i in range(len(X)):
    plt.plot([X_meaned[i, 0], X_projected[i, 0]],
             [X_meaned[i, 1], X_projected[i, 1]],
             'gray', linestyle='--', linewidth=1)

# Plot first principal component as arrow
pc1 = pca.components_[0]
origin = np.zeros(2)
plt.quiver(*origin, *pc1, scale=3, color='green',
           label='Principal Component 1', width=0.01)

plt.title("PCA: Original Data and 1D Projection in 2D Space")
plt.xlabel("Feature 1 (centered)")
plt.ylabel("Feature 2 (centered)")
plt.axis('equal')
plt.grid(True)
plt.legend()
plt.show()



✅ Python Code Example with Iris Dataset

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Load dataset
data = load_iris()
X = data.data
y = data.target
labels = data.target_names

# Step 1: Standardize
X_std = StandardScaler().fit_transform(X)

# Step 2-4: Apply PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)

# Step 5: Plot
plt.figure(figsize=(8, 6))
for i, label in enumerate(labels):
    plt.scatter(X_pca[y == i, 0], X_pca[y == i, 1], label=label)
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.title('PCA of Iris Dataset')
plt.legend()
plt.grid(True)
plt.show()
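To see how much information the 2D plot retains, it is also worth printing the explained variance ratio of the two components. A short, self-contained sketch (for the standardized Iris data the first two components together capture roughly 96% of the variance):

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X_std = StandardScaler().fit_transform(load_iris().data)
pca = PCA(n_components=2).fit(X_std)

# Proportion of variance captured by PC1 and PC2 (roughly 0.73 and 0.23)
print(pca.explained_variance_ratio_)
print("Total retained:", pca.explained_variance_ratio_.sum())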
