Principal Component Analysis (PCA)
✅ What is PCA?
Principal Component Analysis (PCA) is a dimensionality reduction technique used to:
- Reduce the number of variables in a dataset,
- Retain the most important patterns,
- Remove redundancy (correlation),
- Improve efficiency in modeling or visualization.
It transforms the original variables into a new set of uncorrelated variables called principal components, ordered by the amount of variance they explain in the data.
✅ Why Use PCA?
PCA is used when:
- Your dataset has many correlated features.
- You want to visualize high-dimensional data in 2D or 3D.
- You want to speed up machine learning models by reducing the number of input features.
- You want to denoise or compress data.
✅ Key Concepts
🔹 1. Variance
- Measures the spread or information in the data.
- PCA tries to retain the directions with maximum variance.
🔹 2. Principal Components
- New axes/directions formed by linear combinations of the original variables.
- The first principal component (PC1) captures the most variance, followed by PC2, and so on.
🔹 3. Orthogonality
- All principal components are uncorrelated (orthogonal) to one another.
✅ How PCA Works – Step-by-Step
Let’s break PCA into intuitive steps.
🔸 Step 1: Standardize the Data
- Center the data (subtract the mean of each feature).
- Scale the data to unit variance (if the features are on different scales).
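A minimal sketch of this step with NumPy, using the 5-point toy dataset that appears later in this post (any feature matrix with samples as rows works the same way):

```python
import numpy as np

# Toy data: rows are samples, columns are features (e.g., Height, Weight)
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

# Center each feature (always required for PCA)
X_centered = X - X.mean(axis=0)

# Optionally scale to unit variance when features use very different units
X_standardized = X_centered / X.std(axis=0, ddof=1)
```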
🔸 Step 2: Compute the Covariance Matrix
This captures relationships (correlations) between variables.
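A small sketch of this step; `np.cov` with `rowvar=False` treats columns as variables and divides by n − 1 by default:

```python
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
X_centered = X - X.mean(axis=0)

# Sample covariance matrix: 2x2, symmetric, off-diagonal = covariance between features
cov = np.cov(X_centered, rowvar=False)
print(cov)
```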
🔸 Step 3: Compute Eigenvalues and Eigenvectors
- Apply eigendecomposition to the covariance matrix.
- Eigenvectors = directions (principal components).
- Eigenvalues = amount of variance along each direction.
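A sketch of the decomposition with NumPy; `np.linalg.eigh` is the natural choice because the covariance matrix is symmetric, and it returns eigenvalues in ascending order, so we reorder them:

```python
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# eigh is for symmetric matrices; eigenvalues come back in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Reorder so the direction with the most variance comes first
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]   # each column is a principal component
```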
🔸 Step 4: Select Top Components
- Choose the first k components that together capture most of the variance (e.g., 95%).
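One common way to pick the number of components, sketched below; the 95% threshold is just an illustrative choice:

```python
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
X_centered = X - X.mean(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_centered, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Fraction of the total variance explained by each component
explained_ratio = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained_ratio)

# Smallest k whose cumulative explained variance reaches 95%
k = int(np.searchsorted(cumulative, 0.95) + 1)
```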
🔸 Step 5: Project the Data
- Transform the data into the new space:

$Z = X_{\text{centered}} W$

where $W$ is the matrix whose columns are the top $k$ eigenvectors.
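Putting the steps together, the projection is a single matrix product of the centered data with the top-k eigenvectors:

```python
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
X_centered = X - X.mean(axis=0)

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_centered, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
eigenvectors = eigenvectors[:, order]

k = 1                       # number of components to keep
W = eigenvectors[:, :k]     # matrix of the top-k eigenvectors (one per column)

# Project: Z = X_centered @ W, with shape (n_samples, k)
Z = X_centered @ W
```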
✅ Applications of PCA
| Application Area | Purpose |
|---|---|
| Face Recognition | Reduce pixel data to key features |
| Genomics | Reduce thousands of gene expressions |
| Finance | Reduce correlated financial indicators |
| Text Mining | Dimensionality reduction after TF-IDF |
| Preprocessing | Before clustering/classification |
✅ Limitations of PCA
- Assumes linear relationships.
- Sensitive to scaling (standardization is essential).
- Not ideal when data is not centered or when interpretability is key.
- Principal components are linear combinations of the original features, so they may lose physical meaning.
✅ Summary
| Feature | Description |
|---|---|
| Goal | Reduce dimensions while preserving variance |
| Method | Orthogonal transformation to a new feature space |
| Based On | Eigendecomposition or SVD |
| Output | New uncorrelated features (principal components) |
| Use Case | Preprocessing, noise reduction, visualization |
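In practice, scikit-learn's `PCA` class wraps all of these steps. A minimal usage sketch, where the random data and the 95% threshold are placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder data: 100 samples, 10 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))

# Keep enough components to explain 95% of the variance.
# PCA centers the data internally; scale beforehand if features use very different units.
pca = PCA(n_components=0.95)
Z = pca.fit_transform(X)              # reduced representation

print(pca.n_components_)              # how many components were kept
print(pca.explained_variance_ratio_)  # variance explained by each component
```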
Toy PCA Example with Dummy Data (2D to 1D Reduction)
🎯 Goal:
Reduce 2D data to 1D using PCA and understand how it works manually.
📌 Step 1: Create Dummy Data
We’ll start with a small dataset of 5 points in 2D:

$X = \begin{bmatrix} 2.5 & 2.4 \\ 0.5 & 0.7 \\ 2.2 & 2.9 \\ 1.9 & 2.2 \\ 3.1 & 3.0 \end{bmatrix}$

Each row represents a sample with two features (like Height and Weight).
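The same dataset as a NumPy array (rows are samples, columns are the two features):

```python
import numpy as np

# 5 samples x 2 features (e.g., Height, Weight)
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])
```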
📌 Step 2: Standardize the Data
Subtract the mean of each column:

$\text{Mean} = \left[\mu_1, \mu_2\right] = \left[2.04, 2.24\right]$

The centered data is:

$X_{\text{centered}} = \begin{bmatrix} 0.46 & 0.16 \\ -1.54 & -1.54 \\ 0.16 & 0.66 \\ -0.14 & -0.04 \\ 1.06 & 0.76 \end{bmatrix}$

📌 Step 3: Compute the Covariance Matrix

Using the sample covariance (dividing by n − 1 = 4):

$\Sigma = \begin{bmatrix} 0.938 & 0.8405 \\ 0.8405 & 0.853 \end{bmatrix}$
📌 Step 4: Compute Eigenvalues and Eigenvectors
The eigenvalues of the covariance matrix are:

$\lambda_1 \approx 1.737, \quad \lambda_2 \approx 0.054$

The corresponding eigenvectors (principal components) are:

$v_1 \approx \begin{bmatrix} 0.7247 \\ 0.6890 \end{bmatrix}, \quad v_2 \approx \begin{bmatrix} -0.6890 \\ 0.7247 \end{bmatrix}$
📌 Step 5: Select Top Component(s)
Since $\lambda_1 \gg \lambda_2$ (PC1 explains about 97% of the total variance), we retain only PC1.
📌 Step 6: Project Data onto PC1
Let’s compute the projection of the first (centered) sample onto PC1:

$z_1 = \begin{bmatrix} 0.46 & 0.16 \end{bmatrix} \begin{bmatrix} 0.7247 \\ 0.6890 \end{bmatrix} = 0.46 \times 0.7247 + 0.16 \times 0.6890 \approx 0.4436$

Repeat this for each row to get the 1D representation.
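A short NumPy sketch that reproduces the numbers in this example end to end; note that the eigenvector sign returned by the solver is arbitrary, which would only flip the sign of every projection:

```python
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])

# Step 2: center the data
X_centered = X - X.mean(axis=0)

# Step 3: sample covariance matrix
cov = np.cov(X_centered, rowvar=False)

# Step 4: eigendecomposition (eigh returns ascending eigenvalues, so reorder)
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Step 5: keep PC1 only; Step 6: project
pc1 = eigenvectors[:, :1]
Z = X_centered @ pc1

print(eigenvalues)                         # ~[1.737, 0.054]
print(eigenvalues[0] / eigenvalues.sum())  # ~0.97 of the variance lies on PC1
print(Z.ravel())                           # ~[0.4436, -2.1772, 0.5707, -0.1290, 1.2919] (up to a global sign)
```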
📉 Summary of Output
| Sample | Original (2D) | Projected (1D) |
|---|---|---|
| 1 | [2.5, 2.4] | 0.4436 |
| 2 | [0.5, 0.7] | -2.1772 |
| 3 | [2.2, 2.9] | 0.5707 |
| 4 | [1.9, 2.2] | -0.1290 |
| 5 | [3.1, 3.0] | 1.2919 |
✅ Interpretation
- The data originally lived in 2D space.
- PCA finds the best line (PC1) that captures the spread of the data.
- We project the data onto this line → get a 1D compressed version.
- Most of the variation is retained (since λ₁ ≫ λ₂).