Statistical Summaries
📊 Understanding Statistical Summaries with Examples
Statistical summaries are essential tools in data analysis, helping us understand the main characteristics of data at a glance. Whether you're preparing data for machine learning or doing exploratory data analysis (EDA), these summaries are your first step.
In this post, we’ll explore the three main types of statistical summaries (measures of central tendency, dispersion, and shape) with clear examples and Python code.
🔍 What Are Statistical Summaries?
They help answer:
- What’s the average value?
- How spread out is the data?
- Are there outliers or skewness?
We categorize summaries into:
- Measures of Central Tendency
- Measures of Dispersion
- Measures of Shape
Let’s go step by step.
1️⃣ Measures of Central Tendency
These describe the center or average of the data.
🧮 Mean (Average)
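📌 Explanation:
- Sum of all values divided by the number of values.
- Sensitive to outliers: a single extreme value can pull it far from the typical value.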
🧪 Example and Code:
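Here’s a minimal sketch with NumPy, assuming a hypothetical list of five exam scores called `scores`:

```python
import numpy as np

# Hypothetical exam scores (illustrative data)
scores = [60, 70, 80, 90, 100]

# Mean = sum of the values / number of values = 400 / 5
mean = np.mean(scores)
print("Mean:", mean)
```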
Output:
Mean: 80.0
🔸 Median
📌 Explanation:
- Middle value of the sorted data.
- For an even number of values: the average of the two middle values.
🧪 Example and Code:
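A sketch using the same hypothetical scores. `np.median` does not need pre-sorted input, and a second, even-length list (also hypothetical) shows the averaging rule:

```python
import numpy as np

# Hypothetical exam scores (odd n = 5): the middle value is 80
scores = [60, 70, 80, 90, 100]
median = np.median(scores)
print("Median:", median)

# Even n = 4: the median is the average of the two middle values, (70 + 80) / 2
even_median = np.median([90, 60, 80, 70])  # unsorted input is fine
```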
Output:
Median: 80.0
🔸 Mode
📌 Explanation:
- The most frequent value.
- A dataset can have multiple modes, or no mode at all when every value appears equally often.
🧪 Example and Code:
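A sketch using Python’s standard-library `statistics` module (SciPy’s `stats.mode`, used in the summary table, is an alternative). The scores are hypothetical, with 60 appearing twice; `multimode` illustrates the multiple-modes and no-single-mode cases:

```python
from statistics import mode, multimode

# Hypothetical exam scores: 60 occurs twice, every other value once
scores = [60, 60, 70, 80, 90]
print("Mode:", mode(scores))

# multimode() returns every value tied for the highest count: two modes here
two_modes = multimode([60, 60, 70, 70, 80])

# When every value occurs once, all values tie, so there is no single mode
all_tied = multimode([60, 70, 80])
```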
Output:
Mode: 60
2️⃣ Measures of Dispersion
These describe how spread out the values are.
🔹 Range
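📌 Explanation:
- Difference between the largest and smallest values (max - min).
- Quick to compute, but it depends only on the two most extreme values.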
🧪 Example and Code:
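A minimal sketch in plain Python, assuming hypothetical scores whose lowest value is 60 and highest is 90:

```python
# Hypothetical exam scores (illustrative data)
scores = [60, 60, 70, 80, 90]

# Range = maximum - minimum
data_range = max(scores) - min(scores)
print("Range:", data_range)
```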
Output:
Range: 30
🔹 Variance
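📌 Explanation:
- Average of the squared deviations from the mean.
- Dividing by n gives the population variance (the default for NumPy’s np.var); dividing by n - 1 gives the sample variance.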
🧪 Code:
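A sketch on the hypothetical scores `[60, 70, 80, 90, 100]`, computing the population variance by hand and with `np.var`, which divides by n unless you pass `ddof=1`:

```python
import numpy as np

# Hypothetical exam scores (illustrative data); mean = 80
scores = [60, 70, 80, 90, 100]

# By hand: average of the squared deviations from the mean (1000 / 5)
mean = sum(scores) / len(scores)
manual_variance = sum((x - mean) ** 2 for x in scores) / len(scores)

# NumPy's default (ddof=0) is the same population variance
variance = np.var(scores)
print("Variance:", variance)

# Sample variance divides by n - 1 instead (1000 / 4)
sample_variance = np.var(scores, ddof=1)
```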
Output:
Variance: 200.0
🔹 Standard Deviation
📌 Explanation:
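- Square root of the variance, so it is expressed in the same units as the data.
- Roughly the typical distance of values from the mean (strictly, the root-mean-square deviation).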
🧪 Code:
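A sketch with NumPy on the same hypothetical scores, whose population variance is 200, formatted to two decimal places:

```python
import numpy as np

# Hypothetical exam scores (illustrative data); population variance = 200
scores = [60, 70, 80, 90, 100]

# Standard deviation = square root of the variance, in the same units as the scores
std_dev = np.std(scores)
print(f"Standard Deviation: {std_dev:.2f}")
```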
Output:
Standard Deviation: 14.14
🔹 Interquartile Range (IQR)
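📌 Explanation:
- Difference between the third quartile (Q3, 75th percentile) and the first quartile (Q1, 25th percentile).
- Covers the middle 50% of the data, so it is robust to outliers.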
🧪 Code:
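A sketch with NumPy on the same hypothetical scores. It also computes the 1.5 × IQR fences (Tukey’s rule) that boxplots commonly use to flag potential outliers:

```python
import numpy as np

# Hypothetical exam scores (illustrative data)
scores = [60, 70, 80, 90, 100]

# Q1 (25th) and Q3 (75th) percentiles; NumPy interpolates linearly by default
q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1
print("IQR:", iqr)

# Tukey's fences: values outside these bounds are potential outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
```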
Output:
IQR: 20.0
3️⃣ Measures of Shape
These describe the shape of the distribution: how symmetric it is and how heavy its tails are.
🔸 Skewness
📌 Explanation:
- Measures the asymmetry of the distribution around its mean.
- Positive skew: longer tail on the right.
- Negative skew: longer tail on the left.
📌 Example:
- If most students scored high and a few scored very low → left-skewed (negative skew)
- If most students scored low and a few scored very high → right-skewed (positive skew)
🧪 Code:
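A minimal sketch that computes the Fisher-Pearson skewness coefficient with NumPy. It is the same quantity `scipy.stats.skew` returns with its default `bias=True`. The two score lists are hypothetical, built to mirror the student examples above:

```python
import numpy as np

def skewness(values):
    """Fisher-Pearson coefficient g1 = m3 / m2**1.5 (biased, like scipy.stats.skew's default)."""
    x = np.asarray(values, dtype=float)
    d = x - x.mean()          # deviations from the mean
    m2 = np.mean(d ** 2)      # second central moment (population variance)
    m3 = np.mean(d ** 3)      # third central moment
    return m3 / m2 ** 1.5

# Hypothetical scores: most high with a few very low, and the reverse
left_skewed = [20, 25, 85, 88, 90, 92, 95]
right_skewed = [20, 22, 25, 28, 30, 85, 90]

print(f"Left-skewed:  {skewness(left_skewed):.2f}")   # negative
print(f"Right-skewed: {skewness(right_skewed):.2f}")  # positive
```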
🔸 Kurtosis
Kurtosis measures the "tailedness" of the distribution. It is often described as peakedness, but it is driven mainly by the tails.
It answers:
- Are the data values clustered tightly around the mean?
- Are there heavy tails (more extreme values/outliers)?
📌 Explanation:
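- High kurtosis: heavy tails (outliers likely).
- Low kurtosis: light tails (uniform-like).
- SciPy’s kurtosis() returns excess kurtosis by default (Fisher’s definition), so a normal distribution scores 0 and a uniform distribution scores -1.2.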
🧪 Code:
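A sketch of excess kurtosis computed with NumPy, matching `scipy.stats.kurtosis` under its defaults (`fisher=True`, `bias=True`). The samples are simulated for illustration: a uniform sample (light tails, theoretical excess kurtosis -1.2) and a Laplace sample (heavy tails, theoretical excess kurtosis 3):

```python
import numpy as np

def excess_kurtosis(values):
    """m4 / m2**2 - 3: about 0 for a normal distribution (Fisher's definition)."""
    x = np.asarray(values, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)   # second central moment
    m4 = np.mean(d ** 4)   # fourth central moment
    return m4 / m2 ** 2 - 3.0

# Simulated data (fixed seed so the run is repeatable)
rng = np.random.default_rng(42)
light_tails = rng.uniform(-1, 1, size=10_000)
heavy_tails = rng.laplace(0, 1, size=10_000)

print(f"Uniform sample: {excess_kurtosis(light_tails):.2f}")  # near -1.2
print(f"Laplace sample: {excess_kurtosis(heavy_tails):.2f}")  # clearly positive
```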
📋 Summary Table Example
Let’s say:
Statistic | Code | Output |
---|---|---|
Mean | np.mean(data) | 19.56 |
Median | np.median(data) | 18.0 |
Mode | stats.mode(data, keepdims=False).mode | 20 |
Range | max(data) - min(data) | 20 |
Standard Deviation | np.std(data) | ~6.36 |
IQR | np.percentile(data, 75) - np.percentile(data, 25) | 9 |
Skewness | skew(data) | ~0.5 |
Kurtosis | kurtosis(data) | ~-1.2 |
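A sketch that gathers the statistics from the table into one dictionary. It uses NumPy and the standard-library `statistics` module instead of SciPy to keep dependencies small, with skewness and kurtosis computed by the same default formulas as `scipy.stats.skew` and `scipy.stats.kurtosis`. The `summarize` helper and the sample list are hypothetical, not the data behind the table:

```python
import numpy as np
from statistics import mode

def summarize(data):
    """Return key summary statistics for a list of numbers (hypothetical helper)."""
    x = np.asarray(data, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    q1, q3 = np.percentile(x, [25, 75])
    return {
        "Mean": x.mean(),
        "Median": np.median(x),
        "Mode": mode(data),
        "Range": x.max() - x.min(),
        "Standard Deviation": np.sqrt(m2),            # same as np.std(data)
        "IQR": q3 - q1,
        "Skewness": np.mean(d ** 3) / m2 ** 1.5,
        "Kurtosis": np.mean(d ** 4) / m2 ** 2 - 3.0,  # excess kurtosis
    }

# Hypothetical data for illustration
for name, value in summarize([60, 60, 70, 80, 90]).items():
    print(f"{name:<20} {value:.2f}")
```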
📊 Visualizing the Data
Boxplot
Histogram
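A minimal Matplotlib sketch that draws both plots side by side. `scores` is hypothetical data with one unusually low value (20), which falls below Q1 - 1.5 × IQR and so appears as a separate point beyond the lower whisker:

```python
import matplotlib.pyplot as plt

# Hypothetical exam scores with one unusually low value (20)
scores = [20, 78, 82, 85, 87, 88, 90, 91, 92, 95]

fig, (ax_box, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Boxplot: the box spans Q1 to Q3, the line marks the median, whiskers reach the
# furthest points within 1.5 * IQR, and points beyond them are drawn individually
bp = ax_box.boxplot(scores)
ax_box.set_title("Boxplot")
ax_box.set_ylabel("Score")

# Histogram: counts of scores falling into each bin
ax_hist.hist(scores, bins=8, edgecolor="black")
ax_hist.set_title("Histogram")
ax_hist.set_xlabel("Score")
ax_hist.set_ylabel("Count")

fig.tight_layout()
plt.show()
```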
🧠 When to Use What?
Use Case | Statistic to Prefer |
---|---|
Normally distributed data | Mean, Std Dev |
Skewed data | Median, IQR |
Detecting outliers | Boxplot, IQR |
Understanding distribution | Skewness, Kurtosis |
Quick overview | Summary table |
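To see why the table prefers the median and IQR for skewed data, here is a small sketch on hypothetical scores. The `robust_vs_classic` helper is illustrative; it shows how a single extreme value moves the mean and standard deviation far more than the median and IQR:

```python
import numpy as np

def robust_vs_classic(data):
    """Return mean, median, standard deviation and IQR for a list of numbers."""
    q1, q3 = np.percentile(data, [25, 75])
    return {"mean": np.mean(data), "median": np.median(data),
            "std": np.std(data), "IQR": q3 - q1}

# Hypothetical scores, before and after adding one extreme value
scores = [60, 70, 80, 90, 100]
with_outlier = scores + [400]

before = robust_vs_classic(scores)
after = robust_vs_classic(with_outlier)
for key in before:
    print(f"{key:>6}: {before[key]:7.2f} -> {after[key]:7.2f}")
```

With these numbers the median moves from 80 to 85 and the IQR from 20 to 25, while the mean jumps from 80 to about 133 and the standard deviation from about 14 to about 120.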
🎓 Final Thoughts
Statistical summaries are your data's first story. Before modeling or machine learning, use these tools to:
- Understand the shape and scale of your data
- Identify problems like outliers or skewness
- Choose the right preprocessing and modeling techniques
✅ Try This:
Take any dataset (e.g., Titanic, Iris, or your own project data) and compute:
- Mean, Median, Mode
- Standard Deviation, IQR
- Skewness and Kurtosis
- Plot a histogram and boxplot
Let the data tell its story!