Statistical Summaries

June 04, 2025

📊 Understanding Statistical Summaries with Examples

Statistical summaries are essential tools in data analysis, helping us understand the main characteristics of data at a glance. Whether you're preparing data for machine learning or doing exploratory data analysis (EDA), these summaries are your first step.

In this post, we’ll explore the three key types of statistical summaries (measures of central tendency, dispersion, and shape), with clear examples and Python code.

🔍 What Are Statistical Summaries?

Statistical summaries condense a dataset into a few numbers that describe its main features. They help answer:

What’s the average value?
How spread out is the data?
Are there outliers or skewness?

We categorize summaries into:

Measures of Central Tendency
Measures of Dispersion
Measures of Shape

Let’s go step by step.

1️⃣ Measures of Central Tendency

These describe the center or average of the data.

🧮 Mean (Average)

📌 Explanation:

\text{Mean} = \frac{\sum x_i}{n}

where x_i are the individual values and n is the number of values.

🧪 Example and Code:

Scores = [60, 70, 80, 90, 100]
Mean = (60 + 70 + 80 + 90 + 100) / 5 = 80

import numpy as np

data = [60, 70, 80, 90, 100]
mean_value = np.mean(data)
print("Mean:", mean_value)

Output:
Mean: 80.0
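To connect the formula to the NumPy call, here is a minimal plain-Python sketch of the same computation. The helper name `mean` is our own illustration, not a library function.

```python
# Illustrative helper (not a library function): the mean formula written out.
def mean(values):
    if not values:
        raise ValueError("mean of empty data is undefined")
    return sum(values) / len(values)  # sum of x_i divided by n

scores = [60, 70, 80, 90, 100]
print("Mean:", mean(scores))  # Mean: 80.0
```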

🔸 Median

📌 Explanation:

Middle value of sorted data.
For even n: average of two middle values.

🧪 Example and Code:

Examples:
Odd count: [10, 20, 30] → Median = 20
Even count: [10, 20, 30, 40] → Median = (20 + 30) / 2 = 25

import numpy as np

data = [60, 70, 80, 90, 100]
median_value = np.median(data)
print("Median:", median_value)

Output:
Median: 80.0
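The odd/even rule can be sketched directly in plain Python. Here `median` is an illustrative helper, not a library function; its results should agree with np.median on the same data.

```python
# Illustrative helper (not a library function) implementing the median rule.
def median(values):
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)   # the median is defined on sorted data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]    # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the two middle values

print(median([10, 20, 30]))           # 20
print(median([10, 20, 30, 40]))       # 25.0
print(median([100, 60, 90, 70, 80]))  # 80 (input order does not matter)
```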

🔸 Mode

📌 Explanation:

The most frequent value in the data.
A dataset can have more than one mode (a tie for most frequent), or no meaningful mode if every value appears equally often.

🧪 Example and Code:

Data = [5, 6, 7, 7, 8] → Mode = 7

from scipy import stats

data=[60, 70, 80, 90, 100, 60]
mode_value = stats.mode(data, keepdims=False)
print("Mode:", mode_value.mode)

Output:
Mode: 60
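SciPy's stats.mode returns a single value even when several values tie. To see every mode, one option is multimode from Python's standard-library statistics module (Python 3.8+), sketched here on small illustrative lists.

```python
from statistics import multimode

# One clear mode
print(multimode([5, 6, 7, 7, 8]))  # [7]
# Two values tie for most frequent: the data is bimodal
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]
# Every value appears once, so no value stands out and all are returned
print(multimode([10, 20, 30]))     # [10, 20, 30]
```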

2️⃣ Measures of Dispersion

These describe how spread out the values are.

🔹 Range

📌 Explanation:

\text{Range} = \text{Max} - \text{Min}

🧪 Example and Code:

Data = [20, 35, 50] → Range = 50 - 20 = 30

data=[20,35,50]
range_value = max(data) - min(data)
print("Range:", range_value)

Output:
Range: 30

🔹 Variance

📌 Explanation:

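\text{Variance} = \frac{1}{n} \sum (x_i - \mu)^2

where μ is the mean and n is the number of values. This is the population variance, which is what np.var computes by default.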

🧪 Code:

import numpy as np
data = [60, 70, 80, 90, 100]
variance = np.var(data)
print("Variance:", variance)

Output:
Variance: 200.0
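When the data is only a sample from a larger population, the usual estimate divides by n - 1 instead of n. A minimal sketch computing both by hand and comparing them with NumPy's ddof ("delta degrees of freedom") parameter:

```python
import numpy as np

data = [60, 70, 80, 90, 100]
n = len(data)
mu = sum(data) / n                            # mean = 80.0
squared_devs = [(x - mu) ** 2 for x in data]  # 400, 100, 0, 100, 400

pop_var = sum(squared_devs) / n               # divide by n (population variance)
sample_var = sum(squared_devs) / (n - 1)      # divide by n - 1 (sample variance)

print("Population variance:", pop_var)          # 200.0
print("Sample variance:", sample_var)           # 250.0
print("np.var, ddof=0:", np.var(data))          # 200.0 (NumPy's default)
print("np.var, ddof=1:", np.var(data, ddof=1))  # 250.0
```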

🔹 Standard Deviation

📌 Explanation:

Square root of the variance.
It measures the typical distance of values from the mean, expressed in the same units as the data.

🧪 Code:

import numpy as np
data = [60, 70, 80, 90, 100]
std_dev = np.std(data)
print("Standard Deviation:", round(std_dev, 2))  # rounded to 2 decimals; the raw value is 14.1421...

Output:
Standard Deviation: 14.14

🔹 Interquartile Range (IQR)

📌 Explanation:

\text{IQR} = Q3 - Q1

🧪 Code:

import numpy as np
data = [60, 70, 80, 90, 100]
q1 = np.percentile(data, 25)
q3 = np.percentile(data, 75)
iqr = q3 - q1
print("IQR:", iqr)

Output:
IQR: 20.0
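A common use of the IQR, and the convention behind the default whiskers in matplotlib and seaborn boxplots, is Tukey's fences: values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are flagged as potential outliers. A minimal sketch; `iqr_outliers` and the extra value 250 are illustrative, not from a library or real dataset.

```python
import numpy as np

# Illustrative helper (not a library function): Tukey's 1.5 * IQR rule.
def iqr_outliers(values, k=1.5):
    q1, q3 = np.percentile(values, [25, 75])  # NumPy's default linear interpolation
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]

print(iqr_outliers([60, 70, 80, 90, 100]))       # []
print(iqr_outliers([60, 70, 80, 90, 100, 250]))  # [250]
```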

3️⃣ Measures of Shape

These describe the shape of the distribution: how symmetric it is and how heavy its tails are.

🔸 Skewness

Skewness measures the asymmetry of a distribution, that is, whether the tail on one side of the mean is longer than the tail on the other.

📌 Explanation:

Positive skew: longer tail on the right.
Negative skew: longer tail on the left.

📌 Example:

If most students scored high and a few scored very low → left-skewed

If most students scored low and a few scored very high → right-skewed

🧪 Code:

from scipy.stats import skew
data = [60, 70, 80, 90, 100]
skewness = skew(data)
print("Skewness:", skewness)
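SciPy's skew, with its default bias=True, computes the moment ratio g1 = m3 / m2^1.5, where m_k is the average of (x - mean)^k. The sketch below writes that formula out and compares it with SciPy; `moment_skewness` is an illustrative helper and `right_tailed` is a hypothetical list of scores with one high value.

```python
import numpy as np
from scipy.stats import skew

# Illustrative helper (not a library function): biased sample skewness.
def moment_skewness(values):
    x = np.asarray(values, dtype=float)
    d = x - x.mean()       # deviations from the mean
    m2 = np.mean(d ** 2)   # second central moment
    m3 = np.mean(d ** 3)   # third central moment
    return m3 / m2 ** 1.5

symmetric = [60, 70, 80, 90, 100]
right_tailed = [60, 62, 65, 70, 100]  # hypothetical scores with one very high value

print(moment_skewness(symmetric))         # 0.0: the data mirrors around the mean
print(moment_skewness(right_tailed) > 0)  # True: longer tail on the right
print(np.isclose(moment_skewness(right_tailed), skew(right_tailed)))  # True
```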

🔸 Kurtosis

Kurtosis measures the "tailedness" of the distribution; it is often loosely described as "peakedness".

It answers:

Are the data values clustered tightly around the mean?
Are there heavy tails (more extreme values/outliers)?

📌 Explanation:

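High kurtosis: heavy tails (outliers likely).
Low kurtosis: light tails (uniform-like).
Note that scipy.stats.kurtosis returns excess kurtosis by default (fisher=True), so a normal distribution scores 0: positive values mean heavier tails than a normal distribution, negative values lighter tails.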

🧪 Code:


from scipy.stats import kurtosis
data = [60, 70, 80, 90, 100]
kurt = kurtosis(data)
print("Kurtosis:", kurt)
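Excess kurtosis can be written out as m4 / m2^2 - 3, using the same central moments as skewness. A minimal sketch comparing that formula with SciPy's defaults; `excess_kurtosis` is an illustrative helper, not a library function.

```python
import numpy as np
from scipy.stats import kurtosis

# Illustrative helper (not a library function): biased excess kurtosis.
def excess_kurtosis(values):
    x = np.asarray(values, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)         # second central moment
    m4 = np.mean(d ** 4)         # fourth central moment
    return m4 / m2 ** 2 - 3.0    # subtract 3 so a normal distribution gives 0

data = [60, 70, 80, 90, 100]
print(excess_kurtosis(data))  # about -1.3: lighter tails than a normal distribution
# SciPy defaults are fisher=True (subtract 3) and bias=True (no small-sample correction)
print(np.isclose(excess_kurtosis(data), kurtosis(data)))                  # True
print(np.isclose(excess_kurtosis(data) + 3, kurtosis(data, fisher=False)))  # True: Pearson's definition
```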

📋 Summary Table Example

Let’s say:


data = [10, 12, 15, 18, 20, 20, 21, 24, 30]

Statistic	Code	Output
Mean	`np.mean(data)`	~18.89
Median	`np.median(data)`	20.0
Mode	`stats.mode(data, keepdims=False).mode`	20
Range	`max(data) - min(data)`	20
Standard Deviation	`np.std(data)`	~5.76
IQR	`np.percentile(data, 75) - np.percentile(data, 25)`	6.0
Skewness	`skew(data)`	~0.25
Kurtosis	`kurtosis(data)`	~-0.51

(Assumes import numpy as np, from scipy import stats, and from scipy.stats import skew, kurtosis. np.std uses the population formula, ddof=0.)
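A quick way to check every row of the summary table is to compute all the statistics in one script. This sketch assumes SciPy 1.9 or later (where stats.mode accepts keepdims) and uses the same library defaults as the table.

```python
import numpy as np
from scipy import stats
from scipy.stats import skew, kurtosis

data = [10, 12, 15, 18, 20, 20, 21, 24, 30]

summary = {
    "Mean": np.mean(data),
    "Median": np.median(data),
    "Mode": stats.mode(data, keepdims=False).mode,
    "Range": max(data) - min(data),
    "Standard Deviation": np.std(data),  # population formula (ddof=0)
    "IQR": np.percentile(data, 75) - np.percentile(data, 25),
    "Skewness": skew(data),              # SciPy default: bias=True
    "Kurtosis": kurtosis(data),          # SciPy default: excess (Fisher) kurtosis
}

# Print each statistic rounded to two decimals
for name, value in summary.items():
    print(f"{name:<20} {float(value):.2f}")
```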

📊 Visualizing the Data

Boxplot


import matplotlib.pyplot as plt
import seaborn as sns
data = [10, 12, 15, 18, 20, 20, 21, 24, 30]
sns.boxplot(data=data)
plt.title("Boxplot")
plt.show()

Histogram

import matplotlib.pyplot as plt
data = [10, 12, 15, 18, 20, 20, 21, 24, 30]
plt.hist(data, bins=5, edgecolor='black')
plt.title("Histogram")
plt.show()

🧠 When to Use What?

Use Case	Statistic to Prefer
Normally distributed data	Mean, Std Dev
Skewed data	Median, IQR
Detecting outliers	Boxplot, IQR
Understanding distribution	Skewness, Kurtosis
Quick overview	Summary table
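For a quick overview, pandas can produce most of these numbers in one call with Series.describe() (count, mean, std, min, quartiles, max). This sketch assumes pandas is installed. Note that pandas' std, skew, and kurt use sample (bias-corrected) formulas, so their values differ slightly from the np.std, skew, and kurtosis defaults used earlier.

```python
import pandas as pd

data = [10, 12, 15, 18, 20, 20, 21, 24, 30]
s = pd.Series(data)

summary = s.describe()  # count, mean, std, min, 25%, 50%, 75%, max
print(summary)

# pandas' std defaults to ddof=1 (sample), unlike np.std's default ddof=0
print("IQR:", summary["75%"] - summary["25%"])
print("skew:", s.skew())      # bias-corrected sample skewness
print("kurtosis:", s.kurt())  # bias-corrected excess kurtosis
```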

🎓 Final Thoughts

Statistical summaries are your data's first story. Before modeling or machine learning, use these tools to:

Understand the shape and scale of your data
Identify problems like outliers or skewness
Choose the right preprocessing and modeling techniques

✅ Try This:

Take any dataset (e.g., Titanic, Iris, or your own project data) and compute:

Mean, Median, Mode
Standard Deviation, IQR
Skewness and Kurtosis
Plot a histogram and boxplot

Let the data tell its story!
