Measures of Central Tendency
Understanding data begins with a single question: Where is the data centered?
That’s where averages or measures of central tendency come in.
📘 What Are Averages?
According to Prof. Bowley, averages are:
“Statistical constants which enable us to comprehend in a single effort the significance of the whole.”
In simpler words, an average is a single value that represents an entire data distribution.
🎯 Why Are Averages Important?
Averages help:
- Summarize large data sets
- Identify trends
- Make comparisons
- Serve as a foundation for further statistical analysis
📌 Common Measures of Central Tendency
There are five widely used averages:
1. Arithmetic Mean (simply called "Mean")
2. Median
3. Mode
4. Geometric Mean
5. Harmonic Mean
Let’s understand each with examples and Python code.
✅ Requisites of an Ideal Measure of Central Tendency
According to Prof. Yule, a good average must:
1. Be rigidly defined
2. Be easy to understand and compute
3. Use all observations
4. Be suitable for mathematical treatment
5. Be minimally affected by sampling fluctuations
Additionally, a good measure:
6. Should not be overly influenced by extreme values (outliers)
🔢 1. Arithmetic Mean (AM)
The most commonly used average.
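Formula:
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
For frequency data, where the value $x_i$ occurs $f_i$ times:
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$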
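A minimal Python sketch of both versions of the arithmetic mean, using only the standard library. The helper name `frequency_mean` and the sample numbers are illustrative assumptions, not taken from the article.

```python
from statistics import mean


def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum(f * x) / sum(f)."""
    total = sum(freqs)
    if total == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(values, freqs)) / total


data = [2, 4, 6, 8, 10]           # illustrative raw data
simple = mean(data)               # (2 + 4 + 6 + 8 + 10) / 5
print(simple)

values = [1, 2, 3]                # illustrative distinct values
freqs = [2, 3, 5]                 # illustrative frequencies
weighted = frequency_mean(values, freqs)   # (2 + 6 + 15) / 10
print(weighted)
```

Expanding the frequency table back into raw data (`[1, 1, 2, 2, 2, 3, 3, 3, 3, 3]`) and calling `mean` on it should give the same value as `frequency_mean`.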
🔢 2. Median
The middle value when the data is sorted.
- If $n$ is odd: Median = middle value
- If $n$ is even: Median = average of the two middle values
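As a quick illustration, the standard-library function `statistics.median` handles both the odd and the even case. The sample lists below are made up for the demonstration.

```python
from statistics import median

odd_data = [7, 1, 5]        # illustrative; sorted: 1, 5, 7
even_data = [7, 1, 5, 3]    # illustrative; sorted: 1, 3, 5, 7

odd_median = median(odd_data)     # middle value
even_median = median(even_data)   # average of the two middle values, 3 and 5

print(odd_median)
print(even_median)
```

The function sorts a copy of the data internally, so the input does not need to be sorted first.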
🔢 3. Mode
The value that occurs most frequently in the data.
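A short sketch using the standard library (Python 3.8 or later for `multimode`). The data are illustrative; `multimode` is useful when several values tie for the highest frequency.

```python
from statistics import mode, multimode

single_peak = [1, 2, 2, 3, 3, 3, 4]   # illustrative: 3 occurs most often
two_peaks = [1, 1, 2, 2, 3]           # illustrative: 1 and 2 tie

most_common = mode(single_peak)       # single most frequent value
all_modes = multimode(two_peaks)      # every value with the top frequency

print(most_common)
print(all_modes)
```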
🔢 4. Geometric Mean (GM)
Used for multiplicative processes (e.g., growth rates, financial data).
Formula:
$$GM = \left(x_1 \cdot x_2 \cdots x_n\right)^{1/n} = \left(\prod_{i=1}^{n} x_i\right)^{1/n}, \quad x_i > 0$$
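A hedged sketch of the geometric mean applied to growth factors, assuming Python 3.8 or later for `statistics.geometric_mean` and `math.prod`. The three yearly growth factors are invented for illustration.

```python
import math
from statistics import geometric_mean

# Illustrative yearly growth factors: +10%, +20%, -10%
factors = [1.10, 1.20, 0.90]

gm = geometric_mean(factors)

# The same quantity straight from the definition: n-th root of the product
gm_by_definition = math.prod(factors) ** (1 / len(factors))

print(gm)                 # average growth factor per year
print(gm - 1)             # average growth rate per year
print(gm_by_definition)
```

Growing at the constant factor `gm` every year would produce the same overall growth as the three different factors combined, which is why the geometric mean suits multiplicative data.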
🔢 5. Harmonic Mean (HM)
Useful when dealing with rates, like speed or density.
Formula:
$$HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}, \quad x_i > 0$$
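A small sketch of the harmonic mean for an average-speed problem. The scenario (equal distances driven at 40 km/h and 60 km/h) is an illustrative assumption, not an example from the article.

```python
from statistics import harmonic_mean

# Illustrative: the same distance is covered once at 40 km/h and once at 60 km/h
speeds = [40, 60]

hm = harmonic_mean(speeds)

# The same quantity straight from the definition: n / sum(1 / x)
hm_by_definition = len(speeds) / sum(1 / s for s in speeds)

print(hm)                  # average speed over the whole trip, about 48 km/h
print(hm_by_definition)
```

The arithmetic mean of the two speeds would be 50 km/h, which overstates the true average because more time is spent at the slower speed.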
📊 Summary Table
Measure | Best Used For | Sensitive to Outliers |
---|---|---|
Arithmetic Mean | General numeric data | ✅ Yes |
Median | Skewed distributions | ❌ No |
Mode | Categorical / repeated values | ❌ No |
Geometric Mean | Percentages, ratios, growth | ✅ Yes |
Harmonic Mean | Rates (e.g., speed, price/unit) | ✅ Yes |
🧠 Final Thoughts
Understanding these five measures gives you the power to:
- Interpret datasets meaningfully
- Compare distributions
- Perform deeper statistical analyses
Start with the mean, consider the median for skewed data, and apply mode, GM, and HM when the context calls for it.
📌 Example 2.1(a) – Ungrouped Frequency Distribution
We are given:
$x$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|
$f$ | 5 | 9 | 12 | 17 | 14 | 10 | 6 |
🧮 Formula for Arithmetic Mean:
$$\bar{x} = \frac{\sum f x}{\sum f}$$
✍️ Step-by-Step Calculation:
Let’s calculate $fx$ for each value of $x$:
$x$ | $f$ | $fx$ |
---|---|---|
1 | 5 | 5 |
2 | 9 | 18 |
3 | 12 | 36 |
4 | 17 | 68 |
5 | 14 | 70 |
6 | 10 | 60 |
7 | 6 | 42 |
Total | 73 | 299 |
✅ Final Answer:
$$\bar{x} = \frac{299}{73} \approx 4.096$$
🐍 Python Code:
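The calculation above can be reproduced with a few lines of plain Python. This is a sketch using the data of Example 2.1(a); the variable names are my own.

```python
# Data from Example 2.1(a)
x = [1, 2, 3, 4, 5, 6, 7]
f = [5, 9, 12, 17, 14, 10, 6]

total_f = sum(f)                                   # sum of frequencies
total_fx = sum(fi * xi for fi, xi in zip(f, x))    # sum of f * x
mean_x = total_fx / total_f

print(total_f, total_fx)    # 73 299
print(round(mean_x, 3))     # 4.096
```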
📌 Example 2.1(b) – Grouped Frequency Distribution
We are given:
Marks | 0-10 | 10-20 | 20-30 | 30-40 | 40-50 | 50-60 |
---|---|---|---|---|---|---|
Students | 12 | 18 | 27 | 20 | 17 | 6 |
Step 1: Find class midpoints ($x$)
Class | Frequency ($f$) | Midpoint ($x$) | $fx$ |
---|---|---|---|
0–10 | 12 | 5 | 60 |
10–20 | 18 | 15 | 270 |
20–30 | 27 | 25 | 675 |
30–40 | 20 | 35 | 700 |
40–50 | 17 | 45 | 765 |
50–60 | 6 | 55 | 330 |
Total | 100 | | 2800 |
🧮 Arithmetic Mean:
$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{2800}{100}$$
✅ Final Answer:
Mean = 28.0
🐍 Python Code:
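A sketch of the grouped calculation with the data of Example 2.1(b). Each class is represented by its midpoint, which is the usual approximation for grouped data; the variable names are my own.

```python
# Data from Example 2.1(b): (lower, upper) class limits and number of students
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
students = [12, 18, 27, 20, 17, 6]

# Step 1: class midpoints
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Step 2: mean = sum(f * x) / sum(f)
total_f = sum(students)
total_fx = sum(f * x for f, x in zip(students, midpoints))
grouped_mean = total_fx / total_f

print(midpoints)            # [5.0, 15.0, 25.0, 35.0, 45.0, 55.0]
print(total_f, total_fx)    # 100 2800.0
print(grouped_mean)         # 28.0
```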
Assumed Mean Method
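When calculating the arithmetic mean directly using:
$$\bar{x} = \frac{\sum f_i x_i}{N}, \quad N = \sum f_i$$
the work may involve heavy multiplication if the $x_i$ and $f_i$ are large.
To simplify the arithmetic, we use deviations from an assumed mean $A$:
✅ Assumed Mean Method Formula
Let:
$$d_i = x_i - A$$
Then,
$$\bar{x} = A + \frac{\sum f_i d_i}{N}$$
Where:
- $A$ = assumed mean (a value close to most $x_i$)
- $d_i = x_i - A$ = deviation of $x_i$ from the assumed mean
- $f_i$ = frequency
🧮 Derivation:
Given:
$$d_i = x_i - A \quad\Rightarrow\quad x_i = A + d_i$$
Then:
$$\sum f_i x_i = \sum f_i (A + d_i) = A\sum f_i + \sum f_i d_i = AN + \sum f_i d_i$$
So,
$$\bar{x} = \frac{1}{N}\sum f_i x_i = A + \frac{\sum f_i d_i}{N}$$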
📌 Example:
Let’s take the same data from Example 2.1(a):
$x$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|
$f$ | 5 | 9 | 12 | 17 | 14 | 10 | 6 |
Let’s take assumed mean $A = 4$ (the middle value).
Then compute $d = x - A$ and $fd$:
$x$ | $f$ | $d = x - 4$ | $fd$ |
---|---|---|---|
1 | 5 | -3 | -15 |
2 | 9 | -2 | -18 |
3 | 12 | -1 | -12 |
4 | 17 | 0 | 0 |
5 | 14 | +1 | +14 |
6 | 10 | +2 | +20 |
7 | 6 | +3 | +18 |
Total | 73 | | +7 |
Now apply:
$$\bar{x} = A + \frac{\sum fd}{N} = 4 + \frac{7}{73} \approx 4 + 0.096 = 4.096$$
✅ Same result, simpler multiplication.
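A short sketch of the assumed mean method in Python. The function name `assumed_mean_method` is hypothetical; the data are those of Example 2.1(a). It also illustrates that the choice of $A$ affects only the size of the intermediate numbers, not the result.

```python
def assumed_mean_method(values, freqs, assumed):
    """Mean via deviations from an assumed mean A: A + sum(f * d) / N."""
    n = sum(freqs)
    sum_fd = sum(f * (x - assumed) for x, f in zip(values, freqs))
    return assumed + sum_fd / n


x = [1, 2, 3, 4, 5, 6, 7]
f = [5, 9, 12, 17, 14, 10, 6]

direct = sum(fi * xi for fi, xi in zip(f, x)) / sum(f)   # 299 / 73
via_a4 = assumed_mean_method(x, f, 4)                    # 4 + 7 / 73
via_a6 = assumed_mean_method(x, f, 6)                    # a different A

print(round(direct, 3), round(via_a4, 3), round(via_a6, 3))
```

All three printed values should agree, since the derivation holds for any constant $A$.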
🔹 Step-Deviation Method (for Grouped Data)
When:
- The data is in class intervals (e.g., 0–10, 10–20, etc.)
- Each class has a uniform width
we use the step-deviation method to simplify calculations further than the assumed mean method.
✅ Step-by-step Formula:
Let:
- $A$ = assumed mean (choose the midpoint of a class near the center of the distribution)
- $x_i$ = mid-point of each class
- $u_i = \dfrac{x_i - A}{h}$ = step deviation of each class
- $f_i$ = frequency of each class
- $h$ = common class width
- $N = \sum f_i$ = total frequency
Then the arithmetic mean is:
$$\bar{x} = A + h \cdot \frac{\sum f_i u_i}{N}$$
📌 Example
Let’s use the data from Example 2.1(b):
Marks (Class Interval) | Frequency ($f$) |
---|---|
0–10 | 12 |
10–20 | 18 |
20–30 | 27 |
30–40 | 20 |
40–50 | 17 |
50–60 | 6 |
1. Find midpoints of each class:
Class | $f$ | Midpoint ($x$) |
---|---|---|
0–10 | 12 | 5 |
10–20 | 18 | 15 |
20–30 | 27 | 25 |
30–40 | 20 | 35 |
40–50 | 17 | 45 |
50–60 | 6 | 55 |
2. Choose assumed mean: Let $A = 25$, and $h = 10$ (since all intervals are of width 10)
3. Compute step-deviations $u = (x - 25)/10$:
$x$ | $f$ | $u$ | $fu$ |
---|---|---|---|
5 | 12 | -2 | -24 |
15 | 18 | -1 | -18 |
25 | 27 | 0 | 0 |
35 | 20 | +1 | +20 |
45 | 17 | +2 | +34 |
55 | 6 | +3 | +18 |
Total | N = 100 | | +30 |
4. Apply the formula:
$$\bar{x} = A + h \cdot \frac{\sum fu}{N} = 25 + 10 \times \frac{30}{100} = 28$$
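The four steps can be sketched in Python as follows. The function name `step_deviation_mean` is hypothetical, and the data are the marks distribution used in this example.

```python
def step_deviation_mean(midpoints, freqs, assumed, width):
    """Mean via step deviations u = (x - A) / h: A + h * sum(f * u) / N."""
    n = sum(freqs)
    sum_fu = sum(f * (x - assumed) / width for x, f in zip(midpoints, freqs))
    return assumed + width * sum_fu / n


midpoints = [5, 15, 25, 35, 45, 55]
freqs = [12, 18, 27, 20, 17, 6]

# Step deviations for A = 25, h = 10
u = [(x - 25) / 10 for x in midpoints]
sum_fu = sum(f * ui for f, ui in zip(freqs, u))

print(u)         # [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
print(sum_fu)    # 30.0
print(step_deviation_mean(midpoints, freqs, assumed=25, width=10))   # 28.0
```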
✅ Summary
Advantages of Step-Deviation Method:
- Greatly reduces computation
- Especially helpful in exams and large datasets
Limitation:
- Only valid when class width is uniform
Example 2.2: Calculate the mean of the following frequency distribution using the step-deviation method.
📊 Given Data:
Class Interval | Mid-value ($x$) | Frequency ($f$) | $u = (x - 28)/8$ | $fu$ |
---|---|---|---|---|
0–8 | 4 | 8 | -3 | -24 |
8–16 | 12 | 7 | -2 | -14 |
16–24 | 20 | 16 | -1 | -16 |
24–32 | 28 | 24 | 0 | 0 |
32–40 | 36 | 15 | 1 | 15 |
40–48 | 44 | 7 | 2 | 14 |
Total | | 77 | | -25 |
Constants:
- Assumed mean $A = 28$
- Class width $h = 8$
✅ Step-Deviation Mean Formula:
$$\bar{x} = A + h \cdot \frac{\sum fu}{N} = 28 + 8 \times \frac{-25}{77} \approx 28 - 2.597$$
✅ Final Answer:
Mean ≈ 25.40
📘 Properties of Arithmetic Mean
The arithmetic mean (commonly called the average) is one of the most commonly used measures of central tendency. Apart from being easy to compute, it also possesses several important mathematical properties that make it useful in statistical analysis.
✅ Property 1: Sum of Deviations from the Mean is Zero
If we take a list of numbers $x_1, x_2, \dots, x_n$ with arithmetic mean $\bar{x}$, then:
$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$
👉 This means that the total distance of all values above the mean is exactly balanced by the total distance of all values below the mean.
🔎 Example:
If the values are: 3, 5, 7
- Mean $= (3 + 5 + 7)/3 = 5$
- Deviations: $(3 - 5) + (5 - 5) + (7 - 5) = -2 + 0 + 2 = 0$
✅ Property 2: Minimum Sum of Squared Deviations
Among all possible values from which deviations could be measured, the mean gives the minimum sum of squared deviations.
Mathematically, for any constant $a$:
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 \le \sum_{i=1}^{n} (x_i - a)^2$$
This property is important in least squares estimation, where we try to minimize the squared error — hence, the mean is preferred.
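Both properties can be checked numerically. The sketch below uses the values 3, 5, 7 from the example; the list of alternative centres is an arbitrary choice for illustration, and a numeric check of this kind is a demonstration rather than a proof.

```python
data = [3, 5, 7]
m = sum(data) / len(data)            # arithmetic mean


def sum_sq_dev(values, centre):
    """Sum of squared deviations of the values from a chosen centre."""
    return sum((x - centre) ** 2 for x in values)


# Property 1: deviations from the mean sum to zero
deviations = [x - m for x in data]
print(deviations, sum(deviations))   # [-2.0, 0.0, 2.0] 0.0

# Property 2: no other centre gives a smaller sum of squares
candidates = [3, 4, 4.5, 5.5, 6, 7]  # arbitrary alternative centres
for a in candidates:
    print(a, sum_sq_dev(data, a), ">=", sum_sq_dev(data, m))
```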
✅ Property 3: Mean of a Composite Series
If you have multiple groups of data, each with its own mean and number of values, you can find the mean of the combined data (composite mean) using:
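Let:
- $\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k$ be the means of $k$ groups
- $n_1, n_2, \dots, n_k$ be the sizes of those groups
Then the mean of the combined (composite) data is:
$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \dots + n_k\bar{x}_k}{n_1 + n_2 + \dots + n_k}$$
🔎 Example:
- Group A: 10 students, average marks = 60
- Group B: 20 students, average marks = 70
Composite mean:
$$\bar{x} = \frac{10 \times 60 + 20 \times 70}{10 + 20} = \frac{2000}{30} \approx 66.67$$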
So, the combined average is 66.67.
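A minimal sketch of the composite mean in Python, using the two groups from the example. The function name `composite_mean` is hypothetical.

```python
def composite_mean(means, sizes):
    """Mean of the combined data: sum(n_i * mean_i) / sum(n_i)."""
    total = sum(sizes)
    if total == 0:
        raise ValueError("combined size must be positive")
    return sum(n * m for m, n in zip(means, sizes)) / total


combined = composite_mean(means=[60, 70], sizes=[10, 20])
print(round(combined, 2))   # 66.67
```

Note that the result is closer to 70 than to 60 because the larger group carries more weight; simply averaging the two means would give 65.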
Example: The average salary of male employees in a firm was Rs. 520, and that of female employees was Rs. 420. The mean salary of all the employees was Rs. 500. Find the percentage of male and female employees in the firm.
Given:
- Average salary of males, $\bar{x}_1 = 520$
- Average salary of females, $\bar{x}_2 = 420$
- Overall average salary, $\bar{x} = 500$
Let:
- $n_1$ = number of male employees
- $n_2$ = number of female employees
We use the composite mean formula:
$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$$
Substituting the known values:
$$500 = \frac{520\,n_1 + 420\,n_2}{n_1 + n_2}$$
Multiply both sides by $(n_1 + n_2)$:
$$500\,(n_1 + n_2) = 520\,n_1 + 420\,n_2$$
Expand both sides:
$$500\,n_1 + 500\,n_2 = 520\,n_1 + 420\,n_2$$
Rearrange the terms:
$$80\,n_2 = 20\,n_1$$
Divide both sides by 20:
$$n_1 = 4\,n_2$$
So, the ratio of males to females = 4 : 1
✅ Percentage Calculation
Total parts = 4 (males) + 1 (females) = 5
- Percentage of male employees: $\frac{4}{5} \times 100 = 80\%$
- Percentage of female employees: $\frac{1}{5} \times 100 = 20\%$
🎯 Final Answer:
- Male employees = 80%
- Female employees = 20%
Merits and Demerits of Arithmetic Mean
✅ Merits of Arithmetic Mean
- Rigorously Defined: It is mathematically well-defined and has a precise meaning.
- Simple to Understand and Compute: Arithmetic mean is easy to grasp and quick to calculate, either manually or with software.
- Based on All Observations: Every value in the dataset contributes to the computation, making it comprehensive.
- Algebraically Manipulable: It allows algebraic treatment. For example, the mean of a composite series can be calculated using $\bar{x} = \dfrac{n_1\bar{x}_1 + \dots + n_k\bar{x}_k}{n_1 + \dots + n_k}$, where $\bar{x}_i$ are the means and $n_i$ the sizes of the component series.
- Least Affected by Sampling Fluctuations: Among all averages, the arithmetic mean is the most stable and consistent across samples.
- Ideal Average (as per Prof. Yule): It satisfies most of the requisites of an ideal average listed earlier.
❌ Demerits of Arithmetic Mean
- Cannot Be Found by Inspection or Graphically: Unlike the mode or median, the mean cannot be located visually.
- Not Suitable for Qualitative Data: It cannot be used for non-quantitative characteristics like honesty, beauty, or intelligence.
- Sensitive to Missing or Illegible Values: A single missing or invalid value can prevent the computation unless omitted.
- Affected by Extreme Values (Outliers): A few extremely high or low values can distort the mean, making it non-representative.
- Can Lead to Misleading Conclusions Without Context: Example:
  - Student A scores: 50%, 60%, 70%
  - Student B scores: 70%, 60%, 50%
  Both have an average of 60%, but A shows improvement while B deteriorates.
- Not Suitable for Open-End Class Intervals: If the data has open classes (e.g., "above 90"), the mean can't be accurately computed.
- Unsuitable for Highly Skewed Distributions: In heavily asymmetric data, the mean may not reflect the central tendency properly; the median is preferred.
Weighted Mean
In the calculation of the arithmetic mean, we usually assume that all items carry equal importance. However, in real-world situations, some items are more significant than others, and their relative importance should be factored into the calculation. This is where the weighted mean becomes essential.
❓ Why Use a Weighted Mean?
- The simple mean treats all items equally.
- But in many practical situations (e.g., cost of living, exam marks), different items have different significance or "weights".
- Example: While calculating the change in cost of living, essential items like rice or wheat must be given more weight compared to non-essentials like cigarettes or confectionery.
🧮 Formula for Weighted Mean
Let:
- $x_1, x_2, \dots, x_n$ be the values of the items (e.g., prices, scores),
- $w_1, w_2, \dots, w_n$ be the weights (importance) assigned to each item.
Then the Weighted Mean is:
$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum w_i x_i}{\sum w_i}$$
This is similar to the formula for the mean of a frequency distribution, with weights $w_i$ replacing frequencies $f_i$.
📌 Key Observations
- If all weights are equal, the weighted mean = simple mean.
- If larger weights are given to larger values, the weighted mean > simple mean.
- If smaller weights are given to larger values, the weighted mean < simple mean.
✅ Use Cases of Weighted Mean
- Calculating average grades (where different subjects have different credit weights).
- Measuring cost of living index (where items like rent, food, transport have different importance).
- Financial portfolio returns (where each asset has a different investment weight).
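Example: Find the simple and weighted arithmetic mean of the first $n$ natural numbers, the weights being the corresponding numbers.
Solution:
Let the first $n$ natural numbers be:
$$1, 2, 3, \dots, n$$
🔹 Simple Arithmetic Mean (A.M.):
The formula for the sum of the first $n$ natural numbers is:
$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$$
So, the simple arithmetic mean is:
$$\bar{x} = \frac{1}{n} \cdot \frac{n(n+1)}{2} = \frac{n+1}{2}$$
🔹 Weighted Arithmetic Mean:
Here, weights are equal to the values themselves.
So:
$$w_i = x_i = i$$
We need:
$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{\sum i^2}{\sum i}$$
We use the formulas:
$$\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}, \qquad \sum_{i=1}^{n} i = \frac{n(n+1)}{2}$$
Substituting:
$$\bar{x}_w = \frac{n(n+1)(2n+1)/6}{n(n+1)/2} = \frac{2n+1}{3}$$
✅ Final Answer:
- Simple Arithmetic Mean = $\dfrac{n+1}{2}$
- Weighted Arithmetic Mean = $\dfrac{2n+1}{3}$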
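The two closed forms, $(n+1)/2$ for the simple mean and $(2n+1)/3$ for the weighted mean, can be checked for a few values of $n$. The sketch below uses exact fractions to avoid rounding; the function name `weighted_mean` and the chosen values of $n$ are illustrative.

```python
from fractions import Fraction


def weighted_mean(values, weights):
    """Weighted mean sum(w * x) / sum(w), returned as an exact fraction."""
    return Fraction(sum(w * x for x, w in zip(values, weights)), sum(weights))


for n in (1, 5, 10, 100):                      # arbitrary test sizes
    numbers = list(range(1, n + 1))
    simple = Fraction(sum(numbers), n)
    weighted = weighted_mean(numbers, numbers)  # weights equal the values
    print(n, simple, weighted)
    assert simple == Fraction(n + 1, 2)
    assert weighted == Fraction(2 * n + 1, 3)
```

Because larger weights go to larger values here, the weighted mean exceeds the simple mean for every $n > 1$.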
Median
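The median of a distribution is the value that divides it into two equal parts. That is:
- Half the observations lie below the median.
- Half lie above the median.
Hence, the median is a positional average (not affected much by extreme values).
🔹 1. Ungrouped Data (Raw Data)
Arrange the $n$ observations in ascending order. Then:
- Odd number of observations: Median = value of the $\left(\frac{n+1}{2}\right)$-th observation
- Even number of observations: Median = average of the $\left(\frac{n}{2}\right)$-th and $\left(\frac{n}{2}+1\right)$-th observations
📌 Example:
- Data: 25, 20, 15, 35, 18 → sorted: 15, 18, 20, 25, 35 → Median = 20
- Data: 8, 20, 50, 25, 15, 30 → sorted: 8, 15, 20, 25, 30, 50 → Median = $\frac{20 + 25}{2} = 22.5$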
📝 Remark: For even-numbered datasets, any value between the two middle values can technically be used as the median, but by convention, we use their average.
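The rule for raw data can be written out by hand and compared with the standard library. The function name `median_raw` is hypothetical; the two data sets are the ones from the example above.

```python
from statistics import median


def median_raw(data):
    """Median of raw data: sort, then take the middle value(s)."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of the two middle values


odd_case = [25, 20, 15, 35, 18]
even_case = [8, 20, 50, 25, 15, 30]

print(median_raw(odd_case), median(odd_case))     # 20 20
print(median_raw(even_case), median(even_case))   # 22.5 22.5
```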
✅ Median Formula (for Grouped/Continuous Frequency Data):
$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right)$$
Where:
Symbol | Meaning |
---|---|
$l$ | Lower boundary of the median class |
$N$ | Total frequency |
$c$ | Cumulative frequency before the median class |
$f$ | Frequency of the median class |
$h$ | Width (class size) of the median class |
✍️ Interpretation:
- $N/2$ tells you where the median lies in the cumulative frequency table.
- Find the class where this value falls → that’s the median class.
- Plug the values into the formula to get the median.
🧮 Example:
🧮 Given:
Wages (in Rs.) | Frequency (f) |
---|---|
20–30 | 3 |
30–40 | 5 |
40–50 | 20 |
50–60 | 10 |
60–70 | 5 |
➕ Step 1: Find cumulative frequencies (cf)
Wages (in Rs.) | Frequency (f) | Cumulative Frequency (cf) |
---|---|---|
20–30 | 3 | 3 |
30–40 | 5 | 8 |
40–50 | 20 | 28 |
50–60 | 10 | 38 |
60–70 | 5 | 43 |
🔍 Step 2: Identify median class
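- Total number of labourers: $N = 43$, so $N/2 = 21.5$
- Find the class whose cumulative frequency just exceeds 21.5 → it is 40–50, with cf = 28.
So, the median class is: 40–50
🔢 Step 3: Apply the Median formula
$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right) = 40 + \frac{10}{20}\,(21.5 - 8) = 40 + 6.75$$
Where:
- $l = 40$ (lower limit of median class)
- $N/2 = 21.5$
- $c = 8$ (cumulative frequency before the median class)
- $f = 20$ (frequency of the median class)
- $h = 10$ (class width)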
✅ Final Answer:
Median wage = Rs. 46.75
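The three steps can be sketched as a small function. The name `grouped_median` is hypothetical, and the sketch assumes contiguous classes of equal width listed in ascending order, as in the wage table.

```python
def grouped_median(lower_bounds, width, freqs):
    """Median of a grouped distribution: l + (h / f) * (N / 2 - c)."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cum_before = 0                     # c: cumulative frequency before the class
    for lower, f in zip(lower_bounds, freqs):
        if cum_before + f >= half:     # first class whose cf reaches N / 2
            return lower + (width / f) * (half - cum_before)
        cum_before += f
    raise RuntimeError("unreachable for positive total frequency")


wage_lower = [20, 30, 40, 50, 60]      # lower limits of the wage classes
wage_freq = [3, 5, 20, 10, 5]

median_wage = grouped_median(wage_lower, width=10, freqs=wage_freq)
print(median_wage)   # 46.75
```

The comparison uses `>=` rather than "just exceeds"; for contiguous classes both conventions give the same value when $N/2$ falls exactly on a class boundary.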
✅ Merits of Median
- Rigorously Defined: Median is clearly and unambiguously defined. It has a specific position in the dataset.
- Easy to Understand and Calculate: Especially with sorted data, the median can often be found simply by inspection.
- Unaffected by Extreme Values (Outliers): Unlike the mean, the median is not influenced by unusually high or low values.
- Applicable to Open-Ended Distributions: Median can be computed even when the distribution has open-ended intervals like "below 10" or "above 100".
❌ Demerits of Median
- Not Exact for Even Number of Observations: For even-sized datasets, the median is estimated as the average of the two middle values, which may not reflect an actual data point.
- Ignores Most Data Points: Median only considers the middle position(s); values far from the center do not affect it. For example:
  - Median of {10, 25, 50, 60, 65} is 50.
  - Even if 10 and 25 are changed to 1 and 20 or 60 and 65 are changed to 70 and 80, the median remains 50.
- Not Suitable for Algebraic Treatment: Median does not lend itself to further statistical operations like mean does (e.g., finding combined medians is not straightforward).
- Affected by Sampling Fluctuations: Median can vary significantly between samples compared to the mean when samples are small or variable.
📘 Uses of Median
- For Qualitative Data: Useful when data is ranked but not measurable (e.g., intelligence levels, honesty ratings).
- In Income and Wealth Distribution: Commonly used to represent central tendency when dealing with wages or wealth, where data is often skewed.
Mode
Mode is the value in a dataset that occurs most frequently. It represents the most typical or common value around which other values tend to cluster.
🔍 Examples of Mode in Real Life
- The average height of an Indian male is 5'-6" → This refers to the most common height, i.e., mode.
- The average shoe size sold in a shop is 7 → Shoe size 7 is sold most frequently → Mode = 7
- An average student spends Rs. 150 per month in a hostel → Rs. 150 is the most commonly occurring monthly expenditure → Mode = Rs. 150
📊 Example: Discrete Frequency Distribution
x (Value) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|
f (Freq.) | 4 | 9 | 16 | 25 | 22 | 15 | 7 | 3 |
Here, the maximum frequency is 25, which corresponds to x = 4.
So, Mode = 4
⚠️ Special Cases Where Mode is Not Easily Identified
- Repeated Maximum Frequencies
  - If more than one value has the same highest frequency, the distribution is bimodal or multimodal.
- Maximum Frequency at the Beginning or End
  - If the highest frequency is in the first or last class, mode may not give a good central value.
- Irregular Frequency Distribution
  - If the data fluctuates significantly or has no clear peak, mode may be misleading or undefined.
📌 Mode Formula for Continuous Frequency Distribution
$$\text{Mode} = l + \frac{h\,(f_1 - f_0)}{2f_1 - f_0 - f_2}$$
🧩 Where:
- $l$ = lower boundary of the modal class
- $h$ = class width (class interval size)
- $f_1$ = frequency of the modal class
- $f_0$ = frequency of the class before modal class
- $f_2$ = frequency of the class after modal class
✅ Steps to Find the Mode
- Identify the modal class (class with the highest frequency).
- Plug the values $l$, $h$, $f_1$, $f_0$ and $f_2$ into the formula above.
📊 Example:
Class Interval | Frequency |
---|---|
10 - 20 | 5 |
20 - 30 | 8 |
30 - 40 | 12 |
40 - 50 | 20 ← Modal Class (highest frequency) |
50 - 60 | 10 |
60 - 70 | 5 |
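Here:
- Modal class = 40–50
- $l = 40$
- $h = 10$
- $f_1 = 20$
- $f_0 = 12$
- $f_2 = 10$
🧮 Substitute in formula:
$$\text{Mode} = 40 + \frac{10\,(20 - 12)}{2 \times 20 - 12 - 10} = 40 + \frac{80}{18} \approx 40 + 4.44$$
🎯 Final Answer:
Mode ≈ 44.44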
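A sketch of the grouped-mode formula in Python, applied to the table above. The function name `grouped_mode` is hypothetical. Two choices in it are assumptions of this sketch rather than rules from the article: a missing neighbouring class is treated as having frequency 0, and if several classes tie for the highest frequency the first one is used.

```python
def grouped_mode(lower_bounds, width, freqs):
    """Mode of a grouped distribution: l + h * (f1 - f0) / (2 * f1 - f0 - f2)."""
    k = max(range(len(freqs)), key=freqs.__getitem__)    # index of the modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                    # assumption: 0 if no class before
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0       # assumption: 0 if no class after
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula is undefined: no distinct peak")
    return lower_bounds[k] + width * (f1 - f0) / denominator


lower = [10, 20, 30, 40, 50, 60]
freq = [5, 8, 12, 20, 10, 5]

mode_value = grouped_mode(lower, width=10, freqs=freq)
print(round(mode_value, 2))   # 44.44
```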
Partition Values
Partition values are the values that divide a series (or dataset) into equal parts.
Types of Partition Values
- Quartiles: Quartiles divide the data into four equal parts:
  - Q₁ (First Quartile): 25% of observations lie below it, and 75% lie above.
  - Q₂ (Second Quartile): It is the Median; 50% of observations lie below and 50% above.
  - Q₃ (Third Quartile): 75% of observations lie below it, and 25% lie above.
- Deciles: Deciles divide the data into ten equal parts:
  - Notation: D₁, D₂, ..., D₉
  - Example: D₇ (Seventh Decile) means 70% of the observations lie below it, and 30% above.
- Percentiles: Percentiles divide the data into 100 equal parts:
  - Notation: P₁, P₂, ..., P₉₉
  - Example: P₄₇ (47th Percentile) is the value below which 47% of the observations lie.
Note on Calculation
The methods used to calculate quartiles, deciles, and percentiles are similar to those used for calculating the median, whether the distribution is:
- Discrete (list of values with frequencies), or
- Continuous (grouped frequency distribution).
Example
Eight coins were tossed together, and the number of heads resulting from each toss was recorded. This experiment was repeated 256 times. The following frequency distribution table shows how many times each possible number of heads (from 0 to 8) occurred:
Number of Heads (x) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|---|
Frequency (f) | 1 | 9 | 26 | 59 | 72 | 52 | 29 | 7 | 1 |
Tasks:
Calculate the following statistical measures based on the data provided:
- Median
- First Quartile (Q₁)
- Third Quartile (Q₃)
- Fourth Decile (D₄)
- 27th Percentile (P₂₇)
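The article lists these tasks without a worked solution. The sketch below shows one way to approach them for a discrete distribution, under an assumed convention: the partition value is the smallest $x$ whose cumulative frequency is just greater than the required fraction of $N$ (for example $N/2$ for the median, $27N/100$ for P₂₇). Other conventions exist and can give slightly different answers. The function name `partition_value` is hypothetical.

```python
from fractions import Fraction
from itertools import accumulate


def partition_value(values, freqs, fraction):
    """Smallest value whose cumulative frequency exceeds fraction * N."""
    target = fraction * sum(freqs)
    for x, cum_freq in zip(values, accumulate(freqs)):
        if cum_freq > target:
            return x
    return values[-1]          # reached only when fraction is 1


heads = [0, 1, 2, 3, 4, 5, 6, 7, 8]
freq = [1, 9, 26, 59, 72, 52, 29, 7, 1]     # N = 256

results = {
    "Median": partition_value(heads, freq, Fraction(1, 2)),
    "Q1": partition_value(heads, freq, Fraction(1, 4)),
    "Q3": partition_value(heads, freq, Fraction(3, 4)),
    "D4": partition_value(heads, freq, Fraction(4, 10)),
    "P27": partition_value(heads, freq, Fraction(27, 100)),
}
print(list(accumulate(freq)))   # cumulative frequencies
print(results)
```

Exact fractions are used for the targets so that values such as $27 \times 256 / 100 = 69.12$ are compared without floating-point rounding.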