Measures of Central Tendency

 


Understanding data begins with a single question: Where is the data centered?
That’s where averages or measures of central tendency come in.


📘 What Are Averages?

According to Prof. Bowley, averages are:

“Statistical constants which enable us to comprehend in a single effort the significance of the whole.”

In simpler words, an average is a single value that represents an entire data distribution.


🎯 Why Are Averages Important?

Averages help:

  • Summarize large data sets

  • Identify trends

  • Make comparisons

  • Serve as a foundation for further statistical analysis


📌 Common Measures of Central Tendency

There are five widely used averages:

  1. Arithmetic Mean (Simply called "Mean")

  2. Median

  3. Mode

  4. Geometric Mean

  5. Harmonic Mean

Let’s understand each with examples and Python code.


✅ Requisites of an Ideal Measure of Central Tendency

According to Prof. Yule, a good average must:

  1. Be rigidly defined

  2. Be easy to understand and compute

  3. Use all observations

  4. Be suitable for mathematical treatment

  5. Be minimally affected by sampling fluctuations

Additionally, a good measure should:

  6. Not be overly influenced by extreme values (outliers)


🔢 1. Arithmetic Mean (AM)

The most commonly used average.

Formula:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

Python Example:


import statistics

data = [10, 20, 30, 40, 50]
mean = statistics.mean(data)
print("Mean:", mean)

Output:


Mean: 30

For frequency data:

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$

🔢 2. Median

The middle value when the data is sorted.

  • If n is odd: Median = middle value

  • If n is even: Median = average of two middle values

Python Example:


import statistics

data = [10, 20, 30, 40, 50]
median = statistics.median(data)
print("Median:", median)

Output:


Median: 30

🔢 3. Mode

The value that occurs most frequently in the data.

Python Example:


import statistics

data = [10, 20, 20, 30, 40]
mode = statistics.mode(data)
print("Mode:", mode)

Output:


Mode: 20

🔢 4. Geometric Mean (GM)

Used for multiplicative processes (e.g., growth rates, financial data).

Formula:

$$GM = \sqrt[n]{x_1 x_2 \cdots x_n}$$

Python Example:


from statistics import geometric_mean

data = [2, 4, 8]
gm = geometric_mean(data)
# Round to hide floating-point noise in the last digits
print("Geometric Mean:", round(gm, 4))

Output:


Geometric Mean: 4.0

🔢 5. Harmonic Mean (HM)

Useful when dealing with rates, like speed or density.

Formula:

$$HM = \frac{n}{\sum \frac{1}{x_i}}$$

Python Example:


from statistics import harmonic_mean

data = [2, 4, 4]
hm = harmonic_mean(data)
print("Harmonic Mean:", hm)

Output:


Harmonic Mean: 3.0

📊 Summary Table

Measure            Best Used For                      Sensitive to Outliers
Arithmetic Mean    General numeric data               ✅ Yes
Median             Skewed distributions               ❌ No
Mode               Categorical / repeated values      ❌ No
Geometric Mean     Percentages, ratios, growth        ✅ Yes
Harmonic Mean      Rates (e.g., speed, price/unit)    ✅ Yes
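To make the "Sensitive to Outliers" column concrete, here is a minimal sketch comparing the mean and the median before and after one extreme value is added. The numbers are made up for illustration and do not come from the text.

```python
import statistics

# Hypothetical data: five typical values, then the same list plus one extreme value.
typical = [30, 32, 35, 36, 40]
with_outlier = typical + [400]

mean_before = statistics.mean(typical)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(typical)
median_after = statistics.median(with_outlier)

print("Mean:  ", mean_before, "->", mean_after)
print("Median:", median_before, "->", median_after)
```

In this toy case the mean moves from 34.6 to 95.5, while the median only moves from 35 to 35.5.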

🧠 Final Thoughts

Understanding these five measures gives you the power to:

  • Interpret datasets meaningfully

  • Compare distributions

  • Perform deeper statistical analyses

Start with the mean, consider the median for skewed data, and apply mode, GM, and HM when the context calls for it.

📌 Example 2.1(a) – Ungrouped Frequency Distribution

We are given:

$x$    1    2    3     4     5     6     7
$f$    5    9    12    17    14    10    6

🧮 Formula for Arithmetic Mean:

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$

✍️ Step-by-Step Calculation:

Let’s calculate $f_i x_i$ for each value:

$x_i$    $f_i$    $f_i x_i$
1        5        5
2        9        18
3        12       36
4        17       68
5        14       70
6        10       60
7        6        42
Total    73       299

$$\bar{x} = \frac{299}{73} \approx 4.096$$

✅ Final Answer:

Mean ≈ 4.096

🐍 Python Code:


x = [1, 2, 3, 4, 5, 6, 7]
f = [5, 9, 12, 17, 14, 10, 6]

# Sum of f_i * x_i and total frequency
total_fx = sum(fi * xi for xi, fi in zip(x, f))
total_f = sum(f)
mean = total_fx / total_f
print("Mean:", round(mean, 3))

📌 Example 2.1(b) – Grouped Frequency Distribution

We are given:

Marks       0–10    10–20    20–30    30–40    40–50    50–60
Students    12      18       27       20       17       6

Step 1: Find class midpoints ($x_i$)

$$x_i = \frac{\text{Lower Limit} + \text{Upper Limit}}{2}$$

Class    Frequency ($f_i$)    Midpoint ($x_i$)    $f_i x_i$
0–10     12                   5                   60
10–20    18                   15                  270
20–30    27                   25                  675
30–40    20                   35                  700
40–50    17                   45                  765
50–60    6                    55                  330
Total    100                                      2800

🧮 Arithmetic Mean:

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{2800}{100} = 28.0$$

✅ Final Answer:

Mean = 28.0

🐍 Python Code:


class_intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [12, 18, 27, 20, 17, 6]

# Calculate midpoints
midpoints = [(low + high) / 2 for low, high in class_intervals]

# Compute mean
total_fx = sum(f * m for f, m in zip(frequencies, midpoints))
total_f = sum(frequencies)
mean = total_fx / total_f
print("Mean:", mean)


Assumed Mean Method

Calculating the arithmetic mean directly using

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$

may involve heavy multiplication if the $x_i$ and $f_i$ are large.

To simplify the arithmetic, we use deviations from an assumed mean $A$:

✅ Assumed Mean Method Formula

Let:

$$d_i = x_i - A$$

Then,

$$\bar{x} = A + \frac{\sum f_i d_i}{\sum f_i}$$

Where:

  • $A$ = assumed mean (a value close to most $x_i$)

  • $d_i = x_i - A$

  • $f_i$ = frequency


🧮 Derivation:

Given:

$$d_i = x_i - A \Rightarrow x_i = d_i + A$$

Then:

$$\sum f_i x_i = \sum f_i (d_i + A) = \sum f_i d_i + A \sum f_i$$

So,

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{\sum f_i d_i + A \sum f_i}{\sum f_i} = \frac{\sum f_i d_i}{\sum f_i} + A = A + \frac{\sum f_i d_i}{\sum f_i}$$


📌 Example:

Let’s take the same data from Example 2.1(a):

$x_i$    1    2    3     4     5     6     7
$f_i$    5    9    12    17    14    10    6

Let’s take assumed mean $A = 4$ (the middle value).

Then compute $d_i = x_i - A$ and $f_i d_i$:

$x_i$    $f_i$    $d_i = x_i - 4$    $f_i d_i$
1        5        -3                 -15
2        9        -2                 -18
3        12       -1                 -12
4        17       0                  0
5        14       +1                 +14
6        10       +2                 +20
7        6        +3                 +18
Total    73                          +7

Now apply:

$$\bar{x} = A + \frac{\sum f_i d_i}{\sum f_i} = 4 + \frac{7}{73} \approx 4.096$$

✅ Same result, simpler multiplication.
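As a quick check, the assumed-mean calculation can be sketched in a few lines of Python and compared with the direct formula. The variable names here are illustrative.

```python
x = [1, 2, 3, 4, 5, 6, 7]
f = [5, 9, 12, 17, 14, 10, 6]
A = 4  # assumed mean

n_total = sum(f)

# Sum of f_i * d_i, where d_i = x_i - A
sum_fd = sum(fi * (xi - A) for xi, fi in zip(x, f))

mean_assumed = A + sum_fd / n_total
mean_direct = sum(fi * xi for xi, fi in zip(x, f)) / n_total

print("Sum f*d:", sum_fd, "N:", n_total)
print("Assumed-mean result:", round(mean_assumed, 3))
print("Direct result:      ", round(mean_direct, 3))
```

Any other choice of `A` gives the same mean. A value near the centre simply keeps the deviations small.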

The step-deviation method is a further shortcut for calculating the arithmetic mean of a grouped (continuous) frequency distribution. It is very efficient when the class intervals are equal.

🔹 Step-Deviation Method (for Grouped Data)

When:

  • The data is in class intervals (e.g., 0–10, 10–20, etc.)

  • Each class has a uniform width $h$

We use the step-deviation method to simplify calculations further than the assumed mean method.


✅ Step-by-step Formula:

Let:

  • $A$ = assumed mean (choose the midpoint of a class near the center of the distribution)

  • $x_i$ = mid-point of each class

  • $d_i = \frac{x_i - A}{h}$

  • $f_i$ = frequency of each class

  • $h$ = common class width

  • $N = \sum f_i$ = total frequency

Then the arithmetic mean is:

$$\bar{x} = A + h \cdot \frac{\sum f_i d_i}{\sum f_i}$$


📌 Example

Let’s use the data from Example 2.1(b):

Marks (Class Interval)    $f_i$
0–10                      12
10–20                     18
20–30                     27
30–40                     20
40–50                     17
50–60                     6

  1. Find midpoints $x_i$ of each class:

Class    $f_i$    $x_i$
0–10     12       5
10–20    18       15
20–30    27       25
30–40    20       35
40–50    17       45
50–60    6        55

  2. Choose assumed mean: Let $A = 25$, and $h = 10$ (since all intervals are of width 10)

  3. Compute step-deviations $d_i = \frac{x_i - A}{h}$

$x_i$    $f_i$      $d_i$    $f_i d_i$
5        12         -2       -24
15       18         -1       -18
25       27         0        0
35       20         +1       +20
45       17         +2       +34
55       6          +3       +18
Total    N = 100             +30

  4. Apply the formula:

$$\bar{x} = A + h \cdot \frac{\sum f_i d_i}{\sum f_i} = 25 + 10 \cdot \frac{30}{100} = 25 + 3 = 28$$
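The same steps can be sketched as a small helper. The function name `step_deviation_mean` and its parameter names are hypothetical, chosen only for this illustration.

```python
def step_deviation_mean(midpoints, freqs, assumed_mean, width):
    """Mean of grouped data via step deviations d_i = (x_i - A) / h."""
    n_total = sum(freqs)
    sum_fd = sum(f * (x - assumed_mean) / width for x, f in zip(midpoints, freqs))
    return assumed_mean + width * sum_fd / n_total

marks_mid = [5, 15, 25, 35, 45, 55]
marks_freq = [12, 18, 27, 20, 17, 6]

marks_mean = step_deviation_mean(marks_mid, marks_freq, assumed_mean=25, width=10)
# A different assumed mean changes the intermediate sums, not the answer.
marks_mean_alt = step_deviation_mean(marks_mid, marks_freq, assumed_mean=35, width=10)

print("Mean with A = 25:", marks_mean)
print("Mean with A = 35:", marks_mean_alt)
```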


✅ Summary

Advantages of Step-Deviation Method:

  • Greatly reduces computation

  • Especially helpful in exams and large datasets

  • Limitation: it is only valid when the class width $h$ is uniform

Example 2.2 – Step-Deviation Method

This example works through the step-deviation calculation of the mean for the following frequency distribution.


📊 Given Data:

Class Interval    Mid-value $x$    Frequency $f$    $d = \frac{x - A}{h}$    $f \cdot d$
0–8               4                8                -3                       -24
8–16              12               7                -2                       -14
16–24             20               16               -1                       -16
24–32             28               24               0                        0
32–40             36               15               1                        15
40–48             44               7                2                        14
Total                              $N = 77$                                  $\sum fd = -25$

Constants:

  • Assumed mean $A = 28$

  • Class width $h = 8$


✅ Step-Deviation Mean Formula:

$$\bar{x} = A + h \cdot \frac{\sum f d}{\sum f} = 28 + 8 \cdot \left(\frac{-25}{77}\right)$$

$$\bar{x} = 28 - 2.597 \approx 25.403$$

✅ Final Answer:

$$\bar{x} \approx 25.403$$
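For an exact cross-check of Example 2.2, the sketch below uses Python's `fractions.Fraction` so that no rounding occurs until the final print. It compares the step-deviation result with the direct formula.

```python
from fractions import Fraction

mid = [4, 12, 20, 28, 36, 44]
freq = [8, 7, 16, 24, 15, 7]
A, h = 28, 8

n_total = sum(freq)

# Step deviations d = (x - A) / h, kept as exact fractions
sum_fd = sum(f * Fraction(x - A, h) for x, f in zip(mid, freq))
mean_step = A + h * sum_fd / n_total

# Direct formula for comparison
mean_direct = Fraction(sum(f * x for x, f in zip(mid, freq)), n_total)

print("N =", n_total, " sum(f*d) =", sum_fd)
print("Step-deviation mean:", mean_step, "=", round(float(mean_step), 3))
print("Direct mean:        ", mean_direct)
```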

📘 Properties of Arithmetic Mean

The arithmetic mean (commonly called the average) is one of the most commonly used measures of central tendency. Apart from being easy to compute, it also possesses several important mathematical properties that make it useful in statistical analysis.


Property 1: Sum of Deviations from the Mean is Zero

If we take a list of numbers $x_1, x_2, \ldots, x_n$ with arithmetic mean $\bar{x}$, then:

$$\sum_{i=1}^{n}(x_i - \bar{x}) = 0$$

👉 This means that the total deviation of the values above the mean is exactly balanced by the total deviation of the values below it.

🔎 Example:

If the values are: 3, 5, 7

  • Mean: $\bar{x} = \frac{3 + 5 + 7}{3} = 5$

  • Deviations: $(3 - 5) + (5 - 5) + (7 - 5) = -2 + 0 + 2 = 0$


Property 2: Minimum Sum of Squared Deviations

Among all possible values from which deviations could be measured, the mean gives the minimum sum of squared deviations.

Mathematically, for any constant $a$:

$$\sum_{i=1}^{n}(x_i - \bar{x})^2 \leq \sum_{i=1}^{n}(x_i - a)^2$$

This property is important in least squares estimation, where we try to minimize the squared error — hence, the mean is preferred.
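Properties 1 and 2 can be checked numerically on the small data set 3, 5, 7. This is only a sketch: the list of trial values for $a$ is arbitrary, and checking a few candidates illustrates the inequality but does not prove it.

```python
data = [3, 5, 7]
mean = sum(data) / len(data)

# Property 1: deviations from the mean cancel out.
sum_dev = sum(x - mean for x in data)

def sum_sq_dev(a):
    """Sum of squared deviations of the data about a."""
    return sum((x - a) ** 2 for x in data)

# Property 2: among a few trial values, the mean gives the smallest total.
candidates = [4, 4.5, 5, 5.5, 6]
best = min(candidates, key=sum_sq_dev)

print("Sum of deviations:", sum_dev)
print("Best candidate:", best, "with sum of squares", sum_sq_dev(best))
```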


Property 3: Mean of a Composite Series

If you have multiple groups of data, each with its own mean and number of values, you can find the mean of the combined data (composite mean) using:

Let:

  • $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ be the means of $k$ groups

  • $n_1, n_2, \ldots, n_k$ be the sizes of those groups

Then the mean of the combined (composite) data is:

$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k}$$

🔎 Example:

  • Group A: 10 students, average marks = 60

  • Group B: 20 students, average marks = 70

Composite mean:

$$\bar{x} = \frac{10 \times 60 + 20 \times 70}{10 + 20} = \frac{600 + 1400}{30} = \frac{2000}{30} \approx 66.67$$

So, the combined average is 66.67.
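A minimal sketch of the composite-mean formula follows. The helper name `combined_mean` is illustrative, not from the text.

```python
def combined_mean(sizes, means):
    """Mean of the pooled data, given each group's size and mean."""
    total = sum(n * m for n, m in zip(sizes, means))
    return total / sum(sizes)

# Group A: 10 students averaging 60; Group B: 20 students averaging 70
composite = combined_mean([10, 20], [60, 70])
print("Composite mean:", round(composite, 2))
```

Note that the result is pulled toward 70 because Group B is twice as large. A plain average of the two group means (65) would be wrong here.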

Example: The average salary of male employees in a firm was Rs. 520, and that of female employees was Rs. 420. The mean salary of all the employees was Rs. 500. Find the percentage of male and female employees in the firm.

Given:

  • Average salary of males, $x_1 = 520$

  • Average salary of females, $x_2 = 420$

  • Overall average salary, $\bar{x} = 500$

Let:

  • $n_1$ = number of male employees

  • $n_2$ = number of female employees

We use the composite mean formula:

$$\bar{x} = \frac{n_1 x_1 + n_2 x_2}{n_1 + n_2}$$

Substituting the known values:

$$500 = \frac{520 n_1 + 420 n_2}{n_1 + n_2}$$

Multiply both sides by $n_1 + n_2$:

$$500(n_1 + n_2) = 520 n_1 + 420 n_2$$

Expand the left-hand side:

$$500 n_1 + 500 n_2 = 520 n_1 + 420 n_2$$

Rearrange the terms:

$$520 n_1 - 500 n_1 = 500 n_2 - 420 n_2 \Rightarrow 20 n_1 = 80 n_2$$

Divide both sides by 20:

$$n_1 = 4 n_2 \Rightarrow \frac{n_1}{n_2} = \frac{4}{1}$$

So, the ratio of males to females = 4 : 1


Percentage Calculation

Total parts = 4 (males) + 1 (females) = 5

  • Percentage of male employees:

$$\frac{4}{5} \times 100 = 80\%$$

  • Percentage of female employees:

$$\frac{1}{5} \times 100 = 20\%$$


🎯 Final Answer:

  • Male employees = 80%

  • Female employees = 20%
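The same answer follows from a one-line rearrangement. Writing $p$ for the male share, the composite mean formula gives $\bar{x} = p x_1 + (1 - p) x_2$, so $p = (\bar{x} - x_2) / (x_1 - x_2)$. A short sketch, with illustrative variable names:

```python
x1, x2, overall = 520, 420, 500  # male mean, female mean, combined mean

# Share of males from overall = p * x1 + (1 - p) * x2
male_pct = 100 * (overall - x2) / (x1 - x2)
female_pct = 100 - male_pct

print("Male %:  ", male_pct)
print("Female %:", female_pct)
```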

Merits and Demerits of Arithmetic Mean

Merits of Arithmetic Mean

  1. Rigorously Defined:
    It is mathematically well-defined and has a precise meaning.

  2. Simple to Understand and Compute:

    Arithmetic mean is easy to grasp and quick to calculate, either manually or with software.

  3. Based on All Observations:

    Every value in the dataset contributes to the computation, making it comprehensive.

  4. Algebraically Manipulable:

    It allows algebraic treatment. For example, the mean of a composite series can be calculated using:

    $$\bar{x} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i}$$

    where $\bar{x}_i$ are the means and $n_i$ the sizes of the $k$ component series.

  5. Least Affected by Sampling Fluctuations:
    Among all averages, the arithmetic mean is the most stable and consistent across samples.

  6. Ideal Average (as per Prof. Yule):
    It fulfills the theoretical criteria for an ideal average.


Demerits of Arithmetic Mean

  1. Cannot Be Found by Inspection or Graphically:

    Unlike the mode or median, the mean cannot be located visually.

  2. Not Suitable for Qualitative Data:
    It cannot be used for non-quantitative characteristics like honesty, beauty, or intelligence.

  3. Sensitive to Missing or Illegible Values:
    A single missing or invalid value can prevent the computation unless omitted.

  4. Affected by Extreme Values (Outliers):
    A few extremely high or low values can distort the mean, making it non-representative.

  5. Can Lead to Misleading Conclusions Without Context:
    Example:

    • Student A scores: 50%, 60%, 70%

    • Student B scores: 70%, 60%, 50%
      Both have an average of 60%, but A shows improvement while B deteriorates.

  6. Not Suitable for Open-End Class Intervals:
    If the data has open classes (e.g., "above 90"), the mean can't be accurately computed.

  7. Unsuitable for Highly Skewed Distributions:
    In heavily asymmetric data, the mean may not reflect the central tendency properly—median is preferred.

Weighted Mean

In the calculation of the arithmetic mean, we usually assume that all items carry equal importance. However, in real-world situations, some items are more significant than others, and their relative importance should be factored into the calculation. This is where the weighted mean becomes essential.


Why Use a Weighted Mean?

  • The simple mean treats all items equally.

  • But in many practical situations (e.g., cost of living, exam marks), different items have different significance or "weights".

  • Example: While calculating the change in cost of living, essential items like rice or wheat must be given more weight compared to non-essentials like cigarettes or confectionery.


🧮 Formula for Weighted Mean

Let:

  • $X_i$ be the values of the items (e.g., prices, scores),

  • $W_i$ be the weights (importance) assigned to each item.

Then the Weighted Mean is:

$$\bar{X}_w = \frac{\sum W_i X_i}{\sum W_i}$$

This is similar to the formula for the mean of a frequency distribution, with weights $W_i$ replacing frequencies $f_i$.


📌 Key Observations

  1. If all weights are equal, the weighted mean = simple mean.

  2. If larger weights are given to larger values, the weighted mean > simple mean.

  3. If smaller weights are given to larger values, the weighted mean < simple mean.
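The three observations above can be illustrated with a small sketch. The scores and weights are hypothetical, and `weighted_mean` is an illustrative helper name.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(W_i * X_i) / sum(W_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

scores = [60, 70, 80]  # hypothetical marks in three subjects

equal = weighted_mean(scores, [1, 1, 1])       # equal weights -> simple mean
heavy_high = weighted_mean(scores, [1, 2, 3])  # larger weights on larger values
heavy_low = weighted_mean(scores, [3, 2, 1])   # larger weights on smaller values

print("Equal weights:        ", equal)
print("Weights favour high:  ", round(heavy_high, 2))
print("Weights favour low:   ", round(heavy_low, 2))
```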


Use Cases of Weighted Mean

  • Calculating average grades (where different subjects have different credit weights).

  • Measuring cost of living index (where items like rent, food, transport have different importance).

  • Financial portfolio returns (where each asset has a different investment weight).

Example: Find the simple and weighted arithmetic means of the first $n$ natural numbers, the weights being the corresponding numbers.

Solution:

Let the first $n$ natural numbers be:

$$1, 2, 3, \dots, n$$

🔹 Simple Arithmetic Mean (A.M.):

The formula for the sum of the first $n$ natural numbers is:

$$\sum X = 1 + 2 + 3 + \dots + n = \frac{n(n+1)}{2}$$

So, the simple arithmetic mean is:

$$\bar{X} = \frac{\sum X}{n} = \frac{1 + 2 + 3 + \dots + n}{n} = \frac{\frac{n(n+1)}{2}}{n} = \frac{n+1}{2}$$

🔹 Weighted Arithmetic Mean:

Here, the weights $W_i$ are equal to the values $X_i$ themselves.

So:

  • $W_i = X_i$

  • $W_i X_i = X_i^2$

We need:

$$\bar{X}_w = \frac{\sum W_i X_i}{\sum W_i} = \frac{\sum X_i^2}{\sum X_i}$$

We use the formulas:

  • $\sum X_i = 1 + 2 + \dots + n = \frac{n(n+1)}{2}$

  • $\sum X_i^2 = 1^2 + 2^2 + \dots + n^2 = \frac{n(n+1)(2n+1)}{6}$

Substituting:

$$\bar{X}_w = \frac{\frac{n(n+1)(2n+1)}{6}}{\frac{n(n+1)}{2}} = \frac{2n+1}{3}$$

Final Answer:

  • Simple Arithmetic Mean = $\frac{n+1}{2}$

  • Weighted Arithmetic Mean = $\frac{2n+1}{3}$
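Both closed forms can be spot-checked by brute force for a few values of $n$. The sketch below uses exact fractions; the function name is illustrative.

```python
from fractions import Fraction

def simple_and_weighted(n):
    """Simple and self-weighted means of 1..n, computed by brute force."""
    xs = range(1, n + 1)
    simple = Fraction(sum(xs), n)
    weighted = Fraction(sum(x * x for x in xs), sum(xs))
    return simple, weighted

for n in (1, 5, 10, 100):
    simple, weighted = simple_and_weighted(n)
    # Compare against the closed forms (n + 1) / 2 and (2n + 1) / 3
    assert simple == Fraction(n + 1, 2)
    assert weighted == Fraction(2 * n + 1, 3)
    print(n, simple, weighted)
```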


Median

The median of a distribution is the value that divides it into two equal parts. That is:

Half the observations lie below the median.
Half lie above the median.

Hence, median is a positional average (not affected much by extreme values).


🔹 1. Ungrouped Data (Raw Data)

  • Odd number of observations:
    Median = the middle value after sorting the data.

  • Even number of observations:
    Median = average of the two middle values.

📌 Example:

  • Data: 25, 20, 15, 35, 18
    Sorted: 15, 18, 20, 25, 35 → Median = 20

  • Data: 8, 20, 50, 25, 15, 30
    Sorted: 8, 15, 20, 25, 30, 50
    Median = $\frac{20 + 25}{2} = 22.5$

📝 Remark: For even-numbered datasets, any value between the two middle values can technically be used as the median, but by convention we use their average.

Median Formula (for Grouped/Continuous Frequency Data):

$$\text{Median} = l + \left( \frac{\frac{N}{2} - F}{f} \right) \cdot h$$

Where:

Symbol    Meaning
$l$       Lower boundary of the median class
$N$       Total frequency
$F$       Cumulative frequency before the median class
$f$       Frequency of the median class
$h$       Width (class size) of the median class

✍️ Interpretation:

  • $\frac{N}{2}$ tells you where the median lies in the cumulative frequency table.

  • Find the class where this value falls → that’s the median class.

  • Plug the values into the formula to get the median.

🧮 Example:

Find the median wage of the following distribution:

Wages (in Rs.)      20–30    30–40    40–50    50–60    60–70
No. of labourers    3        5        20       10       5

Step 1: Find cumulative frequencies (cf)

Wages (in Rs.)    Frequency (f)    Cumulative Frequency (cf)
20–30             3                3
30–40             5                8
40–50             20               28
50–60             10               38
60–70             5                43

🔍 Step 2: Identify median class

  • Total number of labourers: $N = 43$

  • $\frac{N}{2} = \frac{43}{2} = 21.5$

Find the class whose cumulative frequency just exceeds 21.5 → it is 40–50, with cf = 28.

So, the median class is: 40–50


🔢 Step 3: Apply the Median formula

$$\text{Median} = l + \left( \frac{\frac{N}{2} - F}{f} \right) \cdot h$$

Where:

  • $l = 40$ (lower limit of the median class)

  • $N = 43$

  • $F = 8$ (cumulative frequency before the median class)

  • $f = 20$ (frequency of the median class)

  • $h = 10$ (class width)


$$\text{Median} = 40 + \left( \frac{21.5 - 8}{20} \right) \cdot 10 = 40 + \left( \frac{13.5}{20} \right) \cdot 10 = 40 + 6.75 = 46.75$$

✅ Final Answer:

Median wage = Rs. 46.75
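The three steps (cumulative frequencies, median class, formula) can be sketched as one function. The name `grouped_median` and the representation of classes as `(lower, upper)` pairs are assumptions made for this illustration.

```python
def grouped_median(classes, freqs):
    """Median of grouped data: l + (N/2 - F) / f * h."""
    half = sum(freqs) / 2
    cum_before = 0  # F: cumulative frequency before the current class
    for (lower, upper), f in zip(classes, freqs):
        if cum_before + f > half:  # first class whose cf exceeds N/2
            return lower + (half - cum_before) * (upper - lower) / f
        cum_before += f
    raise ValueError("frequencies must contain at least one positive count")

wage_classes = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
labourers = [3, 5, 20, 10, 5]

median_wage = grouped_median(wage_classes, labourers)
print("Median wage:", median_wage)
```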

Merits of Median

  1. Rigorously Defined:
    Median is clearly and unambiguously defined. It has a specific position in the dataset.

  2. Easy to Understand and Calculate:
    Especially with sorted data, the median can often be found simply by inspection.

  3. Unaffected by Extreme Values (Outliers):
    Unlike the mean, the median is not influenced by unusually high or low values.

  4. Applicable to Open-Ended Distributions:
    Median can be computed even when the distribution has open-ended intervals like "below 10" or "above 100".


Demerits of Median

  1. Not Exact for Even Number of Observations:

    For even-sized datasets, the median is estimated as the average of the two middle values, which may not reflect an actual data point.

  2. Ignores Most Data Points:

    Median only considers the middle position(s); values far from the center do not affect it. For example:

    • Median of {10, 25, 50, 60, 65} is 50.

    • Even if 10 and 25 are changed to 1 and 20 or 60 and 65 are changed to 70 and 80, the median remains 50.

  3. Not Suitable for Algebraic Treatment:

    Median does not lend itself to further statistical operations like mean does (e.g., finding combined medians is not straightforward).

  4. Affected by Sampling Fluctuations:

    Median can vary significantly between samples compared to the mean when samples are small or variable.


📘 Uses of Median

  1. For Qualitative Data:

    Useful when data is ranked but not measurable (e.g., intelligence levels, honesty ratings).

  2. In Income and Wealth Distribution:

    Commonly used to represent central tendency when dealing with wages or wealth, where data is often skewed.

Mode

Mode is the value in a dataset that occurs most frequently. It represents the most typical or common value around which other values tend to cluster.


🔍 Examples of Mode in Real Life

  1. The average height of an Indian male is 5'-6"
    → This refers to the most common height, i.e., mode.

  2. The average shoe size sold in a shop is 7

    → Shoe size 7 is sold most frequently → Mode = 7

  3. An average student spends Rs. 150 per month in a hostel
    → Rs. 150 is the most commonly occurring monthly expenditure → Mode = Rs. 150


📊 Example: Discrete Frequency Distribution

x (Value)    1    2    3     4     5     6     7    8
f (Freq.)    4    9    16    25    22    15    7    3
  • Here, the maximum frequency is 25, which corresponds to x = 4.

  • So, Mode = 4
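For a discrete frequency table like this one, the mode can be read off programmatically. The sketch below assumes the last two frequencies are 7 and 3, and it returns every value tied for the highest frequency, so a bimodal table would yield two values.

```python
x = [1, 2, 3, 4, 5, 6, 7, 8]
f = [4, 9, 16, 25, 22, 15, 7, 3]  # last two entries read as 7 and 3 (assumption)

max_f = max(f)
# Collect all values whose frequency equals the maximum
modes = [xi for xi, fi in zip(x, f) if fi == max_f]

print("Highest frequency:", max_f)
print("Mode(s):", modes)
```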


⚠️ Special Cases Where Mode is Not Easily Identified

  1. Repeated Maximum Frequencies

    • If more than one value has the same highest frequency, the distribution is bimodal or multimodal.

  2. Maximum Frequency at the Beginning or End

    • If the highest frequency is in the first or last class, mode may not give a good central value.

  3. Irregular Frequency Distribution

    • If the data fluctuates significantly or has no clear peak, mode may be misleading or undefined.

📌 Mode Formula for Continuous Frequency Distribution

$$\text{Mode} = l + \left( \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \right) \times h$$

🧩 Where:

  • $l$ = lower boundary of the modal class

  • $h$ = class width (class interval size)

  • $f_1$ = frequency of the modal class

  • $f_0$ = frequency of the class before the modal class

  • $f_2$ = frequency of the class after the modal class


Steps to Find the Mode

  1. Identify the modal class (class with the highest frequency).

  2. Plug values into the formula:

    $$\text{Mode} = l + \left( \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \right) \times h$$

📊 Example:

Class Interval    Frequency
10–20             5
20–30             8
30–40             12
40–50             20 ← Modal Class (highest frequency)
50–60             10
60–70             5

Here:

  • Modal class = 40–50

  • $l = 40$

  • $h = 10$

  • $f_1 = 20$

  • $f_0 = 12$

  • $f_2 = 10$


🧮 Substitute in formula:

$$\text{Mode} = 40 + \left( \frac{20 - 12}{2 \times 20 - 12 - 10} \right) \times 10 = 40 + \left( \frac{8}{18} \right) \times 10 \approx 40 + 4.44 = 44.44$$

🎯 Final Answer:

Mode = 44.44

📊Example:

Find the mode for the following distribution:

Class Interval    Frequency (f)
0–10              5
10–20             8
20–30             7
30–40             12
40–50             28 ← Modal class (highest frequency)
50–60             20
60–70             10
70–80             10

🧩 Mode Formula (for grouped data):

$$\text{Mode} = l + \left( \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \right) \times h$$

🧮 Substitute the values:

  • $l = 40$ (lower boundary of modal class)

  • $h = 10$ (class width)

  • $f_1 = 28$ (frequency of modal class)

  • $f_0 = 12$ (frequency before modal class)

  • $f_2 = 20$ (frequency after modal class)

$$\text{Mode} = 40 + \left( \frac{28 - 12}{2 \times 28 - 12 - 20} \right) \times 10 = 40 + \left( \frac{16}{24} \right) \times 10 \approx 40 + 6.67 = 46.67$$

Final Answer:

Mode ≈ 46.67
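Both grouped-mode examples can be reproduced with a short helper. This is a sketch under two assumptions that the text does not state: the modal class is simply the first class with the highest frequency, and a missing neighbouring class (when the modal class is first or last) is treated as having frequency 0. The name `grouped_mode` is illustrative.

```python
def grouped_mode(classes, freqs):
    """Mode of grouped data: l + (f1 - f0) / (2*f1 - f0 - f2) * h."""
    i = freqs.index(max(freqs))  # first class with the highest frequency
    lower, upper = classes[i]
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0               # assumption: 0 if no class before
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0  # assumption: 0 if no class after
    return lower + (f1 - f0) * (upper - lower) / (2 * f1 - f0 - f2)

classes_a = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
freqs_a = [5, 8, 12, 20, 10, 5]

classes_b = [(0, 10), (10, 20), (20, 30), (30, 40),
             (40, 50), (50, 60), (60, 70), (70, 80)]
freqs_b = [5, 8, 7, 12, 28, 20, 10, 10]

mode_a = grouped_mode(classes_a, freqs_a)
mode_b = grouped_mode(classes_b, freqs_b)
print("First example: ", round(mode_a, 2))
print("Second example:", round(mode_b, 2))
```

The helper does not handle a flat peak where $f_1 = f_0 = f_2$, since the denominator is then zero. Such cases call for the grouping method rather than the formula.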

📌 Summary of Remarks on Mode:

1. Handling Irregular Distributions:

  • If:

    • The maximum frequency is repeated, or

    • It occurs at the beginning or end of the table, or

    • There are irregular fluctuations in frequencies,

    → Then the modal class is not obvious.

  • 🔍 In such cases, we use a Grouping Table and an Analysis Table to identify the modal class, then apply the mode formula:

$$\text{Mode} = l + \left( \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \right) \times h$$

2. Estimating Mode Using Mean and Median:

When the distribution is moderately skewed (asymmetrical, but not extremely so), we can estimate the mode using Karl Pearson’s empirical formula:

📐 Karl Pearson's Empirical Formula:

$$\text{Mode} = 3 \times \text{Median} - 2 \times \text{Mean}$$

Or rearranged:

$$\text{Mean} - \text{Median} = \frac{1}{3}(\text{Mean} - \text{Mode})$$

📎 When to Use This Empirical Formula:

  • When mode cannot be determined accurately from the frequency distribution.

  • When the data is skewed, and only mean and median are known.

  • When you want a quick estimate of the mode without constructing the full distribution.
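As a sketch, the empirical relation is a one-line function. The mean and median used below are hypothetical values chosen for illustration, and the result is only an estimate that is reasonable for moderately skewed data.

```python
def empirical_mode(mean, median):
    """Karl Pearson's empirical estimate: Mode = 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean

# Hypothetical summary statistics of a positively skewed distribution
mean_value, median_value = 50, 48
mode_estimate = empirical_mode(mean_value, median_value)

print("Estimated mode:", mode_estimate)
# The rearranged form holds as well: Mean - Median = (Mean - Mode) / 3
print(mean_value - median_value, (mean_value - mode_estimate) / 3)
```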


🧠 Tip to Remember:

In symmetric distributions:

$$\text{Mean} = \text{Median} = \text{Mode}$$

In positively skewed distributions:

$$\text{Mean} > \text{Median} > \text{Mode}$$

In negatively skewed distributions:

$$\text{Mode} > \text{Median} > \text{Mean}$$


Merits of Mode

  1. Easy to Understand and Calculate
    – Mode is simple to grasp and can often be found just by inspection, especially in small datasets.

  2. Not Affected by Extreme Values
    – Like median, outliers do not influence the mode.

  3. Applicable with Unequal or Open-ended Class Intervals
    – Mode can still be computed even with unequal class intervals, as long as the modal class and adjacent classes are of equal width.
    Open-ended classes don’t hinder mode calculation.


Demerits of Mode

  1. Ill-Defined in Some Cases
    – It’s not always unique:

    • Some datasets have no mode,

    • Some are bimodal (two modes), or

    • Multimodal (more than two modes).

  2. Does Not Use All Data
    – Mode is determined only by the most frequent value(s), ignoring the rest of the dataset.

  3. Not Suitable for Algebraic Treatment
    – Unlike the mean, mode cannot be used in most algebraic calculations or statistical formulas.

  4. Affected More by Sampling Fluctuations
    – Its value may vary significantly across different samples from the same population.


📌 Uses of Mode

  • Ideal for Finding the Most Common Size or Type in:

    • Business forecasting

    • Manufacturing standard sizes (e.g., shoes, garments)

    • Retail analysis (e.g., most sold item or size)

  • Useful when dealing with qualitative data, such as:

    • Most common brand preference

    • Most frequent customer rating

📘 Geometric Mean (G.M.)


The geometric mean of a set of values is the nth root of the product of all values.

🔹 Definition

If you have n observations:
$x_1, x_2, \ldots, x_n$
then the geometric mean $G$ is:

$$G = \left( x_1 \cdot x_2 \cdot \ldots \cdot x_n \right)^{1/n}$$

or in compact form:

$$G = \left( \prod_{i=1}^{n} x_i \right)^{1/n}$$

🔸 Using Logarithms (To Simplify Multiplication)

Taking logarithms of both sides:

$$\log G = \frac{1}{n} \sum_{i=1}^{n} \log x_i$$

Then,

$$G = \text{antilog} \left( \frac{1}{n} \sum_{i=1}^{n} \log x_i \right)$$

📊 Geometric Mean for Frequency Distribution

Given data points $x_1, x_2, \ldots, x_n$ with corresponding frequencies $f_1, f_2, \ldots, f_n$:

$$G = \left( x_1^{f_1} \cdot x_2^{f_2} \cdot \ldots \cdot x_n^{f_n} \right)^{1/N}$$

where $N = \sum f_i$

Taking logarithms:

$$\log G = \frac{1}{N} \sum_{i=1}^{n} f_i \log x_i$$

Then,

$$G = \text{antilog} \left( \frac{1}{N} \sum_{i=1}^{n} f_i \log x_i \right)$$

🔹 For Grouped (Continuous) Frequency Distributions:

  • $x_i$ is the midpoint of the $i$-th class interval.

  • Use the same formula as above:

    $$\log G = \frac{1}{N} \sum f_i \log x_i \Rightarrow G = \text{antilog} \left( \frac{1}{N} \sum f_i \log x_i \right)$$
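The logarithmic form translates directly into code. The sketch below uses natural logarithms, with `math.exp` playing the role of the antilog; any base gives the same result. The data are hypothetical and the function name is illustrative.

```python
import math

def geometric_mean_freq(values, freqs):
    """Geometric mean of a frequency distribution via logarithms."""
    n_total = sum(freqs)
    # (1/N) * sum of f_i * log(x_i); requires every x_i > 0
    mean_log = sum(f * math.log(x) for x, f in zip(values, freqs)) / n_total
    return math.exp(mean_log)  # antilog

# Hypothetical data: 2 once, 4 twice, 8 once -> (2 * 4**2 * 8) ** (1/4) = 4
gm_freq = geometric_mean_freq([2, 4, 8], [1, 2, 1])
print("Geometric mean:", round(gm_freq, 6))
```

For grouped data, the same function applies with the class midpoints passed as `values`.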

🟢 Key Notes:

  • Geometric mean is more appropriate than the arithmetic mean when averaging rates, percentages, and ratios that combine multiplicatively.

  • It is only defined for positive values.

  • Useful when values vary multiplicatively (e.g., growth rates).

Merits of Geometric Mean

    1. Rigidly Defined

      • The geometric mean has a clear and definite formula. Unlike mode (which can be ambiguous), G.M. is always mathematically well-defined (if all values are positive).

    2. Based on All Observations

      • It takes every value in the dataset into account, giving a more accurate picture of central tendency than mode or sometimes median.

    3. Suitable for Further Mathematical Treatment

      • The geometric mean allows for algebraic manipulation. For example, for two groups with sizes $n_1$, $n_2$ and geometric means $G_1$, $G_2$, the combined geometric mean $G$ is given by:

        $$\log G = \frac{n_1 \log G_1 + n_2 \log G_2}{n_1 + n_2}$$

      • This extends naturally to more than two groups.

    4. Stable Against Sampling Fluctuations

      • Less affected by random sample variations compared to the mode.

    5. More Weight to Smaller Values

      • Unlike the arithmetic mean, the geometric mean balances the influence of high and low values by naturally dampening the effect of large outliers.


    Demerits of Geometric Mean

    1. Difficult to Understand and Calculate

      • The use of logarithms and roots makes it less intuitive for people without a mathematical background.

    2. Sensitive to Zero or Negative Values

      • If any value is zero, the geometric mean becomes zero.

      • If any value is negative, the geometric mean may be imaginary or otherwise meaningless, even if the other values are positive.


    📌 Uses of Geometric Mean

    1. Growth Rates

      • Widely used in calculating:

        • Population growth rates

        • Interest rates

        • Compounded returns

    2. Index Numbers

      • Essential in constructing index numbers like price index, consumer index, etc., where multiplicative effects are important.
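To illustrate the growth-rate use case, here is a small sketch with hypothetical yearly returns (not from the text). Each return is first converted to a growth factor so that every value is positive; `statistics.geometric_mean` assumes Python 3.8 or later.

```python
import statistics

# Hypothetical yearly returns: +10%, +20%, -5% (not from the text)
returns = [0.10, 0.20, -0.05]

# Convert to growth factors so every value is positive
factors = [1 + r for r in returns]

avg_factor = statistics.geometric_mean(factors)
avg_growth = avg_factor - 1

print("Average growth rate per year: {:.2%}".format(avg_growth))
print("Arithmetic mean of returns:   {:.2%}".format(statistics.mean(returns)))

# Compounding the average factor reproduces the actual total growth
total_actual = 1.0
for f in factors:
    total_actual *= f
total_from_avg = avg_factor ** len(factors)
```

The arithmetic mean of the returns overstates the compounded growth, which is why the geometric mean is preferred here.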

📘 Harmonic Mean (H.M.)
🔹 Definition:

  • The Harmonic Mean is defined as the reciprocal of the arithmetic mean of the reciprocals of a given set of values.

It is best used when dealing with rates (like speed, efficiency, etc.).


🔹 Formula (for individual observations):

If there are n observations:

$$x_1, x_2, x_3, \ldots, x_n$$

Then, the harmonic mean $H$ is:

$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$


🔹 Formula (for a frequency distribution):

If the data is grouped as $x_i$ with corresponding frequencies $f_i$, for $i = 1, 2, \ldots, n$, and $N = \sum f_i$, then:

$$H = \frac{N}{\sum_{i=1}^{n} \frac{f_i}{x_i}}$$


📌 Example:

Let’s say we have the following frequency distribution:

| Value (xᵢ) | Frequency (fᵢ) |
|---|---|
| 2 | 3 |
| 4 | 5 |
| 5 | 2 |

Total frequency, $N = 3 + 5 + 2 = 10$

Now compute:

$$\sum \frac{f_i}{x_i} = \frac{3}{2} + \frac{5}{4} + \frac{2}{5} = 1.5 + 1.25 + 0.4 = 3.15$$

So, Harmonic Mean:

$$H = \frac{10}{3.15} \approx 3.17$$
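The worked example above can be reproduced with a few lines of Python. This is a minimal sketch; the helper name `harmonic_mean_freq` is an illustrative choice, not a library function.

```python
def harmonic_mean_freq(values, freqs):
    """Harmonic mean of a frequency distribution: N / sum(f_i / x_i)."""
    if any(x <= 0 for x in values):
        raise ValueError("harmonic mean requires positive values")
    n_total = sum(freqs)                                   # N
    reciprocal_sum = sum(f / x for x, f in zip(values, freqs))
    return n_total / reciprocal_sum

# Data from the example above
values = [2, 4, 5]
freqs = [3, 5, 2]

hm = harmonic_mean_freq(values, freqs)
print("Harmonic Mean:", round(hm, 2))
```

Output:

```
Harmonic Mean: 3.17
```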

📘 Merits and Demerits of Harmonic Mean

Merits:

  1. Rigidly Defined:

    • Harmonic mean is a well-defined mathematical measure with no ambiguity.

  2. Based on All Observations:

    • It considers every item in the dataset, making it a representative central tendency measure.

  3. Suitable for Further Mathematical Treatment:

    • It can be algebraically manipulated and combined with other mathematical computations, like in combined harmonic mean calculations.

  4. Less Affected by Sampling Fluctuations:

    • Like geometric mean, it shows stability across different samples.

  5. Gives Greater Weight to Smaller Items:

    • This property is especially useful when smaller values are more significant, e.g., in averaging speeds, rates, or prices.


Demerits:

  1. Difficult to Understand and Compute:

    • Especially for non-mathematical users, the concept and calculation can be abstract and unintuitive.

  2. Cannot Be Used if Any Observation is Zero:

    • If even a single value is zero, the harmonic mean becomes undefined (division by zero).

  3. Not Suitable for Data with Negative Values:

    • A mix of positive and negative values can lead to incorrect or misleading results.

  4. Limited Practical Use:

    • Applicable only in specific scenarios such as average rates, speed, or other ratio-based data.


📌 When to Use:

  • Best suited for calculating:

    • Average speed

    • Average rates

    • Cost per unit

    • Efficiency problems


🚴 Cyclist Problem – Explained

A cyclist pedals from his house to his college at a speed of 10 m.p.h. and back from the college to his house at 15 m.p.h. Find the average speed.

Given:

  • Speed from house to college = 10 mph

  • Speed from college to house = 15 mph

  • Distance (one way) = x miles


Solution:

Time taken:

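  • Time from house to college = $\frac{x}{10}$ hours

  • Time from college to house = $\frac{x}{15}$ hours

  • Total distance = $2x$ miles

  • Total time = $\frac{x}{10} + \frac{x}{15} = \frac{15x + 10x}{150} = \frac{25x}{150} = \frac{x}{6}$ hours
     (you can also use the LCM, 30, directly: $\frac{3x + 2x}{30} = \frac{x}{6}$)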

So,

$$\text{Average speed} = \frac{\text{Total distance}}{\text{Total time}} = \frac{2x}{x/6} = 12 \text{ mph}$$


In this case, the average speed is given by the harmonic mean of 10 and 15, not by the arithmetic mean (12.5 mph).
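A quick check of the cyclist problem in Python, as a sketch. The one-way distance `x = 30` is a hypothetical value; any positive distance gives the same average speed.

```python
import statistics

speeds = [10, 15]  # mph, with the same distance covered at each speed

hm = statistics.harmonic_mean(speeds)
am = statistics.mean(speeds)

print("Harmonic mean (average speed):", round(hm, 2))
print("Arithmetic mean:", am)

# Cross-check with total distance / total time
x = 30  # miles one way; hypothetical value
avg_speed = 2 * x / (x / 10 + x / 15)
print("Total distance / total time:", avg_speed)
```

The harmonic mean matches total distance divided by total time, while the arithmetic mean is higher.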

📘 Weighted Harmonic Mean: Formula

When a person travels different distances $S_1, S_2, \ldots, S_n$ at different speeds $V_1, V_2, \ldots, V_n$, the average speed is given by:

$$\text{Average Speed} = \frac{\sum S_i}{\sum \left(\frac{S_i}{V_i}\right)}$$

This is the Weighted Harmonic Mean, where:

  • $S_i$ = distance covered

  • $V_i$ = corresponding speed


✍️ Example :

You are planning a trip that includes travel using different modes of transport with the following details:

  • Train: 900 km at an average speed of 60 km/h

  • Boat: 3000 km at an average speed of 25 km/h

  • Plane: 400 km at an average speed of 350 km/h

  • Taxi: 15 km at an average speed of 25 km/h

Question:
What is your average speed for the entire journey?

✈️ Trip Details:

| Mode | Distance (km) | Speed (km/h) |
|---|---|---|
| Train | 900 | 60 |
| Boat | 3000 | 25 |
| Plane | 400 | 350 |
| Taxi | 15 | 25 |


✅ Step-by-step Calculation:

We compute the total time taken for each segment using:

$$\text{Time} = \frac{\text{Distance}}{\text{Speed}}$$

Then apply the WHM formula:

1. Time for each segment:

| Segment | Distance $S_i$ (km) | Speed $V_i$ (km/h) | Time $\frac{S_i}{V_i}$ |
|---|---|---|---|
| Train | 900 | 60 | 15.00 hrs |
| Boat | 3000 | 25 | 120.00 hrs |
| Plane | 400 | 350 | 1.14 hrs (approx) |
| Taxi | 15 | 25 | 0.60 hrs |

2. Total distance and total time:

Total Distance = 900 + 3000 + 400 + 15 = 4315 km

Total Time = 15 + 120 + 1.14 + 0.60 = 136.74 hrs (approx)

📌 Final Calculation:

Average Speed = 4315 / 136.74 ≈ 31.56 km/h
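The same trip calculation can be sketched in Python. The helper name `weighted_harmonic_mean` is an illustrative choice; it simply divides total distance by total time, as in the formula above.

```python
def weighted_harmonic_mean(distances, speeds):
    """Average speed = total distance / total time (distances act as weights)."""
    total_distance = sum(distances)
    total_time = sum(s / v for s, v in zip(distances, speeds))
    return total_distance / total_time

# Trip segments from the example: train, boat, plane, taxi
distances = [900, 3000, 400, 15]  # km
speeds = [60, 25, 350, 25]        # km/h

avg_speed = weighted_harmonic_mean(distances, speeds)
print("Average speed:", round(avg_speed, 2), "km/h")
```

Because the boat leg dominates the total travel time, the overall average stays close to the boat's 25 km/h despite the fast plane segment.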

Partition Values

Partition values are the values that divide a series (or dataset) into equal parts.

Types of Partition Values

  1. Quartiles

    Quartiles divide the data into four equal parts:

    • Q₁ (First Quartile): 25% of observations lie below it, and 75% lie above.

    • Q₂ (Second Quartile): It is the Median; 50% of observations lie below and 50% above.

    • Q₃ (Third Quartile): 75% of observations lie below it, and 25% lie above.

  2. Deciles

    Deciles divide the data into ten equal parts:

    • Notation: D₁, D₂, ..., D₉

    • Example: D₇ (Seventh Decile) means 70% of the observations lie below it, and 30% above.

  3. Percentiles

    Percentiles divide the data into 100 equal parts:

    • Notation: P₁, P₂, ..., P₉₉

    • Example: P₄₇ (47th Percentile) is the value below which 47% of the observations lie.
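For raw (ungrouped) data, Python's `statistics.quantiles` (Python 3.8 or later) returns the cut points directly. The sketch below uses hypothetical data, not taken from the text. Note that this function interpolates between data points by default, so its results can differ from the cumulative-frequency method used for frequency distributions.

```python
import statistics

# Hypothetical raw data (not from the text)
data = [12, 15, 17, 20, 22, 25, 28, 30, 33, 36, 40]

quartiles = statistics.quantiles(data, n=4)      # 3 cut points: Q1, Q2, Q3
deciles = statistics.quantiles(data, n=10)       # 9 cut points: D1 to D9
percentiles = statistics.quantiles(data, n=100)  # 99 cut points: P1 to P99

print("Q1, Q2, Q3:", quartiles)
print("D7:", deciles[6])        # index 6 holds the 7th decile
print("P47:", percentiles[46])  # index 46 holds the 47th percentile
```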


Note on Calculation

The methods used to calculate quartiles, deciles, and percentiles are similar to the one used for calculating the median, whether the distribution is:

  • Discrete (list of values with frequencies), or

  • Continuous (grouped frequency distribution).

Example

Eight coins were tossed together, and the number of heads resulting from each toss was recorded. This experiment was repeated 256 times. The following frequency distribution table shows how many times each possible number of heads (from 0 to 8) occurred:

| Number of Heads (x) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| Frequency (f) | 1 | 9 | 26 | 59 | 72 | 52 | 29 | 7 | 1 |

Tasks:

Calculate the following statistical measures based on the data provided:

  1. Median

  2. First Quartile (Q₁)

  3. Third Quartile (Q₃)

  4. Fourth Decile (D₄)

  5. 27th Percentile (P₂₇)

Given Data:

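| Number of Heads (x) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| Frequency (f) | 1 | 9 | 26 | 59 | 72 | 52 | 29 | 7 | 1 |
| Cumulative Frequency (cf) | 1 | 10 | 36 | 95 | 167 | 219 | 248 | 255 | 256 |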

Total number of observations (N) = 256


1. Median

  • $\frac{N}{2} = \frac{256}{2} = 128$

  • The cumulative frequency just greater than 128 is 167, which corresponds to x = 4.

Median = 4


2. First Quartile (Q₁)

  • $\frac{N}{4} = \frac{256}{4} = 64$

  • The cumulative frequency just greater than 64 is 95, which corresponds to x = 3.

Q₁ = 3


3. Third Quartile (Q₃)

  • $\frac{3N}{4} = \frac{3 \times 256}{4} = 192$

  • The cumulative frequency just greater than 192 is 219, which corresponds to x = 5.

Q₃ = 5


4. 4th Decile (D₄)

  • $\frac{4N}{10} = \frac{4 \times 256}{10} = 102.4$

  • The cumulative frequency just greater than 102.4 is 167, which corresponds to x = 4.

D₄ = 4


5. 27th Percentile (P₂₇)

  • $\frac{27N}{100} = \frac{27 \times 256}{100} = 69.12$

  • The cumulative frequency just greater than 69.12 is 95, which corresponds to x = 3.

P₂₇ = 3
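The five results above can be reproduced with a short sketch that follows the same rule: find the first value whose cumulative frequency is just greater than the required fraction of N. The helper name `partition_value` is an illustrative choice, not a library function.

```python
from itertools import accumulate

def partition_value(values, freqs, fraction):
    """Return the first x whose cumulative frequency exceeds fraction * N.

    Assumes values are sorted in ascending order and 0 < fraction < 1.
    """
    cum_freqs = list(accumulate(freqs))
    target = fraction * cum_freqs[-1]  # e.g. N/2 for the median
    for x, cf in zip(values, cum_freqs):
        if cf > target:
            return x
    return values[-1]

# Coin-toss data from the example
heads = [0, 1, 2, 3, 4, 5, 6, 7, 8]
freqs = [1, 9, 26, 59, 72, 52, 29, 7, 1]

median = partition_value(heads, freqs, 1 / 2)
q1 = partition_value(heads, freqs, 1 / 4)
q3 = partition_value(heads, freqs, 3 / 4)
d4 = partition_value(heads, freqs, 4 / 10)
p27 = partition_value(heads, freqs, 27 / 100)

print("Median:", median)
print("Q1:", q1, "Q3:", q3)
print("D4:", d4, "P27:", p27)
```

This sketch does not handle the special case where the target equals a cumulative frequency exactly, for which some textbooks average two adjacent values.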

