Class 10 Maths


Statistics

Introduction

Mean of Grouped Data

The mean (or average) of observations is the sum of the values of all the observations divided by the total number of observations.

If x1, x2, . . ., xn are observations with respective frequencies f1, f2, . . ., fn, then this means observation x1 occurs f1 times, x2 occurs f2 times, and so on.

Now, the sum of the values of all the observations = f1x1 + f2x2 + . . . + fnxn,

and the total number of observations = f1 + f2 + . . . + fn.

So, the mean `bar x` of the data is given by

`bar x=(f_1x_1+f_2x_2+…+f_nx_n)/(f_1+f_2+…+f_n)`

Or, `bar x=(Σ_(i=1)^n f_ix_i)/(Σ_(i=1)^n f_i)`

Example: The marks obtained by 30 students of Class X of a certain school in a Mathematics paper, consisting of 100 marks are presented in table below. Find the mean of the marks obtained by the students.

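| Marks obtained (xi) | 10 | 20 | 36 | 40 | 50 | 56 | 60 | 70 | 72 | 80 | 88 | 92 | 95 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of students (fi) | 1 | 1 | 3 | 4 | 3 | 2 | 4 | 4 | 1 | 1 | 2 | 3 | 1 |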

Solution: To find the mean marks, we require the product of each xi with the corresponding frequency fi.

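| Marks obtained (xi) | Number of students (fi) | fixi |
|---|---|---|
| 10 | 1 | 10 |
| 20 | 1 | 20 |
| 36 | 3 | 108 |
| 40 | 4 | 160 |
| 50 | 3 | 150 |
| 56 | 2 | 112 |
| 60 | 4 | 240 |
| 70 | 4 | 280 |
| 72 | 1 | 72 |
| 80 | 1 | 80 |
| 88 | 2 | 176 |
| 92 | 3 | 276 |
| 95 | 1 | 95 |
| Total | Σ fi = 30 | Σ fixi = 1779 |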

`bar x=(Σf_ix_i)/(Σf_i)`

`=1779/30=59.3`
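The calculation above can also be checked with a few lines of Python. This is only a minimal sketch: the names `marks`, `students` and `frequency_mean` are chosen here for illustration and do not come from the text.

```python
# Marks (x_i) and number of students (f_i) from the table above.
marks = [10, 20, 36, 40, 50, 56, 60, 70, 72, 80, 88, 92, 95]
students = [1, 1, 3, 4, 3, 2, 4, 4, 1, 1, 2, 3, 1]


def frequency_mean(values, frequencies):
    """Return (sum of f_i * x_i) / (sum of f_i)."""
    sum_fx = sum(f * x for x, f in zip(values, frequencies))
    total = sum(frequencies)
    return sum_fx / total


mean_marks = frequency_mean(marks, students)
print(sum(students))         # 30
print(round(mean_marks, 1))  # 59.3
```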

Direct Method of Finding Mean:

In most of our real life situations, data is usually so large that to make a meaningful study it needs to be condensed as grouped data. So, we need to convert given ungrouped data into grouped data and devise some method to find its mean.

Let us convert the ungrouped data of the example above into grouped data by forming class-intervals of width, say, 15. Remember that, while allocating frequencies to each class-interval, students whose marks equal an upper class-limit are counted in the next class, e.g., the 4 students who obtained 40 marks are counted in the class-interval 40-55 and not in 25-40. With this convention in mind, let us form a grouped frequency distribution table.

| Class Intervals | 10-25 | 25-40 | 40-55 | 55-70 | 70-85 | 85-100 |
|---|---|---|---|---|---|---|
| Number of students | 2 | 3 | 7 | 6 | 6 | 6 |
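The grouping convention described above (a value equal to an upper class-limit goes into the next class) can be sketched in Python as follows. The function name `group_into_classes` is hypothetical, and treating the last class as closed at 100 is an assumption made here so that full marks would still be counted; the text itself does not discuss that case.

```python
marks = [10, 20, 36, 40, 50, 56, 60, 70, 72, 80, 88, 92, 95]
students = [1, 1, 3, 4, 3, 2, 4, 4, 1, 1, 2, 3, 1]


def group_into_classes(values, frequencies, start, width, n_classes):
    """Count observations per class of equal width, beginning at `start`.

    A value equal to an upper class limit is counted in the next class.
    """
    counts = [0] * n_classes
    for x, f in zip(values, frequencies):
        if x < start:
            raise ValueError("value lies below the first class")
        index = int((x - start) // width)
        # Assumption: the last class is closed at its upper limit,
        # so a value such as 100 is kept in 85-100.
        index = min(index, n_classes - 1)
        counts[index] += f
    return counts


class_counts = group_into_classes(marks, students, start=10, width=15, n_classes=6)
print(class_counts)  # [2, 3, 7, 6, 6, 6]
```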

Now, for each class-interval, we require a point which would serve as the representative of the whole class. It is assumed that the frequency of each class-interval is centred around its mid-point. So the mid-point (or class mark) of each class can be chosen to represent the observations falling in the class.

Class Mark = (Upper class limit + Lower class limit)/2

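| Class Interval | Number of students (fi) | Class Mark (xi) | fixi |
|---|---|---|---|
| 10-25 | 2 | 17.5 | 35.0 |
| 25-40 | 3 | 32.5 | 97.5 |
| 40-55 | 7 | 47.5 | 332.5 |
| 55-70 | 6 | 62.5 | 375.0 |
| 70-85 | 6 | 77.5 | 465.0 |
| 85-100 | 6 | 92.5 | 555.0 |
| Total | Σ fi = 30 | | Σ fixi = 1860 |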

`bar x=(Σf_ix_i)/(Σf_i)`

`=1860/30=62`

This method of finding the mean is known as the Direct Method. Here, 59.3 is the exact mean, while 62 is an approximate mean.
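As a rough illustration of the Direct Method, the class marks and the mean can be computed in Python as below. The variable names are chosen here for the sketch and are not part of the original text.

```python
# Class intervals (lower limit, upper limit) and their frequencies.
classes = [(10, 25), (25, 40), (40, 55), (55, 70), (70, 85), (85, 100)]
frequencies = [2, 3, 7, 6, 6, 6]

# Class mark = (upper class limit + lower class limit) / 2
class_marks = [(lower + upper) / 2 for lower, upper in classes]

sum_fx = sum(f * x for f, x in zip(frequencies, class_marks))
direct_mean = sum_fx / sum(frequencies)

print(class_marks)  # [17.5, 32.5, 47.5, 62.5, 77.5, 92.5]
print(sum_fx)       # 1860.0
print(direct_mean)  # 62.0
```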

Assumed Mean Method:

Sometimes when the numerical values of xi and fi are large, finding the product of xi and fi becomes tedious and time consuming. So, for such situations, let us think of a method of reducing these calculations.

Nothing can be done with the fi’s, but each xi can be changed to a smaller number to make the calculations easier.

The first step is to choose one among the xi’s as the assumed mean, and denote it by ‘a’. Also, to further reduce our calculation work, we may take ‘a’ to be that xi which lies in the centre of x1, x2, . . ., xn. So, we can choose a = 47.5 or a = 62.5.

Let us choose a = 47.5.

The next step is to find the difference di between each of the xi’s and a, that is, the deviation of each xi from ‘a’.

i.e., di = xi – a = xi – 47.5

The third step is to find the product of di with the corresponding fi, and take the sum of all the fi di’s.

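| Class Interval | Number of Students (fi) | Class Mark (xi) | di = xi - a | fidi |
|---|---|---|---|---|
| 10-25 | 2 | 17.5 | -30 | -60 |
| 25-40 | 3 | 32.5 | -15 | -45 |
| 40-55 | 7 | 47.5 | 0 | 0 |
| 55-70 | 6 | 62.5 | 15 | 90 |
| 70-85 | 6 | 77.5 | 30 | 180 |
| 85-100 | 6 | 92.5 | 45 | 270 |
| Total | Σ fi = 30 | | | Σ fidi = 435 |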

So, the mean of the deviations is

`bar d=(Σf_id_i)/(Σf_i)=435/30=14.5`

Since each di was obtained by subtracting a from xi, the mean `bar x` is obtained by adding a back to `bar d`:

`bar x = a + bar d = 47.5 + 14.5 = 62`
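A short Python sketch of the Assumed Mean Method is given below. The function name `assumed_mean_method` is hypothetical. The second call uses a = 62.5, the other central class mark, to illustrate that the choice of a does not change the result.

```python
class_marks = [17.5, 32.5, 47.5, 62.5, 77.5, 92.5]
frequencies = [2, 3, 7, 6, 6, 6]


def assumed_mean_method(marks, freqs, a):
    """Return a + (sum of f_i * d_i) / (sum of f_i), where d_i = x_i - a."""
    deviations = [x - a for x in marks]
    sum_fd = sum(f * d for f, d in zip(freqs, deviations))
    return a + sum_fd / sum(freqs)


print(assumed_mean_method(class_marks, frequencies, a=47.5))  # 62.0
print(assumed_mean_method(class_marks, frequencies, a=62.5))  # 62.0
```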

Step Deviation Method of Finding Mean:

In the above method, all the values of di are multiples of 15, which is the class size h. So, to simplify the calculations further, let us divide each di by h = 15 to get smaller numbers ui = di/h, and work with fiui instead of fidi.

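| Class Interval | Number of Students (fi) | Class Mark (xi) | di = xi - a | ui = di/h | fiui |
|---|---|---|---|---|---|
| 10-25 | 2 | 17.5 | -30 | -2 | -4 |
| 25-40 | 3 | 32.5 | -15 | -1 | -3 |
| 40-55 | 7 | 47.5 | 0 | 0 | 0 |
| 55-70 | 6 | 62.5 | 15 | 1 | 6 |
| 70-85 | 6 | 77.5 | 30 | 2 | 12 |
| 85-100 | 6 | 92.5 | 45 | 3 | 18 |
| Total | Σ fi = 30 | | | | Σ fiui = 29 |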

`bar u=(Σf_iu_i)/(Σf_i)=29/30`

`bar x=a+h bar u`
`=47.5+15xx29/30=62`

(Since each ui was obtained by dividing di by the class size h, `bar d=h bar u`, and so `a+bar d=a+h bar u`.)
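The Step Deviation Method can be sketched in Python in the same way. The function name `step_deviation_mean` is hypothetical, and the sketch assumes all classes have the same size h.

```python
class_marks = [17.5, 32.5, 47.5, 62.5, 77.5, 92.5]
frequencies = [2, 3, 7, 6, 6, 6]


def step_deviation_mean(marks, freqs, a, h):
    """Return a + h * (sum of f_i * u_i) / (sum of f_i), where u_i = (x_i - a) / h."""
    steps = [(x - a) / h for x in marks]
    sum_fu = sum(f * u for f, u in zip(freqs, steps))
    return a + h * sum_fu / sum(freqs)


steps = [(x - 47.5) / 15 for x in class_marks]
print(steps)  # [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
print(step_deviation_mean(class_marks, frequencies, a=47.5, h=15))  # 62.0
```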

Important to Note:

Mode of Grouped Data

A mode is that value among the observations which occurs most often, that is, the value of the observation having the maximum frequency. It is possible that more than one value may have the same maximum frequency. In such situations, the data is said to be multimodal.

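For grouped data, the mode lies inside the modal class, that is, the class with the maximum frequency, and is given by

Mode `=l+((f_1-f_0)/(2f_1-f_0-f_2))xxh`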

where l = lower limit of the modal class,

h = size of the class interval (assuming all class sizes to be equal),

f1 = frequency of the modal class,

f0 = frequency of the class preceding the modal class,

f2 = frequency of the class succeeding the modal class.

Let us calculate the mode from the grouped frequency table above, where the modal class is 40-55 (frequency 7):

l = 40, h = 15, f1 = 7, f0 = 3, f2 = 6

Mode `=40+((7-3)/(2xx7-3-6))xx15=40+4/5xx15=52`
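The mode formula can be tried out with the Python sketch below. The function name `grouped_mode` is hypothetical. Two assumptions are made that the text does not state: if several classes share the maximum frequency, the first one is used, and a missing neighbouring class (when the modal class is first or last) is treated as having frequency 0.

```python
lower_limits = [10, 25, 40, 55, 70, 85]
frequencies = [2, 3, 7, 6, 6, 6]


def grouped_mode(lowers, freqs, h):
    """Return l + (f1 - f0) / (2*f1 - f0 - f2) * h for the modal class."""
    # Index of the modal class (first class with the maximum frequency).
    m = max(range(len(freqs)), key=lambda i: freqs[i])
    f1 = freqs[m]
    # Assumption: a missing neighbouring class counts as frequency 0.
    f0 = freqs[m - 1] if m > 0 else 0
    f2 = freqs[m + 1] if m < len(freqs) - 1 else 0
    return lowers[m] + (f1 - f0) * h / (2 * f1 - f0 - f2)


print(grouped_mode(lower_limits, frequencies, h=15))  # 52.0
```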

Median of Grouped Data

Median is a measure of central tendency which gives the value of the middle-most observation in the data. The first step in finding the median is to arrange the data in ascending order.

If the number of observations n is odd, the median is the value of the `((n+1)/2)`th observation.

If n is even, there are two middle observations, and

Median = (value of the `(n/2)`th observation + value of the `(n/2+1)`th observation)/2

That is, it is the average of the two middle observations.

Let us take data from the first table:

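| Marks obtained | Frequency | Cumulative frequency |
|---|---|---|
| 10 | 1 | 1 |
| 20 | 1 | 2 |
| 36 | 3 | 5 |
| 40 | 4 | 9 |
| 50 | 3 | 12 |
| 56 | 2 | 14 |
| 60 | 4 | 18 |
| 70 | 4 | 22 |
| 72 | 1 | 23 |
| 80 | 1 | 24 |
| 88 | 2 | 26 |
| 92 | 3 | 29 |
| 95 | 1 | 30 |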

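Here, the number of observations is 30, which is even.

So, the median is the average of the 15th and the 16th observations.

The cumulative frequency is 14 up to 56 marks and 18 up to 60 marks, so the 15th to the 18th observations are all 60.

15th observation = 60

16th observation = 60

Median = (60 + 60)/2 = 60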
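Locating the middle observations through the cumulative frequencies can be sketched in Python as below. The helper names `kth_observation` and `frequency_median` are hypothetical, and the sketch assumes the marks are already listed in ascending order.

```python
from itertools import accumulate

marks = [10, 20, 36, 40, 50, 56, 60, 70, 72, 80, 88, 92, 95]
students = [1, 1, 3, 4, 3, 2, 4, 4, 1, 1, 2, 3, 1]


def kth_observation(values, frequencies, k):
    """Return the k-th observation (counting from 1) of data sorted in ascending order."""
    for value, cumulative in zip(values, accumulate(frequencies)):
        # The first value whose cumulative frequency reaches k holds the k-th observation.
        if k <= cumulative:
            return value
    raise ValueError("k exceeds the number of observations")


def frequency_median(values, frequencies):
    """Return the median of a frequency distribution."""
    n = sum(frequencies)
    if n % 2 == 1:
        return kth_observation(values, frequencies, (n + 1) // 2)
    lower_middle = kth_observation(values, frequencies, n // 2)
    upper_middle = kth_observation(values, frequencies, n // 2 + 1)
    return (lower_middle + upper_middle) / 2


print(list(accumulate(students)))  # [1, 2, 5, 9, 12, 14, 18, 22, 23, 24, 26, 29, 30]
print(frequency_median(marks, students))
```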

The mean is the most frequently used measure of central tendency because it takes into account all the observations, and lies between the extremes, i.e., the largest and the smallest observations of the entire data. It also enables us to compare two or more distributions.

However, extreme values in the data affect the mean.

In problems where individual observations are not important, and we wish to find out a ‘typical’ observation, the median is more appropriate, e.g., finding the typical productivity rate of workers, average wage in a country, etc. These are situations where extreme values may be there. So, rather than the mean, we take the median as a better measure of central tendency.

In situations which require establishing the most frequent value or most popular item, the mode is the best choice, e.g., to find the most popular T.V. programme being watched.

Remarks:

There is an empirical relationship between the three measures of central tendency:

3 Median = Mode + 2 Mean