Statistics

Introduction

Mean of Grouped Data

The mean (or average) of a set of observations is the sum of the values of all the observations divided by the total number of observations.

If x1, x2, . . ., xn are observations with respective frequencies f1, f2, . . ., fn, then this means observation x1 occurs f1 times, x2 occurs f2 times, and so on.

Now, the sum of the values of all the observations = f1x1 + f2x2 + . . . + fnxn,

and the total number of observations = f1 + f2 + . . . + fn.

So, the mean `bar x` of the data is given by

`bar x=(f_1x_1+f_2x_2+…+f_nx_n)/(f_1+f_2+…+f_n)`

Or, `bar x=(Σ_(i=1)^n f_ix_i)/(Σ_(i=1)^n f_i)`

Example 1: The marks obtained by 30 students of Class X of a certain school in a Mathematics paper carrying 100 marks are presented in the table below. Find the mean of the marks obtained by the students.

Marks obtained xi	10	20	36	40	50	56	60	70	72	80	88	92	95
Number of students fi	1	1	3	4	3	2	4	4	1	1	2	3	1

Solution: To find the mean marks, we require the product of each xi with the corresponding frequency fi.

Marks obtained (xi)	Number of students (fi)	fixi
10	1	10
20	1	20
36	3	108
40	4	160
50	3	150
56	2	112
60	4	240
70	4	280
72	1	72
80	1	80
88	2	176
92	3	276
95	1	95
Total	Σ f_i = 30	Σ f_ix_i = 1779

`bar x=(Σf_ix_i)/(Σf_i)`

`=1779/30=59.3`
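The calculation above can be sketched in a few lines of Python. This is only an illustration: the names `marks`, `freqs` and `frequency_mean` are made up for this sketch, and the data are the marks and frequencies from the table above.

```python
# Marks (x_i) and number of students (f_i) from the table above.
marks = [10, 20, 36, 40, 50, 56, 60, 70, 72, 80, 88, 92, 95]
freqs = [1, 1, 3, 4, 3, 2, 4, 4, 1, 1, 2, 3, 1]


def frequency_mean(values, frequencies):
    """Return sum(f_i * x_i) / sum(f_i)."""
    total = sum(f * x for x, f in zip(values, frequencies))
    count = sum(frequencies)
    return total / count


print(sum(freqs))                                   # 30
print(sum(f * x for x, f in zip(marks, freqs)))     # 1779
print(frequency_mean(marks, freqs))                 # 59.3
```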

Direct Method of Finding Mean:

In most of our real life situations, data is usually so large that to make a meaningful study it needs to be condensed as grouped data. So, we need to convert given ungrouped data into grouped data and devise some method to find its mean.

Let us convert the ungrouped data of Example 1 into grouped data by forming class-intervals of width, say, 15. Remember that, while allocating frequencies to each class-interval, students whose marks equal an upper class-limit are counted in the next class, e.g., the 4 students who obtained 40 marks are counted in the class-interval 40-55 and not in 25-40. With this convention in mind, let us form a grouped frequency distribution table.

Class Intervals	10-25	25-40	40-55	55-70	70-85	85-100
Number of students	2	3	7	6	6	6
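The grouping convention can be illustrated with a short Python sketch. The function name `group_into_classes` and its parameters (`start`, `width`, `n_classes`) are assumptions chosen for this example. Each class is treated as including its lower limit and excluding its upper limit, so a mark of 40 falls in 40-55.

```python
# Ungrouped marks (x_i) and frequencies (f_i) from Example 1.
marks = [10, 20, 36, 40, 50, 56, 60, 70, 72, 80, 88, 92, 95]
freqs = [1, 1, 3, 4, 3, 2, 4, 4, 1, 1, 2, 3, 1]


def group_into_classes(values, frequencies, start=10, width=15, n_classes=6):
    """Count observations per class; a value equal to an upper limit goes to the next class."""
    counts = [0] * n_classes
    for x, f in zip(values, frequencies):
        k = (x - start) // width      # index of the class containing x
        counts[k] += f
    return [((start + k * width, start + (k + 1) * width), counts[k])
            for k in range(n_classes)]


for (lower, upper), count in group_into_classes(marks, freqs):
    print(f"{lower}-{upper}: {count}")
```

With the data above this gives the class frequencies 2, 3, 7, 6, 6, 6, matching the table.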

Now, for each class-interval, we require a point which would serve as the representative of the whole class. It is assumed that the frequency of each class-interval is centred around its mid-point. So the mid-point (or class mark) of each class can be chosen to represent the observations falling in the class.

Class Mark = (Upper class limit + Lower class limit)/2

Class Interval	Number of students (fi)	Class Mark (xi)	fixi
10-25	2	17.5	35
25-40	3	32.5	97.5
40-55	7	47.5	332.5
55-70	6	62.5	375.0
70-85	6	77.5	465.0
85-100	6	92.5	555.0
Total	Σ f_i = 30		Σ f_ix_i = 1860

`bar x=(Σf_ix_i)/(Σf_i)`

`=1860/30=62`

This method of finding the mean is known as the Direct Method. Here, 59.3 is the exact mean, while 62 is an approximate mean.
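As a rough sketch, the direct method can be written in Python as follows. The names `classes`, `class_mark` and `direct_mean` are illustrative only. Each class is represented by its class mark, as assumed in the text.

```python
# Class intervals and frequencies from the grouped table above.
classes = [(10, 25), (25, 40), (40, 55), (55, 70), (70, 85), (85, 100)]
freqs = [2, 3, 7, 6, 6, 6]


def class_mark(lower, upper):
    """Mid-point of a class interval."""
    return (upper + lower) / 2


def direct_mean(intervals, frequencies):
    """Direct method: sum(f_i * x_i) / sum(f_i), with x_i the class marks."""
    marks = [class_mark(lo, hi) for lo, hi in intervals]
    return sum(f * x for f, x in zip(frequencies, marks)) / sum(frequencies)


print([class_mark(lo, hi) for lo, hi in classes])  # [17.5, 32.5, 47.5, 62.5, 77.5, 92.5]
print(direct_mean(classes, freqs))                 # 62.0
```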

Assumed Mean Method:

Sometimes, when the numerical values of xi and fi are large, finding the product of xi and fi becomes tedious and time-consuming. So, for such situations, let us think of a method of reducing these calculations.

Nothing can be done with the fi’s, but each xi can be changed to a smaller number to make the calculations easier.

The first step is to choose one among the xi’s as the assumed mean, and denote it by ‘a’. Also, to further reduce our calculation work, we may take ‘a’ to be that xi which lies in the centre of x1, x2, . . ., xn. So, we can choose a = 47.5 or a = 62.5.

Let us choose a = 47.5.

The next step is to find the difference di between a and each of the xi’s, that is, the deviation of ‘a’ from each of the xi’s.

i.e., di = xi – a = xi – 47.5

The third step is to find the product of di with the corresponding fi, and take the sum of all the fi di’s.

Class Interval	Number of Students (fi)	Class Mark (xi)	di = xi - a	fidi
10-25	2	17.5	-30	-60
25-40	3	32.5	-15	-45
40-55	7	47.5	0	0
55-70	6	62.5	15	90
70-85	6	77.5	30	180
85-100	6	92.5	45	270
Total	Σ f_i = 30			Σ f_id_i = 435

So, the mean of the deviations is:

`bar d=(Σf_id_i)/(Σf_i)=435/30=14.5`

Since each di is obtained by subtracting a from xi, the mean `bar x` is obtained by adding a back to the mean of the deviations:

`bar x = a + bar d = 47.5 + 14.5 = 62`
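The three steps of the assumed mean method can be sketched in Python as below. The function name `assumed_mean_method` is made up for this illustration. The second call uses a = 62.5, the other central class mark mentioned above, to show that the result does not depend on the choice of a.

```python
# Class marks (x_i) and frequencies (f_i) from the table above.
class_marks = [17.5, 32.5, 47.5, 62.5, 77.5, 92.5]
freqs = [2, 3, 7, 6, 6, 6]


def assumed_mean_method(marks, frequencies, a):
    """Mean = a + sum(f_i * d_i) / sum(f_i), where d_i = x_i - a."""
    deviations = [x - a for x in marks]
    mean_deviation = (sum(f * d for f, d in zip(frequencies, deviations))
                      / sum(frequencies))
    return a + mean_deviation


print(assumed_mean_method(class_marks, freqs, a=47.5))  # 62.0
print(assumed_mean_method(class_marks, freqs, a=62.5))  # 62.0
```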

Step Deviation Method of Finding Mean:

In the above method, all the values of di are multiples of 15, which is the class size h. So, to simplify the calculations further, let us divide each di by h = 15 and work with the smaller numbers ui = di/h in place of di.

Class Interval	Number of Students (fi)	Class Mark (xi)	di = xi - a	ui = di/h	fiui
10-25	2	17.5	-30	-2	-4
25-40	3	32.5	-15	-1	-3
40-55	7	47.5	0	0	0
55-70	6	62.5	15	1	6
70-85	6	77.5	30	2	12
85-100	6	92.5	45	3	18
Total	Σ f_i = 30				Σ f_iu_i = 29

`bar u=(Σf_iu_i)/(Σf_i)=29/30`

`bar x=a+h bar u`
`=47.5+15xx29/30=62`

(Since each ui was obtained by dividing di by the class size h, `bar d = h bar u`, and so `a + bar d = a + h bar u`.)
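A minimal Python sketch of the step-deviation method follows, under the same data. The function name `step_deviation_mean` is an assumption for this example. The mean is computed as a + h × Σ f_iu_i / Σ f_i.

```python
# Class marks (x_i) and frequencies (f_i) from the table above.
class_marks = [17.5, 32.5, 47.5, 62.5, 77.5, 92.5]
freqs = [2, 3, 7, 6, 6, 6]


def step_deviation_mean(marks, frequencies, a, h):
    """Mean = a + h * sum(f_i * u_i) / sum(f_i), where u_i = (x_i - a) / h."""
    steps = [(x - a) / h for x in marks]
    sum_fu = sum(f * u for f, u in zip(frequencies, steps))
    return a + h * sum_fu / sum(frequencies)


print([(x - 47.5) / 15 for x in class_marks])             # [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
print(step_deviation_mean(class_marks, freqs, 47.5, 15))  # 62.0
```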

Important to Note:

The step-deviation method will be convenient to apply if all the di’s have a common factor.
The mean obtained by all the three methods is the same.
The assumed mean method and step-deviation method are just simplified forms of the direct method.

Mode of Grouped Data

A mode is that value among the observations which occurs most often, that is, the value of the observation having the maximum frequency. It is possible that more than one value may have the same maximum frequency. In such situations, the data is said to be multimodal.

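For grouped data, the mode lies in the modal class, that is, the class with the maximum frequency, and is estimated by the formula:

Mode `=l+((f_1-f_0)/(2f_1-f_0-f_2))xxh`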

where l = lower limit of the modal class,

h = size of the class interval (assuming all class sizes to be equal),

f1 = frequency of the modal class,

f0 = frequency of the class preceding the modal class,

f2 = frequency of the class succeeding the modal class.

Let us calculate the mode for the grouped marks data above. The modal class is 40-55, since it has the highest frequency (7). So,

l = 40, h = 15, f1 = 7, f0 = 3, f2 = 6

Mode `=40+((7-3)/(2xx7-3-6))xx15=40+12=52`
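The mode formula can be sketched in Python as follows. The names `grouped_mode`, `classes` and `freqs` are illustrative. The sketch assumes a single modal class that is neither the first nor the last class, so that both neighbouring frequencies exist.

```python
# Class intervals and frequencies from the grouped marks table.
classes = [(10, 25), (25, 40), (40, 55), (55, 70), (70, 85), (85, 100)]
freqs = [2, 3, 7, 6, 6, 6]


def grouped_mode(l, h, f1, f0, f2):
    """Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * h."""
    return l + (f1 - f0) * h / (2 * f1 - f0 - f2)


# Assumption: the modal class is unique and has a class on either side.
i = freqs.index(max(freqs))      # position of the modal class
lower, upper = classes[i]
mode = grouped_mode(l=lower, h=upper - lower,
                    f1=freqs[i], f0=freqs[i - 1], f2=freqs[i + 1])

print(classes[i])  # (40, 55)
print(mode)        # 52.0
```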

Median of Grouped Data

Median is a measure of central tendency which gives the value of the middle-most observation in the data. The first step in finding the median is to arrange the data in ascending order.

If the number of observations n is odd, the median is the `((n+1)/2)`th observation.

If n is even, the median is the average of the `(n/2)`th and the `(n/2+1)`th observations:

Median = [`(n/2)`th observation + `(n/2+1)`th observation]/2

That is, it is the average of the two middle observations.

Let us take data from the first table:

Marks obtained	Frequency	Cumulative frequency
10	1	1
20	1	2
36	3	5
40	4	9
50	3	12
56	2	14
60	4	18
70	4	22
72	1	23
80	1	24
88	2	26
92	3	29
95	1	30

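Here, the number of observations is 30, which is even.

So, the median is the average of the 15th and the 16th observations.

The cumulative frequency up to 56 marks is 14 and up to 60 marks is 18, so the 15th to the 18th observations are all 60.

15th observation = 60

16th observation = 60

Median = (60 + 60)/2 = 60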
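Locating an observation by its position using cumulative frequencies can be sketched in Python as below. The function names `kth_observation` and `frequency_median` are made up for this illustration. The k-th observation is taken to be the first value whose cumulative frequency is at least k.

```python
from bisect import bisect_left
from itertools import accumulate

# Marks (in ascending order) and their frequencies from the first table.
marks = [10, 20, 36, 40, 50, 56, 60, 70, 72, 80, 88, 92, 95]
freqs = [1, 1, 3, 4, 3, 2, 4, 4, 1, 1, 2, 3, 1]


def kth_observation(values, frequencies, k):
    """Return the k-th observation (1-based) of data given as a frequency table."""
    cumulative = list(accumulate(frequencies))
    # First position whose cumulative frequency is at least k.
    return values[bisect_left(cumulative, k)]


def frequency_median(values, frequencies):
    """Median of data given as an ascending frequency table."""
    n = sum(frequencies)
    if n % 2 == 1:
        return kth_observation(values, frequencies, (n + 1) // 2)
    lower_middle = kth_observation(values, frequencies, n // 2)
    upper_middle = kth_observation(values, frequencies, n // 2 + 1)
    return (lower_middle + upper_middle) / 2


print(list(accumulate(freqs)))  # cumulative frequencies
print(kth_observation(marks, freqs, 15), kth_observation(marks, freqs, 16))
print(frequency_median(marks, freqs))
```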

The mean is the most frequently used measure of central tendency because it takes into account all the observations, and lies between the extremes, i.e., the largest and the smallest observations of the entire data. It also enables us to compare two or more distributions.

However, extreme values in the data affect the mean.

In problems where individual observations are not important, and we wish to find out a ‘typical’ observation, the median is more appropriate, e.g., finding the typical productivity rate of workers, average wage in a country, etc. These are situations where extreme values may be there. So, rather than the mean, we take the median as a better measure of central tendency.

In situations which require establishing the most frequent value or most popular item, the mode is the best choice, e.g., to find the most popular T.V. programme being watched.

Remarks:

There is an empirical relationship between the three measures of central tendency:

3 Median = Mode + 2 Mean
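As a small illustration, the relationship can be rearranged to estimate one measure from the other two. The function name `median_from_mode_and_mean` is an assumption for this sketch, and the inputs are the mode (52) and mean (62) found above for the grouped marks data. Because the relationship is empirical, the result is only a rough estimate, not an exact value.

```python
def median_from_mode_and_mean(mode, mean):
    """Rearrange 3 Median = Mode + 2 Mean to estimate the median."""
    return (mode + 2 * mean) / 3


# Mode and mean of the grouped marks data computed earlier.
estimate = median_from_mode_and_mean(mode=52, mean=62)
print(round(estimate, 2))  # 58.67
```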
