Unit I
Frequency Distribution
Statistics: The numerical facts in statements about business and economic situations (for example, $165,000, 79%, 25.3, 11%, $4.00, $201,449,289, $5,000,000 and 8721) are called statistics.
The term statistics refers to numerical facts such as averages, medians, percents, and index numbers that help us understand a variety of business and economic situations.
Applications in business and economics:
1) Accounting: Public accounting firms use statistical sampling procedures when conducting audits for their clients.
2) Finance: Financial analysts use a variety of statistical information to guide their investment recommendations. In the case of stocks, the analysts review a variety of financial data, including price/earnings ratios and dividend yields.
3) Marketing: Electronic scanners at retail checkout counters collect data for a variety of marketing research applications.
4) Production: Today's emphasis on quality makes quality control an important application of statistics in production. A variety of statistical quality control charts are used to monitor the output of a production process.
5) Economics: Economists frequently provide forecasts about the future of the economy or some aspect of it. They use a variety of statistical information in making such forecasts. For instance, in forecasting inflation rates, economists use statistical information on such indicators as the Producer Price Index, the unemployment rate, and manufacturing capacity utilization. Often these statistical indicators are entered into computerized forecasting models that predict inflation rates.
Data: Data are the facts and figures collected, analyzed, and summarized for presentation and
interpretation. All the data collected in a particular study are referred to as the data set for the
study.
Element: Elements are the entities on which data are collected.
Variables: A variable is a characteristic of interest for the elements.
Observations: Measurements collected on each variable for every element in a study provide the
data. The set of measurements obtained for a particular element is called an observation.
Scales of Measurement:
1. Nominal scale: When the data for a variable consist of labels or names used to identify an attribute of the element, the scale of measurement is considered a nominal scale.
2. Ordinal scale: The scale of measurement for a variable is called an ordinal scale if the data exhibit the properties of nominal data and the order or rank of the data is meaningful.
3. Interval scale: The scale of measurement for a variable is an interval scale if the data have all the properties of ordinal data and the interval between values is expressed in terms of a fixed unit of measure. Interval data are always numeric.
4. Ratio scale: The scale of measurement for a variable is a ratio scale if the data have all
the properties of interval data and the ratio of two values is meaningful. Variables such as
distance, height, weight, and time use the ratio scale of measurement.
Quantitative data: Numeric values that indicate how much or how many of something. Quantitative data are obtained using either the interval or the ratio scale of measurement.
Quantitative variable: A variable with quantitative data.
Descriptive statistics: Tabular, graphical, and numerical summaries of data.
Frequency: The number of observations (items) in the data set that take a given value or fall in a given class.
Relative Frequency:
$$ \text{Relative frequency} = \frac{\text{Frequency of the class}}{\text{Total frequency}} $$
Percent Frequency: Percent Frequency of a class = Relative frequency * 100.
Cumulative Frequency: The running total of frequencies, i.e. the sum of the frequency of a class and the frequencies of all the classes before it.
Frequency Density: The number of observations per unit of class width. Frequency density gives the rate of concentration of observations in a class.
$$ \text{Frequency density} = \frac{\text{Frequency of the class}}{\text{Width of the class}} $$
Graphical Representation of Data:
1. Bar Charts: A bar chart is a graphical device for depicting categorical data summarized in a frequency, relative frequency, or percent frequency distribution. On one axis of the graph (usually the horizontal axis), we specify the labels that are used for the classes (categories). A frequency, relative frequency, or percent frequency scale can be used for the other axis of the chart.
2. Pie Chart: The pie chart provides another graphical device for presenting relative frequency and percent frequency distributions for categorical data. To construct a pie chart, we first draw a circle to represent all the data. Then we use the relative frequencies to subdivide the circle into sectors, or parts, that correspond to the relative frequency for each class. Example: use the data of the previous example.
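Because each sector's share of the circle equals the relative frequency of its class, the central angle of a sector is its relative frequency multiplied by 360 degrees. The following is a minimal Python sketch, borrowing the soft-drink frequencies from the worked example later in these notes; `sector_angles` is a hypothetical helper name, not part of any library.

```python
# Pie-chart sector angles: each class gets (relative frequency x 360) degrees.
def sector_angles(freqs):
    """Return the central angle (in degrees) of each pie-chart sector."""
    total = sum(freqs.values())
    return {label: f / total * 360 for label, f in freqs.items()}

drinks = {"Coke": 19, "Diet Coke": 8, "Pepsi": 5, "Thums Up": 13, "Sprite": 5}
for label, angle in sector_angles(drinks).items():
    print(f"{label:10s} {angle:6.1f} degrees")
```

The angles always add up to 360 degrees, because the relative frequencies add up to 1.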
Summarizing Quantitative Data
Width of Class Interval: The approximate width of a class interval can be decided by the following formula:
$$ \text{Class interval} = \frac{\text{Largest observation} - \text{Smallest observation}}{\text{Number of class intervals}} $$
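A small Python sketch of the class-width rule. The largest and smallest values (100 and 80) come from the monthly cloth-output example later in these notes; the choice of 6 classes is an assumption made only for illustration, and `class_width` is a hypothetical helper name.

```python
import math

def class_width(largest, smallest, num_classes):
    """Approximate class width = (largest - smallest) / number of classes."""
    if num_classes <= 0:
        raise ValueError("num_classes must be positive")
    return (largest - smallest) / num_classes

# Largest and smallest monthly outputs (100 and 80); 6 classes is an assumed choice.
width = class_width(100, 80, 6)
print(round(width, 2))   # 3.33
print(math.ceil(width))  # 4, rounded up to a convenient whole number
```

Since the formula gives only an approximate width, it is usually rounded up so that the classes cover the whole range of the data.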
Class Limits: Class limits are the smallest and largest values (data, events, etc.) that can be included in each class. Therefore, each class has two limits: a lower limit and an upper limit.
Class limits for the various class intervals can be fixed in two ways:
1) Exclusive method: the upper limit of one class is taken to be equal to the lower limit of the next class, and an observation equal to the upper limit of a class is excluded from it and counted in the next class (e.g. 0-10, 10-20, ...).
2) Inclusive method: all observations greater than or equal to the lower limit and less than or equal to the upper limit of a class are included in it (e.g. 0-9, 10-19, ...).
Class Mark (Mid-Value or Central Value) of a class: The average of the two class limits of a given class.
1. In exclusive class intervals, the mid-value of a class is the arithmetic mean of its lower and upper limits.
2. In inclusive class intervals, there is a gap between the upper limit of one class and the lower limit of the next class; this gap is eliminated by using class boundaries. The mid-value is the same whether it is computed from the limits or from the boundaries (e.g. 240-269 and 239.5-269.5 both give 254.5).
Class Boundaries: Class boundaries are the midpoints between the upper class limit of a class and the lower class limit of the next class in the sequence. Therefore, each class has an upper and a lower class boundary.
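The conversion from inclusive limits to class boundaries can be sketched in Python as follows. This assumes whole-number limits, so the gap between consecutive classes is 1 and half of it (0.5) is moved to each side; the helper names and the `gap` parameter are hypothetical. The sample classes are the first four mark classes from the mode example later in these notes.

```python
def to_boundaries(classes, gap=1.0):
    """Convert inclusive class limits to class boundaries.

    `gap` is the distance between the upper limit of one class and the lower
    limit of the next (1 for whole-number limits such as 30-34, 35-39).
    Half of the gap is subtracted from each lower limit and added to each
    upper limit.
    """
    half = gap / 2
    return [(lo - half, hi + half) for lo, hi in classes]

def mid_values(classes):
    """Class mark = average of the two limits (or boundaries) of a class."""
    return [(lo + hi) / 2 for lo, hi in classes]

marks = [(30, 34), (35, 39), (40, 44), (45, 49)]
bounds = to_boundaries(marks)
print(bounds)              # [(29.5, 34.5), (34.5, 39.5), (39.5, 44.5), (44.5, 49.5)]
print(mid_values(marks))   # [32.0, 37.0, 42.0, 47.0]
print(mid_values(bounds))  # [32.0, 37.0, 42.0, 47.0]
```

The last two lines illustrate the point made above: converting limits to boundaries does not change the mid-values.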
Dot Plot: One of the simplest graphical summaries of data is a dot plot. A horizontal axis shows the range of the data, and each data value is represented by a dot placed above the axis.
Histogram : A common graphical presentation of quantitative data is a histogram. This
graphical summary can be prepared for data previously summarized in either a frequency, relative
frequency, or percent frequency distribution. A histogram is constructed by placing the variable of
interest on the horizontal axis and the frequency, relative frequency, or percent frequency on the
vertical axis.
Example: use the data of the previous example.
Ogive: A graph of a cumulative distribution, called an ogive, shows data values on the horizontal axis and either the cumulative frequencies, the cumulative relative frequencies, or the cumulative percent frequencies on the vertical axis.
Example: Calculate the cumulative frequency, relative frequency and percent frequency of the following data:

Soft Drink | Frequency
Coke | 19
Diet Coke | 8
Pepsi | 5
Thums Up | 13
Sprite | 5

Solution:

Soft Drink | Frequency | Cumulative frequency | Relative frequency | Percent frequency
Coke | 19 | 19 | 0.38 | 38
Diet Coke | 8 | 27 | 0.16 | 16
Pepsi | 5 | 32 | 0.10 | 10
Thums Up | 13 | 45 | 0.26 | 26
Sprite | 5 | 50 | 0.10 | 10
Total | 50 | | 1.00 | 100
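The table above can be reproduced with a short Python sketch: the cumulative frequency is a running total (`itertools.accumulate`), the relative frequency divides by the total, and the percent frequency multiplies the relative frequency by 100. `frequency_table` is a hypothetical helper name.

```python
from itertools import accumulate

def frequency_table(freqs):
    """Return rows of (class, f, cumulative f, relative f, percent f)."""
    total = sum(freqs.values())
    cumulative = accumulate(freqs.values())  # running total of frequencies
    rows = []
    for (label, f), cf in zip(freqs.items(), cumulative):
        rel = f / total
        rows.append((label, f, cf, rel, rel * 100))
    return rows

drinks = {"Coke": 19, "Diet Coke": 8, "Pepsi": 5, "Thums Up": 13, "Sprite": 5}
for label, f, cf, rel, pct in frequency_table(drinks):
    print(f"{label:10s} {f:3d} {cf:3d} {rel:5.2f} {pct:5.0f}")
```

A dictionary is used so that the classes keep the order in which they are listed in the table.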
The Stem-and-Leaf Display
The techniques of exploratory data analysis consist of simple arithmetic and easy-to-draw graphs that can be used to summarize data quickly. One such technique is the stem-and-leaf display.
Example: To develop a stem-and-leaf display, we first arrange the leading digits of each data value to the left of a vertical line. To the right of the vertical line, we record the last digit of each data value. Based on the top row of data in Table 2.8 (112, 72, 69, 97, and 107), the first five entries in constructing a stem-and-leaf display would be as follows:
 6 | 9
 7 | 2
 8 |
 9 | 7
10 | 7
11 | 2
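The construction described above (stem = all leading digits, leaf = last digit) can be sketched in Python. This assumes non-negative integer data, records leaves in the order the values appear (the first pass shown above, before any sorting of leaves), and keeps empty stems such as 8. `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Build a stem-and-leaf display: stem = leading digits, leaf = last digit."""
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)  # e.g. 112 -> stem 11, leaf 2
        leaves[stem].append(leaf)
    lo, hi = min(leaves), max(leaves)
    width = len(str(hi))
    lines = []
    for stem in range(lo, hi + 1):  # include empty stems so gaps stay visible
        leaf_str = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>{width}} | {leaf_str}".rstrip())
    return lines

print("\n".join(stem_and_leaf([112, 72, 69, 97, 107])))
```

For the complete display, the leaves on each line are usually sorted as a final step.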
Crosstabulations
A crosstabulation is a tabular summary of data for two variables.
Scatter Diagrams: The scatter diagram is one of the tools of quality. A scatter diagram is a graphical technique used to analyze the relationship between two variables. It shows whether or not there is a correlation between the two variables. Correlation refers to a measure of the relationship between two sets of numbers or variables.
Figure: Different types of scatter diagram.
Measure of Central Tendency
Average: “Average is a value which is typical or representative of a set of data.” (Murray R. Spiegel)
Various Measures of Average:
A) Mathematical Averages
1) Arithmetic mean
2) Geometric mean
3) Harmonic mean
4) Quadratic mean
B) Positional Averages
1) Median
2) Mode
Arithmetic mean: It is defined as the sum of the observations divided by the number of observations. It can be computed in two ways:
1) Simple arithmetic mean
2) Weighted arithmetic mean
Simple Arithmetic Mean:
a) When individual observations are given:
1) Direct method:
$$ \bar{X} = \frac{\sum X}{n} $$
2) Shortcut method:
$$ \bar{X} = A + \frac{\sum d}{n}, \quad d = X - A $$
3) Step deviation method:
$$ \bar{X} = A + \frac{\sum d}{n} \times i, \quad d = \frac{X - A}{i} $$
where A = assumed mean, n = number of observations and i = common factor (class interval).
Example: The following figures relate to the monthly output of cloth (in meters) of a factory in a given year. Calculate the average (arithmetic mean) monthly output.

Month | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec
Output | 80 | 88 | 92 | 84 | 96 | 92 | 96 | 100 | 92 | 94 | 98 | 86

Solution: Direct method
$$ \bar{X} = \frac{80+88+92+84+96+92+96+100+92+94+98+86}{12} = \frac{1098}{12} = 91.5 \text{ meters} $$
Shortcut method: A is the assumed mean; subtract A from every observation. Take A = 90.

Month | Output (X) | d = X - 90
Jan | 80 | -10
Feb | 88 | -2
Mar | 92 | 2
Apr | 84 | -6
May | 96 | 6
Jun | 92 | 2
Jul | 96 | 6
Aug | 100 | 10
Sep | 92 | 2
Oct | 94 | 4
Nov | 98 | 8
Dec | 86 | -4
Total | | ∑d = 18

$$ \bar{X} = A + \frac{\sum d}{n} = 90 + \frac{18}{12} = 91.5 \text{ meters} $$
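The three methods for individual observations can be sketched in Python to confirm that they agree. The assumed mean A = 90 follows the notes; the step width i = 2 used for the step deviation method is an illustrative assumption (any common factor gives the same mean). The function names are hypothetical.

```python
def mean_direct(xs):
    """Direct method: sum of observations / number of observations."""
    return sum(xs) / len(xs)

def mean_shortcut(xs, a):
    """Shortcut method: A + sum(d) / n with d = X - A (A = assumed mean)."""
    d = [x - a for x in xs]
    return a + sum(d) / len(xs)

def mean_step_deviation(xs, a, i):
    """Step deviation method: A + (sum(d) / n) * i with d = (X - A) / i."""
    d = [(x - a) / i for x in xs]
    return a + sum(d) / len(xs) * i

output = [80, 88, 92, 84, 96, 92, 96, 100, 92, 94, 98, 86]  # meters, Jan-Dec
print(mean_direct(output))                 # 91.5
print(mean_shortcut(output, 90))           # 91.5
print(mean_step_deviation(output, 90, 2))  # 91.5 (i = 2 is an assumed choice)
```

The shortcut and step deviation methods only simplify hand arithmetic; algebraically all three give the same result.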
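b) When data are in the form of an ungrouped frequency distribution (discrete series):
1) Direct method:
$$ \bar{X} = \frac{\sum fX}{N} $$
2) Shortcut method:
$$ \bar{X} = A + \frac{\sum fd}{N}, \quad d = X - A $$
3) Step deviation method:
$$ \bar{X} = A + \frac{\sum fd}{N} \times i, \quad d = \frac{X - A}{i} $$
where N = ∑f.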
Example: The following is the frequency distribution of the ages of 670 students of a school. Compute the arithmetic mean of the data.

Age (X) | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14
No. of students (f) | 25 | 45 | 90 | 165 | 112 | 96 | 81 | 26 | 18 | 12

Solution: Direct method

X | f | fX
5 | 25 | 125
6 | 45 | 270
7 | 90 | 630
8 | 165 | 1320
9 | 112 | 1008
10 | 96 | 960
11 | 81 | 891
12 | 26 | 312
13 | 18 | 234
14 | 12 | 168
Total | ∑f = 670 | ∑fX = 5918

$$ \bar{X} = \frac{\sum fX}{N} = \frac{5918}{670} = 8.83 \text{ years} $$
Shortcut method: d = X - A. Here A = 8.

X | f | d = X - 8 | fd
5 | 25 | -3 | -75
6 | 45 | -2 | -90
7 | 90 | -1 | -90
8 | 165 | 0 | 0
9 | 112 | 1 | 112
10 | 96 | 2 | 192
11 | 81 | 3 | 243
12 | 26 | 4 | 104
13 | 18 | 5 | 90
14 | 12 | 6 | 72
Total | ∑f = 670 | | ∑fd = 558

$$ \bar{X} = A + \frac{\sum fd}{N} = 8 + \frac{558}{670} = 8 + 0.83 = 8.83 \text{ years} $$
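For a discrete series each value X is counted f times, so the sums become ∑fX and ∑fd. A minimal Python sketch of the direct and shortcut methods with the ages data above (function names are hypothetical):

```python
def freq_mean_direct(xs, fs):
    """Direct method: sum(fX) / N, where N = sum(f)."""
    n = sum(fs)
    return sum(f * x for x, f in zip(xs, fs)) / n

def freq_mean_shortcut(xs, fs, a):
    """Shortcut method: A + sum(fd) / N with d = X - A."""
    n = sum(fs)
    return a + sum(f * (x - a) for x, f in zip(xs, fs)) / n

ages = list(range(5, 15))                            # 5, 6, ..., 14
students = [25, 45, 90, 165, 112, 96, 81, 26, 18, 12]
print(round(freq_mean_direct(ages, students), 2))       # 8.83
print(round(freq_mean_shortcut(ages, students, 8), 2))  # 8.83
```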
c) When data are in the form of a grouped frequency distribution (continuous series) with exclusive classes
Question: Calculate the arithmetic mean of the following distribution.

Class interval | 0-10 | 10-20 | 20-30 | 30-40 | 40-50 | 50-60 | 60-70 | 70-80
Frequency | 3 | 8 | 12 | 15 | 18 | 16 | 11 | 5

Solution: Take A = 35.

Class interval | Mid-value X | f | d = X - 35 | fd
0-10 | 5 | 3 | -30 | -90
10-20 | 15 | 8 | -20 | -160
20-30 | 25 | 12 | -10 | -120
30-40 | 35 | 15 | 0 | 0
40-50 | 45 | 18 | 10 | 180
50-60 | 55 | 16 | 20 | 320
60-70 | 65 | 11 | 30 | 330
70-80 | 75 | 5 | 40 | 200
Total | | ∑f = 88 | | ∑fd = 660

$$ \bar{X} = A + \frac{\sum fd}{N} = 35 + \frac{660}{88} = 42.5 $$
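For a continuous series, each class is represented by its mid-value, and the shortcut formula is then applied. The following is a Python sketch under that assumption; because the mid-value of inclusive limits equals the mid-value of the boundaries, the same function also handles the inclusive example that follows (240-269, ...). `grouped_mean` is a hypothetical helper name.

```python
def grouped_mean(classes, fs, a):
    """Mean of a continuous series via mid-values and the shortcut method.

    classes: list of (lower, upper) pairs; for inclusive classes the mid-value
    of the limits equals the mid-value of the boundaries, so either may be used.
    a: assumed mean (conveniently one of the mid-values).
    """
    mids = [(lo + hi) / 2 for lo, hi in classes]
    n = sum(fs)
    sum_fd = sum(f * (x - a) for x, f in zip(mids, fs))
    return a + sum_fd / n

exclusive = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
print(grouped_mean(exclusive, [3, 8, 12, 15, 18, 16, 11, 5], 35))  # 42.5

inclusive = [(240, 269), (270, 299), (300, 329), (330, 359), (360, 389), (390, 419), (420, 449)]
print(round(grouped_mean(inclusive, [7, 19, 27, 15, 12, 12, 8], 344.5), 1))  # 336.7
```

Using mid-values assumes the observations in each class are spread evenly around its centre, so the grouped mean is an approximation of the mean of the raw data.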
d) When data are in the form of a grouped frequency distribution (continuous series) with inclusive classes
Example:

Class interval | 240-269 | 270-299 | 300-329 | 330-359 | 360-389 | 390-419 | 420-449
Frequency | 7 | 19 | 27 | 15 | 12 | 12 | 8

Solution: Take A = 344.5.

Class interval | Mid-value X | f | d = X - 344.5 | fd
240-269 | 254.5 | 7 | -90 | -630
270-299 | 284.5 | 19 | -60 | -1140
300-329 | 314.5 | 27 | -30 | -810
330-359 | 344.5 | 15 | 0 | 0
360-389 | 374.5 | 12 | 30 | 360
390-419 | 404.5 | 12 | 60 | 720
420-449 | 434.5 | 8 | 90 | 720
Total | | ∑f = 100 | | ∑fd = -780

$$ \bar{X} = A + \frac{\sum fd}{N} = 344.5 + \frac{-780}{100} = 336.7 $$
e) Step deviation method:
$$ \bar{X} = A + \frac{\sum fd}{N} \times i, \quad d = \frac{X - A}{i} $$
where i = class interval.
Example:

Class interval | 0-5 | 5-10 | 10-15 | 15-20 | 20-25 | 25-30
Frequency | 20 | 7 | 2 | 9 | 10 | 5

Solution: Take A = 12.5 and i = 5.

Class interval | f | Mid-value X | d = (X - 12.5)/5 | fd
0-5 | 20 | 2.5 | -2 | -40
5-10 | 7 | 7.5 | -1 | -7
10-15 | 2 | 12.5 | 0 | 0
15-20 | 9 | 17.5 | 1 | 9
20-25 | 10 | 22.5 | 2 | 20
25-30 | 5 | 27.5 | 3 | 15
Total | ∑f = 53 | | | ∑fd = -3

$$ \bar{X} = A + \frac{\sum fd}{N} \times i = 12.5 + \frac{-3}{53} \times 5 = \text{Rs. } 12.22 $$
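The step deviation method divides each deviation by the class interval i and multiplies back at the end, which keeps the hand arithmetic small. The following is a Python sketch of the same calculation; `step_deviation_mean` is a hypothetical helper name.

```python
def step_deviation_mean(classes, fs, a, i):
    """A + (sum(fd) / N) * i, where d = (mid-value - A) / i."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    d = [(x - a) / i for x in mids]
    n = sum(fs)
    return a + sum(f * dj for f, dj in zip(fs, d)) / n * i

classes = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25), (25, 30)]
freqs = [20, 7, 2, 9, 10, 5]
print(round(step_deviation_mean(classes, freqs, 12.5, 5), 2))  # 12.22
```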
MEDIAN: The median is another measure of central location for a variable. It is the value in the middle when the data are arranged in ascending order.
(A) When individual observations are given
a) For an odd number of observations (terms), the median is the middle value, i.e.
$$ \text{Median} = \text{size of the } \left(\frac{N+1}{2}\right)\text{th term} $$
b) For an even number of observations:
$$ \text{Median} = \frac{1}{2}\left[\left(\frac{N}{2}\right)\text{th term} + \left(\frac{N}{2}+1\right)\text{th term}\right] $$
Example: Find the median of the following observations:
20, 15, 25, 28, 18, 16, 30
Solution: Arranging the observations in ascending order, we get 15, 16, 18, 20, 25, 28, 30.
Since the number of terms is 7, i.e. odd, the median is the size of the ((7 + 1)/2)th, i.e. the 4th, observation.
Hence the median, denoted by Md, is 20.
Example: Find the median of the following observations:
245, 230, 265, 236, 220, 250
Solution: Arranging the observations in ascending order:
220, 230, 236, 245, 250, 265
Since the number of terms is 6, i.e. even, the median is
$$ \text{Md} = \frac{1}{2}\left[\left(\frac{6}{2}\right)\text{th term} + \left(\frac{6}{2}+1\right)\text{th term}\right] = \frac{1}{2}\left[3\text{rd term} + 4\text{th term}\right] = \frac{1}{2}(236 + 245) = 240.5 $$
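Both median rules for individual observations can be sketched in Python. The formulas count positions from 1, while Python lists are indexed from 0, hence the `- 1` adjustments. `median` here is a hypothetical helper.

```python
def median(values):
    """Median of individual observations (sorted in ascending order first).

    Odd N: the ((N + 1) / 2)th item.  Even N: the mean of the (N/2)th and
    (N/2 + 1)th items.  Positions are 1-based in the formula, 0-based here.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]
    return (xs[n // 2 - 1] + xs[n // 2]) / 2

print(median([20, 15, 25, 28, 18, 16, 30]))    # 20
print(median([245, 230, 265, 236, 220, 250]))  # 240.5
```

Python's standard library also provides `statistics.median`, which follows the same odd/even rule.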
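(B) When an ungrouped frequency distribution is given: in this case, first calculate the cumulative frequencies.
Example: Locate the median of the following frequency distribution.

Variable (X) | 10 | 11 | 12 | 13 | 14 | 15 | 16
Frequency (f) | 8 | 15 | 25 | 20 | 12 | 10 | 5

Solution:

X | f | c.f.
10 | 8 | 8
11 | 15 | 23
12 | 25 | 48
13 | 20 | 68
14 | 12 | 80
15 | 10 | 90
16 | 5 | 95

Here N = 95, so the median is the size of the ((N + 1)/2)th = ((95 + 1)/2)th = 48th item.
The cumulative frequency first reaches 48 at X = 12, i.e. the median is 12.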
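The cumulative-frequency lookup used above can be sketched in Python: compute the position (N + 1)/2, then return the first X whose cumulative frequency reaches that position. This sketch does not average two neighbouring items when (N + 1)/2 is fractional; in that case it simply returns the item at the next whole position. `discrete_median` is a hypothetical helper name.

```python
from itertools import accumulate

def discrete_median(xs, fs):
    """Median of a discrete frequency distribution.

    Locate the ((N + 1) / 2)th item and return the first X whose cumulative
    frequency reaches that position.
    """
    n = sum(fs)
    position = (n + 1) / 2
    for x, cf in zip(xs, accumulate(fs)):
        if cf >= position:
            return x
    raise ValueError("empty distribution")

print(discrete_median([10, 11, 12, 13, 14, 15, 16], [8, 15, 25, 20, 12, 10, 5]))  # 12
```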
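(C) Median in a continuous series
$$ \text{Md} = L + \frac{\frac{N}{2} - \text{c.f.}}{f} \times i $$
where L = lower boundary of the median class, c.f. = cumulative frequency of the class preceding the median class, f = frequency of the median class, and i = class interval.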
Example: The following table shows the distribution of marks obtained by 500 students in an examination. Obtain the median from the following data.

Marks | 0-9 | 10-19 | 20-29 | 30-39 | 40-49 | 50-59 | 60-69 | 70-79
Frequency | 30 | 40 | 50 | 48 | 24 | 162 | 132 | 14

Solution:

Class interval | Class boundaries | f | c.f.
0-9 | -0.5 to 9.5 | 30 | 30
10-19 | 9.5 to 19.5 | 40 | 70
20-29 | 19.5 to 29.5 | 50 | 120
30-39 | 29.5 to 39.5 | 48 | 168
40-49 | 39.5 to 49.5 | 24 | 192
50-59 | 49.5 to 59.5 | 162 | 354
60-69 | 59.5 to 69.5 | 132 | 486
70-79 | 69.5 to 79.5 | 14 | 500

Since N/2 = 250, the median class is 49.5-59.5; therefore L = 49.5, i = 10, f = 162 and c.f. = 192.
Substituting the values in the formula:
$$ \text{Md} = L + \frac{\frac{N}{2} - \text{c.f.}}{f} \times i = 49.5 + \frac{250 - 192}{162} \times 10 = 53.08 $$
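The interpolation formula for the median of a continuous series can be sketched in Python: walk through the classes until the cumulative frequency reaches N/2, then interpolate within that class. The boundaries are generated for the marks example above; `grouped_median` is a hypothetical helper name.

```python
def grouped_median(boundaries, freqs):
    """Md = L + ((N/2 - c.f.) / f) * i for a continuous series.

    boundaries: (lower, upper) class boundaries in ascending order.
    c.f. is the cumulative frequency of the class *before* the median class.
    """
    n = sum(freqs)
    half = n / 2
    cf_before = 0
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cf_before + f >= half:  # first class whose c.f. reaches N/2
            return lower + (half - cf_before) / f * (upper - lower)
        cf_before += f
    raise ValueError("empty distribution")

# Marks of 500 students: boundaries -0.5 to 9.5, 9.5 to 19.5, ..., 69.5 to 79.5
bounds = [(10 * k - 0.5, 10 * k + 9.5) for k in range(8)]
freqs = [30, 40, 50, 48, 24, 162, 132, 14]
print(round(grouped_median(bounds, freqs), 2))  # 53.08
```

The class width i is taken from each class's own boundaries, so the sketch also works when classes have unequal widths.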
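Example: The weekly wages of 1000 workers of a factory are shown in the following table. Calculate the median.

Weekly wages (less than) | 425 | 475 | 525 | 575 | 625 | 675 | 725 | 775 | 825 | 875
No. of workers | 2 | 10 | 43 | 423 | 293 | 506 | 719 | 864 | 955 | 1000

Ans = 673.59 (N/2 = 500 falls in the class 625-675, with c.f. = 293 below it and class frequency 506 - 293 = 213, so Md = 625 + (500 - 293)/213 × 50 = 673.59).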
Mode: The "mode" is the value that occurs most frequently or which has the greatest frequency. If no
number is repeated, then there is no mode for the list.
Question: Calculate the mode from the following data on the marks obtained by students.

Roll no. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Marks | 20 | 30 | 31 | 32 | 25 | 25 | 30 | 21 | 30 | 32

Solution:

Size of item (marks) | Number of times it occurs
20 | 1
21 | 1
25 | 2
30 | 3
31 | 1
32 | 2
Total | 10

Since the item 30 occurs the maximum number of times, i.e. 3, the mode (modal marks) is 30.
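Counting how often each item occurs is exactly what `collections.Counter` does, so the mode can be sketched in a few lines of Python. Following the definition above, the sketch returns no mode when no value repeats; it returns every tied value when several share the highest count. `modes` is a hypothetical helper name.

```python
from collections import Counter

def modes(values):
    """Return all values with the highest frequency (empty list if none repeats)."""
    counts = Counter(values)
    top = max(counts.values(), default=0)
    if top <= 1:
        return []  # no value repeats: no mode
    return sorted(v for v, c in counts.items() if c == top)

marks = [20, 30, 31, 32, 25, 25, 30, 21, 30, 32]
print(Counter(marks))  # number of times each size of item occurs
print(modes(marks))    # [30]
```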
Mode in a continuous series:
$$ \text{Mode} = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i $$
where L = lower limit of the modal class, f1 = frequency of the modal class, f0 = frequency of the preceding class (pre-modal class), f2 = frequency of the succeeding class (post-modal class), and i = class interval.
Question: The frequency distribution of marks obtained by 60 students of a class in a college is given below. Find the mode of the distribution.

Marks | 30-34 | 35-39 | 40-44 | 45-49 | 50-54 | 55-59 | 60-64
Frequency | 3 | 5 | 12 | 18 | 14 | 6 | 2

Solution: Convert the inclusive classes into exclusive classes:

Marks | 29.5-34.5 | 34.5-39.5 | 39.5-44.5 | 44.5-49.5 | 49.5-54.5 | 54.5-59.5 | 59.5-64.5
Frequency | 3 | 5 | 12 | 18 | 14 | 6 | 2

The highest frequency is 18, i.e. the modal class is 44.5-49.5.
Here L = 44.5, f1 = 18, f0 = 12, f2 = 14, i = 5. Substituting these values in the formula:
$$ \text{Mode} = 44.5 + \frac{18 - 12}{2 \times 18 - 12 - 14} \times 5 = 44.5 + \frac{6}{10} \times 5 = 47.5 $$
Mode = 47.5 marks
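The mode formula for a continuous series can be sketched in Python. The modal class is taken to be the first class with the highest frequency; when the modal class is the first or last class, the missing neighbour frequency (f0 or f2) is treated as 0, which is an assumption of this sketch, not something stated in the notes. `grouped_mode` is a hypothetical helper name.

```python
def grouped_mode(boundaries, fs):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * i for a continuous series.

    f0 / f2 are assumed to be 0 when the modal class is the first / last class.
    """
    k = max(range(len(fs)), key=lambda j: fs[j])  # index of the modal class
    f1 = fs[k]
    f0 = fs[k - 1] if k > 0 else 0
    f2 = fs[k + 1] if k + 1 < len(fs) else 0
    lower, upper = boundaries[k]
    denom = 2 * f1 - f0 - f2
    if denom == 0:
        raise ValueError("formula undefined: 2*f1 - f0 - f2 is zero")
    return lower + (f1 - f0) / denom * (upper - lower)

# Exclusive boundaries 29.5-34.5, 34.5-39.5, ..., 59.5-64.5
bounds = [(29.5 + 5 * j, 34.5 + 5 * j) for j in range(7)]
print(round(grouped_mode(bounds, [3, 5, 12, 18, 14, 6, 2]), 2))  # 47.5
```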
Quartiles: The three values that divide a distribution into four equal parts are known as quartiles.
Q1: the first (lower) quartile.
Q2: the second (middle) quartile, which is the median.
Q3: the third (upper) quartile.
Q1 ≤ Q2 ≤ Q3
Computation of quartiles
1. In the case of individual and discrete series (after arranging the items in ascending order):
a. $$ Q_1 = \text{size of the } \left(\frac{N+1}{4}\right)\text{th item} $$
b. $$ Q_2 = \text{size of the } \left(\frac{2(N+1)}{4}\right)\text{th item} $$
c. $$ Q_3 = \text{size of the } \left(\frac{3(N+1)}{4}\right)\text{th item} $$
2. In the case of a continuous series (i.e. frequencies with class intervals):
a. $$ Q_1 = L + \frac{\frac{N}{4} - \text{c.f.}}{f} \times i $$
b. $$ Q_2 = L + \frac{\frac{2N}{4} - \text{c.f.}}{f} \times i $$
c. $$ Q_3 = L + \frac{\frac{3N}{4} - \text{c.f.}}{f} \times i $$
where L = lower limit of the quartile class, c.f. = cumulative frequency of the class preceding the quartile class, f = frequency of the quartile class, and i = class interval.
$$ \text{Coefficient of quartile deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} $$
Range: The range is the difference between the largest and the smallest values. It is usually denoted by R.
a) For an individual or discrete series: R = L - S, where L = largest value and S = smallest value.
b) For a continuous series: R = UL - LS, where UL = upper limit of the highest class and LS = lower limit of the lowest class.
c) Coefficient of range (individual or discrete series):
$$ \frac{L - S}{L + S} $$
d) Coefficient of range (continuous series):
$$ \frac{UL - LS}{UL + LS} $$
e) Interquartile range = Q3 - Q1
Note: Frequencies are not considered at all when computing the range and the coefficient of range.
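For an individual series, the range and its coefficient depend only on the largest and smallest values, as the note above says. A small Python sketch, reusing the monthly cloth-output data from the arithmetic-mean example (function names are hypothetical):

```python
def value_range(values):
    """Range R = L - S (largest minus smallest); frequencies play no part."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """Coefficient of range = (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)

output = [80, 88, 92, 84, 96, 92, 96, 100, 92, 94, 98, 86]  # monthly cloth output
print(value_range(output))                     # 20
print(round(coefficient_of_range(output), 4))  # 0.1111
```

The coefficient is a pure number, so unlike the range it can be used to compare the spread of series measured in different units.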
Percentile: The values of a variate that divide a given series or distribution into 100 equal parts. A percentile is denoted by P.
a) In the case of an individual or discrete series (after arranging the items in ascending order):
$$ P_j = \text{size of the } \left(\frac{j(N+1)}{100}\right)\text{th item}, \quad j = 1, 2, \ldots, 99 $$
b) In the case of a continuous series:
$$ P_j = L + \frac{\frac{jN}{100} - \text{c.f.}}{f} \times i $$
Example: Calculate Q1, Q3, P70, P10, P90 and the interquartile range from the following data.

Marks | 0-10 | 10-20 | 20-30 | 30-40 | 40-50 | 50-60
No. of students | 10 | 20 | 30 | 50 | 40 | 30

Solution:

Marks | No. of students (f) | c.f.
0-10 | 10 | 10
10-20 | 20 | 30
20-30 | 30 | 60
30-40 | 50 | 110
40-50 | 40 | 150
50-60 | 30 | 180

Calculation of Q1
N/4 = 180/4 = 45th item, which lies in the class 20-30.
L = 20, c.f. = 30, f = 30, i = 10
$$ Q_1 = L + \frac{\frac{N}{4} - \text{c.f.}}{f} \times i = 20 + \frac{45 - 30}{30} \times 10 = 25 $$
Calculation of Q3
3N/4 = 3 × 180/4 = 135th item, which lies in the class 40-50.
L = 40, c.f. = 110, f = 40, i = 10
$$ Q_3 = L + \frac{\frac{3N}{4} - \text{c.f.}}{f} \times i = 40 + \frac{135 - 110}{40} \times 10 = 46.25 $$
Calculation of P70
70N/100 = 70 × 180/100 = 126th item, which lies in the class 40-50.
L = 40, c.f. = 110, f = 40, i = 10
$$ P_{70} = L + \frac{\frac{70N}{100} - \text{c.f.}}{f} \times i = 40 + \frac{126 - 110}{40} \times 10 = 44 $$
Calculation of P10
10N/100 = 10 × 180/100 = 18th item, which lies in the class 10-20.
L = 10, c.f. = 10, f = 20, i = 10
$$ P_{10} = L + \frac{\frac{10N}{100} - \text{c.f.}}{f} \times i = 10 + \frac{18 - 10}{20} \times 10 = 14 $$
Calculation of P90
90N/100 = 90 × 180/100 = 162nd item, which lies in the class 50-60.
L = 50, c.f. = 150, f = 30, i = 10
$$ P_{90} = L + \frac{\frac{90N}{100} - \text{c.f.}}{f} \times i = 50 + \frac{162 - 150}{30} \times 10 = 54 $$
Interquartile range = Q3 - Q1 = 46.25 - 25 = 21.25
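Quartiles, percentiles and the median of a continuous series all use the same interpolation, differing only in the fraction of N they locate (1/4, 3/4, j/100, 1/2). The following is a Python sketch that reproduces the results above; `grouped_quantile` is a hypothetical helper name. Printed percentiles are rounded, because fractions such as 0.7 are not exact in binary floating point.

```python
def grouped_quantile(boundaries, fs, fraction):
    """Positional value for a continuous series.

    fraction = 1/4 gives Q1, 3/4 gives Q3, j/100 gives Pj:
        value = L + ((fraction * N - c.f.) / f) * i
    where c.f. is the cumulative frequency before the class containing the item.
    """
    n = sum(fs)
    target = fraction * n
    cf_before = 0
    for (lower, upper), f in zip(boundaries, fs):
        if f > 0 and cf_before + f >= target:
            return lower + (target - cf_before) / f * (upper - lower)
        cf_before += f
    raise ValueError("fraction must be between 0 and 1")

bounds = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
students = [10, 20, 30, 50, 40, 30]
q1 = grouped_quantile(bounds, students, 1 / 4)
q3 = grouped_quantile(bounds, students, 3 / 4)
print(q1, q3, q3 - q1)  # 25.0 46.25 21.25
print([round(grouped_quantile(bounds, students, j / 100), 2) for j in (10, 70, 90)])  # [14.0, 44.0, 54.0]
```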
Standard Deviation: The deviations of the observations from the arithmetic mean are squared, and the positive square root of the arithmetic mean of these squared deviations is taken as a measure of dispersion. This measure of dispersion is known as the standard deviation or root-mean-square deviation. It is denoted by the Greek letter σ (small sigma).
The square of the standard deviation is known as the variance.
$$ \sigma = \sqrt{\frac{\sum f(X - \bar{X})^2}{N}} $$
OR, using deviations d = X - A from an assumed mean A,
$$ \sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} $$
where ∑f = N.
Coefficient of Variation:
$$ \text{C.V.} = \left(\frac{\text{S.D.}}{\text{Mean}} \times 100\right)\% $$
Note: The coefficient of variation is useful when we wish to compare the variability of two data sets relative to the general level of values (and thus relative to the mean) in each set.
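Question: Find the standard deviation, variance and coefficient of variation from the following data.

Age under | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80
No. of persons dying | 15 | 30 | 53 | 75 | 100 | 110 | 115 | 125

Solution: The given figures are cumulative ("less than") frequencies, so the class frequencies are obtained by taking successive differences. Take A = 35.

Age | Mid-value X | f | d = X - A | fd | fd² | fX
0-10 | 5 | 15 | -30 | -450 | 13500 | 75
10-20 | 15 | 15 | -20 | -300 | 6000 | 225
20-30 | 25 | 23 | -10 | -230 | 2300 | 575
30-40 | 35 | 22 | 0 | 0 | 0 | 770
40-50 | 45 | 25 | 10 | 250 | 2500 | 1125
50-60 | 55 | 10 | 20 | 200 | 4000 | 550
60-70 | 65 | 5 | 30 | 150 | 4500 | 325
70-80 | 75 | 10 | 40 | 400 | 16000 | 750
Total | | ∑f = 125 | | ∑fd = 20 | ∑fd² = 48800 | ∑fX = 4395

Substituting the values in the formula:
$$ \sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} = \sqrt{\frac{48800}{125} - \left(\frac{20}{125}\right)^2} = \sqrt{390.4 - 0.0256} = 19.758 $$
Variance = σ² = 390.374 (approx.)
$$ \text{Mean} = \frac{\sum fX}{N} = \frac{4395}{125} = 35.16 $$
$$ \text{Coefficient of variation} = \frac{\text{S.D.}}{\text{Mean}} \times 100 = \frac{19.758}{35.16} \times 100 = 56.19\% \text{ (approx.)} $$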
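The following is a minimal Python sketch of the shortcut standard-deviation formula. It assumes, as in the solution, that the "age under" figures are cumulative counts that must first be differenced into class frequencies; `class_frequencies` and `grouped_sd` are hypothetical helper names.

```python
import math

def class_frequencies(cumulative):
    """Turn 'less than' cumulative counts into class frequencies."""
    return [c - prev for prev, c in zip([0] + cumulative[:-1], cumulative)]

def grouped_sd(mids, fs, a):
    """Shortcut formula: sqrt(sum(fd^2)/N - (sum(fd)/N)^2), with d = X - A."""
    n = sum(fs)
    d = [x - a for x in mids]
    sum_fd = sum(f * dj for f, dj in zip(fs, d))
    sum_fd2 = sum(f * dj * dj for f, dj in zip(fs, d))
    return math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)

dying = [15, 30, 53, 75, 100, 110, 115, 125]  # persons dying, age under 10, 20, ..., 80
fs = class_frequencies(dying)                 # [15, 15, 23, 22, 25, 10, 5, 10]
mids = [5 + 10 * k for k in range(8)]         # 5, 15, ..., 75
sd = grouped_sd(mids, fs, 35)
mean = sum(f * x for f, x in zip(fs, mids)) / sum(fs)
print(round(sd, 3), round(sd ** 2, 3))            # 19.758 390.374
print(round(mean, 2), round(sd / mean * 100, 2))  # 35.16 56.19
```

The result does not depend on the choice of A; the assumed mean only keeps the hand arithmetic small.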
Example: Find the standard deviation of the following distribution.

Age | 20-25 | 25-30 | 30-35 | 35-40 | 40-45 | 45-50
No. of persons | 170 | 110 | 80 | 45 | 40 | 35

Take the assumed average A = 32.5.
Ans. 7.936 (approx.)
Weighted Mean: The weighted mean, or weighted average, is an arithmetic mean in which each value is weighted according to its importance in the overall group. The formulas for the population and sample weighted means are identical.
In the case of an individual series:
$$ \bar{X} = \frac{\sum WX}{\sum W} $$
In the case of a frequency distribution:
$$ \bar{X} = \frac{\sum fWX}{\sum fW} $$
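The simple and weighted means for an individual series can be sketched in Python. The data are the College 'A' figures from the example that follows (pass % as X, number of students in hundreds as W); the function names are hypothetical.

```python
def simple_mean(xs):
    """Simple arithmetic mean: sum(X) / N."""
    return sum(xs) / len(xs)

def weighted_mean(xs, ws):
    """Weighted arithmetic mean: sum(W * X) / sum(W)."""
    total_w = sum(ws)
    if total_w == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(ws, xs)) / total_w

# College 'A': pass % (X) and no. of students in 100s (W) for the six courses
pass_pct = [71, 83, 73, 74, 65, 66]
students = [3, 4, 5, 2, 3, 3]
print(simple_mean(pass_pct))                        # 72.0
print(round(weighted_mean(pass_pct, students), 2))  # 72.55
```

Python 3.11 and later also offer `statistics.fmean(data, weights)` for the weighted case.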
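Example: Comment on the performance of the students of the three colleges given below using simple and weighted averages.

Course | College A: Pass % | College A: No. of students (in 100s) | College B: Pass % | College B: No. of students (in 100s) | College C: Pass % | College C: No. of students (in 100s)
M.A | 71 | 3 | 82 | 2 | 81 | 2
M.Com | 83 | 4 | 76 | 3 | 76 | 3.5
B.A | 73 | 5 | 73 | 6 | 74 | 4.5
B.Com | 74 | 2 | 76 | 7 | 58 | 2
B.Sc | 65 | 3 | 65 | 3 | 70 | 7
M.Sc | 66 | 3 | 60 | 7 | 73 | 2

Solution: Let X = pass % and W = number of students (in 100s).

Course | A: X | A: W | A: WX | B: X | B: W | B: WX | C: X | C: W | C: WX
M.A | 71 | 3 | 213 | 82 | 2 | 164 | 81 | 2 | 162
M.Com | 83 | 4 | 332 | 76 | 3 | 228 | 76 | 3.5 | 266
B.A | 73 | 5 | 365 | 73 | 6 | 438 | 74 | 4.5 | 333
B.Com | 74 | 2 | 148 | 76 | 7 | 532 | 58 | 2 | 116
B.Sc | 65 | 3 | 195 | 65 | 3 | 195 | 70 | 7 | 490
M.Sc | 66 | 3 | 198 | 60 | 7 | 420 | 73 | 2 | 146
Total | ∑X = 432 | ∑W = 20 | ∑WX = 1451 | ∑X = 432 | ∑W = 28 | ∑WX = 1977 | ∑X = 432 | ∑W = 21 | ∑WX = 1513

Simple and weighted arithmetic means:
$$ \text{College A: } \bar{X} = \frac{\sum X}{N} = \frac{432}{6} = 72, \qquad \bar{X}_w = \frac{\sum WX}{\sum W} = \frac{1451}{20} = 72.55 $$
$$ \text{College B: } \bar{X} = \frac{\sum X}{N} = \frac{432}{6} = 72, \qquad \bar{X}_w = \frac{\sum WX}{\sum W} = \frac{1977}{28} = 70.61 $$
$$ \text{College C: } \bar{X} = \frac{\sum X}{N} = \frac{432}{6} = 72, \qquad \bar{X}_w = \frac{\sum WX}{\sum W} = \frac{1513}{21} = 72.05 $$
Comment: On simple averages, the three colleges perform equally (72% each). The weighted averages, which take into account the number of students in each course, show that College A performs best (72.55%), followed by College C (72.05%) and College B (70.61%).

Measure of Association Between Two Variables:
Covariance:
$$ \text{cov}(X, Y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N} $$
where (X - X̄) is the deviation of X from its mean and (Y - Ȳ) is the deviation of Y from its mean.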
Correlation: Correlation measures the degree of linear association between two variables, say X and Y.
Karl Pearson's Coefficient of Correlation:
$$ r_{xy} = \frac{\text{cov}(x, y)}{\sigma_x \sigma_y} $$
Example: Calculate Karl Pearson's coefficient of correlation from the following data.

Height of father (X) | 66 | 68 | 69 | 72 | 65 | 59 | 62 | 67 | 61 | 71
Height of son (Y) | 65 | 64 | 67 | 69 | 64 | 60 | 59 | 68 | 60 | 64

Solution:

X | Y | X - X̄ | Y - Ȳ | (X - X̄)² | (Y - Ȳ)² | (X - X̄)(Y - Ȳ)
66 | 65 | 0 | 1 | 0 | 1 | 0
68 | 64 | 2 | 0 | 4 | 0 | 0
69 | 67 | 3 | 3 | 9 | 9 | 9
72 | 69 | 6 | 5 | 36 | 25 | 30
65 | 64 | -1 | 0 | 1 | 0 | 0
59 | 60 | -7 | -4 | 49 | 16 | 28
62 | 59 | -4 | -5 | 16 | 25 | 20
67 | 68 | 1 | 4 | 1 | 16 | 4
61 | 60 | -5 | -4 | 25 | 16 | 20
71 | 64 | 5 | 0 | 25 | 0 | 0
∑X = 660 | ∑Y = 640 | 0 | 0 | 166 | 108 | 111

X̄ = 660/10 = 66, Ȳ = 640/10 = 64
$$ \text{cov}(x, y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N} = \frac{111}{10} = 11.1 $$
$$ \sigma_x = \sqrt{\frac{166}{10}} = 4.07, \qquad \sigma_y = \sqrt{\frac{108}{10}} = 3.29 $$
$$ r_{xy} = \frac{\text{cov}(x, y)}{\sigma_x \sigma_y} = \frac{11.1}{4.07 \times 3.29} = 0.83 $$
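Karl Pearson's coefficient can be sketched in Python exactly as it is computed by hand above: deviations from the means, covariance and standard deviations (all divided by N), then the ratio. `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's r = cov(x, y) / (sigma_x * sigma_y), dividing by N throughout."""
    n = len(xs)
    if n != len(ys) or n == 0:
        raise ValueError("need two equal-length, non-empty samples")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

father = [66, 68, 69, 72, 65, 59, 62, 67, 61, 71]
son = [65, 64, 67, 69, 64, 60, 59, 68, 60, 64]
print(round(pearson_r(father, son), 2))  # 0.83
```

Whether N or N - 1 is used makes no difference to r, because the divisor cancels between the numerator and the denominator. For the same reason, Python 3.10's `statistics.correlation` gives the same value.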
Question: Calculate the coefficient of correlation between age group and rate of mortality from the following data.

Age group | 0-20 | 20-40 | 40-60 | 60-80 | 80-100
Rate of mortality | 350 | 280 | 540 | 760 | 900

Hint: Find the mid-value of each class.
Ans: 0.95 (approx.)