Lakireddy Bali Reddy College of Engineering, Mylavaram
(Autonomous)
Master of Computer Applications (I-Semester)
MC105- Probability and Statistical Applications
Lecture : 4 Periods/Week
Internal Marks: 40
External Marks: 60
Credits: 4
External Examination: 3 Hrs.
Faculty Name: N V Nagendram
UNIT – I
Probability Theory: Sample spaces, Events and Probability; Discrete Probability; Union, intersection and complements of Events; Conditional Probability; Bayes' Theorem.
UNIT – II
Random Variables and Distributions: Random variables; Discrete probability distributions; Continuous probability distributions; Mathematical expectation; Binomial, Poisson and Normal distributions; Sampling distributions; Populations and samples; Sums and differences; Central limit theorem and related applications.
UNIT – III
Estimation: Point estimation, interval estimation, Bayesian estimation; Tests of hypothesis: one-tail and two-tail tests, tests of hypothesis concerning means, tests of hypothesis concerning proportions, F-test, goodness of fit.
UNIT – IV
Linear correlation coefficient; Linear regression; Non-linear regression; Least-squares fit; Polynomial and curve fitting.
UNIT – V
Queuing theory: Markov chains; Introduction to queuing systems; Elements of a queuing model; Exponential distribution; Pure birth and death models; Generalized Poisson queuing model; Specialized Poisson queues.
________________________________________________________________________
Text Book: Probability and Statistics by T K V Iyengar, S Chand, 3rd Edition, 2011.
References:
1. Higher Engineering Mathematics by B V Ramana, 2009 Edition.
2. Fundamentals of Mathematical Statistics by S C Gupta & V K Kapoor, Sultan Chand & Sons, New Delhi, 2009.
3. Probability & Statistics, Schaum's Outline Series, by Seymour Lipschutz, TMH, New Delhi, 3rd Edition, 2009.
4. Probability & Statistics by Miller and Freund, Prentice Hall India, Delhi, 7th Edition, 2009.
Planned Topics
UNIT – IV
1. Introduction
2. Definition
3. Types (four) of correlation
4. Methods of studying correlation
5. Scatter diagram or scatter-gram
6. Advantages of scatter-gram
7. Simple graphs
8. Coefficient of correlation
9. Karl Pearson’s coefficient of correlation
10. Properties (I, II and III) of correlation coefficient
11. When deviations are taken from an assumed mean
12. Correlation of grouped bivariate data
13. Rank correlation coefficient
14. Properties of rank correlation coefficient
15. Equal or repeated ranks
Chapter 4: Correlation and Regression Theory
Lecture 1: Correlation vs Regression
04-Jan-2014
By N V Nagendram
---------------------------------------------------------------------------------------------------------------
Introduction:
So far we have studied the characteristics of only one variable at a time, such as marks, weights, heights, prices or sales. This type of analysis is called “Univariate Analysis”.
Definition: The statistical analysis of data on two variables, carried out to find whether some relationship exists between them, is called “Bivariate Analysis”.
Here we are interested in finding any relation between the two variables under study.
Example: There is clearly some relation between the price and the sale of a commodity, and between its price and its production.
Correlation refers to the relationship between two or more variables. For example, a relationship exists between the heights of a father and a son, and between wages and the price index. The study of such a relationship is called “Correlation”.
Definition: Correlation is a statistical analysis which measures and analyses the degree or extent to which two variables fluctuate with reference to each other.
Correlation expresses the relationship or interdependence of two sets of variables upon each other. One variable may be called the subject (independent) and the other the relative (dependent).
Types of Correlation:
Correlation is classified into four types:
1. Positive and negative
2. Simple and multiple
3. Partial and total
4. Linear and non-linear
1. Positive and Negative correlation:
Definition: If two variables tend to move together in the same direction, that is, an increase in the value of one variable is accompanied by an increase in the value of the other, or a decrease in one is accompanied by a decrease in the other, then the correlation is called “positive or direct correlation”.
Example: height and weight, rainfall and yield of crops, price and supply.
Definition: If two variables tend to move in opposite directions, that is, an increase (or decrease) in the value of one variable is accompanied by a decrease (or increase) in the value of the other, then the correlation is called “negative or inverse correlation”.
2. Simple and multiple correlation:
Definition: When only two variables are studied, the relationship is described as simple correlation.
Example: Quantity of money and price level; demand and price.
Definition: When more than two variables are studied simultaneously, the relationship is described as multiple correlation.
Example: The relationship of price, demand and supply of a commodity.
3. Partial and total correlation:
Definition: The study of two variables, excluding the effect of some other variables, is called “partial correlation”.
Example: Price and demand, eliminating the supply side.
Note: In total correlation, all the facts are taken into account.
4. Linear and non-linear correlation:
Definition: If the ratio of change between two variables is uniform, then there is linear correlation between them.
A:    2    7   12   17
B:    3    9   15   21
Here the ratio of change between the variables is the same throughout (A rises by 5 whenever B rises by 6). If we plot these values on a graph, we get a straight line.
Definition: In curvilinear or non-linear correlation, the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable. The graph of a non-linear or curvilinear relationship is a curve.
Methods of studying Correlation:
There are two different kinds of methods for finding out the relationship between variables:
1) Graphic methods
(i) Scatter diagram or scatter-gram
(ii) Simple graph
2) Mathematical methods
(i) Karl Pearson’s coefficient of correlation
(ii) Spearman’s rank correlation coefficient
(iii) Coefficient of concurrent deviation
(iv) Method of least squares
Scatter diagram / Scattergram:
The scatter-gram is a chart obtained by plotting two variables to find out whether there is any relationship between them. The X values are plotted on the horizontal axis and the Y values on the vertical axis. The scatter or concentration of the plotted points then indicates the nature of the relationship, as shown below:
[Figure: three scatter diagrams. Points rising along a straight line: r = +1, high degree of positive correlation. Points falling along a straight line: r = -1, high degree of negative correlation. Points scattered without any pattern: no correlation.]
Lecture 2: Correlation
07-Jan-2014
By N V Nagendram
---------------------------------------------------------------------------------------------------------------
Advantages of Scatter Diagram / Scattergram:
1. The scatter diagram is a simple and attractive method of finding out the nature of correlation.
2. It is easy to understand.
3. It shows at a glance, roughly, whether the correlation is positive or negative.
Simple graph:
The values of the two variables are plotted on graph paper. We get two curves, one for the X variable and another for the Y variable. These two curves reveal the direction and closeness of the two variables, and also whether or not the variables are related.
Definition: If both curves move in the same direction, i.e., parallel to each other, either upward or downward, the correlation is said to be positive.
Definition: If the curves move in opposite directions, one upward and the other downward, the correlation is said to be negative.
Uses: This method is used in the case of time series.
Note: This method does not reveal the extent to which the variables are related.
Coefficient of correlation:
Correlation is a statistical technique used for analysing the behaviour of two or more variables. Its analysis deals with the association between two or more variables. Statistical measures of correlation relate to co-variation between series, not to a functional or causal relationship.
Karl Pearson’s coefficient of correlation:
Karl Pearson (1857 – 1936), a British biometrician and statistician, suggested a mathematical method for measuring the magnitude of the linear relationship between two variables. This is known as the Pearsonian coefficient of correlation and is denoted by “r”. It is the most widely used method. It is also called the product-moment correlation coefficient.
There are several formulae to calculate “r”:
1. $r = \dfrac{\text{covariance of } x \text{ and } y}{\sigma_x \, \sigma_y}$
2. $r = \dfrac{\sum XY}{N \, \sigma_x \, \sigma_y}$
3. $r = \dfrac{\sum XY}{\sqrt{\sum X^2 \, \sum Y^2}}$
where $X = (x - \bar{x})$, $Y = (y - \bar{y})$, and $\bar{x}$, $\bar{y}$ are the means of the series x and y;
$\sigma_x$ is the standard deviation of series x;
$\sigma_y$ is the standard deviation of series y;
N is the number of pairs of observations.
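The formulae above can be checked against each other with a short sketch. It uses the X and Y values of Problem #6 in Tutorial 2; the function name `pearson_r` is an assumption made for this illustration, and population standard deviations (divisor N) are assumed.

```python
import math

def pearson_r(xs, ys):
    """Return r computed by formula 1 (covariance form) and formula 3 (deviation form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]   # X = x - mean of x
    dev_y = [y - mean_y for y in ys]   # Y = y - mean of y
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_x2 = sum(a * a for a in dev_x)
    sum_y2 = sum(b * b for b in dev_y)
    covariance = sum_xy / n
    sigma_x = math.sqrt(sum_x2 / n)    # population standard deviation
    sigma_y = math.sqrt(sum_y2 / n)
    r_formula_1 = covariance / (sigma_x * sigma_y)
    r_formula_3 = sum_xy / math.sqrt(sum_x2 * sum_y2)
    return r_formula_1, r_formula_3

xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2, 4, 5, 3, 8, 6, 7]
r1, r3 = pearson_r(xs, ys)
print(round(r1, 2), round(r3, 2))  # both round to 0.79
```

Formula 2 is the same quantity as formula 1, since the covariance is ΣXY / N.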
Properties of correlation coefficient:
Property 1. The maximum absolute value of the correlation coefficient is 1, i.e., r lies between -1 and 1. Symbolically, $|r| \le 1$ or $-1 \le r \le 1$.
Property 2. The coefficient of correlation is independent of change of origin and scale of measurement, i.e.,
$r_{xy} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}} = \dfrac{\sum (u_i - \bar{u})(v_i - \bar{v})}{\sqrt{\sum (u_i - \bar{u})^2 \, \sum (v_i - \bar{v})^2}} = r_{uv}$
where u, v are obtained by a change of origin and scale of the variables x and y (with scale factors of the same sign; see Property 3).
Property 3. If X, Y are random variables and a, b, c, d are any numbers such that $a \ne 0$, $c \ne 0$, then
$r(aX + b, \; cY + d) = \dfrac{ac}{|ac|} \, r(X, Y)$
Property 4. Two independent variables are uncorrelated, i.e., if X and Y are independent variables then r(X, Y) = 0.
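Properties 1 to 3 can be illustrated numerically. The sketch below uses the data of Problem #1 in Tutorial 1; the helper `pearson_r` and the particular constants a, b, c, d are assumptions chosen only for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

xs = [12, 9, 8, 10, 11, 13, 7]
ys = [14, 8, 6, 9, 11, 12, 3]
r = pearson_r(xs, ys)

# Same-sign scale factors (a = 2, c = 0.5): r is unchanged.
r_same = pearson_r([2 * x + 5 for x in xs], [0.5 * y - 3 for y in ys])
# Opposite-sign scale factors (a = 2, c = -3): ac/|ac| = -1, so r changes sign.
r_flip = pearson_r([2 * x + 5 for x in xs], [-3 * y + 1 for y in ys])

print(math.isclose(r, r_same), math.isclose(r, -r_flip))  # True True
```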
When deviations are taken from an assumed mean:
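When the actual mean is not a whole number but a fraction, or when the series is large, calculation by the direct method takes a lot of time.
To avoid such tedious calculations, we can use the assumed mean method.
Formula:
$r = \dfrac{\sum XY - \dfrac{\sum X \sum Y}{N}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{N}\right] \left[\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right]}}$
Where
X is the deviation of the items of the x-series from an assumed mean, i.e., X = x - A
Y is the deviation of the items of the y-series from an assumed mean, i.e., Y = y - A (each series has its own assumed mean)
N is the number of items
$\sum XY$ = the total of the products of the deviations of the x and y series from their assumed means.
$\sum X^2$ = the total of the squares of the deviations of the x-series from its assumed mean.
$\sum Y^2$ = the total of the squares of the deviations of the y-series from its assumed mean.
$\sum X$ = the total of the deviations of the x-series from its assumed mean.
$\sum Y$ = the total of the deviations of the y-series from its assumed mean.
When the deviations sum to zero ($\sum X = \sum Y = 0$, as happens when they are taken from the actual means), the formula reduces to $r = \sum XY / \sqrt{\sum X^2 \sum Y^2}$; for example, $r = 3200 / \sqrt{175000 \times 60} = 3200 / 3240.37 \approx 0.988$.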
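A minimal sketch of the assumed mean method follows, using the heights and weights of Problem #2 in Tutorial 1. The function name `r_assumed_mean`, the parameter names `a` and `b` (one assumed mean per series) and the trial values 61 and 118 are assumptions made for illustration. The point is that any choice of assumed means gives the same r as the actual means.

```python
import math

def r_assumed_mean(xs, ys, a, b):
    """Correlation coefficient from deviations about assumed means a (for x) and b (for y)."""
    n = len(xs)
    dx = [x - a for x in xs]
    dy = [y - b for y in ys]
    sum_xy = sum(p * q for p, q in zip(dx, dy))
    sum_x, sum_y = sum(dx), sum(dy)
    sum_x2 = sum(p * p for p in dx)
    sum_y2 = sum(q * q for q in dy)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator

heights = [57, 59, 62, 63, 64, 65, 55, 58, 57]
weights = [113, 117, 126, 126, 130, 129, 111, 116, 112]

r_guess = r_assumed_mean(heights, weights, a=61, b=118)   # arbitrary assumed means
r_actual = r_assumed_mean(heights, weights, a=60, b=120)  # the actual means
print(round(r_guess, 2), round(r_actual, 2))  # both round to 0.98
```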
Correlation of grouped Bivariate data:
When the number of observations is very large, the data are classified into a correlation table or two-way frequency distribution. The class intervals for y are in the column headings and those for x are in the stubs (row headings). The formula for calculating the coefficient of correlation is
$r = \dfrac{\sum fXY - \dfrac{\sum fX \sum fY}{N}}{\sqrt{\left[\sum fX^2 - \dfrac{(\sum fX)^2}{N}\right] \left[\sum fY^2 - \dfrac{(\sum fY)^2}{N}\right]}}$
where f is the frequency, X and Y are the deviated values, and N is the total frequency.
Rank correlation coefficient:
The British psychologist Charles Edward Spearman developed a method of finding the coefficient of correlation by ranks.
It is used when the characteristic under study cannot be measured quantitatively, as is required for Pearson’s coefficient of correlation. It is based on the ranks given to the observations.
The formula for Spearman’s rank correlation is
$\rho = 1 - \dfrac{6 \sum D^2}{N (N^2 - 1)}$
Where
$\rho$ = rank coefficient of correlation
$\sum D^2$ = sum of the squares of the differences of the two ranks
N = number of paired observations.
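As a quick check of the formula, the sketch below applies it to the ranks of the 15 students in Problem #1 of Tutorial 2. The function name `spearman_rho` is an assumption made for this example, and the formula is valid only when there are no tied ranks.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank correlation for untied ranks: 1 - 6*sum(D^2) / (N(N^2 - 1))."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

ranks_a = list(range(1, 16))  # ranks in subject A: 1, 2, ..., 15
ranks_b = [10, 7, 2, 6, 4, 8, 3, 1, 11, 15, 9, 5, 14, 12, 13]

rho = spearman_rho(ranks_a, ranks_b)
print(round(rho, 3))  # 0.514
```

Here ΣD² = 272, so ρ = 1 - 1632/3360, which agrees with the answer quoted for that problem.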
Properties of Rank correlation coefficient:
1. The value of $\rho$ lies between -1 and 1, that is, $-1 \le \rho \le 1$.
2. If $\rho = 1$, there is complete agreement in the order of the ranks and the ranks are in the same direction.
3. If $\rho = -1$, there is complete disagreement in the order of the ranks and they are in opposite directions.
Procedures to solve:
A) When ranks are given:
Step 1: Compute the difference of the two ranks and denote it by D.
Step 2: Square D to get $D^2$, and add these up to obtain $\sum D^2$.
Step 3: Obtain $\rho$ by substituting the figures in the formula.
B) When ranks are not given:
When the actual data are given instead of ranks, we must assign the ranks ourselves. We can rank either by taking the highest value as 1 or by taking the lowest value as 1, the next highest (lowest) as 2, and so on, following the same procedure for both variables.
Equal or repeated ranks:
If two or more persons are bracketed equal in any classification, or if there is more than one item with the same value in the series, then Spearman’s formula for calculating the rank correlation coefficient breaks down.
In this case common ranks are given to the repeated items. The common rank is the average of the ranks which these items would have assumed if they were different from each other, and the next item gets the rank next to the ranks already assumed.
Example: If two individuals are placed in the 7th place, each of them is given the rank (7 + 8)/2 = 7.5 and the next rank will be 9. Similarly, if three are ranked at the 7th place, then each is given the common rank (7 + 8 + 9)/3 = 8, and the next rank will be 10. We use a slightly different formula:
$\rho = 1 - \dfrac{6 \left[ \sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \dots \right]}{N^3 - N}$
where m is the number of items whose ranks are common; one correction term $\frac{1}{12}(m^3 - m)$ is added for each group of tied items.
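The tie adjustment can be sketched as follows, using the data of Problem #7 in Tutorial 2 (read as ten (X, Y) pairs). The helper names `average_ranks`, `tie_correction` and `spearman_tied` are assumptions made for this illustration; ranks are assigned with the highest value as 1.

```python
from collections import Counter

def average_ranks(values):
    """Rank with the highest value as 1; tied values share the average of their ranks."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of (m^3 - m)/12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_tied(xs, ys):
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)

xs = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
ys = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
rho = spearman_tied(xs, ys)
print(round(rho, 3))  # 0.545
```

In this data ΣD² = 72 and the three tied groups (75 twice and 64 three times in X, 68 twice in Y) contribute 0.5 + 2 + 0.5 = 3, giving ρ = 1 - 6(75)/990.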
Lecture 3: Regression
08-Jan-2014
By N V Nagendram
---------------------------------------------------------------------------------------------------------------
Introduction:
The study of correlation measures the direction and strength of the relationship between two variables, but it does not by itself give the value of one variable when the value of the other is given.
In regression, we estimate the value of one variable from the known value of the other variable.
Definition: The statistical method which helps us to estimate the unknown value of one variable from the known value of the related variable is called “regression”.
Definition: The line describing the average relationship between two variables is known as the “line of regression”.
Note: Nowadays the term “estimating line” is also used instead of “regression line”.
Uses of Regression:
1. It is used to estimate the relation between two economic variables, such as income and expenditure.
2. It is a highly valuable tool in economics and business.
3. It is widely used for prediction.
4. The coefficient of correlation and the coefficient of determination can be calculated with the help of the regression coefficients.
5. It is useful in the statistical estimation of demand curves, supply curves, production functions, cost functions, consumption functions, etc.
Comparison between correlation and regression:
The correlation coefficient is a measure of the degree of covariability between two variables, while regression establishes a functional relation between the dependent and independent variables, so that the former can be predicted for a given value of the latter.
In correlation, both the variables x and y are random variables, whereas in regression the independent variable x is treated as fixed and the dependent variable y as random.
The coefficient of correlation is a relative measure, whereas the regression coefficient is an absolute figure.
Methods of studying Regression:
We have two methods for studying regression:
1. Graphic method
2. Algebraic method
1. Graphic method:
The points representing the pairs of values of the variables are plotted on a graph. The
independent variable is taken on X-axis and the dependent variable on Y-axis.
These points form a scatter diagram or scatter-gram. A regression line is drawn through these points by free hand.
X:   65   67   62   70   67   69   71
Y:   68   68   66   68   67   68   70
[Figure: scatter diagram of the data above with free-hand regression lines drawn through the points; both axes run from about 62 to 71.]
2. Algebraic method:
Regression line:
A regression line is a straight line fitted to the data by the method of least squares. It indicates the best possible mean value of one variable corresponding to the mean value of the other.
There are always two regression lines for the relationship between two variables X and Y: one shows the regression of X on Y and the other shows the regression of Y on X.
Regression Equation:
Definition:
A regression equation is the algebraic expression of a regression line. The topic is treated under the heads of regression equation, regression coefficient, individual observations and grouped data.
The standard form of the regression equation is Y = a + bX, where a and b are constants.
“a” is the value of Y when X = 0 and is called the Y-intercept. “b” is the slope of the regression line and measures the change in Y for a unit change in X. It is also called the regression coefficient of Y on X.
Thus, if we know the values of a and b, we can easily compute the value of Y for any given value of X. The values of a and b are found with the help of the following normal equations.
Regression equation of Y on X (Y = a + bX):
$\sum Y = Na + b \sum X$ and $\sum XY = a \sum X + b \sum X^2$
Regression equation of X on Y (X = a + bY):
$\sum X = Na + b \sum Y$ and $\sum XY = a \sum Y + b \sum Y^2$
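The two normal equations for Y on X form a 2 x 2 linear system in a and b. A minimal sketch of solving it by Cramer's rule is given below, using the data of Problem #1 in Tutorial 1; the function name `fit_line_y_on_x` is an assumption made for this example.

```python
def fit_line_y_on_x(xs, ys):
    """Solve  sum(Y) = N*a + b*sum(X)  and  sum(XY) = a*sum(X) + b*sum(X^2)  for a and b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    det = n * sum_x2 - sum_x ** 2            # determinant of the 2 x 2 system
    a = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b

xs = [12, 9, 8, 10, 11, 13, 7]
ys = [14, 8, 6, 9, 11, 12, 3]
a, b = fit_line_y_on_x(xs, ys)
print(round(a, 3), round(b, 3))  # -7.429 1.643
```

The fitted line passes through the point of means: at X = 10 it gives Y = a + 10b = 9, the mean of the Y series. Swapping the roles of the two series gives the regression of X on Y.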
Deviation taken from arithmetic mean of X and Y:
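This method is easier and simpler than the previous one for finding the values of a and b. We take the deviations of the X and Y series from their respective means, $x = X - \bar{X}$ and $y = Y - \bar{Y}$.
Regression equation of X on Y: $X - \bar{X} = r \dfrac{\sigma_x}{\sigma_y} (Y - \bar{Y})$
Regression equation of Y on X: $Y - \bar{Y} = r \dfrac{\sigma_y}{\sigma_x} (X - \bar{X})$
where $\bar{X}$ = mean of the X series and $\bar{Y}$ = mean of the Y series.
The regression coefficient of X on Y: $b_{XY} = r \dfrac{\sigma_x}{\sigma_y} = \dfrac{\sum xy}{\sum y^2}$
The regression coefficient of Y on X: $b_{YX} = r \dfrac{\sigma_y}{\sigma_x} = \dfrac{\sum xy}{\sum x^2}$
Thus $r^2 = b_{XY} \, b_{YX}$.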
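The relation between the two regression coefficients and r can be verified numerically. The sketch below again uses the data of Problem #1 in Tutorial 1; the function name `regression_coefficients` is an assumption made for this illustration.

```python
import math

def regression_coefficients(xs, ys):
    """Return (b_xy, b_yx) from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    b_xy = sum_xy / sum_y2   # regression coefficient of X on Y
    b_yx = sum_xy / sum_x2   # regression coefficient of Y on X
    return b_xy, b_yx

xs = [12, 9, 8, 10, 11, 13, 7]
ys = [14, 8, 6, 9, 11, 12, 3]
b_xy, b_yx = regression_coefficients(xs, ys)

# r is the square root of the product, carrying the common sign of the coefficients.
r = math.copysign(math.sqrt(b_xy * b_yx), b_yx)
print(round(b_xy, 3), round(b_yx, 3), round(r, 2))  # 0.548 1.643 0.95
```

Both coefficients always have the same sign (that of Σxy), so the product under the square root is never negative.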
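Deviations taken from the assumed mean:
This method is used when the actual mean is a fraction. Here we take deviations from an assumed mean instead of the arithmetic mean.
The regression equation of X on Y is $X - \bar{X} = r \dfrac{\sigma_x}{\sigma_y} (Y - \bar{Y})$. We can find the value of $r \dfrac{\sigma_x}{\sigma_y}$ by applying the following formula:
$r \dfrac{\sigma_x}{\sigma_y} = \dfrac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{N}}{\sum d_y^2 - \dfrac{(\sum d_y)^2}{N}}$, where $d_x = X - A$ and $d_y = Y - A$ (deviations from the assumed mean of each series).
The regression equation of Y on X is $Y - \bar{Y} = r \dfrac{\sigma_y}{\sigma_x} (X - \bar{X})$, where
$r \dfrac{\sigma_y}{\sigma_x} = \dfrac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{N}}{\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}}$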
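Regression equation in a bivariate grouped frequency distribution:
$r \dfrac{\sigma_x}{\sigma_y} = \dfrac{\sum f d_x d_y - \dfrac{\sum f d_x \sum f d_y}{N}}{\sum f d_y^2 - \dfrac{(\sum f d_y)^2}{N}} \times \dfrac{i_x}{i_y}$
$r \dfrac{\sigma_y}{\sigma_x} = \dfrac{\sum f d_x d_y - \dfrac{\sum f d_x \sum f d_y}{N}}{\sum f d_x^2 - \dfrac{(\sum f d_x)^2}{N}} \times \dfrac{i_y}{i_x}$
where $i_x$ = width of the class interval of the x variable
$i_y$ = width of the class interval of the y variable.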
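Angle between two regression lines:
Let the lines of regression of X on Y and of Y on X be
$x - \bar{x} = r \dfrac{\sigma_x}{\sigma_y} (y - \bar{y})$, with slope $m_1 = \dfrac{1}{r} \dfrac{\sigma_y}{\sigma_x}$,
and $y - \bar{y} = r \dfrac{\sigma_y}{\sigma_x} (x - \bar{x})$, with slope $m_2 = r \dfrac{\sigma_y}{\sigma_x}$.
Therefore, if $\theta$ is the angle between the two lines,
$\tan\theta = \dfrac{m_1 - m_2}{1 + m_1 m_2} = \dfrac{\dfrac{\sigma_y}{\sigma_x} \left( \dfrac{1}{r} - r \right)}{1 + \dfrac{\sigma_y^2}{\sigma_x^2}} = \left( \dfrac{1 - r^2}{r} \right) \dfrac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$
Note: 1. If $\theta$ is acute, $\tan\theta = \left( \dfrac{1 - r^2}{r} \right) \dfrac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$
2. If $\theta$ is obtuse, $\tan\theta = \left( \dfrac{r^2 - 1}{r} \right) \dfrac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$
3. If r = 0, then $\tan\theta = \infty$, which implies $\theta = \pi/2$. Thus when there is no linear relationship between the two variables, the regression lines are perpendicular to each other.
4. If r = ±1, then $\tan\theta = 0$, which implies $\theta = 0$ or $\pi$. Hence the two regression lines are parallel or coincident; since both pass through the point $(\bar{x}, \bar{y})$, they coincide. The correlation between the two variables is perfect.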
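The angle can also be computed directly from the two slopes. The sketch below is an illustration only; the function name `angle_between_regression_lines` and the sample values of r, σx and σy are assumptions, and the function returns the acute angle in degrees.

```python
import math

def angle_between_regression_lines(r, sigma_x, sigma_y):
    """Acute angle (degrees) between the X-on-Y and Y-on-X regression lines."""
    if r == 0:
        return 90.0  # one line is horizontal and the other vertical
    m1 = sigma_y / (r * sigma_x)   # slope of the X-on-Y line
    m2 = r * sigma_y / sigma_x     # slope of the Y-on-X line
    tan_theta = abs((m1 - m2) / (1 + m1 * m2))
    return math.degrees(math.atan(tan_theta))

for r in (0, 0.5, 1.0):
    print(r, angle_between_regression_lines(r, sigma_x=1.0, sigma_y=1.0))
```

The angle shrinks from 90 degrees at r = 0 to 0 degrees at r = ±1, so the closer the two lines, the stronger the correlation. Since 1 + m1·m2 = 1 + σy²/σx², the ratio (m1 - m2)/(1 + m1·m2) simplifies to ((1 - r²)/r) · σxσy/(σx² + σy²).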
Curvilinear regression:
In the previous study of regression, one of the assumptions was that the variables X and Y are related linearly. In many cases this assumption may not be valid.
A curvilinear regression may then explain more of the variability of Y than a straight line.
Non-linear curve fitting:
We now discuss a power function, a polynomial of n-th degree and an exponential function to fit the given data points $(x_i, y_i)$, i = 1, 2, 3, 4, ...
1. Power function: Let $y = a x^c$ be the function to be fitted to the given data.
Taking logarithms on both sides, we get log y = log a + c log x,
which is of the form $Y = a_0 + a_1 X$ where $a_0 = \log a$, $a_1 = c$, Y = log y and X = log x. We can find $a_0$ and $a_1$ using the procedure described earlier.
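The log transformation can be sketched as follows. The data points are hypothetical, generated to lie exactly on y = 2x^1.5 so that the fit can be checked; the function names `fit_line` and `fit_power` are assumptions made for this example.

```python
import math

def fit_line(xs, ys):
    """Least-squares a0, a1 for y = a0 + a1*x, from the two normal equations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    det = n * sum_x2 - sum_x ** 2
    a0 = (sum_y * sum_x2 - sum_x * sum_xy) / det
    a1 = (n * sum_xy - sum_x * sum_y) / det
    return a0, a1

def fit_power(xs, ys):
    """Fit y = a * x**c by a straight-line fit of log y on log x."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    a0, a1 = fit_line(log_x, log_y)
    return math.exp(a0), a1   # a = exp(a0), c = a1

xs = [1, 2, 3, 4, 5]
ys = [2 * x ** 1.5 for x in xs]   # hypothetical data on y = 2 * x^1.5
a, c = fit_power(xs, ys)
print(round(a, 6), round(c, 6))  # 2.0 1.5
```

The method requires all x and y values to be positive, since their logarithms are taken.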
2. Polynomial of n-th degree: $y = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n$
3. Parabola: Taking n = 2, the curve to be fitted is the parabola
$y = a_0 + a_1 x + a_2 x^2$
The normal equations, for m data points, are
$\sum y_i = m a_0 + a_1 \sum x_i + a_2 \sum x_i^2$
$\sum x_i y_i = a_0 \sum x_i + a_1 \sum x_i^2 + a_2 \sum x_i^3$
and $\sum x_i^2 y_i = a_0 \sum x_i^2 + a_1 \sum x_i^3 + a_2 \sum x_i^4$
Exercise: Derive the normal equations to fit the parabola $y = a + bx + cx^2$.
Writing A, B, C for a, b, c and N for the number of data points, the normal equations can be written as
$\sum Y = NA + B \sum X + C \sum X^2$
$\sum XY = A \sum X + B \sum X^2 + C \sum X^3$
and $\sum X^2 Y = A \sum X^2 + B \sum X^3 + C \sum X^4$
4. Exponential function:
(i) Suppose the curve to be fitted to the given data is $y = a_0 e^{a_1 x}$.
Taking natural logarithms on both sides we get $\log y = \log a_0 + a_1 x$,
which can be written in the form Z = A + Bx where Z = log y, A = log $a_0$, B = $a_1$.
(ii) Let the exponential curve be $y = a b^x$.
Taking logarithms on both sides we get $\log_{10} y = \log_{10} a + x \log_{10} b$,
which can be written in the form Y = A + Bx where $Y = \log_{10} y$, $A = \log_{10} a$, $B = \log_{10} b$.
The normal equations, for m data points, are
$\sum Y = mA + B \sum x$
$\sum xY = A \sum x + B \sum x^2$
Quiz 1
11-Jan-2014
By N V Nagendram
---------------------------------------------------------------------------------------------------------------
1. The coefficient of correlation
a) cannot be +ve   b) cannot be -ve   c) either +ve or -ve   d) None   [ c ]
2. Which of the following is the highest range of r
a) 0 and 1   b) -1 and 0   c) -1 and +1   d) 1 and 1   [ c ]
3. The coefficient of correlation is independent of
a) change of scale only   b) change of origin   c) both a and b   d) No change   [ c ]
4. The value of $r^2$ for a particular situation is 0.81. What is the coefficient of correlation?
a) 0.81   b) 0.9   c) 0.09   d) 0.085   [ b ]
5. The coefficient of correlation
a) has no limits   b) cannot be < 1   c) can be > 1   d) $-1 \le r \le 1$   [ d ]
6. The coefficient of correlation =
a) $b_{xy} \times b_{yx}$   b) $\sqrt{b_{xy} \times b_{yx}}$   c) $\dfrac{b_{xy}}{b_{yx}}$   d)   [ b ]
7. If one regression coefficient is +ve, then the other regression coefficient is
a) +ve   b) -ve   c) = 0   d) can’t say   [ a ]
8. The regression coefficient is independent of
a) origin   b) scale   c) both a and b   d) None   [ a ]
9. When two regression lines coincide, then r is
a) 0   b) -1   c) 1   d) 0.5   [ c ]
10. The two regression lines cut each other at the point of
a) average of X and Y   b) average of X only   c) average of Y only   d) None   [ a ]
Tutorial 1: Correlation Problems
07-Jan-2014
By N V Nagendram
---------------------------------------------------------------------------------------------------------------
Linear correlation / non-linear correlation:
Problem #1 Calculate the coefficient of correlation from the following data:
X:   12    9    8   10   11   13    7
Y:   14    8    6    9   11   12    3
Solution: Both series contain only a small number of items, so there is no need to take deviations.
We use the formula for the coefficient of correlation
$r = \dfrac{\text{Covariance of } X \text{ and } Y}{\sigma_X \, \sigma_Y} = \dfrac{N \sum XY - \sum X \sum Y}{\sqrt{N \sum X^2 - (\sum X)^2} \; \sqrt{N \sum Y^2 - (\sum Y)^2}}$
Here N = 7.
Computation of coefficient of correlation
Sl no.      X          Y          X²         Y²         XY
(col. 1)    (col. 2)   (col. 3)   (col. 4)   (col. 5)   (col. 6) = col. 2 x col. 3
1           12         14         144        196        12 x 14 = 168
2            9          8          81         64         9 x 8  =  72
3            8          6          64         36         8 x 6  =  48
4           10          9         100         81        10 x 9  =  90
5           11         11         121        121        11 x 11 = 121
6           13         12         169        144        13 x 12 = 156
7            7          3          49          9         7 x 3  =  21
Totals      ΣX = 70    ΣY = 63    ΣX² = 728  ΣY² = 651  ΣXY = 676
$r = \dfrac{(676 \times 7) - (70 \times 63)}{\sqrt{(728 \times 7 - 70^2)(651 \times 7 - 63^2)}} = \dfrac{4732 - 4410}{\sqrt{(5096 - 4900)(4557 - 3969)}} = \dfrac{322}{\sqrt{196 \times 588}} = \dfrac{322}{339.48} = 0.95$
$-1 \le 0.95 \le 1$
Hence the solution.
Problem #2 Find whether there is any significant correlation between the heights and weights given below:
Height in inches:           57    59    62    63    64    65    55    58    57
Weight in lbs (pounds):    113   117   126   126   130   129   111   116   112
We use the formula for the coefficient of correlation
$r = \dfrac{\text{Covariance of } X \text{ and } Y}{\sigma_X \, \sigma_Y} = \dfrac{\sum XY}{\sqrt{\sum X^2 \, \sum Y^2}}$
where X and Y are the deviations of the heights and weights from their means.
Mean height $\bar{x}$ = 540 / 9 = 60; mean weight $\bar{y}$ = 1080 / 9 = 120.
Computation of coefficient of correlation
Sl No.   Height x   X = x - 60   X²     Weight y   Y = y - 120   Y²     XY
1        57         -3             9    113        -7             49    21
2        59         -1             1    117        -3              9     3
3        62          2             4    126         6             36    12
4        63          3             9    126         6             36    18
5        64          4            16    130        10            100    40
6        65          5            25    129         9             81    45
7        55         -5            25    111        -9             81    45
8        58         -2             4    116        -4             16     8
9        57         -3             9    112        -8             64    24
Total    540         0           102    1080        0            472   216
So ΣX = 0, ΣX² = 102, ΣY = 0, ΣY² = 472 and ΣXY = 216.
$r = \dfrac{216}{\sqrt{102 \times 472}} = \dfrac{216}{219.42} \approx 0.98$
$-1 \le 0.98 \le 1$.
Hence the solution.
Tutorial 2: Correlation Problems
07-Jan-2014
By N V Nagendram
---------------------------------------------------------------------------------------------------------------
Problem #1 The ranks of 15 students in two subjects A and B are given below, the two numbers denoting the ranks of the same student in A and B respectively.
Sl No.:   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
A:        1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
B:       10   7   2   6   4   8   3   1  11  15   9   5  14  12  13
Use Spearman’s formula to find the rank correlation coefficient.
[Ans. $r_s$ = 0.514]
Problem #2 The following table gives the scores obtained by 11 students in English and Telugu translation. Find the correlation coefficient.
Scores in English:   40   46   54   60   70   80   82   85   85   90   95
Scores in Telugu:    45   45   50   43   40   75   55   72   65   42   70
[Ans. $r_s$ = 0.359]
Problem #3 The following table gives the distribution of the total population and of those among them who are totally or partially blind. Find the coefficient of correlation.
Sl No.   Age       No. of persons   Blind
1        0 - 10    100              55
2        10 - 20    60              40
3        20 - 30    40              40
4        30 - 40    36              40
5        40 - 50    24              36
6        50 - 60    11              22
7        60 - 70     6              18
8        70 - 80     3              15
[Ans. r = 0.898]
Problem #4 Find the coefficient of correlation between age and playing habit from the following data:
Sl No.   Age       No. of persons   Players
1        15 - 20   1500             1200
2        20 - 25   2000             1560
3        25 - 30   4000             2280
4        30 - 35   3000             1500
5        35 - 40   2500             1000
6        40 - 45   1000              300
7        45 - 50    800              200
8        50 - 55    500               50
9        55 - 60    200                6
[Ans. r = -0.993]
Problem #5 The following data give the marks obtained by 10 students in accountancy and statistics.
Sl No.   R No.   Accountancy   Statistics
1        1       45            35
2        2       70            90
3        3       65            70
4        4       30            40
5        5       90            95
6        6       40            40
7        7       50            80
8        8       75            80
9        9       85            80
10       10      60            50
[Ans. r = 0.903]
Problem #6 Calculate the coefficient of correlation between X and Y from the following data:
X:   1   2   3   4   5   6   7
Y:   2   4   5   3   8   6   7
[Ans. r = 0.79]
Problem #7 Obtain the rank correlation coefficient for the following data:
X:   68   64   75   50   64   80   75   40   55   64
Y:   62   58   68   45   81   60   68   48   50   70
[Ans: $\rho$ (read as rho) = 0.545]
Problem #8 From the following data calculate the rank correlation coefficient after making adjustment for tied ranks.
X:   48   33   40    9   16   16   65   24   57   16
Y:   13   13   24    6   15    4   20   16    9   19
[Ans: $\rho$ (read as rho) = 0.733]
Tutorial 3: Correlation Problems
11-Jan-2014
By N V Nagendram
---------------------------------------------------------------------------------------------------------------
Problem #1 Ten competitors in a musical test were ranked by the three judges A, B and C in the following order:
Ranks by A:   1    6    5   10    3    2    4    9    7    8
Ranks by B:   4    5    8    4    7   10    2    1    6    9
Ranks by C:   6    4    9    8    1    2    3   10    5    7
Using the rank correlation method, discuss which pair of judges has the nearest approach to common likings in music.
[Ans: $\rho$ (read as rho) = 0.733]
Problem #2 A random sample of 5 college students is selected and their grades in mathematics and statistics are found to be:
Student:        1    2    3    4    5
Mathematics:   85   60   73   40   90
Statistics:    93   75   65   50   80
Calculate Spearman’s rank correlation coefficient.
[Ans. $\rho$ (read as rho) = 0.8]
Problem #3 The ranks of 16 students in maths and statistics are as follows:
Maths:   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
Stat.:   1   10    3    4    5    7    2    6    8   11   15    9   14   12   16   13
[Ans: $\rho$ (read as rho) = 0.8]
Problem #4 Following are the ranks obtained by 10 students in two subjects, statistics and maths. To what extent is the knowledge of the students in the two subjects related?
Stat.:   1    2    3    4    5    6    7    8    9   10
Maths:   2    4    1    5    3    9    7   10    6    8
[Ans: $\rho$ (read as rho) = 0.76]
Problem #5 Calculate the coefficient of correlation between the ages of husbands and the ages of wives from the bivariate frequency table given below:
Sl No.   Age of       Age of wives                                            Total
         husbands     15 - 25   25 - 35   35 - 45   45 - 55   55 - 65   65 - 75
1        15 - 25      1         1         -         -         -         -         2
2        25 - 35      2         12        1         -         -         -         15
3        35 - 45      -         4         10        1         -         -         15
4        45 - 55      -         -         3         6         1         -         10
5        55 - 65      -         -         -         2         4         2         8
6        65 - 75      -         -         -         -         1         2         3
Total                 3         17        14        9         6         4         53
[Ans. r = 0.9082]
Problem #6 Psychological tests of intelligence and of engineering ability were applied to 10 students. Here is a record of ungrouped data showing intelligence ratio (I.R) and engineering ratio (E.R). Calculate the coefficient of correlation.
Student:   A     B     C     D     E     F     G     H     I     J
I.R:       105   104   102   101   100   99    98    96    93    92
E.R:       101   103   100   98    95    96    104   92    97    94
[Ans: r = 0.59]
Problem #7 Find Karl Pearson’s coefficient of correlation from the following data:
Wages:            100   101   102   102   100   99   97   98   96   95
Cost of living:    98    99    99    97    95   92   95   94   90   91
[Ans: r = 0.847]
Problem #8 Calculate the coefficient of correlation between the age of cars and the annual maintenance cost, and comment:
Age of cars (yrs.):        2      4      6      7      8     10     12
Maintenance cost (p.a.):   1600   1500   1800   1900   1700   2100   2000
[Ans: r = 0.836]
Problem #9 With the following data for 6 cities, calculate the coefficient of correlation by Pearson’s method between the density of population and the death rate:
Cities   Area in km²   Population (’000)   Number of deaths
A        150           30                  300
B        180           90                  1440
C        100           40                  560
D        60            42                  840
E        120           72                  1224
F        80            24                  312
[Ans. r = 0.988]
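The two variables in this problem have to be derived before r can be computed. The sketch below assumes density = population / area and death rate = deaths per thousand of population, which is one natural reading of the problem; the variable and function names are assumptions made for this illustration.

```python
import math

areas_km2 = [150, 180, 100, 60, 120, 80]
population_thousands = [30, 90, 40, 42, 72, 24]
deaths = [300, 1440, 560, 840, 1224, 312]

# Density: persons per square kilometre.
density = [1000 * p / a for p, a in zip(population_thousands, areas_km2)]
# Death rate: deaths per thousand persons.
death_rate = [d / p for d, p in zip(deaths, population_thousands)]

def pearson_r(xs, ys):
    """Pearson's r from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

r = pearson_r(density, death_rate)
print(round(r, 3))  # 0.988
```

With these definitions the mean density is 450 and the mean death rate is 15, giving ΣXY = 3200, ΣX² = 175000 and ΣY² = 60, so r = 3200 / √(175000 × 60), which matches the stated answer.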