Lakireddy Bali Reddy College of Engineering, Mylavaram (Autonomous)
Master of Computer Applications (I-Semester)
MC105 - Probability and Statistical Applications
Lecture: 4 Periods/Week; Internal Marks: 40; External Marks: 60; Credits: 4; External Examination: 3 Hrs.
Faculty Name: N V Nagendram

UNIT – I Probability Theory: Sample spaces, events and probability; discrete probability; union, intersection and complements of events; conditional probability; Bayes' theorem.
UNIT – II Random Variables and Distributions: Random variables; discrete probability distributions; continuous probability distributions; mathematical expectation; Binomial, Poisson and Normal distributions. Sampling distributions: populations and samples, sums and differences; central limit theorem and related applications.
UNIT – III Estimation: Point estimation, interval estimation, Bayesian estimation. Tests of hypothesis: one-tailed and two-tailed tests; tests of hypothesis concerning means; tests of hypothesis concerning proportions; F-test; goodness of fit.
UNIT – IV Linear correlation coefficient; linear regression; non-linear regression; least-squares fit; polynomial and curve fitting.
UNIT – V Queueing theory: Markov chains; introduction to queueing systems; elements of a queueing model; exponential distribution; pure birth and death models; generalized Poisson queueing model; specialized Poisson queues.

Text Book: Probability and Statistics, T K V Iyengar, S. Chand, 3rd Edition, 2011.
References:
1. Higher Engineering Mathematics, B V Ramana, 2009 Edition.
2. Fundamentals of Mathematical Statistics, S C Gupta & V K Kapoor, Sultan Chand & Sons, New Delhi, 2009.
3. Probability & Statistics, Schaum's Outline Series, Seymour Lipschutz, TMH, New Delhi, 3rd Edition, 2009.
4. Probability & Statistics, Miller and Freund, Prentice Hall India, Delhi, 7th Edition, 2009.

Planned Topics, UNIT – IV
1. Introduction
2. Definition
3.
Types (four) of correlation
4. Methods of studying correlation
5. Scatter diagram or scattergram
6. Advantages of the scattergram
7. Simple graphs
8. Coefficient of correlation
9. Karl Pearson's coefficient of correlation
10. Properties (I, II and III) of the correlation coefficient
11. When deviations are taken from an assumed mean
12. Correlation of grouped bivariate data
13. Rank correlation coefficient
14. Properties of the rank correlation coefficient
15. Equal or repeated ranks

Chapter 4: Correlation - Regression Theory
Lecture 1: Correlation vs Regression, 04-Jan-2014, by N V Nagendram

Introduction: We are used to studying the characteristics of only one variable, such as marks, weights, heights, prices or sales. This type of analysis is called "univariate analysis".

Definition: If there exists some relationship between two variables, the statistical analysis of such data is called "bivariate analysis". Here we are interested in finding any relation between the two variables under study.

Example: We can easily conclude that there is some relation between the price and the sale, or between the price and the production, of a commodity.

Correlation refers to the relationship between two or more variables. We know that a relationship exists between the heights of a father and a son, and between wages and the price index. The study of such relations is called "correlation".

Definition: Correlation is a statistical analysis which measures and analyses the degree or extent to which two variables fluctuate with reference to each other. Correlation expresses the relationship or interdependence of two sets of variables upon each other. One variable may be called the subject (independent) and the other the relative (dependent).
Types of Correlation: Correlation is classified into the following types.
1. Positive and negative
2. Simple and multiple
3. Partial and total
4. Linear and non-linear

1. Positive and negative correlation:
Definition: If two variables tend to move together in the same direction, that is, an increase in the value of one variable is accompanied by an increase in the value of the other, or a decrease in one is accompanied by a decrease in the other, the correlation is called "positive or direct correlation".
Example: height and weight, rainfall and yield of crops, price and supply.
Definition: If two variables tend to move in opposite directions, that is, an increase in the value of one variable is accompanied by a decrease in the value of the other (or a decrease by an increase), the correlation is called "negative or inverse correlation".

2. Simple and multiple correlation:
Definition: When only two variables are studied, the relationship is described as simple correlation.
Example: quantity of money and price level, demand and price.
Definition: When more than two variables are studied simultaneously, the relationship is described as multiple correlation.
Example: the relationship of price, demand and supply of a commodity.

3. Partial and total correlation:
Definition: The study of two variables excluding some other variables is called "partial correlation".
Example: price and demand, eliminating the supply side.
Note: In total correlation, all the facts are taken into account.

4. Linear and non-linear correlation:
Definition: If the ratio of change between two variables is uniform, there is linear correlation between them.
A: 2  7  12  17
B: 3  9  15  21
The ratio of change between the variables is the same throughout (B rises by 6 for every rise of 5 in A). If we plot these values on a graph, we get a straight line.
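The "uniform ratio of change" test for linear correlation can be checked numerically. The following is a minimal sketch in Python, using the A and B values from the table above; the helper name `change_ratios` is an illustrative assumption, not part of the lecture notes.

```python
# Data from the linear-correlation table above.
A = [2, 7, 12, 17]
B = [3, 9, 15, 21]

def change_ratios(x, y):
    """Ratio of each successive change in y to the matching change in x."""
    return [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]

ratios = change_ratios(A, B)
# The correlation is linear when every ratio of change is the same.
is_linear = all(abs(r - ratios[0]) < 1e-12 for r in ratios)

print(ratios)     # [1.2, 1.2, 1.2]
print(is_linear)  # True
```

A constant ratio of 1.2 means the plotted points lie on a single straight line; for curvilinear data the ratios would differ from one step to the next.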
Definition: In a curvilinear or non-linear correlation, the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable. The graph of a non-linear or curvilinear relationship is a curve.

Methods of studying Correlation: There are two kinds of method for finding the relationship between variables.
1) Graphic methods:
(i) Scatter diagram or scattergram
(ii) Simple graph
2) Mathematical methods:
(i) Karl Pearson's coefficient of correlation
(ii) Spearman's rank correlation coefficient
(iii) Coefficient of concurrent deviation
(iv) Method of least squares

Scatter diagram / Scattergram: The scattergram is a chart obtained by plotting two variables to find out whether there is any relationship between them. The X values are plotted on the horizontal axis and the Y values on the vertical axis. From the chart we can see the scatter or concentration of the points.
[Figure: three scatter diagrams showing a high degree of positive correlation (r = +1), a high degree of negative correlation (r = -1), and no correlation]

Chapter 4: Correlation - Regression Theory
Lecture 2: Correlation, 07-Jan-2014, by N V Nagendram

Advantages of Scatter Diagram / Scattergram:
1. The scatter diagram is a simple, attractive method for finding the nature of the correlation.
2. It is easy to understand.
3. It gives a rough idea at a glance of whether the correlation is positive or negative.

Simple graph: The values of the two variables are plotted on graph paper. We get two curves, one for the X variable and another for the Y variable. These two curves reveal the direction and closeness of the two variables and also reveal whether or not the variables are related.
Definition: If both curves move in the same direction, i.e., parallel to each other, either upward or downward, the correlation is said to be positive.
Definition: If the curves move in opposite directions, one upward and the other downward, the correlation is said to be negative.
Uses: This method is used in the case of time series.
Note: This method does not reveal the extent to which the variables are related.

Coefficient of correlation: Correlation is a statistical technique used for analysing the behaviour of two or more variables. Its analysis deals with the association between two or more variables. Statistical measures of correlation relate to co-variation between series, not to functional or causal relationships.

Karl Pearson's coefficient of correlation: Karl Pearson (1857 – 1936), a British biometrician and statistician, suggested a mathematical method for measuring the magnitude of the linear relationship between two variables. This is known as the Pearsonian coefficient of correlation and is denoted by "r". It is the most widely used method and is also called the product-moment correlation coefficient. There are several formulae for calculating "r":
1. $r = \dfrac{\text{covariance of } x \text{ and } y}{\sigma_x \, \sigma_y}$
2. $r = \dfrac{\sum XY}{N \, \sigma_x \, \sigma_y}$
3. $r = \dfrac{\sum XY}{\sqrt{\sum X^2 \, \sum Y^2}}$
where $X = x - \bar{x}$ and $Y = y - \bar{y}$; $\bar{x}$, $\bar{y}$ are the means of the series x and y; $\sigma_x$ is the standard deviation of series x and $\sigma_y$ is the standard deviation of series y.

Properties of the correlation coefficient:
Property 1. The maximum absolute value of the correlation coefficient is 1, i.e., r lies between -1 and 1; symbolically, $|r| \le 1$ or $-1 \le r \le 1$.
Property 2. The coefficient of correlation is independent of the change of origin and scale of measurement, i.e.,
$r_{xy} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}} = \dfrac{\sum (u_i - \bar{u})(v_i - \bar{v})}{\sqrt{\sum (u_i - \bar{u})^2 \, \sum (v_i - \bar{v})^2}} = r_{uv}$
where u, v are obtained by a change of origin and scale of the variables x and y.
Property 3.
If X, Y are random variables and a, b, c, d are any numbers such that $a \ne 0$, $c \ne 0$, then $r(aX + b,\, cY + d) = \dfrac{ac}{|ac|}\, r(X, Y)$.
Property 4. Two independent variables are uncorrelated, i.e., if X and Y are independent variables then r(X, Y) = 0.

When deviations are taken from an assumed mean: When the actual mean is not a whole number but a fraction, or when the series is large, calculation by the direct method takes a lot of time. To avoid such tedious calculations, we can use the assumed mean method.
Formula: $r = \dfrac{\sum XY - \dfrac{\sum X \sum Y}{N}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right]}}$
where
X = deviation of the items of the x-series from an assumed mean A, i.e., X = x - A
Y = deviation of the items of the y-series from an assumed mean B, i.e., Y = y - B
N = number of items
$\sum XY$ = total of the products of the deviations of the x- and y-series from their assumed means
$\sum X^2$ = total of the squares of the deviations of the x-series from its assumed mean
$\sum Y^2$ = total of the squares of the deviations of the y-series from its assumed mean
$\sum X$ = total of the deviations of the x-series from its assumed mean
$\sum Y$ = total of the deviations of the y-series from its assumed mean
For example, when the deviations are taken from the actual means (so that $\sum X = \sum Y = 0$) and $\sum XY = 3200$, $\sum X^2 = 175000$, $\sum Y^2 = 60$, we get $r = \dfrac{3200}{\sqrt{175000 \times 60}} = \dfrac{3200}{3240.37} = 0.988$.

Correlation of grouped bivariate data: When the number of observations is very large, the data are classified into a correlation table or two-way frequency distribution. The class intervals for y are in the column headings and those for x are in the stubs. The formula for calculating the coefficient of correlation is
$r = \dfrac{\sum fXY - \dfrac{\sum fX \sum fY}{N}}{\sqrt{\left[\sum fX^2 - \dfrac{(\sum fX)^2}{N}\right]\left[\sum fY^2 - \dfrac{(\sum fY)^2}{N}\right]}}$
where f is the frequency and X, Y are the deviated values.

Rank correlation coefficient: The British psychologist Charles Edward Spearman devised a method of finding the coefficient of correlation by ranks. It is used when the characteristics cannot be measured quantitatively, as is required for Pearson's coefficient of correlation. It is based on the ranks given to the observations.
The formula for Spearman's rank correlation is
$\rho = 1 - \dfrac{6 \sum D^2}{N(N^2 - 1)}$
where $\rho$ = rank coefficient of correlation, $\sum D^2$ = sum of the squares of the differences of the two ranks, and N = number of paired observations.

Properties of the rank correlation coefficient:
1. The value of $\rho$ lies between -1 and 1, that is, $-1 \le \rho \le 1$.
2. If $\rho = 1$, there is complete agreement in the order of the ranks and the ranks run in the same direction.
3. If $\rho = -1$, there is complete disagreement in the order of the ranks and they run in opposite directions.

Procedures to solve:
A) When ranks are given:
Step 1: Compute the difference of the two ranks and denote it by D.
Step 2: Square D and obtain $\sum D^2$.
Step 3: Obtain $\rho$ by substituting the figures in the formula.
B) When ranks are not given: When the actual data are given, we must assign ranks. We can rank by taking the highest (or the lowest) value as 1, the next highest (lowest) as 2, and so on, following the same procedure for both variables.

Equal or repeated ranks: If two or more persons are bracketed equal in any classification, or if there is more than one item with the same value in the series, Spearman's formula for the rank correlation coefficient breaks down. In this case common ranks are given to the repeated items. The common rank is the average of the ranks which these items would have assumed if they were different from each other, and the next item gets the rank next to the ranks already assumed.
Example: If two individuals are placed in the 7th place, each of them is given the rank 7.5 and the next rank is 9. Similarly, if 3 are ranked at the 7th place, each is given the common rank (7 + 8 + 9)/3 = 8 and the next rank is 10.
We then use a slightly different formula:
$\rho = 1 - \dfrac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \dots\right]}{N^3 - N}$
where m is the number of items whose ranks are common, with one correction term for each group of tied items.
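The ranking procedure and the tie-corrected formula above can be sketched in Python as follows. This is an illustrative sketch, not the lecturer's method: the function names `average_ranks`, `tie_sizes` and `spearman_rho` are assumptions introduced here, and ranks are assigned with the highest value as 1. The check at the end uses the ranks of the 15 students from Tutorial 2, Problem #1, whose stated answer is 0.514.

```python
def average_ranks(values):
    """Rank values with the highest as 1; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # average of places pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def tie_sizes(ranks):
    """Sizes m of each group of repeated ranks (groups of one are ignored)."""
    counts = {}
    for r in ranks:
        counts[r] = counts.get(r, 0) + 1
    return [m for m in counts.values() if m > 1]

def spearman_rho(rank_x, rank_y):
    """rho = 1 - 6 [sum D^2 + sum (m^3 - m)/12] / (N^3 - N)."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    correction = sum((m**3 - m) / 12 for m in tie_sizes(rank_x) + tie_sizes(rank_y))
    return 1 - 6 * (sum_d2 + correction) / (n**3 - n)

# Ranks of the 15 students in subjects A and B (Tutorial 2, Problem #1).
rank_a = list(range(1, 16))
rank_b = [10, 7, 2, 6, 4, 8, 3, 1, 11, 15, 9, 5, 14, 12, 13]
rho = spearman_rho(rank_a, rank_b)
print(round(rho, 3))  # 0.514

# Two equal values in 2nd place share rank 2.5; the next rank is 4.
print(average_ranks([50, 40, 40, 30]))  # [1.0, 2.5, 2.5, 4.0]
```

When no ranks are repeated the correction term is zero and the function reduces to the basic formula, since $N^3 - N = N(N^2 - 1)$.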
Chapter 4: Correlation - Regression Theory
Lecture 3: Regression, 08-Jan-2014, by N V Nagendram

Introduction: The study of correlation measures the direction and strength of the relationship between two variables. Correlation alone does not let us estimate the value of one variable when the value of the other is given. In regression, we can estimate the value of one variable from the known value of the other.

Definition: The statistical method which helps us to estimate the unknown value of one variable from the known value of a related variable is called "regression".
Definition: The line describing the average relationship between two variables is known as the "line of regression".
Note: Nowadays the term "estimating line" is used instead of "regression line".

Uses of Regression:
1. It is used to estimate the relation between two economic variables such as income and expenditure.
2. It is a highly valuable tool in economics and business.
3. It is widely used for prediction.
4. We can calculate the coefficient of correlation and the coefficient of determination with the help of the regression coefficients.
5. It is useful in the statistical estimation of demand curves, supply curves, production functions, cost functions, consumption functions, etc.

Comparison between correlation and regression:
The correlation coefficient is a measure of the degree of covariability between two variables, while regression establishes a functional relation between the dependent and independent variables, so that the former can be predicted for a given value of the latter.
In correlation, both variables x and y are treated as random variables, whereas in regression the dependent variable is treated as random and the independent variable as fixed.
The coefficient of correlation is a relative measure, whereas the regression coefficient is an absolute figure.
Methods of studying Regression: There are two methods for studying regression:
1. Graphic method
2. Algebraic method

1. Graphic method: The points representing the pairs of values of the variables are plotted on a graph. The independent variable is taken on the X-axis and the dependent variable on the Y-axis. These points form a scatter diagram or scattergram. A regression line is drawn through these points by freehand.
X: 65  67  62  70  67  69  71
Y: 68  68  66  68  67  68  70
[Figure: scatter diagram of the data with the freehand regression lines; horizontal axis from 63 to 70, vertical axis from 62 to 71]

2. Algebraic method:
Regression line: A regression line is a straight line fitted to the data by the method of least squares. It indicates the best possible mean value of one variable corresponding to a given value of the other. There are always two regression lines for the relationship between two variables X and Y: one shows the regression of X on Y and the other shows the regression of Y on X.

Regression Equation:
Definition: A regression equation is the algebraic expression of a regression line. The topic is treated under regression equations, regression coefficients, individual observations and grouped distributions.
The standard form of the regression equation is Y = a + bX, where a and b are constants. "a" is the value of Y when X = 0 and is called the Y-intercept. "b" is the slope of the regression line and measures the change in Y for a unit change in X; it is also called the regression coefficient of Y on X. Thus, if we know the values of a and b, we can easily compute the value of Y for any given value of X. The values of a and b are found with the help of the following normal equations.
Regression equation of Y on X: $\sum Y = Na + b \sum X$ and $\sum XY = a \sum X + b \sum X^2$
Regression equation of X on Y: $\sum X = Na + b \sum Y$ and $\sum XY = a \sum Y + b \sum Y^2$

Deviations taken from the arithmetic means of X and Y: This method is easier and simpler than the previous method for finding the values of a and b.
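We find the deviations of the X and Y series from their respective means, $x = X - \bar{X}$ and $y = Y - \bar{Y}$, where $\bar{X}$ = mean of the X series and $\bar{Y}$ = mean of the Y series.
Regression equation of X on Y: $X - \bar{X} = r \dfrac{\sigma_x}{\sigma_y} (Y - \bar{Y})$
Regression equation of Y on X: $Y - \bar{Y} = r \dfrac{\sigma_y}{\sigma_x} (X - \bar{X})$
The regression coefficient of X on Y: $b_{XY} = r \dfrac{\sigma_x}{\sigma_y} = \dfrac{\sum xy}{\sum y^2}$
The regression coefficient of Y on X: $b_{YX} = r \dfrac{\sigma_y}{\sigma_x} = \dfrac{\sum xy}{\sum x^2}$
Thus $r^2 = b_{XY} \, b_{YX}$.

Deviations taken from the assumed mean: This method is used if the actual mean is a fraction. Here we take deviations from an assumed mean instead of the arithmetic mean: $dx = X - A$ and $dy = Y - B$, where A and B are the assumed means of the X and Y series.
For the regression equation of X on Y, $X - \bar{X} = r \dfrac{\sigma_x}{\sigma_y} (Y - \bar{Y})$, the value of $r \dfrac{\sigma_x}{\sigma_y}$ is found from
$r \dfrac{\sigma_x}{\sigma_y} = \dfrac{\sum dx\, dy - \dfrac{\sum dx \sum dy}{N}}{\sum dy^2 - \dfrac{(\sum dy)^2}{N}}$
For the regression equation of Y on X, $Y - \bar{Y} = r \dfrac{\sigma_y}{\sigma_x} (X - \bar{X})$,
$r \dfrac{\sigma_y}{\sigma_x} = \dfrac{\sum dx\, dy - \dfrac{\sum dx \sum dy}{N}}{\sum dx^2 - \dfrac{(\sum dx)^2}{N}}$

Regression equation in a bivariate grouped frequency distribution:
$r \dfrac{\sigma_x}{\sigma_y} = \dfrac{\sum f\, dx\, dy - \dfrac{\sum f\, dx \sum f\, dy}{N}}{\sum f\, dy^2 - \dfrac{(\sum f\, dy)^2}{N}} \times \dfrac{i_x}{i_y}$
$r \dfrac{\sigma_y}{\sigma_x} = \dfrac{\sum f\, dx\, dy - \dfrac{\sum f\, dx \sum f\, dy}{N}}{\sum f\, dx^2 - \dfrac{(\sum f\, dx)^2}{N}} \times \dfrac{i_y}{i_x}$
where $i_x$ = width of the class interval of the x variable and $i_y$ = width of the class interval of the y variable.

Angle between two regression lines: Let the lines of regression of X on Y and of Y on X be
$x - \bar{x} = r \dfrac{\sigma_x}{\sigma_y} (y - \bar{y})$, with slope $m_1 = \dfrac{\sigma_y}{r \sigma_x}$, and
$y - \bar{y} = r \dfrac{\sigma_y}{\sigma_x} (x - \bar{x})$, with slope $m_2 = r \dfrac{\sigma_y}{\sigma_x}$.
Therefore
$\tan\theta = \dfrac{m_1 - m_2}{1 + m_1 m_2} = \dfrac{\dfrac{\sigma_y}{r \sigma_x} - r \dfrac{\sigma_y}{\sigma_x}}{1 + \dfrac{\sigma_y^2}{\sigma_x^2}} = \dfrac{1 - r^2}{r} \cdot \dfrac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$
Note:
1. If $\theta$ is acute, $\tan\theta = \dfrac{1 - r^2}{|r|} \cdot \dfrac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$.
2. If $\theta$ is obtuse, $\tan\theta = \dfrac{r^2 - 1}{|r|} \cdot \dfrac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$.
3. If r = 0, then $\tan\theta = \infty$, which implies $\theta = \pi/2$. The two regression lines are perpendicular and there is no linear relationship between the two variables.
4. If $r = \pm 1$, then $\tan\theta = 0$, which implies $\theta = 0$ or $\pi$. The two regression lines coincide, since both pass through $(\bar{x}, \bar{y})$, and the correlation between the two variables is perfect.

Curvilinear regression: In the previous study of regression, one of the assumptions was that the variables X and Y are related linearly. In many cases this assumption may not be valid. A curvilinear regression may explain more of the variability of Y than a straight line.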
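The regression coefficients, the identity $r^2 = b_{XY}\, b_{YX}$ and the angle between the two regression lines described above can be checked with a short Python sketch. The function name `regression_summary` is an assumption made for illustration, and the data are the X and Y values of Tutorial 1, Problem #1, for which the notes give r = 0.95. The sketch assumes neither series is constant.

```python
import math

def regression_summary(x, y):
    """Return (b_yx, b_xy, r, theta) computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    b_yx = sxy / sxx  # regression coefficient of Y on X
    b_xy = sxy / syy  # regression coefficient of X on Y
    # r^2 = b_xy * b_yx, and r carries the common sign of the two coefficients.
    r = math.copysign(math.sqrt(b_xy * b_yx), sxy)
    if sxy == 0:
        theta = math.pi / 2  # r = 0: the two lines are perpendicular
    else:
        m1, m2 = 1 / b_xy, b_yx  # slopes of the X-on-Y and Y-on-X lines
        theta = math.atan(abs((m1 - m2) / (1 + m1 * m2)))  # acute angle
    return b_yx, b_xy, r, theta

x = [12, 9, 8, 10, 11, 13, 7]
y = [14, 8, 6, 9, 11, 12, 3]
b_yx, b_xy, r, theta = regression_summary(x, y)
print(round(b_yx, 4), round(b_xy, 4), round(r, 2))  # 1.6429 0.5476 0.95
print(round(math.degrees(theta), 2))  # small angle: the two lines nearly coincide
```

With means 10 and 9, the fitted lines are Y - 9 = 1.6429 (X - 10) and X - 10 = 0.5476 (Y - 9). Because r is close to 1, the angle between them is only a few degrees.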
Non-linear curve fitting: We now discuss fitting a power function, a polynomial of nth degree and an exponential function to the given data points $(x_i, y_i)$, i = 1, 2, 3, ..., m.

1. Power function: Let $y = a x^c$ be the function to be fitted to the given data. Taking logarithms on both sides, we get
$\log y = \log a + c \log x$
which is of the form $Y = a_0 + a_1 X$, where $a_0 = \log a$, $a_1 = c$, $Y = \log y$ and $X = \log x$. We can find $a_0$ and $a_1$ using the procedure described earlier.

2. Polynomial of nth degree: $y = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n$

3. Parabola: Taking n = 2, the curve to be fitted is the parabola $y = a_0 + a_1 x + a_2 x^2$. With m data points, the normal equations are
$\sum y_i = m a_0 + a_1 \sum x_i + a_2 \sum x_i^2$
$\sum x_i y_i = a_0 \sum x_i + a_1 \sum x_i^2 + a_2 \sum x_i^3$
$\sum x_i^2 y_i = a_0 \sum x_i^2 + a_1 \sum x_i^3 + a_2 \sum x_i^4$
Derive the normal equations to fit the parabola $y = a + bx + cx^2$: Writing A, B, C for the coefficients a, b, c and N for the number of data points, the normal equations can be written as
$\sum Y_i = NA + B \sum X_i + C \sum X_i^2$
$\sum X_i Y_i = A \sum X_i + B \sum X_i^2 + C \sum X_i^3$
$\sum X_i^2 Y_i = A \sum X_i^2 + B \sum X_i^3 + C \sum X_i^4$
or, dropping the subscripts,
$\sum Y = NA + B \sum X + C \sum X^2$
$\sum XY = A \sum X + B \sum X^2 + C \sum X^3$
$\sum X^2 Y = A \sum X^2 + B \sum X^3 + C \sum X^4$

4. Exponential function:
(i) Suppose the curve to be fitted to the given data is $y = a_0 e^{a_1 x}$. Taking logarithms on both sides, we get $\log y = \log a_0 + a_1 x$, which can be written in the form Z = A + Bx, where $Z = \log y$, $A = \log a_0$, $B = a_1$.
(ii) Let the exponential curve be $y = a b^x$. Taking logarithms on both sides, we get $\log_{10} y = \log_{10} a + x \log_{10} b$, which can be written in the form Y = A + Bx, where $Y = \log_{10} y$, $A = \log_{10} a$, $B = \log_{10} b$.
The normal equations are
$\sum Y = mA + B \sum X$
$\sum XY = A \sum X + B \sum X^2$

Chapter 4: Correlation - Regression Theory
Quiz 1, 11-Jan-2014, by N V Nagendram

1. The coefficient of correlation
a) cannot be +ve  b) cannot be -ve  c) can be either +ve or -ve  d) None  [ c ]
2. Which of the following is the full range of r?
a) 0 and 1  b) -1 and 0  c) -1 and +1  d) 1 and 1  [ c ]
3.
The coefficient of correlation is independent of
a) change of scale only  b) change of origin  c) both a and b  d) No change  [ c ]
4. The value of $r^2$ for a particular situation is 0.81. What is the coefficient of correlation?
a) 0.81  b) 0.9  c) 0.09  d) 0.085  [ b ]
5. The coefficient of correlation
a) has no limits  b) cannot be < 1  c) can be > 1  d) $-1 \le r \le 1$  [ d ]
6. The coefficient of correlation =
a) $b_{xy} \times b_{yx}$  b) $\sqrt{b_{xy} \times b_{yx}}$  c) ...  d) ...  [ b ]
7. If one regression coefficient is +ve, then the other regression coefficient is
a) +ve  b) -ve  c) = 0  d) can't say  [ a ]
8. The regression coefficient is independent of
a) origin  b) scale  c) both a and b  d) None  [ a ]
9. When the two regression lines coincide, r is
a) 0  b) -1  c) 1  d) 0.5  [ c ]
10. The two regression lines cut each other at the point of
a) average of X and Y  b) average of X only  c) average of Y only  d) None  [ a ]

Chapter 4: Correlation - Regression Theory
Tutorial 1: Correlation Problems, 07-Jan-2014, by N V Nagendram

Linear correlation / non-linear correlation:

Problem #1 Calculate the coefficient of correlation from the following data:
X: 12   9   8  10  11  13   7
Y: 14   8   6   9  11  12   3
Solution: Both series contain only a few small items, so there is no need to take deviations. We use the formula
$r = \dfrac{N \sum XY - \sum X \sum Y}{\sqrt{\left[N \sum X^2 - (\sum X)^2\right]\left[N \sum Y^2 - (\sum Y)^2\right]}}$, with N = 7.
Computation of the coefficient of correlation:
Sl no.     X     Y     X^2    Y^2    XY
1         12    14    144    196    168
2          9     8     81     64     72
3          8     6     64     36     48
4         10     9    100     81     90
5         11    11    121    121    121
6         13    12    169    144    156
7          7     3     49      9     21
Totals    70    63    728    651    676
$r = \dfrac{(676 \times 7) - (70 \times 63)}{\sqrt{(728 \times 7 - 70^2)(651 \times 7 - 63^2)}} = \dfrac{4732 - 4410}{\sqrt{(5096 - 4900)(4557 - 3969)}} = \dfrac{322}{\sqrt{196 \times 588}} = \dfrac{322}{339.48} = 0.95$
The result satisfies $-1 \le 0.95 \le 1$. Hence the solution.

Problem #2 Find whether there is any significant correlation between the heights and weights given below:
Height in inches:       57   59   62   63   64   65   55   58   57
Weight in lbs (pounds): 113  117  126  126  130  129  111  116  112
Solution: We use the formula $r = \dfrac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}}$, where X and Y are deviations from the means.
The means are $\bar{x}$ = 540 / 9 = 60 and $\bar{y}$ = 1080 / 9 = 120.
Computation of the coefficient of correlation:
Sl no.   Height x   X = x - 60   X^2   Weight y   Y = y - 120   Y^2    XY
1           57         -3          9     113         -7          49     21
2           59         -1          1     117         -3           9      3
3           62          2          4     126          6          36     12
4           63          3          9     126          6          36     18
5           64          4         16     130         10         100     40
6           65          5         25     129          9          81     45
7           55         -5         25     111         -9          81     45
8           58         -2          4     116         -4          16      8
9           57         -3          9     112         -8          64     24
Total      540          0        102    1080          0         472    216
$r = \dfrac{216}{\sqrt{102 \times 472}} = \dfrac{216}{219.42} = 0.98$
The result satisfies $-1 \le 0.98 \le 1$. Hence the solution.

Chapter 4: Correlation - Regression Theory
Tutorial 2: Correlation Problems, 07-Jan-2014, by N V Nagendram

Problem #1 The ranks of 15 students in two subjects A and B are given below, the two numbers denoting the ranks of the same student in A and B respectively.
Sl no.:  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
A:       1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
B:      10   7   2   6   4   8   3   1  11  15   9   5  14  12  13
Use Spearman's formula to find the rank correlation coefficient. [Ans.
$r_s$ = 0.514]

Problem #2 The following table gives the scores obtained by 11 students in English and Telugu translation. Find the correlation coefficient.
Scores in English: 40  46  54  60  70  80  82  85  85  90  95
Scores in Telugu:  45  45  50  43  40  75  55  72  65  42  70
[Ans. $r_s$ = 0.359]

Problem #3 The following table gives the distribution of the total population and of those among them who are totally or partially blind. Find the coefficient of correlation.
Sl no.   Age        No. of persons   Blind
1        0 – 10          100           55
2        10 – 20          60           40
3        20 – 30          40           40
4        30 – 40          36           40
5        40 – 50          24           36
6        50 – 60          11           22
7        60 – 70           6           18
8        70 – 80           3           15
[Ans. r = 0.898]

Problem #4 Find the coefficient of correlation between age and playing habit from the following data:
Sl no.   Age        No. of persons   Players
1        15 – 20        1500          1200
2        20 – 25        2000          1560
3        25 – 30        4000          2280
4        30 – 35        3000          1500
5        35 – 40        2500          1000
6        40 – 45        1000           300
7        45 – 50         800           200
8        50 – 55         500            50
9        55 – 60         200             6
[Ans. r = -0.993]

Problem #5 The following data give the marks obtained by 10 students in accountancy and statistics. Find the coefficient of correlation.
Roll no.:     1   2   3   4   5   6   7   8   9  10
Accountancy: 45  70  65  30  90  40  50  75  85  60
Statistics:  35  90  70  40  95  40  80  80  80  50
[Ans. r = 0.903]

Problem #6 Calculate the coefficient of correlation between X and Y from the following data:
X: 1  2  3  4  5  6  7
Y: 2  4  5  3  8  6  7
[Ans. r = 0.79]

Problem #7 Obtain the rank correlation coefficient for the following data:
X Y 68 62 64 58 75 68 50 45 64 81 80 60 75 68 40 55 64 48 50 70
[Ans: $\rho$ (read as rho) = 0.545]

Problem #8 From the following data calculate the rank correlation coefficient after making adjustment for tied ranks.
X Y 48 13 33 13 40 24 9 6 16 15 16 4 65 20 24 16 57 9 16 19
[Ans: $\rho$ (read as rho) = 0.733]

Chapter 4: Correlation - Regression Theory
Tutorial 3: Correlation Problems, 11-Jan-2014, by N V Nagendram

Problem #1 Ten competitors in a musical test were ranked by the three judges A, B and C in the following order:
Ranks by A: 1   6   5  10   3   2   4   9   7   8
Ranks by B: 4   5   8   4   7  10   2   1   6   9
Ranks by C: 6   4   9   8   1   2   3  10   5   7
Using the rank correlation method, discuss which pair of judges has the nearest approach to common likings in music. [Ans: $\rho$ (read as rho) = 0.733]

Problem #2 A random sample of 5 college students is selected and their grades in mathematics and statistics are found to be:
Student:      1   2   3   4   5
Mathematics: 85  60  73  40  90
Statistics:  93  75  65  50  80
Calculate Spearman's rank correlation coefficient. [Ans. $\rho$ (read as rho) = 0.8]

Problem #3 The ranks of 16 students in maths and statistics are as follows:
Maths: 1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
Stat.: 1  10   3   4   5   7   2   6   8  11  15   9  14  12  16  13
[Ans: $\rho$ (read as rho) = 0.8]

Problem #4 The following are the ranks obtained by 10 students in two subjects, statistics and maths. To what extent is the knowledge of the students in the two subjects related?
Stat.: 1   2   3   4   5   6   7   8   9  10
Maths: 2   4   1   5   3   9   7  10   6   8
[Ans: $\rho$ (read as rho) = 0.76]

Problem #5 Calculate the coefficient of correlation between the marks obtained by a batch of 100 students in accountancy and statistics as given below:
Serial Number 1 2 3 4 5 6 Total Age of Husbands 15 - 25 15 – 25 1 25 – 35 2 35 - 45 45 – 55 55 – 65 65 - 75 3 25 - 35 1 12 4 17 Age of wives 35 - 45 45 - 55 55 - 65 1 10 1 3 6 1 2 4 1 14 9 6 65 - 75 Total 2 15 15 10 2 8 2 3 4 33
[Ans. r = 0.9082]

Problem #6 Psychological tests of intelligence and of engineering ability were applied to 10 students.
Here is a record of ungrouped data showing the intelligence ratio (I.R) and engineering ratio (E.R). Calculate the coefficient of correlation.
Student:   A    B    C    D    E    F    G    H    I    J
I.R:     105  104  102  101  100   99   98   96   93   92
E.R:     101  103  100   98   95   96  104   92   97   94
[Ans: r = 0.59]

Problem #7 Find Karl Pearson's coefficient of correlation from the following data:
Wages:          100  101  102  102  100   99   97   98   96   95
Cost of living:  98   99   99   97   95   92   95   94   90   91
[Ans: r = 0.847]

Problem #8 Calculate the coefficient of correlation between the age of cars and the annual maintenance cost, and comment:
Age of cars (yrs):       2     4     6     7     8    10    12
Maintenance cost p.a.: 1600  1500  1800  1900  1700  2100  2000
[Ans: r = 0.836]

Problem #9 With the following data for 6 cities, calculate the coefficient of correlation by Pearson's method between the density of population and the death rate:
City   Area in km^2   Population ('000)   Number of deaths
A         150               30                 300
B         180               90                1440
C         100               40                 560
D          60               42                 840
E         120               72                1224
F          80               24                 312
[Ans. r = 0.988]
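Problem #9 needs two derived series before Pearson's formula can be applied. The sketch below, in Python, assumes that density means population per km^2 and that death rate means deaths per thousand of population; these definitions are assumptions, not stated in the problem, although they do reproduce the stated answer. The names `cities` and `pearson_r` are illustrative.

```python
import math

# city: (area in km^2, population in thousands, number of deaths)
cities = {
    "A": (150, 30, 300),
    "B": (180, 90, 1440),
    "C": (100, 40, 560),
    "D": (60, 42, 840),
    "E": (120, 72, 1224),
    "F": (80, 24, 312),
}

# Assumed definitions: persons per km^2, and deaths per thousand of population.
density = [pop * 1000 / area for area, pop, deaths in cities.values()]
death_rate = [deaths / pop for area, pop, deaths in cities.values()]

def pearson_r(x, y):
    """r = sum(XY) / sqrt(sum(X^2) * sum(Y^2)), deviations taken from the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_x2 = sum((a - mean_x) ** 2 for a in x)
    sum_y2 = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

r = pearson_r(density, death_rate)
print(density)      # [200.0, 500.0, 400.0, 700.0, 600.0, 300.0]
print(death_rate)   # [10.0, 16.0, 14.0, 20.0, 17.0, 13.0]
print(round(r, 3))  # 0.988
```

Under these assumptions the means are 450 and 15, giving $\sum XY = 3200$, $\sum X^2 = 175000$ and $\sum Y^2 = 60$, so r = 3200 / 3240.37, which rounds to 0.988.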