Continuous distributions: normal, exponential, uniform. Correlation and regression. Curve fitting

DISCUSSION OF CONTINUOUS DISTRIBUTIONS:

UNIFORM: A random variable X is uniformly distributed on the interval (a, b) if its pdf is given by

$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$

Its cdf is

$$F(x) = \begin{cases} 0, & x < a \\ \dfrac{x-a}{b-a}, & a \le x \le b \\ 1, & x > b \end{cases}$$

Its mean and variance are

$$E(X) = \frac{a+b}{2}, \qquad V(X) = \frac{(b-a)^2}{12}$$

[Figure: pdf f(x), constant at 1/(b-a) between a and b; cdf F(x), rising linearly from 0 at a to 1 at b]

Example: If a wheel is spun and then allowed to come to rest, the point on the circumference of the wheel that lies opposite a certain fixed marker can be considered the value of a random variable X that is uniformly distributed over the circumference of the wheel. One could then compute the probability that X will fall in any given arc. If we assume that X is uniform on the interval [3, 6], we obtain:

Average point of outcome: $E[X] = \frac{a+b}{2} = \frac{3+6}{2} = 4.5$

Variance: $\mathrm{Var}[X] = \frac{(b-a)^2}{12} = \frac{(6-3)^2}{12} = \frac{9}{12} = \frac{3}{4}$

EXPONENTIAL: A random variable X is said to be exponentially distributed if its pdf is given by

$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

where $\lambda > 0$ is the parameter. Its cdf is

$$F(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-\lambda x}, & x \ge 0 \end{cases}$$

Its mean and variance are

$$E(X) = \frac{1}{\lambda}, \qquad V(X) = \frac{1}{\lambda^2}$$

The exponential distribution is useful for representing the lifetimes of items, for modelling interarrival times when arrivals are completely random, and for service times that are highly variable. It has a property called the memoryless property, given by

$$P(X > s + t \mid X > s) = P(X > t)$$

That is, an item that has already survived s units of time has the same distribution of remaining life as a new item. This property is why the exponential distribution is commonly used to model lifetimes.

Example: Let us assume that a company manufactures burettes whose lifetime is exponential with an average life of 950 days. What is the probability that a burette is in working condition for up to 1000 days?

Solution: X = lifetime of the burette is exponential with average life 950 days. Since $E(X) = 1/\lambda = 950$, the parameter is $\lambda = 1/950$.

$$P[\text{lifetime is up to 1000 days}] = P[0 < X < 1000] = \int_0^{1000} \lambda e^{-\lambda x}\,dx = \int_0^{1000} \frac{1}{950} e^{-x/950}\,dx$$

$$= \left[-e^{-x/950}\right]_0^{1000} = 1 - e^{-1000/950} \approx 1 - 0.349 = 0.651$$
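As a quick numerical check of the uniform and exponential formulas above, here is a minimal Python sketch. It evaluates the mean and variance of a uniform variable on [3, 6] and the probability that an exponential lifetime with mean 950 days (so rate λ = 1/950) is at most 1000 days. The function names `uniform_mean_var` and `exponential_cdf` are illustrative assumptions, not part of the original notes.

```python
import math


def uniform_mean_var(a, b):
    """Return (mean, variance) of a uniform distribution on [a, b]."""
    return (a + b) / 2, (b - a) ** 2 / 12


def exponential_cdf(x, rate):
    """Return P(X <= x) for an exponential distribution with the given rate (lambda)."""
    if x < 0:
        return 0.0
    return 1 - math.exp(-rate * x)


# Uniform example on [3, 6]
mean, var = uniform_mean_var(3, 6)
print(mean, var)  # 4.5 0.75

# Exponential example: a mean life of 950 days means rate = 1/950 (not 950)
rate = 1 / 950
p = exponential_cdf(1000, rate)
print(round(p, 3))  # 0.651
```

The key design point is that the exponential distribution is parameterised by its rate, which is the reciprocal of the mean life.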
NORMAL: A random variable X with mean $\mu$ ($-\infty < \mu < \infty$) and variance $\sigma^2 > 0$ has a normal distribution if its pdf is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right], \qquad -\infty < x < \infty$$

A normal distribution is often appropriate when the quantity of interest is the sum of many random variables. A normal random variable with $\mu = 0$ and $\sigma = 1$ is called a standard normal r.v. Its curve is symmetric about the mean $\mu = 0$.

Example: Let us assume that the heights of students in II M.Pharm are normally distributed with an average of 165 cm and a standard deviation of 10 cm. What is the probability that a student's height is less than 175 cm?

Solution: Let X = height of a student in II M.Pharm. It is normal with mean $\mu = 165$ and standard deviation $\sigma = 10$.

P[a student's height is less than 175 cm] = $P[-\infty < X < 175]$

First, we convert X into Z using $Z = \frac{x - \mu}{\sigma}$. We have x = 175, $\mu = 165$, $\sigma = 10$, so

$$Z = \frac{175 - 165}{10} = 1$$

So when X = 175, Z = 1, and

$$P[-\infty < X < 175] = P[-\infty < Z < 1] = P[-\infty < Z < 0] + P[0 < Z < 1] = 0.5 + 0.3413 = 0.8413 \approx 0.84$$

Note: The same question may have the following variations:

P[a student's height is more than 175 cm] = $P[175 < X < \infty] = P[1 < Z < \infty] = P[0 < Z < \infty] - P[0 < Z < 1]$ = 0.5 - table value = 0.5 - 0.3413 = 0.1587

P[a student's height is between 165 and 175 cm] = $P[165 < X < 175] = P[0 < Z < 1]$ = table value for Z = 1 (corresponding to X = 175) - table value for Z = 0 (corresponding to X = 165) = 0.3413 - 0 = 0.3413

CORRELATION

Correlation is a measure of whether, and how strongly, two variables are related. We can start by plotting their paired values in a scatter diagram. Plot each pair of values as a point; from the level and pattern of the scatter of the points we can judge the amount of correlation between the two variables.
[Figure: five scatter diagrams of Y against X. Positive correlation (points lie around a rising line); Negative correlation (points lie around a falling line); Perfect positive correlation (points lie on a rising line); Perfect negative correlation (points lie on a falling line); No correlation (points do not lie around any line)]

The Karl Pearson correlation coefficient (typically denoted by r) is a measure of the correlation (linear dependence) between two variables X and Y, giving a value between +1 and -1 inclusive. It is widely used in the sciences as a measure of the strength of linear dependence between two variables. It was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s. The correlation coefficient is sometimes called "Pearson's r" and is given by the formula

$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - \left(\sum X\right)^2}\,\sqrt{N\sum Y^2 - \left(\sum Y\right)^2}}$$

Example: Obtain the correlation coefficient for the following data.

X:  5    7    9    10   3    7
Y:  7    9    10   12   6    8

Sol:

X       Y       XY       X^2       Y^2
5       7       35       25        49
7       9       63       49        81
9       10      90       81        100
10      12      120      100       144
3       6       18       9         36
7       8       56       49        64
Σ=41    Σ=52    Σ=382    Σ=313     Σ=474

N = 6. Therefore,

$$r = \frac{6(382) - (41)(52)}{\sqrt{6(313) - (41)^2}\,\sqrt{6(474) - (52)^2}} = \frac{2292 - 2132}{\sqrt{197}\,\sqrt{140}} = \frac{160}{166.07} \approx 0.963$$

SPEARMAN'S RANK CORRELATION

Spearman's rank correlation studies the relationship between different rankings of the same set of items. A rank correlation coefficient measures the correspondence between two rankings and can be used to assess its significance. It is given by the formula

$$\rho = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)}$$

where d is the difference between the two ranks of an item and n is the number of items.

Example: Calculate Spearman's rank correlation for the data

X        10    8     1     2     6     9     3     5     4     7
Y        6     10    5     4     3     1     2     9     8     7
d=X-Y    4     -2    -4    -2    3     8     1     -4    -4    0
d^2      16    4     16    4     9     64    1     16    16    0      Σd^2 = 146

$$\rho = 1 - \frac{6(146)}{10\left(10^2 - 1\right)} = 1 - \frac{876}{990} = 0.115$$

REGRESSION

Regression is the procedure for obtaining the type of relation that exists between the variables under discussion. The term linear model is used in different ways according to the context.
The most common occurrence is in connection with regression models, and the term is often taken as synonymous with linear regression model. The designation "linear" is used to identify a subclass of models for which a substantial reduction in the complexity of the related statistical theory is possible.

Let us consider two variables, X and Y. Since either variable may be treated as the independent one, we derive one equation for each case.

Regression line of X on Y [X depending on Y]:

$$X - \bar{X} = b_{xy}\,(Y - \bar{Y})$$

where $\bar{X}$ is the mean of X, $\bar{Y}$ is the mean of Y, and

$$b_{xy} = \text{regression coefficient of X on Y} = \frac{\sum xy}{\sum y^2}, \qquad x = X - \bar{X}, \quad y = Y - \bar{Y}$$

Regression line of Y on X [Y depending on X]:

$$Y - \bar{Y} = b_{yx}\,(X - \bar{X})$$

where $\bar{X}$ is the mean of X, $\bar{Y}$ is the mean of Y, and

$$b_{yx} = \text{regression coefficient of Y on X} = \frac{\sum xy}{\sum x^2}, \qquad x = X - \bar{X}, \quad y = Y - \bar{Y}$$

Note:
1. The regression coefficients $b_{xy}$ and $b_{yx}$ are of the same sign.
2. The correlation coefficient and the regression coefficients are connected by $r = \pm\sqrt{b_{xy}\,b_{yx}}$, where r takes the common sign of the two regression coefficients.

Example: Calculate the regression lines for the following data.

X:  6    2     10    4    8
Y:  9    11    5     8    7

Solution:

X       Y       x=X-mean(X)   y=Y-mean(Y)   x^2     y^2     xy
6       9       0             1             0       1       0
2       11      -4            3             16      9       -12
10      5       4             -3            16      9       -12
4       8       -2            0             4       0       0
8       7       2             -1            4       1       -2
Σ=30    Σ=40    Σ=0           Σ=0           Σ=40    Σ=20    Σ=-26

$$\bar{X} = \frac{\sum X}{N} = \frac{30}{5} = 6; \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{40}{5} = 8$$

Regression coefficients:

$$b_{xy} = \frac{\sum xy}{\sum y^2} = \frac{-26}{20} = -1.3, \qquad b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{-26}{40} = -0.65$$

Regression line of X on Y [X depending on Y]:
X - 6 = -1.3 (Y - 8)
X = -1.3Y + 16.4

Regression line of Y on X [Y depending on X]:
Y - 8 = -0.65 (X - 6)
Y = -0.65X + 11.9

CURVE FITTING

Different types of equations or curves can be obtained from a given set of data. The problem is to find the equation of the curve of "best fit", the one most suitable for predicting unknown values. This process of finding an equation of best fit is known as curve fitting. For fitting the curve we use the principle of least squares. The form of the curve to be fitted to the data must be known before the principle of least squares can be applied.
The principle of least squares enables us to determine the parameters involved in the relationship connecting the variables. Using this principle, we shall fit the following curves.

i. A straight line $Y = aX + b$
ii. A second-degree parabola $Y = aX^2 + bX + c$
iii. The exponential curve $Y = a e^{bX}$
iv. The curve $Y = a X^b$
v. The curve $Y = a b^X$

(i) Fitting a straight line: Suppose $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are n pairs of values and we have to determine the line of best fit for this data. Let us assume $Y = aX + b$ (or) $Y = a + bX$ as the line of best fit. Using the principle of least squares, we can determine the parameters a and b. If the curve is $Y = a + bX$, it can be shown that a and b are determined by the equations

$$\sum Y = na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2$$

These equations are called the normal equations.

Example: 1. Fit a straight line by the method of least squares to the following data, and estimate the value of Y when X = 6.

X:  1     2     3     4     5
Y:  14    27    40    55    68

Sol:

X        Y         XY         X^2
1        14        14         1
2        27        54         4
3        40        120        9
4        55        220        16
5        68        340        25
ΣX=15    ΣY=204    ΣXY=748    ΣX^2=55

Here n = 5, so the normal equations become

$\sum Y = na + b\sum X$:  204 = 5a + 15b    (1)
$\sum XY = a\sum X + b\sum X^2$:  748 = 15a + 55b    (2)

Taking (1) x 3 - (2) x 1, we get

612 = 15a + 45b
748 = 15a + 55b
-136 = -10b
b = 13.6

Substituting the value of b into (1) x 3, we get 612 = 15a + 45(13.6) = 15a + 612, so a = 0.

Hence Y = 13.6X, and the estimated value of Y when X = 6 is Y = 13.6 x 6 = 81.6.
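The elimination above can be checked with a short Python sketch that solves the two normal equations for Y = a + bX directly from the sums. The helper name `fit_line` is hypothetical and introduced only for illustration. The closed-form expressions in the code follow from eliminating a between the two normal equations.

```python
def fit_line(xs, ys):
    """Least-squares fit of Y = a + b*X via the two normal equations."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Normal equations:
    #   sum_y  = n*a     + b*sum_x
    #   sum_xy = a*sum_x + b*sum_xx
    # Eliminating a gives b; back-substitution gives a.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


xs = [1, 2, 3, 4, 5]
ys = [14, 27, 40, 55, 68]
a, b = fit_line(xs, ys)
estimate = a + b * 6  # predicted Y at X = 6

# Values are approximately a = 0, b = 13.6, estimate = 81.6
print(round(a, 6), round(b, 6), round(estimate, 6))
```

The same function can be reused for any data set with at least two distinct X values. With fewer, the denominator is zero and the line is not determined.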