Continuous distributions: normal, exponential, uniform. Correlation and regression. Curve fitting
DISCUSS CONTINUOUS DISTRIBUTIONS:
1. UNIFORM:
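A random variable X is uniformly distributed on the interval (a, b) if its pdf is given by

$$ f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases} $$

Its cdf is

$$ F(x) = \begin{cases} 0, & x < a \\ \dfrac{x-a}{b-a}, & a \le x \le b \\ 1, & x > b \end{cases} $$

E(X) = (a + b)/2
V(X) = (b − a)²/12

[Figure: graph of f(x), constant at height 1/(b − a) between a and b and 0 elsewhere; graph of F(x), rising linearly from 0 at x = a to 1 at x = b.]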
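The pdf and cdf above translate directly into two small functions. This is a minimal sketch; the names `uniform_pdf` and `uniform_cdf` are illustrative and not part of the original notes.

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """Density of the uniform distribution on [a, b]."""
    if a <= x <= b:
        return 1.0 / (b - a)
    return 0.0


def uniform_cdf(x: float, a: float, b: float) -> float:
    """F(x) = P(X <= x) for X uniform on [a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    # Linear rise from 0 at x = a to 1 at x = b
    return (x - a) / (b - a)


print(uniform_cdf(2.0, 0.0, 4.0))  # 0.5, the midpoint of [0, 4]
```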
Example:
If a wheel is spun and then allowed to come to rest, the point on the circumference
of the wheel that is located opposite a certain fixed marker could be considered the value of a
random variable X that is uniformly distributed over the circumference of the wheel. One could
then compute the probability that X will fall in any given arc.
If we assume that X is uniform on the interval [3, 6], we obtain
Average point of outcome: E[X] = (a + b)/2 = (3 + 6)/2 = 9/2 = 4.5.
Variance:
Var[X] = (b − a)²/12 = (6 − 3)²/12 = 9/12 = 3/4.
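The arithmetic for the interval [3, 6] can be checked exactly with Python's `fractions.Fraction`. This sketch also computes the probability of landing in one arc, using the fact that for a uniform variable P(c < X < d) = (d − c)/(b − a); the arc [4, 5] is a hypothetical choice for illustration.

```python
from fractions import Fraction


def uniform_mean(a: int, b: int) -> Fraction:
    # E[X] = (a + b) / 2
    return Fraction(a + b, 2)


def uniform_variance(a: int, b: int) -> Fraction:
    # Var[X] = (b - a)^2 / 12
    return Fraction((b - a) ** 2, 12)


def uniform_interval_prob(c: int, d: int, a: int, b: int) -> Fraction:
    # P(c < X < d) for a <= c <= d <= b: arc length over total length
    return Fraction(d - c, b - a)


print(uniform_mean(3, 6))                 # 9/2
print(uniform_variance(3, 6))             # 3/4
print(uniform_interval_prob(4, 5, 3, 6))  # 1/3 (hypothetical arc [4, 5])
```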
2. EXPONENTIAL:
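A random variable X is said to be exponentially distributed with parameter λ > 0 if its pdf is given by

$$ f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases} $$

where λ is the rate parameter. Its cdf is

$$ F(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-\lambda x}, & x \ge 0 \end{cases} $$

E(X) = 1/λ.
V(X) = 1/λ².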
The exponential distribution is useful for representing the lifetimes of items, interarrival times when arrivals are completely random, and service times that are highly variable.
The exponential distribution has the memoryless property:
P(X > s + t | X > s) = P(X > t), for all s, t ≥ 0.
That is, an item that has already survived s units of time is as good as new. This is why the exponential distribution can be used to model lifetimes.
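The memoryless property can be checked numerically from the survival function P(X > x) = e^(−λx). In this sketch the rate and the times s and t are hypothetical values chosen only for illustration; the conditional probability is computed as P(X > s + t)/P(X > s), since the event {X > s + t} is contained in {X > s}.

```python
import math


def exp_cdf(x: float, lam: float) -> float:
    """F(x) = 1 - exp(-lam * x) for x >= 0, and 0 for x < 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


def exp_survival(x: float, lam: float) -> float:
    """P(X > x) = exp(-lam * x) for x >= 0, and 1 for x < 0."""
    return math.exp(-lam * x) if x >= 0 else 1.0


# Hypothetical rate and time spans, for illustration only
lam, s, t = 0.5, 2.0, 3.0

# P(X > s + t | X > s) = P(X > s + t) / P(X > s)
conditional = exp_survival(s + t, lam) / exp_survival(s, lam)
unconditional = exp_survival(t, lam)
print(conditional, unconditional)  # equal up to floating-point rounding
```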
Example:
Suppose a company manufactures burettes whose lifetime is exponentially distributed with an average life of 950 days. What is the probability that a burette's lifetime is at most 1000 days?
Solution:
Let X = lifetime of a burette. X is exponential with average life E(X) = 1/λ = 950 days, i.e. λ = 1/950.

$$ P[\text{lifetime is up to 1000 days}] = P[0 < X < 1000] = \int_0^{1000} \lambda e^{-\lambda x}\,dx = \left[-e^{-\lambda x}\right]_0^{1000} = 1 - e^{-1000/950} \approx 1 - 0.349 = 0.651 $$
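The same probability follows from the cdf F(x) = 1 − e^(−λx) with λ = 1/950. A minimal sketch using only the standard library:

```python
import math

mean_life = 950.0        # days
lam = 1.0 / mean_life    # rate parameter: lambda = 1 / mean
x = 1000.0               # days

# P(0 < X < 1000) = F(1000) = 1 - exp(-lambda * 1000)
p_fail_by_1000 = 1.0 - math.exp(-lam * x)
print(round(p_fail_by_1000, 3))  # about 0.651
```

Note that the parameter passed to the exponential is the rate 1/950, not the mean 950; using the mean as λ is a common slip.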
3. NORMAL:
A random variable X with mean µ (−∞ < µ < ∞) and variance σ² > 0 has a normal distribution if its pdf is

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right], \quad -\infty < x < \infty $$

The normal distribution is used to approximate the distribution of a sum of many independent random variables (the central limit theorem). A normal random variable with µ = 0 and σ = 1 is called a standard normal r.v.; its curve is symmetric about the mean µ = 0.
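Instead of a printed table, the standard normal cdf can be evaluated with the identity Φ(z) = ½[1 + erf(z/√2)], using Python's `math.erf`. A minimal sketch; the function names are illustrative.

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """f(x) = exp(-((x - mu) / sigma)**2 / 2) / (sigma * sqrt(2*pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def std_normal_cdf(z: float) -> float:
    """Phi(z) = P(Z <= z) for a standard normal Z, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


print(std_normal_cdf(0.0))  # 0.5, by symmetry about mu = 0
```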
Example:
Let us assume that the heights of students in II M.Pharm are normally distributed with a mean of 165 cm and a standard deviation of 10 cm. What is the probability that a student's height is less than 175 cm?
Solution:
Let X = height of a student in II M.Pharm.
X is normal with mean µ = 165 and standard deviation σ = 10.
P[a student's height is less than 175 cm] = P[−∞ < X < 175].
First, convert X into the standard normal variable Z by
Z = (X − µ)/σ.
With x = 175, µ = 165 and σ = 10,
Z = (175 − 165)/10 = 1.
So when X = 175, Z = 1, and
P[−∞ < X < 175] = P[−∞ < Z < 1] = P[−∞ < Z < 0] + P[0 < Z < 1]
= 0.5 + 0.3413 = 0.8413.
Note:
1. The same question may have the following variations (the table value is P[0 < Z < z]):
P[a student's height is more than 175 cm] = P[175 < X < ∞] = P[1 < Z < ∞]
= P[0 < Z < ∞] − P[0 < Z < 1] = 0.5 − table value for Z = 1 = 0.5 − 0.3413 = 0.1587.
P[a student's height is between 165 and 175 cm] = P[165 < X < 175] = P[0 < Z < 1]
= table value for Z = 1 − table value for Z = 0 = 0.3413 − 0 = 0.3413.
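The three height probabilities can be checked by standardising with Z = (X − µ)/σ and evaluating the standard normal cdf through `math.erf` rather than a table. A minimal sketch; the helper names `phi` and `to_z` are illustrative.

```python
import math


def phi(z: float) -> float:
    # Standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mu, sigma = 165.0, 10.0


def to_z(x: float) -> float:
    # Standardise: Z = (X - mu) / sigma
    return (x - mu) / sigma


p_less_175 = phi(to_z(175))                  # P(X < 175)
p_more_175 = 1.0 - phi(to_z(175))            # P(X > 175)
p_between = phi(to_z(175)) - phi(to_z(165))  # P(165 < X < 175)
print(round(p_less_175, 4), round(p_more_175, 4), round(p_between, 4))
# 0.8413 0.1587 0.3413
```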
CORRELATION
Correlation measures whether, and how strongly, two variables are related.
A simple first step is to plot the pairs of values on a scatter diagram. From the spread and pattern of the points we can judge the degree of correlation between the two variables.
[Figure: scatter diagrams of Y against X.
Positive correlation: the points lie around a rising line.
Negative correlation: the points lie around a falling line.
Positive perfect correlation: the points lie on a rising line.
Negative perfect correlation: the points lie on a falling line.
No correlation: the points do not lie around any line.]
The Karl Pearson correlation coefficient (typically denoted by r) is a measure of the correlation (linear dependence) between two variables X and Y, taking a value between −1 and +1 inclusive. It is widely used in the sciences as a measure of the strength of linear dependence between two variables. It was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s. The coefficient is sometimes called "Pearson's r" and is given by the formula

$$ r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - \left(\sum X\right)^2}\,\sqrt{N\sum Y^2 - \left(\sum Y\right)^2}} $$

where N is the number of pairs of values.
Obtain the correlation coefficient for the following data.
X: 5  7  9  10  3  7
Y: 7  9  10  12  6  8
Sol:
X     Y     XY     X²     Y²
5     7     35     25     49
7     9     63     49     81
9     10    90     81     100
10    12    120    100    144
3     6     18     9      36
7     8     56     49     64
∑X = 41   ∑Y = 52   ∑XY = 382   ∑X² = 313   ∑Y² = 474
N = 6.
Therefore,

$$ r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - \left(\sum X\right)^2}\,\sqrt{N\sum Y^2 - \left(\sum Y\right)^2}} = \frac{6(382) - (41)(52)}{\sqrt{6(313) - 41^2}\,\sqrt{6(474) - 52^2}} = \frac{2292 - 2132}{\sqrt{197}\,\sqrt{140}} = \frac{160}{\sqrt{197}\,\sqrt{140}} \approx 0.963 $$
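The column totals and the final value can be reproduced from the raw data. This sketch recomputes each total of the table and then applies the same formula; it is a check on the arithmetic, not a different method.

```python
import math

X = [5, 7, 9, 10, 3, 7]
Y = [7, 9, 10, 12, 6, 8]
N = len(X)

# Column totals of the worked table
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)
print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 41 52 382 313 474

r = (N * sum_xy - sum_x * sum_y) / (
    math.sqrt(N * sum_x2 - sum_x ** 2) * math.sqrt(N * sum_y2 - sum_y ** 2)
)
print(round(r, 3))  # 0.963
```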
SPEARMAN’S RANK CORRELATION
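Spearman's rank correlation measures the relationship between two rankings of the same set of items. The rank correlation coefficient measures the correspondence between the two rankings and can be used to assess the significance of the relation between them. It is given by the formula

$$ R = 1 - \frac{6\sum d^2}{N(N^2 - 1)} $$

where d is the difference between the two ranks of each item and N is the number of items.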
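When the data are raw scores rather than ranks, they must first be converted to ranks. This minimal sketch ranks the largest value as 1 and assumes there are no tied values (ties need averaged ranks, which the simple formula does not handle); `ranks` and `spearman_r` are hypothetical helper names.

```python
def ranks(values: list[float]) -> list[int]:
    """Rank 1 for the largest value, 2 for the next, and so on (assumes no ties)."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_r(xs: list[float], ys: list[float]) -> float:
    """R = 1 - 6 * sum(d^2) / (N (N^2 - 1)), d = difference of paired ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Hypothetical marks of four students in two tests; same ordering gives R = 1
print(spearman_r([85, 60, 72, 90], [80, 55, 75, 88]))  # 1.0
```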
Example:
Calculate Spearman’s Rank correlation for the data
X: 10 8 1 2 6 9 3 5 4 7
Y: 6 10 5 4 3 1 2 9 8 7
X     Y     d = X − Y     d²
10    6      4            16
8     10    −2             4
1     5     −4            16
2     4     −2             4
6     3      3             9
9     1      8            64
3     2      1             1
5     9     −4            16
4     8     −4            16
7     7      0             0
                    ∑d² = 146

$$ R = 1 - \frac{6\sum d^2}{N(N^2 - 1)} = 1 - \frac{6(146)}{10(10^2 - 1)} = 1 - \frac{876}{990} \approx 0.115 $$
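Here the X and Y values are already ranks, so d can be taken directly. This sketch reproduces the d column, ∑d² and R from the table.

```python
X = [10, 8, 1, 2, 6, 9, 3, 5, 4, 7]
Y = [6, 10, 5, 4, 3, 1, 2, 9, 8, 7]
N = len(X)

d = [x - y for x, y in zip(X, Y)]       # rank differences
sum_d2 = sum(di * di for di in d)       # sum of squared differences
R = 1 - 6 * sum_d2 / (N * (N ** 2 - 1))
print(d)            # [4, -2, -4, -2, 3, 8, 1, -4, -4, 0]
print(sum_d2)       # 146
print(round(R, 3))  # 0.115
```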
REGRESSION
Regression is the procedure for finding the form of the relationship between the variables under discussion.
The term linear model is used in different ways according to the context. The most common occurrence is in connection with regression models, and the term is often taken as synonymous with linear regression model. The designation "linear" identifies a subclass of models for which a substantial reduction in the complexity of the related statistical theory is possible.
Let us consider two variables, X and Y. Taking each in turn as the independent variable, we derive two regression lines.
Regression line of X on Y [X depending on Y]:
X − X̄ = bxy (Y − Ȳ)
where
X̄ = mean of X,
Ȳ = mean of Y,
bxy = regression coefficient of X on Y = ∑xy / ∑y²,
x = X − X̄, y = Y − Ȳ.
Regression line of Y on X [Y depending on X]:
Y − Ȳ = byx (X − X̄)
where
byx = regression coefficient of Y on X = ∑xy / ∑x²,
with X̄, Ȳ, x and y defined as for the line of X on Y.
Note:
1. The regression coefficients bxy and byx always have the same sign.
2. The correlation coefficient and the regression coefficients are connected by
r = ±√(bxy · byx),
where r takes the common sign of bxy and byx.
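The relation r = ±√(bxy · byx) can be checked numerically. This sketch reuses the data of the Pearson correlation example, computes both regression coefficients from deviations about the means, and compares √(bxy · byx) (given the common sign) with r computed in deviation form, r = ∑xy / √(∑x² ∑y²).

```python
import math

X = [5, 7, 9, 10, 3, 7]
Y = [7, 9, 10, 12, 6, 8]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

# Deviations from the means: x = X - X_bar, y = Y - Y_bar
dx = [xi - x_bar for xi in X]
dy = [yi - y_bar for yi in Y]
s_xy = sum(a * b for a, b in zip(dx, dy))
s_xx = sum(a * a for a in dx)
s_yy = sum(b * b for b in dy)

b_xy = s_xy / s_yy  # regression coefficient of X on Y
b_yx = s_xy / s_xx  # regression coefficient of Y on X
r = s_xy / math.sqrt(s_xx * s_yy)

# r carries the common sign of b_xy and b_yx
r_from_b = math.copysign(math.sqrt(b_xy * b_yx), b_xy)
print(round(r, 4), round(r_from_b, 4))
```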
Example: Calculate the regression lines for the following data.
X: 6  2  10  4  8
Y: 9  11  5  8  7
Solution:
X     Y     x = X − X̄    y = Y − Ȳ    x²     y²     xy
6     9      0            1            0      1      0
2     11    −4            3           16      9    −12
10    5      4           −3           16      9    −12
4     8     −2            0            4      0      0
8     7      2           −1            4      1     −2
∑X = 30   ∑Y = 40   ∑x = 0   ∑y = 0   ∑x² = 40   ∑y² = 20   ∑xy = −26

X̄ = ∑X/N = 30/5 = 6;  Ȳ = ∑Y/N = 40/5 = 8.
Regression coefficients:
bxy = ∑xy/∑y² = −26/20 = −1.3
byx = ∑xy/∑x² = −26/40 = −0.65
Regression line of X on Y [X depending on Y]:
X − 6 = −1.3 (Y − 8)
X = −1.3Y + 16.4
Regression line of Y on X [Y depending on X]:
Y − 8 = −0.65 (X − 6)
Y = −0.65X + 11.9
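Both lines can be reproduced from the data by computing the deviation sums and rearranging each line into slope-intercept form. A minimal sketch:

```python
X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n  # 6.0, 8.0

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))  # -26
s_xx = sum((x - x_bar) ** 2 for x in X)                      # 40
s_yy = sum((y - y_bar) ** 2 for y in Y)                      # 20

b_xy = s_xy / s_yy  # X on Y
b_yx = s_xy / s_xx  # Y on X

# X - x_bar = b_xy (Y - y_bar)  ->  X = b_xy * Y + (x_bar - b_xy * y_bar)
c_x = x_bar - b_xy * y_bar
# Y - y_bar = b_yx (X - x_bar)  ->  Y = b_yx * X + (y_bar - b_yx * x_bar)
c_y = y_bar - b_yx * x_bar
print(f"X = {b_xy:.2f}Y + {c_x:.2f}")  # X = -1.30Y + 16.40
print(f"Y = {b_yx:.2f}X + {c_y:.2f}")  # Y = -0.65X + 11.90
```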
CURVE FITTING
Different types of equations or curves can be fitted to given data. The problem is to find the equation of the curve of 'best fit', which is the most suitable for predicting unknown values. This process of finding the equation of best fit is known as curve fitting.
To fit the curve we use the principle of least squares. To apply it, the form of the curve to be fitted to the statistical data must be known in advance; the principle of least squares then enables us to determine the parameters involved in the relationship connecting the variables.
Using this principle, we shall fit the following curves.
i. A straight line Y = aX + b
ii. A second-degree parabola Y = aX² + bX + c
iii. The exponential curve Y = a e^(bX)
iv. The curve Y = aX^b
v. The curve Y = a b^X
Fitting a straight line:
Let (x1, y1), (x2, y2), …, (xn, yn) be n pairs of values for which we have to determine the line of best fit. Let us take Y = aX + b (or, equivalently, Y = a + bX) as the line of best fit; the parameters 'a' and 'b' are determined by the principle of least squares. If the line is written as Y = a + bX, minimising the sum of squared errors S = ∑(Y − a − bX)² and setting ∂S/∂a = 0 and ∂S/∂b = 0 gives
∑Y = na + b∑X
∑XY = a∑X + b∑X²
These equations are called the normal equations.
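The two normal equations form a 2 × 2 linear system in a and b, so they can be solved in closed form by Cramer's rule. This is a minimal sketch for Y = a + bX; `fit_line` is a hypothetical helper name, and the function refuses data where all X values are equal, since the line is then not determined.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares fit of Y = a + bX by solving the normal equations.

    sum(Y)  = n*a      + b*sum(X)
    sum(XY) = a*sum(X) + b*sum(X^2)
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    det = n * sx2 - sx * sx  # determinant of the coefficient matrix
    if det == 0:
        raise ValueError("all X values are equal; the line is not determined")
    a = (sy * sx2 - sx * sxy) / det  # Cramer's rule for the intercept
    b = (n * sxy - sx * sy) / det    # Cramer's rule for the slope
    return a, b


# Points exactly on Y = 2 + 3X are recovered exactly
print(fit_line([0, 1, 2, 3], [2, 5, 8, 11]))
```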
Example:
1. Fit a straight line by the method of least squares to the following data, and estimate the value of Y when X = 6.
X: 1   2   3   4   5
Y: 14  27  40  55  68
Sol:
X     Y     XY     X²
1     14    14     1
2     27    54     4
3     40    120    9
4     55    220    16
5     68    340    25
∑X = 15   ∑Y = 204   ∑XY = 748   ∑X² = 55

∑Y = na + b∑X:      204 = 5a + 15b  → (I)
∑XY = a∑X + b∑X²:   748 = 15a + 55b → (II)
(I) × 3 gives       612 = 15a + 45b.
Subtracting this from (II), we get
136 = 10b, so b = 13.6.
Substituting the value of b in (I) × 3, we get
612 = 15a + 45(13.6) = 15a + 612, so a = 0.
Hence the line of best fit is
Y = 13.6X.
When X = 6, the estimated value is Y = 13.6 × 6 = 81.6.
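The elimination step can be replayed with exact fractions to confirm that a = 0 and b = 13.6 and to obtain the estimate at X = 6. This sketch mirrors the hand calculation: multiply (I) by 3, subtract it from (II), then back-substitute.

```python
from fractions import Fraction

# Normal equations from the table:
#   (I)  204 = 5a + 15b
#   (II) 748 = 15a + 55b
# Multiply (I) by 3 and subtract it from (II) to eliminate a.
lhs = 748 - 3 * 204          # 136
b_coeff = 55 - 3 * 15        # 10
b = Fraction(lhs, b_coeff)   # 68/5, i.e. 13.6
a = (204 - 15 * b) / 5       # back-substitute into (I)
print(a, float(b))           # 0 13.6
print(float(a + b * 6))      # 81.6, the estimate of Y at X = 6
```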