Introduction to Probability and Statistics I
Lecture 5
Chebyshev’s Theorem and Exercises
Review Example

Cruise agency – number of weekly specials to the Caribbean: 20, 73, 75, 80, 82
Compute the mean, median, and mode, and interpret your results.
Review Example:
Summary Statistics
20, 73, 75, 80, 82

Mean: \( \bar{x} = \frac{\sum x_i}{n} = \frac{330}{5} = 66 \)

Median: middlemost observation = 75

Mode: no unique mode exists
The median best describes the data because of the outlier of 20, which pulls the mean (66) below the median (75) and skews the distribution to the left. The manager should first check whether the value ‘20’ is correct.
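
The three summary statistics can be checked with a minimal sketch using Python's standard `statistics` module. The variable names are illustrative assumptions and not part of the lecture.

```python
import statistics

# Weekly specials from the cruise agency example.
weekly_specials = [20, 73, 75, 80, 82]

mean = statistics.mean(weekly_specials)        # 330 / 5
median = statistics.median(weekly_specials)    # middle value of the sorted data
modes = statistics.multimode(weekly_specials)  # every value tied for most frequent

print("mean:", mean)
print("median:", median)
# Each value occurs exactly once, so all five are returned and no unique mode exists.
print("unique mode exists:", len(modes) == 1)
```

`multimode` is used rather than `mode` because it returns every tied value, which makes the absence of a unique mode explicit.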
Review Example

Annual % returns:
common stocks:  4.0, 14.3, 19.0, -14.7, -26.5, 37.2, 23.8
treasury bills: 6.5, 4.4, 3.8, 6.9, 8.0, 5.8, 5.1

\[ \mu_{\text{stocks}} = \frac{\sum x_i}{N} = \frac{57.1}{7} \approx 8.16 \]
\[ \mu_{\text{Tbills}} = \frac{\sum x_i}{N} = \frac{40.5}{7} \approx 5.786 \]

The mean annual % return on stocks is higher than the return for U.S. Treasury bills.
Review Example

common stocks:  4.0, 14.3, 19.0, -14.7, -26.5, 37.2, 23.8
treasury bills: 6.5, 4.4, 3.8, 6.9, 8.0, 5.8, 5.1

\[ \sigma_{\text{stocks}} = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = \sqrt{\frac{(4.0 - 8.16)^2 + (14.3 - 8.16)^2 + (19 - 8.16)^2 + (-14.7 - 8.16)^2 + (-26.5 - 8.16)^2 + (37.2 - 8.16)^2 + (23.8 - 8.16)^2}{7}} = 20.648 \]

\[ \sigma_{\text{Tbills}} = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = \sqrt{\frac{(6.5 - 5.8)^2 + (4.4 - 5.8)^2 + (3.8 - 5.8)^2 + (6.9 - 5.8)^2 + (8.0 - 5.8)^2 + (5.8 - 5.8)^2 + (5.1 - 5.8)^2}{7}} = 1.362 \]

The variability of the returns on U.S. Treasury bills is much smaller than that of the returns on stocks.
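
A short sketch that recomputes the population means and standard deviations for the two return series. It assumes the seven values listed for each series are the full population, so it divides by N (`statistics.pstdev`) rather than N - 1; the variable names are illustrative.

```python
import statistics

# Annual % returns as listed on the slide.
stocks = [4.0, 14.3, 19.0, -14.7, -26.5, 37.2, 23.8]
tbills = [6.5, 4.4, 3.8, 6.9, 8.0, 5.8, 5.1]

for name, returns in (("stocks", stocks), ("T-bills", tbills)):
    mu = statistics.fmean(returns)      # population mean
    sigma = statistics.pstdev(returns)  # population standard deviation (divides by N)
    print(f"{name}: mean = {mu:.3f}, sigma = {sigma:.3f}")
```

Swapping `pstdev` for `stdev` would give the sample standard deviation instead, which is slightly larger for the same data.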
Chebyshev’s Theorem

For any population with mean μ and standard deviation σ, and k > 1, the percentage of observations that fall within the interval
\[ [\mu \pm k\sigma] \]
is at least
\[ 100\left[1 - (1/k^2)\right]\% \]
Chebyshev’s Theorem
(continued)

Regardless of how the data are distributed, at least \( (1 - 1/k^2) \) of the values will fall within k standard deviations of the mean (for k > 1)

Examples:
At least                      within
\( (1 - 1/1^2) = 0\% \)       k = 1 (μ ± 1σ)
\( (1 - 1/2^2) = 75\% \)      k = 2 (μ ± 2σ)
\( (1 - 1/3^2) = 89\% \)      k = 3 (μ ± 3σ)
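
The bound in the examples can be tabulated with a small helper. This is a minimal sketch; the function name `chebyshev_lower_bound` is an assumption introduced here for illustration.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum fraction of observations within k standard deviations of the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    # For k <= 1 the formula gives zero or a negative number, i.e. no information,
    # so the bound is clamped at 0.
    return max(0.0, 1.0 - 1.0 / k**2)

for k in (1, 2, 3):
    print(f"k = {k}: at least {100 * chebyshev_lower_bound(k):.0f}%")
```

The k = 1 row shows why the theorem is stated for k > 1: at k = 1 the guarantee is 0%, which says nothing about the data.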
The Empirical Rule

If the data distribution is bell-shaped, then the interval:
μ ± 1σ contains about 68% of the values in the population or the sample
μ ± 2σ contains about 95% of the values in the population or the sample
μ ± 3σ contains about 99.7% of the values in the population or the sample
[Figure: bell curves marking the intervals μ ± 1σ (68%), μ ± 2σ (95%), and μ ± 3σ (99.7%)]
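
The 68 / 95 / 99.7 figures can be reproduced under the assumption that "bell-shaped" means exactly normal, which is stronger than what the rule requires. For a normal distribution the coverage of μ ± kσ is erf(k / √2); the function name below is illustrative.

```python
import math

def normal_coverage(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"mu +/- {k} sigma: about {100 * normal_coverage(k):.1f}%")
```

Compared with Chebyshev's bound for the same k, these percentages are much higher because they rely on the shape of the distribution rather than holding for any data set.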
Coefficient of Variation

Measures relative variation

Always in percentage (%)

Shows variation relative to mean

Can be used to compare two or more sets of
data measured in different units
\[ CV = \left( \frac{s}{\bar{x}} \right) \cdot 100\% \]
Review Example



A random sample of data has mean = 75 and variance = 25.
Use Chebyshev’s theorem to determine the percent of observations between 65 and 85.
If the data are mound-shaped, use the empirical rule to find the approximate percent of observations between 65 and 85.
Review Example



A random sample of data has mean = 75 and variance = 25, so the standard deviation is \( \sqrt{25} = 5 \) and the interval from 65 to 85 is 75 ± 2(5), i.e. k = 2.
Chebyshev’s theorem, +/- 2 standard deviations: the proportion must be at least
\[ 100\left[1 - (1/k^2)\right]\% = 100\left[1 - (1/2^2)\right]\% = 75\% \]
The empirical rule, +/- 2 standard deviations: approximately 95% of the observations are within 2 standard deviations of the mean.
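
The step that is easy to miss in this example is converting the interval into a number of standard deviations. A minimal sketch of that conversion follows; the variable names and the lookup table of empirical-rule percentages are illustrative assumptions.

```python
import math

mean, variance = 75.0, 25.0
lower, upper = 65.0, 85.0

std_dev = math.sqrt(variance)  # variance 25 gives a standard deviation of 5
k = (upper - mean) / std_dev   # half-width of the interval in standard deviations

# The interval must be symmetric about the mean for either rule to apply directly.
assert math.isclose(mean - lower, upper - mean)

chebyshev_pct = 100 * (1 - 1 / k**2)        # holds for any distribution
empirical_rule = {1: 68, 2: 95, 3: 99.7}    # approximate, bell-shaped data only
empirical_pct = empirical_rule[round(k)]

print(f"k = {k:g}: at least {chebyshev_pct:g}% (Chebyshev), about {empirical_pct}% (empirical rule)")
```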
Comparing Coefficient of Variation

Stock A:
Average price last year = $50
Standard deviation = $5
\[ CV_A = \left( \frac{s}{\bar{x}} \right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\% \]

Stock B:
Average price last year = $100
Standard deviation = $5
\[ CV_B = \left( \frac{s}{\bar{x}} \right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\% \]

Both stocks have the same standard deviation, but stock B is less variable relative to its price.
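
The comparison can be expressed as a one-line function. This is a sketch only; the function name is an assumption, and it presumes a positive mean so that the ratio is meaningful.

```python
def coefficient_of_variation(std_dev: float, mean: float) -> float:
    """Return the standard deviation as a percentage of the mean."""
    if mean <= 0:
        raise ValueError("this sketch assumes a positive mean")
    return 100.0 * std_dev / mean

cv_a = coefficient_of_variation(5.0, 50.0)    # Stock A
cv_b = coefficient_of_variation(5.0, 100.0)   # Stock B
print(f"CV_A = {cv_a:g}%, CV_B = {cv_b:g}%")
```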
Weighted Mean

The weighted mean of a set of data is
\[ \bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{\sum w_i} \]
where \( w_i \) is the weight of the ith observation.
Use when data is already grouped into n classes, with \( w_i \) values in the ith class.
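
A minimal sketch of the weighted mean formula. The data in the usage line are hypothetical and chosen only so the arithmetic is easy to follow; the function name is likewise an assumption.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight

# Hypothetical data: class values 2, 4, 6 holding 1, 2, and 1 observations.
result = weighted_mean([2.0, 4.0, 6.0], [1, 2, 1])
print(result)  # (2 + 8 + 6) / 4
```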
Approximations for Grouped
Data
Suppose a data set contains values \( m_1, m_2, \ldots, m_K \), occurring with frequencies \( f_1, f_2, \ldots, f_K \)

For a population of N observations the mean is
\[ \mu = \frac{\sum_{i=1}^{K} f_i m_i}{N} \quad \text{where } N = \sum_{i=1}^{K} f_i \]
For a sample of n observations, the mean is
\[ \bar{x} = \frac{\sum_{i=1}^{K} f_i m_i}{n} \quad \text{where } n = \sum_{i=1}^{K} f_i \]
Approximations for Grouped
Data
Suppose a data set contains values \( m_1, m_2, \ldots, m_K \), occurring with frequencies \( f_1, f_2, \ldots, f_K \)

For a population of N observations the variance is
\[ \sigma^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \mu)^2}{N} \]
For a sample of n observations, the variance is
\[ s^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \bar{x})^2}{n - 1} \]
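
The grouped-data formulas for the mean and variance can be sketched together. The midpoints and frequencies below are hypothetical and are not taken from the lecture; the function name is an assumption.

```python
def grouped_mean_and_variances(midpoints, freqs):
    """Return (mean, population variance, sample variance) for grouped data."""
    n = sum(freqs)  # total number of observations
    mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
    sum_sq = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints))
    return mean, sum_sq / n, sum_sq / (n - 1)

# Hypothetical classes with midpoints 5, 15, 25 and frequencies 2, 5, 3.
mean, pop_var, sample_var = grouped_mean_and_variances([5, 15, 25], [2, 5, 3])
print(mean, pop_var, round(sample_var, 3))
```

The population and sample versions share the same numerator and differ only in the divisor, N versus n - 1.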
The Sample Covariance

The covariance measures the strength of the linear relationship
between two variables

The population covariance:
\[ Cov(x, y) = \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N} \]
The sample covariance:
\[ Cov(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \]
Only concerned with the strength of the relationship
No causal effect is implied
Interpreting Covariance

Covariance between two variables:
Cov(x,y) > 0: x and y tend to move in the same direction
Cov(x,y) < 0: x and y tend to move in opposite directions
Cov(x,y) = 0: x and y have no linear relationship (this alone does not imply independence)
Coefficient of Correlation

Measures the relative strength of the linear relationship
between two variables

Population correlation coefficient:
\[ \rho = \frac{Cov(x, y)}{\sigma_X \sigma_Y} \]

Sample correlation coefficient:
\[ r = \frac{Cov(x, y)}{s_X s_Y} \]
Features of
Correlation Coefficient, r

Unit free

Ranges between –1 and 1

The closer to –1, the stronger the negative linear
relationship

The closer to 1, the stronger the positive linear
relationship

The closer to 0, the weaker the linear relationship
Scatter Plots of Data with Various Correlation Coefficients
[Figure: six scatter plots of Y against X, with r = -1, r = -.6, r = 0, r = +1, r = +.3, and r = 0]
Interpreting the Result

[Figure: Scatter Plot of Test Scores, Test #2 Score against Test #1 Score, both axes running from 70 to 100]
r = .733
There is a relatively strong positive linear relationship between test score #1 and test score #2
Students who scored high on the first test tended to score high on the second test
Obtaining Linear Relationships

An equation can be fit to show the best linear relationship between two variables:
\[ Y = \beta_0 + \beta_1 X \]
where Y is the dependent variable and X is the independent variable
Least Squares Regression


Estimates for coefficients \( \beta_0 \) and \( \beta_1 \) are found to minimize the sum of the squared residuals
The least-squares regression line, based on sample data, is
\[ \hat{y} = b_0 + b_1 x \]
where \( b_1 \) is the slope of the line and \( b_0 \) is the y-intercept:
\[ b_1 = \frac{Cov(x, y)}{s_x^2} = r \frac{s_y}{s_x} \]
\[ b_0 = \bar{y} - b_1 \bar{x} \]
Review Example


The following data give X, the price charged per piece of plywood ($), and Y, the quantity sold (in thousands):
(6,80) (7,60) (8,70) (9,40) (10,0)

Compute the covariance.
Compute the correlation coefficient.
Compute and interpret the regression coefficients.
What quantity of plywood is expected to be sold if the price were $7 per piece?
Review Example

(6,80) (7,60) (8,70) (9,40) (10,0)

x      y      x - x̄    (x - x̄)^2    y - ȳ    (y - ȳ)^2    (x - x̄)(y - ȳ)
6      80     -2        4             30        900           -60
7      60     -1        1             10        100           -10
8      70      0        0             20        400             0
9      40      1        1            -10        100           -10
10      0      2        4            -50       2500          -100
Sum:
40     250     0       10              0       4000          -180

\( \bar{x} = 8.00 \), \( \bar{y} = 50.00 \)
\( s_x^2 = 2.5 \), \( s_y^2 = 1000 \)
\( s_x = 1.5811 \), \( s_y = 31.623 \)
Cov(x,y) = -45
Covariance: \( Cov(x, y) = \frac{-180}{5 - 1} = -45 \)
Correlation coefficient: \( r = \frac{Cov(x, y)}{s_x s_y} = \frac{-45}{(1.5811)(31.623)} = -.900 \). The correlation coefficient indicates the strength of the linear association between the two variables; here the association is strong and negative.
Compute and interpret regression coefficients.
What quantity of plywood is expected to be sold if the price were $7 per piece?
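
The covariance and correlation for the plywood data can be recomputed directly from the sample formulas. This is a minimal sketch written out by hand rather than with a library call, so each term of the formulas stays visible; the variable names are illustrative.

```python
import math

x = [6, 7, 8, 9, 10]      # price per piece of plywood ($)
y = [80, 60, 70, 40, 0]   # quantity sold (thousands)
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance and sample standard deviations (divisor n - 1).
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

r = cov_xy / (s_x * s_y)
print(f"Cov(x, y) = {cov_xy:g}, s_x = {s_x:.4f}, s_y = {s_y:.3f}, r = {r:.3f}")
```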
Review Example

(6,80) (7,60) (8,70) (9,40) (10,0)

Compute and interpret regression coefficients.
\[ b_1 = \frac{Cov(x, y)}{s_x^2} = \frac{-45}{2.5} = -18.0 \]
For a one dollar increase in the price per piece of plywood, the quantity sold is estimated to decrease by 18 thousand pieces.
\[ b_0 = \bar{y} - b_1 \bar{x} = 50.0 - (-18)(8.0) = 194.00 \]
What quantity of plywood is expected to be sold if the price were $7 per piece?
\[ \hat{y} = b_0 + b_1 x = 194.00 - 18.0(7) = 68 \]
That is, about 68 thousand pieces.
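
The regression coefficients and the prediction at a price of $7 can be reproduced with a short sketch of the least-squares formulas. The function name `least_squares_line` is an assumption made for illustration.

```python
def least_squares_line(x, y):
    """Return (b0, b1) for the line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    b1 = cov_xy / var_x      # slope: Cov(x, y) / s_x^2
    b0 = y_bar - b1 * x_bar  # intercept: y_bar - b1 * x_bar
    return b0, b1

prices = [6, 7, 8, 9, 10]          # price per piece ($)
quantities = [80, 60, 70, 40, 0]   # quantity sold (thousands)

b0, b1 = least_squares_line(prices, quantities)
predicted = b0 + b1 * 7            # expected quantity at a price of $7
print(f"b0 = {b0:g}, b1 = {b1:g}, predicted quantity at $7 = {predicted:g} thousand")
```

The fitted value of 68 differs from the observed 60 at a price of $7; the line describes the overall trend rather than passing through every point.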
Summary

Described measures of central tendency
    Mean, median, mode
Illustrated the shape of the distribution
    Symmetric, skewed
Described measures of variation
    Range, interquartile range, variance and standard deviation, coefficient of variation
Discussed measures of grouped data
Calculated measures of relationships between variables
    Covariance and correlation coefficient