Survey

How would you judge the pace of the lectures?

Do you find the notes meaningful?

Can you offer any suggestions for improving the slides/lectures?

Have you found the labs meaningful for the intended objectives (e.g., graphical presentation and probability)?

Can you offer any suggestions for improvement for the labs?
Survey

If you could have me go over one topic again, what would it be?

Of the integers from 1 to 10, which is your favorite number?
Statistics 400 - Lecture 7
• The distribution of scores on a standardized test can be approximated by a normal distribution with mean 500 and standard deviation 100. Find the probability that a randomly selected student scores:
  - Over 650
  - Between 325 and 675
• What proportion of students score better than 680?
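A minimal sketch of these three calculations, using Python's standard-library `statistics.NormalDist` in place of a normal table (the variable names are illustrative only). Each probability is a difference of normal CDF values, exactly as with standardizing to z-scores and reading a table.

```python
from statistics import NormalDist

# Test scores modeled as N(mean=500, sd=100), as stated on the slide
scores = NormalDist(mu=500, sigma=100)

# P(X > 650) = 1 - P(X <= 650); z = (650 - 500) / 100 = 1.5
p_over_650 = 1 - scores.cdf(650)

# P(325 < X < 675); the z-values are -1.75 and 1.75
p_between = scores.cdf(675) - scores.cdf(325)

# Proportion scoring above 680; z = (680 - 500) / 100 = 1.8
p_over_680 = 1 - scores.cdf(680)

print(f"P(X > 650)       = {p_over_650:.4f}")
print(f"P(325 < X < 675) = {p_between:.4f}")
print(f"P(X > 680)       = {p_over_680:.4f}")
```

The printed values agree with the table answers of about 0.0668, 0.9199 and 0.0359.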
Checking Normality
• Does a normal distribution reasonably approximate the distribution of the data?
• A normal probability plot (or normal scores plot) can be used to assess normality
  - It plots the sorted data against the corresponding percentiles of the standard normal distribution
• If the data are normally distributed, the plot should display an approximately straight line
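A sketch of how the points of a normal probability plot can be computed. It assumes the plotting positions p_i = (i - 0.5)/n for the i-th smallest observation; statistical software may use a slightly different convention, so its quantiles can differ a little. The function name `normal_scores` and the toy data are hypothetical.

```python
from statistics import NormalDist

def normal_scores(data):
    """Return (standard normal quantile, data value) pairs for a normal probability plot.

    Assumes plotting positions p_i = (i - 0.5) / n; other conventions exist.
    """
    n = len(data)
    std_normal = NormalDist()          # mean 0, standard deviation 1
    ordered = sorted(data)             # the i-th smallest value gets the i-th quantile
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

# Hypothetical toy data, for illustration only
for q, x in normal_scores([12, 7, 9, 15, 10]):
    print(f"{q:6.3f}  {x}")
```

Plotting the second column against the first gives the normal probability plot; points lying close to a line suggest the normal model is reasonable.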
Example
• It is believed that the distribution of scores on a standardized test can be approximated by a normal distribution
• To check this, a random sample of 15 students’ scores is taken:
403 633 315 630 505
711 546 561 574 420
428 474 473 489 565
[Figure: normal probability plot of the 15 test scores (Test Score against Normal Quantiles)]
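As a numerical companion to the plot (not something used in the lecture), one can compute the correlation r between the sorted scores and the normal quantiles; the straighter the plot, the closer r is to 1. This sketch again assumes plotting positions (i - 0.5)/n, and it does not apply any formal cutoff for r, so it only summarizes the plot rather than testing normality.

```python
from statistics import NormalDist, mean, stdev

# The 15 test scores from the slide
scores = [403, 633, 315, 630, 505,
          711, 546, 561, 574, 420,
          428, 474, 473, 489, 565]

n = len(scores)
ordered = sorted(scores)
# Standard normal quantiles at plotting positions (i - 0.5)/n (an assumed convention)
quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Pearson correlation between the quantiles and the sorted scores
mq, mx = mean(quantiles), mean(ordered)
sxy = sum((q - mq) * (x - mx) for q, x in zip(quantiles, ordered))
sxx = sum((q - mq) ** 2 for q in quantiles)
syy = sum((x - mx) ** 2 for x in ordered)
r = sxy / (sxx * syy) ** 0.5

print(f"sample mean = {mean(scores):.1f}, sample sd = {stdev(scores):.1f}")
print(f"probability-plot correlation r = {r:.3f}")
```

For these scores r comes out very close to 1 (about 0.99), which is consistent with an approximately straight plot.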
Sampling Distributions
• A parameter is a numerical feature of a distribution or population
• A statistic is a function of the sample data
• Suppose you draw a sample and compute the value of a statistic
• Suppose you then draw another sample of the same size and compute the value of the statistic again
  - Would the two statistics be equal?
• We use statistics to estimate parameters
  - Will a statistic be exactly equal to the parameter?
• The observed value of a statistic depends on the sample
• There will be variability in the values of the statistic over repeated sampling
• The probability distribution of a statistic is called its sampling distribution (or the distribution of the statistic)
  - It is based on repeated random samples of the same size from the population
• In a random sample, the observations are independent and identically distributed (iid)
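The idea of repeated sampling can be illustrated by simulation. This sketch assumes a hypothetical population (the integers 1 to 100) and draws iid samples with replacement using `random.Random.choices`; each sample gives a different value of the statistic x̄, and the collection of those values approximates its sampling distribution.

```python
import random
from statistics import mean, pstdev

rng = random.Random(400)          # fixed seed so the run is reproducible

# Hypothetical population for illustration: the integers 1..100
population = list(range(1, 101))
mu = mean(population)             # population parameter: 50.5

n, reps = 10, 2000
# Each sample is iid: draws are made with replacement
sample_means = [mean(rng.choices(population, k=n)) for _ in range(reps)]

print(f"population mean    = {mu}")
print(f"first five x-bars  = {[round(m, 1) for m in sample_means[:5]]}")
print(f"mean of the x-bars = {mean(sample_means):.2f}")
print(f"sd of the x-bars   = {pstdev(sample_means):.2f}")
```

The individual x̄ values vary from sample to sample, but they cluster around the parameter 50.5.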
Example
• A large population is described by the probability distribution

  x          0     3     12
  P(X = x)   0.2   0.3   0.5

• If a random sample of size 2 is drawn, what is the sampling distribution of the sample mean?
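One way to answer this exactly is to list every ordered sample (x1, x2), multiply the probabilities (the draws are independent), and add up the probability for each value of x̄. This sketch does that with exact fractions; the helper name `xbar_distribution` is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Population distribution from the slide: P(X=0)=0.2, P(X=3)=0.3, P(X=12)=0.5
pop = {0: Fraction(2, 10), 3: Fraction(3, 10), 12: Fraction(5, 10)}

def xbar_distribution(pop, n):
    """Exact sampling distribution of the sample mean for an iid sample of size n."""
    dist = defaultdict(Fraction)
    for sample in product(pop, repeat=n):      # every ordered sample
        prob = Fraction(1)
        for x in sample:                       # independence: multiply probabilities
            prob *= pop[x]
        dist[Fraction(sum(sample), n)] += prob
    return dict(sorted(dist.items()))

dist2 = xbar_distribution(pop, 2)
for xbar, p in dist2.items():
    print(f"x-bar = {float(xbar):4.1f}   P = {float(p):.2f}")

# Compare the mean and variance of x-bar with mu and sigma^2 / 2
mu = sum(x * p for x, p in pop.items())
sigma2 = sum((x - mu) ** 2 * p for x, p in pop.items())
e_xbar = sum(v * p for v, p in dist2.items())
var_xbar = sum((v - e_xbar) ** 2 * p for v, p in dist2.items())
print(f"mu = {float(mu)}, E(x-bar) = {float(e_xbar)}")
print(f"sigma^2/2 = {float(sigma2 / 2)}, Var(x-bar) = {float(var_xbar)}")
```

The resulting distribution puts probability 0.04, 0.12, 0.09, 0.20, 0.30 and 0.25 on x̄ = 0, 1.5, 3, 6, 7.5 and 12. Its mean equals μ = 6.9 and its variance equals σ²/2.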
Sampling Distribution of the Sample Mean
• Suppose we have a random sample $x_1, x_2, \ldots, x_n$ of size $n$
• The sample mean is
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
• What is it estimating?
Properties of the Sample Mean
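• Expected value: $E(\bar{x}) = \mu$, the population mean
• Variance: $\operatorname{Var}(\bar{x}) = \sigma^2/n$, where $\sigma$ is the population standard deviation
• Standard deviation: $\operatorname{SD}(\bar{x}) = \sigma/\sqrt{n}$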
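A short derivation of these properties, assuming only that $x_1, \ldots, x_n$ are iid with mean $\mu$ and standard deviation $\sigma$. Linearity of expectation gives the first line; independence is what lets the variances add in the second.

```latex
\begin{aligned}
E(\bar{x}) &= \frac{1}{n}\sum_{i=1}^{n} E(x_i) = \frac{1}{n}\,(n\mu) = \mu \\
\operatorname{Var}(\bar{x}) &= \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(x_i) = \frac{1}{n^2}\,(n\sigma^2) = \frac{\sigma^2}{n} \\
\operatorname{SD}(\bar{x}) &= \sqrt{\operatorname{Var}(\bar{x})} = \frac{\sigma}{\sqrt{n}}
\end{aligned}
```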
Sampling from a Normal Distribution
• Suppose we have a random sample of size $n$ from a $N(\mu, \sigma)$ distribution (mean $\mu$, standard deviation $\sigma$)
• What is the distribution of the sample mean?
Example
• The moisture content per pound of a dehydrated protein concentrate is normally distributed with mean 3.5 and standard deviation 0.6
• A random sample of 36 specimens of this concentrate is taken
  - What is the distribution of the sample mean?
  - What is the probability that the sample mean is less than 3.5?
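A sketch of this example, relying on the standard result that when the data themselves are normal, $\bar{x}$ is exactly $N(\mu, \sigma/\sqrt{n})$. Here that gives $N(3.5,\ 0.6/\sqrt{36}) = N(3.5,\ 0.1)$, and since 3.5 is the center of that distribution the answer to the second question is 1/2 by symmetry.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 3.5, 0.6, 36

# Sampling from a normal population: x-bar ~ N(mu, sigma / sqrt(n)) exactly
se = sigma / sqrt(n)                 # standard error = 0.6 / 6 = 0.1
xbar_dist = NormalDist(mu, se)

# 3.5 is the mean of x-bar, so the probability is 0.5 by symmetry
p_below = xbar_dist.cdf(3.5)
print(f"x-bar ~ N({mu}, {se:.2f}); P(x-bar < 3.5) = {p_below:.4f}")
```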
Central Limit Theorem
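• In a random sample (iid sample) from any population with mean $\mu$ and finite standard deviation $\sigma$, when $n$ is large the distribution of the sample mean $\bar{x}$ is approximately normal
• That is, $\bar{x}$ is approximately $N(\mu, \sigma/\sqrt{n})$
• Thus,
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
is approximately standard normal, $N(0, 1)$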
Implications
• So, for random samples, if we have enough data, the sample mean is approximately normally distributed, even if the data are not normally distributed
• If we have enough data, we can use the normal distribution to make probability statements about $\bar{x}$
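A small simulation can illustrate this. The sketch assumes a hypothetical, strongly skewed population (exponential with mean 1 and standard deviation 1), draws many samples of size 50, and standardizes each $\bar{x}$ as $Z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$. If the normal approximation is good, about 95% of the Z values should fall between -1.96 and 1.96.

```python
import random
from math import sqrt

rng = random.Random(7)     # fixed seed for reproducibility

# Hypothetical skewed population: exponential with mean 1 and sd 1 (clearly not normal)
mu, sigma = 1.0, 1.0
n, reps = 50, 5000

z_values = []
for _ in range(reps):
    sample = [rng.expovariate(1.0) for _ in range(n)]   # rate 1 -> mean 1
    xbar = sum(sample) / n
    z_values.append((xbar - mu) / (sigma / sqrt(n)))     # standardize x-bar

# Under the normal approximation, P(|Z| < 1.96) is about 0.95
coverage = sum(abs(z) < 1.96 for z in z_values) / reps
print(f"fraction of |Z| < 1.96: {coverage:.3f}  (normal theory: 0.950)")
```

Even though individual observations are far from normal, the standardized sample means behave very nearly like a standard normal variable.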
Example
• A busy intersection has an average of 2.2 accidents per week, with a standard deviation of 1.4 accidents
• Suppose you monitor this intersection for a given year, recording the number of accidents each week
• The data take on integer values (0, 1, 2, ...), so the distribution of the number of accidents is not normal
• What is the distribution of the mean number of accidents per week, based on a sample of 52 weeks of data?
Example
• What is the approximate probability that $\bar{x}$ is less than 2?
• What is the approximate probability that there are fewer than 100 accidents in a given year?
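A sketch of both calculations using the Central Limit Theorem: with $n = 52$, $\bar{x}$ is approximately $N(2.2,\ 1.4/\sqrt{52})$. A yearly total below 100 is the same event as $\bar{x} < 100/52$. Because the total is an integer, some treatments apply a continuity correction (for example using 99.5 instead of 100); this sketch does not, which is an assumption on our part.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2.2, 1.4, 52      # accidents per week; 52 weeks observed

# CLT: x-bar is approximately N(mu, sigma / sqrt(n)) even though weekly counts are not normal
se = sigma / sqrt(n)
xbar_dist = NormalDist(mu, se)

# P(x-bar < 2)
p_mean_below_2 = xbar_dist.cdf(2)

# Fewer than 100 accidents in 52 weeks  <=>  total < 100  <=>  x-bar < 100/52
# (no continuity correction applied here)
p_total_below_100 = xbar_dist.cdf(100 / n)

print(f"standard error          = {se:.4f}")
print(f"P(x-bar < 2)            = {p_mean_below_2:.4f}")
print(f"P(total < 100 in 52 wk) = {p_total_below_100:.4f}")
```

The standard error is about 0.194, giving roughly 0.15 for the first probability and roughly 0.077 for the second.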