STAT 5372: Experimental Statistics
Wayne Woodward
Office: 143 Heroy
Phone: (214)768-2457
e-mail: [email protected]
URL: faculty.smu.edu/waynew
Hours: 2:00 - 3:00 MWF
3:00 - 4:00 Th
- others by appointment
On a sheet of paper:
• Name
• Major (undergraduate/graduate)
• Previous stat courses:
– STAT 5371?
– STAT/CSE/EMIS 4340?
– other – describe briefly
• Have you used SAS?
Review
• Sampling Distributions
• Statistical Inference
– Confidence Intervals
– Hypothesis Tests
Sampling / Sampling Distributions
• Population -- totality of all observations of interest
• Random Variable (rv) -- a characteristic that can
take on different values from object to object
• Sample -- subset of a population
– random sample: observations made independently
and at random
Y₁, Y₂, … , Yₙ – typical notation for a random sample

Parameter – a characteristic of a population
-- population mean (μ), standard deviation (σ), …
Random Variables
• Discrete – you can count the possible outcomes
– Discrete distributions: binomial, Poisson, …
• Continuous – possible values fall along a continuum
– Continuous distributions: normal (Gaussian), chi-square, t, F, …
Normal Curve:
-- symmetric, bell-shaped
-- for this particular distribution:
- data concentrated about 60
- very few data values above 100 or less than 20
Standard Normal (Z-score)
$Z = \frac{X - \mu}{\sigma}$
• Has mean zero and standard deviation 1
• Graph of standard normal is symmetric about 0
• Normal table gives P[Z ≤ z]
Find: P[Z≤ 2.5]
P[Z > 1.6]
Suppose μ = 50 and σ = 10.
Find: P[X≤ 45]
P[X > 70]
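These four probabilities can be checked with Python's standard-library `statistics.NormalDist`, which plays the role of the normal table. The sketch assumes X is normal with μ = 50 and σ = 10, as stated above; the results should agree with the table values .9938, .0548, .3085 and .0228.

```python
from statistics import NormalDist

Z = NormalDist()                  # standard normal: mean 0, sd 1
X = NormalDist(mu=50, sigma=10)   # X ~ Normal(mu = 50, sigma = 10)

p1 = Z.cdf(2.5)                   # P[Z <= 2.5]
p2 = 1 - Z.cdf(1.6)               # P[Z > 1.6]
p3 = X.cdf(45)                    # P[X <= 45] = P[Z <= (45 - 50)/10] = P[Z <= -0.5]
p4 = 1 - X.cdf(70)                # P[X > 70]  = P[Z > (70 - 50)/10]  = P[Z > 2]

for label, p in [("P[Z <= 2.5]", p1), ("P[Z > 1.6]", p2),
                 ("P[X <= 45]", p3), ("P[X > 70]", p4)]:
    print(f"{label}: {p:.4f}")
```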
Statistic - function of random variables
- typically used to estimate parameters
Examples of Statistics:
$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ – sample mean
$S^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}$ – sample variance
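The two statistics translate directly into code. In this sketch the helper names `sample_mean` and `sample_variance` and the four data values are hypothetical, chosen only for illustration; the stdlib `statistics.variance` also divides by n − 1, so it serves as a cross-check.

```python
from statistics import mean, variance


def sample_mean(y):
    """Ybar = (1/n) * sum of Y_i."""
    return sum(y) / len(y)


def sample_variance(y):
    """S^2 = sum of (Y_i - Ybar)^2, divided by n - 1."""
    n = len(y)
    ybar = sample_mean(y)
    return sum((yi - ybar) ** 2 for yi in y) / (n - 1)


data = [4.0, 7.0, 9.0, 10.0]   # hypothetical sample
print(sample_mean(data), sample_variance(data))

# Cross-check against the standard library (both use the n - 1 divisor).
assert sample_mean(data) == mean(data)
assert abs(sample_variance(data) - variance(data)) < 1e-12
```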
Key Concept
Statistics are random variables and
have their own distributions
- called sampling distributions
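One way to see that a statistic has its own distribution is to simulate it. The sketch below assumes a normal population with μ = 50 and σ = 10 and samples of size n = 25; the seed and the 10,000 repetitions are arbitrary choices. It draws many samples, records each sample mean, and looks at how those means are spread.

```python
import random
from statistics import mean, stdev

random.seed(5372)                      # fixed seed so the run is reproducible
MU, SIGMA, N, REPS = 50, 10, 25, 10_000

# Each entry is the mean of one independent sample of size N.
means = [sum(random.gauss(MU, SIGMA) for _ in range(N)) / N for _ in range(REPS)]

avg = mean(means)    # should be close to MU
sd = stdev(means)    # should be close to SIGMA / sqrt(N) = 2
print(f"average of the sample means: {avg:.2f}")
print(f"sd of the sample means:      {sd:.2f}")
```

The spread of the simulated means comes out near 2 rather than 10, which is the point of the sampling-distribution results that follow.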
Sampling Distribution of the Sample Mean
IF:
• Data are Normally distributed
• Observations are independent
Then:
The Sample Mean has a Normal Probability Distribution with
-- Mean = μ
-- Standard Error = σ/√n
$Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$
has a standard normal distribution
Suppose μ = 50 and σ = 10 for a normal population and suppose further that a random sample of size n = 25 is taken.
Find:
P[ X  45]
P[ X  47.5]
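Because the population is normal, the sample mean is exactly normal with mean 50 and standard error 10/√25 = 2. The sketch below computes both the lower-tail and the upper-tail probability at each cutoff; since the distribution is continuous, "<" and "≤" (or ">" and "≥") give the same answer.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 10, 25
se = sigma / sqrt(n)                 # standard error of the sample mean = 2
xbar_dist = NormalDist(mu, se)       # exact sampling distribution of Xbar

for c in (45, 47.5):
    lower = xbar_dist.cdf(c)         # P[Xbar <= c]
    upper = 1 - lower                # P[Xbar >= c]
    z = (c - mu) / se                # the same cutoff on the Z scale
    print(f"c = {c}: z = {z:.2f}, P[Xbar <= c] = {lower:.4f}, P[Xbar >= c] = {upper:.4f}")
```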
Central Limit Theorem
IF:
• Independent Observations
• Sample Size is Sufficiently Large
Then:
The Sample Mean (Ȳ) is Approximately Normally Distributed with
-- Mean = μ
-- Standard Error = σ/√n
$Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$
has an approximate Standard Normal distribution
Suppose μ = 50 and σ = 10 for a non-normal population and suppose further that a random sample of size n = 50 is taken.
Find:
P[ X  52]
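Here the population is not normal, so the normal model for the sample mean is only the Central Limit Theorem approximation, with standard error 10/√50 ≈ 1.414. As a sketch, both tail probabilities at 52 are computed:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n, c = 50, 10, 50, 52
se = sigma / sqrt(n)                 # about 1.414
approx = NormalDist(mu, se)          # CLT approximation to the distribution of Xbar

lower = approx.cdf(c)                # approximate P[Xbar <= 52]
upper = 1 - lower                    # approximate P[Xbar >= 52]
print(f"z = {(c - mu) / se:.2f}, P[Xbar <= 52] ~ {lower:.4f}, P[Xbar >= 52] ~ {upper:.4f}")
```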
Distribution of Sample Mean – σ Unknown
IF:
• Data Values are Normally Distributed
• Observations are Independent
Then:
$t = \frac{\bar{Y} - \mu}{S/\sqrt{n}}$
has a Student’s t distribution with n − 1 df
t-distribution -- Figure 5.16, page 229
tα Notation
tα is the value of t that has area α to its right (upper tail)
t.05,20 =
t.01,15 =
t.9,18 =
zα is obtained from the bottom (df = ∞) row of the t-table
z.05 =
z.025 =
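Python's standard library has no t distribution, so the sketch below uses hypothetical helpers (`t_cdf`, `t_alpha`, `z_alpha`, not a library API). `t_cdf` integrates the t density with Simpson's rule, `t_alpha` finds the value with area α to its right by bisection, and `z_alpha` uses `statistics.NormalDist`, which matches the df = ∞ row. The output should reproduce the usual table values: t.05,20 ≈ 1.725, t.01,15 ≈ 2.602, t.9,18 ≈ −1.330, z.05 ≈ 1.645, z.025 ≈ 1.960.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist


def t_cdf(x, df, steps=1000):
    """P[T <= x] for Student's t with df degrees of freedom.

    Uses symmetry about 0 plus Simpson's rule for the area between 0 and |x|.
    """
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)

    def pdf(u):
        return exp(log_c - (df + 1) / 2 * log(1 + u * u / df))

    a = abs(x)
    h = a / steps                    # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3             # area under the density from 0 to |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_alpha(alpha, df):
    """Critical value t_alpha: the value with area alpha to its right."""
    if alpha > 0.5:
        return -t_alpha(1 - alpha, df)   # symmetry: t_alpha = -t_(1 - alpha)
    lo, hi = 0.0, 100.0                  # assumes the answer lies in [0, 100]
    for _ in range(50):                  # bisection on P[T > t] = alpha
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha:
            lo = mid                     # too much area to the right: move up
        else:
            hi = mid
    return (lo + hi) / 2


def z_alpha(alpha):
    """z_alpha: standard normal value with area alpha to its right (df = infinity)."""
    return NormalDist().inv_cdf(1 - alpha)


print(f"t_.05,20 = {t_alpha(0.05, 20):.3f}")
print(f"t_.01,15 = {t_alpha(0.01, 15):.3f}")
print(f"t_.9,18  = {t_alpha(0.9, 18):.3f}")
print(f"z_.05    = {z_alpha(0.05):.3f}")
print(f"z_.025   = {z_alpha(0.025):.3f}")
```

Note that t.9,18 is negative: 90% of the area lies to its right, so it equals −t.1,18.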
(1 − α)×100% Confidence Intervals for μ
Setting:
• Data are Normally Distributed
• Observations are Independent
• We want an interval that probably contains the population mean μ
Case 1: σ known
$\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Case 2: σ unknown
$\bar{X} - t_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2}\frac{s}{\sqrt{n}}$ (df = n − 1)
CI Example
An insurance company is concerned about the number and magnitude of hail damage claims it received this year. A random sample of 20 of the thousands of claims it received this year showed an average claim amount of $6,500 and a standard deviation of $1,500. (You can assume that claims have a normal distribution.)
Find a 95% confidence interval on the mean claim damage amount.
Suppose that company actuaries believe the company does not need to increase insurance rates for hail damage if the mean claim damage amount is no greater than $7,000. Use the above information to make a recommendation regarding whether rates should be raised.
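Since σ is unknown, this is Case 2 with df = 20 − 1 = 19. The sketch below hard-codes t.025,19 = 2.093 from a t-table and then checks where the $7,000 threshold falls relative to the interval. The three-way verdict is one way to frame the recommendation, not the only one.

```python
from math import sqrt

n, xbar, s = 20, 6500.0, 1500.0
t_crit = 2.093                       # t_.025,19 from a t-table

half_width = t_crit * s / sqrt(n)
lower, upper = xbar - half_width, xbar + half_width
print(f"95% CI: (${lower:,.0f}, ${upper:,.0f})")

threshold = 7000.0                   # actuaries' cutoff for the mean claim amount
if upper <= threshold:
    verdict = "the whole interval is at or below $7,000"
elif lower > threshold:
    verdict = "the whole interval is above $7,000"
else:
    verdict = "the interval straddles $7,000"
print(verdict)
```

The interval runs from about $5,798 to $7,202. Because it contains values on both sides of $7,000, these data alone do not establish that the mean claim exceeds the threshold.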
Last time we found 95% CI to be:
($5798, $7202)
What does this mean?
“There is a .95 probability that the population mean (μ) is between $5798 and $7202”?
Not exactly.
Interpretation of 95% Confidence Interval
100 different 95% CIs plotted for a case in which the true mean is 80
i.e. about 95% of these confidence intervals should “cover” the true mean
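The coverage idea can be checked by simulation. This sketch keeps the true mean at 80 but assumes, for illustration only, normal data with σ = 10, samples of size n = 10, an arbitrary seed, and 10,000 intervals. Each interval uses the σ-unknown formula with t.025,9 = 2.262 from a t-table, and the code counts how often the interval covers 80.

```python
import random
from math import sqrt

random.seed(80)                                    # arbitrary seed
TRUE_MEAN, SIGMA, N, REPS = 80, 10, 10, 10_000     # SIGMA, N, REPS are hypothetical
T_CRIT = 2.262                                     # t_.025,9 from a t-table

covered = 0
for _ in range(REPS):
    y = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    ybar = sum(y) / N
    s = sqrt(sum((yi - ybar) ** 2 for yi in y) / (N - 1))   # sample sd
    half = T_CRIT * s / sqrt(N)
    if ybar - half <= TRUE_MEAN <= ybar + half:
        covered += 1

coverage = covered / REPS
print(f"fraction of intervals covering the true mean: {coverage:.3f}")
```

Any single interval either covers 80 or it does not. The 95% describes the long-run fraction of intervals built this way that do.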
A better statement:
“About 95% of confidence intervals obtained
in this manner will cover the true mean.”
We say:
“we are 95% confident that the mean
falls in the interval … ”
Concern has been mounting
that SAT scores are falling.
• 3 years ago -- National AVG = 955
• Random Sample of 200 graduating high school
students this year (sample average = 935)
(each year the standard deviation is about 100)
Question: Have SAT scores dropped?
Procedure: Determine how “extreme” or “rare” our
sample AVG of 935 is if population AVG really is 955.
If Population average = 955, what is the probability of
getting a sample average (from a sample of size 200)
that is less than or equal to 935?
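Treating the stated standard deviation of 100 as the known σ, the sample mean of 200 scores has standard error 100/√200 ≈ 7.07 under the assumption that the population average is still 955. The sketch computes that probability:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar = 955, 100, 200, 935
se = sigma / sqrt(n)                 # about 7.07
z = (xbar - mu0) / se                # about -2.83
p = NormalDist().cdf(z)              # P[sample average <= 935 | population average = 955]
print(f"z = {z:.3f}, probability = {p:.4f}")
```

The probability is only about .002, so a sample average this low would be quite rare if the population average were really 955.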
We must decide:
• The sample came from population with population
AVG = 955 and just by chance the sample AVG is
“small.”
OR
• We are not willing to believe that the pop. AVG
this year is really 955. (Conclude SAT scores
have fallen.)
Hypothesis Testing Terminology
Statistical Hypothesis
- statement about the parameters of
one or more populations
Null Hypothesis (H0)
- hypothesis to be “tested”
(standard, traditional, claimed, etc.)
- hypothesis of no change, effect, or
difference
(usually what the investigator wants to disprove)
Alternative Hypothesis (Ha)
- null is not correct
(usually the hypothesis the investigator suspects or wants to show)
Basic Hypothesis Testing Question:
Do the Data provide sufficient evidence to
refute the Null Hypothesis?
Test Statistic
- measures how far the observed statistic is
from the hypothesized parameter (under H0)
Example: H0: μ = 50
Test statistic: $t = \frac{\bar{X} - 50}{s/\sqrt{n}}$
Hypothesis Testing (cont.)
Critical Region (Rejection Region)
- region of test statistic that leads to
rejection of null (i.e. t > c, etc.)
Critical Value
- endpoint of critical region
Significance Level
- probability that the test statistic will
be in the critical region if null is true
- probability of rejecting H0 when it is true
Types of Hypotheses
One-Sided Tests
H0: μ ≤ μ0 vs. Ha: μ > μ0
H0: μ ≥ μ0 vs. Ha: μ < μ0
Two-Sided Tests
H0: μ = μ0 vs. Ha: μ ≠ μ0
Rejection Regions for One- and
Two-Sided Alternatives
H0: μ ≤ μ0 vs. Ha: μ > μ0
Reject H0 if t > tα
H0: μ ≥ μ0 vs. Ha: μ < μ0
[Figure: rejection region of area α in the lower tail, to the left of the critical value −tα]
Reject H0 if t < −tα
H0: μ = μ0 vs. Ha: μ ≠ μ0
Reject H0 if |t| > tα/2
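The three rejection rules fit in one small decision function. In this sketch the name `reject_h0` and the labels "greater", "less" and "two-sided" are hypothetical choices. The caller supplies the critical value from a t-table: tα for a one-sided test, tα/2 for the two-sided test.

```python
def reject_h0(t, alternative, t_crit):
    """Return True if H0 is rejected for the observed t statistic.

    alternative: "greater"   (Ha: mu > mu0), reject if t > t_crit
                 "less"      (Ha: mu < mu0), reject if t < -t_crit
                 "two-sided" (Ha: mu != mu0), reject if |t| > t_crit
    t_crit: t_alpha for one-sided tests, t_(alpha/2) for the two-sided test.
    """
    if alternative == "greater":
        return t > t_crit
    if alternative == "less":
        return t < -t_crit
    if alternative == "two-sided":
        return abs(t) > t_crit
    raise ValueError(f"unknown alternative: {alternative!r}")


# Usage with df = 20: t_.05,20 = 1.725 and t_.025,20 = 2.086 (t-table values)
print(reject_h0(2.1, "greater", 1.725))      # upper-tail test
print(reject_h0(2.1, "two-sided", 2.086))    # two-sided test at the same alpha = .05
```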
A Standard
Hypothesis Test Write-up
1. State the null and alternative
2. Give significance level, test statistic, and the rejection region
3. Show calculations
4. State the conclusion
- statistical decision
- give conclusion in language of the problem
Hypothesis Testing Example 1
A solar cell requires a special crystal. If properly manufactured,
the mean weight of these crystals is .4g. Suppose that 25
crystals are selected at random from a batch of crystals and it is
calculated that for these crystals, the average is .41g with a
standard deviation of .02g. At the α = .01 level of significance,
can we conclude that the batch is bad?
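A sketch of the calculation step, assuming a two-sided alternative (a batch is "bad" if its mean weight differs from .4g in either direction): H0: μ = .4 vs. Ha: μ ≠ .4, df = 24, and the critical value t.005,24 = 2.797 is taken from a t-table.

```python
from math import sqrt

mu0, n, xbar, s, alpha = 0.4, 25, 0.41, 0.02, 0.01
t = (xbar - mu0) / (s / sqrt(n))     # test statistic, df = n - 1 = 24
t_crit = 2.797                       # t_.005,24 from a t-table (alpha/2 = .005)

reject = abs(t) > t_crit             # two-sided rejection rule
print(f"t = {t:.2f}, critical value = {t_crit}, reject H0: {reject}")
```

Here t = 2.50 does not exceed 2.797, so H0 is not rejected at α = .01. The choice of alternative matters for this example: under a one-sided Ha: μ > .4 the critical value would be t.01,24 = 2.492, and t = 2.50 would fall just inside the rejection region.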
Hypothesis Testing Example 2
A box of detergent is designed to weigh on the average
3.25 lbs per box. A random sample of 18 boxes taken from
the production line on a single day has a sample average
of 3.238 lbs and a standard deviation of 0.037 lbs.
Test whether the boxes seem to be underfilled.
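"Underfilled" points to the lower-tail test H0: μ ≥ 3.25 vs. Ha: μ < 3.25 with df = 17. The example states no significance level, so this sketch assumes α = .05 and uses t.05,17 = 1.740 from a t-table.

```python
from math import sqrt

mu0, n, xbar, s = 3.25, 18, 3.238, 0.037
alpha = 0.05                          # assumption: the example does not state alpha
t = (xbar - mu0) / (s / sqrt(n))      # df = n - 1 = 17
t_crit = 1.740                        # t_.05,17 from a t-table

# Ha: mu < 3.25, so only large negative values of t lead to rejection
reject = t < -t_crit
print(f"t = {t:.3f}, reject H0 if t < {-t_crit}: {reject}")
```

With t ≈ −1.376 > −1.740, H0 is not rejected at the assumed α = .05. The sample gives insufficient evidence that the boxes are underfilled.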
Errors in Hypothesis Testing
                                     Actual Situation
Conclusion            Null is True                Null is False
Do Not Reject H0      Correct Decision (1 − α)    Type II Error (β)
Reject H0             Type I Error (α)            Correct Decision (1 − β) = Power
Note:
There are many ways that H0 can be false
Example:
H0: m  50
This null hypothesis is “false” if:
(a) m  51
(b) m  60
(c) m  80
• If (c) is the actual situation, then the “power” of the test will probably be large
• In the case of (a), the “power” will likely be small
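To make "small" and "large" power concrete, this sketch assumes a hypothetical setup: a one-sided z-test of H0: μ = 50 vs. Ha: μ > 50 with known σ = 10, n = 25, and α = .05. Under a true mean μ, the z statistic is normal with mean (μ − 50)/(σ/√n), so the power is P[Z > zα − (μ − 50)/(σ/√n)].

```python
from math import sqrt
from statistics import NormalDist

MU0, SIGMA, N, ALPHA = 50, 10, 25, 0.05    # hypothetical setup for illustration
Z = NormalDist()
z_crit = Z.inv_cdf(1 - ALPHA)              # reject H0 if z > z_crit (about 1.645)
se = SIGMA / sqrt(N)                       # 2


def power(mu_true):
    """P[reject H0] when the true mean is mu_true (one-sided z-test, Ha: mu > MU0)."""
    shift = (mu_true - MU0) / se           # mean of the z statistic under mu_true
    return 1 - Z.cdf(z_crit - shift)


for mu_true in (51, 60, 80):
    print(f"mu = {mu_true}: power = {power(mu_true):.4f}")
```

Under these assumptions the power is only about .13 when μ = 51, but essentially 1 when μ = 60 or μ = 80. At μ = 50 the rejection probability equals α.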
H0: μ ≥ μ0 vs. Ha: μ < μ0
Reject H0 if t < −tα
Note: “Large negative values” of t make us believe
alternative is true
p-Value
the probability of an observation as
extreme or more extreme than the
one observed when the null is true
Suppose t = −2.39 is observed from data for the test above
[Figure: the p-value is the area under the t curve to the left of the observed value t = −2.39]
Note:
-- if the p-value is less than or equal to α, then
we reject the null at the α significance level
-- the p-value is the smallest level of
significance at which the null hypothesis
would be rejected
Find the p-values for Examples 1 and 2
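Computing these p-values needs the t CDF. The sketch below repeats a hypothetical Simpson's-rule helper `t_cdf` (not a library function) so that it runs on its own. It assumes Example 1 is two-sided (H0: μ = .4 vs. Ha: μ ≠ .4, df = 24) and Example 2 is lower-tail (H0: μ ≥ 3.25 vs. Ha: μ < 3.25, df = 17).

```python
from math import exp, lgamma, log, pi, sqrt


def t_cdf(x, df, steps=1000):
    """P[T <= x] for Student's t, via symmetry and Simpson's rule on [0, |x|]."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)

    def pdf(u):
        return exp(log_c - (df + 1) / 2 * log(1 + u * u / df))

    a = abs(x)
    h = a / steps                    # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


# Example 1 (two-sided, assumed): p = P[|T| >= |t|] = 2 * P[T >= |t|]
t1 = (0.41 - 0.4) / (0.02 / sqrt(25))
p1 = 2 * (1 - t_cdf(abs(t1), 24))

# Example 2 (lower tail): p = P[T <= t]
t2 = (3.238 - 3.25) / (0.037 / sqrt(18))
p2 = t_cdf(t2, 17)

print(f"Example 1: t = {t1:.3f}, p-value = {p1:.4f}")
print(f"Example 2: t = {t2:.3f}, p-value = {p2:.4f}")
```

The p-values come out near .020 and .093. Both exceed the significance levels used for these examples (.01, and the assumed .05), so neither null hypothesis is rejected, which matches the critical-value approach.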