Math 10 - Elementary Statistics


The Chi-Square Distribution

The student will be able to
- Perform a Goodness of Fit hypothesis test
- Perform a Test of Independence hypothesis test

- Chi-square is a distribution whose test statistic is used to determine 3 things
    - Does our data fit a certain distribution? Goodness-of-fit
    - Are two factors independent? Test of independence
    - Does our variance change? Test of single variance

- Notation
    - new random variable $\chi^2 \sim \chi^2_{df}$
    - $\mu = df$, $\sigma^2 = 2\,df$
- Facts about Chi-square
    - Nonsymmetrical and skewed right
    - value is always ≥ zero
    - curve looks different for different degrees of freedom. As df gets larger (df > 90), curve approaches normal
    - mean is located to the right of the peak

- Hypothesis test steps are the same as always with the following changes
    - Test is always a right-tailed test
    - Null and alternate hypotheses are in words rather than equations
    - degrees of freedom = number of intervals - 1
    - test statistic defined as $\chi^2 = \sum_{n} \frac{(O - E)^2}{E}$, where O is an observed count and E is the matching expected count

A 6-sided die is rolled 120 times. The results are in the table below. Conduct a hypothesis test to determine if the die is fair.

Face Value    1    2    3    4    5    6
Frequency    15   29   16   15   30   15

- Contradictory hypotheses
    - Ho: observed data fits a Uniform distribution (die is fair)
    - Ha: observed data does not fit a Uniform distribution (die is not fair)
- Determine distribution
    - Chi-square goodness-of-fit
    - right-tailed test
- Perform calculations to find pvalue
    - enter observed into L1
    - enter expected into L2

- Perform calculations (cont.)
    - TI83
        - Access LIST, MATH, SUM
        - enter sum((L1 - L2)^2/L2)
        - this is the test statistic
        - For our problem chi-square = 13.6
    - Access DISTR and χ²cdf
        - syntax is (test stat, 1E99, df)
        - generate pvalue
        - For our problem pvalue = 0.0184
- Make decision
    - since α > pvalue = 0.0184, reject null
- Concluding statement
    - There is sufficient evidence to conclude that the observed data does not fit a uniform distribution. (The die is not fair.)
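
The calculator steps for the die example can also be checked by hand. The sketch below is not part of the original slides; it assumes plain Python with only the standard library, and the helper name `chi2_sf` is hypothetical. It computes the right-tail area of a chi-square distribution from the standard closed-form sums for integer degrees of freedom, then reproduces the test statistic (13.6) and pvalue (0.0184) quoted above.

```python
import math

def chi2_sf(x, df):
    """Right-tail area P(X > x) for a chi-square variable with integer df (x >= 0)."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite sum
    term = math.sqrt(half) / math.gamma(1.5)
    total = 0.0
    for j in range((df - 1) // 2):
        total += term
        term *= half / (j + 1.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

# Observed frequencies for faces 1 through 6 (from the table in the slides)
observed = [15, 29, 16, 15, 30, 15]
n = sum(observed)                                  # 120 rolls
expected = [n / len(observed)] * len(observed)     # fair die: 20 per face

# Test statistic: sum of (O - E)^2 / E over the six faces
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                             # number of intervals - 1
p_value = chi2_sf(stat, df)

print(f"chi-square = {stat:.1f}, df = {df}, pvalue = {p_value:.4f}")
```

Because the test is right-tailed, the pvalue is the area to the right of the test statistic, which is what χ²cdf(test stat, 1E99, df) returns on the calculator.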

- Hypothesis testing steps are the same with the following edits
    - Null and alternate in words
    - have a contingency table
    - expected values are calculated from the table: $E = \frac{(\text{row total})(\text{column total})}{\text{sample size}}$
    - Test statistic same: $\chi^2 = \sum_{n} \frac{(O - E)^2}{E}$
    - df = (#columns - 1)(#rows - 1)
    - always right-tailed test

- Conduct a hypothesis test to determine whether there is a relationship between an employee’s performance in a company’s training program and his/her ultimate success on the job. Use a level of significance of 1%.
    - Ho: Performance in training and success on job are independent
    - Ha: Performance in training and success on job are not independent (or dependent).

- Performance on job versus performance in training

                          Performance in training
Performance on Job    Below Average    Average    Above Average    TOTAL
Poor                        23            60            29          112
Average                     28            79            60          167
Very Good                    9            49            63          121
TOTAL                       60           188           152          400

- Determine distribution
    - right tailed
    - chi-square
- Perform calculations to find pvalue
    - Calculator will calculate expected values. We must enter contingency table as a Matrix (ack!)
        - Access MATRIX and edit Matrix A
        - Access Chi-square test
            - Matrix A = observed
            - Matrix B calculator places expected here

- Perform calculations (cont.)
    - pvalue = 0.0005
- Make decision.
    - α = 0.01 > pvalue = 0.0005
    - reject null hypothesis
- Concluding statement.
    - Performance in training and job success are dependent.
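
As a cross-check on the matrix-based calculator procedure, here is a minimal sketch (not from the slides) that builds the expected counts from the row and column totals and computes the test statistic for the training example. It assumes plain Python with the standard library. The pvalue line uses a closed form that holds only when df is even, which is the case here (df = 4).

```python
import math

# Observed counts: rows are job performance (Poor, Average, Very Good),
# columns are training performance (Below Average, Average, Above Average)
observed = [
    [23, 60, 29],
    [28, 79, 60],
    [9, 49, 63],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)                                # sample size, 400

# Expected count for each cell: (row total)(column total) / sample size
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Test statistic: sum of (O - E)^2 / E over every cell of the table
stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(col_totals) - 1) * (len(row_totals) - 1)

# Right-tail area; this finite sum is valid only for even df
half = stat / 2.0
p_value = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

print(f"chi-square = {stat:.2f}, df = {df}, pvalue = {p_value:.4f}")
```

The `expected` list plays the role of Matrix B on the calculator, and the rounded pvalue agrees with the 0.0005 reported above.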

Linear Regression and Correlation

Chapter Objectives
The student should be able to:
- Discuss basic ideas of linear regression and correlation.
- Create and interpret a line of best fit.
- Calculate and interpret the correlation coefficient.
- Find outliers.

- Method for finding the “best fit” line through a scatterplot of paired data
    - independent variable (x) versus dependent variable (y)
- Recall from Algebra
    - equation of line y = a + bx
        - where a is the y-intercept
        - b is the slope of the line
    - if b>0, slope upward to right
    - if b<0, slope downward to right
    - if b=0, line is horizontal





- The eye-ball method
    - Draw what looks to you to be the best straight line fit
    - Pick two points on the line and find the equation of the line
- The calculated method
    - from calculus, we find the line that minimizes the sum of the squared vertical distances from each point to the line; this is the line that best fits the scatterplot
    - letting the calculator do the work using LinRegTTest
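
To make the calculated method concrete, the following sketch computes the least-squares intercept a and slope b for y = a + bx directly from the standard formulas b = Sxy / Sxx and a = ȳ − b·x̄. It is an illustration only: the function name `fit_line` and the five data points are hypothetical and do not come from the slides.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxx: spread of x about its mean; Sxy: joint spread of x and y
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b

# Hypothetical paired data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")
```

These are the same a and b that LinRegTTest reports for the same lists.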
An example

- Used to determine if the regression line is a “good fit”
- ρ is the population correlation coefficient
- r is the sample correlation coefficient
    - Formidable equation
        - see text
        - Calculator does the work
    - r positive - upward to right
    - r negative - downward to right
    - r zero - no correlation
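
The slides leave the formula for r to the text. One common way to write it is r = Sxy / sqrt(Sxx · Syy), and the sketch below applies that form. The function name `correlation` and the data are hypothetical, chosen only to show the sign and size of r.

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(f"r = {r:.4f}")

# Points lying exactly on a downward-sloping line give r = -1
r_down = correlation([1, 2, 3], [6, 4, 2])
```

A positive r goes with an upward-sloping line and a negative r with a downward-sloping one, matching the sign of the slope b.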
Graphs

Determining if there is a “good fit”
- Gut method
    - if calculated r is close to 1 or -1, there’s a good fit
- Hypothesis test (LinRegTTest)
    - Ho: ρ = 0
    - Ha: ρ ≠ 0
        - Ho means there IS NOT a significant linear relationship (correlation) between x and y in the population.
        - Ha means there IS a significant linear relationship (correlation) between x and y in the population.
        - To reject Ho means that there is a linear relationship between x and y in the population.
        - Does not mean that one CAUSES the other.
- Comparison to critical value
    - Use table at end of chapter
        - Determine degrees of freedom df = n - 2
        - If r < negative critical value, then r is significant and we have a good fit
        - If r > positive critical value, then r is significant and we have a good fit

- If the line is determined to be a good fit, the equation can be used to predict y or x values from x or y values
    - Plug the numbers into the equation
    - Equation is only valid for the paired data DOMAIN

- Compare 1.9s to |y - yhat| for each (x, y) pair
    - if |y - yhat| > 1.9s, the point could be an outlier
- LinRegTTest gives us s
- y - yhat is put into the RESID list when the LinRegTTest is done
    - To see the RESID list: go to STAT, Edit, move cursor to a blank list name and type RESID, the residuals will show up.
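
The 1.9s rule can be sketched in a few lines. This example assumes that s is the standard deviation of the residuals, sqrt(SSE / (n − 2)), which is the s that LinRegTTest reports. The function name `flag_outliers` and the data are hypothetical: the points follow y = 2x + 1 except for one value that was deliberately shifted.

```python
import math

def flag_outliers(xs, ys, factor=1.9):
    """Return the (x, y) pairs whose residual exceeds factor * s in absolute value."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residuals y - yhat, the values stored in the RESID list on the calculator
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # s: standard deviation of the residuals, with n - 2 degrees of freedom
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    return [(x, y) for x, y, e in zip(xs, ys, residuals) if abs(e) > factor * s]

# Hypothetical data: y = 2x + 1, except the point at x = 5 is shifted up by 8
xs = list(range(1, 11))
ys = [3, 5, 7, 9, 19, 13, 15, 17, 19, 21]
outliers = flag_outliers(xs, ys)
print(outliers)
```

A flagged point is only a candidate outlier; as the slide says, it "could be" one, so it still needs to be checked against the source of the data.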

F Distribution and ANOVA

The student should be able to:
- Interpret the F distribution as the number of groups and the sample size change.
- Discuss two uses for the F distribution and ANOVA.
- Conduct and interpret ANOVA

- What is it good for?
    - Determines the existence of statistically significant differences among several group means.
- Basic assumptions
    - Each population from which a sample is taken is assumed to be normal.
    - Each sample is randomly selected and independent.
    - The populations are assumed to have equal standard deviations (or variances).
    - The factor is the categorical variable.
    - The response is the numerical variable.
- The Hypotheses
    - Ho: µ1=µ2=µ3=…=µk
    - Ha: At least two of the group means are not equal
    - Always a right-tailed test

- Named after Sir Ronald Fisher
- F statistic is a ratio (i.e. fraction)
    - two sets of degrees of freedom (numerator and denominator)
    - $F \sim F_{df(num),\,df(denom)}$
- Two estimates of variance are made
    - Variation between samples
        - Estimate of $\sigma^2$ that is the variance of the sample means multiplied by n (for equal sample sizes n)
        - Variation due to treatment (i.e. explained variation)
    - Variation within samples
        - Estimate of $\sigma^2$ that is the average of the sample variances
        - Variations due to error (i.e. unexplained variation)

- Curve is skewed right.
- Different curve for each set of degrees of freedom.
- As the dfs for numerator and denominator get larger, the curve approximates the normal distribution
- F statistic is greater than or equal to zero
- Other uses
    - Comparing two variances
    - Two-Way Analysis of Variance

- Formula: $F = \frac{MS_{between}}{MS_{within}}$
    - $MS_{between}$ – mean square explained by the different groups: $MS_{between} = \frac{SS_{between}}{df_{between}}$
    - $MS_{within}$ – mean square that is due to chance: $MS_{within} = \frac{SS_{within}}{df_{within}}$
    - $SS_{between}$ – sum of squares that represents the variations among different samples
    - $SS_{within}$ – sum of squares that represents the variation within samples that is due to chance


- Enter the table data by columns into L1, L2, L3….
- Do ANOVA test – ANOVA(L1, L2,..)
- What the calculator gives
    - F – the F statistic
    - p – the pvalue
    - Factor – the between stuff
        - df = # groups – 1 = k – 1
        - SSbetween
        - MSbetween
    - Error – the within stuff
        - df = total number of observations – # of groups = N – k
        - SSwithin
        - MSwithin

Four sororities took a random sample of sisters regarding their grade averages for the past term. The results are shown below:

Sorority 1    Sorority 2    Sorority 3    Sorority 4
   2.17          2.63          2.63          3.79
   1.85          1.77          3.78          3.45
   2.83          3.25          4.00          3.08
   1.69          1.86          2.55          2.26
   3.33          2.21          2.45          3.18

Using a significance level of 1%, is there a difference in grade averages among the sororities?
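
The quantities the calculator reports for this example can be reproduced from the sums of squares. The sketch below is not from the slides; it assumes the flattened table reads row by row, so that each sorority's five values form one column, and it uses only plain Python. It stops at the F statistic, because the pvalue needs an F-distribution CDF, which the calculator's ANOVA test or a statistics library provides.

```python
# Grade averages by group, assuming the flattened table is read row by row
groups = {
    "Sorority 1": [2.17, 1.85, 2.83, 1.69, 3.33],
    "Sorority 2": [2.63, 1.77, 3.25, 1.86, 2.21],
    "Sorority 3": [2.63, 3.78, 4.00, 2.55, 2.45],
    "Sorority 4": [3.79, 3.45, 3.08, 2.26, 3.18],
}

k = len(groups)                                   # number of groups
N = sum(len(v) for v in groups.values())          # total number of observations
grand_mean = sum(sum(v) for v in groups.values()) / N

# SSbetween: variation of the group means about the grand mean
ss_between = sum(
    len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values()
)
# SSwithin: variation of each value about its own group mean
ss_within = sum(
    (x - sum(v) / len(v)) ** 2 for v in groups.values() for x in v
)

df_between = k - 1
df_within = N - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
F = ms_between / ms_within

print(f"df = ({df_between}, {df_within}), F = {F:.2f}")
```

The test is right-tailed, so the pvalue is the area to the right of F under the F distribution with (k − 1, N − k) degrees of freedom; that pvalue is then compared with α = 0.01.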

- What’s fair game
    - Chapter 1, Chapter 2, Chapter 3, Chapter 4, Chapter 5, Chapter 6, Chapter 7, Chapter 8, Chapter 9, Chapter 10, Chapter 11, Chapter 12
- 42 multiple choice questions
    - Do problems from each chapter
- What to bring with you
    - Scantron (#2052), pencil, eraser, calculator, 2 sheets of notes (8.5x11 inches, both sides)

- Prepare for the Final exam
- It has been a pleasure having you in class. Good luck and Godspeed with whatever path you take in life.