Review of Chapters 1 - 7
Chapter 1 Introduction
 What? Who? Why?
 Some Jargon
o Subject: A unit (a person, animal, plant, etc.) on
which we make observations or measurements.
o Population: Set of all subjects of interest
o Sample: Part of the whole.
o Random Sample: A sample of population units
selected according to some rule of probability
o Simple random sample: A sample selected in
such a way that every unit in the population has
an equal chance of being selected
o Random variable: A measurement or
observation on units in a random sample or in a
population
o Parameter: A number that summarizes the
observations on a population. Parameters belong
to a population. A parameter can be calculated
only if we have population data.
o Statistic: A number that summarizes the
observations in a sample. A statistic belongs to a
sample; it is calculated using sample data.
Chapters 1 – 7 Summary Fall 2007
Chapter 2 Summarizing Data
 Identifying Types of Data
o Categorical (Ordinal, Nominal)
o Quantitative (Discrete, Continuous)
 Graphical Summaries
o Bar Graph & Pie Chart [Categorical Data]
o Dot plot, Histogram, Box-plot, or Stem-and-leaf display (Stem-plot) [Quantitative data]
o Some Common Shapes
 Mound-shaped (Bell-shaped)
 Left Skewed
 Right Skewed
o Checking for normality (very important for
small samples)
 Numerical summaries
o Sample Mean = X̄
o Sample Variance = S² ≥ 0
o Sample Standard Deviation = S = √(S²) ≥ 0
o Sample Proportion = p̂ = X / n
 Learn to find the sample mean and sample
standard deviation using your calculators.
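Besides a calculator, these summaries can be computed directly; a minimal Python sketch (the data values and the "success" cutoff below are made up for illustration):

```python
import math

# Hypothetical sample data, for illustration only
data = [4.2, 5.1, 3.8, 6.0, 4.9]
n = len(data)

mean = sum(data) / n                                # sample mean, X-bar
var = sum((x - mean) ** 2 for x in data) / (n - 1)  # sample variance S^2 (divisor n - 1)
sd = math.sqrt(var)                                 # sample standard deviation S

successes = sum(1 for x in data if x > 5)           # call a value > 5 a "success"
p_hat = successes / n                               # sample proportion
```

Note the n − 1 divisor in the sample variance, which matches the S_X formula used later in the point-estimator table.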
Chapter 3 Relation between Two Variables
(We will go over this chapter in detail later.)
Chapter 4: Gathering Data
 Through randomization (ALWAYS)
 Experimental vs. Observational Studies
 Simple Random Sampling
o Sample, Random sample, Simple Random Sample (SRS)
o Sampling error (ME = Margin of Error)
o Non-sampling errors (Sources of bias)
 Statistically significant difference
 Experimental Design
o Technical terms
o Some types of experiments
 Some types of Observational Studies
o Cross-sectional studies
o Case-control studies
o Prospective studies
Chapter 5: Probability
 Statistical Experiments
o 2 or more outcomes
o Uncertainty
 Sample space and events
o Sample space = S = {all possible outcomes of a statistical experiment}
o An Event = any subset of the sample space (it may contain one, several, all, or none of the elements of S). Capital letters at the beginning of the alphabet are used to denote events.
o Impossible event = ∅ = { }
o Definite event = Sample space = S

 Probability of an event A
o P(A) = f / n, if the n outcomes are equally likely and f of them are in A.
o P(A) = lim(n → ∞) f / n = long-run relative frequency.

Basic Rules of Probability
o General Rule: For any event A, 0 ≤ P(A) ≤ 1
o Complementary rule: P(Ac) = 1 – P(A)
o Conditional Probability: The probability of observing an event given that (or conditional on) another event has occurred:
P(A | B) = P(A and B) / P(B) = P(A ∩ B) / P(B)
P(B | A) = P(A and B) / P(A) = P(A ∩ B) / P(A)
o Multiplication Rule of Probability
By cross-multiplying the above definition of conditional probability we can easily show the following equalities:
P(A and B) = P(A) × P(B | A) = P(B) × P(A | B)
Special case of the multiplication rule:
IF A and B are INDEPENDENT, then P(A and B) = P(A) × P(B)
[See definition of independence below.]
o Addition Rule of Probability
P(A or B) = P(A) + P(B) – P(A and B)
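These rules can be checked by brute-force enumeration over an equally likely sample space. A short Python sketch using two fair dice (the choice of events A and B is illustrative):

```python
from fractions import Fraction
from itertools import product

# Equally likely sample space: all 36 rolls of two fair dice
S = list(product(range(1, 7), repeat=2))

def P(event):
    # P(A) = f / n when the n outcomes are equally likely
    return Fraction(sum(1 for o in S if event(o)), len(S))

A = lambda o: o[0] % 2 == 0        # first die is even
B = lambda o: o[0] + o[1] == 7     # total is 7
A_and_B = lambda o: A(o) and B(o)
A_or_B = lambda o: A(o) or B(o)

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
assert P(A_or_B) == P(A) + P(B) - P(A_and_B)

# Conditional probability and the multiplication rule
P_A_given_B = P(A_and_B) / P(B)
assert P(A_and_B) == P(B) * P_A_given_B
```

Using exact fractions rather than floats means the rule identities hold exactly, not just to rounding error.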

Independence of events:
o Four equivalent statements:
 A and B are independent events
 P(A and B) = P(A) × P(B)
 P(A | B) = P(A)
 P(B | A) = P(B)
o The above statements are true ONLY when
the events A and B are independent.
o If one of the above is true, then all are true
o If one of the above is false, then all are false.
Chapter 6: Probability Distributions
 Random Variables (rv)
 Assume we know the values of the population parameters (e.g., μ and σ)
 Distribution of a discrete rv
 Discrete uniform distribution
 Binomial Distribution
 Distribution of a continuous rv
 Uniform Distribution
 The normal distribution N(μ, σ)
 The t-distribution
 More to come
Finding Probabilities given value of rv
o Always sketch the problem.
o Using Standard Normal Distribution
For example, find P(Z < 1.23).
o Using the t-distribution,
For example, find P(T > 1.23)
 Finding value of the rv given probability
o The opposite of the above processes,
e.g., find the constants c and d such
that P(Z > c) = 0.05, P(T < d) = 0.01
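Without a printed table, standard normal probabilities and quantiles like these can be approximated numerically. A Python sketch using the error function (the bisection search for the quantile is an illustrative method, not the course's table-lookup procedure):

```python
import math

def phi(z):
    # Standard normal CDF, Phi(z), expressed via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p = phi(1.23)  # P(Z < 1.23); the table value is about 0.8907

# Inverse problem: find c such that P(Z > c) = 0.05, i.e. phi(c) = 0.95,
# by bisection on [0, 10]
lo, hi = 0.0, 10.0
for _ in range(60):
    mid = (lo + hi) / 2.0
    if phi(mid) < 0.95:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2.0  # about 1.645
```

Sketching the area under the curve first, as the summary advises, still tells you whether the answer should be a left-tail or right-tail probability.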
Chapter 7
Statistical Inference: Confidence Intervals
The following are some important concepts you
should have learned in Chapter 7 (some were also
used in earlier chapters):
 A Parameter
o It is a numerical summary of the population.
o We calculate parameters using population data.
However, we usually (almost always) do not
have population data. Thus, the values of
population parameters are almost always
unknown.
o So we estimate the population parameters using
data from a random sample.
 A Statistic
o It is a numerical summary of a sample.
o We calculate the values of sample statistics using
data from random samples.
o We use sample statistics to make statistical
inferences about the unknown population
parameters.
Interpret the following:
“Statistics are everywhere, statistics is nowhere.”
Richard L. Scheaffer
 Statistical Inference is the process of making a
statement about one or more population parameters
using one or more sample statistics, obtained from a
random and representative sample.
 Types of Statistical Inference:
o Point Estimation (Gives just one number as an
estimate of the parameter)
o Interval Estimation (or Confidence Interval,
gives an interval of the number line as possible
values of the parameter with some fixed
confidence.)
o Significance Tests (or Tests of hypotheses: a
process that yields a decision on whether a claim
about the value of the parameter is supported by
data observed from a random sample.)
 Some Point Estimators of parameters:

Population Parameters (Unknown)  →  Sample Statistics (Point Estimators)

o Mean:
μ_X = (Σ_{i=1}^{N} X_i) / N  →  X̄ = (Σ_{i=1}^{n} X_i) / n
o Standard Deviation:
σ_X = √[ Σ_{i=1}^{N} (X_i − μ)² / N ]  →  S_X = √[ Σ_{i=1}^{n} (X_i − X̄)² / (n − 1) ]
o Proportion:
p = (Σ_{i=1}^{N} X_i) / N, where X_i = 1 if the i-th outcome is a “Success” and 0 if a “Failure”
→  p̂ = X / n, where X = number of “Success”es in the sample.
o The Sample Mean is an unbiased point
estimator of the population mean.
o The Sample Standard Deviation is a point
estimator of the population standard deviation.
o The Sample Proportion is an unbiased point
estimator of the population proportion.
 Properties of Estimators:
o Unbiasedness: An estimator is said to be unbiased if the sampling distribution of the estimator is centered at the parameter.
 The sample mean is an unbiased estimator of the population mean, μ, since
μ_X̄ = E(X̄) = μ_X = Population Mean
μ_X̄ = the average of the population of ALL sample means
 The sample proportion is an unbiased estimator of the population proportion, p, because
μ_p̂ = E(p̂) = p = Population Proportion
μ_p̂ = the average of the population of ALL sample proportions
 The sample standard deviation, S, is not unbiased; it has a small bias that decreases as n (sample size) increases.
o Small Standard Error: The standard error of a statistic (an estimator) is the standard deviation of the sampling distribution of the statistic. It describes the variability in the possible values of the statistic. A good estimator has a small standard error. The estimators mentioned above all have small standard errors.
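Unbiasedness of the sample mean can be illustrated by simulation: draw many simple random samples and average the resulting sample means; the average should sit very close to μ. A Python sketch with a toy population (the population 1–100 and the sample size are illustrative choices):

```python
import random

random.seed(1)  # reproducible illustration

population = list(range(1, 101))          # toy population: the numbers 1..100
mu = sum(population) / len(population)    # population mean = 50.5 (a parameter)

# Approximate the sampling distribution of X-bar
n, reps = 10, 20000
means = []
for _ in range(reps):
    srs = random.sample(population, n)    # simple random sample, without replacement
    means.append(sum(srs) / n)            # one sample mean

avg_of_means = sum(means) / reps          # average of the sample means drawn
# avg_of_means should be close to mu, since E(X-bar) = mu
```

Any single sample mean may miss μ by quite a bit; it is the center of the whole sampling distribution that lands on μ.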
o Standard Errors of Estimators:
* SE(X̄) = σ_X / √n, estimated by Est. SE(X̄) = S / √n
* SE(p̂) = √[ p(1 − p) / n ], estimated by Est. SE(p̂) = √[ p̂(1 − p̂) / n ]
 Interval Estimation:
o General Form: (Estimator ± ME)
o ME = Margin of Error: measures how accurate the estimate is likely to be in estimating the parameter.
o ME = (Table value) × SE(Estimator)
o CI for μ: (X̄ ± ME) = (X̄ ± t* × S / √n)
o CI for p: (p̂ ± ME) = (p̂ ± z* × √[ p̂(1 − p̂) / n ])
o Make sure you remember how to use tables of the standard normal distribution and the t-distribution.
o More to come later.
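Putting the two CI formulas together in Python, with made-up sample summaries and table values hard-coded for 95% confidence (t* = 2.064 for df = 24 and z* = 1.96 are standard table entries; all data values are illustrative):

```python
import math

# CI for mu: X-bar +/- t* (S / sqrt(n)); sample summaries are made up
n, xbar, s = 25, 72.0, 10.0
t_star = 2.064                              # t table value, 95% confidence, df = n - 1 = 24
me_mean = t_star * s / math.sqrt(n)
ci_mean = (xbar - me_mean, xbar + me_mean)

# CI for p: p-hat +/- z* sqrt(p-hat (1 - p-hat) / n); counts are made up
x, n_trials = 40, 100                       # 40 "successes" in 100 trials
p_hat = x / n_trials
z_star = 1.96                               # z table value, 95% confidence
me_prop = z_star * math.sqrt(p_hat * (1 - p_hat) / n_trials)
ci_prop = (p_hat - me_prop, p_hat + me_prop)
```

Note that the mean CI uses the t table (σ is unknown and estimated by S), while the proportion CI uses the z table.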
 Significance Tests (or Tests of hypotheses)
o A hypothesis is a statement (a sentence) about
one or more population parameters.
o Null hypothesis (Ho): the statement that
reflects the status quo (current knowledge or
belief). Equality always goes with Ho.
o Alternative hypothesis (Ha): a statement of
change (a claim).
o Examples:
 Ho: μ = μ0 vs. Ha: μ < μ0; or
 Ho: μ = μ0 vs. Ha: μ > μ0; [1-sided Ha] or
 Ho: μ = μ0 vs. Ha: μ ≠ μ0; [2-sided Ha]
 Ho: p = p0 vs. Ha: p < p0; or
 Ho: p = p0 vs. Ha: p > p0; [1-sided Ha] or
 Ho: p = p0 vs. Ha: p ≠ p0; [2-sided Ha]
More in Chapters 8 – 14.
 Determining sample size (n):
o For estimating μ: n = (z × σ / m)², where m is the desired margin of error.
o For estimating p:
 Note that σ² = p(1 − p).
 This (σ²) is maximum when p = ½, i.e., σ² ≤ ¼ and hence σ ≤ ½.
 Thus the formula for determining the sample size is
n = p(1 − p) × z² / m² ≤ ¼ × z² / m², for any p.
o If n is not an integer, always round it up to the next integer.
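A short Python sketch of both sample-size formulas, with illustrative choices of confidence level, margin of error, and (for the mean) an assumed σ:

```python
import math

z_star = 1.96        # table value for 95% confidence (illustrative choice)

# For estimating mu: n = (z* sigma / m)^2, rounded UP to the next integer
sigma, m_mu = 15.0, 2.0                          # assumed sigma, desired ME
n_mu = math.ceil((z_star * sigma / m_mu) ** 2)

# For estimating p, the conservative version sets p = 1/2:
# n = (1/4) z*^2 / m^2, again rounded UP
m_p = 0.03                                       # desired ME for a proportion
n_p = math.ceil(0.25 * z_star ** 2 / m_p ** 2)
```

`math.ceil` implements the "always round up" rule: rounding down would give a margin of error slightly larger than requested.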