Review of Basic Statistical Concepts
Statistical Literacy means knowing…
• …how to read rates and percentages:
– e.g., percentage of MPPAL students who are full-time
employees versus the percentage of full-time
employees who are MPPAL students
• …how to interpret different definitions of a group
– e.g., which rate is bigger? Child birth rate among
women or child birth rate among women ages 20-44?
• …the difference between (1) deterministic causes
and (2) probabilistic causes:
– e.g., (1) gravity causes the pen to fall and (2) drunk
driving causes automobile accidents
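The first point, about reading percentages, can be made concrete with a short sketch. The counts below are hypothetical, invented only for illustration (they are not MPPAL data); the point is that the two percentages share a numerator but use different denominators.

```python
# Hypothetical counts, invented for illustration only (not real MPPAL data).
mppal_students = 200            # everyone enrolled in the program
full_time_employees = 50_000    # all full-time employees in some region
both = 150                      # people who belong to both groups


def percentage(part: int, whole: int) -> float:
    """Return `part` expressed as a percentage of `whole`."""
    return 100.0 * part / whole


# Same numerator, different denominators, very different answers.
pct_students_employed = percentage(both, mppal_students)
pct_employees_students = percentage(both, full_time_employees)

print(f"{pct_students_employed:.1f}% of the students are full-time employees")
print(f"{pct_employees_students:.1f}% of the full-time employees are students")
```

With these made-up counts the first percentage is large and the second is tiny, even though both describe the same 150 people.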
Probability
• The likelihood or chance an event will occur
• Expressed as a number between 0 (impossibility)
and 1 (certainty) or as a percentage between 0
(impossibility) and 100% (certainty)
• Objectivist Frame: when repeating an
experiment, how often does the event occur?
• Subjectivist Frame: what is the degree of belief in
the likelihood of an event occurring (e.g.,
Bayesian probability)
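The objectivist frame can be illustrated with a small simulation: repeat an experiment many times and record how often the event occurs. This is a minimal sketch; the event probability of 0.3, the function name, and the seed are all assumptions chosen for illustration.

```python
import random


def relative_frequency(p: float, trials: int, seed: int = 0) -> float:
    """Repeat an experiment `trials` times and return how often the event occurred.

    `p` is the (assumed) true probability of the event on a single trial.
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.random() < p)
    return hits / trials


# The observed frequency tends to settle near p as the number of repetitions grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(0.3, n))
```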
Normal distribution
Population Mean: μ and Standard Deviation: σ
Describing the Normal Distribution
• If the mean μ = 0 and σ² = 1 (so σ = 1) and the values are normally distributed,
then 95.4% of the values will fall within μ ± 2*σ = μ ± 2
• 95% will fall within a slightly smaller interval: μ ± 1.96
• 90% will fall within μ ± 1.645
• 99% will fall within μ ± 2.576
• 99.7% will fall within μ ± 3
• What percentage of all conceivable means will lie between
-1.96 and +1.96?  95%
• A 95% “Confidence Interval” is an interval constructed so that it will
contain the population mean (μ) in 95% of repeated samples
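The coverage percentages quoted for the standard normal distribution can be checked numerically. A minimal sketch, using the identity that the share of a normal distribution within k standard deviations of the mean is erf(k/√2); the function name is an illustrative choice.

```python
import math


def normal_coverage(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))


# Multipliers quoted in the slide: 1.645, 1.96, 2, 2.576 and 3.
for k in (1.645, 1.96, 2.0, 2.576, 3.0):
    print(f"mu +/- {k}*sigma covers {100 * normal_coverage(k):.1f}% of values")
```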
95% Confidence Interval and Confidence Level
• What is the probability that we will observe a
mean value that lies outside of our 95%
Confidence Interval? 5% or 0.05
• This probability is the significance level, noted as
α = 0.05; the Confidence Level is 1 - α = 95%
• For α = 0.05 and σ = 1, the critical value is 1.96
• When μ = 0, 95% of the sample means will fall
within (-1.96 and +1.96)
• For μ = 6.5 and a 95% interval half-width of 0.47,
what is the 95% Confidence Interval?
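One way to work the last question: reading the 0.47 as the half-width of the 95% interval (an assumption about the slide's intent), the interval is 6.5 ± 0.47, or roughly 6.03 to 6.97. The sketch below also recovers the 1.96 multiplier from α = 0.05 using the standard library; the function names are illustrative.

```python
from statistics import NormalDist


def critical_value(confidence: float) -> float:
    """Two-sided standard-normal multiplier for a given confidence level (e.g. 0.95)."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def interval(center: float, half_width: float) -> tuple[float, float]:
    """Return (lower, upper) bounds of the interval center +/- half_width."""
    return (center - half_width, center + half_width)


z = critical_value(0.95)         # close to 1.96
low, high = interval(6.5, 0.47)  # assumes 0.47 is the interval's half-width

print(f"z = {z:.3f}")
print(f"95% interval: ({low:.2f}, {high:.2f})")
```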
The Challenge
• We can never know if we are observing the “true”
population mean, since any observed mean will deviate from it,
typically by an amount on the order of σ (= a “standard deviation”)
• Any census of a graduating class will only be a sample of
the “true” population of all graduating classes
• Uncertainty Source #1: If we are seeking to explain the key
factors determining the CGPA of graduates, we have to
account for the fact that the observed CGPA might deviate
from the true mean by σ
• Uncertainty Source #2: If it is infeasible to conduct a census
of all graduating students in even one year, and all we can
do is sample the sample, then we have additional
uncertainty related to the size of the sample
Uncertainty from Sampling
• We know there is one inescapable source of
uncertainty (Uncertainty Source #1)
• The sampling error (Uncertainty Source #2)
complicates this uncertainty … but in a
predictable way.
• We know that as the sample size n increases, the
standard deviation of the observed mean values
(the “standard error of the mean”, denoted “s”)
shrinks relative to the true standard deviation (σ) of
the theoretical population, in proportion to 1/√n:
s = σ/√n
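The relation between the standard error and the sample size can be checked by simulation: draw many samples of size n, take the mean of each, and compare the spread of those means with σ/√n. A minimal sketch, assuming normally distributed observations with mean 0; σ = 2, the sample sizes, the repeat count, and the seed are all illustrative choices.

```python
import math
import random
import statistics


def simulated_standard_error(sigma: float, n: int,
                             repeats: int = 2000, seed: int = 1) -> float:
    """Standard deviation of the sample means over `repeats` samples of size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(0.0, sigma) for _ in range(n)])
        for _ in range(repeats)
    ]
    return statistics.stdev(means)


sigma = 2.0  # assumed population standard deviation
for n in (4, 25, 100):
    observed = simulated_standard_error(sigma, n)
    predicted = sigma / math.sqrt(n)
    print(f"n={n:4d}  simulated s={observed:.3f}  sigma/sqrt(n)={predicted:.3f}")
```

The simulated spread of the means should track σ/√n closely and shrink as n grows.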
Margin of Error
• Whenever a sample is less than the census, there is a
chance that s ≠ σ and the sample mean (avgX) ≠ μ
• If the sample observations X follow a “Student t”
distribution, the logic of the Confidence Interval
follows but some of the vocabulary changes
• We are interested in the probability that the interval around
avgX from our sample will contain μ
• The probability that it does not contain μ is what is called the Margin of Error here
• If the Margin of Error = 5%, then the sample interval
about the mean would include μ in 95% of similar
samples
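The claim that the interval includes μ in 95% of similar samples can be illustrated by simulation. This sketch simplifies the setting: it assumes normal observations with a known σ and uses the normal multiplier 1.96 rather than a Student t value. The values μ = 6.5, σ = 1, n = 30, and the seed are illustrative assumptions.

```python
import math
import random
import statistics


def coverage(mu: float, sigma: float, n: int, z: float = 1.96,
             trials: int = 2000, seed: int = 2) -> float:
    """Fraction of simulated samples whose interval avgX +/- z*s contains mu."""
    rng = random.Random(seed)
    s = sigma / math.sqrt(n)  # standard error of the mean, sigma assumed known
    hits = 0
    for _ in range(trials):
        avg_x = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if avg_x - z * s <= mu <= avg_x + z * s:
            hits += 1
    return hits / trials


print(f"share of intervals containing mu: {coverage(6.5, 1.0, 30):.3f}")
```

The printed share should land near 0.95, with some simulation noise.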
Confidence Interval, Margin of Error
and Sample Size
• For a Margin of Error = 5%, we obtain a 95%
Confidence Interval around the sample mean
• For an approximately normal distribution, the
standard error is s = σ/√n
• Then the 95% Confidence Interval = avgX ± 1.96*s,
i.e. the interval (avgX - 1.96*s, avgX + 1.96*s), which is 2*1.96*s
= 3.92*s units wide (= 3.92*σ/√n)
• If σ is known and the Confidence Interval
specified, one could precisely calculate the sample
size needed from the above information
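The sample-size calculation follows by solving the width formula for n: if the interval should be at most W units wide, then W = 3.92*σ/√n gives n = (3.92*σ/W)², rounded up. A minimal sketch; the function name and the example values σ = 10 and W = 4 are assumptions for illustration.

```python
import math


def sample_size_for_width(sigma: float, width: float, z: float = 1.96) -> int:
    """Smallest n for which the interval avgX +/- z*sigma/sqrt(n) is at most `width` wide."""
    if sigma <= 0 or width <= 0:
        raise ValueError("sigma and width must be positive")
    # Total width is 2*z*sigma/sqrt(n); solve for n and round up to a whole sample.
    return math.ceil((2.0 * z * sigma / width) ** 2)


n = sample_size_for_width(sigma=10.0, width=4.0)
achieved_width = 2.0 * 1.96 * 10.0 / math.sqrt(n)
print(f"n = {n}, resulting 95% interval width = {achieved_width:.3f}")
```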
Sample Size when σ is unknown
When the true standard deviation of the
population is unknown, an indirect method of
determining the sample size yields:
• For a 95% confidence interval and Margin of
Error of …
– … 5%, a sample of 400 (n = 400) is needed
– … 10%, n = 100
– … 3%, n = 1000
– … 1%, n = 10,000
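The slide does not spell out the indirect method. One common rule of thumb that approximately reproduces these figures reads the Margin of Error E as the half-width of a 95% interval for a proportion, takes the worst case p = 0.5, and rounds 1.96 up to 2, which gives n ≈ 1/E². That reading is an assumption, not something the source states. It yields 400, 100 and 10,000 for 5%, 10% and 1%, and about 1,111 for 3%, which is close to the 1000 listed.

```python
def approx_sample_size(margin_of_error: float) -> int:
    """Rule-of-thumb sample size n ~ 1 / E^2.

    Assumes a 95% interval for a proportion with worst-case p = 0.5 and
    z rounded to 2; this is an assumed reading of the slide's indirect method.
    """
    if not 0.0 < margin_of_error < 1.0:
        raise ValueError("margin_of_error must be a fraction between 0 and 1")
    return round(1.0 / margin_of_error ** 2)


for e in (0.10, 0.05, 0.03, 0.01):
    print(f"margin of error {e:.0%}: n is about {approx_sample_size(e)}")
```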