Stat 31, Section 1, Last Time
• Sampling
• Experiments
– Design
– Controls
– Randomization
– Blind & Double Blind
• Pepsi Challenge
Midterm I
Coming up: Tuesday, Feb. 15
Material:
HW Assignments 1 – 4
Extra Office Hours:
Mon. Feb. 14, 8:30 – 12:00, 2:00 – 3:30
(Instead of Review Session)
Bring Along:
1 8.5” x 11” sheet of paper with formulas
Sec. 3.4: Basics of “Inference”
Idea: Build foundation for statistical inference, i.e.
quantitative analysis
(of uncertainty and variability)
Fundamental Concepts:
Population described by parameters
e.g. mean μ, SD σ.
Unknown, but can get information from…
Fundamental Concepts
Last page: Population, here:
Sample (usually random), described by
corresponding “statistics”
e.g. mean x̄, SD s.
(Will become important to keep these apart)
Population vs. Sample
E.g. 1: Political Polls
• Population is “all voters”
• Parameter of interest is:
p = % in population for A
(bigger than 50% or not?)
• Sample is “voters asked by pollsters”
• Statistic is p̂ = % in sample for A
(careful to keep these straight!)
Population vs. Sample
E.g. 1: Political Polls
• Notes
– p̂ is an “estimate” of p
– Variability is critical
– Will construct models of variability
– Possible when sample is random
– Recall random sampling also reduces bias
Population vs. Sample
E.g. 2: Measurement Error
(seemingly quite different…)
• Population is “all possible measurem’ts”
(a thought experiment only)
• Parameters of interest are:
μ = population mean
σ = population SD
Population vs. Sample
E.g. 2: Measurement Error
• Sample is “measurem’ts actually made”
• Statistics are:
x̄ = mean of measurements
s = SD of measurements
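A minimal sketch of how the two statistics above could be computed from a handful of measurements. The measurement values are made up for illustration and are not from the class data; the helper name `summarize` is likewise hypothetical.

```python
import statistics


def summarize(measurements):
    """Return (x_bar, s): the sample mean and the sample SD (n - 1 divisor)."""
    x_bar = statistics.mean(measurements)
    s = statistics.stdev(measurements)
    return x_bar, s


# Hypothetical repeated measurements of one quantity (made-up numbers).
measurements = [10.1, 9.8, 10.3, 9.9, 10.4]

x_bar, s = summarize(measurements)
print(f"x-bar = {x_bar:.3f}, s = {s:.3f}")
```

Here x̄ and s are computed from the measurements actually made; the population mean and SD stay unknown and are only estimated by them.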
Population vs. Sample
E.g. 2: Measurement Error
• Notes:
– x̄ estimates μ
– s estimates σ
– Again will model variability
– “Randomness” is just a model for measurement error
Population vs. Sample
HW:
3.59
3.61
Basic Mathematical Model
Sampling Distribution
Idea: Model for “possible values” of statistic
E.g. 1: Distribution of p̂ in “repeated samplings” (thought experiment only)
E.g. 2: Distribution of x̄ in “repeated samplings” (again thought experiment)
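The "repeated samplings" thought experiment can be imitated on a computer. The sketch below assumes a population proportion p = 0.52 and polls of n = 100 voters; both numbers, and the function names, are illustrative assumptions rather than anything from the course.

```python
import random
import statistics


def sample_proportion(p, n, rng):
    """Ask n randomly chosen voters; return p-hat, the fraction favoring A."""
    return sum(rng.random() < p for _ in range(n)) / n


def repeated_samplings(p, n, reps, seed=0):
    """Repeat the whole poll `reps` times and collect the resulting p-hats."""
    rng = random.Random(seed)
    return [sample_proportion(p, n, rng) for _ in range(reps)]


# Assumed values, chosen only for illustration.
p_hats = repeated_samplings(p=0.52, n=100, reps=1000)

print("smallest p-hat:", min(p_hats))
print("largest p-hat: ", max(p_hats))
print("mean of p-hats:", statistics.mean(p_hats))
```

The list `p_hats` plays the role of the sampling distribution: each entry is one possible value of the statistic, and the values differ from poll to poll even though p never changes.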
Basic Mathematical Model
Sampling Distribution Tools
Can study these with:
• Histograms → “shape”: often Normal
• Mean → Gives measure of “bias”
• SD → Gives measure of “variation”
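As a rough sketch of the last two tools: given the values a statistic took over several samplings and the true parameter, the mean measures bias and the SD measures variation. The estimates and the true value 0.50 below are made up, and `bias_and_variation` is a hypothetical helper name.

```python
import statistics


def bias_and_variation(estimates, true_value):
    """Return (bias, variation) for a list of values of a statistic.

    bias      = mean of the estimates minus the true parameter
    variation = SD of the estimates
    """
    center = statistics.mean(estimates)
    spread = statistics.stdev(estimates)
    return center - true_value, spread


# Hypothetical estimates from five samplings; true parameter assumed 0.50.
estimates = [0.48, 0.55, 0.51, 0.46, 0.50]

bias, variation = bias_and_variation(estimates, true_value=0.50)
print(f"bias = {bias:.4f}, variation = {variation:.4f}")
```

In practice the true parameter is unknown, so this calculation is only possible in a simulation or thought experiment.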
Bias and Variation
Graphical illustration, scanned from text: Fig. 3.9
Bias and Variation
Class Example:
Results from previous class
on “Estimate % of males at UNC”
https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg16.xls
Recall several approaches to estimation
(3 bad, one sensible)
E.g. % Males at UNC
At top:
– Counts
– Corresponding proportions (on [0,1] scale)
– Bin Grid (for histograms on [0,1] numbers)
Next Part:
– Summarize mean of each
– Summarize SD (spread) of each
Histograms (appear next)
E.g. % Males at UNC
Recall 4 ways to collect data:
Q1: Sample from class
– Q1 “less spread and to left”?
Q2: Stand at door and tally
Q3: Make up names in head
– Q3 “more to right”?
Q4: Random Sample
– Supposed to be best, can we see it?
E.g. % Males at UNC
Better comparison: Q4 vs. each other one
Use “interleaved histograms”
Q1 & Q4:
– Q1 has smaller center: x̄1 = 0.39 < 0.42 = x̄4
– i.e. “biased”, since Class ≠ Population
– And less spread: s1 = 0.086 < 0.109 = s4
– since “drawn from smaller pool”
E.g. % Males at UNC
Q2 & Q4:
– Centers have Q2 bigger: x̄2 = 0.47 > 0.42 = x̄4
– Reflects bias in door choice
– And Q2 is “more spread”: s2 = 0.139 > 0.109 = s4
– Reflects “spread in doors chosen” + “sampling spread”
E.g. % Males at UNC
Q3 & Q4:
– Center for Q3 is bigger: x̄3 = 0.48 > 0.42 = x̄4
– Reflects “more people think of males”?
– And Q3 is “more spread”: s3 = 0.124 > 0.109 = s4
– Reflects “more variation in human choice”
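The three comparisons against Q4 can be lined up in one place. This sketch uses only the means and SDs quoted on these slides; the true % of males is not given here, so the shift is measured relative to the random sample Q4, not against the truth. The function name `compare_to_q4` is hypothetical.

```python
# Means and SDs quoted on the slides for the four data-collection methods.
means = {"Q1": 0.39, "Q2": 0.47, "Q3": 0.48, "Q4": 0.42}
sds = {"Q1": 0.086, "Q2": 0.139, "Q3": 0.124, "Q4": 0.109}


def compare_to_q4(name):
    """Return (shift in center, ratio of spreads) relative to Q4."""
    shift = means[name] - means["Q4"]
    ratio = sds[name] / sds["Q4"]
    return shift, ratio


for name in ("Q1", "Q2", "Q3"):
    shift, ratio = compare_to_q4(name)
    side = "left of" if shift < 0 else "right of"
    width = "less" if ratio < 1 else "more"
    print(f"{name}: center {abs(shift):.2f} {side} Q4, {width} spread than Q4")
```

A negative shift with a ratio below 1 matches the Q1 reading (to the left, less spread); positive shifts with ratios above 1 match Q2 and Q3.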
E.g. % Males at UNC
A look under the hood:
• Highlight an interleaved Chart
• Click Chart Wizard
• Note Bar (and interleaved subtype)
• Different colors are in “series”
• Computed earlier on left
• Using Tools → Data Anal. → Histo’m
E.g. % Males at UNC
Interesting question:
What is “natural variation”?
Will model this soon.
This is “binomial” part of this example,
which we will study later.
Bias and Variation
HW:
3.62 (Hi bias – hi var, lo bias – lo var, lo
bias – hi var, hi bias – lo var)
3.65
Chapter 4: Probability
Goal: quantify (get numerical) uncertainty
• Key to answering questions above
(e.g. what is “natural variation”
in a random sample?)
(e.g. which effects are “significant”)
Idea: Represent “how likely” something is
by a number
Simple Probability
E.g. (will use for a while, since simplicity
gives easy insights)
Roll a die (6 sided cube, faces 1,2,…,6)
• 1 of 6 faces is a “4”
• So say “chances of a 4” are:
“1 out of 6” = 1/6.
• What does that number mean?
• How do we find such for harder
problems?
Simple Probability
A way to make this precise:
“Frequentist Approach”
In many replications (repeat of die roll),
expect about 1/6 of total will be 4s
Terminology (attach buzzwords to ideas):
Think about “outcomes” (e.g. #s on die)
from an “experiment” (e.g. roll die, observe #)
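The frequentist idea can be tried out by simulation. This is a sketch only: the numbers of rolls and the seed are arbitrary choices, and `fraction_of_fours` is a hypothetical helper name.

```python
import random


def fraction_of_fours(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times; return the fraction of 4s."""
    rng = random.Random(seed)
    fours = sum(rng.randint(1, 6) == 4 for _ in range(n_rolls))
    return fours / n_rolls


# More replications should bring the fraction closer to 1/6 (about 0.167).
for n in (60, 600, 60000):
    print(n, "rolls: fraction of 4s =", round(fraction_of_fours(n), 4))
```

Each roll is one run of the "experiment" and the face showing is its "outcome"; the fraction of 4s settles near 1/6 only as the number of replications grows.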
Simple Probability
Quantify “how likely” by assigning
“probabilities”
I.e. a number between 0 and 1, to each
outcome, reflecting “how likely”:
Intuition:
• 0 means “can’t happen”
• ½ means “happens half the time”
• 1 means “must happen”
Simple Probability
HW:
C10: Match one of the probabilities:
0, 0.01, 0.3, 0.6, 0.99, 1
with each statement about an event:
a. Impossible, can’t occur.
b. Certain, will happen on every trial.
c. Very unlikely, but will occur once in a
long while.
d. Event will occur more often than not.
Simple Probability
Main Rule:
Sum of all probabilities (i.e. over all
outcomes) is 1:
P1  1 6
E.g. for die rolling:
P2  1 6
P3  1 6
P4  1 6
P5  1 6
P6  1 6

1
Simple Probability
HW:
4.13a
4.15
Probability
General Rules for assigning probabilities:
i. Frequentist View
(what happens in many repetitions?)
ii. Equally Likely: for n outcomes
P{one outcome} = 1/n (e.g. die rolling)
iii. Based on Observed Frequencies
e.g. life tables summarize when people die
Gives “prob of dying” at a given age
“life expectancy”
Probability
General Rules for assigning probabilities:
iv. Personal Choice:
– Reflecting “your assessment”
– E.g. Oddsmakers
– Careful: requires some care
HW:
4.16