Week 4
Two Main Uses of Statistics:
1) Descriptive:
• To describe or summarize a collection of data points
• The data set in hand = the population of interest
2) Inferential:
• To make decisions or draw conclusions under conditions of uncertainty and incompleteness
• The data set in hand = a sample or an indicator of some larger population of interest
• Use these data to make an “educated guess” (or calculated guess) about what we would find if we had full information
• Use the mathematical idea of probability to make calculated guesses with plausible degrees of certainty
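As a minimal sketch of the difference (assuming a made-up population of 10,000 normally distributed scores and Python's standard `random` and `statistics` modules), descriptive statistics summarize the whole data set, while inferential statistics use a random sample to make a calculated guess about the population:

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the illustration is reproducible

# Hypothetical population of 10,000 scores (made-up data centered near 100)
population = [rng.gauss(100, 10) for _ in range(10_000)]

# Descriptive: the data set in hand IS the population, so we simply summarize it
population_mean = statistics.mean(population)

# Inferential: we only see a random sample of 100 and use it to estimate the population mean
sample = rng.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"population mean (descriptive): {population_mean:.2f}")
print(f"sample mean (inferential estimate): {sample_mean:.2f}")
```

The sample mean is usually close to, but not exactly equal to, the population mean; probability is what lets us say how close we should expect it to be.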
Probability (the key concept):
1) Probability = mathematical construct:
• An idealized theory about hypothetical data
• Use probability theory to develop
mathematical models of these data
• Many physical events display patterns that
follow these mathematical models
– Apply approximately & “in the long run”
– Use these models to make predictions &
decisions about real world outcomes (with a
calculated chance of error or uncertainty)
– This represents “rational decision-making” –
i.e., making uncertain but calculated guesses
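The idea that a probability model applies “in the long run” can be sketched with a simulated fair coin (a hypothetical setup using Python's `random` module; `heads_frequency` is an illustrative name). The observed proportion of heads wanders for small numbers of flips and settles near the model's 0.5 as the flips accumulate:

```python
import random

rng = random.Random(1)  # fixed seed for reproducibility

def heads_frequency(n_flips: int) -> float:
    """Observed proportion of heads in n_flips simulated fair-coin flips."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))  # True counts as 1
    return heads / n_flips

for n in (10, 100, 1_000, 100_000):
    freq = heads_frequency(n)
    print(f"{n:>7} flips: frequency of heads = {freq:.4f} (model says 0.5)")
```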
Probability (cont.):
2) Definition: $P(\text{outcome}) = \frac{\text{\# of ways the specific outcome can occur}}{\text{\# of all possible (equally likely) outcomes}}$
• E.g., flipping a coin and getting “heads” (1/2)
• A mathematical expectation of what happens “in the (infinitely) long run”
• Note difference between probabilities and frequencies
– Probabilities = calculated and idealized (expected)
– Frequencies = measured and counted (observed)
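The counting definition can be sketched with Python's `fractions.Fraction`, so the results stay exact. The helper `classical_probability` is a hypothetical name, and the sketch assumes every outcome in the list is equally likely:

```python
from fractions import Fraction

def classical_probability(outcomes, event) -> Fraction:
    """# of outcomes where `event` holds divided by # of all (equally likely) outcomes."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

coin = ["heads", "tails"]
die = [1, 2, 3, 4, 5, 6]

print(classical_probability(coin, lambda o: o == "heads"))  # 1/2
print(classical_probability(die, lambda o: o % 2 == 0))     # 3/6 reduces to 1/2
print(classical_probability(die, lambda o: o == 6))         # 1/6
```

These are calculated (expected) probabilities; an observed frequency from actual flips or rolls would only approximate them.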
3) Arithmetic of probabilities
• Can be combined: add the probabilities of mutually exclusive outcomes (“or”), multiply the probabilities of independent events (“and”)
• Probabilities of all possible outcomes sum to 1.0
• Used to predict the likelihood of complex events
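A short sketch of this arithmetic with exact fractions, assuming a fair coin and a fair six-sided die (the coin and die are independent of each other, and a single die can only show one face at a time):

```python
from fractions import Fraction

p_heads = Fraction(1, 2)
p_six = Fraction(1, 6)

# "And" for independent events: multiply
p_two_heads = p_heads * p_heads                  # 1/4
p_heads_and_six = p_heads * p_six                # 1/12

# "Or" for mutually exclusive outcomes: add
p_one_or_two = Fraction(1, 6) + Fraction(1, 6)   # 1/3

# All possible outcomes of one die sum to 1, so P(not six) = 1 - P(six)
total = sum(Fraction(1, 6) for _ in range(6))    # 1
p_not_six = 1 - p_six                            # 5/6

print(p_two_heads, p_heads_and_six, p_one_or_two, total, p_not_six)
```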
[Figure: Number of Combinations of 2 dice. Y-axis: # of combinations (0 to 7); x-axis: total score of 2 dice (2 to 12)]
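The counts behind this chart can be reproduced by listing all 36 ordered pairs of faces. This sketch assumes two fair, distinguishable dice, so (1, 6) and (6, 1) count as separate combinations:

```python
from collections import Counter
from itertools import product

# Every ordered pair (die1, die2): 6 x 6 = 36 equally likely outcomes
rolls = list(product(range(1, 7), repeat=2))
combinations = Counter(a + b for a, b in rolls)

# Text version of the bar chart: one '#' per combination
for total in range(2, 13):
    print(f"{total:>2}: {'#' * combinations[total]} ({combinations[total]})")
```

The counts rise from 1 way to roll a 2 up to 6 ways to roll a 7, then fall back to 1 way to roll a 12.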
Probability Distribution:
• Refers to the distribution of all possible outcomes by the likelihood of each one occurring (similar to a frequency distribution)
• Sum of all these likelihoods = 1.0
• Note difference between:
– Discrete outcomes
o A specific (countable) number of values, each with a specific probability
o The specific probabilities add up to 1.0
– Continuous outcomes
o An infinite number of possible values, each with zero probability of being exactly that value
o Described by a probability density function, where the probability of falling within a range = the area under the density curve over that range
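To see why a single exact value of a continuous variable has zero probability, the sketch below uses the standard normal distribution (introduced later in these notes) as an example continuous variable and computes areas under its density with `math.erf`. The helper names `normal_cdf` and `prob_between` are hypothetical:

```python
import math

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X <= x) for a normal variable, i.e. the area under the density to the left of x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def prob_between(a: float, b: float) -> float:
    """Area under the standard normal density between a and b."""
    return normal_cdf(b) - normal_cdf(a)

# Shrinking the interval around 0 drives the probability toward zero
for width in (1.0, 0.1, 0.001, 0.0):
    p = prob_between(-width / 2, width / 2)
    print(f"P(|X| <= {width / 2}) = {p:.6f}")
```

Only ranges of values get positive probability; a range of width zero (a single exact value) gets none.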
[Figure: Probability of 2 dice scores (Probabilities of a Discrete Variable). Y-axis: probability of score (0.000 to 0.180); x-axis: total of 2 dice (2 to 12)]
[Figure: Probability Density function for a Continuous Variable. Y-axis: probability density]
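The discrete distribution in the first chart follows from dividing each count of combinations by the 36 equally likely outcomes (a sketch assuming two fair dice):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
n_outcomes = sum(counts.values())  # 36
pmf = {total: Fraction(c, n_outcomes) for total, c in sorted(counts.items())}

for total, p in pmf.items():
    print(f"{total:>2}: {float(p):.3f}")

assert sum(pmf.values()) == 1  # discrete probabilities add up to 1.0
```

The tallest bar is a total of 7, with probability 6/36 ≈ 0.167.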
Probability Distributions:
• The sum of all the probabilities in a distribution = 1.0 (unity)
• The probability of any particular value (or range of values) can be determined from the distribution
– By adding together the probabilities of all the values in a range (or by computing the fraction of the total area under the probability curve that is within the range)
– By subtracting the probabilities of all values NOT in the range from 1.0
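Both methods can be sketched on the two-dice distribution, again assuming fair dice (`prob_range` is a hypothetical helper):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {total: Fraction(c, 36) for total, c in counts.items()}

def prob_range(lo: int, hi: int) -> Fraction:
    """Add together the probabilities of every total from lo to hi (inclusive)."""
    return sum((p for total, p in pmf.items() if lo <= total <= hi), Fraction(0))

# Adding: P(10 <= total <= 12) = 3/36 + 2/36 + 1/36
p_adding = prob_range(10, 12)          # 1/6

# Subtracting: P(total >= 3) = 1 - P(total = 2)
p_complement = 1 - prob_range(2, 2)    # 35/36

print(p_adding, p_complement)
```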
Probability Distributions:
• We can describe probability distributions in much the same way as we describe frequency distributions, except that we report a probability for each value of the variable rather than a frequency
• Both types of distributions can be described by central tendency and dispersion statistics
• Statistics of probability distributions are called parameters, and they are referenced by Greek letters (as mathematical idealizations), e.g., σ²
• These compare to measured statistics of frequency distributions (which are called estimates and are referenced by ordinary English letters), e.g., s²
Probability Distributions:
• Many different probability distributions are possible, each with its own unique likelihood function or “probability density function”
• A few have proven very useful because:
a) They fit well to observed empirical patterns
b) They are calculable and usable
• Most famous & useful = “Normal Distribution”
– Shows up in many naturally occurring patterns
– Constitutes the “limiting form” of many different distributions (e.g., the distribution of a sum or average of many independent values approaches it as the number of values grows)
– Probabilities & distributions can be exactly calculated
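The “limiting form” idea (the central limit theorem) can be sketched by extending the two-dice example to many dice. Here 30 dice per trial and 10,000 trials are hypothetical choices: the share of simulated sums within 1σ and 2σ of the mean should come out close to the Normal curve's roughly 68% and 95%:

```python
import math
import random

rng = random.Random(2024)  # fixed seed for reproducibility
N_DICE = 30                # hypothetical: number of dice summed per trial
N_TRIALS = 10_000          # hypothetical: number of simulated sums

faces = range(1, 7)
sums = [sum(rng.choices(faces, k=N_DICE)) for _ in range(N_TRIALS)]

# Exact parameters of a sum of N_DICE fair dice (each die: mean 3.5, variance 35/12)
mu = 3.5 * N_DICE
sigma = math.sqrt(N_DICE * 35 / 12)

within_1 = sum(abs(s - mu) <= sigma for s in sums) / N_TRIALS
within_2 = sum(abs(s - mu) <= 2 * sigma for s in sums) / N_TRIALS
print(f"within 1 sigma: {within_1:.3f}   within 2 sigma: {within_2:.3f}")
```

Because the sums are whole numbers, the simulated shares can land a little above 68% and 95% rather than exactly on them.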
The Normal Distribution:
• A very specific probability distribution that:
– Yields a completely symmetric distribution that always has the same “bell-shaped” curve (relating values to probabilities) (i.e., the Normal Curve)
– Contains only 2 parameters, which exactly determine the location and spread of every Normal distribution:
• μ (central tendency) (Greek letter mu) (= the mean)
• σ (dispersion around the center) (Greek letter sigma) (= the standard deviation)
[Figure: 4 different normal distributions]
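A minimal sketch of how the two parameters shape the curve, using the standard formula for the Normal density. The four (μ, σ) pairs are hypothetical and only loosely echo the four curves in the figure; `normal_pdf` is an illustrative name:

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the Normal curve at x; mu sets the center, sigma the spread."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

for mu, sigma in [(0, 1), (0, 2), (3, 1), (-2, 0.5)]:
    peak = normal_pdf(mu, mu, sigma)  # the curve is highest at x = mu
    symmetric = math.isclose(normal_pdf(mu - 1, mu, sigma), normal_pdf(mu + 1, mu, sigma))
    print(f"mu={mu}, sigma={sigma}: peak height {peak:.3f}, symmetric about mu: {symmetric}")
```

Changing μ slides the bell left or right; a larger σ makes it wider and lower, since the total area always stays 1.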
The Normal Distribution:
• Very well known, with exactly calculated probabilities (from its “probability densities”):
– ±1σ: ≈ 68% of scores (68.3%)
– ±2σ: ≈ 95% of scores (95.4%)
– ±3σ: ≈ 99.7% of scores (99.73%)
• All normal distributions fit exactly this probability curve once scores are expressed in σ units
• We can use the normal curve to determine how unusual or unlikely different scores are
– But to be useful, scores need to be converted to common standard units
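These percentages can be computed rather than memorized. A sketch using Python's `math.erf`: for any Normal distribution, the probability of falling within ±kσ of μ equals erf(k/√2), whatever μ and σ are (`prob_within` is a hypothetical name):

```python
import math

def prob_within(k: float) -> float:
    """P(mu - k*sigma <= X <= mu + k*sigma) for any normal distribution."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"±{k}σ: {prob_within(k):.2%}")
```

This prints 68.27%, 95.45% and 99.73%.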
The Normal Distribution:
• If we know the parameters of a population’s distribution of scores (μ & σ), we can convert the scores to a standard normal distribution (μ = 0, σ = 1)
– Where scores = deviations from the mean μ
– And expressed in σ units (standard units)
• Convert scores into standard scores by:
– $Z = \frac{X - \mu}{\sigma}$ (“Z Scores”)
• Look up the computed Z-score in the Normal Curve Table (in Appendix C – Table A)
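A minimal sketch of the conversion, assuming a hypothetical population with known μ = 50 and σ = 5 and a few made-up scores (`z_score` is an illustrative name):

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Deviation of x from the mean mu, expressed in sigma (standard) units."""
    return (x - mu) / sigma

# Hypothetical population parameters and scores
mu, sigma = 50.0, 5.0
scores = [40.0, 45.0, 50.0, 55.0, 60.0]
z = [z_score(x, mu, sigma) for x in scores]
print(z)  # [-2.0, -1.0, 0.0, 1.0, 2.0]
```

A score equal to the mean becomes 0, and each σ above or below the mean becomes +1 or −1.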
The Normal Distribution (cont):
• The value from the Normal Distribution Table gives the proportion of scores above and below the specific (standardized) data value.
– E.g., a score of 125 on an IQ test where μ = 100 & σ = 10 → Z = +2.5
o What % of scores are below this score?
o How likely are we to get this score or above?
– E.g., a score of 75 on an IQ test where μ = 100 & σ = 10 → Z = -2.5
o What % of scores are below this score?
o How likely are we to get this score or above?
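Instead of reading the table, the same proportions can be computed from the standard normal cumulative distribution via `math.erf`. This is a sketch using the example's μ = 100 and σ = 10; `standard_normal_cdf` is a hypothetical helper name:

```python
import math

def standard_normal_cdf(z: float) -> float:
    """Proportion of the standard normal curve below z (the value a Z table gives)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

MU, SIGMA = 100, 10  # IQ parameters as given in the example above

for score in (125, 75):
    z = (score - MU) / SIGMA
    below = standard_normal_cdf(z)
    at_or_above = 1 - below
    print(f"score {score}: Z = {z:+.1f}, below = {below:.2%}, at or above = {at_or_above:.2%}")
```

For 125 this reports about 99.38% of scores below (so a score of 125 or above has a probability of about 0.62%), and for 75 the mirror image: about 0.62% below and 99.38% at or above.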