How do we quantify uncertainty: through Probability!

Kinds of statistics
• Descriptive- the goal is to describe the variables
and the relationships among them. Often used
when you are working with an entire
population (e.g. all SSU students)
• Inferential- used when a sample is taken from
a larger population, and inferences about
the population are made from the sample.
Requirements for inferential
statistics
• Individuals selected for sampling should be
chosen at random
• If sampling is not random, it is biased and
inferences about the population are suspect
• Biased samples include those where
individuals that are easy to capture for
measurement differ in some way from those
that escape capture
Requirement of independence
• The choice of any individual from the population
does not affect the likelihood that another
individual will be chosen.
• Each measurement should represent a
separate replicate from the population
• Measurements taken from a common
arena or from the same individuals are not
independent, and statistical analyses should
account for this
Class measurement example
• We obtained 210 measurements last week
in lab.
• These measurements were based on 15
replicates (people), where each replicate
was measured twice by each of two groups.
• If we wanted to statistically compare genders
with respect to our characters, our sample
size would be 15, NOT 60.
• One way to deal with lack of independence
would be to average the 4 measures per
person.
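
The averaging step can be sketched in Python as follows. This is only an illustration: the `measurements` dictionary, the person labels, and every value in it are invented. Only the structure (4 non-independent measures per person, collapsed to one value per person) comes from the class example.

```python
from statistics import mean

# Hypothetical data: 4 repeated measures (2 groups x 2 measurements) per person.
# Labels and values are invented for illustration.
measurements = {
    "person_01": [17.0, 17.4, 16.8, 17.2],
    "person_02": [18.1, 18.3, 17.9, 18.1],
    "person_03": [16.5, 16.7, 16.4, 16.8],
}

# Collapse the non-independent repeats to a single value per person.
per_person_mean = {person: mean(values) for person, values in measurements.items()}

# The sample size for a comparison is the number of people,
# not the total number of measurements.
n = len(per_person_mean)
print(n, per_person_mean)
```

With the real class data, `n` would be 15 after averaging, which is the sample size any comparison between groups of people should use.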
Reasons for uncertainty in data
– Process uncertainty- the true population
changed between samples due to some process
of interest
– Observation uncertainty- the difference between
samples is generated by sampling error
How do we quantify
uncertainty: through
Probability!
• Summary statistics (e.g. the sample mean) are imperfect
estimates of the true values for a population (e.g. the population mean)
• Measurement error- arises when the measuring device is
imperfect (in biology, often much smaller than the variation
among individuals)
Probabilities
• Important for understanding the significance of
differences between samples
• Range from zero to one
• Expressed as P(A), which can be interpreted as:
– Relative frequency of the event over the long term
– Degree of belief that the event will occur
• Often probabilities must be combined, or
conditioned on other events
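
A small sketch of combining and conditioning probabilities, using two fair dice as an assumed example (the dice, the events, and the helper `prob` are illustrative and do not come from the slides). It counts equally likely outcomes and checks that P(A and B) = P(A) × P(B | A).

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event, given=lambda o: True):
    """P(event | given), found by counting equally likely outcomes."""
    conditioned = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in conditioned if event(o)), len(conditioned))

def first_is_six(o):
    return o[0] == 6

def total_at_least_10(o):
    return o[0] + o[1] >= 10

p_a = prob(first_is_six)                             # P(A)
p_b_given_a = prob(total_at_least_10, first_is_six)  # P(B | A)
p_a_and_b = prob(lambda o: first_is_six(o) and total_at_least_10(o))  # P(A and B)

print(p_a, p_b_given_a, p_a_and_b)
```

Here P(B | A) is larger than the unconditional P(B), which shows that the two events are not independent.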
Probability distributions (1.1)
• Random variables have probability
distributions associated with them
• Possible values for the variable are
indicated on the horizontal (X) axis
• The relative probability for each value is
shown on the Y axis
– For a continuous variable, the curve is a
probability density function f(Y); the probability
of a range of values is the area under the curve
over that range
– Total area under the curve = 1, the sum of
all probabilities
Using probability distributions
(2.1)
• Often we are interested in the
probability associated with a range of
values
• Thus, we examine the area beneath the
curve associated with that range
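
As a sketch of this idea, the area under a curve between two values can be computed as a difference of cumulative probabilities. The example assumes a normal distribution with mean 0 and standard deviation 1, chosen only for illustration, and uses `statistics.NormalDist` from the Python standard library (Python 3.8+). The helper `prob_between` is a hypothetical name.

```python
from statistics import NormalDist

# Assumed example distribution: normal with mean 0 and standard deviation 1.
dist = NormalDist(mu=0.0, sigma=1.0)

def prob_between(d, low, high):
    """P(low <= Y <= high): area under the density curve between low and high."""
    return d.cdf(high) - d.cdf(low)

p_within_1sd = prob_between(dist, -1.0, 1.0)  # about 0.68
p_within_2sd = prob_between(dist, -2.0, 2.0)  # about 0.95
print(round(p_within_1sd, 4), round(p_within_2sd, 4))
```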
Distributions for inferential
statistics (fig 1.2)
• Z (standard normal) distribution
• Student’s t distribution- used to compare sample
means with population values when the standard
deviation is estimated from the sample. Width depends
on degrees of freedom (sample size)
• Chi square distribution- used to compare observed
versus expected values, especially frequencies
• F distribution- the distribution of the ratio of two
independent sums of squares, each divided by its own df;
used to compare variances
Normal distribution
• Symmetrical probability distribution with a
bell shape (see formula in text)
• Can be defined by two parameters- mu (the mean) and
sigma (the standard deviation)
• These parameters are independent
• Mu indicates the location of the peak
• Sigma indicates the breadth of the curve
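
The roles of mu and sigma can be illustrated with the standard normal density formula, written here as a small Python function. The function name `normal_pdf` and the parameter values are chosen for illustration only. The formula itself is the usual one, f(y) = exp(-(y - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)).

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((y - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Same mu, different sigma: the peak stays at mu but the curve gets broader and lower.
peak_narrow = normal_pdf(10.0, mu=10.0, sigma=1.0)
peak_wide = normal_pdf(10.0, mu=10.0, sigma=3.0)

# The curve is symmetrical about mu.
left = normal_pdf(9.0, mu=10.0, sigma=1.0)
right = normal_pdf(11.0, mu=10.0, sigma=1.0)

print(peak_narrow, peak_wide, left, right)
```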
Estimation of the center of a
distribution for a continuous variable
(Table 2.1)
• Median: middle measurement of a set of data, which estimates
the center of a distribution
– Is equal to the mean if data are normally distributed
– Is less sensitive than the mean to outliers if data are
skewed
• Mean
– The constant closest to a random variable (it minimizes
the sum of squared deviations)
– If equal weighting is used, calculated by multiplying the
sum of the observations by 1/N, where N is the sample size
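
A short sketch comparing the two estimates of center, using invented values: adding a single outlier shifts the mean far more than the median. The equal-weight mean is also computed directly as the sum multiplied by 1/N.

```python
from statistics import mean, median

# Invented sample values, for illustration only.
sample = [4.0, 5.0, 5.0, 6.0, 7.0]
with_outlier = sample + [40.0]

# Equal-weight mean: the sum of the observations multiplied by 1/N.
n = len(sample)
mean_by_hand = sum(sample) * (1 / n)

print(mean(sample), median(sample))              # mean 5.4, median 5.0
print(mean(with_outlier), median(with_outlier))  # mean about 11.17, median 5.5
```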