POPULATIONS
BUT FIRST
• Let’s expand on the whole continuous versus discrete variables topic
TYPES OF CONTINUOUS VARIABLES
• Interval – numerical values whose differences are meaningful (e.g. the
difference between 60 and 70 degrees Fahrenheit is the same as the
difference between 80 and 90 degrees Fahrenheit)
• Ratio – interval variables with 1 added condition: a measurement of zero
indicates none of that variable (like tree height). This allows you to
compare measurements as ratios; a tree that is 80 feet tall is twice
(ratio 2X) as tall as a 40-foot tree. Temperatures in Fahrenheit or Celsius
are not ratio variables because their zero points are arbitrary;
temperatures in Kelvin are.
TYPES OF DISCRETE OR QUALITATIVE VARIABLES
• Nominal – 2 or more categories with no inherent order (state in which a
person resides, for example)
• Ordinal – 2 or more categories with a natural order (cool, warm, hot, for
example)
• Dichotomous – only 2 possible categories (yes/no, for example)
POPULATION SIZE
• The totality of the individuals being studied
• In statistics this is represented by a capital N
• Examples
   • Number of sand grains on a beach (actual count unknown)
   • Everyone who watched the ballgame (fixed)
   • All fish in a lake (varies by when sampled)
   • All trees in the forest (varies by when sampled)
• What is N for the population of students in this class?
POPULATION TOTAL
$X = \sum_{i=1}^{N} x_i$
• X is the population total
• x_i is the value of the i-th unit
• N is the population size
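The relationship between X, the unit values, and N can be sketched in a few lines of Python. The tree heights below are made-up numbers used only for illustration.

```python
# Hypothetical unit values x_i: heights (in feet) of every tree in a tiny stand.
unit_values = [40, 80, 55, 62, 71]

N = len(unit_values)  # population size: the number of units
X = sum(unit_values)  # population total: the sum of all unit values

print(f"N = {N}, X = {X}")
```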
PARAMETRIC STATISTICS ASSUMPTIONS
1. Non-discrete variables
   • The value of a measurement can be any number on a continuous scale
   • Examples include tree heights, tree diameters, animal body weights
2. Normally distributed data
POPULATION CENTRAL TENDENCY
• The MEAN is the arithmetic average of the set of observations (can be
used for both continuous and discrete data)
• The MEAN can be strongly influenced by outliers (for example, a few very
high incomes pull the mean income upward)
• The MEDIAN is the middle value of the series of observations when
they are arranged in magnitude order
• The MODE is defined as the most frequently appearing value or class
of values in a set of observations
• For a normally distributed (bell-shaped) population, these three values
are all the same.
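As a rough illustration of the three measures and of the outlier effect on the mean, here is a minimal sketch using Python's standard `statistics` module. The income figures are invented for the example.

```python
import statistics

# Hypothetical annual incomes (thousands of dollars) for a population of seven
# people. The last value is a deliberate outlier.
incomes = [30, 32, 32, 35, 38, 41, 500]

mean = statistics.mean(incomes)      # arithmetic average
median = statistics.median(incomes)  # middle value after sorting
mode = statistics.mode(incomes)      # most frequently appearing value

print(f"mean = {mean:.1f}, median = {median}, mode = {mode}")
```

The single large income drags the mean far above the median and mode, which stay close to what most people in this made-up population earn.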
POPULATION CENTRAL TENDENCY MEDIAN
• For skewed distributions, like one might find in the sizes of trees in a
newly regenerated forest, the MEDIAN would be the most appropriate
measure of central tendency.
• MEDIAN is also the best measure of
central tendency for Ordinal
Variables.
POPULATION CENTRAL TENDENCY MODE
• MODE is often used for categorical data since measurements of continuous
data have very few identical readings.
• A dataset can have more than one MODE.
• MODE is likely the best measure of central tendency for Nominal variables.
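A short sketch of finding the mode of nominal data, including the case of more than one mode. It assumes Python 3.8 or later for `statistics.multimode`; the state abbreviations are made-up sample data.

```python
import statistics

# Hypothetical nominal data: state of residence for nine people.
states = ["ID", "WA", "OR", "ID", "WA", "MT", "ID", "WA", "OR"]

# multimode returns every value tied for the highest count,
# in the order each value first appears in the data.
modes = statistics.multimode(states)

print(modes)
```

Here "ID" and "WA" each appear three times, so this dataset has two modes. A mean or median would be meaningless for these categories.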
CENTRAL LIMIT THEOREM
• Regardless of the original population distribution (normal, skewed,
uniform, etc.), the distribution of the means of multiple samples from
that population approximates a normal distribution, and the approximation
improves as the sample size grows.
• http://www.wessa.net/rwasp_samplingdistributionmean.wasp
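The theorem can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: the skewed population is generated from an exponential distribution, and the sample size (50) and number of samples (1,000) are arbitrary choices. The helper name `sample_means` is hypothetical.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical, strongly skewed population: 10,000 exponential values.
population = [random.expovariate(1.0) for _ in range(10_000)]

def sample_means(pop, sample_size, num_samples):
    """Return the mean of each of num_samples random samples drawn from pop."""
    return [
        statistics.fmean(random.sample(pop, sample_size))
        for _ in range(num_samples)
    ]

means = sample_means(population, sample_size=50, num_samples=1_000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.pstdev(means)

print(f"population mean = {pop_mean:.3f}, mean of sample means = {mean_of_means:.3f}")
print(f"population SD   = {pop_sd:.3f}, SD of sample means   = {sd_of_means:.3f}")
```

The sample means cluster tightly and roughly symmetrically around the population mean even though the population itself is skewed. Plotting a histogram of `means` would show the bell shape.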
BACK TO NORMAL PARAMETRIC STATISTICS FOR NOW
POPULATION VARIANCE
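The population variance is used to characterize the spread and is defined as
the average squared difference of the observations from the population mean.
$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$
where $\mu$ is the population mean, x_i is the unit value, and N is the
population size.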
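The definition can be checked by hand on a small made-up population. The sketch below computes the average squared difference from the mean directly and compares it with `statistics.pvariance` from the Python standard library.

```python
import statistics

# Hypothetical population of eight observations.
observations = [2, 4, 4, 4, 5, 5, 7, 9]

N = len(observations)
mu = sum(observations) / N  # population mean

# Average of the squared differences from the population mean.
variance = sum((x - mu) ** 2 for x in observations) / N

print(f"mean = {mu}, variance = {variance}")
print(statistics.pvariance(observations))  # library result for comparison
```

Note the division by N, not N − 1: this is the population variance, not the sample variance.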
POPULATION STANDARD DEVIATION
To get a measure of variation expressed in the same units as the
original data, the square root of the variance is taken.
$\sigma = \sqrt{\sigma^2}$, where $\sigma^2$ is the population variance.
Like the variance, standard deviation is a measure of dispersion of the
individual observations about the mean in a normally distributed population.
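A minimal sketch of the standard deviation on simulated data. The population of tree heights (mean 60 ft, standard deviation 8 ft) is an assumption made for the example. It also shows a well-known property of normal distributions: roughly 68% of observations fall within one standard deviation of the mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical normally distributed population: 20,000 tree heights in feet.
heights = [random.gauss(60, 8) for _ in range(20_000)]

mu = statistics.fmean(heights)
sigma = statistics.pstdev(heights)  # square root of the population variance

# Fraction of observations within one standard deviation of the mean.
within_one_sd = sum(mu - sigma <= h <= mu + sigma for h in heights) / len(heights)

print(f"mean = {mu:.2f} ft, SD = {sigma:.2f} ft")
print(f"fraction within one SD of the mean = {within_one_sd:.3f}")
```

Because the standard deviation is in feet, like the heights themselves, it can be read directly against the data, which the variance (in square feet) cannot.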
POPULATION COEFFICIENT OF VARIATION
Because populations with large means tend to have larger standard deviations
than those with small means, the coefficient of variation permits a comparison of
relative variability about different means. It is independent of the units used and
is useful in comparing distributions where units may be different.
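CV is the ratio of the standard deviation to the mean and is expressed as a
percentage:
$CV = \frac{\sigma}{\mu} \times 100\%$
where $\sigma$ is the population standard deviation and $\mu$ is the
population mean.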
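The unit independence of the CV can be illustrated with a short sketch. The function name `coefficient_of_variation` and the tree heights are hypothetical; the same heights are expressed in feet and in meters.

```python
import statistics

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean, as a percentage."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sigma / mu * 100

# Hypothetical tree heights, first in feet, then converted to meters.
heights_ft = [40, 80, 55, 62, 71]
heights_m = [h * 0.3048 for h in heights_ft]

cv_ft = coefficient_of_variation(heights_ft)
cv_m = coefficient_of_variation(heights_m)

print(f"CV in feet = {cv_ft:.2f}%, CV in meters = {cv_m:.2f}%")
```

Both calls give the same percentage (up to floating-point rounding), because changing units rescales the standard deviation and the mean by the same factor.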
PARAMETRIC POPULATION STATISTICS SUMMARY