Transcript

POPULATIONS

BUT FIRST
• Let’s expand on the whole continuous versus discrete variables topic

TYPES OF CONTINUOUS VARIABLES
• Interval: numerical values whose differences can be compared (e.g., the difference between 60 and 70 degrees Fahrenheit is the same as the difference between 80 and 90 degrees Fahrenheit)
• Ratio: interval variables with 1 added condition: a measurement of zero indicates none of that variable (like tree height). This allows you to compare measurements as ratios: a tree that is 80 feet tall is twice (ratio 2X) as tall as a 40 foot tree. Temperatures in Fahrenheit or Celsius are not ratio variables, because zero on those scales does not mean "no temperature"; temperatures in Kelvin are.

TYPES OF DISCRETE OR QUALITATIVE VARIABLES
• Nominal: 2 or more independent categories with no inherent order (state in which a person resides, for example)
• Ordinal: 2 or more related categories that have a natural order (cool, warm, hot, for example)
• Dichotomous: only 2 possible categories (yes/no, for example)

POPULATION SIZE
• The population is the totality of the individuals being studied
• In statistics the population size is represented by a capital N
• Examples
  • Number of sand grains on a beach (actual count unknown)
  • Everyone who watched the ballgame (fixed)
  • All fish in a lake (varies by when sampled)
  • All trees in the forest (varies by when sampled)
• What is N for the population of students in this class?

POPULATION TOTAL
$X = \sum_{i=1}^{N} x_i$
where X is the population total, x_i is the unit value, and N is the population size

PARAMETRIC STATISTICS ASSUMPTIONS
1. Non-discrete variables
  • The value of a measurement can be any number on a continuous scale
  • Examples include tree heights, tree diameters, animal body weights
2. Normally distributed data

POPULATION CENTRAL TENDENCY
• The MEAN is the arithmetic average of the set of observations (can be used for both continuous and discrete data)
  • It can be susceptible to the influence of outliers (income example)
• The MEDIAN is the middle value of the series of observations when they are arranged in order of magnitude
• The MODE is the most frequently appearing value or class of values in a set of observations
• For a normally distributed (bell-shaped) population, these values are all the same.

POPULATION CENTRAL TENDENCY MEDIAN
• For skewed distributions, like one might find in the sizes of trees in a newly regenerated forest, the MEDIAN would be the most appropriate measure of central tendency.
• MEDIAN is also the best measure of central tendency for ordinal variables.

POPULATION CENTRAL TENDENCY MODE
• MODE is often used for categorical data, since measurements of continuous data have very few identical readings.
• A dataset can have more than one MODE.
• MODE is likely the best measure of central tendency for nominal variables.

CENTRAL LIMIT THEOREM
• Regardless of the original population distribution (normal, skewed, uniform, etc.), the distribution of the means of multiple samples from that population approximates a normal distribution, and the approximation improves as the sample size grows.
• http://www.wessa.net/rwasp_samplingdistributionmean.wasp

BACK TO NORMAL PARAMETRIC STATISTICS FOR NOW

POPULATION VARIANCE
The population variance is used to characterize the spread and is defined as the average squared difference of the observations from the population mean.
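The central limit theorem statement and the variance definition above can be checked numerically. The following is a minimal sketch, not part of the original slides: the population (10,000 draws from an exponential distribution), the seed, the sample size of 30, and the 2,000 repeated samples are all illustrative assumptions. Skewness is measured here as the third central moment divided by the variance raised to the power 1.5, which is about 0 for a symmetric (for example, normal) distribution.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable (arbitrary choice)

# Hypothetical right-skewed population: 10,000 draws from an exponential distribution.
population = [rng.expovariate(1.0) for _ in range(10_000)]
N = len(population)

pop_mean = sum(population) / N
pop_median = statistics.median(population)

# Population variance: average squared difference from the population mean.
pop_variance = sum((x - pop_mean) ** 2 for x in population) / N


def skewness(values):
    """Third central moment divided by variance**1.5 (about 0 when symmetric)."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    third = sum((v - mu) ** 3 for v in values) / n
    return third / var ** 1.5


# Draw many samples (without replacement) and record the mean of each one.
sample_size = 30
num_samples = 2_000
sample_means = [
    statistics.fmean(rng.sample(population, sample_size))
    for _ in range(num_samples)
]
mean_of_means = statistics.fmean(sample_means)

print(f"population mean         = {pop_mean:.3f}")
print(f"population median       = {pop_median:.3f}")
print(f"population variance     = {pop_variance:.3f}")
print(f"mean of sample means    = {mean_of_means:.3f}")
print(f"skewness (population)   = {skewness(population):.3f}")
print(f"skewness (sample means) = {skewness(sample_means):.3f}")
```

In a run of this sketch the population median should fall below the population mean (the signature of right skew), while the sample means should be centered near the population mean and be much less skewed than the population itself.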
POPULATION STANDARD DEVIATION
To get a measure of variation expressed in the same units as the original data, the square root of the variance is taken. Like the variance, the standard deviation is a measure of the dispersion of the individual observations about the mean in a normally distributed population.

POPULATION COEFFICIENT OF VARIATION
Because populations with large means tend to have larger standard deviations than those with small means, the coefficient of variation (CV) permits a comparison of relative variability about different means. It is independent of the units used and is useful in comparing distributions where units may be different. CV is the ratio of the standard deviation to the mean and is expressed as a percentage.

PARAMETRIC POPULATION STATISTICS SUMMARY
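The population statistics defined in these slides can be computed together in a few lines. The following is a minimal sketch, assuming a made-up population of five tree heights in feet; the data and the helper name `coefficient_of_variation` are hypothetical and do not come from the slides. The last step converts the heights to meters to illustrate that the CV does not depend on the units used.

```python
import math
import statistics

# Hypothetical population: heights (feet) of every tree in a very small stand.
heights_ft = [40, 50, 50, 60, 80]

N = len(heights_ft)                        # population size
total = sum(heights_ft)                    # population total, X
mean = total / N                           # population mean
median = statistics.median(heights_ft)     # middle value when sorted
modes = statistics.multimode(heights_ft)   # a list: there can be more than one mode

# Variance: average squared difference from the population mean.
variance = sum((x - mean) ** 2 for x in heights_ft) / N
std_dev = math.sqrt(variance)              # same units as the data (feet)
cv_percent = std_dev / mean * 100          # unitless, expressed as a percentage


def coefficient_of_variation(values):
    """Population standard deviation as a percentage of the mean."""
    return statistics.pstdev(values) / statistics.fmean(values) * 100


# Same trees measured in meters: the CV should be unchanged.
heights_m = [h * 0.3048 for h in heights_ft]

print(f"N = {N}, total = {total}, mean = {mean}")
print(f"median = {median}, mode(s) = {modes}")
print(f"variance = {variance}, standard deviation = {std_dev:.2f} ft")
print(f"CV (feet)   = {coefficient_of_variation(heights_ft):.2f}%")
print(f"CV (meters) = {coefficient_of_variation(heights_m):.2f}%")
```

Note that the mean (56 ft) sits above the median (50 ft) here because the single 80 ft tree pulls the average upward, which is the outlier sensitivity mentioned for the MEAN.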