Part F - UNDERSTANDING STATISTICS (Descriptive and Inferential Statistics)

Descriptive statistics - summarize data in order to improve understanding
  Frequency table (with percents) presents data so it is easier to grasp than a list of all scores
Inferential statistics - based on calculations on a sample, with inferences made to the population
  Sampling error is possible, so results are often reported with a margin of error (how far the sample result is likely to differ from the population value)
  Significance tests help decide how reliable sample results are
Populations are described using parameters, while samples are described with statistics

Review questions
  What is the difference between descriptive and inferential statistics?
  What is the difference between parameters and statistics?
  How do you calculate a percentage?

Scales of Measurement (N→O→I→R)
Lower → → → → → → Higher
Nominal - named categories with no order
  Examples: Gender; race; marital status
Ordinal - categories in order high/low or more/less
  Examples: Finish in race; ranks
Interval - equal distance between scale points
  Arbitrary zero, can have negative numbers
  Examples: Test scores; temperature scales (F & C)
Ratio - equal distance between scale points
  Absolute zero, no negative numbers
  Examples: Number of children; income in dollars
→ Which to use? Highest level possible; the higher the level, the more powerful the analysis that is possible

Review questions
  What is the most precise level of data?
  What level of data do you get with:
    Class rank? (freshman, sophomore, junior, senior)
    Number of minutes late to class?
    What county you live in?
    Your age: 0-20, 21-40, 41-60, 61+
    Your annual income in dollars:

Descriptions of Nominal Data
Names, not quantities, even if numbers are used as name tags (1=male; 2=female)
Give frequencies (f) in each group or number of cases (N for population; n for sample)
Generally also give percentages (part ÷ whole × 100) in addition to frequencies (more intuitive)
Univariate analysis looks at a single variable
  See sample frequency table in text
Bivariate analysis looks at the relationship between two variables
  See sample table in text; percentages are better for comparisons, especially with unequal-sized groups

Review questions
  What percent of this class is male?
  Why do we report percents?
  What is the difference between univariate and bivariate statistics?

SHAPES OF DISTRIBUTIONS (curves)
Describe a quantitative data distribution with a frequency polygon, then smooth it to see the shape
Many distributions’ curves are normal, meaning bell shaped and symmetrical
  Examples are heights, weights, average rainfall, IQ
Some distributions are not normal; a few scores in one direction (high or low) create a tail
  Skewed to the right is called a positive skew (a few high incomes on the right side of the curve create a tail on the right)
  Skewed to the left is called a negative skew (a few low scores on the left side of the curve create a tail on the left)

Review questions
  What are the characteristics of normal curves?
  What causes a negative skew?
  What causes a positive skew?
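
The frequencies and percentages described under Descriptions of Nominal Data can be sketched in a few lines of Python. This is only an illustration: the marital-status responses are made-up data, and frequency_table is a hypothetical helper name, not something from the text.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, frequency, percent) rows, most frequent first."""
    counts = Counter(values)
    n = len(values)  # number of cases in the sample
    # percent = part / whole * 100
    return [(category, f, 100 * f / n) for category, f in counts.most_common()]

# Hypothetical sample of marital-status responses (made-up data).
responses = ["married", "single", "single", "married", "divorced",
             "single", "married", "single"]

table = frequency_table(responses)
for category, f, pct in table:
    print(f"{category:<10} f={f}  {pct:.1f}%")
```

Because each percent is the group's frequency divided by the same total n, the percents add up to 100, which is what makes groups of unequal size comparable.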
THE MEAN, MEDIAN, AND MODE (Measures of Central Tendency)
Mean is the most frequently used measure of average
  It’s the balance point (positive and negative deviations from the mean sum to zero)
  Calculate: ∑x ÷ N
  M or μ = mean of population; m or X̄ = mean of sample
  Major drawback is the effect of extreme scores in a skewed distribution (they pull the mean up or down)
Use median (midpoint) if the distribution is skewed
  Put scores in order, high to low; if odd number of scores, it’s the middle one; if even, average the two middle ones
Mode is the most frequently occurring score
Interval/ratio - use all three measures; ordinal - use only median or mode; nominal - use mode only

Review questions
  How do you calculate the mean?
  How do you calculate the median?
  What is the mode?

The Mean and Standard Deviation
Two statistics (center and variation) are used to describe a distribution of interval/ratio values
  Mean (average) tells the center of scores
  Standard deviation tells how spread out scores are
Use SD, S, or σ (sigma) for populations; sd, s for samples
Text ex: Grp 1: M=15, S=10; Grp 2: M=15, S=.93; Grp 3: M=15, S=0 → The one with the larger S has more variation
SD has a special relationship to the normal curve
  About 68% of cases fall within plus or minus one SD (±1SD)
  About 95% within ±2SD and 99.7% within ±3SD

Review questions
  Mean, median and mode are measures of what?
  Range and standard deviation are measures of what?
  What percent of scores are within one SD of the mean?
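
As a rough sketch of the measures above, the following Python uses the standard statistics module on a small set of made-up scores (treated here as a whole population, so pstdev is used). It also checks the balance-point property, shows how one extreme score moves the mean but not the median, and recomputes the 68 / 95 / 99.7 percentages from a standard normal curve.

```python
import statistics
from statistics import NormalDist

# Hypothetical scores (made-up data), treated as a whole population.
scores = [10, 12, 12, 14, 15, 16, 19]

mean = statistics.mean(scores)      # sum of scores / N
median = statistics.median(scores)  # middle score once sorted
mode = statistics.mode(scores)      # most frequently occurring score
sd = statistics.pstdev(scores)      # population standard deviation

# Balance point: deviations from the mean sum to zero.
deviation_sum = sum(x - mean for x in scores)

# One extreme score pulls the mean up but leaves the median where it was.
skewed = scores[:-1] + [89]
skewed_mean = statistics.mean(skewed)
skewed_median = statistics.median(skewed)

# Share of a normal curve within ±1, ±2 and ±3 SD of the mean.
z = NormalDist()  # standard normal: mean 0, SD 1
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

print(mean, median, mode, round(sd, 2))
print(skewed_mean, skewed_median)
print({k: round(v, 4) for k, v in within.items()})
```

With these scores the mean and median are both 14; replacing the top score with 89 raises the mean to 24 while the median stays at 14, which is why the median is preferred for skewed distributions.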
The Median and IQR
Median - measure of center used with skewed data and with ordinal data (mean not normally used)
Range (high score - low score) - measure of variation used with median (SD not used with medians)
Because range is highly affected by extreme scores, the inter-quartile range (IQR) is often used
  It is the range of the middle 50% of scores
  Between the 25th and 75th percentile scores (see text examples)
Lower median scores indicate lower average
Lower IQR scores indicate less variation

Review questions
  Medians and ranges can be used with what levels of data?
  How do you determine the 25th percentile?
  What does the IQR tell you?

Conclude Part F
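
For the range and IQR discussed in The Median and IQR, here is a minimal Python sketch on made-up scores that include one extreme value. It relies on statistics.quantiles with its default settings; textbooks use slightly different percentile conventions, so the quartiles in the text's own examples may be computed a little differently.

```python
import statistics

# Hypothetical scores (made-up data) with one extreme high value.
scores = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 40]

median = statistics.median(scores)
score_range = max(scores) - min(scores)  # high score - low score

# quantiles(n=4) returns the three cut points: Q1 (25th percentile),
# Q2 (the median) and Q3 (75th percentile).
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1  # spread of the middle 50% of scores

print(median, score_range, q1, q3, iqr)
```

Here the single score of 40 stretches the range to 38, while the IQR (10 - 4 = 6) describes only the middle half of the scores and is unaffected by it.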