Descriptive statistics I
Distributions, summary statistics

Frequency distributions
• Frequency means the number of cases at a single value of a variable
• A “distribution” depicts the frequency (number of cases) at every value of a variable
  – Frequency distributions illustrate how values disperse
  – For categorical variables use a BAR graph
  – For continuous variables use a HISTOGRAM (also try AREA)
• Open DEMO PLUS.SAV
  – For categorical choose variable SEX (1=Male, 2=Female)
  – For continuous choose variable AGE
• Open Height weight gender age.sav (or .xls), choose a categorical and a continuous variable, display their distributions as above

Summarizing distributions
• Producing a single statistic that best depicts a distribution
• For categorical variables, use the statistic “proportion”
  – Proportions with a base of 100 are called a “percentage” (per 100)
• For continuous variables, use a measure of central tendency
  – The statistic “mean” (arithmetic average)
  – The statistic “median” (midpoint value – half of cases above, half below)
  – The statistic “mode” (most frequent value – can be more than one)
• Open DEMO PLUS.SAV
  – For categorical choose variable SEX (1=Male, 2=Female)
    • Analyze|Descriptive Statistics|Frequencies
    • Ask for a Bar Chart
  – For continuous choose variable AGE
    • Analyze|Descriptive Statistics|Frequencies
    • Ask for a Histogram
• Open Height weight gender age.sav (or .xls), choose a categorical and a continuous variable, proceed as above

Categorical variables
• “Percent” is a summary statistic – it summarizes a distribution
• “Percent” – per cent – per hundred
  – 100 is always the denominator
• Increases in percentage are computed off the base amount. Example: increases in a jail population of 100 prisoners
  – 100 percent increase: 100 percent of 100 is 100; 100 + 100 = 200
  – 150 percent increase: 150 percent of 100 is 150; 150 + 100 = 250
  – 200 percent increase: 200 percent of 100 is 200; 200 + 100 = 300 (3 times the base amount)
• Percentages of less than 1 percent are described as a fraction
  – Example: 0.2 percent is 2/10ths of 1 percent
  – Do not confuse decimals and percentages
    • Decimal .20 = 20/100 = 20 percent
    • Decimal .0020 = 20/10,000 = .20 percent
• Percentages (proportions) are usually the best way to summarize datasets using categorical variables
  – 70 percent of students are employed
  – 60 percent of parolees recidivate
• Percentages can be used to summarize findings when large numbers are involved
  – 50,000 persons were asked whether crime is a serious problem: 32,700 said “yes”
  – Compute: divide 32,700 by 50,000 and multiply by 100
    • 32,700 / 50,000 = .654
    • .654 × 100 = 65.4% (about 65%)
• Percentages can be used to compare datasets
  – This year, 65% of 10,000 people polled said crime is a serious problem
  – Last year, 12,000 people were polled and 9,000 said crime is a serious problem
  – Compute:
    • 9,000 / 12,000 = .75
    • .75 × 100 = 75%
• Because both samples were standardized (responses per 100 persons) they are directly comparable even though different numbers of persons were polled
  – 65% v. 75%
• Percentages can magnify differences when raw numbers are small
• Percentages can deflate differences when numbers are large
  – Increase from 1 to 3 convictions is …
  – Increase from 5,000 to 6,000 convictions is …
• Compute both...
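
The percentage arithmetic on the preceding slides can also be sketched in Python. This is only an illustration, not part of the SPSS exercises: the helper names `percent` and `percent_change` are assumptions introduced here, and the inputs are the survey and jail population figures from the slides.

```python
def percent(part, whole):
    """Return part as a percentage of whole (cases per 100)."""
    return part / whole * 100


def percent_change(base, new):
    """Return the change from base to new as a percentage of the base amount."""
    return (new - base) / base * 100


# Survey example: 32,700 of 50,000 respondents said "yes".
this_year = percent(32_700, 50_000)

# Comparison poll: 9,000 of 12,000 respondents said "yes".
last_year = percent(9_000, 12_000)

# Jail population example: growth from 100 to 250 prisoners.
jail_increase = percent_change(100, 250)

print(round(this_year, 1), round(last_year, 1), round(jail_increase, 1))
```

The same `percent_change` helper can be used to check the two convictions exercises: pass the starting count as `base` and the later count as `new`.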
• Increase from 1 to 3 convictions is 200 percent
  – 3 - 1 = 2
  – 2/1 (base) × 100 = 200%
• Increase from 5,000 to 6,000 convictions is 20 percent
  – 6,000 - 5,000 = 1,000
  – 1,000/5,000 (base) × 100 = 20%

Summarizing a distribution for ordinal variables
• Ordinal variables are categorical variables whose categories reflect an inherent rank or order
• Can summarize the distribution of an ordinal variable two ways:
  – As a categorical variable, using proportions / percentages
  – As a continuous variable, treating categories as points on a scale
    • Assign a numerical value to each category and calculate a mean
• Open DEMO PLUS.SAV
  – Variable “class” is ordinal
  – Display and summarize the distribution both ways...
    • As a categorical/ordinal variable
    • As a continuous variable

Continuous variables
• If variables are continuous, can summarize a distribution with one or more measures of “central tendency”
  – Mean, median, mode
• Mean: arithmetic average of scores
  – Pulled in the direction of extreme scores
  – Experiment with Height weight gender age.sav
• Median: middle score – half higher, half lower
  – If there is an even number of scores, average the two center scores
  – If there is an odd number of scores, use the center score
  – Exercise 1: 2, 3, 5, 5, 8, 12, 17, 19, 21
    • Answer: 8
  – Exercise 2: 2, 3, 5, 5, 8, 12, 17, 19, 21, 21
    • Answer: 10
    • The two center scores are 8 and 12: 12 - 8 = 4; 4/2 = 2; 8 + 2 or 12 - 2 = 10
• Median is a useful summary statistic when there are extreme scores
  – Extreme scores make the mean a misleading summary measure of a distribution
• Median can be used with continuous or ordinal variables
• Mode: score that occurs most often (with the greatest frequency)
  – There can be more than one mode (bi-modal, tri-modal, etc.)
  – Exercise 1: 2, 3, 5, 5, 8, 12, 17, 19, 21
    • Mode = 5 (uni-modal)
  – Exercise 2: 2, 3, 5, 5, 8, 12, 17, 19, 21, 21
    • Modes = 5, 21 (bi-modal)
• Modes are a useful summary statistic for distributions where cases cluster at particular scores – an interesting condition that would be missed by the mean or median

Range
• Another way to describe a distribution of a continuous variable
  – Not a measure of central tendency
• Range depicts the lowest and highest scores in a distribution
  – 2, 3, 5, 5, 8, 12, 17, 19, 21
  – Range is 2 to 21, or 19 (21 - 2)
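
The median, mode, and range exercises above can be checked with a short Python sketch. It uses the standard library's `statistics` module (`multimode` needs Python 3.8 or later); the `summarize` helper and the variable names are assumptions made for this illustration and are not part of the SPSS workflow in the slides.

```python
from statistics import mean, median, multimode

# Score lists from the exercises on the slides.
exercise_1 = [2, 3, 5, 5, 8, 12, 17, 19, 21]
exercise_2 = [2, 3, 5, 5, 8, 12, 17, 19, 21, 21]


def summarize(scores):
    """Return the mean, median, mode(s), and range of a list of scores."""
    return {
        "mean": mean(scores),
        "median": median(scores),           # averages the two center scores if the count is even
        "modes": multimode(scores),         # list of every most-frequent score
        "range": max(scores) - min(scores)  # highest score minus lowest score
    }


summary_1 = summarize(exercise_1)
summary_2 = summarize(exercise_2)
print(summary_1)
print(summary_2)
```

For these lists the sketch gives medians of 8 and 10, modes of 5 for Exercise 1 and 5 and 21 for Exercise 2, and a range of 19 in both cases, which agrees with the answers on the slides.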