Statistical Evaluation
• Statistics are tools for summarizing data
  – Descriptive: Simple facts
  – Inferential: What do the numbers mean?

Populations vs. Samples
• Population: Every individual that meets the criteria
• Sample: The individuals you actually measured

Populations vs. Samples
• Statistic: A summary value that describes a sample
• Parameter: A summary value that describes a population
Each statistic has a corresponding parameter. Inferential statistics use sample statistics to estimate population parameters.

Descriptive Statistics
• Frequency Distributions (Table or Graph)
  – Discrete categories
  – Number of individuals in each one

Age Group    Participants
Under 20     14
20-29        12
30-39        22
40-49        24
50-59        6
60 and Up    3

[Figure: Participants by Age Group]

Frequency Distributions
• Histogram vs. Polygon
[Figures: histogram and frequency polygon of Participants by Age Group]

Frequency Distributions
• Bar Graph
  – Use when data categories are not numerical
  – Leave space between bars
[Figure: bar graph of Participants by Major (English, Math, Philosophy, Physics, Psych, Undeclared)]

Frequency Distributions
• A good first step after data collection
• Seldom presented in a final report

Measures of Central Tendency
• Get a single score that identifies the center of your distribution
  – Mean: Mathematical average
  – Median: Splits data in half
  – Mode: Most common value
[Figure: IQs of Freshman Class, Participants by IQ range (100-109 to 150-159). Mean: 132, Median: 133, Mode: 127 (3)]

Choosing a Measure of C.T.
• Mean: Commonly used; the reader often assumes a normal distribution
[Figure: IQs of Freshman Class, Participants by IQ range (100-109 to 150-159). Mean: 132, Median: 133, Mode: 127 (3)]

Measures of Central Tendency
• Median: Useful when a few values distort the mean

Household    Income (k)
1            22
2            32
3            40
4            46
5            48
6            51
7            56
Mean         42.1
Median       46.0

Household    Income (k)
1            22
2            32
3            40
4            46
5            48
6            51
7            2357
Mean         370.9
Median       46.0

Measures of Central Tendency
• Mode: Values are non-numerical
  – Favorite New England vacation spot

State            Responses
Maine            5
Vermont          7
New Hampshire    25
Connecticut      4
Rhode Island     6
Massachusetts    2

Measures of Central Tendency
• Mode: Bimodal (or multimodal) Distributions
[Figure: Number by Age Group (0-9 to 80-89), a distribution with two peaks]

Variability
• A measure of the range or spread of the scores
[Figure: PA and NJ compared by Age Group (0-9 to 80-89)]

Variability
Variance: A measure of variability determined by
1. Computing the mean
2. Determining each value's distance from the mean
3. Squaring each distance
4. Taking the average of the squared distances*
Standard Deviation: The square root of the variance
SD is a measure of how much the scores scatter around the mean.
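The household income tables and the four variance steps above can be sketched in Python. This is a minimal illustration using only the standard library. The function and variable names are assumptions made for the example, and the variance shown is the population form (dividing by the number of values), which follows the "average of the squared distances" wording literally. A sample variance would divide by one less than the number of values.

```python
from math import sqrt
from statistics import mean, median

# Household incomes (in thousands) from the two tables above.
incomes = [22, 32, 40, 46, 48, 51, 56]
incomes_with_outlier = [22, 32, 40, 46, 48, 51, 2357]


def variance(values):
    """Population variance, following the four listed steps."""
    m = mean(values)                        # 1. compute the mean
    distances = [v - m for v in values]     # 2. distance of each value from the mean
    squared = [d ** 2 for d in distances]   # 3. square each distance
    return mean(squared)                    # 4. average the squared distances


def standard_deviation(values):
    """Square root of the variance."""
    return sqrt(variance(values))


# One extreme value moves the mean a long way but leaves the median alone.
print(round(mean(incomes), 1), median(incomes))                            # 42.1 46
print(round(mean(incomes_with_outlier), 1), median(incomes_with_outlier))  # 370.9 46

# Spread of the first table's incomes around their mean.
print(standard_deviation(incomes))
```

Running the same two functions on the second list shows how strongly a single outlier also inflates the standard deviation, which is one more reason to report the median for skewed data.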
Correlations
• Used to measure the direction and degree of a relationship
  – Create a scatter plot
  – Determine Pearson Correlation Coefficient (r)
[Figure: scatter plot of Hours Sleeping vs. Hours Studying, r = 0.48]
[Figure: scatter plot of GPA vs. Hours of TV, r = -0.92]

Hypothesis Testing
• Null Hypothesis (H0): There is no difference between two populations
  – Differences in sample averages reflect expected sampling error
• Alternative Hypothesis (H1): There is a difference
  – Differences in sample averages reflect a true difference in the populations

Sample Hypothesis
• School District A is accused of age discrimination when hiring new faculty
  – They favor older teachers
• Test the Hypothesis: Are the new teachers at School District A older, on average, than teachers at School District B?
• Ave. for A: 28.2   Ave. for B: 26.5

Hypothesis Testing
• Standard Error: A measure of how close your sample values (means etc.) are likely to be to the population values
• Test Statistic: A value that expresses the "strength" of your sample statistic relative to the Standard Error
Test Stat. = Sample Stat / S.E.

Level of Significance
• Each test compares the acceptable risk of a Type I error (α) with the actual risk (p)
How likely is it that a difference in sample means reflects a true difference in population means?
α = 0.05 (sometimes 0.01)

Types of Error

                                        You Claim That a Difference    You Claim That No Difference
                                        Exists (Reject H0)             Exists (Accept H0)
A Difference Exists in the Population   Correct!                       Type II Error
No Difference Exists                    Type I Error                   Correct!

Sample Case
• Test the Hypothesis: Are the new teachers at School District A older, on average, than teachers at School District B?
Ave. for A: 28.2   Ave. for B: 26.5
The result will depend on the number of teachers sampled and on the range and standard deviation of the teachers' ages.

Inferential Statistics
• The t-test for comparing means
• Simplified version of t
t = (M1 - M2) / (Standard Deviations)
Large values of t are associated with small values of p (lower risk of Type I Error)
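The simplified t formula above can be made concrete with a short Python sketch. It uses Welch's form of the t statistic, where the denominator is the standard error of the difference between the two means. That choice is an assumption, since the slides only give the simplified version. The individual ages below are hypothetical and were made up so that the two averages match the 28.2 and 26.5 quoted for Districts A and B. The function name `t_statistic` is also an assumption.

```python
from math import sqrt
from statistics import mean, variance  # variance() is the sample (n - 1) form


def t_statistic(sample_a, sample_b):
    """Welch's t: difference in means divided by its standard error."""
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


# HYPOTHETICAL ages, invented for illustration only.
# They are chosen so the means equal the slide's 28.2 and 26.5.
ages_a = [23, 25, 26, 27, 28, 29, 30, 30, 31, 33]
ages_b = [22, 23, 24, 25, 26, 27, 28, 29, 30, 31]

t = t_statistic(ages_a, ages_b)
print("means:", mean(ages_a), mean(ages_b))
print("t with 10 teachers per district:", t)

# Same means and similar spread, but four times as many teachers:
# the standard error shrinks, so t grows.
t_large = t_statistic(ages_a * 4, ages_b * 4)
print("t with 40 teachers per district:", t_large)
```

The second call illustrates the point made in the Sample Case: the same 1.7-year gap in average age yields a larger t when more teachers are sampled. Converting t to p requires a t distribution table or a statistics library, which this sketch leaves out.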