Learning the Language of the Statistician
• The following slides contain many of the symbols we will be using in this class. These are the symbols we will be using in formulas. While I do not require you to memorize all of the formulas, it is important that you know what these symbols mean. You will be expected to memorize a few of the simpler formulas for the departmental final.
• To do responsible research, you must assimilate, integrate and apply. This PowerPoint presentation concentrates on assimilating this basic information.

                           Sample    Sampling Distribution        Population
-----------------------------------------------------------------------------
Individual Score           yi                                     yi
Sample Size                n                                      N
Mean                       ȳ                                      µ (mu)
Standard Deviation         s         σ/√n, estimated by s/√n      σ (sigma)
Variance                   s²                                     σ²
Sum                        Σ                                      Σ
Proportion                 p                                      π (pi)
Hypothesized Mean          ȳ0                                     µ0
Hypothesized Proportion    p0                                     π0

Stating Hypotheses with Symbols
• One Sample Hypothesis Test for a Proportion
  o Null hypothesis
    • H0: p = π. The sample proportion is the same as the population proportion.
  o Research hypothesis
    • H1: p ≠ π. The sample proportion is NOT the same as the population proportion. If you have a theory, you can use a one-tailed test and indicate that it is greater or less than the population proportion.
• One Sample Hypothesis Test for a Mean
  o Null hypothesis
    • H0: ȳ = µ. The sample mean is the same as the population mean.
  o Research hypothesis
    • H1: ȳ ≠ µ. The sample mean is not the same as the population mean. If you have a theory, you can use a one-tailed test and indicate that it is greater or less than the population mean.
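The two one-sample hypotheses above can be checked numerically. The sketch below assumes the usual textbook test statistics, z = (p − π0)/√(π0(1 − π0)/n) for a proportion and (ȳ − µ0)/(s/√n) for a mean; these forms, the function names, and the data are illustrative assumptions, not taken from the slides.

```python
import math
from statistics import NormalDist, mean, stdev


def one_sample_proportion_z(p_hat, pi0, n):
    """Z statistic and two-tailed p-value for H0: proportion equals pi0.

    Assumes the large-sample normal approximation.
    """
    standard_error = math.sqrt(pi0 * (1 - pi0) / n)
    z = (p_hat - pi0) / standard_error
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


def one_sample_mean_statistic(scores, mu0):
    """Test statistic (y-bar - mu0) / (s / sqrt(n)) for H0: mean equals mu0."""
    n = len(scores)
    standard_error = stdev(scores) / math.sqrt(n)  # s / sqrt(n)
    return (mean(scores) - mu0) / standard_error


# Hypothetical data: 56% of 400 respondents say yes; hypothesized proportion is .50.
z, p_value = one_sample_proportion_z(0.56, 0.50, 400)

# Hypothetical scores compared with a hypothesized mean of 100.
scores = [98, 102, 105, 110, 95, 100, 108, 104]
t_stat = one_sample_mean_statistic(scores, 100)

print(f"proportion test: z = {z:.2f}, two-tailed p = {p_value:.4f}")
print(f"mean test: statistic = {t_stat:.3f} with df = {len(scores) - 1}")
```

With a small sample, the mean statistic would be compared against a t distribution with n − 1 degrees of freedom rather than the normal curve.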
Stating Hypotheses with Symbols
• Chi Square
  o Null hypothesis
    • H0: E = O. The expected values equal the observed values.
    • The dependent variable is NOT contingent on the independent variable in the population.
  o Research hypothesis
    • H1: E ≠ O. The expected values do not equal the observed values.
    • The dependent variable is contingent on the independent variable in the population.
NOTE: For an Elaborated Chi Square you simply state that E = O for all of the independent/dependent combinations for the null hypothesis. For the research hypothesis you state that E ≠ O for at least one of the combinations. You would actually test each dependent/independent combination separately.

Stating Hypotheses with Symbols
• One-Way Anova (with 2 groups)
  o Null hypothesis
    • H0: µ1 = µ2. The Means are equal, OR the Mean of Group 1 is the same as the Mean of Group 2 in the population.
  o Research hypothesis
    • Two-tailed (the one the computer uses)
    • H1: µ1 ≠ µ2. The Means are not equal, OR the Mean of Group 1 is not the same as the Mean of Group 2 in the population.
    • One-tailed (state a direction)
    • H1: µ1 < µ2 or µ1 > µ2. The Mean of Group 1 is lower than the Mean of Group 2 in the population, or the Mean of Group 1 is higher than the Mean of Group 2 in the population.

Stating Hypotheses with Symbols
• One-Way Anova (with more than 2 groups)*
  o Null hypothesis
    • H0: µ1 = µ2 = … = µk. The Means of all the groups are equal.
  o Research hypothesis
    • Two-tailed (the one the computer uses)
    • H1: µ1, µ2, …, µk are not all equal. The Mean of one group is not equal to the Mean of at least one other group.
  o * This is still bi-variate. You don't have more variables, only more categories in the categorical variable.

Stating Hypotheses with Symbols
• Bi-Variate Regression
  o Null hypothesis
    • H0: β1 = 0. The regression slope is not different from 0 in the population.
    • There is no relationship between the independent and dependent variables in the population.
  o Research hypothesis
    • H1: β1 ≠ 0. The slope is different from 0 in the population.
    • There is a relationship between the independent and dependent variables in the population.
• Multi-Variate Regression
  o Null hypothesis
    • H0: β1 = β2 = … = βk = 0. None of the regression slopes is different from 0 in the population.
    • There is no relationship between any of the independent variables and the dependent variable in the population.
  o Research hypothesis
    • H1: at least one of β1, …, βk ≠ 0. At least one of the slopes is different from 0 in the population.
    • There is a relationship between at least one of the independent variables and the dependent variable in the population.

Matching Variables with Types of Analysis
• Chi-square (2 categorical variables)
  o type of car you drive by gender
  o race by political preference
  o race by eye color
  o gender by YES/NO questions
• Anova (1 categorical and 1 continuous variable)
  o gender by yearly income
  o gender by score on self esteem index
  o race by yearly income
  o political preference by yearly income
  o age by whether or not you have children
• Bi-Variate Regression (2 continuous variables)
  o yearly income by years of education
  o years married by marital satisfaction (scale score)
  o age by number of children
• Multiple Regression (continuous/dummy independent and continuous dependent)
  o number of dates per year by yearly income, age, height, gender (dummy variable)
  o poverty rates by sex ratio, percent single headed household, percent employed

Statistics That Do Not Use Hypotheses
• Confidence Intervals
  o We generally do not state a hypothesis for a Confidence Interval. Confidence Intervals are used to estimate a population mean or proportion based on a sample mean or proportion. Opinion polls use Confidence Intervals to predict election results, etc.
• Pearson Correlation (correlation coefficient or r)
  o We generally do not associate Pearson correlation matrices with hypotheses.
    We generally use Pearson correlation matrices for diagnostic purposes and to test the strength of bi-variate relationships.

Equations/Formulas
Z Tests
• Z scores
  o $Z = \frac{y_i - \mu}{\sigma}$
  o Where $y_i$ = individual's score
  o $\mu$ = population mean
  o $\sigma$ = population standard deviation
  o Information needed
    • Population mean and standard deviation
  o Example of when we would use this
    • If you knew an individual's SAT/ACT score, you could determine what percentile they scored in (e.g., the 95th percentile)
    • OR if you know what percentile they are in, you can determine their score.

Equations for Inferential Statistics
• Summary Statistics
  o Mean
    • $\bar{y} = \frac{\sum y_i}{n}$
  o Median
    • Position $\frac{n+1}{2}$: order the values and count up this far
  o Variance
    • $s^2 = \frac{\sum (y_i - \bar{y})^2}{n - 1}$
  o Standard Deviation
    • $s = \sqrt{s^2}$

Inferring a Population Mean or Proportion Based on Sample Mean or Proportion
• The following slides focus on how to estimate a population mean or proportion if we ONLY have a random sample.
• In these cases we estimate one point in the population (e.g., the mean IQ of USU students)
• BUT we build a confidence interval around this single point, generally a 95% confidence interval

A One or Large Sample Hypothesis Test
• In the following slides we compare a sample mean or proportion with a population mean or proportion.
• We want to know if our sample mean or proportion is different from the population mean or proportion
• The population mean or proportion could actually be a mean/proportion that is specified by a theory or by past research (rather than a number computed from a population data set)

Equations/Formulas for One Sample Hypotheses Tests
• The equations are outlined in red
• What do the symbols mean
  o One sample hypothesis test for Proportion
    • p = proportion in the sample
    • π0 = proportion or hypothesized proportion in the population
    • n = sample size
    • Z = computed statistic
  o One sample hypothesis test for Mean
    • ȳ = mean in the sample
    • µ0 = mean or hypothesized mean in the population
    • n = sample size
    • sȳ = standard error of the mean, an estimate of the standard deviation of the sampling distribution
    • s/√n = computation for estimating the standard error: the standard deviation of the sample divided by the square root of the sample size

Symbols for Statistics that Infer the Relationship in the Sample to the Population
Test          Symbol(s)    Interpretation
--------------------------------------------------------------------------
Chi Square    χ²           Chi Square statistic
Regression    β            beta: slope in population
              b            slope in sample
              a            alpha: intercept or constant in prediction formula
              X1…X         value of the X variables
              Ŷ            y-hat or predicted Y
              ȳ            Y bar or the mean of Y
Anova         µ            mu or mean in population
              yi − ȳ       deviation of an individual score from the mean

Chi-Square Equation

Equations/Formulas for Inferential Statistics
o Pearson Correlation Coefficient and R²
  • Formula
    o $r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$
    o $R^2 = r^2$ (r squared)
o Multiple Regression
  o Prediction Equation
    • $\hat{Y} = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots$
    • Ŷ = predicted score for the dependent variable
    • a = intercept or constant
    • b = slope or parameter estimate for an independent variable: the change in the Y variable for every 1 unit increase in X
    • X = value of the X variables, taken from the codebook

Equations/Formulas for Inferential Statistics
• Anova
  o Formula
    • $TSS = \sum y_{ij}^2 - \frac{G^2}{n}$
    • $SSB = \sum \frac{T_i^2}{n_i} - \frac{G^2}{n}$
    • $SSW = TSS - SSB$
    • $s_B^2 = \frac{SSB}{k - 1}$ and $s_W^2 = \frac{SSW}{n - k}$
    • $F = \frac{s_B^2}{s_W^2}$ (the F statistic)
  o What symbols mean
    • TSS = Total Sum of Squares
    • SSB = Sum of Squares Between
    • SSW = Sum of Squares Within
    • G = grand total of all scores, T_i = total of the scores in group i, n_i = number of cases in group i
    • n = total number of cases, k = number of groups
    • df between = k − 1
    • df within = n − k

Anova and Regression Sums of Squares
• Anova
  o TSS = Total Sum of Squares
  o SSW = Sum of Squares within each group
  o SSB = Sum of Squares between the groups
  o SSB/TSS = R square, or the proportion of the total sum of squares that is explained by group membership
• Regression
  o TSS = Total Sum of Squares
  o SSM = Sum of Squares Model
  o SSE = Sum of Squares Error

Equations/Formulas for Inferential Statistics
• Two Sample T-test
  o Formula
    • $t = \frac{\bar{y}_1 - \bar{y}_2}{s_{\bar{y}_1 - \bar{y}_2}}$
    • The denominator is the estimated standard error of the difference between the two means, computed as $s_{\bar{y}_1 - \bar{y}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
  o Pooled standard deviation
    • $s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$
  o What symbols mean
    • t = computed test statistic, which is compared with the critical value
    • ȳ1 = mean of sample one
    • ȳ2 = mean of sample two
    • s1 = standard deviation of sample 1 and s2 = standard deviation of sample 2
    • n1 = size of sample 1 and n2 = size of sample 2
    • Degrees of freedom = df = n1 + n2 − 2
  o Uses a T distribution

Equations/Formulas for Inferential Statistics
• Mann Whitney
  o Focuses on ranks (medians) rather than on means
  o Two Groups
  o Formula
    • $Z = \frac{T_1 - E(T_1)}{\sqrt{Var(T_1)}}$
    • $E(T_1) = \frac{n_1 (n + 1)}{2}$
    • $Var(T_1) = \frac{n_1 n_2 s^2}{n}$, where $s^2 = \frac{\sum (Y_i - \bar{Y})^2}{n - 1}$ is computed on the ranks of the combined sample and n = n1 + n2
  o Steps
    • Rank values from smallest to largest
    • Sum ranks in smaller group = T1
    • Compute E(T1)
    • Compute Variance
  o Uses a Z distribution.
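The Anova, two-sample t-test, and Mann Whitney formulas above can be traced step by step in code. This is a minimal sketch using only the Python standard library; the function names and the two small data sets are hypothetical, and the Mann Whitney helper assumes there are no tied scores.

```python
import math
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Two-sample t statistic with a pooled standard deviation, plus df."""
    n1, n2 = len(sample1), len(sample2)
    sp = math.sqrt(((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2))
                   / (n1 + n2 - 2))
    se_diff = sp * math.sqrt(1 / n1 + 1 / n2)  # standard error of the difference
    return (mean(sample1) - mean(sample2)) / se_diff, n1 + n2 - 2


def anova_f(groups):
    """F statistic from TSS = sum(y^2) - G^2/n and SSB = sum(T_i^2/n_i) - G^2/n."""
    scores = [y for group in groups for y in group]
    n, k = len(scores), len(groups)
    correction = sum(scores) ** 2 / n  # G^2 / n
    tss = sum(y * y for y in scores) - correction
    ssb = sum(sum(group) ** 2 / len(group) for group in groups) - correction
    ssw = tss - ssb
    return (ssb / (k - 1)) / (ssw / (n - k))


def mann_whitney_z(smaller, larger):
    """Z = (T1 - E(T1)) / sqrt(Var(T1)); assumes no tied scores."""
    combined = sorted(smaller + larger)
    rank = {value: position + 1 for position, value in enumerate(combined)}
    n1, n2 = len(smaller), len(larger)
    n = n1 + n2
    t1 = sum(rank[value] for value in smaller)      # sum of ranks, smaller group
    expected = n1 * (n + 1) / 2                     # E(T1)
    rank_variance = variance(list(range(1, n + 1)))  # s^2 computed on the ranks
    return (t1 - expected) / math.sqrt(n1 * n2 * rank_variance / n)


# Hypothetical scores for two groups.
group1 = [12, 15, 11, 18, 14]
group2 = [20, 17, 22, 19, 25, 21]

t_value, df = pooled_t(group1, group2)
f_value = anova_f([group1, group2])
z_value = mann_whitney_z(group1, group2)

print(f"t = {t_value:.3f} (df = {df}), F = {f_value:.3f}, Mann Whitney Z = {z_value:.3f}")
```

With exactly two groups, the Anova F statistic equals the square of the pooled t statistic, which gives a quick consistency check on both calculations.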
Equations/Formulas for Inferential Statistics
• Kruskal Wallis
  o Focuses on ranks (medians) rather than on means
  o More than Two Groups
  o Formula
    • $H = \frac{12}{n(n + 1)} \sum \frac{T_k^2}{n_k} - 3(n + 1)$
  o What symbols mean
    • T_k = sum of ranks for sample k
    • n = total number of cases
    • n_k = number of cases in sample k
  o Uses χ² Distribution
  o Degrees of Freedom = k − 1 (where k is the number of groups)
  o Use when you want to compare more than two groups, and the distribution is not normal.

Equations/Formulas for Inferential Statistics
• Formulas for Sample Size
  o $n = \frac{N (.9604)}{D^2 (N - 1) + .9604}$
  o n = sample size
  o D = margin of error (usually .05)
  o N = population size
  o .9604 = a constant tied to being at least 95% sure (1.96² × .25)
  o This sample size is large enough that we can be at least 95% sure we can generalize to the population with a margin of error of .05
• Prepared by Dr. Carol Albrecht
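The Kruskal Wallis statistic and the sample size calculation can be sketched as follows. The sample size function assumes the formula reads n = N(.9604) / (D²(N − 1) + .9604), the standard finite-population form; rounding up to a whole case, the function names, and the data are assumptions made for illustration. The Kruskal Wallis helper assumes no tied scores.

```python
import math


def kruskal_wallis_h(groups):
    """H = 12 / (n(n+1)) * sum(T_k^2 / n_k) - 3(n+1); assumes no tied scores."""
    combined = sorted(y for group in groups for y in group)
    rank = {value: position + 1 for position, value in enumerate(combined)}
    n = len(combined)
    rank_term = sum(sum(rank[y] for y in group) ** 2 / len(group)
                    for group in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    return h, len(groups) - 1  # statistic and degrees of freedom (k - 1)


def sample_size(population, margin=0.05, constant=0.9604):
    """Sample size for a given population size (assumed formula, rounded up)."""
    n = population * constant / (margin ** 2 * (population - 1) + constant)
    return math.ceil(n)


# Hypothetical scores for three groups.
groups = [[27, 31, 22], [35, 40, 38], [15, 18, 20]]
h_value, h_df = kruskal_wallis_h(groups)
needed = sample_size(1000)

print(f"H = {h_value:.2f} with df = {h_df}")
print(f"sample size for a population of 1000: {needed}")
```

The H value would then be compared with a chi-square critical value for k − 1 degrees of freedom.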