Overview
• Identify and explain descriptive and inferential (parametric) statistics.
• The descriptive statistics covered are the measures of central tendency (mean, median, mode) and of dispersion (standard deviation and standard error of the mean).
• The inferential statistics covered are the z test, t test, F test, and the Pearson correlation coefficient (r).
• Provide the information necessary to interpret the tables for these tests as presented in the Statistical Package for the Social Sciences (SPSS). This includes:
  • Confidence level
  • Confidence interval
  • Significance (sig.)
  • Assumption of equal variance

Quantitative Research
• Most of the quantitative research done at the MBA level is survey research.
• Issues concerning quantitative survey research design:
  • What are the variables for the study?
  • How will these variables be measured?
  • How will the sample be selected?
  • How will the data be collected?
  • How many participants are needed?
  • How will the data be analyzed?
• There are four types of quantitative research questions.

Type of Research Question    Survey Research Example
Descriptive                  What are participants' opinions about hazardous waste?
Group Difference             Is there a significant difference in opinions about hazardous waste between Democrats and Republicans?
Relationship                 Is there a significant relationship between age of participants and their opinions about hazardous waste?
Prediction                   Can opinions about hazardous waste be accurately predicted by level of education or religious preference?

Descriptive Questions
• Descriptive questions count and group responses.
• In our study regarding participants' opinions about hazardous waste, descriptive questions would be used to:
  • Count similar opinions
  • Group opinions by demographic characteristics (age, education level, sex, etc.)
  • Describe grouped opinions by average, most often expressed, degree of intensity, etc.
• Descriptive statistics are used to answer these questions:
  • Mean, median, mode, standard deviation.

Group Difference Questions
• Group difference questions compare the responses of one group of participants with those of another group to identify significant differences.
• A single sample can be compared to a constant: a researcher might want to test whether a group's average level of concern expressed on a hazardous waste survey differs from 10 (on a 1 to 10 scale).
• Two independent samples can be compared with each other: a researcher might want to test whether the average level of concern expressed by Republicans on a hazardous waste survey differs from that expressed by Democrats.
• Two paired samples can be compared with each other: a researcher might want to test whether the average level of concern expressed by a group on a hazardous waste survey before a hazardous waste ad campaign differs from that expressed by the same group after the ad campaign.
• Mean comparison statistics are used to answer these questions:
  • One-sample t test
  • Independent samples t test
  • Paired samples t test

Relationship Questions
• Relationship questions look for relationships between dependent and independent variables.
• A researcher may want to determine if there is a significant relationship between participants' age and the level of concern they express on a hazardous waste survey.
• Bivariate correlation statistics are used to answer these questions:
  • Pearson's correlation coefficient.

Prediction Questions
• Prediction questions look for independent variables that can be used to predict the outcome of the dependent variable.
• A researcher may want to determine if participants' level of education or religious preference predicts the average level of concern expressed on a hazardous waste survey.
• Forecasting statistics are used to answer these questions:
  • Linear regression analysis

Levels of Measurement
• In putting together a survey the researcher must pay attention to the level of measurement used to collect data on the variables in the study.
• There are four levels or scales of measurement:
  • Nominal
  • Ordinal
  • Interval
  • Ratio

Nominal Measurements
• In nominal measurement the numerical values just "name" the attribute uniquely. No ordering of the cases is implied.
• For example, jersey numbers in basketball are measures at the nominal level. A player with number 30 is not more of anything than a player with number 15, and is certainly not twice whatever number 15 is.
• Examples include:
  • 1. Male 2. Female
  • 1. Caucasian 2. Indian 3. Asian
  • 1. Married 2. Single 3. Divorced

Ordinal Measurements
• In ordinal measurement the attributes can be rank-ordered. Here, distances between attributes do not have any meaning.
• For example, you might code Educational Attainment as:
  0 = less than H.S.
  1 = some H.S.
  2 = H.S. degree
  3 = some college
  4 = college degree
  5 = post college
• In this measure, higher numbers mean more education. But is the distance from 0 to 1 the same as the distance from 3 to 4? Of course not. The interval between values is not interpretable in an ordinal measure.
• Examples include:
  • Movie ratings
  • Class standing: Freshman, Sophomore, Junior, Senior

Interval Measurements
• In interval measurement the distance between attributes does have meaning. For temperature (in Fahrenheit), the distance from 30 to 40 is the same as the distance from 70 to 80.
• Because the interval between values is interpretable, we can compute an average of an interval variable, which we cannot do for ordinal scales.
• Note that in interval measurement ratios don't make any sense: 80 degrees is not twice as hot as 40 degrees.
• Examples include measures where 0 (zero) does not mean the absence of the attribute.

Ratio Measurements
• In ratio measurement there is always an absolute zero that is meaningful.
  This means that you can construct a meaningful fraction (or ratio) with a ratio variable.
• Weight is a ratio variable.
• In applied social research most "count" variables are ratio, for example, the number of clients in the past six months. Why? Because you can have zero clients and because it is meaningful to say that "...we had twice as many clients in the past six months as we did in the previous six months."
• Examples include: degrees Kelvin, annual income in dollars, length or distance in inches, feet, miles, etc.

Levels of Measurement

Level      Allowable Descriptive Statistics          Arithmetic Operations
Nominal    Mode                                      Counts
Ordinal    Median, Mode                              Greater or less than
Interval   Mean, Median, Mode, Standard Deviation    Addition and subtraction of scale values
Ratio      Mean, Median, Mode, Standard Deviation    Multiplication and division of scale values

• Inferential statistics for interval and ratio data: z test, t test, F test, Pearson's correlation coefficient, linear regression.
• In social science research the Likert scale is often used to promote ordinal values to interval values. This allows the data to be analyzed using inferential statistics.

The Likert Scale
• The scale is named after Rensis Likert, who first developed it.
• The Likert scale is a psychometric scale commonly used in questionnaires, and is the most widely used scale in survey research.
• When responding to a Likert questionnaire item, respondents specify their level of agreement with a statement.
• Likert scales can be used to evaluate subjective or objective criteria.
• They are bipolar scales in that generally some level of agreement or disagreement is measured.
• Many social science researchers recommend scales consisting of 7 to 9 items.

A Seven Level Likert Scale
1    2    3    4    5    6    7

The Likert Scale
• Likert scale values should always be accompanied by item labels.
• Without labels, a mean result with a scale value of 1.9 would be reported as 1.9 on a scale of 1 to 7.
• With labels, a mean result of 1.9 can additionally be reported as 'Dissatisfied'.
  This adds meaning to the interpretation of the results.

A Seven Level Likert Scale
1 = Very Dissatisfied
2 = Dissatisfied
3 = Somewhat Dissatisfied
4 = Neither Dissatisfied nor Satisfied
5 = Somewhat Satisfied
6 = Satisfied
7 = Very Satisfied

The Likert Scale
• Some researchers object to the middle item indicating the absence of satisfaction (or absence of agreement or disagreement).
• They suggest that participants should be forced to respond or to select a no response option.
• Moving the no response option out of the scale adds to the meaningfulness of measures of central tendency (mean, median, mode).
• It also allows data to be interpreted in two groups.
• The researcher can indicate the number of participants responding as neither satisfied nor dissatisfied.
• And, when the calculations are made, they are based on participants with an opinion about satisfaction.

A Six Level Likert Scale
1 = Very Dissatisfied
2 = Dissatisfied
3 = Somewhat Dissatisfied
4 = Somewhat Satisfied
5 = Satisfied
6 = Very Satisfied
0 = Neither Satisfied nor Dissatisfied

Types of Statistics
• There are two types of statistics used in social science research:
  • Descriptive
  • Inferential
• Descriptive statistics refer to methods used to organize, summarize, and tabulate data.
• Descriptive statistics provide a picture of what happened in the study.
• Descriptive statistics provide a basis for inferential statistics.
• Inferential statistics refer to methods used to draw inferences about a population based on descriptive data available on a sample drawn from the population.

The Mean
• The mean is the sum of the individual observations (x) divided by the number of observations (n).
• μ (mu) is the mean of the entire population.
• x̄ (x-bar) is the mean of the sample.
Observation   x
1             60
2             34
3             74
4             10
5             86
6             59
7             34
8             50
9             43
10            59
11            68
12            35
13            53
14            28
15            82
16            47
17            60
18            40
19            19
20            59
Sum           1,000

$\bar{x} = \frac{\sum x}{n} = \frac{1{,}000}{20} = 50$

The Median
• The median is the number that separates the higher half of a sample from the lower half.
• It is found by arranging the observations from highest to lowest and picking the middle one.
• When the number of observations is even, the median is the mean* of the two middle observations.

Observation   x
1             86
2             82
3             74
4             68
5             60
6             60
7             59
8             59
9             59
10            53
11            50
12            47
13            43
14            40
15            35
16            34
17            34
18            28
19            19
20            10

$\text{median} = \frac{53 + 50}{2} = 51.5$

* The strength of the median as a measure of central tendency is that, unlike the mean, it is a value that occurs in the sample. This strength is nullified when the median is the mean of the two middle observations.

The Mode
• The mode is the value that occurs most frequently in a sample.
• When multiple values occur with the highest frequency, the sample is said to be bi-modal or multi-modal.
• In the sorted observations shown for the median, 59 occurs three times, more often than any other value, so the mode is 59.

Standard Deviation
• It is a measure of the dispersion of a collection of numbers.
• It indicates how widely spread the values in a dataset are with respect to their mean.
• It is calculated by determining the square root of the variance.

[Figure: a data set with a mean of 50 (shown in blue) and a standard deviation (σ) of 20.]

$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{7{,}612}{19}} \approx 20$

Observation   x     x̄     x - x̄    (x - x̄)²
1             60    50     10       100
2             34    50    -16       256
3             74    50     24       576
4             10    50    -40      1600
5             86    50     36      1296
6             59    50      9        81
7             34    50    -16       256
8             50    50      0         0
9             43    50     -7        49
10            59    50      9        81
11            68    50     18       324
12            35    50    -15       225
13            53    50      3         9
14            28    50    -22       484
15            82    50     32      1024
16            47    50     -3         9
17            60    50     10       100
18            40    50    -10       100
19            19    50    -31       961
20            59    50      9        81
Sum                                7,612

Standard Error of the Mean (SEM)
• Because the mean of the population is usually unknown, it is important to estimate the error between the sample mean and the population mean.
• The SEM is an unbiased estimate of expected error in the sample estimate of a population mean.

$SEM = \frac{\sigma}{\sqrt{n}} = \frac{20}{\sqrt{20}} = 4.47$

Inferential Statistics
• With these descriptive measures (mean, median, mode, standard deviation, and standard error of the mean), we are now able to statistically compare samples.
• We have our sample (n = 20):
  • Mean = 50
  • Median = 51.5
  • Mode = 59
  • Standard Deviation = 20
  • Standard Error of the Mean = 4.47
• Let's assume that we want to compare our sample with another sample having a mean of 60 to determine if there is a significant difference between the samples.

The z-test
• The z-test is used primarily with standardized testing to determine if the test scores of a particular sample of test takers are within or outside of the standard performance of test takers (a second group of test scores).
• The assumptions necessary for the z-test are:
  • The population standard deviation must be known.
  • The sample must be random.
  • The sample must be normally distributed.
• The null hypothesis for the test is that the means are equal, H0: μ1 = μ2 (the samples come from the same population).
• First we need to determine the confidence level (CL) we require for this comparison.
  • The CL is determined by the researcher.
  • The CL tells us how sure we can be of our results.
  • Most social science research uses a CL of 95%.
• Next we need to determine the alpha level (α).
  • α = 1 - CL
  • Most social science research uses an α of 5%.
• The z score shows the distance of our test mean from our sample mean in units of the standard error of the mean.
  • Our test mean = 60
  • Our sample mean = 50
  • Our SEM = 4.47
• Therefore, our z-score is approximately 2.24:

$z = \frac{x - \bar{x}}{SEM} = \frac{60 - 50}{4.47} \approx 2.24$

The Normal Distribution
[Figure: normal curve with the mean, median, and mode at its center.]
• That means our comparison mean is 2.24 standard errors above our sample mean.
• A z-table tells us that 48.75% of the scores fall between the mean (z = 0) and our comparison mean of 60 (z = 2.24).
• In the normal distribution, 50% of the scores fall below the mean (between z = 0 and -∞).
• Adding the 48.75% that fall between the mean and the comparison mean, 98.75% of the distribution lies below the comparison mean of 60.
• Our confidence level was 95%, and the minimum/maximum allowable z-score for a 95% CL in this two-tailed test is ±1.96. The comparison mean lies beyond that limit, so we must reject H0: μsamp = μcomp.

Confidence Interval
• Now that we know the number of standard deviations for the 95% CL for the z-test we can calculate our confidence interval.
  • z-score for 95% CL = ±1.96
  • SEM = 4.47
  • Mean = 50
• The upper bound is $ub = \bar{x} + (SEM)(z) = 50 + 8.76 = 58.76$
• The lower bound is $lb = \bar{x} - (SEM)(z) = 50 - 8.76 = 41.24$
• The range of means for which we fail to reject H0: μsamp = μcomp is 41.24 to 58.76.

Errors in Hypothesis Testing
• Hypothesis testing provides a negative assessment.
• Researchers do not prove hypotheses. They test them in an attempt to disprove them.
• Researchers either reject H0 or fail to reject H0.
• There are two possible errors in hypothesis testing.
  • Type 1 Error: reject H0 when H0 is true.
  • Type 2 Error: fail to reject H0 when H0 is false.
• The α level of the test sets the probability of making a Type 1 Error. The probability of a Type 2 Error is not set by α; it also depends on the sample size and on the size of the true difference.
• In most social science research the probability of a Type 1 Error is set at 5%.

The t Distribution
• The t test is the mean comparison test most often used in social science research. The z-test requires that the population standard deviation is known. That is almost never the case in social science research.
• The assumptions of the t test are:
  • The sample must be random.
  • The sample must be normally distributed.
• However, the t test is much more forgiving of violations of these assumptions (random samples are difficult).
• The t test is much better with smaller samples.
• The shape of the curve (the area in its tails) changes with each sample size below 200.
• With sample sizes above 200 the t and z tests give essentially the same results.
• By increasing the area in the tails of the curve for smaller samples, the t test decreases the likelihood of a Type 1 error.
• t test and z test results are read and interpreted the same way.

The F Distribution
• The F test has a variety of applications.
• It measures variance within and between samples.
• In inferential statistics it is used to test whether two samples have equal variances (the equal-variance assumption of the t test) or to analyze the variance of more than two samples, as in analysis of variance (ANOVA) calculations.
• It is read and interpreted in the same way as the t test and the z test.

SPSS
• Pause here and use the Statistical Package for the Social Sciences (SPSS) to compare sample means.

Bivariate Correlation
• The relationship between two variables can be calculated numerically with a correlation coefficient.
• The correlation coefficient is a measure of the strength of association between the two variables.
• The correlation coefficient is a value between -1 and 1.
• The stronger the correlation between the two variables, the closer the correlation coefficient is to -1 (for negative correlations) or 1 (for positive correlations).
• Hinkle provides a table used by many social science researchers to describe the level of correlation present in their data.

Interpretation   Size of Correlation (Positive)   Size of Correlation (Negative)
Very High        .90 to 1.00                      -.90 to -1.00
High             .70 to .89                       -.70 to -.89
Moderate         .50 to .69                       -.50 to -.69
Low              .30 to .49                       -.30 to -.49
Little if any    .00 to .29                       -.00 to -.29

The Pearson r
• The Pearson correlation coefficient, or Pearson r, is used when both the X and Y variables are measured on at least the interval scale.
• All correlation coefficients operate in a similar fashion.
• The H0 for the Pearson r is H0: ρ = 0.
  Thus, under H0 there is no correlation.

SPSS
• Pause here and use the Statistical Package for the Social Sciences (SPSS) to calculate correlations.
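
The worked example from the slides (mean, median, mode, standard deviation, SEM, z-score, and 95% confidence interval) can be cross-checked with a short Python sketch. This is only an illustration using the standard library, not the SPSS procedure the slides refer to; the helper names `describe`, `z_score`, and `confidence_interval` are hypothetical and introduced here for the example.

```python
import math
from statistics import NormalDist, mean, median, mode, stdev

# The 20 observations from the slides' data table.
scores = [60, 34, 74, 10, 86, 59, 34, 50, 43, 59,
          68, 35, 53, 28, 82, 47, 60, 40, 19, 59]


def describe(data):
    """Return the descriptive statistics used in the slides."""
    n = len(data)
    sd = stdev(data)  # sample standard deviation (n - 1 in the denominator)
    return {
        "n": n,
        "mean": mean(data),
        "median": median(data),
        "mode": mode(data),
        "sd": sd,
        "sem": sd / math.sqrt(n),  # standard error of the mean
    }


def z_score(comparison_mean, sample_mean, sem):
    """Distance of the comparison mean from the sample mean, in SEM units."""
    return (comparison_mean - sample_mean) / sem


def confidence_interval(sample_mean, sem, confidence_level=0.95):
    """Two-tailed interval around the sample mean using the normal curve."""
    alpha = 1 - confidence_level
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    margin = z_crit * sem
    return sample_mean - margin, sample_mean + margin


summary = describe(scores)
z = z_score(60, summary["mean"], summary["sem"])
lower, upper = confidence_interval(summary["mean"], summary["sem"])

print(summary)
print(f"z = {z:.2f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
# Reject H0 when the comparison mean falls outside the interval.
print("reject H0" if not lower <= 60 <= upper else "fail to reject H0")
```

Because the sketch keeps the unrounded standard deviation (about 20.02 rather than the rounded 20 used on the slides), the SEM, z-score, and interval bounds can differ from the slide values in the second decimal place; the conclusion about H0 is the same.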