Mgt 540 Research Methods: Data Analysis

Additional "sources"
 Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
 http://web.utk.edu/~dap/Random/Order/Start.htm
 Data Analysis Brief Book (glossary): http://rkb.home.cern.ch/rkb/titleA.html
 Exploratory Data Analysis: http://www.itl.nist.gov/div898/handbook/eda/eda.htm
 Statistical Data Analysis: http://obelia.jde.aca.mmu.ac.uk/resdesgn/arsham/opre330.htm

[Figure: Sekaran/RESEARCH 4E, Figure 12.1. Copyright © 2003 John Wiley & Sons, Inc.]

Data Analysis
 Get the "feel" for the data.
 Get the mean, variance, and standard deviation of each variable.
 See whether, for all items, responses range all over the scale and are not restricted to one end of the scale alone.
 Obtain Pearson correlations among the variables under study.
 Get frequency distributions for all the variables.
 Tabulate your data.
 Describe your sample's key characteristics (demographic details of sex composition, education, age, length of service, etc.).
 See histograms, frequency polygons, etc.

Quantitative Data
 Each type of data requires different analysis method(s):
   Nominal: labeling; no inherent "value" basis; categorization purposes only.
   Ordinal: ranking, sequence.
   Interval: relationship basis (e.g., age).

Descriptive Statistics: describing key features of data
 Central tendency: mean, median, mode.
 Spread: variance, range, standard deviation.
 Distribution (shape): skewness, kurtosis.

Descriptive Statistics: describing key features of data
 Nominal: identification / categorization only.
 Ordinal (example on pg. 139): non-parametric statistics do not assume equal intervals; frequency counts; averages (median and mode).
 Interval: parametric; mean, standard deviation, variance.

Testing "Goodness of Fit"
 Reliability: split-half, internal consistency.
 Validity: convergent, discriminant, factorial.
 Involves correlations and factor analysis.

Testing Hypotheses
 Use the appropriate statistical analysis:
   t-test (one- or two-tailed): test the significance of the difference between the means of two groups.
   ANOVA: test the significance of differences among the means of more than two groups, using the F test.
   Regression (simple or multiple): establish the variance explained in the DV by the variance in the IVs.

Statistical Power
 Errors in methodology when claiming a significant difference:
   Type 1 error: reject the null hypothesis when you should not; called an "alpha" error.
   Type 2 error: fail to reject the null hypothesis when you should; called a "beta" error.
 Statistical power refers to the ability to detect true differences, avoiding Type 2 errors.

Statistical Power (see discussion at http://my.execpc.com/4A/B7/helberg/pitfalls/)
 Depends on 4 issues:
   Sample size
   The effect size you want to detect
   The alpha (Type 1 error rate) you specify
   The variability of the sample
 Too little power: you overlook the effect.
 Too much power: any difference is significant.

Parametric vs. Nonparametric
 Parametric (characteristics referring to specific population parameters); parametric assumptions:
   Independent samples
   Homogeneity of variance
   Data normally distributed
   Interval or better scale
 Nonparametric assumptions: sometimes only independence of samples.

t-tests (look at t tables; p. 435)
 Used to compare two means, or one observed mean against a hypothesized mean.
 For large samples, t and z can be considered equivalent.
 Calculate t = (x-bar - μ) / S, where S is the standard error of the mean, s/√n, and df = n - 1.

t-tests
 Statistical programs will give you a choice between a matched-pair and an independent t-test. Your sample and research design determine which you will use.
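As a concrete illustration of the t formula above, here is a minimal Python sketch (the satisfaction scores and the hypothesized mean are hypothetical; scipy is assumed available) that computes a one-sample t by hand and checks it against the library's built-in test:

```python
import numpy as np
from scipy import stats

# Hypothetical data: 12 satisfaction scores on a 7-point scale.
scores = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.0, 4.7, 5.2, 3.9, 4.4, 5.0, 4.6])
mu = 4.0  # hypothesized population mean

# t = (x-bar - mu) / S, where S = s / sqrt(n) is the standard error
# of the mean, with df = n - 1 (matching the slide's formula).
n = len(scores)
se = scores.std(ddof=1) / np.sqrt(n)
t_manual = (scores.mean() - mu) / se

# scipy's one-sample t-test should agree and also give a p-value.
t_scipy, p_value = stats.ttest_1samp(scores, mu)
print(f"manual t={t_manual:.3f}, scipy t={t_scipy:.3f}, p={p_value:.4f}")
```

For comparing two independent groups, stats.ttest_ind is the analogous call; a matched-pair design would use stats.ttest_rel instead.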
z-test for Proportions (look at t tables; p. 435)
 When data are nominal:
   Describe by counting occurrences of each value.
   From the counts, calculate proportions.
   Compare the proportion of occurrence in the sample to the proportion of occurrence in the population.
 Hypothesis testing allows only one of two outcomes: success or failure.

z-test for Proportions (look at t tables; p. 435)
 Comparing the sample proportion to the population proportion:
   H0: π = k, where k is a value between 0 and 1
   H1: π ≠ k
   z = (p - π) / √(π(1 - π)/n)
 Equivalent to χ² for df = 1.

Chi-Square Test (sampling distribution)
 One sample: measures sample variance; squared deviations from the mean, based on the normal distribution.
 Nonparametric: compare expected with observed proportions.
   H0: observed proportion = expected proportion
   df = number of categories (cells, k) minus 1
   χ² = Σ (O - E)² / E

Univariate z Test
 Test a guess about a proportion against an observed sample; e.g., MBAs constitute 35% of the managerial population.
   H0: π = .35
   H1: π ≠ .35 (two-tailed test suggested)

Univariate Tests
 Some univariate tests are different in that they are among the statistical procedures where you, the researcher, set the null hypothesis.
 In many other statistical tests the null hypothesis is implied by the test itself.

Contingency Tables (http://www.psychstat.smsu.edu/introbook/sbk28m.htm)
 Relationship between nominal variables: the relationship between subjects' scores on two qualitative or categorical variables (e.g., early childhood intervention).
 If the columns are not contingent on the rows, then the row and column frequencies are independent. The test of whether the columns are contingent on the rows is called the chi-square test of independence. The null hypothesis is that there is no relationship between row and column frequencies.
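A minimal sketch of the chi-square test of independence just described, using a small hypothetical 2x2 table (intervention vs. outcome; the counts are made up) and scipy. The continuity correction is disabled so the hand computation Σ(O - E)²/E matches the library result:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table: rows = early intervention (yes/no),
# columns = outcome (pass/fail). Counts are illustrative only.
observed = np.array([[30, 10],
                     [20, 20]])

# H0: row and column frequencies are independent (no relationship).
# correction=False disables the Yates continuity correction so the
# result matches the plain chi-square formula from the slide.
chi2, p, dof, expected = chi2_contingency(observed, correction=False)

# Hand computation of the same statistic: sum of (O - E)^2 / E.
chi2_manual = ((observed - expected) ** 2 / expected).sum()

print(f"chi2={chi2:.3f} (manual {chi2_manual:.3f}), df={dof}, p={p:.4f}")
```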
Correlations
 A statistical summary of the degree and direction of association between two variables.
 Correlation itself does not distinguish between independent and dependent variables.
 Most common: Pearson's r.

Correlations
 You believe that a linear relationship exists between two variables.
 The range is from -1 to +1.
 R², the coefficient of determination, is the % of variance explained in each variable by the other.

Correlations
 r = S_xy / (S_x S_y), or the covariance between x and y divided by the product of their standard deviations.
 Calculations needed:
   The means, x-bar and y-bar.
   The deviations from the means, (x - x-bar) and (y - y-bar), for each case.
   The squares of the deviations from the means for each case, (x - x-bar)² and (y - y-bar)², to ensure positive distance measures when added.
   The cross product for each case, (x - x-bar) times (y - y-bar).

Correlations
 The null hypothesis for correlations is H0: ρ = 0 and the alternative is usually H1: ρ ≠ 0. However, if you can justify it prior to analyzing the data, you might also use H1: ρ > 0 or H1: ρ < 0, a one-tailed test.

Correlations: alternative measures
 Spearman rank correlation, r_ranks: r_ranks and r are nearly always equivalent measures for the same data (even when they are not, the differences are trivial).
 Phi coefficient, r_Φ, when both variables are dichotomous; again, it is equivalent to Pearson's r.

Correlations: alternative measures
 Point-biserial, r_pb, when correlating a dichotomous with a continuous variable.
 If a scatterplot shows a curvilinear relationship, there are two options: a data transformation, or use the correlation ratio, η² (eta-squared) = 1 - SS_within / SS_total.

ANOVA
 For two groups only, the t-test and ANOVA yield the same results.
 You must do paired comparisons when working with three or more groups to know where the mean differences lie.

Multivariate Techniques
 Dependent-variable techniques: regression in its various forms, discriminant analysis, MANOVA.
 Classificatory or data-reduction techniques: cluster analysis, factor analysis, multidimensional scaling.

Linear Regression
 We would like to be able to predict y from x.
 Simple linear regression with raw scores:
   y = dependent variable
   x = independent variable
   b = regression coefficient = r_xy (s_y / s_x)
   c = a constant term
 The general model is y = bx + c (+ e).

Linear Regression
 The statistic for assessing the overall fit of a regression model is R², the overall % of variance explained by the model.
 R² = 1 - (unpredictable variance / total variance) = predictable variance / total variance = 1 - (s²_e / s²_y), where s²_e is the variance of the error or residual.

Linear Regression
 Multiple regression: more than one predictor, y = b₁x₁ + b₂x₂ + c.
 Each regression coefficient b is assessed independently for its statistical significance; H0: b = 0.
 So, in a statistical program's output, a statistically significant b rejects the notion that the variable associated with b contributes nothing to predicting y.

Linear Regression
 Multiple regression: R² still tells us the amount of variation in y explained by all of the predictors (x) together; the F-statistic tells us whether the model as a whole is statistically significant.
 Several other types of regression models are available for data that do not meet the assumptions needed for least-squares models (such as logistic regression for dichotomous dependent variables).
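To tie the correlation and simple-regression slides together, here is a minimal Python sketch (the years-of-service and salary figures are hypothetical; scipy is assumed available) showing Pearson's r, R², and the identity b = r(s_y / s_x):

```python
import numpy as np
from scipy import stats

# Hypothetical data: years of service (x) and salary in $000s (y).
x = np.array([1, 3, 4, 6, 8, 9, 11, 14], dtype=float)
y = np.array([32, 38, 40, 45, 52, 54, 61, 68], dtype=float)

# Pearson's r: covariance of x and y divided by the product of
# their standard deviations; the null hypothesis is H0: rho = 0.
r, p = stats.pearsonr(x, y)

# Simple linear regression y = bx + c; the slope should satisfy
# b = r * (s_y / s_x), as stated in the slide.
fit = stats.linregress(x, y)
b_check = r * y.std(ddof=1) / x.std(ddof=1)

print(f"r={r:.3f}, R^2={r**2:.3f}, p={p:.4f}")
print(f"b={fit.slope:.3f} (check {b_check:.3f}), c={fit.intercept:.3f}")
```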
Regression by SPSS and Other Programs
 Methods for developing the model:
   Stepwise: lets the computer try to fit all chosen variables, leaving out those not significant and re-examining the variables in the model at each step.
   Enter: the researcher specifies that all variables will be used in the model.
   Forward, backward: begin with all (backward) or none (forward) of the variables and automatically add or remove variables without reconsideration of variables already in the model.

Multicollinearity
 The best regression model has uncorrelated IVs.
 Model stability is low with excessively correlated IVs.
 Collinearity diagnostics identify problems, suggesting variables to be dropped (see the sketch at the end of this transcript).
 High tolerance and a low variance inflation factor are desirable.

Discriminant Analysis
 Regression requires the DV to be interval or ratio scaled.
 If the DV is categorical (nominal), you can use discriminant analysis.
 IVs should be interval or ratio scaled.
 The key result is the number of cases classified correctly.

MANOVA
 Compares means on two or more DVs (ANOVA is limited to one DV).
 Pure MANOVA is available in SPSS only from command syntax.
 You can use the general linear model, though.

Factor Analysis
 A data-reduction technique: a large set of variables can be reduced to a smaller set while retaining the information from the original data set.
 Data must be on an interval or ratio scale.
 E.g., a variable called socioeconomic status might be constructed from variables such as household income, educational attainment of the head of household, and average per capita income of the census block in which the person resides.

Cluster Analysis
 Cluster analysis seeks to group cases rather than variables; it too is a data-reduction technique.
 Data must be on an interval or ratio scale.
 E.g., a marketing group might want to classify people into psychographic profiles regarding their tendencies to try or adopt new products: pioneers or early adopters, early majority, late majority, laggards.

Factor vs. Cluster Analysis
 Factor analysis focuses on creating linear composites of variables:
   The number of variables with which we must work is then reduced.
   The technique begins with a correlation matrix to seed the process.
 Cluster analysis focuses on cases.

Potential Biases
 Asking inappropriate or wrong research questions.
 Insufficient literature survey and hence an inadequate theoretical model.
 Measurement problems.
 Samples not being representative.
 Problems with data collection: researcher biases, respondent biases, instrument biases.
 Data analysis biases: biases (subjectivity) in the interpretation of results, coding errors, data punching and input errors, inappropriate statistical analysis.

Questions to Ask (adapted from Robert Niles)
 Where did the data come from?
 How (and by whom) were the data reviewed, verified, or substantiated?
 How were the data collected?
 How are the data presented?
 What is the context? Cherry-picking?
 Be skeptical when dealing with comparisons: spurious correlations.

[Figure: Sekaran/RESEARCH 4E, Figure 11.2. Copyright © 2003 John Wiley & Sons, Inc.]
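As referenced in the multicollinearity slide above, here is a minimal sketch of collinearity diagnostics in Python, using synthetic data in which one predictor is nearly a linear combination of the other two (statsmodels is assumed available; the variable names are illustrative, not from the original deck):

```python
import numpy as np
import pandas as pd
from statsmodels.tools import add_constant
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic predictors: x3 is almost a linear combination of x1 and x2,
# so it should show a high VIF (equivalently, a low tolerance).
rng = np.random.default_rng(540)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = x1 + x2 + rng.normal(scale=0.1, size=100)
X = add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# VIF_i = 1 / (1 - R^2_i), where R^2_i comes from regressing predictor i
# on the remaining predictors; tolerance = 1 / VIF. A common rule of
# thumb flags VIF > 10 (tolerance < .10) as a collinearity problem.
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, round(variance_inflation_factor(X.values, i), 2))
```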