What is a research question? (From topics to questions)
Data collation and analysis

Quantitative
– Sampling
– Questionnaires
– Statistical analysis
– SPSS
– Interpreting quantitative data (exercise)

Qualitative
– Interviewing
– Focus groups
– PAR techniques
– Textual analysis
– Interviewing and focus groups (exercise)

Some key concepts

Formulating and testing your hypothesis
Reliability of data
– Consistency over time (quant); internal consistency
Rigour
– Sufficient evidence (appropriate methodology; consideration of competing evidence; generalisability)
– Triangulation
– Documentation
Validity (vs bias)
– Overall validity of research
– Internal validity (internal logic and consistency: elimination of rival hypotheses; consideration of negative evidence; cross-validation of findings with other parts of data; ‘member checking’)
– External validity (generalisability: is sampling diverse enough? is context thickly described? can concepts be applied to other settings?)

Research design – Selecting variables

Purpose of research, generally (though not always…):
– Why does Y happen?
– Does X cause Y?
– Do A, B and/or C cause Y?
Y = your dependent variable (by convention)
The factors you are examining which may impact on/cause Y (e.g. X, A, B or C) are your independent variables
Need to decide these in advance through literature analysis (though can adapt as you go on)

Sampling

What is your population? (group you are studying)
Too large to study all? Sample
Logic is that you analyse data from the sample but extrapolate / draw conclusions for the whole population
Key question = how representative is your sample? Need to follow logic of your research question
– Random (probability) sampling (general representativeness); may be stratified to ensure particular categories are included (women, men, youth, landowners…)
– Purposive sampling (specific groups, variables)
– Key informants
Key questions
– How big will sample be, and why?
– How will it be chosen, and why?
– What claims will you make for its representativeness?
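To make the idea of stratified random sampling from the Sampling slide concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than part of the lecture: the population of 1,000 people, the 60/40 gender split, the 10% sampling fraction and the helper name `stratified_sample` are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(population, stratum_of, fraction, seed=0):
    """Draw the same fraction at random from every stratum (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)

    sample = []
    for members in strata.values():
        # At least one person per stratum, so no category is left out.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 600 women and 400 men.
population = [
    {"id": i, "gender": "woman" if i < 600 else "man"} for i in range(1000)
]

sample = stratified_sample(population, lambda p: p["gender"], fraction=0.1)
counts = Counter(p["gender"] for p in sample)
print(len(sample), dict(counts))
```

Because the same fraction is drawn from each stratum, the sample keeps the population's gender proportions. A simple random sample of the same size would only match them approximately.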
Questionnaires / Interview Schedules

Formal, structured, closed (quant) vs semi-formal, semi-structured, open-ended (qual)
Structured questionnaires
– Principally for ‘what’ questions
– Standardised questions (tick box, assign value)
Semi-structured interview schedules
– Principally for ‘why’ or ‘how’ questions
– Standardised set of topics / areas to be explored
Back to research question
– What type of information/data do you need?
Generally useful to pilot either technique

Rigour in questionnaire surveys and interview schedules

Misinterpretation of question
Interviewer bias
With respect to surveys, correlation is not equal to causation: rigour required in interpretation (supplement with qualitative data)
Triangulation
– Different methods
– Different informants
– Different researchers
Documentation
– Tape recordings, photography etc…
– Reflexivity: record of evolving analysis, decisions, directions taken in research process

Surveys and interviews compared (Table 7.2, p. 171, Thomas and Mohan)

Each item reads surveys vs. interviews:
– Design: complete vs. evolving
– Selection of informants: initial vs. evolving
– Selection of questions: complete vs. modified
– Nature of questions: identical vs. tailored
– Choice of research participants: random vs. purposive sample
– Data analysis: statistical vs. qualitative analysis
– Validation methods: probability criteria vs. triangulation
– Most appropriate use: ‘what’ (statistical extrapolation to wider population) vs. ‘why’ & ‘how’

Statistical Analysis (the logic)

Quantitative (statistical) analysis consists of the study of the variation in variables and the co-variation between them.
Conducted for a large number of cases (though definitions of ‘large’ vary)
Looks for patterns in data and the possibility of generalising from this to a larger population: are findings due to chance variation or are they ‘statistically significant’?
Basic descriptive statistics

– Mean (average), mode (most frequently occurring value) and median (middle value)
– Variation: standard deviation (the higher the SD, the more widely values are spread around the mean)
– Frequency distributions: histograms, pie-charts

Types of variable

Level of measurement   Example                          Additional info contained
Interval var.          43% Lab; 10% Lib Dem; 47% Con    Quantity (Con is 4 percentage points more than Labour)
Ordinal var.           1=Lab, 2=Lib Dem, 3=Con          Order (left-right wing)
Nominal var.           1=Lab, 2=Con, 3=Lib Dem          Mutual exclusivity
Dummy var.             0=Non-Labour, 1=Labour           n/a

Relationships between variables (Correlations)

Statistical analysis looks at relations between variables and allows us to assess the chances of being wrong when drawing conclusions
A presumed causal effect is ‘strong’ if it appears to have extensive effects, i.e. causes big changes (graphs)
It is ‘widespread’ if it occurs in many different circumstances; need to use multiple variables to examine this
It is ‘significant’ if it is unlikely to have occurred as a fluke, so there is a good chance that it will recur when generalised; the generally accepted threshold is the 95% level (though the more cases examined the better)

Relationships between variables

Cross tabulations / bivariate analysis, e.g.

Table X: Wealth and educational attainment

                 Completed 2˚ school
                 Yes     No      Total
Wealthy          83%     17%     100%
Middle income    60%     40%     100%
Poor             26%     74%     100%

Can use for many more variables (multivariate analysis)

Assessing the strength of the relationship

Correlations between variables (r or R) (=relationship): Pearson coefficient
2 variables are related if their values tend to move together (e.g. taller people tend to be heavier)
May have negative correlations (e.g. thicker clouds lead to diminished sunlight)
r/R ranges between -1 and +1; the further the value is from 0 (in either direction), the stronger the correlation

General guidelines on strengths of correlations

Strength of correlation (r/R)   Interpretation with indiv-level data   Interpretation with aggregate data
0.00-0.06                       Trivial                                Trivial
0.07-0.19                       Slight                                 Trivial
0.20-0.34                       Moderate                               Slight
0.35-0.49                       Strong                                 Moderate
0.50-0.65                       Spectacular                            Strong
0.66-0.80                       Highly spectacular                     Very strong
0.81-0.95                       Suspect                                Spectacular
0.96-1.00                       Very suspect                           Suspect

Relationships between more than 2 variables

Multivariate analysis
– Say you have two sources of variance (wealth and gender) with respect to your dependent variable (educational attainment) and want to know about differences between groups, within groups and interactions of these…

Table Y: Gender, wealth and educational attainment

                 Women   Men
Wealthy          77%     88%
Middle income    53%     67%
Poor             12%     33%

Simple correlation = relationship between 2 variables
– Pearson coefficient (r), from -1 to 1; the closer the number to -1 or 1, the stronger the correlation
Multiple correlation = relationship between more than 2 variables
– Most common method used in calculating the effects of several variables is regression analysis

Regression analysis was developed by geneticists to study the way in which offspring who are taller/shorter than their parents tend themselves to have children whose height moves back towards the average, or regresses to the mean
Measure = R squared (guidelines on strengths as in previous table for r/R)

Statistical inference / inferential statistics

How to generalise from sample
Key question: How likely am I to be wrong if I infer that my findings apply to the population as a whole? = statistical significance
p (probability) = measure of statistical significance
– Obtained through a significance test (e.g. a chi-square test for cross tabulations)
– E.g. p<0.05 means that, if there were no real relationship in the population, findings like mine would come about by chance less than 5% of the time (fewer than 5 out of each 100 times)
– This corresponds to the 95% level: I can be reasonably confident that my findings are not a fluke and can be generalised to the whole population
– This is generally the level required for results to be ‘statistically significant’
– Though caution… again depends on size of sample

Determining causation

Correlation yes, but how do we infer causation?
Variables (say X and Y) must be related
A time order between variables must be demonstrated, e.g. X (the cause) preceding Y (the effect)
Must have a plausible theory showing causal links between the two variables
Must consider and eliminate plausible rival hypotheses
(Punch, 2005: 49)

Data analysis: SPSS

Statistical Package for the Social Sciences
Capable of handling vast quantities of information / variables from structured surveys
Basic and advanced statistical analysis
– Descriptive statistics
– Cross tabulations
– Correlations
– Regressions
– ANOVA
– Forecasting, time-series
– Reliability tests
Graphical representation
Package is on DCU computers (limited copies can also be put on students’ laptops under the DCU licensing agreement)
Buy a manual!

All with me so far?

Interpreting survey-based data (Exercise)

See Table (photocopy)
Your research question: What was the effect of SAPs on the growth of the informal sector in Tanzania?
Using data from informal sector survey of 1991
1. What are the dependent and independent variables?
2. What can the data tell you in relation to your question?
3. What can the data not tell you? (its limitations)
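As a companion to the statistical inference slide, which mentions the chi-square test as a way of obtaining p, the following Python sketch runs the test by hand on Table X. The slides give only percentages, so the counts assume a hypothetical 100 respondents per wealth group; the function name `chi_square` is likewise illustrative. In practice SPSS would do this calculation for you.

```python
import math


def chi_square(table):
    """Return the chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if wealth and schooling were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Table X as counts, ASSUMING 100 respondents in each wealth group.
# Columns: completed secondary school (yes, no).
observed = [
    [83, 17],  # Wealthy
    [60, 40],  # Middle income
    [26, 74],  # Poor
]

stat, df = chi_square(observed)

# With exactly 2 degrees of freedom the chi-square tail probability
# has the closed form exp(-x / 2); other table shapes need a stats library.
p_value = math.exp(-stat / 2)

print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.3g}")
print("statistically significant at the 95% level:", p_value < 0.05)
```

The shortcut for p only works because a 3 x 2 table has 2 degrees of freedom. For other tables, a library routine such as `scipy.stats.chi2_contingency` is the more usual choice.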