The following lecture has been approved for Post Graduate University Students. This lecture may contain information, ideas, concepts and discursive anecdotes that may be thought-provoking and challenging. It is not intended for the content or delivery to cause offence. Issues raised in the lecture may require the viewer to engage in further thought, insight, reflection, critical evaluation, reading, independent study, watching more TV, or listening to Radio 4.

Quantitative Research Methods for PhD students
Prof Craig Jackson
Head of Psychology Division
Faculty of Education, Law & Social Sciences
Birmingham City University
[email protected]

Keep it simple
"Some people hate the very name of statistics but.....their power of dealing with complicated phenomena is extraordinary. They are the only tools by which an opening can be cut through the formidable thicket of difficulties that bars the path of those who pursue the science of man." Sir Francis Galton, 1889

Part 1: Probability, Error & Chance (crack these and research is easy)

Curse of probability
Few subjects are more counter-intuitive than probability, yet understanding it is essential.
"Probability is common sense reduced to calculation" Pierre Simon Laplace

UK National Lottery 1994
Choose 6 numbers between 1 and 49
Jackpot approx. £8 million for all 6 numbers
Smaller prizes for 5 numbers, 4 numbers, and 3 numbers
Week 1 - Nobody won
Week 2 - Rollover
Week 2 - Factory worker in Bradford won £17,880,003 using 26, 35, 38, 43, 47, 49
LOTTERY FEVER STRUCK THE UK!
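The jackpot odds follow directly from counting combinations: 6 numbers chosen from 49, order irrelevant. A quick sanity check in Python (standard library only; `math.comb` needs Python 3.8+):

```python
# Number of ways to choose 6 balls from 49, ignoring order:
# 49! / (6! * 43!) = 13,983,816 possible tickets.
from math import comb

tickets = comb(49, 6)
print(tickets)               # 13983816
print(f"{1 / tickets:.2e}")  # probability that a single ticket hits the jackpot
```

This is why buying every combination costs nearly £14 million while the typical jackpot was around £8 million.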
Insurance company protection
"Out, thou strumpet, Lady Fortune"

UK National Lottery Behaviour
Buying all 13,983,816 combinations guarantees a win
If the only winner = £6 million loss
If a shared winner = lose even more
Rollover: if the only winner, possibly a profit. 14th Jan 1995: rollover of £16,292,830, shared between 133 people who all chose 7, 17, 23, 32, 38, 42
If everyone selected numbers at random, only about 4 should have picked this combination - some curious human psychology at work

UK National Lottery Behaviour
Rule 1 (win a fortune): Only bet when there is a rollover (the Rollover Paradox)
Rule 2: Never bet on numbers that other people will choose
Avoid numbers under 31 (birthday punters and amateur gamblers); especially avoid 3, 7, 17
Do use "4" and "13"
"Stupid" combinations are better, e.g. 34, 35, 36, 37, 38, 39
Probability is always ahead

[Chart: winning numbers and ball-frequency histogram for UK National Lottery draw no. 631, Wed 9th Jan 2002; frequency of each ball number 1-49. Source: www.llednulb.demon.co.uk]

Counter-intuition
4000 flips of a Euro coin land on "heads" 2780 times (68%). Evidence of an unfair coin?

Probability Basics
Expressed as "P" or "p"
A decimal measure of the likelihood of something happening
P ranges from 0 through to 1
Certain events: P = 1
Impossible events: P = 0
Equally likely events: P = 0.5
Java applet demonstrations: www.mste.uiuc.edu
Introductory article on probability: Cohen, J. & Stewart, I. (1998). That's amazing isn't it? New Scientist, 17 Jan, pp. 24-28.

Combining Probabilities
Study 1: Drug x is more effective than a placebo in male patients
Study 2: Drug x is more effective than a placebo in female patients
Study 3.
(Combining the data from studies 1 & 2): Drug x is less effective than a placebo in all patients - combining the data reverses the subgroup results (an instance of Simpson's paradox)

Basic Scientific Methodology
VARIABLES: IV, DV, Controlled
SAMPLING: Skewed, Methods, Bias
SUBJECTS: Independent, Matched, Repeated
PROBABILITY: P values
ERRORS: Type 1 and Type 2
SENSITIVITY: Tweaking the methodology

Types of Error #1
CONSTANT ERRORS
- lack of control
- poor variable measurement
- wrong tools for measuring the variable(s)
HOW TO REMOVE / CONTROL CONSTANT ERRORS
- redefine troublesome variables
- control troublesome variables
- control measurement of variables

Everything has errors: Werner Heisenberg
Science involves proving that changes in dependent variables are due to (manipulation of) particular independent variables
Need to prove that random luck alone has not produced the changes observed in the dependent variables
Heisenberg's uncertainty principle (1927) is an eternal problem for researchers: you cannot objectively measure a phenomenon without affecting the phenomenon in some way, e.g. scanning electron microscopes

Types of Error #2
RANDOM ERRORS
- natural fluctuation of the universe
- natural blips occurring in our variables and data
- little can be done about them; the universe is a "random" and chaotic place
RANDOM ERRORS ARE HERE TO STAY
- scientific methods have to take account of this
- random errors cancel themselves out with a random sample
- QED: the need for a truly random sample

The Meaning of P
The world is chaotic. We need to know what caused the observed results in the data: random luck / natural flux, or the IV?
We use an "arbitrary" figure (95% certainty) to decide.

THE P VALUE IN SCIENTIFIC TERMS
A measure of the likelihood of error in our results
The likelihood of the DV being changed by random errors alone, and not the IV

The Meaning of P
Statistical software gives a p value, having calculated the likelihood of such results happening by chance
If that likelihood is < 5%, it can be assumed that the results have not occurred by chance
"P > 0.05": results are likely to have been derived from random or constant errors (or both), and the IV was unlikely to have had any effect on the DV. NON-SIGNIFICANT, i.e. something else changed the DV

The Meaning of P
"P = 0.05" or "P < 0.05": results are unlikely to have derived from random or constant errors, and the IV can be held responsible for the changes in the DV. SIGNIFICANT
Repeating experiments is the only sure way of establishing if this is really true
e.g. "The mean age of males in the group (n=64) was 45 years (±3) and the mean age of females (n=59) was 37 years (±5); P=0.05 and therefore males were significantly older than females."

Errors continued...
TYPE 1 ERRORS: claiming that the IV produced an effect on the DV when it did not - a false positive
TYPE 2 ERRORS: claiming that the IV did not produce an effect on the DV when in fact it did - a false negative

Part 2: Data Considerations

How Many Make a Sample?
"8 out of 10 owners who expressed a preference said their cats preferred it."
How confident can we be about such statistics? 8 out of 10? 80 out of 100? 800 out of 1000? 80,000 out of 100,000?

Types of Data / Variables
Continuous: BP, height, weight, age
Discrete: number of children, age last birthday, colds in the last year
Ordinal: grade of condition, positions (1st, 2nd, 3rd), "better - same - worse", height groups, age groups
Nominal: sex, hair colour, blood group, eye colour

Conversion & Re-classification
Easier to summarise Ordinal / Nominal data
Cut-off points (who decides this?)
Cut-offs allow continuous variables to be changed into nominal variables:
BP > 90 mm Hg = Hypertensive
BP =< 90 mm Hg = Normotensive
Easier clinical decisions (e.g. BMI: obese vs underweight)
But categorisation reduces the quality of the data, and statistical tests may be more "sensational"
Good for summaries; bad for "accuracy"

Dispersion
Range: spread of data
Mean: arithmetic average
Median: location
Mode: frequency
SD: spread of data about the mean
e.g. Range 50-112 mmHg; Mean 82 mmHg; SD ±10 mmHg; Median 82 mmHg; Mode 82 mmHg

Multiple measurement of a small sample
Cluster counts: 25, 22, 24, 21 cell clusters
Total = 92 cell clusters; Mean = 23 cell clusters; SD = 1.8 cell clusters

Small samples spoil research
Sample A (n=10): Ages 20, 20, 20, 20, 20, 20, 20, 20, 20, 20; IQs 100, 100, 100, 100, 100, 100, 100, 100, 100, 100
  Age: Total 200, Mean 20, SD 0 / IQ: Total 1000, Mean 100, SD 0
Sample B (n=10): Ages 18, 20, 22, 24, 26, 21, 19, 25, 20, 21; IQs 100, 110, 119, 101, 105, 113, 120, 119, 114, 101
  Age: Total 216, Mean 21.6, SD ±4.2 / IQ: Total 1102, Mean 110.2, SD ±19.2
Sample C (n=10): Ages 18, 20, 22, 24, 26, 21, 19, 25, 20, 45; IQs 100, 110, 119, 101, 105, 113, 120, 119, 114, 156
  Age: Total 240, Mean 24, SD ±8.5 / IQ: Total 1157, Mean 115.7, SD ±30.2

Presentation of data: table of means
              Exposed n=197    Controls n=178    T      P
Age (yrs)     45.5 (9.4)       48.9 (7.3)        2.19   0.07
I.Q           105 (10.8)       99 (8.7)          1.78   0.12
Speed (ms)    115.1 (13.4)     94.7 (12.4)       3.76   0.04

Correlation and Association
Correlation is a numerical expression between 1 and -1 (extending through all points in between). Properly called the Correlation Coefficient. A decimal measure of association (not necessarily causation) between variables.
Correlation of 1: maximal - any value of one variable precisely determines the other. Perfect +ve.
Correlation of -1: any value of one variable precisely determines the other, but in the opposite direction to a correlation of 1. As one value increases, the other decreases. Perfect -ve.
Correlation of 0: no relationship between the variables; totally independent of each other.
“Nothing.”
Correlation of 0.5: only a moderate relationship between the variables - note that r = 0.5 means only 25% of the variance (r²) in one variable is accounted for by the other. Medium +ve.
Correlations between 0 and 0.3 are weak
Correlations between 0.4 and 0.7 are moderate
Correlations between 0.8 and 1 are strong

Correlation and Association
With a scatter diagram, each individual observation becomes a point on the scatter plot, based on two co-ordinates, measured on the abscissa (x-axis) and the ordinate (y-axis).
Two perpendicular lines drawn through the medians divide the plot into quadrants; each quadrant should contain 25% of all observations.

Part 3: Research Design (it all depends on the size of the needle)

Background on Surveys
• Large-scale
• Quantitative
• Can be descriptive ("2% of women think they are beautiful")
• Can be inferential ("Significantly more single women think they're beautiful than married women do")
• Done with a sample of patients, respondents, consumers, or professionals
• Differences between any groups assessed with hypothesis testing
• Important that the sample size is large enough to detect any such difference if it truly exists

Importance of Sample Size
• "Forgotten" in many studies; little consideration given
• An appropriate sample size is needed to confirm / refute hypotheses
• Small samples are far too small to detect anything but the grossest difference
• Real differences go undetected and are reported as "non-significant" – Type 2 error
• Too large a sample – an unnecessary waste of (clinical) resources
• Ethical considerations – waste of patient time, inconvenience, discomfort
• Essential to assess the optimal sample size before starting the investigation

Qualitative studies need to sample wisely too… Asian GPs' attitudes to ANP
Objective: To determine attitudes to ANP among Asian doctors in East Birmingham PCT
Method: Send invitations to 55 Asian GPs (approx. 47% of East Birmingham PCT); intends to interview (30 mins) the first 20 GPs who respond
Sample would be 36% of Asian GPs
– and only 17% of GPs in the PCT
Severely biased research (and ethically dodgy too)

Sampling a Population
The process of selecting units (e.g. people, organisations) from a population, in order to generalise results to that population
The first question should be: who do you want to generalize the findings to? The POPULATION
A POPULATION → REPRESENTATIVE SAMPLE (theoretical) → ACCESSIBLE SAMPLE (actual)
Is the accessible sample REPRESENTATIVE of the POPULATION?

Types of Sampling
RANDOM sampling: theoretically ideal; covers all elements of the population; costly; time-consuming
OPPORTUNISTIC sampling: a desperate measure; take any subject available; cheap; fast; bias
CONSCRIPTIVE sampling: ethically unsound; bias
QUOTA sampling: favourite of ICM and MORI; quotas of the population; efficient; flaw potential

[Chart: distribution of heights (5'6" to 6'4") across the population, with the portions captured by random, opportunistic, conscriptive, and quota sampling]

Specificity and the acceptable N: Jackson's paradox
Relative population size: as study populations become smaller, acceptable study sample sizes reduce
Population size vs acceptable sample size: General Pop > Working Pop > Specific Pop > Rare Pop

Specificity and the acceptable N
Student   Project                                         N
I.D       Forces yachting training schools                300
E.M       Companies using stress counselling              150
S.M       Divers and ear barotrauma                       142
N.O       Solvent exposure in Myanmar                     80
V.W       Routine flu vaccinations                        900
A.F       Dermatitis in hairdressers                      102
S.M       O.H needs of NHS staff                          23
T.R       NIHL in student employees                       14
I.C       Blood tests in British Army pilots              408
O.Y       Upstream oil company deaths                     161
A.A       Renal colic in flight deck crew                 254
A.C       Hepatitis B in army regulars and territorials   476
(Two of these projects also used in-depth methods.)

Selection Bias
Sampling properly is crucial; samples may be askew
Specialist publications attract a specialist response group
There exists a self-selection bias of those with special interests
Controversial topics, or litigious areas: Gulf War Syndrome, A&E violence, C. dif, call centres,
depleted uranium weaponry, organophosphate pesticides, stress, telecomms
THIS IS AN INHERENT PROBLEM WITH HEALTH RESEARCH - COMBAT IT WITH LARGE SAMPLES AND CLEVER METHODOLOGY

Sampling Keywords
POPULATIONS: can be mundane or extraordinary
SAMPLE: must be representative
INTERNAL VALIDITY OF SAMPLE: sometimes validity is more important than generalizability
SELECTION PROCEDURES: Random, Opportunistic, Conscriptive, Quota

Sampling Keywords
THEORETICAL: developing, exploring, and testing ideas
EMPIRICAL: based on observations and measurements of reality
NOMOTHETIC: rules pertaining to the general case (nomos - Greek)
PROBABILISTIC: based on probabilities
CAUSAL: how causes (treatments) affect the outcomes

Example 1 - Independent Design
Workers exposed to pesticide versus controls (not exposed to pesticide): Independent t test
         Exposed n=5     Controls n=5    T      P
Age      25.2 (sd 2.7)   26.4 (sd 2)     -.77   .46
Psych    16.8 (sd 4.7)   14.8 (sd 4.9)   .65    .53

Example 2 - Matched Design
Workers exposed to pesticide versus controls not exposed to pesticide: Paired Samples t test
         Exposed n=5     Controls n=5    T      P
Age      30.8 (sd 7.6)   30.8 (sd 7.6)
Psych    13.8 (sd 2.1)   19.8 (sd 4.5)   -4.8   .008

Example 3 - Repeated Design
Workers before and after exposure to pesticide: Paired Samples t test
         Pre n=10        Post n=10       T      P
Psych    14.1 (sd 5.7)   19.9 (sd 4.2)   2.5    .02
N numbers are effectively doubled compared with the independent method; repeated subjects is efficient

Sampling & Deployment
RANDOM SAMPLING: selecting a sample from the POPULATION; related to the EXTERNAL VALIDITY of the research and the GENERALIZABILITY of the findings to the POPULATION
RANDOM ASSIGNMENT: how to assign the sample into different treatments or groups; related to the INTERNAL VALIDITY of the research; ensures groups are similar (EQUIVALENT) to each other prior to TREATMENT
Both RANDOM SAMPLING and RANDOM ASSIGNMENT can be used together, singly, or not at all…
It is a waste of time to randomly sample but not randomly allocate; having a choice in this matter is a luxury

Power Hierarchy of
Study Designs
Best - Repeated Subjects / Repeated Measures
- comparing like with like; each subject "stays the same" in other factors
- reduces the need for covariate adjustment in analyses
- effectively "doubles" the number of subjects
Middle - Matched Subjects
- important factors are matched between groups
- unmatched covariates still need to be adjusted for
- not comparing like with like in all respects
Weakest - Independent Subjects
- comparing groups which may be vastly different
- covariate adjustment is needed
- need to use strict exclusion criteria in order to maintain comparability

Final Points: Bias
Avoiding bias is a good aim to have, but it is not necessarily everything in research
The existence of some bias in a sample does not ruin a project entirely
Spector et al. (2000) show that the "inflating effect" of self-report bias may not be so prominent; it mostly leads to underestimation rather than overestimation of any main effects
Spector PE, Chen PY, O'Connell BJ. A longitudinal study of relations between job stressors and job strains while controlling for prior negative affectivity and strains. Journal of Applied Psychology 2000; 85: 211-218.

Final Points: Generalizability in epidemiological investigation
Basic principles: internal validity is always more important than generalizability; it is never appropriate to generalise an invalid finding (Mant et al., 1996)
Mant J, Dawes M, Graham-Jones S. Internal validity of trials is more important than generalizability. British Medical Journal 1996; 312: 779.
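The paired analysis behind the matched and repeated designs in the examples above can be computed by hand: take the within-pair differences, then divide their mean by their standard error. A minimal sketch with made-up scores (these are NOT the lecture's data):

```python
# Paired t statistic for a repeated-measures design, computed from
# scratch. The pre/post scores below are hypothetical illustrations.
from math import sqrt
from statistics import mean, stdev

pre  = [14, 12, 18, 9, 15, 13, 17, 11, 16, 14]   # hypothetical test #1 scores
post = [20, 17, 22, 15, 21, 18, 24, 16, 23, 19]  # hypothetical test #2 scores

diffs = [b - a for a, b in zip(pre, post)]       # within-subject differences
n = len(diffs)
t = mean(diffs) / (stdev(diffs) / sqrt(n))       # t on n - 1 degrees of freedom
print(round(t, 2))
```

Because each subject is compared with themselves, the between-subject variation drops out of `diffs`, which is exactly why the repeated design sits at the top of the power hierarchy.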
Part 4: Validity

Validity is an important consideration. Example project:
- access to 300 workers
- workers' ability is assessed
- workers attend a 1-week training course
- workers' ability is assessed again
- a classic within-subjects design (pre-post test design)

Design Concept - Between-subjects method
300 subjects randomised: 150 control group, 150 intervention group
Assess ability; compare the mean scores of the control results and the intervention results

Design Concept - Within-subjects method (better)
300 subjects: assess ability #1 → training course → assess ability #2

Threats to within-subjects designs
[Chart: gain in scores from test #1 to test #2, showing an increase after the training course]
The student concludes the outcome (improvement) is due to the training. Could this be wrong?
Some threats to internal validity that critics (examiners) might raise, and some plausible alternative explanations for the observed effects:

History threats
Some "historical" event caused the increase - not the training
TV & other media: Sesame Street, Countdown, Tomorrow's World, Open University - elementary intellectual content
Can be mundane or extraordinary: a "specific event / chain of events"
British Journal of Psychiatry (2000) 177: pp. 469-72

Maturation threats
"Age is the key to wisdom"
Improvement would occur without any training course
Measuring natural maturation / growth of understanding
Effects up to a certain limit; differential maturation
Similar to a "history threat"?
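The random-assignment step of the between-subjects design above (300 subjects split into two arms of 150) is worth doing in code rather than by hand. A minimal sketch, with hypothetical worker IDs:

```python
# Random assignment of 300 workers into control and intervention arms.
# Shuffling the full list and splitting it guarantees equal group sizes
# while giving every worker the same chance of landing in either arm.
import random

random.seed(42)                 # fixed seed so the allocation is reproducible
workers = list(range(300))      # hypothetical worker IDs
random.shuffle(workers)
control = workers[:150]
intervention = workers[150:]
print(len(control), len(intervention))  # 150 150
```

Keeping the seed (or the allocation list) in the research log documents that assignment really was random, which is what underwrites the claim that the groups were equivalent before treatment.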
Testing threats
Specific to pre-post test designs
Taking a test can increase knowledge: taking test #1 may teach participants
Priming - makes them ready for the training in a way they otherwise would not be
Heisenberg's Uncertainty Principle (1927) again

Instrumentation threats
Specific to pre-post test designs: "making the goals bigger"
Taking the same test twice can increase knowledge, so studies do not use the same test twice (avoiding testing threats)
But perhaps the 2 versions of the test are not really similar - the instrument causes the changes, not the training course

Instrumentation threats (further)
Specific to pre-post test designs; especially likely with human "instruments" (observations or clinical assessment)
3 factors: observers fatigue over time; observers improve over time; different observers

Mortality threats
Metaphorical: dropping out of the study
An obvious problem? Especially when drop-out is non-trivial
N = 300 take test #1; N = 50 drop out after taking test #1; N = 250 remain and take test #2
What if the drop-outs were low scorers on test #1? (self-esteem)

Mortality threats (further)
Mean gain from test #1 to test #2, using all of the scores available on each occasion - this includes the 50 low test #1 scorers (soon-to-be drop-outs) in the test #1 score:
Mean score, Test #1 (n=300): 60.5 (±9.7)
Mean score, Test #2 (n=250): 81.6 (±8.9)
Problem - the potential low scorers drop out of test #2, inflating the mean test #2 score over what it would be if the poor scorers had taken it
Solution - compare mean test #1 and test #2 scores for only those workers who stayed in the whole study (n = 250)? No!!! A sub-sample is certainly not representative of the original sample

Mortality threats (further)
The degree of this threat is gauged by comparison: compare the drop-out group (n = 50) with the non-drop-out group (n = 250), e.g.
using test #1 scores and demographic data - especially age & sex
If there are no major differences between the groups:
- reasonable to assume mortality occurred across the entire sample
- reasonable to assume mortality was not biasing the results
This depends greatly on the size of the mortality N

Regression threats
Things can only get better - things can only get worse
"Regression artefact" / "regression to the mean" - a purely statistical phenomenon
It occurs whenever there is:
- a non-random sample from a population
- two measures imperfectly correlated (test #1 and test #2 scores will not be perfectly correlated with each other)

Regression threats
Few measurements stay exactly the same - confusing?
e.g. if a training program only includes people who are in the lowest 10% of the class on test #1, what are the chances that they would constitute exactly the lowest 10% on test #2? Not very likely!
Most of them would score low on the post-test, but they are unlikely to be the lowest 10% twice.
Being the lowest 10% on test #1, they can't get any lower than being the lowest - they can only go up from there, relative to the larger population from which they were selected.

Summary of single-group threats
History threats; Maturation threats; Testing threats; Instrumentation threats; Mortality threats; Regression threats

Part 5: Good Practice

Design & Ethical Approval
Good research should be justified, well planned, appropriately designed, and ethically approved
Is it ethical misconduct not to meet this standard?
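The regression-to-the-mean threat from Part 4 is easy to demonstrate with a short simulation: select the lowest-scoring 10% on test #1 and look at the same people's test #2 scores, with no training at all. The class size, score scale, and noise level below are illustrative assumptions, not values from the lecture:

```python
# Regression to the mean: the lowest 10% on an imperfectly reliable
# test #1 tend to score higher on test #2 even with no intervention.
import random

random.seed(1)
n = 1000
ability = [random.gauss(50, 10) for _ in range(n)]        # true ability
test1 = [a + random.gauss(0, 5) for a in ability]         # ability + noise
test2 = [a + random.gauss(0, 5) for a in ability]         # independent noise

cutoff = sorted(test1)[n // 10]                           # lowest 10% on test #1
low_group = [i for i in range(n) if test1[i] < cutoff]
mean_t1 = sum(test1[i] for i in low_group) / len(low_group)
mean_t2 = sum(test2[i] for i in low_group) / len(low_group)
print(round(mean_t2 - mean_t1, 1))  # a positive "gain" with no training
```

The apparent improvement arises purely because the selected group's test #1 scores were dragged down by unlucky measurement noise that does not recur on test #2.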
Design & Ethical Approval
Research should be driven by protocol; pilot studies should have a written rationale
Protocols should answer specific questions - not just "collecting data"
Protocols must be agreed by all contributors & participants
Keep the protocol as part of the research record / log

Design & Ethical Approval
Statistical issues should be considered before data collection; power calculations are becoming essential
Formal documented ethical approval is required for all research involving people, medical records, or anonymous human tissue
Human tissue studies: see the Nuffield Council on Bioethics
Fully informed consent should always be sought; if this is not possible (deceptive studies), a research ethics committee should decide

Design & Ethical Approval
If participants cannot give fully informed consent, research should follow international guidelines (Council for International Organizations of Medical Sciences - CIOMS)
Animal experiments require full compliance with local, national, ethical, and regulatory principles, along with local licensing arrangements
Formal supervision should be provided for all research projects, including frequent review, quality control, and long-term retention of records (up to 15 years)
The precise roles and tasks of contributors should be agreed as soon as possible

Data Analysis
Data should be appropriately analysed; inappropriate analysis does not amount to misconduct (yet)
Fabrication and falsification of data do constitute misconduct (recent ruling by the nurses' governing body)

Data Analysis
All sources and methods used to obtain data should be disclosed, including electronic pre-processing
Explanations should be given for any exclusions
Methods of analysis must be explained in detail, and referenced if not in common use
Post-hoc analysis of subgroups is acceptable if disclosed.
Failure to disclose that some analysis was post hoc is unacceptable
Discussion sections should mention any issues of bias which have been considered, and explain how they have been dealt with in the study design

Authorship
There is no universally agreed definition
As a minimum, authors should be responsible for at least one section of the study
Balance intellectual contributions to the conception, design, analysis, and writing of the study against the collection of data and other routine work: no task = no credit
Decide early who will be authors and who will be acknowledged
All authors take public responsibility for the content of the work (multidisciplinary work makes this slightly harder)
If uncertain, read the target journal's "advice to authors"

Conflict of Interests
May not be fully apparent to all concerned - an impartial opinion should be sought
May influence the judgement of authors, reviewers, or editors
"Those facts, which when revealed later, would make a reasonable reader feel misled or deceived"
Personal, commercial, political, academic, or financial
Financial conflicts may include: employment, stock / share ownership, travel funding, honorariums, consultancies, etc.

Plagiarism
Ranges up from the un-referenced use of others' published and unpublished ideas
May occur at any stage of planning, research, writing, or publication
Applies to both print and electronic formats
All sources should be disclosed; if large amounts of other people's written or illustrative material are to be used, permission must be sought

Media Relations
Medical research findings are of increasing interest to the print, broadcast, and narrowcast media
Journalists may attend scientific meetings where preliminary research findings are presented, which may lead to premature publication in the mass media.
Authors who are approached should give as balanced an account of their work as possible, ensuring they point out where evidence ends and speculation begins
Simultaneous publication in the mass media and a peer-review journal is advised
Authors should help journalists to produce accurate reports

Media Relations
Refrain from supplying additional data
Patients taking part in the research should be informed of the results by the authors before the mass media, especially if there are clinical implications
Authors should insist on being advised in advance if journalists are attending scientific meetings
Authors should ask the journals where their work appears if any media policies are operating

Council for International Organizations of Medical Sciences (CIOMS). International Guidelines for Ethical Review of Epidemiological Studies. Geneva: WHO, 1991.
Nuffield Council on Bioethics. Human tissue: Ethical and legal issues. London: Nuffield Council on Bioethics, 1995.

If you or anyone you know has been affected by any of the issues covered in this lecture, you may need a statistician's help: www.statistics.gov.uk

Further Reading
Abbott, P., & Sapsford, R.J. (1988). Research methods for nurses and the caring professions. Buckingham: Open University Press.
Altman, D.G. (1991). Designing Research. In D.G. Altman (ed.), Practical Statistics for Medical Research (pp. 74-106). London: Chapman and Hall.
Bland, M. (1995). The design of experiments. In M. Bland (ed.), An Introduction to Medical Statistics (pp. 5-25). Oxford: Oxford Medical Publications.
Bowling, A. (1994). Measuring Health. Milton Keynes: Open University Press.
Daly, L.E., & Bourke, G.J. (2000). Epidemiological and clinical research methods. In L.E. Daly & G.J. Bourke (eds.), Interpretation and Uses of Medical Statistics (pp. 143-201). Oxford: Blackwell Science Ltd.
Jackson, C.A. (2002). Research Design. In F. Gao-Smith & J. Smith (eds.), Key Topics in Clinical Research (pp. 31-39). Oxford: BIOS Scientific Publications.

Further Reading
Jackson, C.A. (2002).
Planning Health and Safety Research Projects. Health and Safety at Work Special Report 62 (pp. 1-16).
Jackson, C.A. (2003). Analyzing Statistical Data in Occupational Health Research. Management of Health Risks Special Report 81 (pp. 2-8).
Kumar, R. (1999). Research Methodology: A Step by Step Guide for Beginners. London: Sage.
Polit, D., & Hungler, B. (2003). Nursing Research: Principles and Methods (7th ed.). Philadelphia: Lippincott, Williams & Wilkins.