Quantitative Research Methods for the Social Sciences, 7.5 hp
Week 2 (4+ h material with exercises)

[Diagram: Population, sample survey (random sample), data/observations. Inferential statistics = draw conclusions about the population from the sample survey.]

Statistics
• Statistics is about collecting, organizing, analyzing and presenting data. That is what we do when we research, for instance, how many people have been unemployed for more than a year, or what the car sales figures were for the first three months of this year.

Concepts
• Population: A group of individuals that we want to investigate.
• Total survey: All the units in a population are investigated.
• Sample survey: A subsample of the population is chosen and investigated.
• Random sampling: The sample units are chosen by some random mechanism.
• Variable: A property connected with the units in a population.

Doing science from a statistical perspective
• Formulate a research problem.
• Define the population and plan a random sample survey.
• Find relevant variables to measure.
• Make descriptive statistics of the sample.
• Make inferential statistics to generalize what you find to the entire population.
• Write a report.

Where it can go wrong!
• A study is carried out to understand the training habits of students at Umeå University.
• The researcher hands out questionnaires at the entrance of Iksu.
• Result: Students at Umeå University train a lot more than expected.
• Where did it go wrong?

Where it can go wrong!
• A study is carried out to find out whether students at Umeå University prefer the campus pubs to the inner-city pubs.
• The researcher hands out questionnaires in the queue at a campus pub.
• Result: A majority of the students prefer campus pubs.
• Where did it go wrong?

Why make a random sample?
• If the sample is random, it is possible to use probability theory to control the error that arises from studying only a sample and not the entire population. This is impossible if the sample is not random.

Make a random sample. This makes it possible to:
• Give objective measures of the precision of the results of the survey.
• Make objective comparisons between different sampling plans prior to the survey.
• Calculate how large a sample you need in order to achieve a certain margin of error.

Find relevant variables to measure.
Variable: A property connected with the units in a population.
Measurement: An assignment of numbers to the subjects in a survey such that specific relationships between the subjects, with respect to some specific property, can be seen in the numbers.

Why do we measure?
• To describe, to compare, to evaluate.
Examples of things we want to measure:
• Length
• Stress
• Welfare
• Consumer satisfaction

Data levels
Data levels are important because different levels of data call for different methods of analysis.
• Nominal data – classification
• Ordinal data – classification and order
• Scale data (interval or ratio) – classification, order, and equivalent distance

Exercise 1: Variable type?
• Age
• Age group (25-34, 35-44, 45-54, ...)
• Sex (male/female)
• Education (primary, secondary, university)
• Smoker (yes/no)
• BMI (23.45, 28.12, ...)
• Car model (Volvo, Saab, Fiat)
• Temperature (12 °C, -4 °C, 14 °C, ...)

Descriptive statistics
• Measures of location
  – mean
  – median
• Measures of spread
  – range (min-max)
  – variance
  – standard deviation, SD
  – standard error of the mean, SEM
  – percentiles/quantiles (p25, p75, q1, q3, ...)
• Frequency tables
• Graphs
  – bar chart/histogram
  – boxplot
  – scatterplot

Center and spread
Answering a research question often means comparing measures.
Exercise 2: What is a boxplot?
• Blackboard example: Number of earrings of males and females in an African tribe
• Sample: 11 males, 11 females
• Females: 3 4 7 5 3 6 4 4 2 3 4
• Males: 10 5 15 17 18 23 10 8 7 22 16
• Construct a boxplot for each gender.
• Is there a difference between the genders?

Mean and variation the same?

Descriptive Statistics: Number of earrings

Gender  Minimum  Q1     Median  Q3     Maximum  IQR
f       2.000    3.000  4.000   5.000  7.000    2.000
m       5.00     8.00   15.00   18.00  23.00    10.00

Gender  N   Mean   SE Mean  StDev  Variance
f       11  4.091  0.436    1.446  2.091
m       11  13.73  1.84     6.10   37.22

Workload and exam result investigation
Is there a difference in study results between males and females? If so, what does the difference depend on?

A sample of graphs and plots
Variables: Exam results (scale), Workload (scale)
• Histogram of Exam Score (scale)
• Bar chart of Grades (ordinal/nominal)
• Pie chart of Grade (ordinal/nominal)
• Boxplot of Exam Score by gender (scale vs nominal)
• Bar chart of Grade by gender (nominal vs nominal)
• Boxplot of Total Study Time by gender (scale vs nominal)
• Scatter plot (scale vs scale): Is there a relation?

Inferential statistics
• Is there a difference in how well females and males perform on the exam? (this week)
• Is there a difference in how much females and males study for the exam? (this week)
• Is there a difference in how well females and males perform on the exam if we take the students' study time into account? (next week)
Inferential statistics is a collection of methods used to draw conclusions (inferences) about the characteristics of populations based on sample data.

Exercise 3: Design an experiment
• We want to examine whether there is a difference between mosquito cream A and B.
• Material for the experiment:
  – 30 students with bare arms
  – 1 bottle of mosquito cream A
  – 1 bottle of mosquito cream B
  – A forest full of mosquitoes
How do you perform the experiment and what data do you gather?
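Returning to the earring data of Exercise 2, the descriptive statistics can be recomputed with a few lines of Python. This is only a sketch using the standard library; the names `describe` and `summary` are made up for illustration, and the quartile rule is an assumption: `method="exclusive"` places the quartiles at positions (n + 1)/4, (n + 1)/2 and 3(n + 1)/4 in the sorted data, and other software may use a different rule.

```python
import math
import statistics

# Earring counts from Exercise 2 (11 females, 11 males).
females = [3, 4, 7, 5, 3, 6, 4, 4, 2, 3, 4]
males = [10, 5, 15, 17, 18, 23, 10, 8, 7, 22, 16]


def describe(values):
    """Return the location and spread measures for one group (hypothetical helper)."""
    n = len(values)
    # Assumption: quartiles at positions (n + 1) / 4, (n + 1) / 2, 3 (n + 1) / 4.
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return {
        "n": n,
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,
        "mean": statistics.mean(values),
        "se_mean": sd / math.sqrt(n),  # standard error of the mean
        "sd": sd,
        "variance": statistics.variance(values),
    }


summary = {"f": describe(females), "m": describe(males)}
for gender, stats in summary.items():
    print(gender, {name: round(value, 3) for name, value in stats.items()})
```

The five numbers min, q1, median, q3 and max are exactly what a boxplot draws, so the printed values can be compared with a boxplot made by hand.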
Inferential statistics (the idea)

Hypothesis testing
In research we want answers to posed questions (hypotheses).
• Are all coffee flavors equally popular?
• Is the use of bike helmets effective in protecting people in bicycle accidents from head injuries?
• Is there a connection between gender and alcohol consumption among the students at Umeå University?

HYPOTHETIC-DEDUCTIVE METHOD
[Diagram: Hypothesis, Statement and Observation, linked by (1) deduction, a logically valid argument (predictive inference), (2) "dialogue with reality", and (3) induction (inductive inference).]
1 Tries to predict what will happen if the hypothesis holds.

Logically valid hypothesis (example)
Valid
• Hypothesis: The animal is a horse.
• Statement: If the animal is a horse, it will have four legs.
• Observation: The animal does not have four legs.
• Conclusion: The animal is not a horse.
Invalid
• Hypothesis: The animal is a horse.
• Statement: If the animal is a horse, it will have four legs.
• Observation: The animal has four legs.
• Conclusion: It is a horse. This conclusion is not valid: it can be a pig or some other animal.

Contradiction proofs
Within statistical hypothesis testing (inference theory) we are not looking for "impossible" events in order to reject posed hypotheses. (E.g. it is impossible that the animal has six legs if it is a horse. If the animal has six legs, the hypothesis "it is a horse" is rejected.) Instead we are looking for contradictions in terms of "improbable events".

Improbable event
Assume that we suspect that using bicycle helmets is an effective way to protect people in bicycle accidents from skull damage.
Null hypothesis: The percentage of persons with skull damage after a bicycle accident is the same whether or not they use bicycle helmets.
Statement: If the percentage of persons with skull damage after a bicycle accident is the same whether or not they use bicycle helmets, then in a sample survey there should be only a small difference in the percentage of people with skull damage between the two groups.
If the hypothesis holds, observing a large percentage difference between these groups in a sample survey is an improbable event.

Improbable event
Assume that we suspect that there is a difference between male and female students at Umeå University in their opinion about EMU.
Null hypothesis: The percentage of students who are against EMU is the same for males and females.
Statement: If the percentage of students who are against EMU is the same for males and females, then in a sample survey there should be only a small difference in the percentage of students against EMU between the two groups. If the hypothesis holds, observing a large percentage difference between these groups in a sample survey is an improbable event.

Test statistic: what is an improbable event?
Within statistical inference theory the statements are summarized in a test statistic. From our hypothesis and from probability theory we can derive the distribution of the test statistic if the null hypothesis is true. Next, we draw a sample, calculate the observed value of the test statistic and compare it with the derived distribution to see whether we have an improbable event. If we get an improbable event, the null hypothesis is rejected.

P-value
The P-value describes how improbable the event is: it is the probability, if the null hypothesis is true, of a result at least as extreme as the one observed. If the P-value is small, either we have observed something improbable or the null hypothesis does not hold. If the P-value is small (< 0.05 or < 0.01), the null hypothesis should be rejected. (The level 0.05 = 5% or 0.01 = 1% is called the significance level. More about this later.)

Coffee example
100 people took part in a survey about different brands of coffee. Each person tasted four different brands (in a blind test) and noted which one they preferred. The result of the test was as follows:

Brand:             Ellips  Gexus  Luber  Eco
Number of people:  26      28     16     30

Does the result of the survey show that some brands are more popular than others, or are they all equally popular?
In statistical terms we can formulate the problem as:
Null hypothesis: All the coffee brands are equally popular.
Alternative hypothesis: Not all the coffee brands are equally popular.

If the null hypothesis is true, we would expect the following result of the survey:

Brand:             Ellips  Gexus  Luber  Eco
Number of people:  25      25     25     25

Can we reject the null hypothesis at a significance level of 5%?

One way of measuring how much the observed table differs from the expected table is to look at the differences:

Brand:     Ellips  Gexus  Luber  Eco
Observed:  26      28     16     30
Expected:  25      25     25     25

If we square each difference, divide by the expected count and sum over all brands, we get a test statistic called "chi-square". It is possible to derive the distribution of the test statistic under the assumption that the null hypothesis is true.

$\chi^2_{obs} = \frac{(26-25)^2}{25} + \frac{(28-25)^2}{25} + \frac{(16-25)^2}{25} + \frac{(30-25)^2}{25} = 4.64$

Chi-square distribution
Is 4.64 an improbable event? If the null hypothesis is true, $\chi^2_{obs}$ ought to be close to zero. Is 4.64 so far away from zero that we can reject the null hypothesis?
We compare the obtained P-value with our chosen level of significance.
Observed P-value: 0.20. Conclusion?
[Figure: distribution of the test statistic under the null hypothesis.] (Getting 4.64 or more is not unusual. We cannot reject the null hypothesis.)

Choose the right test
• Hand out the summary picture. (Reminder)
1) Decide what the problem objective is.
2) What is the data type? (scale, ordinal, nominal)
3) Make assumptions/approximations to reach a test. (Make sure you check the assumptions, or choose a test that is robust against model misspecification.)

Significance level
• When deciding what an improbable outcome is, you compare the P-value of the outcome with the significance level.
• The significance level is a "risk level" that you decide yourself before you conduct the test.
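As a sketch of how a P-value is compared with a significance level chosen in advance, the coffee example above can be recomputed in Python. The helper `chi2_sf_3df` is a made-up name and an assumption of this sketch: it uses a closed-form tail probability that is valid only for a chi-square distribution with 3 degrees of freedom (four brands minus one). In practice a statistics package, for example `scipy.stats.chisquare`, would normally be used instead.

```python
import math

# Observed preferences from the coffee example (100 people, four brands).
observed = {"Ellips": 26, "Gexus": 28, "Luber": 16, "Eco": 30}

# Under H0 all brands are equally popular, so each expects 100 / 4 = 25 people.
total = sum(observed.values())
expected = total / len(observed)

# Chi-square statistic: sum over brands of (observed - expected)^2 / expected.
chi2_obs = sum((count - expected) ** 2 / expected for count in observed.values())


def chi2_sf_3df(x):
    """P(chi-square with 3 degrees of freedom >= x); closed form valid only for df = 3."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_3df(chi2_obs)
alpha = 0.05  # significance level, decided before the test

print(f"chi-square = {chi2_obs:.2f}, p-value = {p_value:.2f}")
print("reject H0" if p_value < alpha else "cannot reject H0")
```

Changing `alpha` to 0.01 does not change the conclusion here, since the P-value is far above both common levels.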
Significance level
• Common significance levels are 5% and 1%.

Type I and II errors
• A Type I error is made when we reject the null hypothesis although it is actually true (incorrectly rejecting a true H0). The probability of making a Type I error is the significance level, α.
• A Type II error is made when we fail to reject the null hypothesis although it is false (incorrectly keeping a false H0).

Power of a test
• The power of a test is the probability that it rejects a false null hypothesis.

Questions in class
1) What happens to the power if the significance level increases/decreases? Why?
2) What happens to the power if the sample size increases/decreases? Why?

The steps when analyzing data statistically
1) Make descriptive statistics relevant for the research question/hypotheses.
2) Construct statistical hypotheses H0 and HA, connected to your research question. In H0 you put the statement you want to reject.
3) Pick a significance level.
4) Choose an appropriate test and check the assumptions of the test.
5) Evaluate the P-value and draw a conclusion.

Example: Earring data
1) Descriptive statistics
2) Hypotheses
• A) Two-sided test
  – H0: There is no difference in the mean number of earrings between males and females.
  – HA: There is a difference.
• B) One-sided test (one side)
  – H0: Men have (on average) equal or more earrings.
  – HA: (Class: help me)
• C) One-sided test (other side)
  – H0: Males have (on average) equal or fewer earrings.
  – HA: (Class: help me)
3) Pick a significance level.
  – Significance level = 5% (that is, we set the probability of a Type I error to 5%)
4) Choose an appropriate test
Look at the chart: Compare two populations
• Data type = interval
• Descriptive measurement = central location
• Experimental design = independent samples
• Population distributions = normal (assumption)
• Population variances = unequal
Result: T-test (with unequal variances)

Put the data into the computer and calculate
Difference = mean(f) - mean(m)
Estimate for difference: -9.64
A) T-test of difference = 0 (vs not =): P-value = 0.000
B) T-test of difference = 0 (vs <): P-value = 0.000
C) T-test of difference = 0 (vs >): P-value = 1.000
Learn how to interpret the printout of the statistical software.
5) Conclusions: (Class: help me)

Reasons for non-significant results
• There is no difference.
• There is a difference, but we have too few observations to detect it.
• Important: the fact that we cannot reject the null hypothesis does not mean that the null hypothesis is true.

Normal probability distribution
• A common assumption in several statistical tests
• Does NOT mean that the observations are distributed as they "normally" would be.
• Notation: N(mean, variance)

Normal probability distribution
[Figure: histogram of N(0,1) data, mean = 0, SD = 1; x-axis from -3 to 3, y-axis frequency.]
• ~68% of observations within mean ± 1 SD
• ~95% within ± 2 SD
• ~99.7% within ± 3 SD

Estimates and confidence intervals
Ex. Estimate the mean height of the population in Umeå by measuring a sample of 10 individuals.
Estimate = sample mean
95% confidence interval = mean ± 1.96 SEM (SEM = standard error of the mean = standard deviation of the sample mean)
A 95% confidence interval is an interval that with 95% probability will cover the population value (what we want to estimate).

Normal probability distribution
• How do I know if my variable is normally distributed?
  – continuous variable, no cut-off point
  – draw a histogram or a normal probability plot
  – symmetric, bell-shaped, mean = median
  – Unsure? Use non-parametric tests if available.

How to know if data are normally distributed?
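Returning to the earring example, the t-test with unequal variances (often called Welch's t-test) can be sketched by hand to see where the software's numbers come from. Everything below is an illustration under stated assumptions: the function names are made up, the degrees of freedom use the Welch-Satterthwaite approximation, and the P-value is obtained by numerically integrating the t density with Simpson's rule rather than by calling a statistics package (a call such as `scipy.stats.ttest_ind` with `equal_var=False` would be the usual route).

```python
import math
import statistics

# Earring counts from the example (11 females, 11 males).
females = [3, 4, 7, 5, 3, 6, 4, 4, 2, 3, 4]
males = [10, 5, 15, 17, 18, 23, 10, 8, 7, 22, 16]


def welch_t(sample_a, sample_b):
    """Return (difference in means, t statistic, approximate degrees of freedom)."""
    # Squared standard error of each sample mean: sample variance / n.
    se2_a = statistics.variance(sample_a) / len(sample_a)
    se2_b = statistics.variance(sample_b) / len(sample_b)
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    t_stat = diff / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return diff, t_stat, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided P-value via Simpson's rule on the t density (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return 2 * (0.5 - central_area)


diff, t_stat, df = welch_t(females, males)
p_two_sided = two_sided_p(t_stat, df)
print(f"difference = {diff:.2f}, t = {t_stat:.2f}, df = {df:.1f}")
print(f"two-sided P-value = {p_two_sided:.4f}")
```

The one-sided P-values follow from the two-sided one: because the observed difference is negative, the test against "<" has roughly half the two-sided P-value, and the test against ">" has one minus that.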
Parametric/non-parametric tests
• Parametric tests:
  – used if data are normally distributed
  – all information in the sample can be summarized by the mean and standard deviation
• Non-parametric tests:
  – used primarily if data are not normally distributed
  – can also be used if data are normally distributed, but are then less powerful
  – less sensitive to outliers

Exercise 4: Back to the mosquito example
• Summary of the students' ideas on the blackboard (reminder)
• There is no single right way, just ways that use different assumptions about reality. Three reasonable ways can be found in the chart.
• State the weaknesses of each analysis.

Experimental design
• Put mosquito cream A on a randomly selected arm of each of the 30 people. Put cream B on the other arm.
• Let the students walk in the forest (with lots of mosquitoes).
• Count the number of mosquito bites on each arm (after 1 hour).
• H0: The mosquito creams are equally effective.
• HA: The mosquito creams are not equally effective.

1: Sign test (or equivalent)
For each person, calculate the number of mosquito bites on arm A minus the number of mosquito bites on arm B. If the result is positive, associate the person with "+", and if the result is negative, associate the person with "-". Count the number of "+".
• If H0 is true, we expect about 15 "+" and 15 "-".
• If A is the better cream (fewer bites on arm A, so mostly negative differences), we expect few "+".
• If B is the better cream, we expect many "+".
• We reject H0 if we get many "+" or few "+", since both outcomes are improbable if H0 is true. The rejection region "9 or fewer, or 21 or more" gives a test with significance level 4.28%.
• Weaknesses (–) and strengths (+) of the test
  – Only cares about + or -, not how big the difference is.
  + No distribution assumption (works even if we have fewer than 30 people).
  + Eliminates variation between persons.
  – No good way to handle ties (same number of bites on each arm).
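The 4.28% quoted for the sign test can be checked directly, because under H0 the number of "+" among 30 people follows a binomial distribution with success probability 0.5. The sketch below assumes no ties, and it takes the rejection region to be "at most 9 or at least 21 plus signs", which is the region that reproduces 4.28%; `binom_pmf` and `sign_test_p` are illustrative names, not part of any library.

```python
import math

n = 30  # persons, one sign each (ties are ignored in this sketch)


def binom_pmf(k, n, p=0.5):
    """P(exactly k plus signs among n) when each sign is + with probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# Rejection region: 9 or fewer, or 21 or more, plus signs.
lower, upper = 9, 21
alpha = sum(binom_pmf(k, n) for k in range(0, lower + 1)) + sum(
    binom_pmf(k, n) for k in range(upper, n + 1)
)
print(f"significance level = {alpha:.4f}")


def sign_test_p(plus, n):
    """Two-sided P-value: twice the smaller tail probability, capped at 1."""
    lower_tail = sum(binom_pmf(k, n) for k in range(0, plus + 1))
    upper_tail = sum(binom_pmf(k, n) for k in range(plus, n + 1))
    return min(1.0, 2 * min(lower_tail, upper_tail))


# Hypothetical outcome for illustration only: 22 of 30 persons get a "+".
print(f"P-value for 22 plus signs = {sign_test_p(22, n):.4f}")
```

Because the binomial distribution is discrete, a significance level of exactly 5% cannot be achieved; 4.28% is the largest level below 5% that a symmetric rejection region gives for n = 30.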
2: T-test (not paired)
• Calculate the mean number of mosquito bites on arms A minus the mean number of mosquito bites on arms B. We call this number T. (You can divide T by the estimated standard deviation of T to get a normalized test statistic that is t-distributed. That is what the computer does.)
• If H0 is true, we expect T to be close to zero.
• We reject H0 if T is far out in the tails.
• Weaknesses (–) and strengths (+) of the test
  – Normal distribution assumption. (Works really well if the sample is 30 or more, cf. the CLT.)
  – A continuous approximation to discrete data may be a bad approximation.
  – Does not eliminate variation between persons.

3: Paired T-test
• For each person, calculate the number of mosquito bites on arm A minus the number of mosquito bites on arm B. Calculate the mean of these numbers and call it T. (You can divide T by the estimated standard deviation of T to get a normalized test statistic that is t-distributed. That is what the computer does.)
• If H0 is true, we expect T to be close to zero. Typically we get smaller variation, because the between-person variation is eliminated. This gives a more powerful test.
• We reject H0 if T is far out in the tails.
• Weaknesses (–) and strengths (+) of the test
  – Normal distribution assumption. (Works really well if the sample is 30 or more, cf. the CLT.)
  – A continuous approximation to discrete data may be a bad approximation.
  + Eliminates variation between persons. Typically, if the between-subject variation is large, the test is much more powerful than the unpaired T-test.

4: Poisson model (beyond this course)
• Model: the number of mosquito bites for each person and cream is Poisson distributed with a mean (intensity) equal to person effect × cream effect (A or B).
• Estimate the cream effects and evaluate.
• Only weakness: the assumed interaction pattern between persons and creams.

If time 1: Why the tests work
• "The law of large numbers (LLN).
Given a sample of independent and identically distributed random variables (SRS) with a finite population mean, the average of these observations will eventually approach and stay close to the population mean."
• This result tells us that the larger the sample, the better the precision of the estimates.
• "The central limit theorem (CLT) states that the sum of a large number of independent, identically distributed random variables (SRS), each with finite variance, will be approximately normally distributed (following a Gaussian distribution, or bell-shaped curve)."
• This result (and similar ones) is important because it lets us approximate the distribution of test statistics, which is necessary to test hypotheses.

Central Limit Theorem (in words)
• No matter the form of the distribution of the population, if you take a large SRS (20-30+ observations) the sample mean will be approximately normally distributed. The approximation gets better the larger the sample.
• Conclusion: if your sample is large, you do not need to worry so much if your observations break the normality assumption. (This holds as long as you do not make predictions for individuals. See regression next week.)

Example
Uniform distribution between 1 and 8: UniformDistribution(1,8)
[Figure: bar chart of P(X) for X = 1, 2, ..., 8.]
$E(X) = \mu = 4.5$
$V(X) = \sigma^2 = 5.25$
$SD(X) = \sigma = 2.2913$

Sample of size n = 2. Mean of the two observations.
• 8 × 8 = 64 different outcomes

Outcomes (first observation, second observation):

     1    2    3    4    5    6    7    8
1   1,1  2,1  3,1  4,1  5,1  6,1  7,1  8,1
2   1,2  2,2  3,2  4,2  5,2  6,2  7,2  8,2
3   1,3  2,3  3,3  4,3  5,3  6,3  7,3  8,3
4   1,4  2,4  3,4  4,4  5,4  6,4  7,4  8,4
5   1,5  2,5  3,5  4,5  5,5  6,5  7,5  8,5
6   1,6  2,6  3,6  4,6  5,6  6,6  7,6  8,6
7   1,7  2,7  3,7  4,7  5,7  6,7  7,7  8,7
8   1,8  2,8  3,8  4,8  5,8  6,8  7,8  8,8

Mean of the two observations:

     1    2    3    4    5    6    7    8
1   1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5
2   1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0
3   2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5
4   2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0
5   3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5
6   3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0
7   4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5
8   4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0

Sampling distribution of the mean
[Figure: bar chart of $P(\bar{X})$ for $\bar{X}$ = 1.0, 1.5, ..., 8.0.]
$E(\bar{X}) = \mu_{\bar{X}} = 4.5$
$V(\bar{X}) = \sigma^2_{\bar{X}} = 2.625$
$SD(\bar{X}) = \sigma_{\bar{X}} = 1.6202$

[Figure: sampling distribution of the mean next to UniformDistribution(1,8).]
Compare!
• Same mean
• Less variance
• More bell-shaped

Illustration of the CLT
[Figure: distribution of X in the population and of $\bar{X}$ for n = 2 and n = 30.]

If time 2: Comparing means example
A. Comparing means from 2 samples (using T-test)
B. Comparing means from several samples (using ANOVA)
C. Comparing means from several samples (using blocked ANOVA)

A: Does gender affect the mean score on a statistics exam?
A: SPSS gives (T-test)
• What does the SPSS output imply?
• What if we do a one-sided test? (See hand-in 1.)

B: Do students with different grades put different amounts of time into their studies?
A: SPSS gives (one-way ANOVA)
• What is the simple idea behind the analysis?
• What does the SPSS output imply?
• Where is the difference?
Tukey intervals (Where do the means differ?)

C: Does math background, gender, or both influence the time put into the course?
SPSS gives (two-way ANOVA)

If extra time 3
Talk about the weaknesses and strengths of quantitative and qualitative methods.
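Returning to the uniform example with samples of size n = 2, the sampling distribution of the mean can be built by listing all 64 equally likely outcomes, which is what the two tables do by hand. The sketch below assumes a discrete uniform distribution on the integers 1 to 8 and sampling with replacement; `mean_and_variance` is an illustrative helper, and exact fractions are used so that no rounding is involved.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

values = range(1, 9)  # assumed population: discrete uniform on 1, 2, ..., 8


def mean_and_variance(dist):
    """Mean and variance of a distribution given as {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    variance = sum((x - mean) ** 2 * p for x, p in dist.items())
    return mean, variance


population = {Fraction(x): Fraction(1, 8) for x in values}

# All 8 * 8 = 64 equally likely ordered samples of size 2, reduced to their means.
counts = Counter(Fraction(a + b, 2) for a, b in product(values, repeat=2))
sample_mean_dist = {m: Fraction(c, 64) for m, c in counts.items()}

pop_mean, pop_var = mean_and_variance(population)
sm_mean, sm_var = mean_and_variance(sample_mean_dist)

print("population:  mean", float(pop_mean), "variance", float(pop_var))
print("sample mean: mean", float(sm_mean), "variance", float(sm_var))
print("SD of the sample mean:", round(float(sm_var) ** 0.5, 4))
```

The variance of the sample mean is the population variance divided by n, so replacing `repeat=2` with a larger sample size (and 64 with the matching number of outcomes) shrinks the spread further while the mean stays at 4.5.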