Survey
Introduction

Creating New Knowledge

Scientific Cycle
• Observe world
• Isolate problem
• Propose theory
• Design study
• Make predictions
• Collect data
• Compare
• Presentation
• Refine – revise/replace
• Expand/restrict/modify scope

Design
• Experiments, surveys, interventions, observational studies
• How to optimally collect the data in order to later perform inference

Data
• Descriptive: Tables, graphs
• Inference: Probabilistic models and inference theory to draw conclusions such as
  – Tests of theories/hypotheses
  – Prediction of unobserved events
  – Decisions
• Inference: How to optimally use the data to answer the question of interest

Note! A poorly designed study cannot be rescued by statisticians.

[Diagram: population, sample survey, data/observations]
Inferential statistics = Draw conclusions about the population from the sample survey.

Writing a science report. Statistical perspective!
• Formulate a research problem.
• Define the population and plan a sample survey.
• Find relevant variables to measure.
• Make descriptive statistics.
• Make inferential statistics.
• Write the report.

Define the population and perform a sample survey.
• Population: A group of individuals which we want to investigate.
• Total survey: All the units in a population are investigated.
• Sample survey: A subset of the population is chosen and investigated.
• Random sampling: The sample units are chosen by some random mechanism.

Where it can go wrong!
• A study is carried out to understand the training habits of students at Umeå University.
• The researcher hands out questionnaires at the entrance of the Iksu sports centre.
• Result: Students at Umeå University train a lot more than expected.
• Where did it go wrong?

Where it can go wrong!
• A study is carried out to find out if students at Umeå University prefer the campus pubs to the inner city pubs.
• The researcher hands out questionnaires to random persons in the queue at a campus pub.
• Result: A majority of the students prefer campus pubs.
• Where did it go wrong?

Why a sample survey rather than a total survey?
• Cheaper
• Faster
• A total survey cannot be used when the population is very big or infinitely large
• A total survey cannot be used in trials where the objects are used up or destroyed

Why make a random sample?
• If the sample is random, it is possible to use probability theory to control the error that arises from the fact that we study just a sample and not the entire population. This is impossible if the sample is not random.

Make a random sample. With a random sample it is possible to:
• Give objective measures of the precision of the survey results.
• Make objective comparisons between different sampling plans prior to the survey.
• Calculate how large a sample you need in order to achieve a certain margin of error.

Find relevant variables to measure.
• Variable: A property connected with the units in a population.
• Measurement: An assignment of numbers to the subjects in a survey such that specific relationships between the subjects, with respect to some specific property, can be seen in the numbers.

Why do we measure?
• To describe, to compare, to evaluate.
• Examples of things we want to measure: length, stress, welfare, consumer satisfaction.

Data levels
Data levels are important because different levels of data call for different methods of analysis.
• Nominal data: Classification
• Ordinal data: Classification and order
• Interval data: Classification, order, and equal distances
• Ratio data: Classification, order, equal distances, and an absolute zero

Which type of variable? (Help me!)
• Age
• Age group (25-34, 35-44, 45-54, ...)
• Sex (male/female)
• Education (primary, secondary, university)
• Smoker (yes/no)
• BMI (23.45, 28.12, …)
• Car model (Volvo, Saab, Fiat)
• Temperature (12 °C, -4 °C, 14 °C, …)

Descriptive statistics
• Measures of location
  – mean
  – median
• Measures of spread
  – range (min-max)
  – variance
  – standard deviation, SD
  – standard error of the mean, SEM
  – percentiles/quantiles (p25, p75, q1, q3, ...)
• Frequency tables
• Graphs
  – bar chart/histogram
  – boxplot
  – scatterplot

Center and spread
Answering a research question often comes down to comparing measures.

Workload and exam result investigation.
Is there a difference in study results between males and females? If so, what does the difference depend on?

A sample of graphs and plots.
• Exam results (scale)
• Workload (scale)
• Histogram of exam score (scale)
• Bar chart of grades (ordinal/nominal)
• Pie chart of grade (ordinal/nominal)
• Boxplot of exam score by gender (scale vs nominal)
• Bar chart of grade by gender (nominal vs nominal)
• Boxplot of total study time by gender (scale vs nominal)
• Scatter plot (scale vs scale): Is there a relation?

Inferential statistics
Is there a difference in how well females and males perform on the exam if we take the students' study time into account?
Inferential statistics is a collection of methods used to draw conclusions, or inferences, about the characteristics of populations based on sample data.
• (Conclusion after analysis: no gender difference.)

Inferential statistics (The idea)

Hypothesis testing
In research we want to get answers to posed questions (hypotheses).
• Are all coffee flavors equally popular?
• Is the use of bike helmets effective in protecting people in bicycle accidents from head injuries?
• Is there a connection between gender and alcohol consumption among the students at Umeå University?

Hypothetico-deductive method
[Diagram: Hypothesis, Statement, Observation, connected by three numbered steps]
1. Deduction: a logically valid argument (predictive inference). Tries to predict what will happen if the hypothesis holds.
2. "Dialogue with reality": the statement is compared with the observation.
3. Induction (inductive inference).

Logically valid conclusion (example)
Valid
• Hypothesis: The animal is a horse.
• Statement: If the animal is a horse, it will have four legs.
• Observation: The animal does not have four legs.
• Conclusion: The animal is not a horse.
Invalid
• Hypothesis: The animal is a horse.
• Statement: If the animal is a horse, it will have four legs.
• Observation: The animal has four legs.
• Conclusion: It is a horse. Not a valid conclusion: it can be a pig or some other animal.

Logically valid conclusion (example)
Valid
• Hypothesis: It is raining.
• Statement: If it is raining, the ground will be wet.
• Observation: The ground is not wet.
• Conclusion: It is not raining.
Invalid
• Hypothesis: It is raining.
• Statement: If it is raining, the ground will be wet.
• Observation: The ground is wet.
• Conclusion: It is raining. Not a valid conclusion: the ground can be wet for several reasons.

Contradiction proofs
Within statistical hypothesis testing (inference theory) we are not looking for "impossible events" in order to reject posed hypotheses (e.g. it is impossible that the ground is dry if it rains, so if the ground is dry, the hypothesis "it rains" is rejected). Instead we are looking for contradictions in terms of "improbable events".

Improbable event
Assume that we suspect that the use of bicycle helmets is an effective way to protect people in bicycle accidents from skull damage.
• Null hypothesis: The percentage of persons with skull damage after a bicycle accident is the same whether or not they use bicycle helmets.
• Statement: If the percentage of persons with skull damage after a bicycle accident is the same whether or not they use bicycle helmets, then in a sample survey there should only be a small difference in the percentage of people with skull damage in the two groups.
If the hypothesis holds, observing a large percentage difference between the two groups in a sample survey is an improbable event.

Test statistic
Within statistical inference theory the statements are summarized in a test statistic. From our hypothesis and from probability theory we can derive the distribution of the test statistic if the null hypothesis is true. Next, we draw a sample, calculate the observed value of the test statistic, and compare it with the derived distribution to see whether we have an improbable event. If we get an improbable event, the null hypothesis is rejected.

P-value
The p-value describes how improbable the event is: it is the probability, calculated as if the null hypothesis were true, of getting the observed result or anything more extreme. If the p-value is small, either we have observed something improbable or the null hypothesis does not hold. By convention, if the p-value is below a chosen threshold (commonly 0.05 or 0.01), the null hypothesis is rejected.
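A minimal sketch of the test-statistic idea, applied to the bicycle helmet hypothesis. The counts below are made up purely for illustration (they do not come from any study), and the function name two_sided_p_value is hypothetical. The test statistic is the difference in the share of persons with skull damage between the two groups. Assuming the null hypothesis holds, and holding the total number of injured persons fixed, the number of injured in the helmet group follows a hypergeometric distribution, so the probability of a difference at least as large as the observed one can be summed exactly. This calculation is closely related to Fisher's exact test, which the slides do not name; it is only one possible choice of test.

```python
from math import comb


def two_sided_p_value(injured_a, n_a, injured_b, n_b):
    """P-value for the null hypothesis 'both groups have the same injury rate'.

    Test statistic: absolute difference in the share of injured persons.
    Under the null hypothesis the group labels carry no information, so with
    the total number of injured fixed, the count in group A is hypergeometric.
    """
    total_injured = injured_a + injured_b
    n = n_a + n_b
    observed = abs(injured_a / n_a - injured_b / n_b)

    lowest = max(0, total_injured - n_b)   # fewest injured possible in group A
    highest = min(n_a, total_injured)      # most injured possible in group A

    p_value = 0.0
    for k in range(lowest, highest + 1):
        # Probability that exactly k of the injured persons fall in group A.
        prob = comb(n_a, k) * comb(n_b, total_injured - k) / comb(n, total_injured)
        diff = abs(k / n_a - (total_injured - k) / n_b)
        # Count every outcome at least as extreme as the observed one.
        if diff >= observed - 1e-12:
            p_value += prob
    return p_value


# Hypothetical sample: 100 cyclists with helmet (10 with skull damage)
# and 100 cyclists without helmet (25 with skull damage).
p = two_sided_p_value(10, 100, 25, 100)
print(f"p-value: {p:.4f}")
print("reject null hypothesis" if p < 0.05 else "cannot reject null hypothesis")
```

With these made-up counts the observed difference is 15 percentage points; whether that counts as an improbable event is decided by comparing the p-value with the chosen threshold.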
Mosquito cream example
We have tested anti-mosquito creams on 10 students. Each student got cream A on a randomly chosen arm and cream B on the other arm. The students were then forced to walk in the Amazon jungle. After some hours the number of mosquito bites was counted on each arm. Suppose 7 out of the 10 students had fewer mosquito bites on the arm with cream A. Is this enough evidence to say that there is a difference in effectiveness between the creams?
Help me with the null hypothesis.

Example:
• Null hypothesis: The anti-mosquito creams A and B are equally effective.
• Alternative hypothesis: The anti-mosquito creams are not equally effective.
• Statement: If the null hypothesis holds, then we expect that about half of the people in our sample get more mosquito bites with cream A.
• Mathematical calculations give that if the null hypothesis is true, then the number of people in our sample who get more mosquito bites on the arm with cream A is binomially distributed (n = 10, p = 0.5).

If the null hypothesis is true, is 7 out of 10 an improbable event?
The probability of getting 7 or more, or 3 or fewer, is about 34%.

Conclusion
• The p-value is 34%. This means that it is not uncommon to get the data we got in our sample, or anything more extreme, if the null hypothesis is true.
• We cannot reject the null hypothesis.
• We don't have empirical support to claim that there is a difference between the mosquito creams.

Reasons for non-significant results
• There is no difference.
• There is a difference, but we have too few observations to detect it.
• Important: The fact that we can't reject the null hypothesis does not mean that the null hypothesis is true.

How to find the right test
The principle behind all statistical tests is the same. It's just about finding the right test. You can do this by looking at the chart. (Hand out the chart)
1) Find the objective of the study.
2) Identify the level of the variables.
3) Use the chart to find the right test.
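The 34% in the mosquito cream example can be checked with a few lines of code. This is a minimal sketch that assumes no ties (every student has strictly more bites on one arm) and a null hypothesis of p = 0.5; the function name two_sided_binomial_p is hypothetical.

```python
from math import comb


def two_sided_binomial_p(count, n):
    """Two-sided p-value for a binomial count when the null hypothesis is p = 0.5.

    Because the distribution is symmetric under p = 0.5, the probability of
    'count or more extreme in either direction' is twice the upper tail.
    """
    upper = max(count, n - count)   # the more extreme of the two mirror outcomes
    lower = n - upper
    if upper == lower:
        return 1.0                  # exactly half: every outcome is at least as extreme
    tail = sum(comb(n, k) for k in range(upper, n + 1))
    return 2 * tail / 2 ** n


# 7 of the 10 students had fewer bites on the arm with cream A.
p_value = two_sided_binomial_p(7, 10)
print(p_value)  # 0.34375
```

The upper tail is (120 + 45 + 10 + 1) / 1024 = 176 / 1024, and doubling it gives 352 / 1024 = 0.34375, which matches the "about 34%" stated above.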
The steps of a statistical investigation
• Aim
• Formulation of the problem (boundaries and more…)
• Planning (what to collect and how it relates to the research question…)
• Data collection (make a random sample…)
• Analysis (descriptive and inferential statistics…)
• Report

Some useful stuff if there is more time

Questionnaire
• Construct the questionnaire so that the respondents can easily
  - understand the instructions
  - understand the questions
  - answer the questions
Keep the questionnaire as short as possible!

Test the questionnaire
• Pilot study
• Try different orders of the questions.
• Test the coding of the answers.
• Does the questionnaire give the answers to the questions you have?
• Should the response alternatives be modified?
• Should the questions be reformulated?

Avoid leading questions
• Do you think it is reasonable to have stricter punishments in order to reduce crime in our society?
• Do you prefer to go to the theatre rather than a movie if you want a cultural experience?

Avoid negatively formulated questions
• Should one not allow trucks to drive through the city center?
• Better: Should one allow trucks to drive through the city center?

Avoid hypothetical questions
• Would you choose to buy a locally produced CD player if its price was 20% higher than that of a CD player produced in Japan?

One thing at a time!
• Do you consider the staff at the bank to be friendly and competent?
• Better: Do you consider the staff at the bank to be friendly? Do you consider the staff at the bank to be competent?

Fixed alternatives
• Easy to code and to work with.
• Easy to answer.
• Should be mutually exclusive.
• Should be exhaustive.

Open questions
• Can be difficult to process and to code.
• The only alternative when the possible answers are unknown or impossible to classify.
• Can be used in combination with fixed alternatives (e.g. ☐ other ______, ☐ why? ______).
• Best at the end of the questionnaire.
Think before you act
• Important issues at the planning stage (that is, before you start collecting data!):
  – What data to collect, and how the data helps you solve your problem.
  – How to analyze the data.
  – How the way of collecting data influences the analysis.
• Total investigation or survey
• Sampling design of a survey
• Choice of method and measuring tool
  – How to handle missing values
  – How to present the results

Different kinds of sampling
• Accidental (convenience) sampling
• Voluntary answers
• Voluntary subjects
• Other nonrandom sampling techniques…
• Random sampling

Random sampling
• The sample units are chosen by some random mechanism.
• The probability of inclusion is known for each unit.

Different types of random sampling
• Simple random sampling
• Stratified sampling
• Cluster sampling

Simple random sampling
• Each subject from the population is chosen randomly and entirely by chance, such that each subject has the same probability of being chosen at any stage during the sampling process.
• In simple random sampling of n subjects, all possible combinations of n subjects have the same chance of being selected.

Stratified random sampling
The population is separated into non-overlapping groups (strata), and then a simple random sample is selected from each stratum.
The reasons for using stratified sampling are:
• Stratified random sampling can increase the quantity of information for a given cost.
• Stratified sampling allows for separate estimates of population parameters within each stratum.
• Stratification may produce a smaller bound on the error of estimation than simple random sampling does. This is especially true when the strata are homogeneous.
• The cost of administration may be minimised by carefully planned stratified sampling in compact and well-defined geographical areas.

Cluster random sampling
The population is divided into groups (clusters) of subjects. A number of such clusters are randomly chosen. All individuals in the chosen clusters are selected.
The reasons for using cluster sampling are:
• A good frame (a listing of the individuals of the population) is either not available or very costly to obtain, while a listing of the clusters is easily obtained.
• The cost of obtaining observations increases as the distance separating the individuals increases.

Sources of Errors
• Sampling error (unless we make a census…)
• Nonresponse
• Frame error (over-/undercoverage)
• Measurement error
• Data processing error
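The three random sampling designs described earlier (simple, stratified, cluster) can be sketched in a few lines. This is an illustration only: the population of 60 students, the field names faculty and corridor, and the function names are all made up for the example, and each stratum gets the same sample size, which is just one of several possible allocations.

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, rng):
    """Every combination of n units has the same chance of being selected."""
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Split the population into strata, then draw a simple random sample in each."""
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, n_per_stratum))
    return sample


def cluster_sample(population, cluster_of, n_clusters, rng):
    """Randomly choose whole clusters and keep every unit in the chosen clusters."""
    clusters = defaultdict(list)
    for unit in population:
        clusters[cluster_of(unit)].append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]


# Hypothetical population: 60 students, 3 faculties (strata), 15 corridors (clusters).
faculties = ["science", "arts", "medicine"]
population = [
    {"id": i, "faculty": faculties[i % 3], "corridor": i // 4} for i in range(60)
]

rng = random.Random(1)  # fixed seed so the example is reproducible
srs = simple_random_sample(population, 10, rng)
strat = stratified_sample(population, lambda s: s["faculty"], 4, rng)
clus = cluster_sample(population, lambda s: s["corridor"], 3, rng)

print(len(srs), len(strat), len(clus))  # 10 12 12
```

Note that the cluster sample only needs a list of corridors, not a list of all students, which is the frame argument made above for cluster sampling.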