Warm-up
In the United States, heart disease kills roughly one-and-a-quarter times as many people as cancer. If you look at the death rate per 100,000 residents by state, the distributions for the two diseases are roughly normal, provided you leave out Alaska and Utah, which are outliers because of their unusually young populations. The means and standard deviations for all 50 states are given below:

                 Mean   SD
Heart disease    238    52
Cancer           196    31

Alaska has 88 deaths per 100,000 residents from heart disease and 111 deaths per 100,000 from cancer. Explain which death rate is more extreme compared to other states.

Solution
$$z_{\text{heart}} = \frac{88 - 238}{52} \approx -2.88 \qquad z_{\text{cancer}} = \frac{111 - 196}{31} \approx -2.74$$
Alaska's death rate for heart disease is 2.88 standard deviations below the mean. Its death rate for cancer is 2.74 standard deviations below the mean. Both rates are extreme, but the death rate for heart disease is the more extreme of the two.

Daniel S. Yates
The Practice of Statistics, Third Edition
Chapter 5: Producing Data
5.1 Designing Samples
Copyright © 2008 by W. H. Freeman & Company

Essential Questions
• What is the difference between an observational study and an experiment?
• What is a simple random sample (SRS)?
• What are the different types of sampling methods?
• What is bias? What are the sources of bias?

Two Basic Ways to Collect Data
• Observational Study
• Experiment

Observational vs. Experiment
• An observational study observes individuals and measures variables of interest but does not attempt to influence the responses.
• An experiment, on the other hand, deliberately imposes some treatment on individuals in order to observe their responses.

Observational and experimental studies …
• An observational study is one in which measurements representing a variable of interest are observed and recorded without controlling any factor that might influence their values.
• An experimental study is one in which measurements representing a variable of interest are observed and recorded while controlling factors that might influence their values.

Sampling vs. a Census
• Sampling involves studying a part in order to gain information about the whole.
• A census attempts to contact every individual in the entire population.

Sample vs. Population
What are some reasons that a sample would be used instead of the whole population?
1. Cost
2. Access
3. Time
4. A census is not necessary if the sample truly represents the population

Statistical Inference
• The purpose of collecting data on a sample is to answer some question of interest and make an inference about the population,
• or to conduct an experiment to confirm or support a cause-and-effect relationship when two variables, explanatory and response, are related.

Parameter: a descriptive measure of a population.
Statistic: a descriptive measure of a sample.
Because populations tend to be very large, most population parameters are not only unknown but also unknowable. We can use statistical inference to obtain an estimate only if we are willing to accept less than 100% accuracy. Instead of investigating the entire population, we choose to study a sample.

How to capture a “Sample”
• Getting a portion of the population is not difficult.
• Getting a good sample is difficult.
• Creating a plan to do this is called “sample design”.

How not to sample – Voluntary Response Sample
• A voluntary response sample consists of people who choose themselves by responding to a general appeal (example: call-in opinion polls).
• The problem with call-in opinion polls is that the people who answer them tend to have strong opinions, especially strong negative opinions.
• This sample is biased; it is not representative of the population.

How not to sample – Convenience Sample
• Choosing individuals who are easiest to reach is called convenience sampling (for example, mall intercept interviews).
• Convenience sampling may not give you access to all the people in the population.
• Interviewers often avoid people who may make them feel uncomfortable.
• This sample is biased; it is not representative of the population.

Definition of Bias
The remedy for bias in choosing a sample is to allow chance to do the selecting.

How to sample
• The best way to sample is to use a “simple random sample”.
• A simple random sample (SRS) of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected.

How to create an SRS
• Steps for choosing an SRS:
– Step 1: Label. Assign a numerical label to every individual in the population.
– Step 2: Make random selections of labels, using one of:
  • Random number table (Table B)
  • Random number generator (RandInt on the TI-83/84)
  • Computer software
– Step 3: Stopping rule. The criteria used to stop sampling.
– Step 4: Identify sample. Use the labels to identify the subjects selected to be in the sample.

Using a Table of Random Digits
Look at Problem 5.10, page 341.

Problem 5.10
131 | 05007 16632 81194 14873 04197 855776
05 Beach Castle
19 Sea Castle
20 Banyan Tree

Probability Sample
A sample chosen by a method that uses chance is a probability sample.
• Some probability sampling methods give every member an equal chance of selection (SRS). This may not be true for more elaborate sampling methods.
• However, the use of chance to select the sample is the essential principle in statistical sampling.

More Complex Sampling Methods
• Methods for sampling from large populations over wide areas may be more complex than an SRS.
• Common examples are:
– Stratified Random Sample
– Cluster Sampling
– Multi-Stage Sampling

Stratified Random Sample
Some Reasons to Do a Stratified Random Sample
• It assures that you will be able to represent not only the overall population, but also key subgroups of the population.
  If you want to be able to talk about subgroups, this method gives you the ability to do so.
• Stratified random sampling will generally have more statistical precision than simple random sampling. This will only be true if the strata or groups are homogeneous.

Sampling Methods/Designs
Stratified Random Sample: sample important groups within the population separately and then combine the groups.
Steps:
a. Divide the population into groups of similar individuals, called strata (gender, age, political party, weight).
b. Choose a separate SRS in each stratum.
c. Combine to form the full sample.
Choose the strata based on facts known BEFORE the sample is taken.

Stratified Random Sample
For example, let's say that the population of clients for our agency can be divided into three groups: Caucasian, African-American, and Hispanic-American. Furthermore, let's assume that both the African-Americans and Hispanic-Americans are relatively small minorities of the clientele (10% and 5% respectively).

Stratified Random Sample Example
A school official wants to estimate the average number of hours per week that students devote to homework. Because she believes that this figure will differ considerably among classes, a stratified random sample will be employed. The population of students at this school will be grouped into four strata consisting of all freshmen, sophomores, juniors, and seniors. From each stratum, a random sample of students will then be selected. The resulting information can be combined to obtain an estimate that is expected to be more precise than one obtained from a random sample of the entire population.

Cluster Sampling
The main difference between stratified random sampling and cluster sampling is that once you randomly select the clusters, all members of each selected cluster are part of the sample. In stratified random sampling, you take an SRS within every stratum.
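The contrast between the two designs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 400 students, the ten school names, the sample sizes, and the helper names `stratified_sample` and `cluster_sample` are all hypothetical and do not come from the textbook.

```python
import random
from collections import defaultdict

def group_by(population, key):
    """Split the population into groups (strata or clusters) using a key function."""
    groups = defaultdict(list)
    for individual in population:
        groups[key(individual)].append(individual)
    return groups

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Take a separate SRS of n_per_stratum from every stratum, then combine."""
    strata = group_by(population, stratum_of)
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], n_per_stratum))
    return sample

def cluster_sample(population, cluster_of, n_clusters, rng):
    """Randomly choose whole clusters; every member of a chosen cluster is sampled."""
    clusters = group_by(population, cluster_of)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [individual for c in chosen for individual in clusters[c]]

# Hypothetical population: 10 schools x 4 grades x 10 students = 400 students.
grades = ["freshman", "sophomore", "junior", "senior"]
population = [
    {"id": f"S{school:02d}-{grade}-{k}", "school": f"School {school:02d}", "grade": grade}
    for school in range(1, 11)
    for grade in grades
    for k in range(10)
]

rng = random.Random(0)  # fixed seed so the illustration is reproducible

# Stratified: an SRS of 5 students from each grade (the strata).
strat = stratified_sample(population, lambda s: s["grade"], 5, rng)

# Cluster: 3 schools chosen at random, all of their students included.
clust = cluster_sample(population, lambda s: s["school"], 3, rng)

print(len(strat), "students in the stratified sample")
print(len(clust), "students in the cluster sample")
```

Note that the stratified sample always contains every grade, while the cluster sample contains only the schools that happened to be drawn.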
Cluster Sampling Example
Suppose an organization wishes to find out which sports seniors across PA are participating in. It would be too costly and take too long to survey every student, or even some students from every school. Instead, 100 schools are randomly selected from all over PA. These schools are considered to be clusters. Then, every senior student in these 100 schools is surveyed. In effect, students in the sample of 100 schools represent all seniors in PA.

Multistage Sampling Design
• Randomly choose stage 1 strata (for example, states)
• Randomly choose stage 2 strata (for example, cities within states)
• and so on until you get down to the sample size.

Analysis of Complex Sampling Designs
• Analysis of data from sampling designs more complex than an SRS is beyond the scope of this course.
• However, the SRS is the building block of the more complex designs.
• The fundamental concepts are the same.

Cautions About Sample Surveys
• Response Bias – The behavior of the respondent or of the interviewer can influence a response. For example, a respondent may lie about illegal or embarrassing behavior.
• Poorly Worded Questions – Confusing or leading questions can introduce strong bias.

Inferences About The Population
• Using chance to choose a sample eliminates bias in the selection of the sample from the available individuals.
• The results from a sample are unlikely to exactly match the entire population.
• We can improve our accuracy by using large random samples.

Review Questions

A business school researcher wants to know what factors affect the survival and success of small businesses. She selects a sample of 150 restaurants from those listed in the Yellow Pages. The population is…
1. Successful restaurants
2. 150 restaurants she chose
3. All restaurants in the city
4. All small businesses

An SRS is
1. Stratified Random Sample
2. Simple Random Sample
3. Statistically Real Survey
4. Single Radon Stocker

In an SRS, the sample
1. Is divided into groups
2. Is selected randomly
3. Is biased
4. Is on a voluntary basis

The design of a study is biased if…
1. An SRS was used
2. It systematically favors certain outcomes
3. Population is divided into strata
4. Not all individuals are surveyed

A student wants to know the opinions of the teachers in his school about final exams, so he asks his current teachers.
1. SRS
2. Stratified Random
3. Convenience
4. Voluntary

A group of high school students is first divided into groups by elementary school attended and then divided by grade. This is an example of…
1. SRS
2. Stratified Random Sample
3. Convenience Sample
4. Multistage Sample

A principal wants to know the attitudes of the students towards final exams, so she divides the students by grade and then randomly selects 20 students from each grade.
1. SRS
2. Stratified Random
3. Convenience
4. Voluntary

A textbook publisher wants to know the attitudes of teachers in the state towards final exams, so a survey is sent to the members of the teachers’ union from his hometown.
1. SRS
2. Stratified Random
3. Convenience
4. Voluntary

A state education board member wants to know the attitudes of teachers toward final exams, so she interviews teachers at the state teachers’ convention.
1. SRS
2. Stratified Random
3. Convenience
4. Voluntary

A local church is trying to determine the most popular hymn. They select a random sample from the traditional service (as opposed to the contemporary service) at 8 am. Which bias is present?
1. Response Bias
2. Nonresponse Bias
3. Undercoverage
4. Leading question

If a sample is an SRS, then the results will be free of bias.
1. True
2. False

Which of the following is not a probability sample?
1. Voluntary
2. Convenience
3. SRS
4. Stratified Random Sample
5. Both 1 and 2
6. Both 3 and 4

If a survey requires that the respondents call in with their opinion and the interviewer randomly selects which callers to include, then it is an SRS.
1. True
2. False
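
As a companion to the review questions, the four steps for choosing an SRS given earlier in this section (label, select at random, stop, identify) can be sketched with computer software. This is a minimal sketch, not the textbook's procedure: the function name `choose_srs`, the seed, and the population of 28 placeholder names are assumptions made for illustration.

```python
import random

def choose_srs(population, n, seed=None):
    """Choose a simple random sample of size n following the four steps."""
    rng = random.Random(seed)

    # Step 1: Label. Give every individual a numerical label 1, 2, ..., N.
    labels = {i: individual for i, individual in enumerate(population, start=1)}

    # Steps 2 and 3: Select labels at random without repeats, stopping once
    # n distinct labels are drawn. random.sample gives every set of n labels
    # the same chance of being chosen.
    chosen_labels = rng.sample(sorted(labels), n)

    # Step 4: Identify. Translate the chosen labels back into individuals.
    return [labels[i] for i in sorted(chosen_labels)]

# Hypothetical population of 28 individuals with placeholder names.
population = [f"Individual {i:02d}" for i in range(1, 29)]

srs = choose_srs(population, n=3, seed=131)
print(srs)
```

Because chance alone decides which labels are drawn, no individual is favored by the person doing the selecting.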