Survey
STATISTICS BY NICK TANG AND NICK YU

HOW BIAS, USE OF LANGUAGE, ETHICS, COST, TIME AND TIMING, PRIVACY, AND CULTURAL SENSITIVITY MAY INFLUENCE THE RESULTS
Bias, use of language, ethics, cost, time and timing, privacy, and cultural sensitivity may influence both how data is collected and how it is represented. Results can be manipulated at the collection stage, because the people collecting the data control the collection method. They can also be manipulated at the presentation stage, because the way the data is displayed may not match the underlying numbers.

BIAS
Bias influences the collection of data because the wording of a question can be manipulated to make one side more or less appealing. This misleads buyers, users, and customers, and makes them feel obligated to choose one side.
Example: Over 90% of Canadians today choose Tide over the leading detergent.
This is biased because it leads customers towards picking Tide by claiming that the majority of Canadians use Tide.

USE OF LANGUAGE
Use of language influences the collection of data by leaving out certain information and replacing it with a biased word.
Example: Most customers pick Campbell’s Chicken Noodle Soup over President’s Choice Chicken Noodle Soup.
This is misleading because the word "most" can make buyers believe that a large majority (say, over 80%) of customers choose it, when in reality only 54% of customers choose it.

ETHICS
Ethics influence the collection of data through the way a question is asked. Bad ethics means asking a question in an inappropriate way, or crossing a line so that the customer feels uncomfortable. Good ethics means letting customers make their own decisions and speak on their own terms.
Example: A salesperson calls you and you say you are not interested. The salesperson calls back later that day and keeps calling throughout the week.
This is unethical because the salesperson crosses the line by repeatedly calling after being told that you were not interested.

COST
Cost influences the collection of data because it may cost a lot of money to survey a large number of people. Because of this, a smaller share of the population is asked, and this affects the data collected. A small group of people may share similar opinions that do not reflect the rest of the population.
Example: McDonald's asks its customers which new smoothie is their favourite. It asks 10 customers instead of 100 to cut the cost of the questionnaires needed. 8 of 10 people like the blueberry one, and 2 of 10 people like the strawberry one. The results conclude that 80% of people asked like the new blueberry one.
This is misleading because customers don’t know how many people were polled, so they assume it was a much larger number than 10. This tricks customers into thinking that the results are reliable, when in reality very few people were asked and their opinions may not reflect everyone else's.

TIME AND TIMING
Time and timing influence the collection of data because the time at which a question is asked affects what you are more likely to choose. Your answer can also vary with the month or the time of year in which the question is asked.
Example: Asking someone what their favourite drink from Starbucks is.
This is misleading because the answer depends on the month: in winter they will most likely choose a warm drink instead of a cold one.

PRIVACY
Privacy influences the collection of data because the responses stay private, so anyone could have given the answers. This can make the data less reliable, because other people who view it cannot tell how it was obtained.
Example: Confidential surveys where others don't know who took the survey or when the data was collected.
This is misleading because people would not know how the data was obtained.

CULTURAL SENSITIVITY
Cultural sensitivity influences the collection of data because people from a certain culture may answer a question differently than people who are not in that culture.
Example: Asking people from a culture that does not eat ham which brand of ham they would choose.
This is misleading because people who do not eat ham will not prefer one brand of ham over another.

THE DIFFERENCE BETWEEN A POPULATION AND A SAMPLE
The difference between a population and a sample is how many people are surveyed. A population means everyone in the group of interest is surveyed the same way, and the results are worked out from the data collected. A sample means only a portion of the population is surveyed the same way, and the results are worked out from the data collected.
EXAMPLE
Population – all of Canada would be surveyed
Sample – a province of Canada would be surveyed

DIFFERENT TYPES OF SAMPLING METHODS

CONVENIENCE SAMPLE
• A convenience sample is one of the main types of non-probability sampling methods. A convenience sample is made up of people who are easy to reach.
• Example: A pollster interviews shoppers at a local mall. If the mall was chosen because it was a convenient site from which to survey participants and/or because it was close to the interviewer’s home or business, then this would be a convenience sample.
• Convenient for the interviewer

RANDOM SAMPLE
• Random sampling is a procedure for sampling from a population in which the selection of a sample unit is based on chance and every element in the population has a “known, non-zero” probability of being selected.
• Random sampling helps produce representative samples by eliminating voluntary response bias and guarding against undercoverage bias.
All good sampling methods rely on random sampling.

STRATIFIED SAMPLE
• “Stratified” sampling refers to a type of sampling method. With stratified sampling, the researcher divides the population into several groups, called strata. Then a simple random sample is drawn from each group.
• Using stratified sampling, it may be possible to reduce the sample size required to achieve a given precision. Or it may be possible to increase the precision with the same sample size.

SYSTEMATIC SAMPLE
• With systematic random sampling, we create a list of every member of the population. From the list, we randomly select the first sample element from the first k elements on the population list. Afterwards, we select every kth element on the population list.
• This is different from simple random sampling, since not every possible sample of n elements is equally likely.

VOLUNTARY RESPONSE SAMPLE
• A voluntary sample is one of the main types of non-probability sampling methods. A voluntary sample is made up of people who self-select into the survey. Often, these people have a strong interest in the main topic of the survey.
• Example: A news show asks viewers to participate in an online poll. This would be a voluntary sample. The sample is chosen by the viewers, not by the survey administrator.

DIFFERENCE BETWEEN THEORETICAL AND EXPERIMENTAL PROBABILITY
Theoretical probability is the probability that is calculated using math formulas. It is the probability based on math theory.
Experimental probability is calculated when the actual situation or problem is performed as an experiment. In this case, you perform the experiment and use the actual results to determine the probability.
Example: The chance of flipping heads or tails on a coin is 50/50. That is theoretical probability. But when you do the tests (experimental probability), you will often not get exactly the same number of heads as tails.
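The coin example can be tried out with a short simulation. This is only a rough sketch: the function name and the numbers of flips are made up for illustration, and a computer's random number generator stands in for a real coin.

```python
import random

THEORETICAL_HEADS = 0.5  # theoretical probability of heads for a fair coin


def experimental_probability_of_heads(num_flips, seed=None):
    """Simulate num_flips fair coin flips and return the share that were heads.

    The name and the seed parameter are illustrative assumptions,
    not something taken from the slides.
    """
    rng = random.Random(seed)
    # Treat a random number below 0.5 as "heads".
    heads = sum(1 for _ in range(num_flips) if rng.random() < 0.5)
    return heads / num_flips


if __name__ == "__main__":
    for n in (10, 100, 10_000):
        p = experimental_probability_of_heads(n)
        print(f"{n} flips: experimental = {p:.3f}, theoretical = {THEORETICAL_HEADS}")
```

Each run gives slightly different experimental values. With only a few flips the result can be far from 0.5, and with many flips it tends to get closer to the theoretical value.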
3 EXAMPLES OF MISLEADING STATISTICS

EXAMPLE 1
Vitamin water is very misleading because the word "vitamin" in its name leads consumers to believe it is healthy. But our bodies only need so much of each vitamin, and the rest gets flushed out. Along with the vitamins, we also consume a large amount of sugar, which is not healthy for our bodies.

EXAMPLE 2
• One place where you can find misleading statistics is the news. They hope you won't notice and often just slip in a graph with exaggerated data, or data that is sometimes not accurate at all. There was a chart in a documentary that showed lines on a graph going up, but there weren’t any numbers on the graph at all.

EXAMPLE 3
• One classic example involves false positives while testing for rare events.
• Suppose there is a test for tuberculosis and it’s given to every schoolkid in the US. Then it’s found that 99% of the positive results were false positives; the kid was fine but the test said they were sick.
• Many people would interpret this to mean that the test has low accuracy. Not so: its accuracy is still pretty high, but there are simply very few kids who actually do have tuberculosis, so the number of true positive results is very small compared with the number of false positives.
• For example, if the test is given to 10 million kids, it returns the correct result 99.9% of the time (regardless of the kid's health), and 100 kids in the country have tuberculosis, the test will give about 100 correct positive results (100, 99 or 98) and about 10,000 incorrect positive results. About 99% of positive results will be wrong, even though the test was actually very accurate.
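The arithmetic behind Example 3 can be checked with a few lines of code. This is a sketch using the numbers from the example. The function name is made up for illustration, and it assumes the test is equally accurate for sick and healthy kids, as the example states.

```python
def false_positive_share(population, sick, accuracy):
    """Return (true positives, false positives, share of positives that are false).

    Assumes the same accuracy applies to sick and healthy people.
    """
    healthy = population - sick
    true_positives = sick * accuracy            # sick kids correctly flagged
    false_positives = healthy * (1 - accuracy)  # healthy kids wrongly flagged
    share_false = false_positives / (true_positives + false_positives)
    return true_positives, false_positives, share_false


# Numbers from Example 3: 10 million kids, 100 sick, 99.9% accurate test.
tp, fp, share = false_positive_share(10_000_000, 100, 0.999)
print(f"true positives: about {tp:.0f}")
print(f"false positives: about {fp:.0f}")
print(f"share of positive results that are wrong: {share:.1%}")
```

Because healthy kids outnumber sick kids so heavily, even a 0.1% error rate among them produces about 10,000 false positives, which swamps the roughly 100 true positives.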