Misleading Statistics
Nick Deyotte and Joshua Miki

Influences on the Collection of Data

Bias, Language, Ethics, and Cost

Bias - A statistic is biased when the way it is collected or calculated makes it systematically different from the population parameter of interest. (Ex. It would be biased if you were collecting data about the best sport and you asked more basketball players than anyone else.)
Language - A language barrier can influence the way somebody answers a question if they don't completely understand it or think it means something that it doesn't. (Ex. If you were doing a survey about the best food and somebody was just learning English, they might answer Pizza when they meant to answer Hot Dogs because they didn't completely understand the question.)
Ethics - Unethical behaviour in statistics is often used to reinforce a viewpoint. It is behaviour that is looked down upon or not considered morally right. (Ex. A car might rank high on comfort but low on safety, so the car company could show only the comfort figures and leave out the safety figures to mislead customers.)
Cost - The amount of money spent on collecting data can have an impact on the data. A small budget limits how many people can be surveyed, and people might answer a survey differently depending on how much money, time, and effort they can see was put into it. (Ex. In a survey about a phone, people might say better things about the phone if the survey looked polished and clearly had a lot of money spent on it than if it was a poorly made, low-budget survey.)

Timing, Privacy, and Cultural Sensitivity

Timing - Timing should be taken into consideration when conducting a survey. What is happening at the time can affect people's answers. (Ex. If you asked people about gun laws right after a mass shooting, the answers would be more biased towards strict gun laws.)
Privacy - It is important for people to know whether their answers are anonymous, because depending on the question it could affect what they answer or whether they answer at all. (Ex. If you ask someone their age, they might want to keep it confidential, so they might lie about it or not answer at all.)
Cultural Sensitivity - If you ask questions that rely on racial stereotypes, they could be offensive to some people. (Ex. If you ask people about gun laws, answers can differ between races depending on how people have been treated.)

Population vs Sample

Population - The population includes all the members of the specific group that you are studying. Usually you would not collect information from every single person in the group, only from a portion. (Ex. If you were studying people between the ages of 65 and 70 all over Canada, all the people between the ages of 65 and 70 in Canada would be the population, but you would collect information from only some of them.)
Sample - A sample is a portion of the population. (Ex. If people are going to a hockey game and you ask the first 100 people who show up, those 100 people are a sample of everyone who shows up.)

Types of Sampling Methods

Stratified Sample, Systematic Sample, and Voluntary Response Sample

Stratified Sample - Stratified sampling is a technique where the researcher divides the population into different groups, or strata, and then randomly selects the final subjects from each stratum. (Ex. If you wanted to interview 40 high school students, you could split all the students into 4 strata by grade and interview 10 random people from each stratum to make up your sample size of 40.)
Systematic Sample - Systematic sampling is where sample members are selected from a larger population by picking a random starting point and then choosing at fixed intervals. (Ex. If you wanted to interview 10 people from different houses on 3 streets, you could start with the 2nd house and interview someone from every 3rd house after that until you reach your sample size of 10.)
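The stratified and systematic examples above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the school of 4 grades with 50 students each, the 30 numbered houses, and the helper names stratified_sample and systematic_sample are all made up for the sketch and do not come from the slides.

```python
import random

def stratified_sample(strata, per_stratum, rng):
    """Pick `per_stratum` members at random from every stratum."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

def systematic_sample(items, start, step, size):
    """Begin at index `start`, take every `step`-th item, stop after `size` items."""
    return items[start::step][:size]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical school: grades 9 to 12 with 50 students each.
strata = {
    grade: [f"grade{grade}-student{i}" for i in range(1, 51)]
    for grade in (9, 10, 11, 12)
}
students = stratified_sample(strata, per_stratum=10, rng=rng)

# Hypothetical listing of 30 houses, numbered 1 to 30.
houses = list(range(1, 31))
# Index 1 is the 2nd house; step 3 means every 3rd house after it.
chosen_houses = systematic_sample(houses, start=1, step=3, size=10)

print(len(students))   # 40 students, 10 from each grade
print(chosen_houses)   # [2, 5, 8, 11, 14, 17, 20, 23, 26, 29]
```

In the sketch the starting house is fixed at the 2nd house to match the example; in a real systematic sample the starting point would itself be chosen at random.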
Voluntary Response Sample - A voluntary response sample is made up of volunteers. (Ex. If you wanted to interview 5 people on a radio show, you could interview the first 5 people to call in.)

Convenience Sample and Random Sample

Convenience Sample - A method of non-probability sampling. The sample is made up of people who are easy to reach. (Ex. Someone holds a poll at a local mall because it is easier for them.)
Random Sample - In a random sample, every member of the population has an equal chance of being chosen. It works best when asking lots of people. (Ex. If you ask people from around the world how much TV they watch a day, choosing people at random from very different communities gives a sample that better represents everyone.)

Theoretical vs Experimental Probability

Theoretical Probability - Theoretical probability is the probability of an event calculated from a sample space of known, equally likely outcomes. (Ex. You have a bag of 100 marbles. 50 of the marbles are red and the other 50 are yellow. If you pull out one marble from the bag, the theoretical probability of grabbing a red marble is 50/100 or 1/2. This doesn't mean that if you grabbed 50 marbles from the bag you would always get 25 of each colour, only that you would expect about 25 of each. You could grab 30 red and 20 yellow and the theoretical probability would still be 1/2.)
Experimental Probability - Experimental probability is found by conducting an experiment and recording the results: the number of times an event happens divided by the number of trials. (Ex. If you flip a coin 50 times, the experimental outcome can be different from the theoretical outcome.)

Misleading Statistics in the Media

This is a column chart that is very misleading. In a column chart the y-axis should always start at 0, but it doesn't in this one. It makes it look like the percentage of graduating students has tripled since 2008-2009 when it only went up 7%.

This is what the chart would look like if the y-axis started at 0. As you can see, the difference between 2008-2009 and now looks a lot smaller.

This graph is misleading because it doesn't start at zero, which makes the differences look bigger.

It looks like gas prices have tripled in the last year. This graph is very misleading because its y-axis doesn't start at zero. It starts at 79000, which makes the difference look big.

This is what the graph would look like if graphed properly. The difference now looks really small.
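
The effect of a y-axis that doesn't start at zero can be checked with a little arithmetic. The sketch below is only an illustration: the graduation rates of 75% and 82% and the axis start of 71.5 are hypothetical numbers chosen to give a 7-point rise that looks tripled, since the slides do not give the real values. The function names are also made up for the example.

```python
def drawn_height(value, axis_start):
    """Height of a bar as it is drawn when the y-axis begins at `axis_start`."""
    return value - axis_start

def visual_ratio(old, new, axis_start):
    """How many times taller the new bar looks compared with the old bar."""
    return drawn_height(new, axis_start) / drawn_height(old, axis_start)

# Hypothetical graduation rates in percent (not taken from the slides).
old_rate, new_rate = 75.0, 82.0

# Hypothetical truncated axis starting at 71.5 instead of 0.
truncated = visual_ratio(old_rate, new_rate, axis_start=71.5)
honest = visual_ratio(old_rate, new_rate, axis_start=0.0)

print(truncated)         # 3.0  -> the new bar looks three times as tall
print(round(honest, 2))  # 1.09 -> the real ratio is only about 1.09
```

With the axis starting at zero the two bars are nearly the same height, which matches what the corrected charts show.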