Misleading Statistics - Riverside Secondary School

Misleading Statistics
Nick Deyotte and Joshua Miki
Influences on the Collection of Data
Bias, Language, Ethics, and Cost
Bias - A statistic is biased when it is calculated in a way that makes it systematically different from the population
parameter of interest. (Ex. It would be biased if you were collecting data about the best sport and you asked more
basketball players than anyone else.)
Language - A language barrier can influence the way somebody answers a question if they don’t completely understand it
or think it means something that it doesn’t. (Ex. If you were doing a survey about the best food and somebody was
just learning English, they might answer with Pizza when they meant to answer Hot Dogs because they didn’t
completely understand the language.)
Ethics - Unethical behaviour is common in statistics and is often done to reinforce a viewpoint. It is behaviour that is
looked down upon or not thought to be morally right. (Ex. A car might be ranked high on comfort but low on safety, so
the car company could show only the comfort figures and not the safety figures to mislead customers.)
Cost - The amount of money spent on the collection of data can have an impact on the data. People might answer a
survey differently depending on how much money, time, and effort they could see was put into the survey. (Ex. If
there was a survey about a phone, people might say better things about it if the survey looked nice and they could
see a lot of money was spent on it than if it was a poorly made, low-budget survey.)
Timing, Privacy, and Cultural Sensitivity
Timing - Timing should be taken into consideration when conducting a survey. What is happening at the time can
affect people's answers. (Ex. If you asked people about gun laws right after a mass shooting, the answers would be
more biased towards strict gun laws.)
Privacy - It is important for people to know whether they are anonymous or not, because depending on the question you
are asking it could affect what they answer or whether they answer at all. (Ex. If you are asking someone their age,
they might want to keep it confidential, lie about it, or not answer at all.)
Cultural Sensitivity - If you ask people racially stereotypical questions, it could be offensive to some people. (Ex. If
you ask people about gun laws, the answers can differ between races depending on how they are treated.)
Population vs Sample
Population
Population includes all the members of a specific group that you are studying or collecting
information from. You wouldn’t collect information from every single person in the group, only
a portion.
Ex. If you were studying people between the ages of 65 and 70 all over Canada, all the
people between the ages of 65 and 70 would be the population, but you wouldn’t collect
information from all of them, only some.
Sample
A sample is only a portion of a group of people.
(Ex. If people are going to a hockey game and you ask the first 100 people who
show up, that is a “sample” of the people who show up.)
Types of Sampling Methods
Stratified Sample, Systematic Sample, and Voluntary Response Sample
Stratified Sample - Stratified sampling is a technique where the researcher divides the population into different groups, or
strata, then randomly selects the final subjects from each of the strata.
Ex. If you wanted to interview 40 high school students you could split up all the students into 4 strata by grade and interview 10 random people from
each stratum to make up your sample size of 40.
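The grade example above can be sketched in a few lines of Python. This is only an illustration: the school with 100 students per grade, the student names, and the helper `stratified_sample` are all made up for the sketch.

```python
import random

def stratified_sample(students_by_grade, per_stratum, seed=None):
    """Pick `per_stratum` random students from each grade (each stratum)."""
    rng = random.Random(seed)
    sample = []
    for grade, students in students_by_grade.items():
        # random.sample picks without replacement, so nobody is chosen twice
        sample.extend(rng.sample(students, per_stratum))
    return sample

# Hypothetical school: 4 grades with 100 students each.
students_by_grade = {
    grade: [f"grade{grade}_student{i}" for i in range(100)]
    for grade in (9, 10, 11, 12)
}

sample = stratified_sample(students_by_grade, per_stratum=10, seed=1)
print(len(sample))  # 40 (10 students from each of the 4 grades)
```

Because each grade contributes exactly 10 people, no grade can be over- or under-represented in the sample of 40.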
Systematic Sample - Systematic sampling is where sample members from a larger population are selected beginning at a
random starting point and then at fixed intervals.
Ex. If you wanted to interview 10 people from different houses on 3 streets you could start with the 2nd house and interview someone from every 3rd
house to get your sample size of 10.
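The house example can be sketched the same way. The sketch assumes there are 30 houses in total across the 3 streets (a made-up number) and uses a hypothetical helper `systematic_sample`. The starting point is fixed at the 2nd house to match the example, although in practice it would be chosen at random.

```python
def systematic_sample(items, start, step, size):
    """Take `size` items beginning at index `start`, then every `step`-th item."""
    picked = items[start::step][:size]
    if len(picked) < size:
        raise ValueError("not enough items for the requested sample size")
    return picked

# Hypothetical: houses numbered 1 to 30 across the 3 streets.
houses = list(range(1, 31))

# Index 1 is the 2nd house; a step of 3 means every 3rd house after that.
sample = systematic_sample(houses, start=1, step=3, size=10)
print(sample)  # [2, 5, 8, 11, 14, 17, 20, 23, 26, 29]
```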
Voluntary Response Sample - Voluntary response sampling is a sample made up of volunteers.
Ex. If you wanted to interview 5 people on a radio show you could interview the first 5 people to call.
Convenience Sample and Random Sample
Convenience Sample - A method of non-probability sampling. It is made up of people who are easy to
reach. (Ex. Someone holds a poll at a local mall because it’s easier for them.)
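Random Sample - In a random sample, every member of the population has an equal chance of being chosen. A random
sample is best when asking lots of people. (Ex. If you ask people from around the world how much TV they watch a
day, the sample will be more random if you ask in many very different communities.)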
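As a rough sketch of a simple random sample, where every member of the population has the same chance of being picked, Python's `random.sample` can be used. The population of 1000 numbered people and the sample size of 50 are assumptions made only for this illustration.

```python
import random

# Hypothetical population: 1000 people identified by the numbers 0..999.
population = list(range(1000))

# Seeded generator so the sketch is repeatable; drop the seed for a new draw.
rng = random.Random(42)

# Every person has the same chance of being picked, and nobody is picked twice.
random_sample = rng.sample(population, 50)
print(len(random_sample))  # 50
```

A convenience sample, by contrast, would be something like `population[:50]`: the 50 people who happen to be easiest to reach.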
Theoretical vs Experimental Probability
Theoretical Probability
Theoretical probability is finding the probability of events that come from a sample space of known equally
likely outcomes.
Ex. You have a bag of 100 marbles. 50 of the marbles are red and the other 50 are yellow. If you pull out
one marble from the bag, the theoretical probability of grabbing a red marble is 50/100 or 1/2.
This doesn’t mean that if you grabbed 50 marbles from the bag you would always get 25 of each color, even though
that is the theoretically expected result. You could grab 30 red and 20 yellow and the theoretical probability would
still be 1/2.
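The marble example can be checked with a short sketch. The helper name `theoretical_probability` is made up here, and the random draw only illustrates that an actual handful of 50 marbles does not have to split 25 and 25.

```python
from fractions import Fraction
import random

def theoretical_probability(favourable, total):
    """Favourable outcomes divided by all equally likely outcomes."""
    return Fraction(favourable, total)

p_red = theoretical_probability(50, 100)
print(p_red)  # 1/2

# Grab 50 of the 100 marbles at random and count the colours.
bag = ["red"] * 50 + ["yellow"] * 50
rng = random.Random(7)
draw = rng.sample(bag, 50)
reds = draw.count("red")
print(reds, 50 - reds)  # the split depends on the draw; it need not be 25 and 25
```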
Experimental Probability
Experimental probability is the probability found by conducting an experiment and recording
the results: the number of times an event happens divided by the number of trials.
(Ex. If you flip a coin 50 times the experimental outcome can be different from the
theoretical outcome.)
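The coin example can be simulated as a quick sketch. It assumes a fair coin, and the helper `experimental_probability` is made up for the illustration.

```python
import random

def experimental_probability(trials, seed=None):
    """Flip a simulated fair coin `trials` times and return heads / trials."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

# 50 flips: the result is often close to, but not exactly, the theoretical 0.5.
print(experimental_probability(50, seed=3))

# Many more flips: the experimental value tends to move closer to 0.5.
print(experimental_probability(100_000, seed=3))
```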
Misleading Statistics in the Media
This is a column chart that is very misleading. In
a column chart the y-axis should always start at
0, but it doesn't in this one. It makes it look like
since 2008-2009 the percentage of graduating
students has tripled when it only went up 7%.
This is what the chart would look like if the y-axis
started at 0. As you can see, the difference
between 2008-2009 and now looks a lot smaller.
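A short sketch shows how much a cut-off y-axis can exaggerate a difference. The graduation rates (75 and 82) and the axis start (71.5) are made-up numbers chosen to copy the "tripled" effect; they are not read from the chart, and `apparent_ratio` is a hypothetical helper.

```python
def apparent_ratio(old, new, axis_start=0.0):
    """How many times taller the new bar is drawn when the y-axis begins at `axis_start`."""
    return (new - axis_start) / (old - axis_start)

# Hypothetical graduation rates in percent, 7 points apart.
old_rate, new_rate = 75.0, 82.0

# y-axis starting at 71.5: the new bar is drawn 3 times as tall as the old one.
print(apparent_ratio(old_rate, new_rate, axis_start=71.5))  # 3.0

# y-axis starting at 0: the new bar is only about 1.09 times as tall.
print(round(apparent_ratio(old_rate, new_rate), 3))  # 1.093
```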
This graph is misleading because it doesn’t start
at zero, making the differences look bigger. It
looks like gas prices have tripled in the last year.
This graph is very misleading because it doesn’t start at zero. It starts at 79000 and makes the difference look big.
This is actually what the graph would look like if graphed properly. It makes the difference look really small now.