Normal Distribution
Statistics is used to organize data, which is important if we want to analyze and draw
general conclusions or make predictions about the data. For example, imagine we have
a collection of crayons and want to know how many crayons of a certain colour exist in
the crayon box. After collecting our information (that is, the number of crayons for each
colour or the frequency of appearance for each colour) we can organize and illustrate
our data on a graph, such as a bar graph or pie chart.
[Figures: bar graph and pie chart, both titled "Crayons in box", showing the number of red, blue, purple, and green crayons; the bar graph's vertical axis runs from 0 to 80]
The graphs can be helpful in deciding if we need to buy more crayons of a particular
colour or if we were to randomly pick up a crayon from the box, what colours would be
more likely picked.
There are 60 blue crayons out of a total of 100 crayons in the box. That means 60% of
the crayons are blue. If we were to pick out a random sample of 10 crayons from the
box, we could get any number of blue crayons ranging from 0 to 10. Since 60% of the
crayons are blue, we would expect to get 6 blue crayons in each sample, but the
exact number of blue crayons per trial will vary. However, the average number of blue
crayons obtained over a large number of trials should be close to the expected value of
6, and this average tends to get closer to 6 as more trials are performed. Let’s take
a look at this example graphically. The following graph shows the number of blue
crayons per sample over 25 trials:
[Figure: "Number of blue crayons per sample", plotted for each of the 25 trials; vertical axis from 0 to 10]
Tutoring and Learning Centre, George Brown College 2014
www.georgebrown.ca/tlc
If we now organize the data so that we are illustrating the number of trials that had a
specific number of blue crayons (or the frequency of when we get a particular number
of blue crayons) we get the following graph:
[Figure: "Frequency of particular # of blue crayons"; horizontal axis: number of blue crayons per sample (1 to 10); vertical axis: number of trials (0 to 7)]
In this case, over 25 trials, the average number of blue crayons per trial is 5.3. If we
keep performing more trials, we will find that the average gets closer to our
expected value of 6.
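The sampling experiment described above can be tried out in code. The following is a minimal sketch, assuming the box holds 60 blue crayons and 40 crayons of other colours and that each trial draws 10 crayons without replacement. The function names, the seed, and the trial counts are illustrative choices and are not part of the original handout.

```python
import random
from collections import Counter

# Assumed box contents: 60 blue crayons and 40 crayons of other colours.
BOX = ["blue"] * 60 + ["other"] * 40

def blue_counts(num_trials, sample_size=10, seed=0):
    """Return the number of blue crayons drawn in each trial."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = []
    for _ in range(num_trials):
        sample = rng.sample(BOX, sample_size)  # draw without replacement
        counts.append(sample.count("blue"))
    return counts

def average(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

few = blue_counts(25)       # a small experiment, like the one in the text
many = blue_counts(10_000)  # a much larger experiment

# Frequency of each possible number of blue crayons over the 25 trials.
frequency = Counter(few)
for blues in range(11):
    print(blues, "blue:", frequency[blues], "trial(s)")

print("Average over 25 trials:", average(few))
print("Average over 10,000 trials:", round(average(many), 2))
```

The average over 25 trials can land noticeably above or below 6, while the average over 10,000 trials should settle close to 6. The exact numbers depend on the seed.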
If you compare the appearances of the above graphs, you’ll notice that data can be
distributed in a few different ways. Sometimes a peak may be more skewed to the left
or to the right of a graph, or it may have a random distribution. We often find that a data
set will follow a particular type of distribution where the peak is concentrated mostly
around a central value.
The above graph illustrates what is called a normal distribution of data. The data are
symmetric about a central value, so 50% of the data points in the set lie on each side of it. The central
value in a normal distribution is the value that occurs most often in the data set (i.e. the
mode). It is also the average value in the data set (i.e. the mean). Furthermore, if you
were to rank all the data values in the set in ascending order, the central value would
also be the value that is in the middle of the ordered set (i.e. the median).
For a normal distribution, the mean = mode = median
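As a quick check of this property, the sketch below computes the three measures for a small data set using Python's standard `statistics` module. The data set is hypothetical and was chosen to be perfectly symmetric; real data will usually match only approximately.

```python
import statistics

# Hypothetical, perfectly symmetric data set (not taken from the handout).
data = [4, 5, 5, 6, 6, 6, 7, 7, 8]

mean = statistics.mean(data)      # the average value
median = statistics.median(data)  # the middle value of the sorted data
mode = statistics.mode(data)      # the most frequent value

print("mean:", mean, "median:", median, "mode:", mode)
```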
We can also draw a curve through the data points. Notice that the curve’s shape
resembles a bell, which is why it is often called a bell curve.
Graphs of normal distributions will look different depending on the mean value (which
determines the location of the center of the graph) and the standard deviation (which
is the measure of how spread out the data values are). When the standard deviation is
large, the data values are spread far from the mean, so the graph looks flatter.
For example, the first graph has a larger standard deviation than the second graph:
If we have a graph with a normal distribution and its mean value is equal to 0 and it has
a standard deviation of 1, then the graph is illustrating a standard normal distribution.
[Figure: Standard Normal Distribution, μ = 0, σ = 1]
The above graph is telling us that:
about 68% of all the data values are within 1 standard deviation of the mean
about 95% of all the data values are within 2 standard deviations of the mean
about 99.7% of all the data values are within 3 standard deviations of the mean.
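These percentages can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module (available in Python 3.8 and later); the helper name `share_within` is an illustrative choice.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of values lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The printed values are slightly more precise than the rounded 68%, 95%, and 99.7% figures, which is why those figures are best read as approximations.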
Example:
Let’s say we want to plant some shrubs along the exterior of a building but we want to
make sure that the plants will not likely grow taller than the windows. We find out that
95% of a particular type of shrub grows to a maximum height between 1.1 m and 1.7 m
tall. Assuming the heights follow a normal distribution, we can calculate the mean and
standard deviation.
The mean is the halfway point in the data:
Mean = (1.1 m + 1.7 m) ÷ 2 = 1.4 m
The 95% range spans 2 standard deviations on either side of the mean value. Therefore, the
difference between 1.1 m and 1.7 m can be divided by 4 to determine the value of 1
standard deviation.
1 standard deviation = (1.7 m – 1.1 m) ÷ 4 = 0.15 m
From this data, we now know that the average height of that type of shrub is 1.4 m and
that with any particular shrub, there is a 68% probability that it will be within 0.15 m of
the average (i.e. between 1.25 m and 1.55 m).
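The same arithmetic can be written as a short script. This is a minimal sketch using the shrub figures from the example; the variable names are illustrative.

```python
# Range that holds about 95% of the shrub heights, in metres (from the example).
low, high = 1.1, 1.7

mean = (low + high) / 2                # halfway point of the range
standard_deviation = (high - low) / 4  # the range spans 4 standard deviations

# About 68% of shrubs fall within 1 standard deviation of the mean.
lower_68 = mean - standard_deviation
upper_68 = mean + standard_deviation

print(f"mean = {mean:.2f} m")
print(f"standard deviation = {standard_deviation:.2f} m")
print(f"68% range = {lower_68:.2f} m to {upper_68:.2f} m")
```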
“Standard Score” or “z-score” is used to describe the number of standard
deviations a particular value x is from the mean.
z-score = (x – mean) ÷ standard deviation, that is, z = (x – μ) ÷ σ
If we have a shrub that is at a height of 1.85 m, according to the above graph it would
be 3 standard deviations from the average. That is, the z-score for that shrub is 3.
If we have another shrub that is at a height of 1.8 m, how many standard deviations is it
from the mean? To solve this, we need to calculate the difference between the shrub
height and the mean, and then divide that value by the standard deviation:
z-score = (1.8 m – 1.4 m) ÷ 0.15 m = 2.67
Let’s say we have a particular shrub that is 1.2 m tall. What is its z-score?
z-score = (1.2 m – 1.4 m) ÷ 0.15 m = -1.33
This shrub is 1.33 standard deviations below the average. A negative z-score indicates
that the shrub is shorter than the average.
If we have another shrub that is exactly the same height as the average, the z-score
would be 0.
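The z-score calculation lends itself to a small helper function. The sketch below applies it to the shrub heights discussed above; the function and constant names are illustrative.

```python
def z_score(x, mean, standard_deviation):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / standard_deviation

MEAN_HEIGHT = 1.4   # metres, from the shrub example
STD_HEIGHT = 0.15   # metres, from the shrub example

# Heights taken from the worked examples in the text.
for height in (1.85, 1.8, 1.2, 1.4):
    z = z_score(height, MEAN_HEIGHT, STD_HEIGHT)
    print(f"{height} m -> z-score = {z:.2f}")
```

A positive result means the shrub is taller than the average, a negative result means it is shorter, and 0 means it is exactly average.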
If we were to create another graph of the data using the z-scores instead of height
values, we would end up with the standard normal distribution graph.
[Figure: Standard Normal Distribution; horizontal axis: z-score from -3 to 3]
Practice Questions
1) The class average for a test is 75% and the standard deviation is 4%. Duncan has a
test score of 83%. Assuming the test scores follow a normal distribution, what
percentage of the class did better on the test than him?
2) It takes Leia an average of 45 minutes (standard deviation is 7 minutes) to travel
from her home to her office. If she wakes up late one morning and only has 38 minutes
to get to work, what is the probability she will get there on time assuming her travel
times follow a normal distribution?
Answers
1) z-score = (83% – 75%) ÷ 4% = 2
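If we look at a standard normal distribution graph, about 95% of the test scores are within
2 standard deviations of the mean, so about 5% lie outside that range, half of them above it.
About 2.5% of the class therefore did better on the test than Duncan.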
2) z-score = (38 min – 45 min) ÷ 7 min = -1
If we look at a standard normal distribution graph, 84% of the travel times are above a
z-score of -1. This means that 84% of Leia’s travel times are longer than 38 minutes,
that is, she has an 84% chance of being late. So Leia only has a 16% chance of getting
to work on time.
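The answers above rely on the rounded 68%, 95%, 99.7% figures. As a cross-check, the sketch below computes the same probabilities with `NormalDist` from Python's standard `statistics` module, assuming the scores and travel times are exactly normally distributed; the variable names are illustrative.

```python
from statistics import NormalDist

# Question 1: test scores with mean 75% and standard deviation 4%.
scores = NormalDist(mu=75, sigma=4)
share_above_duncan = 1 - scores.cdf(83)  # fraction scoring above 83%

# Question 2: travel times with mean 45 minutes and standard deviation 7 minutes.
travel_times = NormalDist(mu=45, sigma=7)
chance_on_time = travel_times.cdf(38)    # trip takes 38 minutes or less

print(f"Share of class above Duncan: {share_above_duncan:.1%}")
print(f"Chance Leia is on time: {chance_on_time:.1%}")
```

The computed values come out slightly below the rounded answers (roughly 2.3% instead of 2.5%, and roughly 15.9% instead of 16%), because the rule-of-thumb percentages are themselves rounded.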