WHAT IS NORMALITY?
Explaining the Normal Distribution
PRELIMINARIES: HOW TO DESCRIBE
DATA
Discussion Question: How do we describe data in Statistics?
In this course, the way we describe data is by using the C.U.S.S.
Method. We look at these characteristics:
C. Center (i.e., mean and median)
U. Unusual features (i.e., are outliers present?)
S. Shape (skewness and/or symmetry)
S. Spread (range, i.e., max minus min)
CONTINUED PRELIMINARIES:
BOXPLOTS
• Recall that a box plot is a standard way of displaying the distribution of data
graphically using the five number summary.
• Let's build a box plot using our calculators!
• Ex 1: Make a box plot using the following distribution of numbers in your
calculator.
2, 3, 5, 5, 6, 7, 7, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 10, 11, 11, 12, 14, 14, 14, 15, 16, 18,
18, 18, 18, 19, 19, 20, 20, 22, 22, 24, 24, 25, 26, 26, 27, 28, 28, 28, 33, 45, 50, 55, 66
• After you obtain the graph in your calculator draw it in your notes, write
down the five number summary, and then use C.U.S.S to describe the
distribution.
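To check the calculator output for Ex 1, here is a minimal Python sketch of the five number summary. It assumes Q1 and Q3 are taken as the medians of the lower and upper halves of the sorted data, and it flags possible outliers with the common 1.5 × IQR rule; that rule and the helper name five_number_summary are illustrative assumptions, not part of the lesson.

```python
from statistics import median

# Data from Ex 1 (50 values).
data = [2, 3, 5, 5, 6, 7, 7, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 10, 11, 11,
        12, 14, 14, 14, 15, 16, 18, 18, 18, 18, 19, 19, 20, 20, 22, 22, 24,
        24, 25, 26, 26, 27, 28, 28, 28, 33, 45, 50, 55, 66]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the
    sorted data (the middle value is left out when the count is odd).
    """
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + n % 2:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


low, q1, med, q3, high = five_number_summary(data)
iqr = q3 - q1

# Assumed convention: a value is a possible outlier if it lies more than
# 1.5 * IQR below Q1 or above Q3.
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print("min, Q1, median, Q3, max:", low, q1, med, q3, high)
print("possible outliers:", outliers)
```

A long right whisker or flagged values on the high side are what you would describe under "Unusual features" and "Shape" in C.U.S.S.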
STEPS TO DRAW A BOX PLOT ON A
TI-83/84 CALCULATOR
1. Press the STAT button and select Edit
2. Enter the data into list L1
3. Press 2nd Y= to open the STAT PLOT menu
4. Select Plot1 and turn it On
5. Select Type: Boxplot (bottom left)
6. Set Xlist: L1
7. Set Freq: 1
8. For Mark, select the first option
9. Press ZOOM 9 (ZoomStat) to display the graph, then complete the questions
PRELIMINARIES: HISTOGRAMS
• Recall that a histogram is a graphical representation of a distribution that
groups the data into ranges (bins) and shows the frequency of each range. A
histogram looks similar to a bar graph. Below is a visual representation of a histogram.
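As a rough text version of the same idea, the sketch below groups the Ex 1 data into ranges and prints one row of # marks per range. The bin width of 10 is an assumption chosen for illustration; a different width gives a different picture.

```python
from collections import Counter

# Data from Ex 1 (50 values).
data = [2, 3, 5, 5, 6, 7, 7, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 10, 11, 11,
        12, 14, 14, 14, 15, 16, 18, 18, 18, 18, 19, 19, 20, 20, 22, 22, 24,
        24, 25, 26, 26, 27, 28, 28, 28, 33, 45, 50, 55, 66]

bin_width = 10  # assumed bin width

# Map each value to the lower edge of its bin (0, 10, 20, ...) and count.
counts = Counter((x // bin_width) * bin_width for x in data)

# Print one row per bin, including empty bins.
for start in range(0, max(data) + 1, bin_width):
    end = start + bin_width - 1
    print(f"{start:2d}-{end:2d} | {'#' * counts[start]}")
```

The rows play the role of the bars: the longer the row, the higher the frequency in that range.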
VIDEO: BUILDING A HISTOGRAM
Please watch this short video about how to build a Histogram and
answer the following questions in your notes
Histogram
WHAT IS THE CONNECTION BETWEEN
HISTOGRAMS AND NORMALITY?
• For data that is approximately Normal, as more data is added to a histogram its
shape becomes more and more similar to a bell curve. You can see the resemblance
in the examples we covered and in the following picture.
DEFINING THE NORMAL
DISTRIBUTION
• To assume that a set of data follows a Normal Distribution is one of the most
important assumptions in Statistics
• When we think about the concept of Normality, consider a list of numbers
that has a few low values, a few high values, and many values in the middle. If you
have LOTS of these types of numbers then they MIGHT follow a “Normal Distribution”.
• Some examples of Normally distributed data are:
The heights of all males in the United States, scores on the SAT exam, lengths of
great white sharks, etc.
Can you think of some examples based on the description given above?
Write them in your notes.
CAUTION! THE NORMALITY TRAP
• Not all data follows a Normal Distribution
• Data with outliers or skewness may not be Normally distributed
• Large samples show the shape of a distribution more clearly than small samples, but a large sample of skewed data is still not Normal
• Real life data is almost NEVER EXACTLY NORMAL.
CHARACTERISTICS OF A NORMAL
DISTRIBUTION
• The mean, median, and mode all have the same value
• The curve is bell shaped and symmetric about the line that crosses the mean
• The curve approaches, but never touches, the x-axis as you move away from the mean
• The area under the curve is equal to 1
• Almost all of the area under the curve exists within three standard deviations of the mean
• When data follows a normal distribution it is denoted N(𝜇, 𝜎), where 𝜇 is the mean and 𝜎 is the standard deviation of the distribution
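These characteristics can be checked numerically. The sketch below uses NormalDist from Python's standard statistics module; the choice of N(10, 5) and the crude rectangle sum used to approximate the total area are assumptions made for illustration.

```python
from statistics import NormalDist

mu, sigma = 10, 5            # N(10, 5), chosen only for illustration
dist = NormalDist(mu=mu, sigma=sigma)

# Mean, median, and mode all have the same value.
same_center = dist.mean == dist.median == dist.mode

# Symmetry: the curve has the same height 3 units left and right of the mean.
symmetric = abs(dist.pdf(mu - 3) - dist.pdf(mu + 3)) < 1e-12

# The curve gets close to the x-axis far from the mean but stays above it.
tail_height = dist.pdf(mu + 6 * sigma)

# Area within three standard deviations of the mean.
area_3sd = dist.cdf(mu + 3 * sigma) - dist.cdf(mu - 3 * sigma)

# Rough total area: rectangles of width 0.01 from mu - 8*sigma to mu + 8*sigma.
step = 0.01
left = mu - 8 * sigma
n_steps = int(16 * sigma / step)
total_area = sum(dist.pdf(left + i * step) * step for i in range(n_steps))

print("same center:", same_center)
print("symmetric:", symmetric)
print("height 6 sd out:", tail_height)
print("area within 3 sd:", round(area_3sd, 4))
print("approximate total area:", round(total_area, 4))
```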
THE NORMAL DISTRIBUTION
The middle represents the
mean/median/50th percentile
HOW TO DRAW A NORMAL
CURVE
We first draw our axis and then we plot the mean and three standard
deviations above and below the mean as illustrated in the diagram.
Then draw the bell curve.
Ex: In your notes draw the following curve. The data is N(10, 5).
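For the N(10, 5) example, the marks on the axis are the mean plus or minus 1, 2, and 3 standard deviations. A short sketch of that arithmetic (the variable names are only illustrative):

```python
mu, sigma = 10, 5  # N(10, 5) from the example

# Axis marks: three standard deviations below the mean up to three above.
ticks = [mu + k * sigma for k in range(-3, 4)]

print(ticks)  # [-5, 0, 5, 10, 15, 20, 25]
```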
THE EMPIRICAL RULE
• Recall that one characteristic of the Normal distribution is that almost all of the
area under the curve exists within three standard deviations of the mean.
• How do we know what “almost” means? For this we have the Empirical
Rule, a.k.a. the 68-95-99.7 Rule.
• The Empirical Rule states that, for a Normal distribution:
• About 68% of the data is within 1 standard deviation of the mean
• About 95% of the data is within 2 standard deviations of the mean
• About 99.7% of the data is within 3 standard deviations of the mean
• Discussion Question: Where do you think this rule comes from?
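One way to see where the three percentages come from is to compute the area under the Standard Normal curve between -k and k for k = 1, 2, 3. This is a minimal sketch using NormalDist from Python's standard library; it is a numerical check, not a derivation.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Area between -k and +k standard deviations for k = 1, 2, 3.
areas = {}
for k in (1, 2, 3):
    areas[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} standard deviation(s): {areas[k] * 100:.1f}%")
```

The rounded results are the 68, 95, and 99.7 in the name of the rule.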
ESTIMATING AREAS UNDER THE
NORMAL CURVE USING THE
EMPIRICAL RULE
• Complete this example in your notes with a partner.
• Suppose the scores from your AP Stats exam are N(75, 5). Answer the
following questions.
1. Sketch the distribution
2. What percentage of the scores is between 60 and 90?
3. What score is the 50th percentile?
4. What score would be the 16th percentile?
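Once you have worked the example by hand with the Empirical Rule, one way to check your estimates is the sketch below. It assumes the scores are exactly N(75, 5) and uses NormalDist from Python's standard library, so its results are slightly more precise than the rounded Empirical Rule values.

```python
from statistics import NormalDist

scores = NormalDist(mu=75, sigma=5)  # N(75, 5) from the example

# Question 2: share of scores between 60 and 90 (3 standard deviations each way).
between_60_and_90 = scores.cdf(90) - scores.cdf(60)

# Question 3: the 50th percentile is the score with half the area below it.
percentile_50 = scores.inv_cdf(0.50)

# Question 4: the score with 16% of the area below it.
percentile_16 = scores.inv_cdf(0.16)

print("between 60 and 90:", round(between_60_and_90 * 100, 1), "%")
print("50th percentile:", round(percentile_50, 1))
print("16th percentile:", round(percentile_16, 1))
```

The 16th percentile lands very close to one standard deviation below the mean, because about 68% of the area is within one standard deviation and the remaining 32% is split evenly between the two tails.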
Z-SCORES
• A z-score is how many standard deviations a data point is away from the
mean
• Z-scores are used to find the area under the Normal curve by standardizing
it.
• The Standard Normal Curve is a Normal curve with a mean of 0 and
standard deviation 1.
• The formula for the z-score is as follows: 𝑧 = (𝑥 − 𝜇) / 𝜎, where 𝑥 is the data point, 𝜇 is the mean, and 𝜎 is the standard deviation
CONTINUED Z-SCORES
• Consider test scores like those in our previous example, this time N(70, 5).
Suppose I asked you to find the z-score for a score of 82.
• Using the formula we get 𝑧 = (82 − 70) / 5 = 2.4
• To interpret this result we would say: A score of 82 on the test is 2.4
standard deviations above the mean.
• If your z-score is negative, then the data point is |z| standard deviations below
the mean.
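The z-score calculation can be written as a small function. This is a minimal sketch; the function name z_score and the second example score of 63 are illustrative assumptions, not part of the lesson.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


# Score of 82 when scores are N(70, 5): above the mean, so z is positive.
z_high = z_score(82, 70, 5)

# Hypothetical score of 63 from the same distribution: below the mean, so z is negative.
z_low = z_score(63, 70, 5)

print(z_high, z_low)
```

The sign tells you the direction (above or below the mean) and the size tells you how many standard deviations away the data point is.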
USING Z-SCORES TO FIND THE
AREA UNDER THE NORMAL CURVE
• Suppose we said that the shoe size for males in the U.S. is N(9.5, 1.25).
• How would you calculate the percentage of shoe sizes that are below 10?
• We could use the Empirical Rule to estimate this, but a quicker way is to use
z-scores
• Let’s work through this example with our calculators
• Using the z-score formula with 𝜎 = 1.25 we get: 𝑧 = (10 − 9.5) / 1.25 = 0.4
• Now on the calculator press 2nd VARS, choose normalcdf, and enter
normalcdf(-1000, 0.4). You get about 0.6554, so roughly 65.5% of shoe sizes are below 10.
• If it were to ask you for the percentage above 10, you would press 2nd VARS and
enter normalcdf(0.4, 1000).
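The same areas can be checked away from the calculator. The sketch below takes the standard deviation as the 1.25 stated in the scenario, N(9.5, 1.25), and uses NormalDist from Python's standard library in place of normalcdf; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 9.5, 1.25        # N(9.5, 1.25) as stated in the scenario
x = 10

# Standardize, then look up the area to the left on the Standard Normal curve.
z = (x - mu) / sigma
below_10 = NormalDist().cdf(z)

# The area above is whatever is left, since the total area is 1.
above_10 = 1 - below_10

# Same area without standardizing first, using the original distribution.
below_10_direct = NormalDist(mu, sigma).cdf(x)

print("z =", round(z, 4))
print("below 10:", round(below_10, 4))
print("above 10:", round(above_10, 4))
```

Using a very negative lower bound such as -1000 on the calculator plays the same role as the cumulative area from the far left here.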
NOW YOU TRY!
• Using the previous scenario answer the following questions in your notes with
a partner.
• Find the percentage of shoe sizes above 8.5
• Find the percentage of shoe sizes between 8.5 and 9.5
• Find the percentage of shoe sizes below 9.25
• Draw and shade the Normal Curves representing these scenarios
CHECKING FOR NORMALITY
USING THE EMPIRICAL RULE
• Discussion Question: How could we use the Empirical Rule to check if the
data follows a Normal Distribution?
• Jot down some ideas in your notes.
CONTINUED: CHECKING FOR
NORMALITY
The steps to checking the data for Normality are as follows:
1. Check the Empirical Rule (about 68%, 95%, and 99.7% of the data should lie within
1, 2, and 3 standard deviations of the mean)
2. Check the 1-variable stats (the closer the median is to the mean, the more this suggests
that the data might be Normally distributed)
3. Draw a histogram or boxplot in your calculator and check C.U.S.S. The
more symmetrical the graph, the more evidence there is that the data
is Normally distributed
FINAL ACTIVITY: GREAT WHITE
SHARKS
Below are the lengths, in feet, of a sample of 44 great white sharks. In your notes,
use the steps previously stated to check whether the lengths in this sample
are Normally distributed. Write a paragraph describing your findings. Discuss your
findings with your partner.
18.7, 12.3, 18.6, 16.4, 15.7, 18.3, 14.6, 15.8, 14.9, 17.6, 12.1,
16.4, 16.7, 17.8, 16.2, 12.6, 17.8, 13.8, 12.2, 15.2, 14.7, 12.4,
13.2, 15.8, 14.3, 16.6, 9.4, 18.2, 13.2, 13.6, 15.3, 16.1, 13.5,
19.1, 16.2, 22.8, 16.8, 13.6, 13.2, 15.7, 19.7, 18.7, 13.2, 16.8
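The first two steps for checking Normality (the Empirical Rule and the 1-variable stats) can be sketched in Python for this sample. The helper name fraction_within is an illustrative assumption, and the sketch uses the sample standard deviation, which is the Sx value on the calculator. The graph in step 3 and the written interpretation are still up to you.

```python
from statistics import mean, median, stdev

# Lengths (in feet) of the 44 great white sharks listed above.
lengths = [18.7, 12.3, 18.6, 16.4, 15.7, 18.3, 14.6, 15.8, 14.9, 17.6, 12.1,
           16.4, 16.7, 17.8, 16.2, 12.6, 17.8, 13.8, 12.2, 15.2, 14.7, 12.4,
           13.2, 15.8, 14.3, 16.6, 9.4, 18.2, 13.2, 13.6, 15.3, 16.1, 13.5,
           19.1, 16.2, 22.8, 16.8, 13.6, 13.2, 15.7, 19.7, 18.7, 13.2, 16.8]


def fraction_within(values, k):
    """Fraction of values within k sample standard deviations of the mean."""
    m, s = mean(values), stdev(values)
    inside = [x for x in values if abs(x - m) <= k * s]
    return len(inside) / len(values)


# Step 2: compare the mean and the median.
print("mean:", round(mean(lengths), 3))
print("median:", round(median(lengths), 3))
print("standard deviation:", round(stdev(lengths), 3))

# Step 1: compare these percentages with 68, 95, and 99.7.
for k in (1, 2, 3):
    share = fraction_within(lengths, k) * 100
    print(f"within {k} standard deviation(s): {share:.1f}%")
```

The closer the three percentages are to 68, 95, and 99.7, and the closer the mean is to the median, the more evidence there is that the sample might be Normally distributed.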
LESSON SUMMARY
• For describing data we use the C.U.S.S acronym (Refer to slide 2)
• For approximately Normal data, the more data you add to a histogram, the more
its shape resembles a bell curve.
• The concept of Normality refers to a distribution of data that has many middle
values and fewer low and high values, in roughly equal amounts on each side
• The Empirical Rule allows us to estimate the area under the Normal curve
• Z-scores represent how many standard deviations a data point is away from
the mean
• The Empirical Rule and histograms are used to check if certain samples of
data might follow a Normal Distribution.
• Great White Shark example and distribution clipart provided by Mr. Pines’s
website, Konastats.com (Chapter 2 PowerPoint).