Statistics Packet/Project Levels 1-4

Level 1 Notes
Data Presentation
The first requirement of any stats unit is to understand that it is built around a set of data. In this
introduction we look at different ways to present data and some of the basic terms.
ALWAYS make sure all values are accounted for (in this case, 24 data points).
Here is the set of data used for the following examples:
The average heart rates (beats per minute) for a HS stats class are:
60, 64, 74, 66, 46, 90, 65, 67, 75, 68, 81, 69, 72, 68, 73, 67, 69, 67, 73, 77, 55, 63, 69, 63
Let us look at some different ways to present data:
Stem and Leaf Plot (also called a Stem Plot)
Requirements:
Stems can be the first digit, the first two digits, etc.
The leaves are only the last digit (the digit of importance).
There always has to be a key.
4 | 6
5 | 5
6 | 03345677788999
7 | 233457
8 | 1
9 | 0
Key: 4|6 = 46 beats per min
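The stem plot above can be rebuilt mechanically. This is a minimal sketch, assuming whole-number, non-negative data where the leaf is the last digit and the stem is everything before it; the function name `stem_plot` is hypothetical. Empty stems are kept so gaps in the data stay visible.

```python
from collections import defaultdict

# Heart-rate data from the notes (beats per minute)
heart_rates = [60, 64, 74, 66, 46, 90, 65, 67, 75, 68, 81, 69,
               72, 68, 73, 67, 69, 67, 73, 77, 55, 63, 69, 63]

def stem_plot(data):
    """Return stem-plot rows for non-negative whole numbers.

    The stem is every digit except the last; the leaf is the last digit.
    """
    leaves_by_stem = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        leaves_by_stem[stem].append(leaf)
    rows = []
    # Walk every stem from smallest to largest, including empty ones
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = "".join(str(leaf) for leaf in leaves_by_stem[stem])
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows

for row in stem_plot(heart_rates):
    print(row)
print("Key: 4 | 6 = 46 beats per min")
```

Because the data are sorted first, the leaves on each stem come out in increasing order, which matches the hand-drawn plot.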
Dotplots
Requirements:
Every score is represented by a picture or a dot, and there is an equal space for every value.
[Dot plot: Avg. Heart Rate, beats per min. (horizontal axis marked from 45 to 90), one x per score]
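A dot plot can also be sketched in text. This version is an assumption for illustration: it is turned sideways (values run down the page instead of across), gives every whole-number value from the minimum to the maximum its own row so spacing stays equal, and prints one x per score. The helper name `dot_plot_rows` is hypothetical.

```python
from collections import Counter

# Heart-rate data from the notes (beats per minute)
heart_rates = [60, 64, 74, 66, 46, 90, 65, 67, 75, 68, 81, 69,
               72, 68, 73, 67, 69, 67, 73, 77, 55, 63, 69, 63]

def dot_plot_rows(data):
    """Sideways text dot plot: one row per whole-number value from min to max."""
    counts = Counter(data)
    rows = []
    for value in range(min(data), max(data) + 1):
        # Counter returns 0 for values that never occur, so those rows stay empty
        dots = " ".join("x" * counts[value])
        rows.append(f"{value} | {dots}".rstrip())
    return rows

for row in dot_plot_rows(heart_rates):
    print(row)
```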
Frequency Table
Requirements:
Create classes, which are the “ranges” of each category. Then count how many scores fall in each class. If
a score could belong to more than one class (for example, 55 is an endpoint of both 45-55 and 55-65), always count it in the higher class.
Class    Frequency
45-55    1
55-65    5
65-75    14
75-85    3
85-95    1
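The "count it up" rule amounts to treating each class as including its lower endpoint but not its upper one. A minimal sketch, assuming equal-width classes; `frequency_table` and its parameters are hypothetical names chosen for illustration:

```python
# Heart-rate data from the notes (beats per minute)
heart_rates = [60, 64, 74, 66, 46, 90, 65, 67, 75, 68, 81, 69,
               72, 68, 73, 67, 69, 67, 73, 77, 55, 63, 69, 63]

def frequency_table(data, start, width, n_classes):
    """Count scores per class; a score on a boundary goes to the higher class."""
    table = []
    for i in range(n_classes):
        low = start + i * width
        high = low + width
        # low <= x < high puts a boundary score such as 55 into 55-65 ("count it up")
        count = sum(1 for x in data if low <= x < high)
        table.append((f"{low}-{high}", count))
    return table

for label, count in frequency_table(heart_rates, start=45, width=10, n_classes=5):
    print(f"{label:<8}{count}")
```

Adding the counts back up (1 + 5 + 14 + 3 + 1 = 24) is a quick way to confirm every value was accounted for.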
Histogram
Requirements: use a frequency table to present the data
This is similar to a bar graph, but you need a break mark if the axis does not start at (0,0), and both axes must be labeled.
[Histogram: Frequency of Heart Rates; horizontal axis Avg. Heart Rate, beats per min. (45, 55, 65, 75, 85, 95); vertical axis frequency (0 to 14)]
Relative Cumulative Frequency Plot
Requirements: use a frequency table to present the data
It uses the same horizontal axis as the histogram, but it shows percents of the scores. The first point is
placed where the first score is, and by the end you reach 100%, because you keep a running total of
the percents from left to right.
Step one: look at the frequency table and calculate the cumulative percents
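Class    Frequency    Cumulative frequency    R.C.F. (% of 24 total)
45-55    1            1                       1/24 ≈ .0417 (4.17%)
55-65    5            6                       6/24 = .2500 (25%)
65-75    14           20                      20/24 ≈ .8333 (83.33%)
75-85    3            23                      23/24 ≈ .9583 (95.83%)
85-95    1            24                      24/24 = 1.0000 (100%)
[Relative cumulative frequency plot: vertical axis R.C.F. in % (0 to 100); horizontal axis Avg. Heart Rate, beats per min. (45 to 95); plotted percents 4.17%, 25%, 83.33%, 95.83%, 100%]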
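The running-total step can be checked with a short script. This is a minimal sketch, assuming the class frequencies from the heart-rate frequency table; the function name `relative_cumulative_percents` is hypothetical.

```python
from itertools import accumulate

classes = ["45-55", "55-65", "65-75", "75-85", "85-95"]
frequencies = [1, 5, 14, 3, 1]  # heart-rate frequency table from the notes

def relative_cumulative_percents(freqs):
    total = sum(freqs)
    # Running total from left to right, each expressed as a percent of all scores
    return [round(100 * running / total, 2) for running in accumulate(freqs)]

for label, pct in zip(classes, relative_cumulative_percents(frequencies)):
    print(f"{label:<8}{pct}%")
```

The last value is always 100%, since the final running total is every score.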
“Math Stuff to Find”
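Mean: arithmetic average (add all the values, then divide by how many values there are)
Median: middle number when the data are arranged in order (with an even number of values, average the two middle numbers)
Mode: the most common value (a data set can have more than one mode)
Range: largest value minus smallest value (use the exact data values, not the largest and smallest values of
the classes)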
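As a check on hand calculations, Python's standard `statistics` module computes these directly. This is a sketch using the heart-rate data; `multimode` (Python 3.8+) is used because this data set has two values tied for most common.

```python
import statistics

# Heart-rate data from the notes (beats per minute)
heart_rates = [60, 64, 74, 66, 46, 90, 65, 67, 75, 68, 81, 69,
               72, 68, 73, 67, 69, 67, 73, 77, 55, 63, 69, 63]

mean = statistics.mean(heart_rates)          # sum of values / number of values
median = statistics.median(heart_rates)      # averages the two middle values when the count is even
modes = statistics.multimode(heart_rates)    # every value tied for most common
data_range = max(heart_rates) - min(heart_rates)

print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Mode(s): {modes}")
print(f"Range: {data_range}")
```

For this data the mean is 1641 / 24 = 68.375, the median is 68 (the 12th and 13th ordered values are both 68), the modes are 67 and 69 (three times each), and the range is 90 − 46 = 44.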
Data Presentation HWK
On a separate sheet of paper, create a stem plot, dot plot, frequency table, histogram, and
relative cumulative frequency plot for each of the following data sets,
and find the mean, median, mode, and range.
1. Scores of all the Super Bowl champions (arranged in order)
52, 49, 48, 46, 43, 42, 39, 38, 38, 37, 35, 35, 35, 34, 34, 34, 33, 32, 32, 31, 31, 31, 31, 30,
29, 27, 27, 27, 27, 27, 26, 24, 24, 24, 23, 23, 21, 21, 21, 20, 20, 20, 17, 16, 16, 16, 14
2. Just Random data (arranged in order)
30,30,30,30,30,30,30,30,32,32,32,32,32,32,32,34,34,34,34,34,34,36,36,36,36,36,38,38,38,38,
40,40,40,40,42,42,42,44,44,44,46,46,46,48,48,50,50,52,52,54,54,56,56,58,60,62,64,66,68,70
3. Just Random data (arranged in order)
30,32,34,36,36,38,38,40,40,40,42,42,42,44,44,44,44,46,46,46,46,46,48,48,48,48,48,48,50,50,50,
50,50,50,52,52,52,52,52,52,54,54,54,54,54,56,56,56,56,58,58,58,60,60,60,62,62,64,64,66,68,70
4. Just Random data (arranged in order)
30,32,34,36,38,40,40,42,42,44,44,46,46,48,48,48,50,50,50,52,52,52,54,54,54,56,56,56,56,58,58,58,
58,58,60,60,60,60,60,62,62,62,62,62,62,64,64,64,64,64,64,66,66,66,66,66,68,68,68,68,70,70,70
5. Just Random data (not in order)
12, 18, 40, 60, 34, 85, 49, 75, 32, 18, 55, 55, 64, 23,
46, 72, 64, 55, 11, 81, 64, 53, 32, 31, 55, 49, 67, 21
Level 2 Notes
Box and Whisker Plots
Measure of Variability: a number that represents the spread (or the diversity) of a set of data
**The larger the measure of variability, the more the data is spread out
Range: the difference between the two extremes (Max − Min)
Five-number summary: Min, Q1, Q2, Q3, Max
Breaking Data into Quartiles:
Quartiles: the values that split an ordered set of data into four groups, found from the median of the
whole set and the medians of the two halves that median creates
1) List all data values in order from least to greatest and find the median (Q2)
2) Take the first half of the data and find the median of that set
*That median is called Q1
3) Find the median of the second half of the data
*That median is called Q3
***If there is an odd number of data values, Q2 is an exact data value, and it is included in both
the first and second halves of the data when finding Q1 and Q3
Interquartile Range (IQR)
The difference between Q3 and Q1
**IQR = Q3 - Q1
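The quartile steps above can be sketched in code. This follows the notes' rule of sharing the median with both halves when the count is odd; some calculators and software exclude the median instead, so their Q1 and Q3 can differ for odd-sized data sets. The notes do not define outliers, so the 1.5 × IQR fence used here is an assumption (a common convention). Function names are hypothetical.

```python
from statistics import median

# Heart-rate data from the notes (beats per minute)
heart_rates = [60, 64, 74, 66, 46, 90, 65, 67, 75, 68, 81, 69,
               72, 68, 73, 67, 69, 67, 73, 77, 55, 63, 69, 63]

def five_number_summary(data):
    """Min, Q1, Q2, Q3, Max using the halving method in these notes."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 0:
        lower, upper = values[:half], values[half:]
    else:
        # Odd count: the median (Q2) is shared by both halves
        lower, upper = values[:half + 1], values[half:]
    return values[0], median(lower), median(values), median(upper), values[-1]

def outliers(data):
    """Values more than 1.5 * IQR beyond Q1 or Q3 (assumed convention)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in sorted(data) if x < low_fence or x > high_fence]

summary = five_number_summary(heart_rates)
print("Min, Q1, Q2, Q3, Max:", summary)
print("IQR:", summary[3] - summary[1])
print("Outliers:", outliers(heart_rates))
```

For the heart-rate data this gives Q1 = 64.5, Q2 = 68, Q3 = 73, and IQR = 8.5, so the fences are 51.75 and 85.75.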
Level 2 Homework
Give the five number summary for the values in the given set of data, mention any outliers, and
draw a box and whisker plot for each.
a. {$4.45, $5.50, $5.50, $6.30, $7.80, $11.00, $12.20, $17.20}
b. {2,0,0,7,1,0,10,3,93,13,44,170,30}
Level 3 Day 1 Notes
Standard Deviation

Standard Deviation (σ): A measure of the average amount by which individual
items of data deviate from the mean of all the data
*In plain English: How much all of the data vary compared to the
_______________________


If a set of data has a small standard deviation, the data is
__________________________

If a set of data has a large standard deviation, the data is
_______________________ spread
Standard Deviation: the square root of the mean of the squares of the deviations
from the arithmetic mean
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
Steps for finding the Standard Deviation:
Step 1: Find the _________________________ of the set of data
Step 2: ________________________ the mean from each individual data value
Step 3: Square each answer from __________________________________
Step 4: ____________ all answers from step 3
Step 5: Divide the answer from ____________________ by the _____________________ of data values
Step 6: Take the ______________________________________ of the answer from step 5
Example 1: Find the standard deviation of the data set
{20,47,72,58,16}
Example 2: Find the standard deviation of the data set
{369, 398, 381, 392, 406, 413, 376, 454, 420, 385, 402, 446}
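Each step of the hand method maps onto one line of code. This is a minimal sketch of the population standard deviation (divide by n), which is assumed to be the σx value the calculator reports; the function name is hypothetical.

```python
import math

def standard_deviation(data):
    """Population standard deviation (divide by n), one line per step."""
    n = len(data)
    mean = sum(data) / n                       # Step 1: find the mean
    deviations = [x - mean for x in data]      # Step 2: subtract the mean from each value
    squares = [d ** 2 for d in deviations]     # Step 3: square each deviation
    total = sum(squares)                       # Step 4: add the squares
    variance = total / n                       # Step 5: divide by the number of values
    return math.sqrt(variance)                 # Step 6: take the square root

example_1 = [20, 47, 72, 58, 16]
print(round(standard_deviation(example_1), 2))
```

For Example 1 the mean is 42.6, the squared deviations add to 2339.2, and 2339.2 / 5 = 467.84, so σ ≈ 21.63.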
FINDING STANDARD DEVIATION ON YOUR CALCULATOR:
Step 1: Enter data values into L1
i.e., STAT → 1: Edit…
Step 2: Press STAT, scroll right to CALC, and choose 1: 1-Var Stats
Step 3: Press ENTER
*The standard deviation is the σx value (not Sx, which is the sample standard deviation and divides by n − 1)
Example 3: Find the standard deviation of the set of data manually
{23,21,12,10,26}
Example 4: Find the standard deviation of the set of data
{12,13,93,19,64,18,31,78,1,51,42,19,83,20}
Level 4 Notes - Normal Distribution
Properties
1. The mean, median, and mode are equal.
2. The normal curve is bell-shaped and is symmetric about the mean.
3. The total area under the curve is equal to one. The area of a region under a probability
curve is equal to the probability that the random variable will have a value in the
corresponding interval.
4. The normal curve approaches, but never touches, the x-axis as it extends farther and
farther away from the mean.
5.
6. Between μ − σ and μ + σ the graph curves downward. The graph curves upward to the left of μ − σ
and to the right of μ + σ. The points at which the curve changes from curving upward to curving
downward are called inflection points. (Essential when drawing normal curves)
7. About 68.3% of the data is contained within 1 standard deviation of the mean.
About 95.5% of the data is contained within 2 standard deviations of the mean.
About 99.7% of the data is contained within 3 standard deviations of the mean.
(This is referred to as the Empirical Rule, or the 68-95-99.7 Rule; use these percentages.)
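The Empirical Rule percentages can be checked against the exact normal curve using `statistics.NormalDist` from Python's standard library (3.8+). This is a sketch; the helper name `percent_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def percent_within(k):
    """Percent of a normal distribution within k standard deviations of the mean."""
    return 100 * (standard_normal.cdf(k) - standard_normal.cdf(-k))

for k in (1, 2, 3):
    print(f"Within {k} SD: {percent_within(k):.2f}%")
```

The exact values are about 68.27%, 95.45%, and 99.73%, which the notes round to 68.3%, 95.5%, and 99.7%.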
1. Categorize each distribution as normal or skewed (left or right).
2. Which curve has the greatest variability, A or B?
A
B
3. Sketch and label a normal curve using the given data
x̄ = 100, Sx = 15
4. Use the 68-95-99.7 Rule to find the probability of the shaded area
5. A set of data is normally distributed with a mean of 10 and standard deviation 2.
Sketch and label a standard normal curve, and answer the following questions.
A. What percent of the data is above 15?
B. What percent of the data is between 7 and 11?
C. Find the value of X that represents the 80th percentile.
14.4 The Normal Distribution HW
1. Sketch a normal curve with a mean of 75 and a standard deviation of 10.
2. Sketch a normal curve with a mean of 75 and a standard deviation of 5.
3. Which curve (from numbers 1 and 2) displays less variability? Explain your answer.
4. Sketch a curve that represents data that is NOT normally distributed.
5. The mean of a set of normally distributed data is 550 and the standard deviation is 35.
a. Sketch a curve that represents the frequency distribution.
b. What percent of the data is between 515 and 585?
c. Name the interval about the mean in which about 99% of the data are located.
d. If there are 200 values in the set of data, how many would be between 480 and 620?
6. A set of 500 values is normally distributed with a mean of 24 and a standard deviation of 2.
a. What percent of the data is in the interval 22-26?
b. What percent of the data is in the interval 20-30?
c. Find the interval about the mean that includes 95% of the data.
Level 4 Notes
Normal Distribution and Z-scores
Most of the time, individual scores do not fall exactly 1, 2, or 3 standard deviations
from the mean. You can describe where an individual score falls within a distribution by
describing that score’s location relative to the mean or median. Percentiles measure location
relative to the median; z-scores measure location relative to the mean.
The z-score,
$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{x - \bar{x}}{\sigma},$$
is a measure of position that indicates the number of standard deviations a value lies from the
mean. X, sometimes called the raw score, represents values in the nonstandard normal distribution.
Z represents values in the standard normal distribution. Z-scores can be positive or negative:
positive z-scores are above the mean and negative z-scores are below the mean.
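The z-score formula translates directly into a one-line function. A minimal sketch; the mean of 45 and standard deviation of 12 in the usage lines are illustrative values only.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean (negative = below)."""
    return (x - mean) / sd

# Illustrative values (assumed): mean 45, standard deviation 12
print(z_score(60, 45, 12))   # above the mean, so positive
print(z_score(35, 45, 12))   # below the mean, so negative
```

A value equal to the mean always has a z-score of 0.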
A percentile is the percent of the data that lies to the left of (below) a value.
In a normal distribution the mean and median are equal, so the fiftieth percentile is the mean.
+1 standard deviation from the mean is the 84.15 percentile (50% + 68.3%/2).
-1 standard deviation from the mean is the 15.85 percentile (50% − 68.3%/2).
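For comparison, the exact normal curve gives slightly different percentiles than the rounded Empirical Rule. This sketch uses `statistics.NormalDist(mean, sd).cdf`, which returns the area to the left of a value; the helper name `percentile` is hypothetical.

```python
from statistics import NormalDist

def percentile(x, mean, sd):
    """Percent of a normal distribution lying to the left of x."""
    return 100 * NormalDist(mean, sd).cdf(x)

print(round(percentile(0, 0, 1), 2))    # the mean
print(round(percentile(1, 0, 1), 2))    # +1 standard deviation
print(round(percentile(-1, 0, 1), 2))   # -1 standard deviation
```

The exact values are 50, about 84.13, and about 15.87; the 84.15 and 15.85 above come from using the rounded 68.3%.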
Example 2 – not nice numbers
A survey indicates that for each trip to the grocery store, a shopper spends an average of x̄ = 45
minutes, with a standard deviation of σ = 12 minutes. The length of time spent in the store is
normally distributed and is represented by the variable x. Draw a normal curve for each
situation, express each probability as an inequality, and answer the following questions.
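[Normal curve for the shopping times, with the axis marked at 9, 21, 33, 45, 57, 69, 81 minutes (the mean ± 1, 2, and 3 standard deviations)]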
a. What is the probability that a shopper will be in the store for less than 35 minutes?
$$z = \frac{x - \bar{x}}{\sigma} = \frac{35 - 45}{12} \approx -0.8333$$
Now using Table A, we find -.83: the -.8 is the row on the left, and the .03 is the column,
for an answer of .2033 or 20.33%. So P(x < 35) ≈ .2033.
b. What is the probability that a shopper will be in the store for more than 60 minutes?
$$z = \frac{x - \bar{x}}{\sigma} = \frac{60 - 45}{12} = 1.25$$
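Now using Table A, we find 1.25: the 1.2 is the row on the left, and the .05 is the
column, for a table value of .8944. Table A gives the area to the left of z, but we want more
than 60 minutes (the area to the right), so subtract from 1: 1 − .8944 = .1056 or 10.56%.
So P(x > 60) ≈ .1056.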
c. What is the probability that a shopper will be in the store between 20 and 30 minutes?
$$z = \frac{x - \bar{x}}{\sigma} = \frac{20 - 45}{12} \approx -2.08$$
Now using Table A, we find -2.08 = the -2.0 is the row on the left, and the .08 is the
column for an answer of .0188 or 1.88%
$$z = \frac{x - \bar{x}}{\sigma} = \frac{30 - 45}{12} = -1.25$$
Now using Table A, we find -1.25 = the -1.2 is the row on the left, and the .05 is the
column for an answer of .1056 or 10.56%
Now because we want the percent between the two values, 20 and 30, you subtract the
two percents:
10.56-1.88 = 8.68% or .0868
d. What is the probability that a shopper will be in the store between 40 and 47 minutes?
$$z = \frac{x - \bar{x}}{\sigma} = \frac{40 - 45}{12} \approx -0.42$$
Now using Table A, we find -.42 = the -.4 is the row on the left, and the .02 is the column
for an answer of .3372 or 33.72%
$$z = \frac{x - \bar{x}}{\sigma} = \frac{47 - 45}{12} \approx 0.17$$
Now using Table A, we find .17 = the .1 is the row on the left, and the .07 is the column
for an answer of .5675 or 56.75%
Because Table A gives the area to the left of each z-score, the percent between 40 and 47
(which lie on opposite sides of the mean) is the larger area minus the smaller area:
56.75 − 33.72 = 23.03% or .2303
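The table lookups in parts a through d can be reproduced in code. This sketch uses a hypothetical helper `table_a` that imitates a printed table by rounding z to two decimals and the area to four decimals; without that rounding, the exact answers come out slightly different. Left areas come straight from the table, right areas are 1 minus the table value, and "between" areas are the larger left area minus the smaller.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def table_a(z):
    """Area to the left of z, rounded like a printed table (hypothetical helper)."""
    return round(STANDARD_NORMAL.cdf(round(z, 2)), 4)

def z_score(x, mean, sd):
    return (x - mean) / sd

MEAN, SD = 45, 12  # minutes in the store

p_less_35 = table_a(z_score(35, MEAN, SD))                                      # P(x < 35)
p_more_60 = round(1 - table_a(z_score(60, MEAN, SD)), 4)                        # P(x > 60)
p_20_to_30 = round(table_a(z_score(30, MEAN, SD)) - table_a(z_score(20, MEAN, SD)), 4)
p_40_to_47 = round(table_a(z_score(47, MEAN, SD)) - table_a(z_score(40, MEAN, SD)), 4)
print(p_less_35, p_more_60, p_20_to_30, p_40_to_47)
```

These match the worked answers: .2033, .1056, .0868, and .2303.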
e. What is the interval around the mean that contains 40% of the scores?
We need to find two values:
20% above the mean, which is 70%, and 20% below the mean which is 30%
First we need to find the correct z score from the table,
Looking at the right-sided graph, find the percent that most closely matches .70: that is
.6985
The corresponding z-score is .52 (0.5 from the row and .02 from the column)
$$z = \frac{x - \bar{x}}{\sigma} \quad\Rightarrow\quad 0.52 = \frac{x - 45}{12} \quad\Rightarrow\quad x = 45 + 0.52(12) = 51.24$$
Now look at the left-sided graph and find the percent that most closely matches .30: that
is .3015
The corresponding z-score is -.52 (-0.5 from the row and .02 from the column)
$$z = \frac{x - \bar{x}}{\sigma} \quad\Rightarrow\quad -0.52 = \frac{x - 45}{12} \quad\Rightarrow\quad x = 45 - 0.52(12) = 38.76$$
So the interval around the mean that contains 40% of the scores is 38.76 to 51.24 minutes.
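Instead of searching the table for the closest percent, `NormalDist.inv_cdf` finds the value at a given percentile directly. This sketch assumes the shopping-time distribution (mean 45, standard deviation 12); the helper name `central_interval` is hypothetical.

```python
from statistics import NormalDist

shopping_time = NormalDist(mu=45, sigma=12)  # minutes in the store

def central_interval(dist, share):
    """Interval centered on the mean that holds `share` of the data (e.g. 0.40)."""
    tail = (1 - share) / 2                    # 30% in each tail when share = 0.40
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)

low, high = central_interval(shopping_time, 0.40)
print(f"{low:.2f} to {high:.2f} minutes")
```

The exact z-score is about 0.5244 rather than the table's 0.52, so this gives about 38.71 to 51.29 minutes, slightly wider than the table-based answer.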
Level 3 Day 2 Normal Distribution with Z-scores HW
1. A set of data is normally distributed with a mean of 82 and a standard deviation of 4.
a. What is the probability that a data value is less than 88?
b. What is the probability that a data value is less than 76?
c. What is the probability that a data value is between 76 and 88?
d. What is the probability that a data value is greater than 88?
e. What is the probability that a data value is greater than 76?
2. The mean of a set of normally distributed data is 402, and the standard deviation is
36.
a. What percent of the data is less than 417?
b. What percent of the data is between 387 and 417?
c. What percent of the data is greater than 387?
d. What percent of the data is between 362 and 442?
3. A set of data is normally distributed with a mean of 140 and a standard deviation of
20.
a. What percent of the data is greater than 105?
b. What percent of the data is between 130 and 180?
4. What is the probability of scoring less than a 22 on the ACT, given that the mean is
21.1 and the standard deviation is 5.1?
5. What is the probability of scoring greater than a 25 on the ACT, given that the mean
is 21.1 and the standard deviation is 5.1?