Level 1 Notes: Data Presentation
The first requirement of any stats unit is to understand that it is built around a set of data. For the introduction, we are going to look at different ways to present data and some of the basic terms. ALWAYS make sure all values are accounted for (in this case, 24 data points).

Here is the set of data used for the following. The average heart rates (beats per min.) for the HS stats class are:
60 64 74 66 46 90 65 67 75 68 81 69 72 68 73 67 69 67 73 77 55 63 69 63

Let us look at some different ways to present data.

Stem and Leaf Plot (also called a Stem Plot)
Requirements: Stems can be the first digit, the first two digits, etc. The leaves are only the last digit (the digit of importance). There always has to be a key.
4 | 6
5 | 5
6 | 03345677788999
7 | 233457
8 | 1
9 | 0
Key: 4|6 = 46 beats per min

Dotplots
Requirements: Every score is represented by a picture or a dot (x). There is an equal space for every value.
[Dotplot: one x for each heart rate, on an axis from 45 to 90 labeled "Avg. Heart Rate, beats per min."]

Frequency Table
Requirements: Create classes; they are the "ranges" of each category. Then determine how many scores are in each class. If a score falls on the boundary between two classes, always count it in the higher class.
Class    Frequency
45-55    1
55-65    5
65-75    14
75-85    3
85-95    1
Total    24

Histogram
Requirements: You need to use a frequency table to present the data. This is similar to a bar graph, but the axes must be labeled, and you need a break mark if the axes do not start at (0,0).
[Histogram: "Frequency of Heart Rates"; horizontal axis: Avg. Heart Rate, beats per min. (45 to 95); vertical axis: frequency (0 to 14)]

Relative Cumulative Frequency Plot
Requirements: You need to use a frequency table to present the data. The horizontal axis is the same as the histogram's, but this presentation shows cumulative percents of the scores. The plot starts where the first score is, and by the end you are at 100%. You are tallying your total percents from left to right.
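The stem plot, frequency table, and relative cumulative frequencies for the heart-rate data can be double-checked with a short script. This Python sketch is an illustration added to these notes, not part of the original lesson. The helper names `stem_plot` and `frequency_table` are hypothetical. It assumes whole-number data with the tens digit as the stem, and classes 10 beats wide starting at 45, with a boundary score counted in the higher class as the frequency-table rule requires.

```python
from collections import Counter

# Average heart rates (beats per min.) for the HS stats class, copied from the notes.
heart_rates = [60, 64, 74, 66, 46, 90, 65, 67, 75, 68, 81, 69,
               72, 68, 73, 67, 69, 67, 73, 77, 55, 63, 69, 63]


def stem_plot(data):
    """Return stem-plot rows: tens digit as the stem, ones digit as the leaf.

    Stems with no data are skipped (none are empty for this data set).
    """
    leaves = {}
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        leaves.setdefault(stem, []).append(str(leaf))
    return [f"{stem} | {''.join(leaves[stem])}" for stem in sorted(leaves)]


def frequency_table(data, start, width, n_classes):
    """Return (low, high, count) for classes [low, high).

    Integer division puts a score on a boundary (e.g. 55) into the higher class.
    """
    counts = Counter((value - start) // width for value in data)
    return [(start + i * width, start + (i + 1) * width, counts[i])
            for i in range(n_classes)]


rows = stem_plot(heart_rates)
table = frequency_table(heart_rates, start=45, width=10, n_classes=5)

for row in rows:
    print(row)
print("Key: 4|6 = 46 beats per min")

cumulative = 0
for low, high, count in table:
    cumulative += count
    rcf = cumulative / len(heart_rates)  # relative cumulative frequency
    print(f"{low}-{high}: frequency {count}, R.C.F. {cumulative}/{len(heart_rates)} = {rcf:.4f}")
```

Because `frequency_table` uses integer division, a score of exactly 55 lands in the 55-65 class, which matches the "count it in the higher class" rule. `stem_plot` skips stems that have no data, so a data set with gaps would need its empty stems added by hand.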
Step one: look at the frequency table and calculate the relative cumulative frequencies (24 total).
Class    Frequency    Cumulative    R.C.F.
45-55    1            1             1/24 ≈ .0417 (4.17%)
55-65    5            6             6/24 = .2500 (25%)
65-75    14           20            20/24 ≈ .8333 (83.33%)
75-85    3            23            23/24 ≈ .9583 (95.83%)
85-95    1            24            24/24 = 1.0000 (100%)
[Relative cumulative frequency plot: R.C.F. (%) from 0 to 100 on the vertical axis, Avg. Heart Rate, beats per min. (45 to 95) on the horizontal axis, with points at 4.17%, 25%, 83.33%, 95.83%, and 100%]

"Math Stuff to Find"
Mean: arithmetic average
Median: middle number (when the data are arranged in numerical order)
Mode: most common value (the one that occurs most often)
Range: largest value minus smallest value (use the actual data values, not the boundaries of the classes)

Level 1 Notes: Data Presentation HWK
On a separate sheet of paper, for each of the following data sets, create a stem plot, dotplot, frequency table, histogram, and relative cumulative frequency plot, and find the mean, median, mode, and range.
1. Scores of all the Super Bowl champions (arranged in order): 52, 49, 48, 46, 43, 42, 39, 38, 38, 37, 35, 35, 35, 34, 34, 34, 33, 32, 32, 31, 31, 31, 31, 30, 29, 27, 27, 27, 27, 27, 26, 24, 24, 24, 23, 23, 21, 21, 21, 20, 20, 20, 17, 16, 16, 16, 14
2. Just random data (arranged in order): 30,30,30,30,30,30,30,30,32,32,32,32,32,32,32,34,34,34,34,34,34,36,36,36,36,36,38,38,38,38,40,40,40,40,42,42,42,44,44,44,46,46,46,48,48,50,50,52,52,54,54,56,56,58,60,62,64,66,68,70
3. Just random data (arranged in order): 30,32,34,36,36,38,38,40,40,40,42,42,42,44,44,44,44,46,46,46,46,46,48,48,48,48,48,48,50,50,50,50,50,50,52,52,52,52,52,52,54,54,54,54,54,56,56,56,56,58,58,58,60,60,60,62,62,64,64,66,68,70
4. Just random data (arranged in order): 30,32,34,36,38,40,40,42,42,44,44,46,46,48,48,48,50,50,50,52,52,52,54,54,54,56,56,56,56,58,58,58,58,58,60,60,60,60,60,62,62,62,62,62,62,64,64,64,64,64,64,66,66,66,66,66,68,68,68,68,70,70,70
5.
Just random data (not in order): 12, 18, 40, 60, 34, 85, 49, 75, 32, 18, 55, 55, 64, 23, 46, 72, 64, 55, 11, 81, 64, 53, 32, 31, 55, 49, 67, 21

Level 2 Notes: Box and Whisker Plots
Measure of Variability: a number that represents the spread (or the diversity) of a set of data.
**The larger the measure of variability, the more spread out the data are.
Range: the difference between the two extremes (maximum minus minimum).
Five-number summary: Min, Q1, Q2 (the median), Q3, Max

Breaking Data into Quartiles
Quartiles: the values that break an ordered set of data into four groups. They are the median of the whole set and the medians of the two halves created by that median.
1) List all data values in order from least to greatest and find the median (Q2).
2) Take the first (lower) half of the data and find the median of that set. *That median is called Q1.
3) Find the median of the second (upper) half of the data. *That median is called Q3.
***If there is an odd number of data values, Q2 will be an actual data value, and it is shared by both the first and second halves when finding Q1 and Q3.

Interquartile Range (IQR): the difference between Q3 and Q1.
**IQR = Q3 - Q1

Level 2 Homework
Give the five-number summary for the values in each set of data, mention any outliers, and draw a box and whisker plot for each.
a. {$4.45, $5.50, $5.50, $6.30, $7.80, $11.00, $12.20, $17.20}
b.
{2, 0, 0, 7, 1, 0, 10, 3, 93, 13, 44, 170, 30}

Level 3 Day 1 Notes: Standard Deviation
Standard Deviation (σ): a measure of the average amount by which individual items of data deviate from the mean of all the data.
*In plain English: how much all of the data vary compared to the __________
If a set of data has a small standard deviation, the data is __________
If a set of data has a large standard deviation, the data is __________ spread
Standard Deviation: the square root of the mean of the squares of the deviations from the arithmetic mean:
$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$

Steps for finding the Standard Deviation:
Step 1: Find the __________ of the set of data.
Step 2: __________ the mean from each individual data value.
Step 3: Square each answer from __________.
Step 4: __________ all answers from Step 3.
Step 5: Divide the answer from __________ by the __________ of data values.
Step 6: Take the __________ of the answer from Step 5.

Example 1: Find the standard deviation of the data set {20, 47, 72, 58, 16}.
Example 2: Find the standard deviation of the data set {369, 398, 381, 392, 406, 413, 376, 454, 420, 385, 402, 446}.

FINDING STANDARD DEVIATION ON YOUR CALCULATOR:
Step 1: Enter the data values into L1 (STAT, then 1: Edit…).
Step 2: Press STAT, scroll right to CALC, and choose 1: 1-Var Stats.
Step 3: Press ENTER. *The standard deviation is the σx value.

Example 3: Find the standard deviation of the set of data manually: {23, 21, 12, 10, 26}.
Example 4: Find the standard deviation of the set of data: {12, 13, 93, 19, 64, 18, 31, 78, 1, 51, 42, 19, 83, 20}.

Level 4 Notes: Normal Distribution Properties
1. The mean, median, and mode are equal.
2. The normal curve is bell-shaped and is symmetric about the mean.
3. The total area under the curve is equal to one. The area of a region under a probability curve is equal to the probability that the random variable will have a value in the corresponding interval.
4.
The normal curve approaches, but never touches, the x-axis as it extends farther and farther away from the mean.
5.
6. Between μ − σ and μ + σ, the graph curves downward. The graph curves upward to the left of μ − σ and to the right of μ + σ. The points at which the curve changes from curving upward to curving downward are called inflection points. (Essential when drawing normal curves.)
7. About 68.3% of the data is contained within 1 standard deviation of the mean. About 95.4% of the data is contained within 2 standard deviations of the mean. About 99.7% of the data is contained within 3 standard deviations of the mean. (This is referred to as the Empirical Rule, or the 68-95-99.7 Rule.)

1. Categorize each distribution as normal or skewed (left or right).
2. Which curve has the greatest variability, A or B?
[Figure: curves A and B]
3. Sketch and label a normal curve using the given data: x̄ = 100, Sx = 15.
4. Use the 68-95-99.7 Rule to find the probability of the shaded area.
5. A set of data is normally distributed with a mean of 10 and a standard deviation of 2. Sketch and label a standard normal curve, and answer the following questions.
A. What percent of the data is above 15?
B. What percent of the data is between 7 and 11?
C. Find the value of x that represents the 80th percentile.

14.4 The Normal Distribution HW
1. Sketch a normal curve with a mean of 75 and a standard deviation of 10.
2. Sketch a normal curve with a mean of 75 and a standard deviation of 5.
3. Which curve (from numbers 1 and 2) displays less variability? Explain your answer.
4. Sketch a curve that represents data that is NOT normally distributed.
5. The mean of a set of normally distributed data is 550 and the standard deviation is 35.
a. Sketch a curve that represents the frequency distribution.
b. What percent of the data is between 515 and 585?
c. Name the interval about the mean in which about 99% of the data are located.
d.
If there are 200 values in the set of data, how many would be between 480 and 620?
6. A set of 500 values is normally distributed with a mean of 24 and a standard deviation of 2.
a. What percent of the data is in the interval 22-26?
b. What percent of the data is in the interval 20-30?
c. Find the interval about the mean that includes 95% of the data.

Level 4 Notes: Normal Distribution and Z-scores
Most of the time, individual scores do not fall exactly 1, 2, or 3 standard deviations from the mean. You can describe where an individual score falls within a distribution by describing that score's location relative to the mean or the median. Percentiles measure location relative to the median; z-scores measure location relative to the mean.
The z-score,
$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{x - \bar{x}}{\sigma}$,
is a measure of position that indicates the number of standard deviations a value lies from the mean. x, sometimes called the raw score, represents values in the nonstandard normal distribution. z represents values in the standard normal distribution. Z-scores can be positive or negative: positive z-scores are above the mean, and negative z-scores are below the mean.
A percentile is the percent of the data at or below a value (reading the curve from left to right). In a normal distribution, the fiftieth percentile is the mean. +1 standard deviation from the mean is about the 84.13th percentile, and -1 standard deviation from the mean is about the 15.87th percentile.

Example 2 – not nice numbers
A survey indicates that for each trip to the grocery store, a shopper spends an average of x̄ = 45 minutes with a standard deviation of σ = 12 minutes. The length of time spent in the store is normally distributed and is represented by the variable x. Draw a normal curve for each situation, express each probability as an inequality, and answer the following questions.
[Normal curve: axis marked at 9, 21, 33, 45, 57, 69, 81 (the mean ± 1, 2, and 3 standard deviations)]
a. What is the probability that a shopper will be in the store for less than 35 minutes?
$z = \frac{x - \bar{x}}{\sigma} = \frac{35 - 45}{12} \approx -0.8333$
Using Table A, we look up -0.83: -0.8 is the row on the left and .03 is the column, for an answer of .2033. So P(x < 35) ≈ .2033, or 20.33%.

b. What is the probability that a shopper will be in the store for more than 60 minutes?
$z = \frac{x - \bar{x}}{\sigma} = \frac{60 - 45}{12} = 1.25$
Using Table A, we look up 1.25: 1.2 is the row on the left and .05 is the column, for an area of .8944. Table A gives the area to the LEFT of z, so for "more than" we subtract from 1: P(x > 60) = 1 - .8944 = .1056, or 10.56%.

c. What is the probability that a shopper will be in the store between 20 and 30 minutes?
$z = \frac{x - \bar{x}}{\sigma} = \frac{20 - 45}{12} \approx -2.08$
Using Table A, we look up -2.08: -2.0 is the row on the left and .08 is the column, for an answer of .0188, or 1.88%.
$z = \frac{x - \bar{x}}{\sigma} = \frac{30 - 45}{12} = -1.25$
Using Table A, we look up -1.25: -1.2 is the row on the left and .05 is the column, for an answer of .1056, or 10.56%.
Because we want the percent between the two values, 20 and 30, subtract the two percents: 10.56% - 1.88% = 8.68%. So P(20 < x < 30) ≈ .0868.

d. What is the probability that a shopper will be in the store between 40 and 47 minutes?
$z = \frac{x - \bar{x}}{\sigma} = \frac{40 - 45}{12} \approx -0.42$
Using Table A, we look up -0.42: -0.4 is the row on the left and .02 is the column, for an answer of .3372, or 33.72%.
$z = \frac{x - \bar{x}}{\sigma} = \frac{47 - 45}{12} \approx 0.17$
Using Table A, we look up 0.17: 0.1 is the row on the left and .07 is the column, for an answer of .5675, or 56.75%.
Because Table A gives the area to the left of each z-score, the area between them is the larger area minus the smaller one (this works even though 40 and 47 are on opposite sides of the mean): 56.75% - 33.72% = 23.03%. So P(40 < x < 47) ≈ .2303.

e. What is the interval around the mean that contains 40% of the scores?
We need to find two values: one 20% above the mean, which is the 70th percentile, and one 20% below the mean, which is the 30th percentile.
First, find the correct z-score from the table. On the positive (right-sided) part of Table A, the area that most closely matches .70 is .6985. The corresponding z-score is 0.52 (0.5 from the row and .02 from the column).
$0.52 = \frac{x - 45}{12} \Rightarrow x = 45 + 0.52(12) = 51.24$
Now look at the negative (left-sided) part of Table A: the area that most closely matches .30 is .3015. The corresponding z-score is -0.52 (-0.5 from the row and .02 from the column).
$-0.52 = \frac{x - 45}{12} \Rightarrow x = 45 - 0.52(12) = 38.76$
So the interval around the mean that contains 40% of the scores is 38.76 to 51.24 minutes.

Level 3 Day 2: Normal Distribution with Z-scores HW
1. A set of data is normally distributed with a mean of 82 and a standard deviation of 4.
a. What is the probability that a data value is less than 88?
b. What is the probability that a data value is less than 76?
c. What is the probability that a data value is between 76 and 88?
d. What is the probability that a data value is greater than 88?
e. What is the probability that a data value is greater than 76?
2. The mean of a set of normally distributed data is 402, and the standard deviation is 36.
a. What percent of the data is less than 417?
b. What percent of the data is between 387 and 417?
c. What percent of the data is greater than 387?
d. What percent of the data is between 362 and 442?
3. A set of data is normally distributed with a mean of 140 and a standard deviation of 20.
a. What percent of the data is greater than 105?
b. What percent of the data is between 130 and 180?
4. What is the probability of scoring less than a 22 on the ACT, given that the mean is 21.1 and the standard deviation is 5.1?
5. What is the probability of scoring greater than a 25 on the ACT, given that the mean is 21.1 and the standard deviation is 5.1?
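The Table A lookups in Example 2 can be checked in Python with `statistics.NormalDist` (standard library, Python 3.8+): its `cdf` method gives the area to the left of a value, and its `inv_cdf` method goes from an area back to a value. This is a sketch added to the notes, not the method the lesson teaches. The helper names (`z_score`, `area_left`, `area_right`, `area_between`, `central_interval`) are hypothetical, and the `table_rounding` option is an assumption meant to mimic reading Table A by rounding each z-score to two decimals.

```python
from statistics import NormalDist

# Standard normal curve (mean 0, standard deviation 1): the curve Table A describes.
STANDARD_NORMAL = NormalDist(mu=0, sigma=1)


def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean: (x - mean) / sd."""
    return (x - mean) / sd


def area_left(x, mean, sd, table_rounding=True):
    """P(X < x): the area to the left of x, like a Table A lookup.

    With table_rounding=True the z-score is first rounded to two decimals,
    mimicking reading Table A by row (first decimal) and column (second decimal).
    """
    z = z_score(x, mean, sd)
    if table_rounding:
        z = round(z, 2)
    return STANDARD_NORMAL.cdf(z)


def area_right(x, mean, sd, table_rounding=True):
    """P(X > x): one minus the area to the left."""
    return 1 - area_left(x, mean, sd, table_rounding)


def area_between(a, b, mean, sd, table_rounding=True):
    """P(a < X < b), assuming a < b: left-area at b minus left-area at a."""
    return area_left(b, mean, sd, table_rounding) - area_left(a, mean, sd, table_rounding)


def central_interval(proportion, mean, sd):
    """Interval centered on the mean holding the given proportion (exact inverse, no table rounding)."""
    dist = NormalDist(mu=mean, sigma=sd)
    tail = (1 - proportion) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


# Example 2: time in the grocery store, mean 45 minutes, standard deviation 12 minutes.
MEAN, SD = 45, 12
print(f"a. P(x < 35)      = {area_left(35, MEAN, SD):.4f}")
print(f"b. P(x > 60)      = {area_right(60, MEAN, SD):.4f}")
print(f"c. P(20 < x < 30) = {area_between(20, 30, MEAN, SD):.4f}")
print(f"d. P(40 < x < 47) = {area_between(40, 47, MEAN, SD):.4f}")
low, high = central_interval(0.40, MEAN, SD)
print(f"e. 40% interval   = {low:.2f} to {high:.2f}")
```

With `table_rounding` left on, parts a through d agree with the worked answers to within about .0001. Part e uses the exact inverse, so it gives roughly 38.71 to 51.29 rather than 38.76 to 51.24; the small gap comes from rounding z to 0.52 in the table method. The same functions can be pointed at the homework, for example `area_left(88, 82, 4)` for 1a.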