Statistics Workshop – Day 1
Reviewing Some Concepts

Part 1 – Snapshot of the Annenberg Series: “Against all Odds”
This is a great FREE resource! Remember to register later on.
http://mcmath.blogdrive.com/
- Scroll down to the section “Free Online Annenberg Video Series on Teaching/Learning Math”
- Click on the Annenberg Project Video Series – Statistics
- Click on the VoD logo and register

Part 2 – Read and Comment
Literary Digest
For the 1936 presidential election, Literary Digest conducted a poll to determine the winner. Over 10 million questionnaires were sent to those who owned automobiles and/or telephones. Over 2.4 million questionnaires were returned, and Literary Digest predicted that Alf Landon would defeat Franklin D. Roosevelt with 57% of the vote. George Gallup also conducted a poll of 50,000 random voters and predicted Roosevelt as the winner. Many people laughed at Gallup because Literary Digest had been correctly predicting the outcome of the presidential election since 1916 and based its predictions on such a large sample. Gallup was correct, and Roosevelt won with 62% of the vote. Where did Literary Digest go wrong? In the end, Literary Digest went bankrupt and Gallup started his own company, which still predicts the elections.

Part 3 – Note: Tutorials are also available on my web page: http://www.montgomerycollege.edu/faculty/~maronne/
Review some concepts by means of Tutorial (1), then comment on the following:
- Distinction between:
  - Population and sample
  - Parameter and statistic
  - Descriptive and inferential statistics
- Why sampling?
- Why random selection?
- Importance of simple random samples
- Biased, unbiased sampling techniques

Selecting Random Samples (Section 1.4)
Simulating Experiments
Describing Data Sets with Tables and Graphs (Sections 2.2, 2.3)

Part 4 - Select 5 students at random from your Statistics class.
4-a) Use the TI-83/84 calculator to generate 5 different random integers from 1 to 28.
The instruction in the home screen of your calculator should read: randInt(1,28)
(Need help? See calculator section, item 1)
4-b) List the five numbers obtained:
4-c) Check with a classmate. Are his/her numbers the same as yours? Explain.
4-d) Check with the class roster shown on the transparency to name the students selected.
4-e) Comment on the importance of random selection.

Part 5 - Use random numbers to simulate rolling a “fair” six-sided die 60 times.
5-a) Use the calculator to simulate rolling a fair die once. Indicate the instruction entered in the calculator. Press ENTER a few times, observe the outcomes, and reflect on what you are doing.
5-b) There is a shortcut to simulate rolling the die 60 times. We are going to clear a list (L1), generate 60 integers from 1 to 6, and store the numbers into L1. We’ll access the editor to explore the list and record the outcomes in a table.
The instruction in the home screen of your calculator should read: ClrList L1:randInt(1,6,60)→L1
Note: we use “:” (colon) to separate statements.
(Need help? See calculator section, item 2)

[Table for recording the outcomes]

5-b) Do you have any suggestions to make the counting process easier?
5-c) Think of a way of graphing the information contained in this table. Show graph above.

Counting is tedious; on the next page you are given instructions to get help from the TI-83 to determine the “counts”. First we need to review some vocabulary.

Constructing Frequency Tables and Histograms (Sections 2.2, 2.3)

Part 6 - Review some concepts by means of Tutorial (2), then comment on the following:
- Why frequency distributions? Advantages and disadvantages
- If you have to choose a representative of a class, what number is a good choice?
- Why class boundaries?

Part 7 - Use the calculator to sketch the histogram of the data stored in L1. Trace the histogram to read the frequencies of the classes. Display results below.
(Need help? See calculator section, item 3)

Frequency Distribution. Label.
Histogram. Label.

Part 8 – Reflect on what we have done and comment on the following:
- Randomness
- Equally Likely Outcomes (Chapter 3)
- Unpredictability of a single outcome
- Long run regularity
- Law of Large Numbers (Chapter 3)
- Uniform distribution
- Theoretical distribution
- Sampling error

Part 9 – Copy data into another list
To keep the generated data available for future use, we’d like to copy it into another list labeled RNDIE.
(Need help? See calculator section, item 4)

Law of Large Numbers (Chapter 3)
Collecting Class Results
Sketching Histograms using Grouped Data (Sections 2.2, 2.3)

Part 10: Collecting class results.
10-a) In part 7, you constructed the frequency distribution for the simulation of the experiment of rolling a die 60 times. Look up your results and write them on the board. We’ll produce a new frequency distribution with the results of the class, and graph the corresponding histogram by hand. What do you expect the shape of this new distribution to be? Refer to the Law of Large Numbers in your explanation.

Class Results Frequency Distribution
Numbers    Frequencies

Show the histogram. Label.

Finding the Mean and Standard Deviation (Section 2.4)
The Standard Deviation as a Ruler (Section 2.4)
Range Rule of Thumb (Section 2.4)

Part 11 - Review some concepts by means of Tutorial (3), then comment on the following:
- Mean versus median in a skewed distribution
- Resistant measure of the center
- Sum of the deviations, the mean as a “fulcrum”
- Difference between the formulas for standard deviation for a sample and population
- Usual and unusual values according to the range rule of thumb
- Empirical rule
- Chebyshev’s theorem

Part 12: Mean and standard deviation of grouped data
12-a) Find the mean and standard deviation of the distribution of class results shown on the previous page.
The instruction in the home screen of your calculator should read: mean(L1, L2) (or stdDev(L1, L2))
(Need help? See calculator section, item 5)
mean (L1, L2) =
standard deviation (L1, L2) =
12-b) Think about our experiment of rolling a die and recording the outcome. Is it unusual to roll a 6? Are any of the outcomes unusual? Use your intuition to answer.
12-c) Use the values of the mean and standard deviation to label the scale given below. Use the range rule of thumb to comment on usual and unusual values. Do the results agree with your answer to part 12-b?

___|__________|__________|__________|__________|__________|__________|
 x̄-3s       x̄-2s       x̄-s         x̄        x̄+s       x̄+2s       x̄+3s

Probability Distributions and Histograms (Chapter 4)
Mean and Standard Deviation of Probability Distributions (Chapter 4)

Part 13 - Review some concepts by means of Tutorial (4), then comment on the following:
- Similarities/differences between relative frequency distributions/histograms and probability distributions/histograms
- Is the random variable in our experiment of rolling a die and recording the outcome discrete or continuous?
- Correspondence between areas and probabilities
- Requirements for a probability distribution compared to what you learned about relative frequencies
- Formula for finding the mean of a probability distribution compared to the one used for finding the mean of a frequency distribution
- Range rule of thumb for determining unusual results compared to the probability rule
- How to use the calculator to find the mean and standard deviation of probability distributions

Relative Frequency Distributions and Histograms (Sections 2.2, 2.3)
Probability Distributions and Histograms (Section 4.2)
Mean and Standard Deviation of Probability Distributions (Section 4.2)

Part 14: Use the class results from part 10.
14-a) Construct a relative frequency distribution and a relative frequency histogram.
14-b) Construct a probability distribution and probability histogram.
14-c) Construct the theoretical probability distribution and histogram for the experiment of rolling a die and recording the outcome.
14-d) Find the mean and standard deviation for the distributions of parts (b) and (c).
(Need help? See calculator section, items 5 & 6)

(a) Relative Frequency Distribution from Class Results
Numbers Obtained    Relative Frequencies*100 (%)
Relative Frequency Histogram. Label.

(b) Probability Distribution from Class Results
Random Variable, x    Probability, P(x)
Probability Histogram. Label.
Mean =
St. Deviation =

(c) Theoretical Probability Distribution
Random Variable, x    Probability, P(x)
Probability Histogram. Label.
Mean =
St. Deviation =

Distribution of Sample Means (Section 5.5)
Central Limit Theorem (Section 5.5)

Part 15 - Consider the theoretical uniform distribution of the experiment of rolling a die and recording the outcome. The mean and standard deviation of this population were obtained in part 14-c. The parameters for that population are:
μ =
σ =
15-a) Think of the list RNDIE that you have in your calculator as a sample that was selected at random from this population. Find the mean of RNDIE and write your result here:
x̄ =
15-b) Since each of us has a RNDIE list, we can say that we have selected 28 samples of size 60 from this theoretical population. We are going to enter each of the 28 sample means in the overhead calculator. Let’s use list 6. This creates a new distribution: the distribution of sample means for samples of size 60. Before doing that, just think about this new distribution of sample means for samples of size 60. Comment on the shape, the mean, and the standard deviation. How do you think they compare to the shape, mean, and standard deviation of the original uniform distribution?
15-c) Let’s sketch a histogram for the distribution of sample means for samples of size 60.
Observe its shape, center, and variability. Is it what you predicted?
15-d) Let’s find the mean of the distribution of sample means, which is stored in list 6 of the overhead calculator.
x̄ =
How does it compare to μ?

Part 16 - Review some concepts by means of Tutorial (5), then comment on the following:
- Distribution of sample means
- Mean and standard deviation of the distribution
- Central Limit Theorem

Distribution of Sample Means, Small Sample Size (Section 5.5)
Central Limit Theorem (Section 5.5)

Part 17 – If time permits, simulate rolling a die 10 times, store the numbers into L2, and find the mean of L2. Collect the class results to generate the distribution of sample means for samples of size 10. Sketch the histogram and observe its shape. Find the mean and standard deviation of this distribution. Compare your results with what is predicted by the Central Limit Theorem.

Importance of the Class Width (Section 2.3)

Part 18: Histograms
Observe how the selection of the class width “changes” the “story” portrayed by the graph. Decide what class width provides the best picture of the data.
Access my web page: http://montgomerycollege.edu/~maronne/
- Click on Statistics Workshop
- Click on Applets
- Click on Histogram

18-1) Assume the data represent grades of students in a test.
a) What is a convenient number to use as the class width?
b) What class width is convenient to use if we want to know if there are any students who scored above 95%?

18-2) Assume the data represent the number of cars that go through a busy intersection from 4 a.m. until 10 a.m. To avoid entering new data that fits this situation, and to be able to use the given histogram, we’ll have to make the assumption that 40 = 4 a.m., 50 = 5 a.m., etc.
A class width of 10 will mean 1-hour intervals.
A class width of 5 will mean .....-minute intervals
A class width of 2.5 will mean .........-minute intervals
A class width of 1 will mean ............-minute intervals
Change the class width from 10 to 5, to 2.5, to 1. You can use the slider, but it’s more exact if you just type the number and press enter.
18-2-a) Give the time interval in which the most cars go through the intersection if you use
i) A class width of 10:
ii) A class width of 5:
iii) A class width of 2.5:
iv) A class width of 1:
18-2-b) You cross the intersection sometime after 5:30 a.m. What is the most convenient time interval to go through the intersection?
18-2-c) What if you are in that area sometime between 6 and 7 a.m.?
18-2-d) What is the best choice of class width if we want to pinpoint the rush hour and avoid the time when the most cars go through the intersection?

Mean versus Median (Section 2.4)

Part 19: Mean and Median
19-1) Access my web page: http://montgomerycollege.edu/~maronne/
- Click on Statistics Workshop
- Click on Applets
- Click on Mean and Median 1
- Click on Mean and Median
- Read Instructions and play with it
- Make sure you drag a point along the line
19-1-a) Observe what happens. Explain.
19-1-b) Which of the two measures of the center is said to be resistant? Explain the meaning of this term.

19-2) Exploring Mean and Median
Objective: To stress the concept: The median is a resistant measure of the center while the mean is affected by extreme values.
- Access the Applets window in my web page
- Click on Mean and Median 2
- Click on Mean versus Median
- Read instructions and play with it
19-2-a) Salaries of U.S. households are skewed to the ....................
If you were reporting results about this population, what measure of the center would you use? Explain.
19-2-b) Dates of coins
Suppose you and your friends emptied your pockets of coins and recorded the year marked on each coin. What do you think the shape of the distribution looks like? Explain.
What measure of the center is more appropriate to use, the mean or the median? Explain why.

Part 20 - Review some concepts by means of Tutorial (6).

Box Plots (Section 2.7)

Part 21 – Using Box Plots to Explore Data
21-1) The data show the average amount of money spent per student in public elementary schools for each of the 50 states and the District of Columbia. The categories are regions: S = south, W = west, NE = northeast, MW = mid-west.
Source: National Center for Education Statistics
We’ll explore the box plot(s) for the Amount (in dollars) Spent per Student in Public Elementary Schools by State. First we’ll look into the Graph for All Data; then we’ll explore the Graphs by Category.
Access my web page: http://montgomerycollege.edu/~maronne/
- Click on Statistics Workshop
- Click on Applets
- Click on Box Plots
- On the “Select a Data set” drop-down menu, select “Amount Spent Per Student”
- Scroll down and click on “Graph by Category”, and then on “Graph all Data”
- Observe the changes
- Scroll down; in the window at the bottom, you can actually see the individual values
21-1-a) Name the variables in this data set.
21-1-b) How would you describe the data: as qualitative or quantitative?
21-1-c) Overall, how many observations are there?
21-1-d) How many observations per region?
21-1-e) Select Graph All Data
- Write down the 5-number summary
Category    N    Min    Q1    Median    Q3    Max
- How would you describe the distribution’s shape? Symmetric, skewed to the right, skewed to the left?
21-1-f) Select Graph by Category
- Write down the 5-number summary for each category
Category    N    Min    Q1    Median    Q3    Max
- Use the back of this paper to answer the following questions:
1. Which of the 4 categories is “closer” to a symmetric distribution?
2. Which of the 4 categories has an outlier (even though it’s not indicated as one)?
3. Overall, what was the median amount spent per student?
4. How does the median amount spent per student in the NE compare to the other regions?
5. Which region spends the most per student?
6. Which spends the least?
7. How do the middle 50% of the data compare for the different regions?
8. Refer to the box plot of the South to answer the following. Explain your choice.
- True or False? The lengths of the different portions of the box suggest that in the selected sample, there are more schools with expenditures above $5109 (Q3) than with expenditures below $5109.

Part 22 – Load some data sets into your calculator
(Need help? See calculator section, item 7)
Here are the data sets that will be loaded into the calculator.
Diet Coke Volume = CDTVL
Diet Coke Weight = CDTWT
Regular Coke Volume = CRGVL
Regular Coke Weight = CRGWT
Diet Pepsi Volume = PDTVL
Diet Pepsi Weight = PDTWT
Regular Pepsi Volume = PRGVL
Regular Pepsi Weight = PRGWT
Head circumferences (cm) of Two-Month-Old Baby-Boys = MHED
Head circumferences (cm) of Two-Month-Old Baby-Girls = FHED

Describing Data Sets

Part 23 - Exploring the distribution of volumes of Regular Coke Cans
Before constructing any graphs, think about the following:
23-a) What is the volume posted on a regular Coke can?
23-b) Think about selecting a sample of regular Coke cans, recording their volume, and using the calculator to sketch the histogram. What do you think the histogram will look like? What shape will this distribution have?
23-c) Load the data set into the editor of the calculator.
(Need help? See calculator section, item 8)
23-d) Construct a histogram, letting your calculator select the window. Is it what you expected? As usual, expand in your explanations.
(Need help? See calculator section, item 4, Note 9)
23-e) Show the frequency distribution and histogram here. If you think it necessary, modify the window values.
23-f) Open a second STAT PLOT to construct the box plot for the data.
You may want to use a large Ymax value in the window of the calculator to fit both graphs. Use both graphs to describe some characteristics of the data set.
(Need help? See calculator section, item 9)

Histograms and Box Plots (Sections 2.3 and 2.7)

Part 24 – Comparing Weights of Diet Coke and Regular Coke by using Box Plots
Before constructing any graphs, think about both box plots.
24-a) Do you think they will have the same length (range)? Will they have the same minimum and maximum, or will one of the plots be farther to the right than the other? If so, which will be to the right?
24-b) Construct a box plot for each of the distributions of the weights of regular and diet Coke. Display both plots in the same window. Is it what you predicted? Compare the graphs and determine whether there appears to be a significant difference between the two distributions. If so, provide a possible explanation for the difference.
Use the scale provided below as a guide to sketch the box plots. Record the 5-number summary and the outliers for each of the distributions. Also mention the smallest and largest numbers of each distribution that are not outliers.

_____|_____|_____|_____|_____|_____|_____|_____|_____|_____|_____|_____|_____|___

Part 25 – Use graphs to compare the head circumferences of two-month-old baby boys and girls.
As usual, think about the situation, make some conjectures, and then verify with the graph. Comment on the results.

Part 26 – The Story of Old Faithful
Use the data sets stored in your calculator:
Intervals between eruptions (in minutes), Old Faithful Geyser, Yellowstone National Park = OFINT
Duration of eruptions (in seconds), Old Faithful Geyser = OFDTN
26-a) Use a histogram and a box plot to graph the distribution of time between eruptions, which is stored in the calculator (OFINT). Comment on the shape and other characteristics of the data.
Notice the advantage of showing both graphs. Some characteristics are captured in one of the graphs and hidden in the other.
Note: The distribution of intervals between eruptions was close to symmetric before the 1959 earthquake. The data in our calculator were collected after this earthquake. The mean interval between eruptions has remained steady at about 65 minutes for the past 100 years, but the earthquake has changed the distribution of eruption intervals. The range of the distribution has also remained steady, but the standard deviation of the eruption intervals has changed. Which of the two distributions of eruption intervals has a larger standard deviation, the one before the 1959 earthquake or the one after?
26-b) In the same window of the calculator, display the box plot and the histogram for the distribution of the duration of eruptions (OFDTN). Comment on the results.

Data

Part 27 – Here is a revision to Part 22
Here are the data sets that will be loaded into the calculator.
AGE = Age of students in two Statistics classes
CDTVL = Diet Coke volume (oz)
CDTWT = Diet Coke weight (lb)
CRCTY = City fuel consumption (mi/gal)
CRGVL = Regular Coke volume (oz)
CRGWT = Regular Coke weight (lb)
CRHWY = Highway fuel consumption (mi/gal)
FHED = Head circumferences of Two-Month-Old Baby-Girls (cm)
HMSP = Selling price of homes in Dutchess County, NY (thousand dollars)
MHED = Head circumferences of Two-Month-Old Baby-Boys (cm)
OFDTN = Duration of eruptions of Old Faithful Geyser (seconds)
OFHT = Height of eruptions of Old Faithful Geyser (feet)
OFINT = Intervals between eruptions of Old Faithful Geyser (minutes)
PDTVL = Diet Pepsi volume (oz)
PDTWT = Diet Pepsi weight (lb)
PRGVL = Regular Pepsi volume (oz)
PRGWT = Regular Pepsi weight (lb)
QRTRS = Weight of quarters
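
Optional: the die simulation away from the calculator
The die-rolling experiment of Parts 5 through 17 can also be tried on a computer. What follows is only a minimal Python sketch and is not part of the workshop materials. It assumes a class of 28 students who each roll a fair die 60 times, as in Part 15. The function names (roll_die, frequency_table, grouped_mean_sd) and the seed value are hypothetical choices made for illustration, and the grouped standard deviation uses the sample formula (dividing by n - 1), which is assumed here to correspond to the calculator's stdDev(L1, L2).

```python
import random
import statistics
from collections import Counter
from math import sqrt


def roll_die(n, rng):
    """Simulate n rolls of a fair six-sided die (similar in spirit to randInt(1,6,n))."""
    return [rng.randint(1, 6) for _ in range(n)]


def frequency_table(rolls):
    """Count how often each face 1..6 appears, including faces that never came up."""
    counts = Counter(rolls)
    return {face: counts.get(face, 0) for face in range(1, 7)}


def grouped_mean_sd(table):
    """Mean and sample standard deviation computed from a value -> frequency table."""
    n = sum(table.values())
    mean = sum(x * f for x, f in table.items()) / n
    variance = sum(f * (x - mean) ** 2 for x, f in table.items()) / (n - 1)
    return mean, sqrt(variance)


# Parameters of the theoretical uniform distribution on 1..6 (each face has probability 1/6).
faces = range(1, 7)
mu = sum(faces) / 6                                   # 3.5
sigma = sqrt(sum((x - mu) ** 2 for x in faces) / 6)   # sqrt(35/12), about 1.71

# Hypothetical class: 28 students, 60 rolls each. The seed only makes reruns repeatable.
rng = random.Random(2024)
samples = [roll_die(60, rng) for _ in range(28)]

# One student's frequency table, with the mean and standard deviation from grouped data.
table = frequency_table(samples[0])
mean0, sd0 = grouped_mean_sd(table)

# Distribution of sample means for samples of size 60.
sample_means = [statistics.mean(s) for s in samples]

print("One student's frequency table:", table)
print("Grouped mean and st. deviation:", round(mean0, 3), round(sd0, 3))
print("Population mu and sigma:", mu, round(sigma, 3))
print("Mean of the 28 sample means:", round(statistics.mean(sample_means), 3))
print("St. deviation of the sample means:", round(statistics.stdev(sample_means), 3))
print("sigma / sqrt(60):", round(sigma / sqrt(60), 3))
```

Changing 60 to 10 in the list of samples gives a rough counterpart to Part 17. The last two printed values can be compared with each other in the same way the class compares its results with the Central Limit Theorem; because the rolls are random, they will be close but not equal.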