LAB ACTIVITY 3
Due Friday, Sept. 9 at 11:59pm
(Use ‘ClassData_FA2016.MTW’ Dataset in ANGEL)

Activity 1: Graphical Summaries for Quantitative Data

1. First we will investigate the variable ‘Height’. Create a histogram for ‘Height’:
   Graph > Histogram > Simple > OK > select ‘Height’ > OK

   a. Does it look like there are some data points that are… ridiculous?

   b. Let’s get rid of the ridiculous data points! Make a boxplot of ‘Height’ using
      Graph > boxplot > simple > select ‘Height’ > OK

      How many outliers can you see? __________________

      Find out which observations represent the five outliers that have height 10 inches or less. You can do this by hovering your mouse over the little stars on the boxplot. This will tell you which observations provided these values. If you can only see a couple outliers, double click on them then select ‘Identical Points’ > Jitter > OK.

      What rows are these observations in? __________, __________, _______, _________, and _____________.

      Now use this information to replace those outlying heights with an asterisk (*). This will tell Minitab to treat them as ‘missing’, and we won’t mess up the analysis. Remember, raw data can be messy!

   c. After removing these observations, make the histogram again. How would you describe the shape of the distribution?
      Roughly Symmetric      Left-Skewed      Right-Skewed

   Let’s get the numerical summary for this variable as well.
   Stat > basic stat > display descriptive statistics > click in the variables box > select ‘Height’ > OK

   d. What is the mean?

   e. What is the standard deviation (StDev)?

   f. Complete the 5-number summary below:
      Minimum =
      Q1 =
      Median =
      Q3 =
      Maximum =

   g. Write a sentence that interprets the median.

   h. Give the value that completes the following sentence: About 1/4 of the students are less than ___ inches tall.

   i. Give the value that completes the following sentence: About 1/4 of the students are more than ___ inches tall.

   j. What is an interval that describes the middle one half of the students’ heights?

   k. Calculate the Inter-Quartile Range (IQR): Q3 - Q1

2. Now let’s compare the variable ‘Height’ for the different genders. Create side by side boxplots:
   Graph > boxplot > one y with groups > OK > put ‘height’ in the box ‘graph variables’ > put ‘gender’ in the box ‘categorical variables for grouping’ > OK

   Now, considering only the boxplots from ‘Male’ and ‘Female’:

   a. Which gender, M or F, has the highest median?

   b. Which gender, M or F, has a larger middle box?

   c. Which gender, M or F, has the most outliers?

   d. Are there any other noticeable differences between genders in their distribution of height?

   Creating a side-by-side boxplot like this one is one of the first steps in answering the following question: Is there a statistically significant difference in height between college aged men and women? More on this later in the semester.

Activity 2: Empirical Rule

1. Car and truck speeds at a particular location have approximately a bell-shaped distribution with mean = 65 mph and standard deviation = 5 mph. Draw a picture to show the approximate distribution of the speeds. Make sure to label important values on the horizontal axis such as the mean, the mean plus and minus one standard deviation, the mean plus and minus two standard deviations, and the mean plus and minus three standard deviations.

   a. About 68% of cars and trucks travel between ______ and _____ mph at this location.

   b. About 95% of cars and trucks travel between _____ and _______ mph at this location.

   c. About 99.7% of cars and trucks travel between _____ and _____ mph at this location.

2. The typical amount of sleep per night for college students has a bell-shaped distribution with a mean of 7 hours and a standard deviation equal to 1.7 hours. Use the empirical rule (and a picture, if you want) to complete the following sentences:

   a. About 68% of college students typically sleep between _____ and _____ hours.

   b. About 95% of college students typically sleep between _____ and _____ hours.

   c. About 99.7% of college students typically sleep between _____ and _____ hours.

3. A sample of n=500 individuals is asked how many hours they typically spend using a computer in a week. The mean response is 8.3 hours and the standard deviation is 7.2 hours. Explain why the distribution cannot be bell-shaped.
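
Optional: checking the arithmetic outside Minitab. The short Python sketch below shows one way the 5-number summary and IQR from Activity 1 and the empirical-rule intervals from Activity 2 could be computed by hand. It is only an illustration: the function names are made up for this example, the sample values are invented (they are not from the class dataset), and the mean and standard deviation used are placeholders rather than values from the questions above. Software packages use slightly different quartile conventions, so Q1 and Q3 computed this way may not match Minitab's output exactly.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    # Assumption: Python's default 'exclusive' method is close enough to the
    # convention your software uses; results can differ slightly by package.
    q1, median, q3 = statistics.quantiles(data, n=4)
    return data[0], q1, median, q3, data[-1]


def iqr(values):
    """Inter-Quartile Range: Q3 - Q1, the width of the middle half of the data."""
    _, q1, _, q3, _ = five_number_summary(values)
    return q3 - q1


def empirical_rule_intervals(mean, sd):
    """Intervals covering about 68%, 95%, and 99.7% of a bell-shaped distribution."""
    return {
        "68%": (mean - sd, mean + sd),
        "95%": (mean - 2 * sd, mean + 2 * sd),
        "99.7%": (mean - 3 * sd, mean + 3 * sd),
    }


# Invented illustration values (NOT from ClassData_FA2016.MTW).
sample = [2, 4, 4, 5, 7, 9, 10]
print(five_number_summary(sample))  # (2, 4.0, 5.0, 9.0, 10)
print(iqr(sample))                  # 5.0

# Placeholder mean and standard deviation, chosen only for illustration.
print(empirical_rule_intervals(mean=100, sd=15))
```

The empirical rule only applies when the distribution is roughly bell-shaped, so it is worth checking whether each computed interval makes sense for the variable being measured.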