Survey
Displaying Data

Data set(s) used for this activity: U.S. Smoking and Class Survey

1. Suppose a medical researcher compares the average blood pressures of women who take oral contraceptives to the blood pressures of women who do not.
   a. Is blood pressure a categorical variable or a quantitative variable?
   b. Is oral contraceptive use (or not) a categorical variable or a quantitative variable?
   c. What variables that affect blood pressure might confuse the comparison of average blood pressures for users and nonusers? That is, what factors affecting blood pressure might differ for users and nonusers? Explain.

2. A statistics class at UC Davis was asked, “About how many hours do you watch television per week?” A five-number summary of the responses from 173 students follows.

      Median        6
      Quartiles     2      12.5
      Extremes      0      100

   a. What was the median number of hours of weekly television watching? In the context of this situation, write a sentence that interprets the median.
   b. Give the value that completes the following sentence. About 1/4 of the students watch less than ___ hours of television per week.
   c. Give the value that completes the following sentence. About 1/4 of the students watch more than ___ hours of television per week.
   d. What is an interval that describes the middle 1/2 of the students’ television watching amounts?
   e. The mean for these data is 8.9 hours per week. How do you think the mean is calculated? Why do you think it is larger than the median in this instance?

3. Using the U.S. Smoking Data:
   a. Create a stemplot of Percentage Adult Smokers, the percentage of adults who smoke in each of the 50 states and Washington, D.C. About where do most states fall, in terms of percent smoking? About what is the lowest percent in the dataset? About what is the highest percent in the dataset? What do you notice about the values in the worksheet and the values displayed in the stemplot? How would you describe the shape of these data?
   b. Find the basic descriptive summary statistics for these percentages:
         Mean percent = ________
         Standard deviation = ________
         Median percent = ________
         Lower quartile (denoted by Q1) = ________
         Upper quartile (denoted by Q3) = ________
   c. Write a sentence that interprets the median in the context of this situation.
   d. What value completes the following sentence? In about 1/4 of the states, the percent that smokes is less than _____.
   e. What interval includes the middle 1/2 of the values of the state smoking percentages?
   f. Now let us see the effect on the mean and standard deviation when we add a constant to each value and when we multiply each value by a constant. Using the calculator functions in the software, create a new variable “Plus10”, in which 10 is added to each observation of the percentage of smokers, and a new variable “Times10”, in which each observation of the percentage of smokers is multiplied by 10. What do you notice about the changes in the mean and standard deviation from the original data to the new data?

4. Car and truck speeds at a particular location have approximately a bell-shaped distribution with mean = 65 mph and standard deviation = 5 mph. [Recall from the notes/text that for any bell-shaped curve, roughly 68% of the observations fall within +/- one standard deviation of the mean, 95% of the observations fall within +/- two standard deviations, and 99.7% of the observations fall within +/- three standard deviations of the mean.]
   a. About 68% of cars and trucks travel between _______ and _______ at this location.
   b. About 95% of cars and trucks travel between ______ and ________ at this location.
   c. About 99.7% of cars and trucks travel between ______ and ________ at this location.
   d. A z-score is a measure of how many standard deviations a value is from the mean. Later in the course, we will see that it is an important measure of the size of a value. The formula is

         z = (Observed Value − Mean) / Standard Deviation

      Determine a z-score for a vehicle speed of 72 mph.
   e. What vehicle speed has a z-score = −1? Said another way, what vehicle speed is one standard deviation below the mean? (You will need to do some algebra to solve for Observed Value.)

5. Using the Class Survey data file (these data are from a survey given to students in Stat200 courses last semester): You are a researcher and want to use this class survey data to research how PSU undergraduate students compare to national averages.
   a. The purpose of most statistical studies is to use the sample data to generalize to a larger group. What do you think are the weaknesses of using this class survey data for generalizing to all PSU undergraduate students?
   b. (Importance of checking data.) Compute the Descriptive Statistics for SATM and SATV. Note the minimum and maximum value for each.
      i. From the output, what does the * represent?
      ii. How many students answered the question regarding their SATM and SATV scores? SATM ______ SATV ______
   c. Now find the Descriptive Statistics for SATM and SATV by Gender (repeat what you did for part b, but now enter Gender in the By Variable window) and use the output to answer the following:

                        Q1           Q3           IQR
      Female SATM    ________     ________     _________
      Female SATV    ________     ________     _________
      Male SATM      ________     ________     _________
      Male SATV      ________     ________     _________

   d. Using the 5-number summary, a data point is considered an outlier on a boxplot if it is either larger than Q3 + (1.5 × IQR) or smaller than Q1 − (1.5 × IQR). Calculate these values and identify any outliers for the Female group.
         SATM: Q3 + (1.5 × IQR) = ________
         SATM: Q1 − (1.5 × IQR) = ________
         SATV: Q3 + (1.5 × IQR) = ________
         SATV: Q1 − (1.5 × IQR) = ________
   e. Use the Descriptive Statistics you calculated by Gender to answer the following: How do the SAT scores from our survey compare across gender? Do you believe that any differences are significant? That is, do you think these differences are large enough to be statistically different?

6. Staying with the Class Survey: The column Book Cost contains the responses to how much students expected to pay for books that semester.
   a. Create a histogram for the variable Book Cost and complete the following sentences. The most frequently reported amount spent was between ___ and ___. Of the 226 students, ___ students said they spent that much. [HINT: Place your mouse pointer over the tallest bar.] The second most frequently reported amount spent was between ____ and ____.
   b. Create a boxplot for the variable Book Cost and answer the following questions: What does the * represent in a boxplot? How many * are there for the variable Book Cost? What are the outlier values? [Place your mouse over the * to see the value.] What is the 5-number summary for Book Cost?

      The shape of the data represented by the boxplot can be determined by the location of the median bar in the box and by comparing the lengths of the “whiskers”, the two lines that extend from either end of the box. If the median is in the center and the whiskers are of roughly equal length, then the data are symmetric. If the median is near the bottom of the box and the top whisker is longer, then the distribution is said to be skewed to the right, or positively skewed. If the median is near the top of the box and the bottom whisker is longer, then the distribution is skewed to the left, or negatively skewed. What is the shape of Book Cost based on the boxplot? Does this concur with how you would interpret the histogram?
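
For anyone who wants to check hand calculations outside the course software, the z-score formula from question 4 and the 1.5 × IQR outlier rule from question 5d can be sketched in a few lines of Python. This is only an illustrative sketch: Python is not the software the activity assumes, the function names are made up for this example, and the numbers in the usage section are hypothetical values, not values from the U.S. Smoking or Class Survey data.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that `value` lies from `mean`."""
    return (value - mean) / sd


def value_from_z(z, mean, sd):
    """Solve the z-score formula for the observed value."""
    return mean + z * sd


def outlier_fences(q1, q3):
    """Return (lower, upper) fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """List the values that fall outside the 1.5*IQR fences."""
    lower, upper = outlier_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]


if __name__ == "__main__":
    # Hypothetical bell-shaped variable with mean 100 and SD 15.
    print(z_score(130, 100, 15))        # 2.0
    print(value_from_z(-1, 100, 15))    # 85

    # Hypothetical quartiles read from a descriptive-statistics output.
    print(outlier_fences(500, 600))     # (350.0, 750.0)
    print(find_outliers([340, 400, 560, 760, 800], 500, 600))
```

The quartiles are passed in rather than computed because statistical packages use slightly different quartile conventions. Reading Q1 and Q3 from the software output, as the activity asks, keeps the fences consistent with the boxplot that the software draws.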