Displaying Data
Data Set(s) used for this activity: U.S. Smoking and Class Survey
1 Suppose a medical researcher compares the average blood pressures of women who take oral
contraceptives to the blood pressures of women who do not.
a. Is blood pressure a categorical variable or a quantitative variable?
b. Is oral contraceptive use (or not) a categorical variable or a quantitative variable?
c. What variables that affect blood pressure might confuse the comparison of average blood pressures for
users and nonusers? That is, what factors affecting blood pressure might differ between users and nonusers?
Explain.
2 A statistics class at UC Davis was asked “About how many hours do you watch television per week?” A
five-number summary of the responses from 173 students follows.
Median: 6
Quartiles: 2 and 12.5
Extremes: 0 and 100
a. What was the median number of hours of weekly television watching? In the context of this situation, write a
sentence that interprets the median.
b. Give the value that completes the following sentence. About 1/4 of the students watch less than ___
hours of television per week.
c. Give the value that completes the following sentence. About 1/4 of the students watch more than ___
hours of television per week.
d. What is an interval that describes the middle 1/2 of the students’ television watching amounts?
e. The mean for these data is 8.9 hours per week.
How do you think the mean is calculated?
Why do you think it is larger than the median in this instance?
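[A minimal sketch for comparing a mean and a median, using Python's standard statistics module. The list of hours below is hypothetical and is NOT the class data; it was made up only to include one very large response.]

```python
from statistics import mean, median

# Hypothetical weekly TV hours (not the UC Davis responses):
# mostly small values plus one extreme response.
hours = [0, 1, 2, 3, 5, 6, 7, 10, 12, 100]

# Mean: add up all the values and divide by how many there are.
mean_all = mean(hours)
# Median: the middle of the sorted values (average of the two middle
# values when the count is even).
median_all = median(hours)

# The same summaries with the single extreme response left out.
without_extreme = hours[:-1]
mean_trimmed = mean(without_extreme)
median_trimmed = median(without_extreme)

print("all values:      mean =", mean_all, " median =", median_all)
print("without extreme: mean =", round(mean_trimmed, 2), " median =", median_trimmed)
```

[Comparing the two printed lines shows how much each summary moves when one extreme response is dropped.]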
3 Using the U.S. Smoking Data:
a. Create a stemplot of the Percentage Adult Smokers that smoke in the 50 states and Washington D.C.
About where do most states fall, in terms of percent smoking?
About what is the lowest percent in the dataset?
About what is the highest percent in the dataset?
What do you notice about the values in the worksheet and the values displayed in the stemplot?
How would you describe the shape of this data?
b. Find the basic descriptive summary statistics for these percentages:
Mean percent =
Standard deviation =
Median percent =
Lower quartile (denoted by Q1) =
Upper quartile (denoted by Q3) =
c. Write a sentence that interprets the median in the context of this situation.
d. What value completes the following sentence? In about 1/4 of the states, the percent that smokes is less
than _____.
e. What interval includes the middle 1/2 of the values of the state smoking percentages?
f. Now let us see the effect on the mean and standard deviation when we add a constant to each value
and when we multiply each value by a constant. Using the calculator functions in the software, create
a new variable “Plus10”, in which 10 is added to each observation of the percentage of
smokers, and a new variable “Times10”, in which each observation of the percentage of smokers is
multiplied by 10.
What do you notice about the changes in the mean and standard deviation from the original to the new
data?
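[The same experiment can be tried outside the course software. The sketch below uses Python's standard statistics module; the five percentages are hypothetical stand-ins, not values from the U.S. Smoking data, and the variable names simply mirror “Plus10” and “Times10”.]

```python
from statistics import mean, stdev

# Hypothetical smoking percentages (NOT the worksheet data).
percents = [18.0, 20.5, 22.0, 23.5, 26.0]

# New variables, built the same way as Plus10 and Times10.
plus10 = [x + 10 for x in percents]
times10 = [x * 10 for x in percents]

# Print the mean and sample standard deviation of each variable.
for name, data in [("Original", percents), ("Plus10", plus10), ("Times10", times10)]:
    print(f"{name:9s} mean = {mean(data):7.2f}   sd = {stdev(data):6.2f}")
```

[Compare the three printed rows with what the software reports for the real data.]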
4 Car and truck speeds at a particular location have approximately a bell-shaped distribution with mean = 65
mph and standard deviation = 5 mph. [Recall from the notes/text that for any bell shaped curve, you will
find that roughly 68% of the observations fall within +/- one standard deviation from the mean; 95% of
the observations fall within +/- two standard deviations; and 99.7% of the observations fall within +/-
three standard deviations from the mean.]
a. About 68% of cars and trucks travel between _______ and _______ at this location.
b. About 95% of cars and trucks travel between ______ and ________ at this location.
c. About 99.7% of cars and trucks travel between ______ and ________ at this location.
d. A z-score is a measure of how many standard deviations a value is from the mean. Later in the course, we
will see that it is an important measure of the size of a value.
The formula is Z = (Observed Value - Mean) / Standard Deviation.
Determine a z-score for a vehicle speed of 72 mph.
e. What vehicle speed has a z-score = −1? Said another way, what vehicle speed is one standard deviation
below the mean? (You will need to do some algebra to solve for Observed Value)
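[The 68-95-99.7 rule and the z-score formula used in this question can be sketched as small Python functions. The function names are illustrative, and the example uses a hypothetical bell-shaped set of exam scores with mean 100 and standard deviation 15, not the vehicle speeds.]

```python
def empirical_intervals(mean, sd):
    """Intervals holding roughly 68%, 95% and 99.7% of a bell-shaped distribution."""
    return {
        label: (mean - k * sd, mean + k * sd)
        for k, label in ((1, "68%"), (2, "95%"), (3, "99.7%"))
    }

def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

def value_from_z(z, mean, sd):
    """Solve Z = (value - mean) / sd for value."""
    return mean + z * sd

# Hypothetical exam scores: mean 100, standard deviation 15.
intervals = empirical_intervals(100, 15)
for label, (low, high) in intervals.items():
    print(f"About {label} of scores fall between {low} and {high}")

print(z_score(130, 100, 15))       # a score two standard deviations above the mean
print(value_from_z(-1, 100, 15))   # the score one standard deviation below the mean
```

[The last function is the algebra from part e: multiply both sides of the formula by the standard deviation, then add the mean.]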
5 Using the Class Survey data file: (these data are from a survey given to students in Stat 200 courses last
semester). You are a researcher and want to use this class survey data to research how PSU undergraduate
students compare to national averages.
a. The purpose of most statistical studies is to use the sample data to generalize to a larger group. What do you
think are the weaknesses of using this class survey data for generalizing to all PSU undergraduate students?
b. (Importance of checking data.) Compute the Descriptive Statistics for SATM and SATV. Note the
minimum and maximum value for each.
i. From the output, what does the * represent?
ii. How many students answered the question regarding their SATM and SATV scores?
SATM______
SATV______
c. Now find the Descriptive Statistics for SATM and SATV by Gender (Repeat what you did for part b but
now enter Gender in the By Variable window) and use the output to answer the following:
Female SATM: Q1 ________ Q3 ________ IQR _________
Female SATV: Q1 ________ Q3 ________ IQR _________
Male SATM: Q1 ________ Q3 ________ IQR _________
Male SATV: Q1 ________ Q3 ________ IQR _________
d. Using the 5-number summary, a data point is considered an outlier on a boxplot if it is either larger than
Q3 + (1.5 × IQR) or smaller than Q1 − (1.5 × IQR). Calculate and identify any outliers for the Female group.
SATM: Calculate the value of Q3 + (1.5 × IQR) =
SATM: Calculate the value of Q1 − (1.5 × IQR) =
SATV: Calculate the value of Q3 + (1.5 × IQR) =
SATV: Calculate the value of Q1 − (1.5 × IQR) =
e. Based on the Descriptive Statistics you calculated by Gender, answer the following:
How do the SAT scores from our survey compare across gender? Do you believe that any
differences are significant? That is, do you think these differences are large enough that
the groups are statistically different?
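[The boxplot outlier rule from part d can be sketched in Python as follows. The quartiles and scores are hypothetical, not values from the Class Survey, and the function names are illustrative. Q1 and Q3 are passed in directly because different software packages compute quartiles in slightly different ways; use the values your own output reports.]

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) for the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Return the values that fall below the lower fence or above the upper fence."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

# Hypothetical quartiles and scores (NOT the survey results).
q1, q3 = 500, 600
scores = [340, 480, 520, 560, 610, 700, 780]

fences = outlier_fences(q1, q3)
outliers = find_outliers(scores, q1, q3)
print("fences:", fences)
print("outliers:", outliers)
```

[With these made-up numbers the IQR is 100, so any score more than 150 points outside the box is flagged.]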
6 Staying with the Class Survey. The column Book Cost contains the responses to how much students
expected to pay for books that semester.
a. Create a Histogram for the variable Book Cost and complete the following sentences.
The most frequently reported amount spent was between ___ and ___. Of the 226 students, ___ students
said they spent that much. [HINT: place your mouse pointer over the tallest bar.]
The second most frequently reported amount spent was between ____ and ____.
b. Create a boxplot for the variable Book Cost and answer the following questions:

What does the * represent in a boxplot?

How many * are there for the variable Book Cost?

What are the outlier values? [place your mouse over the * to see the value]

What is the 5-number summary for Book Cost?

The shape of the data represented by the boxplot can be determined by the location of
the median bar in the box and by comparing the lengths of the “whiskers”, the two lines
that extend from either end of the box. If the median is in the center and the whiskers are
of roughly equal length, then the data are symmetrical. If the median is near the bottom of
the box and the top whisker is longer, then the distribution is said to be skewed to the
right or positively skewed. If the median is near the top of the box and the bottom
whisker is longer, then the distribution is skewed to the left or negatively skewed. What is the shape of
Book Cost based on the boxplot? Does this agree with how you would interpret the
histogram?