Name ________________________________ Date ________ Block ______ READING GUIDE
Chapter 1: Exploring Data - Key
Key Vocabulary:
categorical variable
quantitative variable
two way table
marginal distributions
conditional distribution
skewed left
skewed right
dot plot
split stems
back-to-back stemplot
time plot
Q1, Q3
five-number summary
modified boxplot
standard deviation
Analyzing Categorical Data (pp.2-6)
1. How is statistics defined? The science of data – organizing, displaying, summarizing,
and asking questions about data (i.e. data analysis)
2. Define data analysis. Organizing, displaying, summarizing, and asking questions
about data.
3. Define individual. Objects described by a set of data
4. Define variable. Any characteristic of an individual
5. What is a categorical variable? A qualitative variable that simply records a category
designation; in other words, it is used to place an individual into one of several groups
6. What is a quantitative variable? A measurement variable that typically measures a
numerical characteristic; in other words, it categorizes an individual using numerical
values for which it is often sensible to find an average
7. Define distribution. A distribution tells us what values a variable takes and how often
the variable takes on those values.
8. How should data be explored? Begin by examining each variable by itself. Then
move on to study relationships among them. Also, use a graphical display with
numerical summaries.
9. Drawing conclusions that go beyond the given data is referred to as inference.
10. What are the two primary ways to produce data? Sampling and experiments
1.1 Displaying Distributions with Graphs (pp.8-21)
1. What is the difference between a frequency table and a relative frequency table?
A frequency table shows only the count in each category, whereas a relative frequency
table shows the percent (or proportion) in each category.
2. What type of data are pie charts and bar graphs used for? Categorical data. They show
the distribution more vividly.
3. When can pie charts be used? A pie chart must include all the categories that make
up a whole, so it can only be used when you want to emphasize each category's relation
to the whole.
4. How is a two-way table set up? It is set up to describe two categorical variables.
5. Which is more informative when comparing groups, counts or percents? Percents
6. Explain the four step process to organizing a statistical problem.
State – What’s the question that you’re trying to answer?
Plan – How will you go about answering the question?
Do – Make graphs and carry out needed calculations.
Conclude – Give your practical conclusion in context of the problem.
7. What do you need to be cautious of when variables seem to have a strong association?
Hidden variables – be sure to examine data carefully.
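Note: the sketch below shows how a frequency table, a relative frequency table, and the conditional distributions of a two-way table might be computed. It is only an illustration: the survey data and the function names (frequency_table, relative_frequency_table, conditional_distribution) are made up for this example and do not come from the textbook.

```python
from collections import Counter

# Made-up survey data: each pair is (group, answer) for one individual.
responses = [
    ("F", "fly"), ("F", "fly"), ("F", "invisibility"), ("F", "telepathy"),
    ("M", "fly"), ("M", "invisibility"), ("M", "invisibility"), ("M", "fly"),
    ("M", "fly"), ("F", "telepathy"),
]

def frequency_table(values):
    """Count how many times each category occurs."""
    return dict(Counter(values))

def relative_frequency_table(values):
    """Convert the counts to proportions of the total."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

def conditional_distribution(pairs, given):
    """Distribution of the second variable among individuals in group `given`."""
    return relative_frequency_table(
        [second for first, second in pairs if first == given]
    )

answers = [answer for _, answer in responses]
freq = frequency_table(answers)            # counts only
rel = relative_frequency_table(answers)    # marginal distribution of the answer
cond_f = conditional_distribution(responses, "F")
cond_m = conditional_distribution(responses, "M")

print(freq)
print(rel)
print(cond_f)
print(cond_m)
```

Both groups here have five individuals, so counts and proportions tell the same story; when group sizes differ, comparing the conditional distributions (percents) is what keeps the comparison fair.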
Describing Distributions with Numbers (pp.27-42)
8. How do you make a dot plot? Draw a number line (i.e. a horizontal axis) labeled with the
name of the variable. Scale the axis using the appropriate range. Mark a dot above the
location on the axis that corresponds to each data value.
9. When examining a distribution, you can describe the overall pattern by its shape,
center, and spread.
10. If a distribution is symmetric, what does its dot plot look like? The left and right sides of
the graph are approximately mirror images of each other.
11. If a distribution is skewed right, what does its dot plot look like? The right side of the
graph is much longer than the left side; i.e. the long tail is to the right or FEWER
observations are to the right.
12. If a distribution is skewed left, what does its dot plot look like? The left side of the graph
is much longer than the right side; i.e. the long tail is to the left or FEWER observations
are on the left.
13. What is the difference between unimodal, bimodal, and multimodal data? Unimodal data
has a distribution that is single-peaked (one mode). Bimodal data has two peaks (2
modes) and multimodal data refer to distributions with more than two clear peaks.
14. How do you make a stemplot? Separate all data observations into a stem and leaf (the
final digit of the value). Write the stems in a vertical column ascending. Do not skip
stems. Draw a vertical line to the right of the column. Write each leaf in the row to the
right of its stem in ascending order. Provide a key that explains in context what the stems
and leaves represent.
15. When is it advantageous to split stems on a stemplot? (See pp.33-34) It is difficult to
determine the shape of a distribution when you have too few stems or when each stem has
too many leaves. In this case, splitting the stems gives a better visual of the shape. (Note:
If you split stems, be sure that each stem is assigned an equal number of possible leaf
digits. For example, two stems with 5 possible leaves.)
16. When is a back-to-back stemplot useful? It is useful when comparing two related
distributions (the same variable for two groups) on one graph.
17. What is the purpose of the stemplot? A stemplot gives a quick picture of the shape of a
distribution while including the actual numerical values in the graph. It does not work
well for large data sets.
18. How is the stemplot of a distribution related to its histogram? A histogram is a shaded in
stemplot – on the histogram the individual data values of the stemplot are not recorded;
however the overall shape of the distribution remains.
19. What is a histogram? The most common graph that shows the distribution of one
quantitative variable.
20. When is it better to use a histogram rather than a stemplot or dotplot? When you have
many data values.
21. What is meant by frequency in a histogram? The frequency is the count of observations
in each class.
22. What is the difference between a bar graph and a histogram? A histogram displays
quantitative data and a bar graph displays categorical data. A histogram has no space
between its bars because the classes cover a continuous range of values.
23. Define outlier. An outlier is an individual observation that falls outside the overall pattern
of the graph.
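Note: the stemplot steps in question 14 can be sketched in code. This is a minimal illustration that assumes non-negative whole-number data, with the stem taken as everything but the final digit; the function name stemplot and the data are made up for the example.

```python
from collections import defaultdict

def stemplot(values):
    """Build stemplot rows: stem = all but the final digit, leaf = final digit."""
    leaves = defaultdict(list)
    for value in sorted(values):          # sorting puts leaves in ascending order
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)
    rows = []
    # Walk every stem from smallest to largest so that no stem is skipped,
    # even one that has no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>2} | {digits}".rstrip())
    return rows

scores = [12, 15, 21, 21, 28, 40, 43]     # made-up data
rows = stemplot(scores)
for row in rows:
    print(row)
print("Key: 2 | 8 means 28")
```

The empty row for stem 3 is kept on purpose: dropping it would hide the gap in the data and distort the shape of the distribution.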
1.3 Describing Quantitative Data with Numbers (pp.50-69)
1. In statistics, what are the most common measures of center? The arithmetic average, or
mean, and the median.
2. Explain how to calculate the mean, x̄. To find the mean of a set of observations, add their
values and divide by the number of observations.
3. Explain how to calculate the median, M. The median, M, is the midpoint of a distribution,
the number such that half the observations are smaller and the other half are larger. To
find the median:
1) arrange all the observations in order of size, from smallest to largest
2) if the number of observations is odd, the median M is the center observation in
the ordered list
3) if the number of observations is even, the median M is the mean of the 2
center observations in the ordered list.
4. Explain why the median is resistant to extreme observations, but the mean is
nonresistant. The median is resistant because it is only based on the middle one or two
observations of the ordered list. The mean is sensitive to the influence of a few extreme
observations. Even if there are no outliers a skewed distribution will pull the mean
toward the long tail.
5. In a symmetric distribution where are the mean and median in relation to each other?
What about in a distribution that is skewed? See graphs below.
6. What is the difference between “average” value and “typical” value?
7. Explain how to calculate Q1 and Q3 and IQR. To calculate the quartiles:
1) arrange the observations in increasing order and locate the median in the list
2) Q1 is the median of the observations whose position in the ordered list is to the
left of the location of the overall median
3) Q3 is the median of the observations whose position in the ordered list is to the
right of the location of the overall median.
The IQR is the distance between the first and third quartiles, Q3 - Q1. It is also known
as the range of the middle half of the data.
8. When does an observation become an outlier? An observation is an outlier if it is more
than 1.5*IQR above the third quartile or below the first quartile.
9. What is the five-number summary? The five-number summary is: Minimum, Q1, Median, Q3,
and Maximum.
10. How much of the data falls between each quartile? 25% of the data falls between each
quartile.
11. How much of the data falls between Q1 and Q3? 50% of the data falls between Q1 and
Q3.
Describe a boxplot. A modified boxplot is a graph of the 5-number summary, with outliers
plotted individually. Description:
- a central box spans the quartiles
- a line in the box marks the median
- observations more than 1.5*IQR outside the central box are plotted individually
- lines extend from the box out to the smallest and largest observations that are not
outliers.
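Note: the quartile method, the 1.5*IQR rule, and the five-number summary described above can be sketched as follows. This is an illustration under stated assumptions: the data are made up, the function names are chosen for this example, and the quartiles follow the guide's "median of each half" method. Library routines may use other quartile conventions and give slightly different values.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd: the center observation
    return (ordered[mid - 1] + ordered[mid]) / 2   # even: mean of the 2 center values

def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-each-half method."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # With an odd count, the overall median is left out of both halves.
    upper = data[half + 1:] if len(data) % 2 == 1 else data[half:]
    return median(lower), median(data), median(upper)

def five_number_summary(values):
    q1, m, q3 = quartiles(values)
    return min(values), q1, m, q3, max(values)

def outliers(values):
    """Observations more than 1.5*IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

data = [2, 4, 4, 5, 7, 8, 9, 10, 30]   # made-up data
summary = five_number_summary(data)
flagged = outliers(data)
print(summary)   # (2, 4.0, 7, 9.5, 30)
print(flagged)   # [30]
```

Here Q1 = 4 and Q3 = 9.5, so IQR = 5.5 and the upper cutoff is 9.5 + 1.5*5.5 = 17.75; the value 30 lies beyond it and would be plotted individually on a modified boxplot.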
12. What does standard deviation measure? The standard deviation is a measure of spread.
It measures spread around the mean and should only be used when the mean is chosen as
the measure of center.
13. What is the relationship between variance and standard deviation? The standard
deviation, s, is the square root of the variance, s².
14. When does standard deviation equal zero? The standard deviation = 0 only when there is
no spread. This happens only when all observations have the same value. Otherwise s >
0. As the observations become more spread out about their mean, s gets larger.
15. What are the units for the standard deviation of a distribution? The standard deviation is
expressed in the same units as the data.
16. Is standard deviation resistant or nonresistant to extreme observations? Explain. The
standard deviation, s, like the mean, is not resistant. Strong skewness or a few outliers
can make s very large.
17. Use a five number summary when…you want to provide a quick overall description of
distribution. Remember, numerical summaries do not fully describe the shape of a
distribution. Always plot your data.
18. Use x̄ and s when…the distribution is roughly symmetric and has no outliers.
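Note: the points about resistance and standard deviation can be checked with a short sketch. It uses Python's standard statistics module and made-up data. One assumption to flag: statistics.stdev and statistics.variance compute the sample versions (dividing by n - 1), a formula this guide does not spell out.

```python
import math
import statistics

data = [2, 4, 4, 5, 7, 8, 9, 10]     # made-up data with no extreme values
with_outlier = data + [30]           # the same data plus one extreme observation

# Center: the mean is pulled toward the extreme value far more than the median.
mean_before, mean_after = statistics.mean(data), statistics.mean(with_outlier)
median_before, median_after = statistics.median(data), statistics.median(with_outlier)

# Spread: s is the square root of the variance and is not resistant either.
variance_before = statistics.variance(data)    # sample variance (n - 1 divisor)
s_before = statistics.stdev(data)
s_after = statistics.stdev(with_outlier)

# s equals 0 only when every observation has the same value.
s_constant = statistics.stdev([3, 3, 3, 3])

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
print("s:     ", s_before, "->", s_after)
print("s for identical values:", s_constant)
```

In this example the single added value shifts the mean by more than the median and makes s much larger, which is why the five-number summary is the safer description when a distribution is skewed or has outliers.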