Chapter 1:
Exploring Data Review
Vaishali Saseenthar
Christine Kil
Aditi Bellur
The BIG Idea
• Data Production
• Data Analysis
• Probability
• Statistical Inference
• Presenting the statistics:
– Plotting the data: dotplot, stemplot, histogram.
– Interpreting the data: SOCS (shape, outliers, center, spread)
– Numerical summary: mean and SD, five-number summary
When do YOU use this??
• We use graphs such as histograms, stemplots, pie
graphs, and bar graphs to organize and present the
data.
• Data analysis is used to draw conclusions from our
data.
• Examining a distribution's shape, center, spread, and
deviations from the overall pattern helps us understand the data.
• Data analysis is used practically in economics,
politics, journalism, and other fields.
Vocab to know
• Statistical inference- drawing conclusions about a large group based on a smaller group.
• Surveys- popular ways to gauge public opinion; a survey asks the individuals in the sample some questions and records their responses.
• Observational study- observing individuals and measuring variables without influencing the responses.
• Experiment- deliberately influencing individuals to observe their responses.
• Individuals- objects described by the data.
• Variable- any characteristic of the individual.
• Categorical variable- places an individual into one of several groups.
• Quantitative variable- takes numerical values for which adding and averaging make sense.
• Mean- the average value, written x-bar. It is sensitive to the influence of a few extreme observations.
• Outlier- individual value that falls outside the overall pattern.
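To see why the mean is called sensitive, here is a minimal Python sketch. The salary figures are made up for illustration (they are not from the chapter); the point is only to compare how far the mean and the median move when one extreme observation is added.

```python
from statistics import mean, median

# Hypothetical data: five salaries, in thousands of dollars.
salaries = [30, 32, 35, 38, 40]

# Same data with one extreme observation (an outlier) added.
with_outlier = salaries + [400]

# Without the outlier the mean and the median agree.
print("no outlier:  ", mean(salaries), median(salaries))

# With the outlier the mean jumps far more than the median does.
print("with outlier:", mean(with_outlier), median(with_outlier))
```

The median barely moves because it depends only on the middle of the ordered list, which is why it is described as resistant.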
Vocab to know dos
• pth percentile- the value such that p% of the observations fall at or below it.
• First quartile Q₁- median of the observations to the left of the overall median in the ordered list.
• Third quartile Q₃- median of the observations to the right of the overall median in the ordered list.
• Five-number summary- the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest.
• Boxplot- graph of the five-number summary.
• Interquartile range (IQR)- the distance between the quartiles, Q₃ − Q₁ (the range of the center half of the data). It is a more resistant measure of spread than the range.
• Standard deviation- measures spread by looking at how far the observations are from their mean.
• Variance- the average of the squares of the deviations of the observations from their mean (for a sample, the sum of squares is divided by n − 1).
• Data analysis- art of describing data using graphs and numerical summaries.
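The definitions above can be turned into a short Python sketch. The function name five_number_summary and the data set are assumptions made for illustration. The quartiles follow the definition given here (median of each half, leaving out the overall median when n is odd), which may differ slightly from what other software reports.

```python
from statistics import median, stdev, variance

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[:n // 2]            # observations left of the overall median
    upper = xs[n // 2 + n % 2:]    # observations right of it (middle skipped if n is odd)
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Hypothetical data set of seven observations.
data = [2, 4, 4, 5, 7, 9, 11]

minimum, q1, med, q3, maximum = five_number_summary(data)
iqr = q3 - q1                      # interquartile range

print("five-number summary:", minimum, q1, med, q3, maximum)
print("IQR:", iqr)
print("variance:", variance(data))   # sample variance, divides by n - 1
print("standard deviation:", stdev(data))
```

Note that statistics.variance and statistics.stdev use the n − 1 divisor, which matches the sample formulas usually taught with this chapter.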
Key Topics Covered in this Chapter
• Graphs for Categorical Variables
• Stemplots
• Histograms
• Examining Distributions
• Dealing with Outliers
• Relative Frequency and Cumulative Frequency
• Time Plots
• Measuring Center: The Mean
• Mean versus Median
• Measuring Spread: The Quartiles
• The Five-Number Summary and Boxplots
• The 1.5 × IQR Rule for Suspected Outliers
• Measuring Spread: The Standard Deviation
• Properties of Standard Deviation
• Choosing Measures of Center and Spread
• Changing the Unit of Measurement
• Comparing Distributions
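One topic in this list, the 1.5 × IQR rule, is easy to sketch in Python. The rule flags an observation as a suspected outlier if it falls more than 1.5 × IQR below Q₁ or above Q₃. The helper names and the data are hypothetical, and the quartiles are found as the medians of the lower and upper halves of the ordered list.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    xs = sorted(data)
    n = len(xs)
    return median(xs[:n // 2]), median(xs[n // 2 + n % 2:])

def suspected_outliers(data):
    """Return the observations flagged by the 1.5 x IQR rule."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical data: six ordinary values and one unusually large one.
scores = [2, 4, 4, 5, 7, 9, 30]
print(suspected_outliers(scores))
```

Here Q₁ = 4 and Q₃ = 9, so the fences sit at −3.5 and 16.5 and only the value 30 is flagged. A modified boxplot draws such points individually.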
Calculator Key Strokes
• Calculator Boxplots and Numerical Summaries
1. Enter first set of data in L1/ list 1 and second set in L2/
list 2
2. Set up two statistics plots: Plot 1 to show modified
boxplot of data in list 1 and Plot 2 to show modified
boxplot of data in list 2
3. Use the calculator's zoom feature to display the side-by-side boxplots
4. Calculate numerical summaries for each set of data
5. Notice the down-arrow on the left side of the display.
Press down to see other statistics.
Formulas You Should Know
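• Mean: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum x_i$
• Variance: $s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$
• Standard Deviation: $s = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2}$
• Linear transformation: $x_{\text{new}} = a + bx$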
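A quick way to check what a linear transformation does to the numerical summaries is to try one. The sketch below assumes a Celsius-to-Fahrenheit conversion (a = 32, b = 1.8) and made-up temperatures. It illustrates the standard result that the mean becomes a + b times the old mean, while the standard deviation is multiplied by |b| and is unaffected by a.

```python
from math import isclose
from statistics import mean, stdev

# Assumed transformation for illustration: Fahrenheit = 32 + 1.8 * Celsius.
a, b = 32, 1.8

# Hypothetical temperatures in degrees Celsius.
celsius = [10, 15, 20, 25, 30]
fahrenheit = [a + b * x for x in celsius]

# The center shifts and rescales: new mean = a + b * old mean.
print(mean(fahrenheit), a + b * mean(celsius))

# The spread only rescales: new SD = |b| * old SD.
print(stdev(fahrenheit), abs(b) * stdev(celsius))

# isclose is used because floating-point arithmetic is not exact.
print(isclose(mean(fahrenheit), a + b * mean(celsius)))
print(isclose(stdev(fahrenheit), abs(b) * stdev(celsius)))
```

Adding a constant slides every observation by the same amount, so distances from the mean do not change. That is why a has no effect on the standard deviation.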
Helpful HINTS
• Stemplots do not work well for large data sets where each
stem must hold a large # of leaves.
• Use histograms of %’s for comparing several distributions with
different # of observations.
• When examining a distribution, look for the overall pattern
and striking deviations from the pattern.
• Look for outliers that are clearly apart from the body of the
data, not just the most extreme observations.
• The simplest useful numerical description of a distribution
consists of both a measure of center and a measure of spread.
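The hint about histograms of percents can be sketched as follows. The function name percent_histogram, the class width of 10, and both data sets are assumptions for illustration. Converting counts to percents puts groups with different numbers of observations on the same scale.

```python
from collections import Counter

def percent_histogram(data, width):
    """Map each class's lower bound to the percent of observations in it.

    A class covers lower <= x < lower + width.
    """
    counts = Counter((x // width) * width for x in data)
    n = len(data)
    return {lower: 100 * count / n for lower, count in sorted(counts.items())}

# Hypothetical test scores for two groups of different sizes.
group_a = [52, 55, 61, 67, 68, 73, 75, 79, 84, 91]   # 10 observations
group_b = [58, 63, 66, 71, 77]                        # 5 observations

for name, group in (("A", group_a), ("B", group_b)):
    print("Group", name)
    for lower, pct in percent_histogram(group, 10).items():
        # One '*' per 10 percentage points gives a rough text histogram.
        print(f"  {lower}-{lower + 9}: {'*' * round(pct / 10)} {pct:.0f}%")
```

Comparing raw counts would make group A look taller in every class simply because it has twice as many observations. The percents show the shapes directly.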