Chapter 1 Notes

ATM
Lesson 1-1
Collecting Data
In statistics, a variable is:
The population is:
A sample is:
A random sample is:
If a sample is not random, it is said to be ______________________.
Examples:
Identify the population, sample, and variable:
1) A doctor takes a biopsy of a suspicious growth to check for malignancy.
2) A company wants to know the educational background of all its employees. It asks all
the department managers what their level of education is.
3) Is the sample taken in #2 a random sample? Or is it biased?
Capture-Recapture
4) From several locations on an island, a naturalist captures 96 squirrels, then tags and
releases them. Ten days later 120 squirrels are caught and 36 of them have tags. Use this
information to estimate the number of squirrels on the island.
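One way to set up the estimate in #4, as a rough sketch: assume the fraction of tagged squirrels in the second catch matches the fraction of tagged squirrels on the whole island, so 96 / N = 36 / 120. The function name below is illustrative and not part of the notes.

```python
def estimate_population(tagged_first, caught_second, tagged_in_second):
    """Solve tagged_first / N = tagged_in_second / caught_second for N."""
    if tagged_in_second == 0:
        raise ValueError("need at least one tagged animal in the second catch")
    return tagged_first * caught_second / tagged_in_second

# Values from problem 4: 96 tagged, then 120 caught with 36 tagged.
squirrels = estimate_population(96, 120, 36)
print(squirrels)  # 320.0
```

The estimate is only as good as the assumption that tagged squirrels mixed evenly back into the population during the ten days.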
ATM 1-3
Other Displays
Time-Series Data – displays changes in a variable over time. Label axes appropriately
and give titles. Axes must always have equal increments.
Other methods of displaying data include:
Scatter Plot
Line Graph
Bar Graphs (Histograms)
Population of NE cities in thousands
For each of these, you can determine the Average Rate of Change (over time).
This is also known as _______________.
Stated as “Increase/Decrease of (quantity) over (period of time).”
Ex: What was the average rate of change in the U.S. minimum wage from 1950 to 1990?
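The calculation behind an average rate of change can be sketched as the change in the quantity divided by the change in time. The numbers below are hypothetical placeholders chosen for illustration; they are not the minimum-wage data, which is not reproduced in these notes.

```python
def average_rate_of_change(t_start, value_start, t_end, value_end):
    """Change in the quantity divided by the elapsed time."""
    return (value_end - value_start) / (t_end - t_start)

# Hypothetical example: a quantity grows from 50 to 80 between 2000 and 2010.
rate = average_rate_of_change(2000, 50, 2010, 80)
print(f"Increase of {rate} units per year")  # Increase of 3.0 units per year
```

A negative result would be stated as a decrease over the period.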
Stemplots (also known as Stem and Leaf Plots) show the minimum, maximum, range
and outliers of data.
Scores on Chapter 8 Test
(out of 130)
What is the minimum value?
The maximum?
The range?
Outliers?
Back to Back Stem Plots
Use the data from our class heights to create a back-to-back stemplot.
Boys
Girls
ATM
1-4 Measures of Center
To describe typical values in a data set, use measures of center, or measures of central
tendency:
Mean:
Notation: x̄ (“x bar”)
Median:
Mode:
QUARTILES:
1st Quartile (lower quartile):
3rd Quartile (upper quartile):
They will help us measure the spread of the data.
Example 1: Consider the heights of a high school basketball team:

height (in)    # of players
    64              1
    69              3
    70              3
    71              2
    72              2
    74              1
    79              1

a) How many players are on the team?
b) Find the mean height.
c) Find the median height.
d) Which # describes the “average” height?
e) Do there appear to be any outliers?
f) Find the 1st quartile.
g) Find the 3rd quartile.
SIGMA NOTATION (or Summation Notation)
Let Xi = the test score for the ith person in a class.
(So, X1 = 1st person’s score, X2 = 2nd person’s score, etc.)
X1 + X2 + X3 + X4 + X5 = ___________________________
___________________________ = the mean of those 5 test scores
In general:
Using sigma notation:
x̄ =
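As a sketch of how the mean formula works on real numbers, the snippet below expands the Example 1 height table into one value per player and computes x̄ = (x_1 + x_2 + ... + x_n) / n. The variable names are illustrative, and the pairing of heights with player counts is read from the table in Example 1.

```python
# Height (in) -> number of players, from the Example 1 table.
heights_to_counts = {64: 1, 69: 3, 70: 3, 71: 2, 72: 2, 74: 1, 79: 1}

# Expand the table into one entry per player: x_1, x_2, ..., x_n.
x = sorted(h for h, count in heights_to_counts.items() for _ in range(count))

n = len(x)        # number of players
total = sum(x)    # x_1 + x_2 + ... + x_n
mean = total / n  # x bar

# Median: middle value of the sorted list (average of the two middle values if n is even).
mid = n // 2
median = x[mid] if n % 2 == 1 else (x[mid - 1] + x[mid]) / 2

print(n, total, round(mean, 2), median)  # 13 920 70.77 70
```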
Example 2: Let X1 = 2, X2 = 4, X3 = 5, X4 = 7
a) Write an expression in sigma notation for the total of the four numbers above.
b) Find \sum_{i=2}^{4} x_i
c) Find \sum_{i=1}^{4} [summand not recoverable from the source; the surviving fragments are x, i, and 4]
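A short sketch of how sigma notation maps onto a computation, using the four values from Example 2. The helper `sigma` is a hypothetical name; it counts i from 1, as the notes do, and is used here for the full total and for a partial sum that starts at i = 2.

```python
x = [2, 4, 5, 7]  # X1, X2, X3, X4 from Example 2

def sigma(values, start, stop):
    """Sum of x_i for i = start, ..., stop, with i counted from 1."""
    return sum(values[i - 1] for i in range(start, stop + 1))

total = sigma(x, 1, 4)    # x_1 + x_2 + x_3 + x_4
partial = sigma(x, 2, 4)  # x_2 + x_3 + x_4

print(total, partial)  # 18 16
```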
1-5 Quartiles, Percentiles, Box Plots
Consider the test scores on a math exam. Identify the median and quartiles.
43, 52, 65, 67, 70, 70, 71, 74, 75, 78, 80, 82, 85, 87, 88, 90, 92, 94, 98, 98
1st quartile:
Median:
3rd quartile:
So, quartiles divide the data into 4 sections, each containing 25% of the data.
Interquartile Range (IQR): 3rd quartile – 1st quartile. (A measure of spread, along
with the range of the data.)
Give the IQR of the data.
Percentile: The pth percentile is the value in a set such that p percent of the numbers are
less than or equal to that value.
What is the percentile rank of 67?
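The percentile definition above can be turned into a small calculation: count the scores less than or equal to the value and divide by the number of scores. This is a sketch of that one definition; other textbooks define percentile rank slightly differently, and the function name is illustrative.

```python
scores = [43, 52, 65, 67, 70, 70, 71, 74, 75, 78,
          80, 82, 85, 87, 88, 90, 92, 94, 98, 98]

def percentile_rank(value, data):
    """Percent of the data values that are less than or equal to value."""
    at_or_below = sum(1 for d in data if d <= value)
    return 100 * at_or_below / len(data)

rank_67 = percentile_rank(67, scores)
print(rank_67)  # 20.0
```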
Determining outliers: Find 1.5 × IQR. Add this to the 3rd quartile; any value greater
than the result is an outlier. Likewise, subtract it from the 1st quartile; any value
smaller than the result is an outlier.
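The quartile, IQR, and outlier steps can be sketched for the exam scores as follows. The sketch assumes the "median of each half" convention for quartiles (with the middle value left out of both halves when the count is odd); software that interpolates may give slightly different quartiles. Function names are illustrative.

```python
scores = [43, 52, 65, 67, 70, 70, 71, 74, 75, 78,
          80, 82, 85, 87, 88, 90, 92, 94, 98, 98]

def median(sorted_values):
    """Middle value, or the average of the two middle values."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    values = sorted(data)
    half = len(values) // 2
    return median(values[:half]), median(values), median(values[-half:])

q1, med, q3 = quartiles(scores)
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr   # values below this are outliers
high_fence = q3 + 1.5 * iqr  # values above this are outliers
outliers = [s for s in scores if s < low_fence or s > high_fence]

print(q1, med, q3, iqr)      # 70.0 79.0 89.0 19.0
print(low_fence, high_fence, outliers)  # 41.5 117.5 []
```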
Examples: Using the test scores given, answer the following questions.
1. ¾ of the class scored above a ___________ on the exam.
2. What is the percentile rank of 85?
3. Which score has a percentile rank of 80?
4. Are there any outliers? (Show your work!)
5. Make a box-and-whisker plot of the scores.
1-6
Histograms
Frequencies
Relative Frequencies
Intervals
Frequency table
Histogram
Frequency distribution
43, 52, 65, 66, 67, 68, 70, 70, 71, 72, 73, 74, 75, 75, 76, 78, 78, 78, 78, 79,
80, 82, 85, 87, 87, 88, 89, 90, 90, 90, 92 , 93, 94, 94, 98
1. Make a frequency table for the test scores. Hint: first determine reasonable
intervals.

Test Scores      Frequency      Relative Frequency

2. Make a histogram for the test scores using frequency for the vertical axis.
3. Make a histogram using relative frequency for the vertical axis.
4. Make a box-and-whisker plot to display the test scores. Use it to answer the
following questions.
a. Which quartile has the most spread? What is the interval for that quartile?
b. Which quartile has the least spread? What is the interval for that quartile?
c. Which interval in your histogram has the highest frequency? (tallest bar)
d. Which interval in your histogram has the lowest frequency? (shortest bar)
e. Determine a relationship between these intervals in the box-and-whisker plot and
histogram intervals.
5. Use the histogram at the right showing the percent of students reporting how
much they paid for their last haircut to answer the following questions.
a. What percent of students said they paid between $10 and $20 for a haircut?
b. What percent of students said they paid $25 for a haircut?
c. How many students got haircuts?
d. In what interval is the median price paid?
e. What is the median price paid?
6. How can you determine what interval the median of a data set is in by looking
at a histogram for the data?
7. Consider the following relative frequency table for test scores.
a. Why is it not possible to tell how many students are in the class?
b. If there are 30 students in the class, how many students received a
score in the C range?
c. About what percent of the students got A’s?
d. About what percent of the students passed?
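For question 1 of this lesson, a frequency table could be tabulated along the following lines. The interval width of 10 and the starting value of 40 are assumptions made for this sketch; other reasonable intervals would work just as well.

```python
scores = [43, 52, 65, 66, 67, 68, 70, 70, 71, 72, 73, 74, 75, 75, 76, 78, 78, 78, 78, 79,
          80, 82, 85, 87, 87, 88, 89, 90, 90, 90, 92, 93, 94, 94, 98]

start, width = 40, 10  # assumed intervals: 40-49, 50-59, ..., 90-99

# Count how many scores fall in each interval, keyed by the interval's lower bound.
counts = {low: 0 for low in range(start, 100, width)}
for s in scores:
    low = start + ((s - start) // width) * width
    counts[low] += 1

print("Test Scores  Frequency  Relative Frequency")
for low, freq in counts.items():
    relative = freq / len(scores)
    print(f"{low}-{low + width - 1}        {freq:>5}      {relative:.2f}")
```

The relative frequencies are the frequencies divided by the total number of scores, so they add up to 1.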
ATM Lesson 1-7
Using the Graphing Calculator
Name ________________________     Period _________
Box Plots
Yearly Average Daily Temperature for Selected US Cities
(in degrees F)
 1. Mobile, AL             67.5        9. Philadelphia, PA     54.3
 2. Juneau, AK             40.0       10. Atlantic City, NJ    53.1
 3. Phoenix, AZ            71.2       11. Nashville, TN        59.2
 4. San Francisco, CA      50.3       12. Burlington, VT       44.1
 5. Miami, FL              75.6       13. Cincinnati, OH       53.4
 6. Chicago, IL            49.2       14. Buffalo, NY          47.6
 7. Portland, OR           53.0       15. Detroit, MI          48.6
 8. Dallas-Ft. Worth, TX   66.0       16. Boston, MA           51.5
1. Use your calculator to enter the data in a list and find the mean. ________________
2. Give a five-number summary of the temperatures. Identify any outliers using the
1.5 × IQR method.
3. What is the range of the values? ______________ Indicate the values you entered
for your WINDOW or RANGE.
Xmin = _____________
Ymin = ________________
Xmax = _____________
Ymax = ________________
Xscl = ______________
Yscl = _________________
(Do the values entered for “y” in this situation make a difference? __________)
4. Construct a box plot on your calculator. Sketch the box plot below. Label the
number line; indicate any outliers with an “x”.
(Don’t erase your list! You’ll need it again……)
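As a cross-check on the calculator work for the box plot, the mean, five-number summary, and 1.5 × IQR fences can be sketched in a few lines. The quartiles here use the median-of-halves convention, which is an assumption of this sketch; a different convention can shift Q1 and Q3 slightly.

```python
temps = [67.5, 40.0, 71.2, 50.3, 75.6, 49.2, 53.0, 66.0,
         54.3, 53.1, 59.2, 44.1, 53.4, 47.6, 48.6, 51.5]

def median(sorted_values):
    """Middle value, or the average of the two middle values."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

values = sorted(temps)
half = len(values) // 2
mean = sum(values) / len(values)

# Five-number summary: minimum, Q1, median, Q3, maximum.
summary = (values[0], median(values[:half]), median(values),
           median(values[-half:]), values[-1])

q1, q3 = summary[1], summary[3]
iqr = q3 - q1
outliers = [t for t in values if t < q1 - 1.5 * iqr or t > q3 + 1.5 * iqr]

print(round(mean, 2))
print([round(v, 2) for v in summary])
print(outliers)
```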
Histograms
(Use the same 16 city temperatures listed in the Box Plots table above.)
5. What is the range of the values? ______________ Indicate the values you entered for your
WINDOW or RANGE.
Xmin = _____________
Ymin = ________________
Xmax = _____________
Ymax = ________________
Xscl = ______________
Yscl = _________________
(Do the values entered for “y” in this situation make a difference? __________)
6. Use your calculator to produce a histogram with intervals of 5. Sketch the histogram.
7. Use your calculator to produce a histogram with intervals of 10. Sketch the histogram.
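To see what the two histograms in questions 6 and 7 should look like, the bar heights can be counted directly. This sketch assumes the intervals start at 40 and that each interval includes its left endpoint; the helper name is illustrative.

```python
temps = [67.5, 40.0, 71.2, 50.3, 75.6, 49.2, 53.0, 66.0,
         54.3, 53.1, 59.2, 44.1, 53.4, 47.6, 48.6, 51.5]

def bin_counts(data, start, width, n_bins):
    """Count the values in each interval [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for value in data:
        index = int((value - start) // width)
        counts[index] += 1
    return counts

by_5 = bin_counts(temps, 40, 5, 8)    # intervals 40-45, 45-50, ..., 75-80
by_10 = bin_counts(temps, 40, 10, 4)  # intervals 40-50, 50-60, 60-70, 70-80

print(by_5)   # [2, 3, 6, 1, 0, 2, 1, 1]
print(by_10)  # [5, 7, 2, 2]
```

The wider intervals hide the gap between 60 and 65 that the narrower intervals reveal.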
Variance and Standard Deviation (1-8)
Standard deviation
Variance
Method and Formulas:
1.
2.
3.
4. This is the variance. s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
5. This is the standard deviation. s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
Example
Find the variance and the standard deviation “by hand” for the data given.
{ 6, 9, 10, 13, 17 }
x        x − x̄        (x − x̄)²

x̄ =          n =
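The "by hand" calculation for { 6, 9, 10, 13, 17 } can be checked with a short sketch that mirrors the table columns: each value, its deviation from the mean, and the squared deviation. It uses the sample formulas (dividing by n − 1).

```python
import math

data = [6, 9, 10, 13, 17]
n = len(data)
mean = sum(data) / n                    # x bar = 55 / 5

deviations = [x - mean for x in data]   # x - x bar
squared = [d ** 2 for d in deviations]  # (x - x bar)^2

variance = sum(squared) / (n - 1)       # s^2
std_dev = math.sqrt(variance)           # s

for x, d, sq in zip(data, deviations, squared):
    print(x, d, sq)
print(mean, sum(squared), variance, round(std_dev, 3))  # 11.0 70.0 17.5 4.183
```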
Things to keep in mind:
- Measures of center:
- Measures of spread:
- Standard deviation = _______________
  (Standard deviation)² = _______________
  For example: if var. = 25, std. dev. = ?
               if std. dev. = 9, var. = ?
               if var. = .1, std. dev. = ?
- In general, groups with most data close to the mean have _______________ standard
  deviations than do groups with most data far from the mean.
- Variance and standard deviation are never negative. Why?
- Symbols - The ones we use are:
  s² = variance of a sample
  s = std. dev. of a sample (divide by n – 1)
  Also on your calculator are:
  σ² = variance of a population
  σ = std. dev. of a population (divide by n)
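The difference between the sample symbols (divide by n − 1) and the population symbols (divide by n) can be seen with Python's standard `statistics` module, which provides both versions. This is a sketch using the example data { 6, 9, 10, 13, 17 } from this lesson.

```python
import math
import statistics

data = [6, 9, 10, 13, 17]

s2 = statistics.variance(data)       # sample variance, divides by n - 1
s = statistics.stdev(data)           # sample standard deviation
sigma2 = statistics.pvariance(data)  # population variance, divides by n
sigma = statistics.pstdev(data)      # population standard deviation

print(s2, round(s, 3))          # sample: 70 / 4
print(sigma2, round(sigma, 3))  # population: 70 / 5

# Standard deviation is the square root of the variance in both cases.
print(math.isclose(s, math.sqrt(s2)), math.isclose(sigma, math.sqrt(sigma2)))
```

Because the population version divides by the larger number, it is always the smaller of the two for the same data.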