AP Statistics Summer Institute
Exploring Univariate Data
Name: ______________________________
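Participant | Gender | Years of teaching experience | Years teaching AP Statistics | Height (inches) | Shoe size
1 | | | | |
2 | | | | |
3 | | | | |
4 | | | | |
5 | | | | |
6 | | | | |
7 | | | | |
8 | | | | |
9 | | | | |
10 | | | | |
11 | | | | |
12 | | | | |
13 | | | | |
14 | | | | |
15 | | | | |
16 | | | | |
17 | | | | |
18 | | | | |
19 | | | | |
20 | | | | |
21 | | | | |
22 | | | | |
23 | | | | |
24 | | | | |
25 | | | | |
26 | | | | |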
A distribution of a variable tells us what values the variable takes and how often it takes these values.
How would you describe the distribution of years of experience? Describe the center. What other
characteristics are important to note? How do you describe these?
We begin by looking at graphs and then add numerical summaries.
Stem and leaf plot
0 |
0 |
1 |
1 |
2 |
2 |
3 |
3 |
4 |
4 |
What characteristics of the distribution are evident from the stem and leaf plot?
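If you want to check a hand-drawn plot, a short script can reproduce the split stems above. This is a minimal sketch under several assumptions: the data are whole numbers from 0 to 49, each stem is the tens digit and each leaf is the ones digit, and every stem appears twice (leaves 0 to 4 on the first line, 5 to 9 on the second). The function name `split_stem_plot` is hypothetical. The values 3, 4, 6, 8, 12, 15, 20 come from the linear-transformation exercise at the end of this handout and serve only as stand-in data.

```python
def split_stem_plot(values, max_stem=4):
    """Return the lines of a split-stem plot for whole numbers 0..(10*max_stem + 9).

    Assumption: stem = tens digit, leaf = ones digit; each stem is listed twice,
    first with leaves 0-4, then with leaves 5-9.
    """
    for v in values:
        if not (isinstance(v, int) and 0 <= v <= 10 * max_stem + 9):
            raise ValueError(f"unsupported value: {v!r}")
    lines = []
    for stem in range(max_stem + 1):
        for low_half in (True, False):
            # leaves for this stem, restricted to 0-4 (low half) or 5-9 (high half)
            leaves = sorted(v % 10 for v in values
                            if v // 10 == stem and (v % 10 < 5) == low_half)
            lines.append(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}".rstrip())
    return lines


for line in split_stem_plot([3, 4, 6, 8, 12, 15, 20]):
    print(line)
```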
Back-to-back stem and leaf plot
Males | Stem | Females
      |  0   |
      |  0   |
      |  1   |
      |  1   |
      |  2   |
      |  2   |
      |  3   |
      |  3   |
      |  4   |
      |  4   |
Compare and contrast the characteristics of the distributions of years of experience for men and women.
Construct a histogram of the distribution of years of teaching experience on the grid below.
[Grid for the histogram: horizontal axis "Years of teaching experience," marked 0, 10, 20, 30, 40]
What characteristics of the distribution are evident from the histogram?
Compared to the stem and leaf plot, what detail does the histogram lack?
When would it be beneficial to use a histogram rather than a stem and leaf plot?
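A histogram sorts the observations into classes of equal width and counts how many land in each class. The sketch below makes three assumptions. Classes have width 10, matching the 0, 10, 20, 30, 40 grid. A value that falls exactly on a boundary goes into the class to its right, except that 40 stays in the last class. Each count prints as a row of asterisks. `histogram_counts` is a hypothetical helper, and the data are stand-in values taken from the linear-transformation exercise at the end of this handout.

```python
def histogram_counts(values, start=0, width=10, stop=40):
    """Count observations in the classes [start, start+width), ..., [stop-width, stop].

    Assumption: a boundary value goes into the class to its right,
    except that stop itself is kept in the last class.
    """
    edges = list(range(start, stop + 1, width))
    counts = [0] * (len(edges) - 1)
    for v in values:
        if not start <= v <= stop:
            raise ValueError(f"{v} is outside [{start}, {stop}]")
        k = min(int((v - start) // width), len(counts) - 1)  # class index
        counts[k] += 1
    return list(zip(edges[:-1], edges[1:], counts))


for lo, hi, c in histogram_counts([3, 4, 6, 8, 12, 15, 20]):
    print(f"{lo:>2}-{hi:<2} {'*' * c}")
```

The printout contains only the counts, so the individual values are gone.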
Notes regarding shape:
A distribution is said to be skewed to the right if it extends farther to the right than it does to the left.
(The tail extends to the right.)
A distribution is said to be skewed to the left if it extends farther to the left than it does to the right.
(The tail extends to the left.)
A distribution is said to be symmetric if the right and left sides of the histogram are approximately
mirror images of each other.
Describing distributions with numbers
Measures of Center
Median (M): The median is the middle value of the ordered observations: half of the observations are
less than it and half are greater than it. To find the median:
1. Arrange the observations in increasing order.
2. If the number of observations is odd, the median is the middle value.
3. If the number of observations is even, the median is the average of the middle two.
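The three steps translate almost line for line into code. This is a minimal sketch. The name `median` is a hypothetical helper, and the sample values are stand-ins taken from the linear-transformation exercise at the end of this handout.

```python
def median(values):
    """Median of a list of numbers, following the three steps for finding it."""
    ordered = sorted(values)                 # step 1: arrange in increasing order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                  # step 2: odd n, take the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # step 3: even n, average the middle two


print(median([3, 4, 6, 8, 12, 15, 20]))   # odd n  -> 8
print(median([20, 3, 8, 6]))               # even n -> 7.0
```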
Mean ($\bar{x}$): The mean $\bar{x}$ is the average of the set of observations:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
or, in sigma notation,
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
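In code, the sigma-notation formula becomes a sum divided by a count. This is a minimal sketch. The name `mean` is a hypothetical helper, and the data are stand-in values taken from the linear-transformation exercise at the end of this handout. The median of these values is 8, so the mean (about 9.71) sits above the median. The larger observations pull it upward.

```python
def mean(values):
    """Arithmetic mean: (x_1 + x_2 + ... + x_n) / n."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


data = [3, 4, 6, 8, 12, 15, 20]
print(round(mean(data), 2))   # 68 / 7, about 9.71
```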
Find the median and mean years of teaching experience.
Which measure of center is larger? Why?
Measures of Spread
Range = maximum – minimum
Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$
Quartiles:
The first quartile ($Q_1$) is the value below which about 25% of the observations fall. It is the median of
the lower half of the ordered observations.
The third quartile ($Q_3$) is the value below which about 75% of the observations fall. It is the median of
the upper half of the ordered observations.
Note: IQR is typically used to describe spread when the median is used to describe center.
Five number summary: Min, $Q_1$, Median, $Q_3$, Max
Outliers: An observation is called an outlier if it lies more than $1.5 \times \text{IQR}$ above $Q_3$ or more than
$1.5 \times \text{IQR}$ below $Q_1$.
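The quartile, five number summary and outlier definitions fit together in a few lines of code. This is a minimal sketch under one assumption: when n is odd, the overall median belongs to neither half when $Q_1$ and $Q_3$ are computed. Other software may use a different quartile rule and report slightly different values. The helper names are hypothetical. The data are stand-in values from the linear-transformation exercise at the end of this handout, plus one hypothetical variant in which 20 is replaced by 40.

```python
def median(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the ordered data.
    Assumption: when n is odd, the middle observation belongs to neither half.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    lower = ordered[: n // 2]
    upper = ordered[(n + 1) // 2 :]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


def outliers(values):
    """Observations more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in sorted(values) if x < low_fence or x > high_fence]


data = [3, 4, 6, 8, 12, 15, 20]
print(five_number_summary(data))           # (3, 4, 8, 15, 20)
print(outliers(data))                      # []: the fences are -12.5 and 31.5
print(outliers([3, 4, 6, 8, 12, 15, 40]))  # hypothetical variant: [40]
```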
Variance ($s^2$): The variance is roughly the average of the squared differences between each
observation and the mean (the sum of the squared differences is divided by $n - 1$ rather than $n$).
$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$
Or, in sigma notation,
$$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
Standard deviation (s): The standard deviation is the square root of variance.
$$s = \sqrt{\frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
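The variance and standard deviation formulas follow the same steps as a hand computation: find the mean, take each deviation, square it, add the squares, and divide by $n - 1$. This is a minimal sketch. `sample_variance` and `sample_sd` are hypothetical helper names, and the data are stand-in values from the linear-transformation exercise at the end of this handout. The last line checks the result against Python's standard `statistics.variance`, which also divides by $n - 1$.

```python
import math
import statistics


def sample_variance(values):
    """s^2 = sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """s = square root of the sample variance."""
    return math.sqrt(sample_variance(values))


data = [3, 4, 6, 8, 12, 15, 20]
print(round(sample_variance(data), 2))   # 38.9
print(round(sample_sd(data), 2))         # 6.24
assert math.isclose(sample_variance(data), statistics.variance(data))
```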
Note: Variance and Standard Deviation are used to measure spread when the mean is used to describe
center.
Note: When the distribution is approximately symmetric, the mean and standard deviation are generally
used to summarize the distribution. If the distribution is skewed, a five number summary is generally
used.
Find each of the following for the distribution of years of experience.
$Q_1$:
$Q_3$:
IQR:
Five number summary:
Are there any outliers in the distribution of years of experience?
Complete the table to find variance and standard deviation.
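Participant | $x$ | $x - \bar{x}$ | $(x - \bar{x})^2$
1 | | |
2 | | |
3 | | |
4 | | |
5 | | |
6 | | |
7 | | |
8 | | |
9 | | |
10 | | |
11 | | |
12 | | |
13 | | |
14 | | |
15 | | |
16 | | |
17 | | |
18 | | |
19 | | |
20 | | |
21 | | |
22 | | |
23 | | |
24 | | |
25 | | |
26 | | |
Sum | | | $\sum (x_i - \bar{x})^2$ = ______

$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2$ = ______
$s = \sqrt{\frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$ = ______

Which would be more appropriate for describing the distribution of years of experience: a five number
summary or the mean and standard deviation? Why?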
Construct a boxplot of years of teaching experience using the grid below as a guide.
[Grid for the boxplot: horizontal axis "Years of teaching experience," marked 0, 10, 20, 30, 40]
Construct parallel boxplots of years of teaching experience for men and women using the grid below as a
guide.
[Grid for the parallel boxplots: one row labeled Men and one labeled Women; horizontal axis "Years of teaching experience," marked 0, 10, 20, 30, 40]
Using the boxplots above, compare and contrast the distributions of years of experience for men and
women.
Linear transformations: A linear transformation replaces every value of the variable $x$ with a new value $x_{\text{new}}$
given by the equation $x_{\text{new}} = a + bx$.
Original data ($x$): 3, 4, 6, 8, 12, 15, 20
Median: ______   Mean: ______   Range: ______
IQR: ______   St. Dev.: ______   Variance: ______

Multiply each value in the original data by 3 and complete the table.
$x_{\text{new}} = 3x$: 9, 12, 18, 24, 36, 45, 60
Median: ______   Mean: ______   Range: ______
IQR: ______   St. Dev.: ______   Variance: ______

Multiply each value in the original data by 2, add 3, and complete the table.
$x_{\text{new}} = 3 + 2x$: 9, 11, 15, 19, 27, 33, 43
Median: ______   Mean: ______   Range: ______
IQR: ______   St. Dev.: ______   Variance: ______

Add 4 to each value in the original data and complete the table.
$x_{\text{new}} = 4 + x$: 7, 8, 10, 12, 16, 19, 24
Median: ______   Mean: ______   Range: ______
IQR: ______   St. Dev.: ______   Variance: ______
How is each summary statistic of $x$ affected by the linear transformation $x_{\text{new}} = a + bx$?
New median =
New mean =
New range =
New IQR =
New St. Dev. =
New variance =
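You can check the completed tables by computing all six statistics for the original data and for each transformed version. This is a minimal sketch. It uses Python's `statistics` module (whose `stdev` and `variance` divide by $n - 1$) and assumes that $Q_1$ and $Q_3$ are medians of the lower and upper halves, with the middle value left out when n is odd. Other quartile rules can give a slightly different IQR. `quartiles` and `summarize` are hypothetical helper names.

```python
import statistics


def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (assumption: odd n excludes the middle value)."""
    ordered = sorted(values)
    n = len(ordered)
    return statistics.median(ordered[: n // 2]), statistics.median(ordered[(n + 1) // 2 :])


def summarize(values):
    """The six summary statistics asked for in the tables."""
    q1, q3 = quartiles(values)
    return {
        "Median": statistics.median(values),
        "Mean": statistics.mean(values),
        "Range": max(values) - min(values),
        "IQR": q3 - q1,
        "St. Dev.": statistics.stdev(values),      # divides by n - 1
        "Variance": statistics.variance(values),   # divides by n - 1
    }


x = [3, 4, 6, 8, 12, 15, 20]
transforms = {
    "x": lambda v: v,
    "3x": lambda v: 3 * v,
    "3 + 2x": lambda v: 3 + 2 * v,
    "4 + x": lambda v: 4 + v,
}
for label, f in transforms.items():
    summary = summarize([f(v) for v in x])
    print(f"{label:<7}", "  ".join(f"{name} = {value:.2f}" for name, value in summary.items()))
```

Comparing each row with the first row shows which statistics shift with $a$ and which scale with $b$.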
Suppose a teacher gave a test for which $\bar{x} = 70$ and $s = 21$. He wants to apply a linear transformation
$x_{\text{new}} = a + bx$ to “scale” the grades so that $\bar{x}_{\text{new}} = 82$ and $s_{\text{new}} = 7$. Find $a$ and $b$.
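You can check an answer with exact fractions, which avoid rounding errors. This sketch assumes the standard relationships $s_{\text{new}} = |b|\,s$ and $\bar{x}_{\text{new}} = a + b\bar{x}$ for a linear transformation. It takes $b > 0$ so that the scaled grades keep the same order as the original ones. `scaling_constants` is a hypothetical helper name.

```python
from fractions import Fraction


def scaling_constants(mean_old, sd_old, mean_new, sd_new):
    """Return (a, b) so that x_new = a + b*x has the requested mean and SD.

    Uses sd_new = |b| * sd_old and mean_new = a + b * mean_old, choosing b > 0.
    """
    b = Fraction(sd_new) / Fraction(sd_old)
    a = Fraction(mean_new) - b * Fraction(mean_old)
    return a, b


a, b = scaling_constants(70, 21, 82, 7)
print(a, b)                 # 176/3 1/3
print(round(float(a), 2))   # 58.67
```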