AP Statistics Summer Institute
Exploring Univariate Data

Name: ______________________________

Participant | Gender | Years of teaching experience | Years teaching AP Statistics | Height (inches) | Shoe size
(One blank row for each participant, numbered 1 through 26.)

A distribution of a variable tells us what values the variable takes and how often it takes these values.

How would you describe the distribution of years of experience?

Describe the center.

What other characteristics are important to note? How do you describe these?

We begin by looking at graphs and then add numerical summaries.

Stem and leaf plot

0 |
0 |
1 |
1 |
2 |
2 |
3 |
3 |
4 |
4 |

What characteristics of the distribution are evident from the stem and leaf plot?

Back to back stem and leaf plot

Males |   | Females
      | 0 |
      | 0 |
      | 1 |
      | 1 |
      | 2 |
      | 2 |
      | 3 |
      | 3 |
      | 4 |
      | 4 |

Compare and contrast the characteristics of the distributions of years of experience for men and women.

Construct a histogram of the distribution of years of experience on the grid below.

[Grid: horizontal axis from 0 to 40 in steps of 10, labeled "Years of teaching experience"]

What characteristics of the distribution are evident from the histogram?

Compared to the stem and leaf plot, what detail does the histogram lack?

When would it be beneficial to use a histogram rather than a stem and leaf plot?

Notes regarding shape:

A distribution is said to be skewed to the right if it extends further to the right than it does to the left. (The tail extends to the right.)

A distribution is said to be skewed to the left if it extends further to the left than it does to the right. (The tail extends to the left.)

A distribution is said to be symmetric if the right and left sides of the histogram are approximately mirror images of each other.

Describing distributions with numbers

Measures of Center

Median ($M$): The median is the value such that half of the observations in the set are greater than it and half are less than it.

To find the median:
1. Arrange the observations in increasing order.
2. If the number of observations is odd, the median is the middle value.
3. If the number of observations is even, the median is the average of the middle two values.

Mean ($\bar{x}$): The mean $\bar{x}$ is the average of the set of observations:

$\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$

or, in sigma notation,

$\bar{x} = \dfrac{1}{n} \sum x_i$

Find the median and mean years of teaching experience.

Which measure of center is larger? Why?

Measures of Spread

Range = maximum - minimum

Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$

Quartiles:
The first quartile ($Q_1$) is the value that 25% of the observations fall below. It is the median of the first half of the ordered set of observations.
The third quartile ($Q_3$) is the value that 75% of the observations fall below. It is the median of the second half of the ordered set of observations.

Note: The IQR is typically used to describe spread when the median is used to describe center.

Five-number summary: Min, $Q_1$, Median, $Q_3$, Max

Outliers: An observation is called an outlier if it lies more than $1.5 \times \text{IQR}$ above $Q_3$ or more than $1.5 \times \text{IQR}$ below $Q_1$.

Variance ($s^2$): The variance is roughly the average of the squared differences between each observation and the mean.

$s^2 = \dfrac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$

or, in sigma notation,

$s^2 = \dfrac{1}{n - 1} \sum (x_i - \bar{x})^2$

Standard deviation ($s$): The standard deviation is the square root of the variance.

$s = \sqrt{\dfrac{1}{n - 1} \sum (x_i - \bar{x})^2}$

Note: Variance and standard deviation are used to measure spread when the mean is used to describe center.

Note: When the distribution is approximately symmetric, the mean and standard deviation are generally used to summarize the distribution. If the distribution is skewed, a five-number summary is generally used.

Find each of the following for the distribution of years of experience.

$Q_1$:
$Q_3$:
IQR:
Five-number summary:

Are there any outliers in the distribution of years of experience?

Complete the table to find variance and standard deviation.
Participant | $x$ | $(x - \bar{x})$ | $(x - \bar{x})^2$
(One blank row for each participant, numbered 1 through 26.)

$\sum (x_i - \bar{x})^2$ = ______

$s^2 = \dfrac{1}{n - 1} \sum (x_i - \bar{x})^2$ = ______

$s = \sqrt{\dfrac{1}{n - 1} \sum (x_i - \bar{x})^2}$ = ______

Which would be more appropriate in describing the distribution of years of experience: a five-number summary or the mean and standard deviation? Why?

Construct a boxplot for the number of years of experience using the grid as a guide.

[Grid: horizontal axis from 0 to 40 in steps of 10, labeled "Years of teaching experience"]

Construct parallel boxplots for the number of years of experience for men and women using the grid as a guide.

[Grid: rows labeled "Men" and "Women"; horizontal axis from 0 to 40 in steps of 10, labeled "Years of teaching experience"]

Using the boxplots above, compare and contrast the distributions of years of experience for men and women.

Linear transformations: A linear transformation changes every value of the variable $x$ into a new value $x_{\text{new}}$ given by the equation $x_{\text{new}} = a + bx$.

Original data ($x$): 3, 4, 6, 8, 12, 15, 20

Median | Mean | Range | IQR | St. Dev. | Variance
______ | ______ | ______ | ______ | ______ | ______

Multiply each value in the original data by 3 and complete the table.

$x_{\text{new}} = 3x$: 9, 12, 18, 24, 36, 45, 60

Median | Mean | Range | IQR | St. Dev. | Variance
______ | ______ | ______ | ______ | ______ | ______

Multiply each value in the original data by 2 and add 3 and complete the table.

$x_{\text{new}} = 3 + 2x$: 9, 11, 15, 19, 27, 33, 43

Median | Mean | Range | IQR | St. Dev. | Variance
______ | ______ | ______ | ______ | ______ | ______

Add 4 to each value in the original data and complete the table.

$x_{\text{new}} = 4 + x$: 7, 8, 10, 12, 16, 19, 24

Median | Mean | Range | IQR | St. Dev. | Variance
______ | ______ | ______ | ______ | ______ | ______

How is each summary statistic of $x$ affected by the linear transformation $x_{\text{new}} = a + bx$?

Median new =
Mean new =
Range new =
IQR new =
St. Dev. new =
Variance new =

Suppose a teacher gave a test for which $\bar{x} = 70$ and $s = 21$. He wants to apply a linear transformation $x_{\text{new}} = a + bx$ to “scale” the grades so that $\bar{x}_{\text{new}} = 82$ and $s_{\text{new}} = 7$. Find $a$ and $b$.
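One way to check the linear-transformation tables is to compute the summary statistics in software. The Python sketch below is only an illustration and not part of the original handout. It uses the worksheet's data set 3, 4, 6, 8, 12, 15, 20 and the transformation $x_{\text{new}} = 3 + 2x$. The function names `quartiles` and `summarize` are hypothetical. The quartiles follow the worksheet's "median of each half" definition, and the sketch assumes the middle value is left out of both halves when the number of observations is odd. Calculators and software packages may use other quartile rules and give slightly different values.

```python
import statistics


def quartiles(values):
    """Return (Q1, median, Q3) using the 'median of each half' rule.

    Assumption: when the number of observations is odd, the middle
    value is excluded from both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return (statistics.median(lower),
            statistics.median(data),
            statistics.median(upper))


def summarize(values):
    """Collect the six summary statistics used in the worksheet tables."""
    q1, med, q3 = quartiles(values)
    return {
        "median": med,
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "st_dev": statistics.stdev(values),       # divides by n - 1
        "variance": statistics.variance(values),  # divides by n - 1
    }


original = [3, 4, 6, 8, 12, 15, 20]
a, b = 3, 2  # x_new = a + b*x, as in the "multiply by 2 and add 3" table
transformed = [a + b * x for x in original]

before = summarize(original)
after = summarize(transformed)

# Print each statistic before and after the transformation, side by side.
for name in before:
    print(f"{name:>8}: {before[name]:8.3f} -> {after[name]:8.3f}")
```

Changing `a` and `b` to the other transformations in the tables (for example `a, b = 0, 3` or `a, b = 4, 1`) and comparing the two columns may help in spotting which statistics respond to `a`, which respond to `b`, and which respond to both.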