Unit 1
Mr. Lang's AP Statistics PowerPoint

Homework Assignment
For the A: 1, 3, 5, 7, 8, 11-25 odd, 27-32, 37-59 odd, 60, 69-74, 79-105 odd (except 85, 99, 101), 107-110, R1-R10
For the C: 1, 3, 5, 8, 11-25 odd, 37-59 odd, 79-103 odd (except 85, 99, 101), R1-R10
For the D-: 1, 3, 5, 11, 15, 19, 23, 37, 41, 45, 49, 79, 83, 87, 91, 97, 103, R1-R10

Statistics
the science of collecting, analyzing, and drawing conclusions from data

Descriptive statistics
the methods of organizing & summarizing data

Inferential statistics
involves making generalizations from a sample to a population

Population
The entire collection of individuals or objects about which information is desired

Sample
A subset of the population, selected for study in some prescribed manner

Variable
any characteristic whose value may change from one individual to another

Data
observations on a single variable or simultaneously on two or more variables

Types of variables
Categorical variables (or qualitative)
– identify basic differentiating characteristics of the population
Numerical variables (or quantitative)
– observations or measurements take on numerical values
– makes sense to average these values
– two types: discrete & continuous
Discrete (numerical)
– listable set of values
– usually counts of items
Continuous (numerical)
– data can take on any values in the domain of the variable
– usually measurements of something

Classification by the number of variables
Univariate - data that describes a single characteristic of the population
Bivariate - data that describes two characteristics of the population
Multivariate - data that describes more than two characteristics (beyond the scope of this course)

Identify the following variables:
1. the income of adults in your city: Numerical
2. the color of M&M candies selected at random from a bag: Categorical
3. the number of speeding tickets each student in AP Statistics has received: Numerical
4. the area code of an individual: Categorical
5. the birth weights of female babies born at a large hospital over the course of a year: Numerical

Self Check #1
Assignment #1

Graphs for categorical data

Bar Graph
– Used for categorical data
– Bars do not touch
– Categorical variable is typically on the horizontal axis
– To describe: comment on which occurred the most often or least often
– May make a double bar graph or segmented bar graph for bivariate categorical data sets
Using class survey data: graph birth month; graph gender & handedness

Pie (Circle) graph
– Used for categorical data
– To make:
  • Proportion × 360°
  • Using a protractor, mark off each part
– To describe: comment on which occurred the most often or least often

Graphs for numerical data

Dotplot
– Used with numerical data (either discrete or continuous)
– Made by putting dots (or X's) on a number line
– Can make comparative dotplots by using the same axis for multiple groups

Stemplots (stem & leaf plots)
– Used with univariate, numerical data
– Must have a key so that we know how to read the numbers
– Can split stems when you have a long list of leaves
– Can have a comparative stemplot with two groups
Would a stemplot be a good graph for the number of pieces of gum chewed per day by AP Stat students? Why or why not?
Would a stemplot be a good graph for the number of pairs of shoes owned by AP Stat students? Why or why not?

Example: The following data are price per ounce for various brands of dandruff shampoo at a local grocery store.
0.32 0.21 0.29 0.54 0.17 0.28 0.36 0.23
Can you make a stemplot with this data?

Example: Tobacco use in G-rated Movies
Total tobacco exposure time (in seconds) for Disney movies:
223 176 548 37 158 51 299 37 11 165 74 9 2 6 23 206 9
Total tobacco exposure time (in seconds) for other studios' movies:
205 162 6 1 117 5 91 155 24 55 17
Make a comparative stemplot.

Graphing Activity
Self Check #2
Assignment #2

Histograms
– Used with numerical data
– Bars touch on histograms
– Two types
  • Discrete: bars are centered over discrete values
  • Continuous: bars cover a class (interval) of values
– For comparative histograms: use two separate graphs with the same scale on the horizontal axis
Would a histogram be a good graph for the fastest speed driven by AP Stat students? Why or why not?
Would a histogram be a good graph for the number of pieces of gum chewed per day by AP Stat students? Why or why not?

Cumulative Relative Frequency Plot (Ogive)
. . . is used to answer questions about percentiles.
Percentiles are the percent of individuals that are at or below a certain value.
Quartiles are located every 25% of the data. The first quartile (Q1) is the 25th percentile, while the third quartile (Q3) is the 75th percentile. What is the special name for Q2?
Interquartile Range (IQR) is the range of the middle half (50%) of the data.
IQR = Q3 – Q1

Ogive Activity
Self Check #3
Multiple Choice Test #1

Types (shapes) of Distributions

Symmetrical
– refers to data in which both sides are (more or less) the same when the graph is folded vertically down the middle
– bell-shaped is a special type: it has a center mound with two sloping tails

Uniform
– refers to data in which every class has equal or approximately equal frequency

Skewed (left or right)
– refers to data in which one side (tail) is longer than the other side
– the direction of skewness is on the side of the longer tail

Bimodal (multi-modal)
– refers to data in which two (or more) classes have the largest frequency & are separated by at least one other class

Distribution Activity . . .
Self Check #4

How to describe a numerical, univariate graph

What strikes you as the most distinctive difference among the distributions of exam scores in classes A, B, & C?

1. Center
– discuss where the middle of the data falls
– three types of central tendency: mean, median, & mode

What strikes you as the most distinctive difference among the distributions of scores in classes D, E, & F?

2.
Spread
– discuss how spread out the data is
– refers to the variability of the data: range, standard deviation, IQR

What strikes you as the most distinctive difference among the distributions of exam scores in classes G, H, & I?

3. Shape
– refers to the overall shape of the distribution
– symmetrical, uniform, skewed, or bimodal

What strikes you as the most distinctive feature of the distribution of exam scores in class K?

4. Unusual occurrences
– outliers: values that lie away from the rest of the data
– gaps
– clusters
– anything else unusual

5. In context
– You must write your answer in reference to the specifics in the problem, using correct statistical vocabulary and using complete sentences!

Features of the Distribution Activity

Means & Medians

Parameter
– Fixed value about a population
– Typically unknown

Statistic
– Value calculated from a sample

Measures of Central Tendency
Median - the middle of the data; 50th percentile
– Observations must be in numerical order
– Is the middle single value if n is odd
– The average of the middle two values if n is even
NOTE: n denotes the sample size

Measures of Central Tendency
Mean - the arithmetic average
– Use $\mu$ to represent a population mean (parameter)
– Use $\bar{x}$ to represent a sample mean (statistic)
Formula: $\bar{x} = \frac{\sum x}{n}$
$\Sigma$ is the capital Greek letter sigma; it means to sum the values that follow.

Measures of Central Tendency
Mode - the observation that occurs the most often
– Can be more than one mode
– If all values occur only once, there is no mode
– Not used as often as mean & median

Suppose we are interested in the number of lollipops that are bought at a certain store. A sample of 5 customers buys the following number of lollipops. Find the median.
2 3 4 8 12
The numbers are in order & n is odd, so find the middle observation. The median is 4 lollipops!

Suppose we have a sample of 6 customers that buy the following number of lollipops. Find the median.
2 3 4 6 8 12
The numbers are in order & n is even, so find the middle two observations. Now, average these two values: (4 + 6)/2 = 5. The median is 5 lollipops!

Suppose we have a sample of 6 customers that buy the following number of lollipops. Find the mean.
2 3 4 6 8 12
To find the mean number of lollipops, add the observations and divide by n.
$\bar{x} = \frac{2 + 3 + 4 + 6 + 8 + 12}{6} = 5.833$
Using the calculator . . .

What would happen to the median & mean if the 12 lollipops were 20?
2 3 4 6 8 20
The median is 5.
The mean is $\frac{2 + 3 + 4 + 6 + 8 + 20}{6} = 7.17$.
What happened?

What would happen to the median & mean if the 20 lollipops were 50?
2 3 4 6 8 50
The median is 5.
The mean is $\frac{2 + 3 + 4 + 6 + 8 + 50}{6} = 12.17$.
What happened?

Resistant
Statistics that are not affected by outliers
– Is the median resistant? YES
– Is the mean resistant? NO

Look at the following data set. Find the mean.
22 23 24 25 25 26 29 30
$\bar{x} = 25.5$
Now find how each observation deviates from the mean. This is the deviation from the mean.
What is the sum of the deviations from the mean? Will this sum always equal zero? YES
$\sum (x - \bar{x}) = 0$

Look at the following data set. Find the mean & median.
21 23 23 24 26 30 26 30 27 30 27 31 25 27 32 25 27 32 26 28
Mean = 27
Median = 27
Create a histogram with the data (use x-scale 2). Then find the mean and median. Look at the placement of the mean and median in this symmetrical distribution.

Look at the following data set. Find the mean & median.
22 29 28 22 24 25 28 21 23 24 23 26 36 38 62 23 25
Mean = 28.176
Median = 25
Create a histogram with the data (use x-scale 8). Then find the mean and median. Look at the placement of the mean and median in this right-skewed distribution.

Look at the following data set. Find the mean & median.
21 46 54 47 53 60 55 55 56 58 58 58 58 62 63 64 60
Mean = 54.588
Median = 58
Create a histogram with the data. Then find the mean and median. Look at the placement of the mean and median in this left-skewed distribution.

Recap:
– In a symmetrical distribution, the mean and median are equal.
– In a skewed distribution, the mean is pulled in the direction of the skewness.
– In a symmetrical distribution, you should report the mean!
– In a skewed distribution, the median should be reported as the measure of center!

Trimmed mean
To calculate a trimmed mean:
– Multiply the % to trim by n
– Truncate that many observations from BOTH ends of the distribution (when listed in order)
– Calculate the mean with the shortened data set

Find a 10% trimmed mean with the following data.
12 14 19 20 22 24 25 26 26 35
10%(10) = 1, so remove one observation from each side!
14 19 20 22 24 25 26 26
$\bar{x}_T = \frac{176}{8} = 22$

Matching Graphs Activity
Mean and Median Assignment

Why use boxplots?
– ease of construction
– convenient handling of outliers
– construction is not subjective (like histograms)
– used with medium or large size data sets (n > 10)
– useful for comparative displays

Disadvantage of boxplots
– does not retain the individual observations
– should not be used with small data sets (n < 10)

How to construct
– find the five-number summary: Min, Q1, Med, Q3, Max
– draw a box from Q1 to Q3
– draw the median as a center line in the box
– extend whiskers to min & max

Modified boxplots
– display outliers
– fences mark off mild & extreme outliers
– whiskers extend to the largest (smallest) data value inside the fence
ALWAYS use modified boxplots in this class!!!

Inner fence
Q1 – 1.5IQR and Q3 + 1.5IQR
The Interquartile Range (IQR) is the range (length) of the box: Q3 – Q1.
Any observation outside this fence is an outlier! Put a dot for the outliers.

Modified Boxplot . . .
Draw the "whisker" from the quartiles to the observation that is within the fence!

Outer fence
Q1 – 3IQR and Q3 + 3IQR
Any observation between the fences is considered a mild outlier.
Any observation outside this fence is an extreme outlier!

For the AP Exam . . .
. . . you just need to find outliers; you DO NOT need to identify them as mild or extreme.
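The inner-fence rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `quartiles` and `inner_fence_outliers` are made up for this sketch, and Q1 and Q3 are taken as the medians of the lower and upper halves of the sorted data (leaving out the middle value when n is odd). Calculators and software may use slightly different quartile conventions. The data are the right-skewed set from the mean and median slides.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: when n is odd, the middle value is left out of both halves.
    Needs at least two observations.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]      # smallest half of the observations
    upper = data[-half:]     # largest half of the observations
    return median(lower), median(upper)


def inner_fence_outliers(values):
    """Return the observations that fall outside the 1.5*IQR inner fences."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in sorted(values) if x < low_fence or x > high_fence]


# Right-skewed data set from the mean & median slides
right_skewed = [22, 29, 28, 22, 24, 25, 28, 21, 23, 24, 23, 26, 36, 38, 62, 23, 25]
q1, q3 = quartiles(right_skewed)
outliers = inner_fence_outliers(right_skewed)
print("Q1 =", q1, "Q3 =", q3, "outliers:", outliers)
```

With this quartile convention the whisker on the high side would stop at the largest observation that is still inside the upper fence, and each outlier would be drawn as a dot.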
Therefore, you just need to use the 1.5IQR fences.

A report from the U.S. Department of Justice gave the following percent increase in federal prison populations in 20 northeastern & midwestern states in 1999.
5.9 4.8 8.0 1.3 6.9 4.4 5.0 4.5 7.2 5.9 3.5 3.2 4.5 7.2 5.6 6.4 4.1 5.5 6.3 5.3
Create a modified boxplot. Describe the distribution.
Use the calculator to create a modified boxplot.

Evidence suggests that a high indoor radon concentration might be linked to the development of childhood cancers. The data that follows is the radon concentration in two different samples of houses. The first sample consisted of houses in which a child was diagnosed with cancer. Houses in the second sample had no recorded cases of childhood cancer. (see data on note page)
Create parallel boxplots. Compare the distributions.

[Figure: parallel boxplots of radon concentration for the Cancer and No Cancer groups; horizontal axis: Radon]

– The median radon concentration for the no cancer group is lower than the median for the cancer group.
– The range of the cancer group is larger than the range for the no cancer group.
– Both distributions are skewed right.
– The cancer group has outliers at 39, 45, 57, and 210.
– The no cancer group has outliers at 55 and 85.

Matching Box Plots, Histograms, and Summary Statistics Activity
Self Check #5
Comparative Boxplots Assignment

Why is the study of variability important?
– Allows us to distinguish between usual & unusual values
– In some situations, we want more/less variability
  • scores on standardized tests
  • time bombs
  • medicine

Measures of Variability
– range (max – min)
– interquartile range (Q3 – Q1)
– deviations: $x - \bar{x}$
– variance: $\sigma^2$ ($\sigma$ is the lower case Greek letter sigma)
– standard deviation: $\sigma$

Suppose that we have these data values:
24 16 34 28 26 21 30 35 37 29
Find the mean.
Find the deviations: $x - \bar{x}$
What is the sum of the deviations from the mean?

24 16 34 28 26 21 30 35 37 29
Square the deviations: $(x - \bar{x})^2$
Find the average of the squared deviations:
$\sigma^2 = \frac{\sum (x - \mu)^2}{n}$
The average of the deviations squared is called the variance.
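The steps on this slide (find the mean, find the deviations, square them, average the squares) can be traced in Python. This is a minimal sketch using the ten data values above; the variable names are chosen for illustration, and the result is the variance obtained by dividing by n, as in the formula just given.

```python
from statistics import pvariance

data = [24, 16, 34, 28, 26, 21, 30, 35, 37, 29]
n = len(data)

# Step 1: the mean
mean = sum(data) / n

# Step 2: the deviation of each observation from the mean
deviations = [x - mean for x in data]
sum_of_deviations = sum(deviations)   # deviations from the mean cancel out

# Step 3: square the deviations
squared_deviations = [d ** 2 for d in deviations]

# Step 4: average the squared deviations (divide by n)
variance = sum(squared_deviations) / n

print("mean:", mean)
print("sum of deviations:", sum_of_deviations)
print("variance (divide by n):", variance)

# Cross-check against the standard library's population variance
print("statistics.pvariance:", pvariance(data))
```

Squaring is what keeps the positive and negative deviations from cancelling, which is why the sum of the raw deviations is not useful as a measure of spread.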
Population: parameter $\sigma^2$
Sample: statistic $s^2$

Calculation of variance of a sample
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
The denominator $n - 1$ is the degrees of freedom (df).

Degrees of Freedom (df)
n deviations contain (n - 1) independent pieces of information about variability

A standard deviation is a measure of the average deviation from the mean.
Use calculator

Which measure(s) of variability is/are resistant?

Mean and Variance Activity
Mean and Variance Worksheet
Self Check #6
Show me the Money Assignment
Multiple Choice Test #2
Assignment #3

Linear transformation rule
– When adding a constant to a random variable, the mean changes but not the standard deviation.
– When multiplying a random variable by a constant, both the mean and the standard deviation change.

An appliance repair shop charges a $30 service call to go to a home for a repair. It also charges $25 per hour for labor. From past history, the average length of repairs is 1 hour 15 minutes (1.25 hours) with a standard deviation of 20 minutes (1/3 hour). Including the charge for the service call, what are the mean and standard deviation of the total charge?
$\mu = 30 + 25(1.25) = \$61.25$
$\sigma = 25\left(\frac{1}{3}\right) = \$8.33$

Rules for Combining two variables
– To find the mean for the sum (or difference), add (or subtract) the two means.
– To find the standard deviation of the sum (or difference), ALWAYS add the variances, then take the square root.
Formulas:
$\mu_{a+b} = \mu_a + \mu_b$
$\mu_{a-b} = \mu_a - \mu_b$
$\sigma_{a \pm b} = \sqrt{\sigma_a^2 + \sigma_b^2}$ (if the variables are independent)

Bicycles arrive at a bike shop in boxes. Before they can be sold, they must be unpacked, assembled, and tuned (lubricated, adjusted, etc.). Based on past experience, the times for each setup phase are independent with the following means & standard deviations (in minutes). What are the mean and standard deviation for the total bicycle setup times?

Phase       Mean   SD
Unpacking    3.5   0.7
Assembly    21.8   2.4
Tuning      12.3   2.7

$\mu_T = 3.5 + 21.8 + 12.3 = 37.6$ minutes
$\sigma_T = \sqrt{0.7^2 + 2.4^2 + 2.7^2} = 3.680$ minutes

Self Check #7
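Both worked examples above can be checked with a short Python sketch. The function names `linear_transform` and `combine_independent` are hypothetical helpers written for this illustration, not part of any library, and the second one assumes the variables are independent, as the rule requires.

```python
import math


def linear_transform(mean, sd, a, b):
    """Mean and SD of a + b*X, given the mean and SD of X.

    Adding the constant a shifts the mean only; multiplying by b
    rescales both the mean and the standard deviation.
    """
    return a + b * mean, abs(b) * sd


def combine_independent(means, sds):
    """Mean and SD of a sum of independent variables.

    Means add; variances (not standard deviations) add.
    """
    total_mean = sum(means)
    total_sd = math.sqrt(sum(s ** 2 for s in sds))
    return total_mean, total_sd


# Repair shop: $30 service call plus $25 per hour of labor
charge_mean, charge_sd = linear_transform(mean=1.25, sd=1 / 3, a=30, b=25)
print(f"repair charge: mean ${charge_mean:.2f}, SD ${charge_sd:.2f}")

# Bicycle setup: unpacking, assembly, tuning (minutes)
setup_mean, setup_sd = combine_independent([3.5, 21.8, 12.3], [0.7, 2.4, 2.7])
print(f"bicycle setup: mean {setup_mean:.1f} min, SD {setup_sd:.3f} min")
```

Note that adding the three standard deviations directly (0.7 + 2.4 + 2.7 = 5.8) would overstate the spread of the total time; only the variances add.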