Describing Distributions

3 Topics:
1. Shape
2. Center
3. Spread

Shape
• Always describe the basic shape of the distribution.
• Symmetric – left and right sides are mirror images, or approximately so, since we're dealing with real data
• Skewed Left – the majority of the data is on the right side, with a tail out to the left
• Skewed Right – the majority of the data is on the left side, with a tail out to the right
• Unimodal – one major mode that the data cluster around
• Bimodal – two major modes
• Multimodal – multiple modes
• Uniform – data is flat and featureless
• Also mention any unusual features
  • Outliers – observations away from the main distribution. These can be some of the most informative and interesting observations.
  • Gaps – spaces between clumps of data

Measure of Center
• Median or
• Mean

Median
• The middle observation
• Arrange the data in order from least to greatest
• If there is an odd number of observations, find the middle number
• If there is an even number of observations, find the average of the middle two numbers

Mean
• Arithmetic average: add all observations together and divide by the number of observations.
• If there are n observations labeled $x_1, x_2, \ldots, x_n$, the mean is

  $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$

• Read as "x bar"
• Must be written with the bar!!!!
• Finding Summary Stats on your calculator!!!

Mean vs Median
• If the distribution is symmetric, the mean and the median will be approximately the same
• If the distribution is skewed to the left, the mean will be less than the median
• If the distribution is skewed to the right, the mean will be higher than the median

• Symmetric, Unimodal, No Outliers: $\bar{x}$ = ___ Med = ___
• Skewed Right, Unimodal, No Outliers: $\bar{x}$ = ___ Med = ___
• Skewed Left, Unimodal, Lower Outlier at 10: $\bar{x}$ = ___ Med = ___

The mean is affected by outliers and the median is resistant to outliers.
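The mean-versus-median behavior described above can be checked numerically. What follows is a minimal Python sketch, not part of the original slides: the three small data sets are made up for illustration, and `center_summary` is a hypothetical helper name.

```python
from statistics import mean, median


def center_summary(data):
    """Return (mean, median) for a list of numbers."""
    return mean(data), median(data)


# Hypothetical data, invented for illustration only.
symmetric = [30, 31, 32, 33, 34]       # no outliers
upper_outlier = [30, 31, 32, 33, 54]   # largest value pulled far to the right
lower_outlier = [10, 31, 32, 33, 34]   # smallest value pulled far to the left

for name, data in [
    ("symmetric", symmetric),
    ("upper outlier", upper_outlier),
    ("lower outlier", lower_outlier),
]:
    x_bar, med = center_summary(data)
    print(f"{name}: mean = {x_bar}, median = {med}")
```

In this sketch the median stays at 32 in all three cases, while the mean moves from 32 up to 36 with the upper outlier and down to 28 with the lower outlier, which is the sense in which the median is resistant and the mean is not.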
[Figure: example distribution of AGE]
• When the data is symmetric with no major outliers, use the mean
• When the data is skewed or has major outliers, use the median

5 Number Summary
• Numeric description of the distribution
1. Minimum
2. 1st Quartile (Q1) – middle number of the bottom half of the data
3. Median
4. 3rd Quartile (Q3) – middle number of the top half of the data
5. Maximum

• Put data in order

2 2 3 4 5 5 6 6 7 7 7 7 7 8 9 9 9 10 11 12 14 16 19 27 35

Min = 2
Q1 = 5.5
Med = 7
Q3 = 11.5
Max = 35

Measures of Spread
• Range – difference between the highest and lowest values
• Range = Max – Min
• Interquartile Range – difference between the 75th percentile and the 25th percentile
• Range of the middle 50% of the data
• IQR = Q3 – Q1

Measure of Spread
Standard Deviation facts
• Standard deviation is a number that describes how much the data vary or spread out.
• Uses the difference of each data value from the mean.

  $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

• The higher the number, the more the data is spread out.
• Can never be negative
• If s = 0, all the values in the data set are equal.
• The square of the standard deviation is called the variance ($s^2$ = var).

Two machines are set to cut a piece of metal that should weigh 15 grams. A random sample of 50 pieces from each machine is taken and the weights are measured.
• Both are centered at 15 grams.
• $s_1$ = 0.80 g
• $s_2$ = 2.08 g

Describing Distributions:
With Upper Outlier: $\bar{x}$ = 20.43, s = 4.18, med = 20.03, IQR = 1.8
Without Upper Outlier: $\bar{x}$ = 19.84, s = 2.38, med = 20.03, IQR = 1.8

1. If you use the mean, use the standard deviation
2. If you use the median, use the IQR

• Finding Summary Stats on your calculator!!!
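The same summary statistics can also be computed in software. The sketch below is an illustration in Python rather than the calculator procedure the slides refer to. It uses the 25 ordered values from the 5 Number Summary slide, takes Q1 and Q3 as the medians of the bottom and top halves (leaving out the middle observation when n is odd), and uses the n − 1 divisor for the standard deviation. The function names `five_number_summary` and `sample_std` are hypothetical.

```python
from math import sqrt
from statistics import mean, median, stdev


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the
    sorted data; the middle observation is excluded when n is odd.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


def sample_std(data):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    x_bar = mean(data)
    return sqrt(sum((x - x_bar) ** 2 for x in data) / (len(data) - 1))


# Data from the 5 Number Summary slide.
data = [2, 2, 3, 4, 5, 5, 6, 6, 7, 7, 7, 7, 7, 8, 9, 9, 9,
        10, 11, 12, 14, 16, 19, 27, 35]

lo, q1, med, q3, hi = five_number_summary(data)
print("five-number summary:", lo, q1, med, q3, hi)  # 2 5.5 7 11.5 35
print("range:", hi - lo)                            # 33
print("IQR:", q3 - q1)                              # 6.0
print("s (by formula):", sample_std(data))
print("s (statistics.stdev):", stdev(data))         # also divides by n - 1
```

Other quartile conventions exist, so a library or calculator that interpolates percentiles differently may report slightly different values for Q1 and Q3 on other data sets.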