Section 2.1 Visualizing Distributions: Shape, Center, and Spread

Exploration of univariate data should follow this sequence: plot → shape → center → spread.

1. Choose an appropriate plot. Four kinds of graphs will be shown: dot plots, stem plots, bar charts, and histograms (or relative frequency histograms).
2. Describe the shape. Four kinds of shapes will be demonstrated: uniform, normal, skewed, and bimodal.
3. Find a measure of center appropriate to the shape. Two kinds of center will be calculated (formulas to be taught): the mean and the median (sometimes the mode).
4. Find a measure of spread that agrees with the measure of center. Three kinds of spread will be calculated (formulas to be taught): deviations, standard deviation, and variance.

Data Distributions: “Shapes”

We will be considering four types of data distribution shapes: uniform, normal, skewed, and bimodal.

Uniform Distribution – Ideal Shape

In a uniform distribution of data, all the values occur equally often (the same number of times). It is also known as a rectangular distribution because of its shape.

Dot Plots – Ideal Graph, Uniform Distribution – Ideal Shape

Dot plot: a set of data (group of numbers) in which each datum (single number) is graphed as a single dot above its value on a number line, with repeated values stacking up.

Example: If you roll a 10-sided die (labeled 0 to 9) 50 times, you may get a dot plot of 50 dots with a roughly uniform shape.

Histogram – Not Ideal Graph, Uniform Distribution – Not Ideal Shape

Describing the graph: The graph shows a roughly uniform distribution of births across the months.

Dot Plot – Not Ideal Graph, Uniform Distribution – Not Ideal Shape

Activity 2.1a, page 28: Last Digit of 30 Phone Numbers

Does this graph look like a uniform distribution? Is it a uniform distribution? Why or why not? [Hint: Would the last digits of only 2 phone numbers look uniform? How about the last digits of 1000 phone numbers?]
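The die example and the phone-number hint both depend on how many values are plotted. The sketch below simulates rolls of a fair 10-sided die and prints a sideways text dot plot, one row per face. It assumes a fair die, and it treats phone-number last digits as if they were random digits, which is also an assumption. It then measures how far each sample is from perfectly uniform. The function names (`roll_counts`, `text_dot_plot`, `max_relative_spread`) are hypothetical helpers written for this illustration.

```python
import random
from collections import Counter


def roll_counts(n_rolls, seed=None):
    """Roll a fair 10-sided die (faces 0-9) n_rolls times; return the count for each face."""
    rng = random.Random(seed)  # seeded generator so runs are repeatable
    counts = Counter(rng.randrange(10) for _ in range(n_rolls))
    # Counter returns 0 for faces that never came up, so all ten digits are listed.
    return [counts[face] for face in range(10)]


def text_dot_plot(counts):
    """Sideways text dot plot: one row per face, one '*' per occurrence."""
    return "\n".join(f"{face} | {'*' * c}" for face, c in enumerate(counts))


def max_relative_spread(counts):
    """Largest minus smallest relative frequency; 0 means every face occurred equally often."""
    total = sum(counts)
    freqs = [c / total for c in counts]
    return max(freqs) - min(freqs)


if __name__ == "__main__":
    print(text_dot_plot(roll_counts(50, seed=1)))
    # Compare how far from "perfectly uniform" small and large samples are.
    for n in (2, 30, 1000, 100_000):
        print(n, round(max_relative_spread(roll_counts(n, seed=2)), 3))
```

With only 2 rolls, at least eight faces have a count of zero, so the plot cannot look uniform. With 100,000 rolls, every relative frequency sits close to 0.1. A sample of 30 falls in between, which is the point of the activity's hint.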
Normal Distribution – Ideal Shape

The idealized normal shape:
* is perfectly symmetric;
* has a single peak, or mode, at the line of symmetry;
* drops off smoothly on both sides, never touching the x-axis, and stretches infinitely far in both directions.

Normal Distribution – Ideal Shape: Center & Spread

Center: mean = median = mode.
Spread: the standard deviation (SD) is the horizontal distance from the mean to either inflection point of the curve.

Use the mean to describe the center and the standard deviation to describe the spread of a normal distribution.

Example: A typical random sample of five workers has an average age of 47 years, give or take about 4 years. Statistically speaking: IF the population has a normal distribution, the mean is 47 years and the standard deviation is 4 years. (That is a “big if.”)

Skewed Distributions – Ideal Shape

Uniform and normal distributions are symmetric. Many common distributions instead show bunching at one end and a long tail stretching out in the other direction. These distributions are called skewed. The direction of the tail tells whether a distribution is skewed left or skewed right.

Often the bunching in a skewed distribution occurs because values “bump up against a wall.” The wall is either a minimum that values cannot go below, such as 0 for distances, or a maximum that values cannot go above, such as 100 for percentages. Note: to have a “wall,” values must be bunched up against it.

Skewed Distributions – Ideal Shape: Center & Spread

Use the median to describe the center, with the lower and upper quartiles to describe the spread.

Example: The middle 50% of the SAT math scores were between 630 and 720, with half above 680 and half below.

Skewed Distributions – Not Ideal Shape: Center & Spread

The middle 50% of the polar bear weights are between about 115 and 250 lb, with half above about 155 lb and half below.
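The slides pair each shape with a measure of center and spread: mean and SD for normal shapes, and median and quartiles for skewed shapes. The formulas are still to be taught. The sketch below shows one common version of each. Two conventions in it are assumptions and may differ from the course. First, the sample variance divides the sum of squared deviations by n - 1. Second, the quartiles are computed as the medians of the lower and upper halves, leaving out the middle value when n is odd. Other tools, such as Python's `statistics.quantiles`, can give slightly different quartiles. All function names here are hypothetical.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    # Odd n: the middle value; even n: average of the two middle values.
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def quartiles(xs):
    """Q1 and Q3 as medians of the lower and upper halves (assumed convention; needs len >= 2)."""
    s = sorted(xs)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + len(s) % 2:]  # skip the middle value when n is odd
    return median(lower), median(upper)


def deviations(xs):
    """Each value minus the mean; these always sum to 0."""
    m = mean(xs)
    return [x - m for x in xs]


def sample_variance(xs):
    """Sum of squared deviations divided by n - 1 (assumed sample convention)."""
    return sum(d * d for d in deviations(xs)) / (len(xs) - 1)


def sample_sd(xs):
    return math.sqrt(sample_variance(xs))


def describe(xs, shape):
    """Pick center and spread to match the shape, following the slides' pairing."""
    if shape == "normal":
        return {"center": mean(xs), "spread": sample_sd(xs)}
    if shape == "skewed":
        return {"center": median(xs), "spread": quartiles(xs)}
    raise ValueError("shape must be 'normal' or 'skewed'")


if __name__ == "__main__":
    print(describe([2, 4, 4, 4, 5, 5, 7, 9], "normal"))
    print(describe([1, 1, 2, 2, 3, 4, 15], "skewed"))  # hypothetical right-skewed data
```

For the hypothetical right-skewed list `[1, 1, 2, 2, 3, 4, 15]`, the mean is 4 while the median is 2. The long right tail pulls the mean upward, which is why the median and quartiles are preferred for skewed shapes. For a normal shape, “47 give or take 4” corresponds to the interval from 43 to 51 years, one SD on each side of the mean.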
Bimodal Distributions – Ideal Shape

Many distributions have only one peak (unimodal). Some have two peaks (bimodal) or even more.

Bimodal Distributions – Not Ideal Shape

Locate the two peaks. It is more useful, however, if you can find another variable that divides the data into two groups centered at the two peaks.

Other Possible Data Characteristics
* Outlier: an unusual value that stands apart from the bulk of the data.
* Cluster: a group of data “clustering” close to the same value, away from other groups.
* Gap: on a plot, the space that separates clusters of data.
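The sketch below turns these ideas into a few checks on histogram bin counts. It finds the peaks (local maxima) and the gaps (empty bins with data on both sides). It also splits the data by a second variable and reports each group's median. The bin width and all function names (`histogram`, `peak_bins`, `gap_bins`, `centers_by_group`) are hypothetical choices for illustration. With real data, small random bumps will also show up as “peaks,” so this is only a rough aid for reading a plot and does not replace looking at one.

```python
from statistics import median


def histogram(values, lo, width, n_bins):
    """Count values into n_bins equal-width bins starting at lo; values outside are ignored."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - lo) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


def peak_bins(counts):
    """Indices of bins higher than both neighbors; a flat-topped peak reports its first bin."""
    peaks = []
    n = len(counts)
    i = 0
    while i < n:
        j = i
        while j + 1 < n and counts[j + 1] == counts[i]:
            j += 1  # extend over a run of equal counts (a plateau)
        left = counts[i - 1] if i > 0 else -1
        right = counts[j + 1] if j + 1 < n else -1
        if counts[i] > 0 and counts[i] > left and counts[i] > right:
            peaks.append(i)
        i = j + 1
    return peaks


def gap_bins(counts):
    """Empty bins lying between the first and last occupied bins."""
    occupied = [i for i, c in enumerate(counts) if c > 0]
    if not occupied:
        return []
    return [i for i in range(occupied[0], occupied[-1] + 1) if counts[i] == 0]


def centers_by_group(values, groups):
    """Median of the values within each group given by a second (hypothetical) variable."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    return {g: median(vs) for g, vs in by_group.items()}


if __name__ == "__main__":
    counts = [1, 3, 5, 2, 0, 0, 2, 6, 4, 1]  # hypothetical bin counts with two humps
    print(peak_bins(counts), gap_bins(counts))
```

For the hypothetical counts above, `peak_bins` reports bins 2 and 7 (two peaks, so bimodal), and `gap_bins` reports bins 4 and 5 (the gap separating the two clusters). If the second variable separates the two humps, `centers_by_group` gives one center per group instead of a single misleading center for the combined data.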