Stats Review Chapter 1

1.1 Displaying Distributions

Definitions:
Individuals – objects described by a set of data
Variable – any characteristic of an individual
Categorical – places an individual into a group
Quantitative – numerical data about the individual

Examining a Distribution
Look for the overall pattern and for deviations from that pattern.
Describe the distribution using shape (symmetric, skewed), center (median, mode, mean) and spread (variation, standard deviation, IQR).
Look for outliers and skewness.

Stemplots:
1. Separate each observation into a stem (all but the final digit) and a leaf (the final digit)
2. Write the stems in a vertical column
3. Write each leaf as a single digit to the side of its stem

Histograms:
1. Separate the data into different classes of equal width
2. Write the classes along the horizontal axis
3. Write the frequency (count) or relative frequency (percentage) along the vertical axis
4. Create a bar for each class, with no space between the bars

Time plots:
1. Write the time or order along the horizontal axis
2. Write the scale of the measured variable (for example, a count) along the vertical axis
3. Plot each observation's value in the order in which it occurred

1.2 Number Summaries

Measures of Center:
Mean – the average. Susceptible to influence by outliers and skewness.
Median – the middle value (the average of the middle two if n is even). Not greatly affected by outliers.

Quartiles:
1. Arrange the data in increasing order and locate the median M.
2. The first quartile Q1 is the median of the observations that lie below the median M.
3. The third quartile Q3 is the median of the observations that lie above the median M.

Interquartile Range (IQR)
The interquartile range is the distance between the first and third quartiles.
IQR = Q3 – Q1

1.5 x IQR Criterion for Outliers
Call an observation a suspected outlier if it falls more than 1.5 x IQR above Q3 or below Q1.
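The quartile steps and the 1.5 x IQR criterion above can be sketched in a few lines of Python. This is only an illustration: the function names `quartiles` and `suspected_outliers` and the sample data are made up for this example, and the sketch assumes at least two observations. It follows the median-of-halves rule from the notes, which can give slightly different quartiles than some software.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, M, Q3) using the median-of-halves rule from the notes."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]             # observations below the median's position
    upper = xs[half + (n % 2):]   # observations above it (skip M itself when n is odd)
    return median(lower), median(xs), median(upper)

def suspected_outliers(data):
    """Return the observations flagged by the 1.5 x IQR criterion."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical sample data, chosen only to illustrate the rule.
sample = [2, 4, 5, 7, 8, 9, 10, 12, 13, 40]
print(quartiles(sample))            # (5, 8.5, 12)
print(suspected_outliers(sample))   # [40]
```

Here IQR = 12 – 5 = 7, so the cutoffs are 5 – 10.5 = –5.5 and 12 + 10.5 = 22.5, and only the value 40 is flagged.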
Observations below Q1 – (1.5 x IQR) or above Q3 + (1.5 x IQR) are considered possible outliers.

5 Number Summaries
Minimum, Q1, Median, Q3, Maximum

Boxplots
[Figure: boxplot labeled, from top to bottom: suspected outliers (1.5 x IQR rule), maximum non-outlier, third quartile, median, first quartile, minimum non-outlier]

Variance
The average of the squares of the differences between each observation and the mean, where the sum is divided by n − 1 rather than n.
FORMULA: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

Standard Deviation s
The square root of the variance.
FORMULA: $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Properties of the Standard Deviation
s is a measure of spread about the mean.
Use s only when the mean is the chosen measure of center.
s = 0 implies that there is no spread and all observations have the same value.
s is not resistant and can become very large when there are even a few outliers.

Linear Transformations
Multiplying each observation by a positive number b multiplies the mean, median, IQR and standard deviation by b.
Adding the same number a to each observation adds a to the mean and median but does not change the IQR or standard deviation.

1.3 Normal Distributions

Strategies For Exploring Quantitative Data
1. Always plot your data (usually a stemplot or histogram).
2. Look for the overall pattern and for striking deviations such as outliers.
3. Calculate a numerical summary to briefly describe center and spread (5 number summary, or mean & standard deviation).

Density Curves
Always on or above the horizontal axis.
Area under the curve always equals one.

Skew
Skew refers to the tail, not the bump.
The mean (balance point) is always closer to the tail than the median (which cuts the area in half).

Standard Deviation of Normal Density Curves
Points of inflection on the normal density curve lie 1σ away from the mean on each side.

68-95-99.7 Rule
68% of observations fall within 1σ of the mean μ.
95% of observations fall within 2σ of the mean μ.
99.7% of observations fall within 3σ of the mean μ.

Z-score
Normal distributions can be standardized by the following formula:
$z = \frac{x - \mu}{\sigma}$

Normal Quantile Plots
Used to assess the normality of a distribution.
Arrange data from smallest to largest.
Record what percentile of the data each value occupies. Example: the smallest observation in a set of 20 is at the 5% point.
Find the z-score from Table A that corresponds to each percentile. Example: z = -1.645 for the 5% point.
Plot each data point x against its corresponding z-score.
If the plotted points lie close to a straight line, then the distribution is approximately normal.
If the line bends up at the right, then the distribution is skewed right. If it bends down on the left, then the distribution is skewed left.
Outliers appear as points far away from the overall pattern of points.

Review Exercises: 1.106, 1.114, 1.116, 1.119, 1.123
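The normal quantile plot steps above can be sketched in Python as a way to check the arithmetic. The function name `normal_quantile_points` is hypothetical, and the sketch makes one assumption that differs from the notes: it places the i-th smallest of n observations at percentile (i − 0.5)/n instead of i/n, because i/n would put the largest observation at the 100% point, which has no finite z-score. The standard library's `NormalDist.inv_cdf` stands in for looking up Table A.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Return (z, x) pairs to plot: x on the vertical axis, z on the horizontal."""
    xs = sorted(data)               # arrange data from smallest to largest
    n = len(xs)
    std_normal = NormalDist()       # standard normal: mean 0, standard deviation 1
    points = []
    for i, x in enumerate(xs, start=1):
        p = (i - 0.5) / n           # ASSUMED plotting position, not the notes' i/n
        z = std_normal.inv_cdf(p)   # z-score with area p to its left
        points.append((z, x))
    return points

# The Table A example from the notes: the 5% point.
print(round(NormalDist().inv_cdf(0.05), 3))   # -1.645
```

If the resulting (z, x) pairs fall close to a straight line when plotted, the data are approximately normal; the plotting itself is left to whatever graphing tool is at hand.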