Review

• range: the difference between the largest and the smallest observations. The range is not resistant, and it ignores the numerical values of nearly all the data.
• deviation from the mean: the difference between an observation and the sample mean,
  $d = x - \bar{x}$
• variance: the average of the squared deviations, with n − 1 in the denominator,
  $$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\text{sum of squared deviations}}{\text{sample size} - 1}$$
• standard deviation: the square root of the variance,
  $$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Properties of the Standard Deviation, s:
• The greater the spread of the data, the larger the value of s.
• s = 0 only when all observations take the same value.
• s can be influenced by outliers.

Empirical Rule for bell-shaped distributions:
• About 68% of the observations fall within 1 standard deviation of the mean, that is, between x̄ − s and x̄ + s (denoted x̄ ± s).
• About 95% of the observations fall within 2 standard deviations of the mean (x̄ ± 2s).
• All or nearly all observations fall within 3 standard deviations of the mean (x̄ ± 3s).

sample statistics    −→    population parameters
sample mean x̄        −→    population mean µ
sample s.d. s        −→    population s.d. σ

• percentile: the pth percentile is a value such that p percent of the observations fall below or at that value. The 50th percentile is usually referred to as the median.
• quartiles:
  first quartile (Q1): the 25th percentile
  second quartile (Q2): the 50th percentile, which is the median
  third quartile (Q3): the 75th percentile
  The quartiles split the distribution into four parts, each containing one quarter (25%) of the observations.

Finding Quartiles:
• Arrange the data in order.
• Find the median of the data. This is the second quartile, Q2.
• For the lower half of the observations, find the median. This is the first quartile, Q1.
• For the upper half of the observations, find the median. This is the third quartile, Q3.

• interquartile range (IQR): the distance between the third and first quartiles.
  IQR = Q3 − Q1

• The 1.5 × IQR Criterion for Identifying Potential Outliers: an observation is a potential outlier if it falls more than 1.5 × IQR below the first quartile or more than 1.5 × IQR above the third quartile.
• The Five-Number Summary of Positions: the minimum value, the first quartile Q1, the median, the third quartile Q3, and the maximum value.
• box plot: Constructing a Box Plot
  • A box goes from the lower quartile Q1 to the upper quartile Q3.
  • A line is drawn inside the box at the median.
  • A line goes from the lower end of the box to the smallest observation that is not a potential outlier. A separate line goes from the upper end of the box to the largest observation that is not a potential outlier. These lines are called whiskers. The potential outliers are shown separately.
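
The quantities reviewed above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the sample data are made up for the example, at least two observations are assumed, and when the number of observations is odd the median is left out of both halves, which is one common convention that these notes do not specify.

```python
from math import sqrt
from statistics import mean, median


def sample_variance(data):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    x_bar = mean(data)
    return sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)


def sample_sd(data):
    """s: square root of the sample variance."""
    return sqrt(sample_variance(data))


def quartiles_by_halves(data):
    """Return (Q1, Q2, Q3) using the median-of-halves steps."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]          # lower half of the ordered observations
    upper = xs[half + n % 2:]  # upper half; skips the median when n is odd (assumed convention)
    return median(lower), median(xs), median(upper)


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = quartiles_by_halves(data)
    return min(data), q1, q2, q3, max(data)


def potential_outliers(data):
    """Observations more than 1.5 x IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles_by_halves(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]


def whisker_ends(data):
    """Smallest and largest observations that are not potential outliers."""
    flagged = potential_outliers(data)
    kept = [x for x in data if x not in flagged]
    return min(kept), max(kept)


# Hypothetical sample, chosen so that one value is flagged as a potential outlier.
sample = [2, 4, 4, 5, 7, 8, 9, 10, 12, 30]
print(five_number_summary(sample))  # (2, 4, 7.5, 10, 30)
print(potential_outliers(sample))   # [30]
print(whisker_ends(sample))         # (2, 12)
print(sample_sd(sample))
```

For this sample, IQR = 10 − 4 = 6, so the cutoffs are 4 − 9 = −5 and 10 + 9 = 19. Only 30 lies beyond them, and the upper whisker of the box plot would stop at 12. Statistical software often computes quartiles by interpolation instead, so its Q1 and Q3 may differ slightly from the median-of-halves values.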