Rules of Data Dispersion
• Using the mean and standard deviation, we can find the percentage of all observations that fall within a given interval about the mean.
• Two rules apply: the Empirical Rule and Chebyshev’s Theorem (IMPORTANT TERM: AT LEAST).

Empirical Rule
• Applicable to a symmetric, bell-shaped (normal) distribution.
• There are 3 rules:
i. About 68% of the observations lie in the interval (mean ± SD)
ii. About 95% of the observations lie in the interval (mean ± 2SD)
iii. About 99.7% of the observations lie in the interval (mean ± 3SD)
• Example: 95% of students at a school are between 1.1 m and 1.7 m tall. Assuming the data are normally distributed, calculate the mean and standard deviation.
• Example: The age distribution of a sample of 5000 persons is bell-shaped with a mean of 40 yrs and a standard deviation of 12 yrs. Determine the approximate percentage of people who are 16 to 64 yrs old.

Chebyshev’s Theorem
• Applicable to any distribution, including distributions that are not normal.
• At least (1 − 1/k²) of the observations lie within k standard deviations of the mean, where k is any number greater than 1 (k > 1).
• Example: Assuming that the weights of students in this class are not normally distributed, find the minimum percentage of students whose weights fall within 2 SD of the mean.
• Example: Consider a distribution of test scores that is badly skewed to the right, with a sample mean of 80 and a sample standard deviation of 5. If k = 2, at least what percentage of the data falls in the interval (mean ± 2SD)?

Measures of Position
• Describe the relative position of a data value within the entire data set.
• z scores
• Percentiles
• Quartiles
• Outliers

Quartiles
• Divide a data set into fourths, or four equal parts.
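The Empirical Rule and Chebyshev’s Theorem examples earlier in these notes can be checked with a few lines of Python. This is only a sketch: the function names `empirical_rule_percentage` and `chebyshev_min_percentage` are made up for illustration and do not come from the slides, and the 68/95/99.7 figures are the rounded values the rule quotes, not exact normal probabilities.

```python
def empirical_rule_percentage(k):
    """Approximate % of observations within mean ± k*SD (bell-shaped data only)."""
    table = {1: 68.0, 2: 95.0, 3: 99.7}  # rounded values quoted by the rule
    return table[k]


def chebyshev_min_percentage(k):
    """Minimum % of observations within mean ± k*SD for any distribution (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem requires k > 1")
    return (1 - 1 / k**2) * 100


# Height example: 95% of students lie between 1.1 m and 1.7 m,
# so the interval is mean ± 2SD and spans 4 SD in total.
low, high = 1.1, 1.7
height_mean = (low + high) / 2
height_sd = (high - low) / 4

# Age example: mean 40, SD 12. Both 16 and 64 are 2 SD from the mean.
age_mean, age_sd = 40, 12
age_k = (64 - age_mean) / age_sd
assert age_k == (age_mean - 16) / age_sd  # interval is symmetric about the mean
age_pct = empirical_rule_percentage(int(age_k))

# Test-score example: skewed data, so use Chebyshev with k = 2.
score_mean, score_sd, k = 80, 5, 2
score_interval = (score_mean - k * score_sd, score_mean + k * score_sd)
score_pct = chebyshev_min_percentage(k)

print("height mean, SD:", round(height_mean, 2), round(height_sd, 2))
print("ages 16 to 64: about", age_pct, "%")
print("scores in", score_interval, ": at least", score_pct, "%")
```

Note the difference in wording between the two results: the Empirical Rule gives an approximate percentage for bell-shaped data, while Chebyshev’s Theorem only gives a lower bound ("at least"), which is why it also covers the class-weight example with k = 2.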
Boxplot
• IQR = Q3 − Q1
• Lower Fence = Q1 − 1.5(IQR)
• Upper Fence = Q3 + 1.5(IQR)

Outliers
• Extreme observations.
• Can occur because of errors in measuring a variable, in data entry, or in sampling.

Checking for outliers by using Quartiles
Step 1: Determine the first and third quartiles of the data.
Step 2: Compute the interquartile range: IQR = Q3 − Q1.
Step 3: Determine the fences. Fences serve as cutoff points for determining outliers.
Lower Fence = Q1 − 1.5(IQR)
Upper Fence = Q3 + 1.5(IQR)
Step 4: If a data value is less than the lower fence or greater than the upper fence, it is considered an outlier.
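The four steps above can be sketched in Python as follows. The data set and the function name `find_outliers` are hypothetical, chosen only for illustration. Textbooks and software use slightly different conventions for computing quartiles; this sketch assumes the "inclusive" method of the standard library's `statistics.quantiles`, so Q1 and Q3 may differ a little from values obtained by hand with another convention.

```python
import statistics


def find_outliers(data):
    """Return quartiles, IQR, fences, and outliers using the 1.5 * IQR rule."""
    # Step 1: first and third quartiles (quartile convention is an assumption)
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # Step 2: interquartile range
    iqr = q3 - q1
    # Step 3: fences
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Step 4: values outside the fences are considered outliers
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return q1, q3, iqr, lower_fence, upper_fence, outliers


# Hypothetical sample data, not taken from the slides
sample = [2, 4, 5, 6, 7, 8, 9, 10, 30]
q1, q3, iqr, lower_fence, upper_fence, outliers = find_outliers(sample)

print("Q1 =", q1, "Q3 =", q3, "IQR =", iqr)
print("Lower fence =", lower_fence, "Upper fence =", upper_fence)
print("Outliers:", outliers)
```

For this sample, the value 30 lies above the upper fence and is flagged, while every other value falls between the fences.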