Unit 1: Top 10 List

Vocabulary:
  case: an individual person or thing for which values of a variable are recorded
  distribution: shows the pattern of variation of a variable, that is, the values the variable takes and how often it takes each one
  variable: any measurable or observable characteristic of a group of people or objects
  quartiles
  percentiles

Types of variables:
  Categorical: values are categories that describe some characteristic of each case
  Binary categorical: can take on only 2 values
  Quantitative (measurement): takes on numerical values

Types of distributions: symmetric, uniform, skewed right, skewed left, clustered

Measures of central tendency:
  Mean: the numerical average
  Median: the middle number when the data are arranged in numerical order (the average of the two middle numbers when n is even)
  Mode: the value that occurs most often (bimodal: 2 values occur most often)

Measures of variability (spread):
  Range = maximum - minimum
  IQR (interquartile range) = Q3 - Q1
  Standard deviation: a measure of the spread about the mean, $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
  Variance = $s^2$
  Note: $\sum (x_i - \bar{x}) = 0$

Choosing appropriate measures of center and spread:
  For symmetric data, report the mean and standard deviation.
  For skewed data, report the 5-number summary (median and IQR).
  In a skewed distribution the mean is pulled farther into the tail than the median; reading from the tail, the usual order is mean, median, mode.

Displaying data:
  Dot plot
  Box-and-whisker plot: a full plot does not show outliers; a modified plot shows them separately
    5-number summary: min, Q1, median, Q3, max
    1.5(IQR) rule for outliers: a value is an outlier if it is below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR)
    Advantage: can show multiple distributions side by side
  Stem-and-leaf plot
    Stem may be multi-digit; leaf is just one digit
    Include a key
    Back-to-back plot: compares 2 distributions
    Advantages: preserves all data values; can show 2 distributions; quick to construct for small data sets
    Disadvantage: cumbersome for large data sets
  Histogram
    Advantages: easy to read; works well with large data sets
    Disadvantage: does not show individual data values
  Time plot: displays change over time

Analyzing graphs: describe center, shape, spread (variability), and outliers, and look for patterns.

Linear transformations ($a + bx$):
  Adding a constant $a$ to all values in a data set increases the mean and median by $a$; measures of spread (range, IQR, standard deviation) are unchanged.
  Multiplying all values in a data set by a constant $b$ multiplies the mean and median by $b$, and multiplies the range, IQR, and standard deviation by $|b|$.

Density curve:
  A density curve is an idealized description of the distribution of data; it describes the overall pattern of a distribution (a relative frequency distribution).
  For the idealized distribution, the mean is $\mu$ and the standard deviation is $\sigma$ ($\bar{x}$ and $s$ are the mean and standard deviation computed from the actual data).
  Properties:
    1) lies on or above the x-axis
    2) total area under the curve equals 1
  The area under the curve and above the x-axis over any range of x values is the proportion of all observations that fall in that range.
  The median is the point that divides the area into two equal halves; the mean is the balance point.

Normal distributions: $N(\mu, \sigma)$
  One class of density curves, described completely by its mean and standard deviation
  Characteristics: symmetric, mound-shaped (bell-shaped), mean = median, inflection points one standard deviation on either side of the mean, area under the curve is 1 (true for any density curve)
  Note: not all bell-shaped distributions are normal!
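The measures of center and spread listed earlier can be checked numerically. The following is a minimal Python sketch that uses the standard-library `statistics` module on a made-up data set. It computes the sample standard deviation directly from the $n - 1$ formula and confirms that the deviations from the mean sum to zero.

```python
import math
import statistics

# Hypothetical sample data, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)          # numerical average
median = statistics.median(data)      # middle value of the sorted data
modes = statistics.multimode(data)    # all most-frequent values (2 entries = bimodal)
data_range = max(data) - min(data)    # maximum - minimum

# Sample standard deviation: sum of squared deviations divided by n - 1.
n = len(data)
ss = sum((x - mean) ** 2 for x in data)
s = math.sqrt(ss / (n - 1))
variance = s ** 2

# Deviations from the mean always sum to zero (up to rounding error).
deviation_sum = sum(x - mean for x in data)

print(mean, median, modes, data_range)
print(round(s, 4), round(variance, 4), deviation_sum)
```

`statistics.stdev` uses the same $n - 1$ divisor, so it should agree with `s`.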
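Here is a sketch of the 5-number summary and the 1.5(IQR) outlier rule. Quartile conventions differ between textbooks, calculators, and software. This sketch assumes the median-of-halves method, which leaves the overall median out of both halves when n is odd, so other tools may report slightly different Q1 and Q3 values. The function names and data are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) by the median-of-halves convention.

    Assumption: when n is odd, the overall median is excluded from both halves.
    Requires at least 2 values.
    """
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]          # values below the median position
    upper = xs[(n + 1) // 2 :]    # values above the median position
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

def outliers(data):
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR (the 1.5(IQR) rule)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical data with one unusually large value.
data = [3, 5, 6, 7, 8, 9, 10, 12, 30]
summary = five_number_summary(data)
print(summary)
print(outliers(data))
```

In this example, Q1 = 5.5 and Q3 = 11, so IQR = 5.5 and the upper fence is 11 + 8.25 = 19.25. Only 30 lies beyond it.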
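To build a stem-and-leaf plot, split each value into a stem (all but the last digit, possibly multi-digit) and a one-digit leaf. The sketch below assumes non-negative data already rounded to the leaf unit. The function name, the `leaf_unit` parameter, and the scores are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(data, leaf_unit=1):
    """Return one text row per stem; the leaf is the last digit in leaf units."""
    rows = defaultdict(list)
    for x in data:
        scaled = round(x / leaf_unit)     # express the value in leaf units
        stem, leaf = divmod(scaled, 10)   # last digit becomes the leaf
        rows[stem].append(leaf)
    lines = []
    for stem in range(min(rows), max(rows) + 1):  # keep empty stems visible
        leaves = "".join(str(leaf) for leaf in sorted(rows.get(stem, [])))
        lines.append(f"{stem:>3} | {leaves}")
    return lines

# Hypothetical test scores; 105 shows a multi-digit stem.
scores = [67, 72, 75, 75, 81, 84, 88, 90, 93, 105]
for line in stem_and_leaf(scores):
    print(line)
print("Key: 8 | 4 means 84")  # key written by hand for this example
```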
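You can verify the linear-transformation rules on a small example. This sketch computes quartiles by the median-of-halves convention, which is an assumption. It uses a negative $b$ on purpose to show that the measures of spread scale by $|b|$ rather than $b$. The data and constants are illustrative only.

```python
import statistics

def describe(data):
    """Mean, median, IQR (median-of-halves quartiles), and sample SD."""
    xs = sorted(data)
    n = len(xs)
    q1 = statistics.median(xs[: n // 2])
    q3 = statistics.median(xs[(n + 1) // 2 :])
    return {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "iqr": q3 - q1,
        "sd": statistics.stdev(xs),
    }

# Hypothetical data and constants for the transformation y = a + b*x.
x = [10, 12, 15, 18, 20, 25]
a, b = 5, -2
y = [a + b * v for v in x]

sx, sy = describe(x), describe(y)
for key in sx:
    print(key, sx[key], sy[key])
```

The center shifts and scales as $a + b(\text{center})$, while the IQR goes from 8 to 16 and the standard deviation doubles, because $|b| = 2$.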
Empirical Rule: in a normal distribution, approximately
  68% of the data lie within 1 standard deviation of the mean: $P(\mu - \sigma < x < \mu + \sigma) \approx 0.68$
  95% of the data lie within 2 standard deviations of the mean: $P(\mu - 2\sigma < x < \mu + 2\sigma) \approx 0.95$
  99.7% of the data lie within 3 standard deviations of the mean: $P(\mu - 3\sigma < x < \mu + 3\sigma) \approx 0.997$

Assessing normality:
  Normal probability plot: approximately linear if the data are normal
  In a normal distribution, the ratio of the IQR to the standard deviation is approximately 1.3 (more precisely, about 1.35): $\frac{\text{IQR}}{\sigma} \approx 1.3$

Standard score (z):
  The number of standard deviations an observation lies from the mean: $z = \frac{\text{observed value} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma}$
  A z-score can be computed for any observation, but converting it to a proportion with the standard normal table applies only to normal distributions.

Standard normal distribution:
  The normal distribution with a mean of 0 and a standard deviation of 1
  Any normal distribution can be transformed into the standard normal distribution by converting to z-scores.
  Standard normal table: for any z-score, the table gives the proportion of observations less than or equal to that score.

Percentiles: a data point is at the nth percentile if n% of the data lie below that point.

Calculator:
  Stat plots; 1-Var Stats
  Be able to calculate the percent of data under any part of a normal density curve: normalcdf(lower bound, upper bound). The mean and standard deviation can be given as optional third and fourth arguments; they default to 0 and 1. For a standard normal curve, -5 and 5 can be used as the min and max.
  Be able to find the data point that cuts off a given percent of the area: invNorm(area to the left of the data point, entered as a decimal)
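The Empirical Rule percentages and the IQR-to-σ ratio both follow from the normal cumulative distribution function. This sketch uses Python's `statistics.NormalDist` (Python 3.8+). The choice of $N(100, 15)$ is arbitrary, because the results do not depend on $\mu$ or $\sigma$.

```python
from statistics import NormalDist

# Arbitrary normal model; the proportions below are the same for any mu, sigma.
mu, sigma = 100, 15
model = NormalDist(mu, sigma)

within = {}
for k in (1, 2, 3):
    # P(mu - k*sigma < X < mu + k*sigma) as a difference of CDF values
    within[k] = model.cdf(mu + k * sigma) - model.cdf(mu - k * sigma)
    print(f"within {k} SD: {within[k]:.4f}")

# IQR measured in standard deviations: (Q3 - Q1) / sigma
iqr_ratio = (model.inv_cdf(0.75) - model.inv_cdf(0.25)) / sigma
print(f"IQR / sigma = {iqr_ratio:.3f}")
```

The exact proportions are about 0.6827, 0.9545, and 0.9973, which the Empirical Rule rounds to 68%, 95%, and 99.7%.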
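Here is a worked z-score example for a hypothetical exam whose scores follow $N(75, 8)$. The standard normal cumulative distribution function stands in for the table lookup, and the result is read as a percentile.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical exam: scores modeled as N(75, 8).
mean, sd = 75, 8
z = z_score(87, mean, sd)                 # (87 - 75) / 8 = 1.5

# Table lookup equivalent: proportion of a standard normal <= z.
proportion_below = NormalDist().cdf(z)
percentile = 100 * proportion_below
print(z, round(percentile, 1))
```

A score of 87 has z = 1.5 and falls at roughly the 93rd percentile. Computing `NormalDist(75, 8).cdf(87)` directly gives the same proportion, which is why any normal distribution can be standardized.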
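If you want to check calculator work in Python, you can mimic the two calculator functions with `statistics.NormalDist`. The names `normalcdf` and `inv_norm` below are hypothetical wrappers that echo the calculator's argument order. They are not a calculator or library API.

```python
from statistics import NormalDist

# Hypothetical helpers mirroring the calculator's argument order.
def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(area_left, mu=0.0, sigma=1.0):
    """Value with the given area (a proportion) to its left under N(mu, sigma)."""
    return NormalDist(mu, sigma).inv_cdf(area_left)

# Area of a standard normal between -1.96 and 1.96 (about 0.95).
print(round(normalcdf(-1.96, 1.96), 4))
# -5 stands in for "minus infinity" on a standard normal curve.
print(round(normalcdf(-5, 1.0), 4))
# 90th percentile of N(500, 100).
print(round(inv_norm(0.90, 500, 100), 1))
```

The two functions are inverses in the sense that the area to the left of `inv_norm(p)` is `p`.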