Statistics and Probability
Measuring data, analyzing data, the shape of data, visualizing data, center, variability, and distribution

Vocabulary: statistics
• Statistics is the science of collecting, organizing, representing, and interpreting data.

Vocabulary: graph
• A pictorial device used to show a numerical relationship.
[Figure: sample chart with three series (Series 1, Series 2, Series 3) on a vertical axis from 0 to 6]

Vocabulary: spread
• A measure of how much a collection of data is spread out.
• Commonly used types include the range and the interquartile range.
• Also known as measures of variation or dispersion.

Vocabulary: center
• Knowing where the middle (center) of a set of data lies gives a quick sense of what a typical value is.
• An average; a single value that is used to represent a collection of data. Three commonly used types of averages are mode, median, and mean.
• Also called measures of central tendency or measures of average.

Vocabulary: outlier or striking deviation
• A number in a data set that is much larger or smaller than most of the other numbers in the set.

Vocabulary: variability
• A measure of how much a collection of data is spread out.
• Commonly used types include the range and the interquartile range.
• Also known as spread or dispersion.

Vocabulary: distribution
• The pattern formed by the data. With a large sample, many kinds of data form a pattern like chart A: a single peak in the middle, with the values tapering off evenly on both sides. This is considered a “normal distribution.” It is sometimes called a bell curve.
• Charts B and C show skewed distributions. Skew is named for the side with the longer tail. A distribution is “skewed to the right” when its longer tail stretches to the right, so the peak falls below the mean. It is “skewed to the left” when its longer tail stretches to the left, so the peak falls above the mean.

Vocabulary: data
• Information, especially numerical information. Usually organized for analysis.

Vocabulary: dot or line plot
A dot plot is also called a line plot.
It is a diagram showing the frequency of data on a number line. It is NOT a line graph.

Vocabulary: tape diagram
• A drawing that looks like a segment of tape, used to illustrate number relationships.
• Also known as a strip diagram, bar model, fraction strip, or length model.

Vocabulary: histogram
• A bar graph in which the labels for the bars are numerical intervals.
• The data are reported in clusters, or ranges.

Vocabulary: box plot or box and whisker plot
• A box plot is a diagram that shows the five-number summary of a distribution.
• The five-number summary includes the lowest value, lower quartile, median, upper quartile, and highest value.

Vocabulary: interquartile range
• A box and whisker plot breaks the data into four parts, each holding about a quarter of the data. The quartiles are the values that separate the parts.
• The interquartile range is the difference between the upper quartile and the lower quartile. It is the length of the box on the plot.

Vocabulary: lower extreme
• The smallest or least number in a data set, usually farther from the box (the interquartile range) than the other data in the set. Also known as the minimum.

Vocabulary: Minimum
• Same as the lower extreme.

Vocabulary: Maximum
• The opposite of the lower extreme: the upper extreme, which is the largest number in the data set, at the far right of the box and whisker plot.
[Figure: box and whisker plot with the minimum and maximum labeled]

Vocabulary: attribute
• A characteristic, such as size, shape, or color.
• Examples: a large, blue hexagon; a small, red triangle.

Vocabulary: Measure of Center
• We use several different ways to measure the center. Some are:
  – Mode (the piece of data most often repeated)
  – Median (the middle number when data are in numerical order)
  – Mean (the average of all the numerical data)

Vocabulary: mean
• Definition: the sum of a set of numbers divided by the number of elements in the set. A type of average.
• Data: 3, 5, 5, 4, 5, 6, 2, 5 (eight data points)
• Step 1: add the data points: 3 + 5 + 5 + 4 + 5 + 6 + 2 + 5 = 35
• Step 2: divide the sum by the number of data points: 35 ÷ 8 = 4.375
• 4.375 is the mean.
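The two steps on the mean slide can be checked with a few lines of Python. This is only a minimal sketch: the function name `mean` is an illustrative choice, and the data are the eight values listed on the slide, which add up to 35.

```python
# Data points copied from the mean slide.
data = [3, 5, 5, 4, 5, 6, 2, 5]


def mean(values):
    """Return the sum of the values divided by how many values there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


total = sum(data)     # Step 1: add the data points
count = len(data)     # how many data points there are
average = mean(data)  # Step 2: divide the sum by the count

print(f"sum = {total}, count = {count}, mean = {average}")
# prints: sum = 35, count = 8, mean = 4.375
```

Python's standard library also provides `statistics.mean`, which gives the same result for this data set.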
Vocabulary: median
• When the numbers are arranged from least to greatest, the median is the middle number.
• Example: 13, 16, 17, 20, 22, 24, 24, 28, 32. There are nine data points, so the median is the fifth one, 22.
• If there is an even number of data points, the median is the average of the two middle numbers.
• Example: 13, 16, 17, 20, 22, 24, 28, 32. The two middle numbers are 20 and 22: 20 + 22 = 42; 42 ÷ 2 = 21.

Vocabulary: mode
• The piece of data most often repeated.
• Example: 5, 7, 8, 9, 9, 11
• The mode in the data set above is 9.
• It is possible to have more than one mode.

Vocabulary: Measure of variation
• Range, interquartile range, and mean absolute deviation are measures that indicate how much the data in one data set differ among themselves.
• Each is a measure of variation.

Vocabulary: range
• The difference between the greatest number and the least number in a set of numbers.
• Data set: 3, 2, 5, 4, 1, 6, 4, 4, 2, 5, 7, 3
• The largest number is 7 and the smallest is 1.
• So the range is 6, because 7 - 1 = 6.

Vocabulary: mean absolute deviation
• In statistics, the absolute deviation of an element of a data set is the absolute difference between that element and a given point. The mean absolute deviation is the average of these absolute differences, measured from the mean.
• Example: large cube = 45 kg, cylinder = 30 kg, small cube = 24 kg
• Mean: (45 + 30 + 24) ÷ 3 = 33 kg
• Deviations from the mean: 45 - 33 = 12, 30 - 33 = -3, 24 - 33 = -9
• Absolute deviations: 12, 3, 9
• 12 + 3 + 9 = 24; 24 ÷ 3 = 8
• 8 kg is the mean absolute deviation.

Vocabulary: statistical variability
• The variability, or spread, of a variable or a probability distribution. Common measures of statistical dispersion are the variance, standard deviation, and interquartile range.

Vocabulary: data distribution
• A table that shows how many there are of each type of data.

People in My Neighborhood
Age      Tally               Frequency
0-19     ||||| ||||| ||||    14
20-39    ||||| |||           8
40-59    ||||                4
60-89    ||||| ||            7
90+      |                   1

Vocabulary: cluster
• A group of the same or similar elements gathered or occurring closely together on a graph.
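The worked examples on the median, mode, range, and mean absolute deviation slides can be reproduced with a short Python sketch. It uses `median` and `multimode` from the standard `statistics` module (Python 3.8 or later). The helper names `data_range` and `mean_absolute_deviation` are illustrative choices, not standard functions.

```python
from statistics import median, multimode


def data_range(values):
    """Return the greatest value minus the least value."""
    return max(values) - min(values)


def mean_absolute_deviation(values):
    """Return the average distance of the values from their mean."""
    center = sum(values) / len(values)
    return sum(abs(v - center) for v in values) / len(values)


# Data sets copied from the slides.
odd_set = [13, 16, 17, 20, 22, 24, 24, 28, 32]      # nine data points
even_set = [13, 16, 17, 20, 22, 24, 28, 32]         # eight data points
mode_set = [5, 7, 8, 9, 9, 11]
range_set = [3, 2, 5, 4, 1, 6, 4, 4, 2, 5, 7, 3]
weights_kg = [45, 30, 24]                           # large cube, cylinder, small cube

print(median(odd_set))                      # 22 (the middle number)
print(median(even_set))                     # 21.0 (average of 20 and 22)
print(multimode(mode_set))                  # [9] (a list, since there can be several modes)
print(data_range(range_set))                # 6
print(mean_absolute_deviation(weights_kg))  # 8.0
```

`multimode` returns a list because, as the mode slide notes, a data set can have more than one mode.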
Vocabulary: gap
• A place on a graph where no data values are present.

Ages of orchestra members
10-15    xxxx
16-20    xxxxx xxxxx xxx
21-25                       (gap in the data)
26-30    xxxxx xxxx
31-35    xxx
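The gap in the orchestra chart can also be found by counting. The sketch below assumes the frequencies are read off the chart by counting the x marks in each age interval. The function name `find_gaps` is a hypothetical helper, not part of any library.

```python
# Frequencies counted from the x marks on the "Ages of orchestra members" chart.
orchestra_ages = {
    "10-15": 4,
    "16-20": 13,
    "21-25": 0,
    "26-30": 9,
    "31-35": 3,
}


def find_gaps(frequencies):
    """Return the empty intervals that lie between intervals containing data."""
    labels = list(frequencies)
    occupied = [i for i, label in enumerate(labels) if frequencies[label] > 0]
    if not occupied:
        return []  # no data at all, so there is nothing for a gap to sit between
    first, last = occupied[0], occupied[-1]
    return [labels[i] for i in range(first, last + 1) if frequencies[labels[i]] == 0]


print(find_gaps(orchestra_ages))  # ['21-25']
```

Only empty intervals between the first and last occupied intervals count as gaps here. An empty interval at either end of the chart is not reported.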