Statistics and Probability
Measuring data, analyzing data, the shape of data, visualizing data, center, variability, and distribution

Vocabulary: statistics
• Statistics is the science of collecting, organizing, representing, and interpreting data.

Vocabulary: graph
• A pictorial device used to show a numerical relationship.
[Figure: example graph of Series 1, Series 2, and Series 3]

Vocabulary: spread
• A measure of how much a collection of data is spread out.
• Commonly used types include the range and quartile-based measures such as the interquartile range.
• Also known as measures of variation or dispersion.

Vocabulary: center
• Knowing where the middle (center) of a set of data lies gives useful information about the whole set.
• An average; a single value that is used to represent a collection of data. Three commonly used types of averages are mode, median, and mean.
• Also called measures of central tendency or measures of average.

Vocabulary: outlier or striking deviation
• A number in a data set that is much larger or smaller than most of the other numbers in the set.

Vocabulary: variability
• A measure of how much a collection of data is spread out.
• Commonly used types include the range and quartile-based measures such as the interquartile range.
• Also known as spread or dispersion.

Vocabulary: distribution
• The pattern of the data. With a large sample, the pattern of many kinds of data is more likely to look like chart A. This is considered a “normal distribution.” It is sometimes called a bell curve.
• Skew is named for the side with the longer tail. A peak above the mean, as in chart C, leaves the longer tail on the left, so the distribution is “skewed to the left.” A peak below the mean, as in chart B, leaves the longer tail on the right, so the distribution is “skewed to the right.”

Vocabulary: data
• Information, especially numerical information. Usually organized for analysis.

Vocabulary: dot or line plot
A dot plot is also called a line plot.
It is a diagram showing the frequency of data on a number line. It is NOT a line graph.

Vocabulary: tape diagram
• A drawing that looks like a segment of tape, used to illustrate number relationships. Also known as a strip diagram, bar model, fraction strip, or length model.

Vocabulary: histogram
• A bar graph in which the labels for the bars are numerical intervals.
• The data are reported in clusters, or ranges.

Vocabulary: box plot or box and whisker plot
• A box plot is a diagram that shows the five-number summary of a distribution.
• The five-number summary includes the lowest value, lower quartile, median, upper quartile, and highest value.

Vocabulary: interquartile range
• A box and whisker plot breaks the data into four parts, each holding about one quarter of the values. The cut points between the parts are the quartiles.
• The interquartile range is the difference between the upper quartile and the lower quartile. On the plot, it is the span covered by the box.

Vocabulary: lower extreme
• The smallest or least number in a data set, usually farther from the interquartile range than the other data in the set. Also known as the minimum.

Vocabulary: minimum
• Same as the lower extreme on the previous slide.

Vocabulary: maximum
• The opposite of the lower extreme; it is the upper extreme, on the far right of the box and whisker plot.
[Figure: box and whisker plot with the minimum and maximum labeled]

Vocabulary: attribute
• A characteristic, such as size, shape, or color.
• Examples: large, blue hexagon; small, red triangle.

Vocabulary: Measure of Center
• We use several different ways to measure the center. Some are:
  – Mode (the piece of data most often repeated)
  – Median (the middle number when data are in numerical order)
  – Mean (the average of all the numerical data)

Vocabulary: mean
• Data: 3, 5, 5, 4, 5, 6, 2, 5 (eight data points)
• Step 1: add. 3 + 5 + 5 + 4 + 5 + 6 + 2 + 5 = 35
• Step 2: divide. 35 ÷ 8 = 4.375
• 4.375 is the mean.
• Definition: the sum of a set of numbers divided by the number of elements in the set. A type of average.
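As a quick check of the two steps, the short Python sketch below adds the eight data points from the mean example and divides by how many there are. The variable names (`data`, `total`, `count`, `mean_value`) are illustrative choices, not part of the slides.

```python
# Data points copied from the mean example on the slide.
data = [3, 5, 5, 4, 5, 6, 2, 5]

# Step 1: add the data points.
total = sum(data)

# Step 2: divide the sum by the number of data points.
count = len(data)
mean_value = total / count

print(total, count, mean_value)  # prints: 35 8 4.375
```

Writing the two steps out separately mirrors the slide; Python's built-in `statistics.mean` would give the same value in one call.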
Vocabulary: median
• When the numbers are arranged from least to greatest, it is the middle number.
• Example: 13, 16, 17, 20, 22, 24, 24, 28, 32. The median is 22.
• If there is an even number of data points, it is the average of the two middle numbers.
• Example: 13, 16, 17, 20, 22, 24, 28, 32. (20 + 22 = 42; 42 ÷ 2 = 21)

Vocabulary: mode
• The piece of data most often repeated.
• Data set: 5, 7, 8, 9, 9, 11
• The mode in the data set above is 9.
• It is possible to have more than one mode.

Vocabulary: Measure of variation
• Range, spread, and mean absolute deviation are measures that indicate how much the data in one data set differ among themselves.
• Each is a measure of variation.

Vocabulary: range
• The difference between the greatest number and the least number in a set of numbers.
• Data set: 3, 2, 5, 4, 1, 6, 4, 4, 2, 5, 7, 3
• The largest number is 7 and the smallest is 1.
• So, the range is 6 because 7 - 1 = 6.

Vocabulary: mean absolute deviation
• In statistics, the absolute deviation of an element of a data set is the absolute difference between that element and a given point. The mean absolute deviation is the average of these absolute deviations, measured from the mean.
• Example: large cube = 45 kg, cylinder = 30 kg, small cube = 24 kg
• Mean = 33 kg
• Deviations from the mean: 45 - 33 = 12, 30 - 33 = -3, 24 - 33 = -9
• Absolute deviations: 12 + 3 + 9 = 24; 24 ÷ 3 = 8
• 8 kg is the mean absolute deviation.

Vocabulary: statistical variability
• Variability or spread in a variable or a probability distribution. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile range.

Vocabulary: data distribution
• A table that shows how many there are of each type of data.

People in My Neighborhood
Age      Tally               Frequency
0-19     11111 11111 1111    14
20-39    11111 111           8
40-59    1111                4
60-89    11111 11            7
90+      1                   1

Vocabulary: cluster
• A group of the same or similar elements gathered or occurring closely together on a graph.
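The worked examples for median, mode, range, and mean absolute deviation above can be checked with a short Python sketch. It relies on the standard `statistics` module (`multimode` needs Python 3.8 or later); the helper names `data_range` and `mean_absolute_deviation` are hypothetical, written here only for illustration.

```python
from statistics import mean, median, multimode


def data_range(values):
    """Difference between the greatest and the least value."""
    return max(values) - min(values)


def mean_absolute_deviation(values):
    """Average distance of each value from the mean of the set."""
    center = mean(values)
    distances = [abs(v - center) for v in values]
    return mean(distances)


# Median with an odd number of data points: the middle number.
print(median([13, 16, 17, 20, 22, 24, 24, 28, 32]))       # 22

# Median with an even number: the average of the two middle numbers.
print(median([13, 16, 17, 20, 22, 24, 28, 32]))           # 21.0

# multimode returns every most-repeated value, so ties are allowed.
print(multimode([5, 7, 8, 9, 9, 11]))                     # [9]

# Range of the data set from the range slide.
print(data_range([3, 2, 5, 4, 1, 6, 4, 4, 2, 5, 7, 3]))   # 6

# Mean absolute deviation of the three masses, in kg.
print(mean_absolute_deviation([45, 30, 24]))              # 8
```

`multimode` is used instead of `mode` because the slide notes that a data set can have more than one mode.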
Vocabulary: gap
• A place on a graph where no data values are present.

Ages of orchestra members
10-15    xxxx
16-20    xxxxx xxxxx xxx
21-25    (gap in the data)
26-30    xxxxx xxxx
31-35    xxx
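One way to spot a gap without drawing the graph is to look for an empty interval that sits between occupied ones. The Python sketch below does this for the orchestra ages, with the counts read off the tally marks above. The function name `find_gaps` and the rule that only empty intervals between occupied ones count as gaps are assumptions made for this illustration.

```python
def find_gaps(frequencies):
    """Return the labels of empty intervals lying between occupied ones.

    `frequencies` maps interval labels, in order, to their counts.
    """
    labels = list(frequencies)
    occupied = [i for i, label in enumerate(labels) if frequencies[label] > 0]
    if not occupied:
        return []
    first, last = occupied[0], occupied[-1]
    return [label for label in labels[first:last + 1] if frequencies[label] == 0]


# Counts taken from the x marks in the orchestra table.
orchestra_ages = {"10-15": 4, "16-20": 13, "21-25": 0, "26-30": 9, "31-35": 3}

print(find_gaps(orchestra_ages))  # ['21-25']
```

Empty intervals at the very start or end of the table are not reported, since they lie outside the data and are not gaps within it.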