Chapter 1: Exploring Data Review
Vaishali Saseenthar, Christine Kil, Aditi Bellur

The BIG Idea
• Data Production
• Data Analysis
• Probability
• Statistical Inference
• Presenting the statistics:
  – Plotting the data: dotplot, stemplot, histogram.
  – Interpreting the data: SOCS (shape, outliers, center, spread).
  – Numerical summary: mean and SD, five-number summary.

When do YOU use this??
• We use graphs such as histograms, stemplots, pie graphs, and bar graphs to organize and present the data.
• Data analysis is used to draw conclusions from our data.
• Examining distributions for shape, center, spread, and deviations helps us understand the data.
• Data analysis is used practically in economics, politics, journalism, and other fields.

Vocab to know
• Statistical inference- drawing conclusions about a large group based on a smaller group.
• Surveys- a popular way to gauge public opinion; a survey asks the individuals in the sample some questions and records their responses.
• Observational study- observing individuals and measuring variables without influencing the responses.
• Experiment- deliberately influencing individuals to observe their responses.
• Individuals- the objects described by the data.
• Variable- any characteristic of an individual.
• Categorical variable- places an individual into one of several groups.
• Quantitative variable- takes numerical values for which adding and averaging make sense.
• Mean- the average value, x-bar. It is sensitive to the influence of a few extreme observations.
• Outlier- an individual value that falls outside the overall pattern.

Vocab to know: part two
• pth percentile- the value of a distribution such that p% of the observations fall at or below it.
• First quartile Q₁- median of the observations to the left of the overall median in the ordered list.
• Third quartile Q₃- median of the observations to the right of the overall median in the ordered list.
• Five-number summary- the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest.
• Boxplot- graph of the five-number summary.
• Interquartile range (IQR)- the distance between the quartiles, Q₃ − Q₁ (the range of the center half of the data); a more resistant measure of spread than the range.
• Standard deviation- measures spread by looking at how far the observations are from their mean.
• Variance- the average of the squares of the deviations of the observations from their mean (the sum of squares is divided by n − 1).
• Data analysis- the art of describing data using graphs and numerical summaries.

Key Topics Covered in this Chapter
• Graphs for Categorical Variables
• Stemplots
• Histograms
• Examining Distributions
• Dealing with Outliers
• Relative Frequency and Cumulative Frequency
• Time Plots
• Measuring Center: The Mean
• Mean versus Median
• Measuring Spread: The Quartiles
• The Five-Number Summary and Boxplots
• The 1.5 × IQR Rule for Suspected Outliers
• Measuring Spread: The Standard Deviation
• Properties of Standard Deviation
• Choosing Measures of Center and Spread
• Changing the Unit of Measurement
• Comparing Distributions

Calculator Key Strokes
• Calculator Boxplots and Numerical Summaries
1. Enter the first set of data in L1 (list 1) and the second set in L2 (list 2).
2. Set up two statistics plots: Plot 1 to show a modified boxplot of the data in list 1 and Plot 2 to show a modified boxplot of the data in list 2.
3. Use the calculator's zoom feature to display the side-by-side boxplots.
4. Calculate numerical summaries for each set of data.
5. Notice the down-arrow on the left side of the display. Press down to see other statistics.

Formulas You Should Know
• Mean: x̄ (x-bar) = (x₁ + x₂ + … + xₙ) / n
• Variance: s² = [(x₁ − x̄)² + (x₂ − x̄)² + … + (xₙ − x̄)²] / (n − 1)
• Standard deviation: s = √(s²)
• Linear transformation: x(new) = a + bx

Helpful HINTS
• Stemplots do not work well for large data sets, where each stem must hold a large number of leaves.
• Use histograms of percents for comparing several distributions with different numbers of observations.
• When examining a distribution, look for the overall pattern and for striking deviations from that pattern.
• Look for outliers that are clearly apart from the body of the data, not just the most extreme observations.
• The simplest useful numerical description of a distribution consists of both a measure of center and a measure of spread.
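
Worked example (a sketch, not part of the original slides)
The numerical summaries in this review can be checked by hand or with a few lines of Python. The sketch below uses the quartile definition from the vocab list (median of the observations on each side of the overall median), the 1.5 × IQR rule for suspected outliers, and a linear transformation x(new) = a + bx. The data set `scores`, the constants `a` and `b`, and the function names are made up for illustration. Other software and calculators may compute quartiles slightly differently.

```python
import math
from statistics import mean, median, stdev

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); assumes at least 2 observations."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # observations to the left of the median
    upper = xs[half + (n % 2):]  # observations to the right of the median
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def suspected_outliers(data):
    """Flag values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

# Hypothetical data set, invented for this example.
scores = [2, 4, 4, 5, 6, 7, 8, 9, 30]

print(five_number_summary(scores))  # (2, 4.0, 6, 8.5, 30)
print(suspected_outliers(scores))   # [30]

# Mean and standard deviation (stdev divides by n - 1).
print(mean(scores), stdev(scores))

# Linear transformation x(new) = a + b*x with hypothetical a and b.
a, b = 10, 2
transformed = [a + b * x for x in scores]

# The mean is shifted and scaled; the standard deviation is only scaled.
print(math.isclose(mean(transformed), a + b * mean(scores)))
print(math.isclose(stdev(transformed), abs(b) * stdev(scores)))
```

Here the value 30 pulls the mean (about 8.3) above the median (6), which shows why the median and IQR are the more resistant choice when a distribution has outliers.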