Thank you: William Lim, Rachel Velasco, Terry Woo

INTRODUCTION

Individuals and Variables
• Individuals are the objects described by a set of data. Individuals may be people, but they may also be animals or things.
• A variable is any characteristic of an individual. A variable can take different values for different individuals.

Variables
• A categorical variable places an individual into one of several groups or categories.
• A quantitative variable takes numerical values for which arithmetic operations such as adding and averaging make sense.
• The distribution of a variable tells us what values the variable takes and how often it takes these values.

1.1 BAR GRAPHS
• Compare the sizes of the groups or categories
• More flexible than pie charts
• Sizes can be measured as frequencies or percents

PIE CHARTS
• Compare what part of the whole each group is
• Must include all categories that make up the whole
• Sizes can be measured as frequencies or percents

DOTPLOTS
• Show the range of the data and where the individual values fall
• Most useful for small sets of quantitative data, since each dot sits above a number on the axis
1. Draw a horizontal line and label it with the variable.
2. Mark a dot above the number on the horizontal axis corresponding to each data value.

STEMPLOTS
1. Separate each value into a "stem" made up of all but the rightmost digit and a "leaf", the final digit.
2. Write the stems vertically in increasing order from top to bottom and draw a vertical line to the right of the stems. Write each leaf to the right of its stem.
3. Rearrange the leaves in increasing order out from the stem.

TIPS FOR CONSTRUCTING STEMPLOTS
• Be sure each stem is assigned an equal number of possible leaf digits
• Too few stems: skyscraper-shaped plot
• Too many stems: flat "pancake" graph
• Five stems is a good minimum
• When the data have too many digits, round them so that the final digit after rounding is suitable as a leaf
• Useful for comparing quantitative distributions

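The three stemplot steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `stemplot` and the sample values are made up, the data are non-negative whole numbers, and the leaf is the ones digit. Stems with no leaves are still printed so that gaps in the distribution stay visible.

```python
from collections import defaultdict


def stemplot(values):
    """Return the rows of a stemplot for non-negative whole numbers."""
    leaves = defaultdict(list)
    # Sorting first means every stem's leaves end up in increasing order (step 3).
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # step 1: all but the last digit, last digit
        leaves[stem].append(leaf)

    rows = []
    # Step 2: list every stem from smallest to largest, including empty ones.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>2} | {row}".rstrip())
    return rows


# Hypothetical data, not taken from the slides.
for row in stemplot([23, 12, 47, 21, 15, 23]):
    print(row)
```

With data that have more digits, rounding before calling the function plays the role of the rounding tip above.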
HISTOGRAMS
• Most common graph of the distribution of one quantitative variable
• Five classes is a good minimum
• Avoid skyscraper and pancake shapes
• Choose classes of equal width so that the areas of the bars can be compared

OGIVE (CUMULATIVE RELATIVE FREQUENCY PLOT)
• Use instead of a histogram to show the relative standing of an individual observation
• Percentile: the pth percentile of a distribution is the value such that p percent of the observations fall at or below it

TIME PLOTS
• Plot each observation against the time at which it was measured
• Mark the time scale on the horizontal axis and the variable of interest on the vertical axis
• Connecting the points by lines helps show the pattern of change over time if there aren't too many points
• Trend: a long-term upward or downward movement over time
• Seasonal variation: a pattern that repeats itself at regular time intervals

DISTRIBUTION
• Shape
   • Symmetric: right and left sides are approximately mirror images of each other
   • Skewed right: the right side of the distribution extends much farther out than the left
   • Skewed left: the left side of the distribution extends much farther out than the right
   • Clusters, several distinct peaks, or gaps
   • Uniform (same response for any value of x)
• Center: the value that separates the observations roughly in half; measured by the median or the mean
• Spread: scope of the values from smallest to largest
• Outliers: extreme values

1.2 MEASURING CENTER
• Mean: add the values up and divide by the number of observations
   $\bar{x} = \frac{1}{n} \sum x_i$
• Median: midpoint of a distribution
   • Resistant

MEASURING SPREAD
• Range: difference between the largest and smallest values
• Interquartile range (IQR): IQR = Q3 − Q1
   • Q1: 25th percentile
   • Q2: 50th percentile (median)
   • Q3: 75th percentile
   • Resistant
• Variance: the average of the squared differences of the values from the mean, using n − 1 as the divisor
   $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
   • Degrees of freedom: n − 1
• Standard deviation: square root of the variance
   $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

MEASURING SPREAD
• Variance
   • Large if the observations are widely spread about their mean
   • Small if the observations are all close to the mean
   • The sum of the deviations of the observations from their mean is always zero
• Standard deviation
   • s = 0 when there is no spread
   • s > 0 otherwise: as spread increases, s gets larger
   • Nonresistant: strongly influenced by outliers or skewness

OUTLIERS
• An observation is a suspected outlier if it falls below Q1 − (1.5 × IQR) or above Q3 + (1.5 × IQR)

FIVE-NUMBER SUMMARY
• Minimum, Q1, Median, Q3, Maximum

BOXPLOTS
• Graph of a five-number summary
• Modified boxplot: outliers are plotted individually (the whiskers extend to the most extreme data points that aren't outliers; they don't show "the fence")

LINEAR TRANSFORMATIONS
• When you add a constant a to all the values, the mean and median increase by a; the measures of spread do not change
• When you multiply by a constant b, the mean, median, IQR, and standard deviation are multiplied by b (for a negative b, the IQR and standard deviation are multiplied by |b|)

• Normal curves/distributions: a type of density curve that is symmetric, single-peaked, and bell-shaped
• Changing μ (mu) moves the curve from side to side along the horizontal axis
• Changing σ (sigma) changes the spread of the curve
• Normal distributions follow the 68-95-99.7 rule
   • 68% of observations fall within 1 standard deviation of the mean
   • 95% of observations fall within 2 standard deviations of the mean
   • 99.7% of observations fall within 3 standard deviations of the mean
• Abbreviate normal distributions with the notation N(μ, σ), that is, N(mean, standard deviation)

2.2 STANDARD NORMAL CALCULATIONS
• To standardize a value x from a normal distribution N(μ, σ): $z = \frac{x - \mu}{\sigma}$
• The standardized value is called a z-score
• The z-score tells us how many standard deviations the observation is from the mean
• Standardizing a variable that is normally distributed produces a new variable that has a "standard normal distribution"
• This "standard normal distribution" is described as N(0, 1), with mean 0 and standard deviation 1
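
As a rough check on the numerical summaries above, the Python sketch below computes a five-number summary, applies the 1.5 × IQR rule, standardizes values into z-scores, and evaluates the 68-95-99.7 rule on N(0, 1). Several things here are assumptions rather than part of the slides: the function names and the sample data are made up, Q1 and Q3 are taken as the medians of the lower and upper halves of the sorted data (software packages often use other quartile conventions), and the z-scores use the sample mean and sample standard deviation in place of μ and σ.

```python
from statistics import NormalDist, mean, median, stdev


def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum (needs at least two values).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    leaving out the overall median when the number of values is odd.
    """
    data = sorted(values)
    half = len(data) // 2
    return (data[0], median(data[:half]), median(data),
            median(data[-half:]), data[-1])


def suspected_outliers(values):
    """Values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low or x > high]


def z_scores(values):
    """Standardize each value using the sample mean and s (divisor n - 1)."""
    x_bar, s = mean(values), stdev(values)
    return [(x - x_bar) / s for x in values]


def share_within(k):
    """Proportion of N(0, 1) lying within k standard deviations of 0."""
    std_normal = NormalDist(mu=0, sigma=1)
    return std_normal.cdf(k) - std_normal.cdf(-k)


data = [2, 4, 4, 4, 5, 5, 7, 20]  # made-up values for illustration
print(five_number_summary(data))
print(suspected_outliers(data))
print([round(z, 2) for z in z_scores(data)])
print([round(share_within(k), 4) for k in (1, 2, 3)])
```

The z-scores always sum to zero (up to rounding), which is the same fact as the deviations from the mean summing to zero.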