Stats Review Topics

1. Combinations and Permutations (2)
2. Graphs: Box & Whisker, histograms (4)
3. Shape of data distributions: shape, outliers, skew, spread & center (3)
4. 2-way tables and conditional probability (3)
5. Probability calculations – mutually exclusive, independent, conditional (5)
6. Normal Distribution & Probability (3)
7. Regression – finding equation from mean and variance of each sample, correlation (4)
8. Sampling methods & bias (3)
9. Mean and variance of random variables (expected value) (3)
10. Hypothesis testing (5)

Combinations & Permutations

Permutations: The number of ways to put r out of n items IN ORDER.
1. To put all n items in order: n!
2. To put r out of n items in order: nPr = n! / (n-r)!
3. Calculator: MATH > PRB 2:nPr

Combinations: The number of ways to choose r out of n items when ORDER DOES NOT MATTER.
1. To choose r out of n items: nCr = n! / [(n-r)! r!]
2. Calculator: MATH > PRB 3:nCr

Example 1: How many ways can a 4-digit code be created if none of the digits can be repeated?
Example 2: How many ways can Gold, Silver & Bronze be awarded in a race with 8 runners?
Example 3: How many ways can an ordered playlist of 6 songs be made from the 11 songs on an album?
Example 4: How many ways can 4 students from a class of 20 be picked to receive the same award?
Example 5: How many ways can the letters in the word DAWSON be rearranged?
Example 6: How many ways can you choose 2 side dishes with your meal at a restaurant if there are 8 to choose from?
Example 7: An ice cream shop allows 4 different toppings on a sundae. How many different sundaes can be made if they have 7 available toppings to choose from?
Example 8: A test has 5 essay questions and each person must complete 3. How many different choices are there of 3 essay questions to answer?

Graphs of 1-Variable Data

Histogram
Equal-width “bins” collect the data values.
Bars need to touch; it is a continuous graph.
Height of each bar represents the frequency of data items in that “bin”.
Used to see information on center and skew.

Box & Whisker
Shows the “5 number summary”: Min, 1Q, median, 3Q, Max.
Easy to find the Inter-quartile Range (IQR) – this is the width of the box.
Easy to see outliers: any value more than 1.5 × IQR above 3Q or below 1Q.
Divides the data into 4 equal sections.

Example 1: Using the following data, draw a histogram and a box & whisker plot.
16, 13, 18, 12.5, 9.5, 15, 22.5, 16.5, 14, 13, 11, 12, 15, 12.5, 11, 13.5
Example 2: Are there any outliers in the data? What effect would removing any outliers have on the data?

Shape of Data Distributions

SHAPE – Symmetrical, Skewed, Modes
OUTLIERS – Points that are further out than 1.5 × IQR from the 1st or 3rd quartiles
CENTER – Mean if data is symmetrical / Median if data is skewed
SPREAD – Standard Deviation goes with Mean / IQR goes with Median / Range

Example 1: A sample of SAT scores from 100 students is symmetrical and unimodal. What measure of center and spread should be used?
Example 2: The ages of people at a theme park are skewed to the right. What measure of center and spread should be used?
Example 3: The 5 number summaries of points scored in basketball games are shown below. Which team has the smallest IQR?
Team 1: 42, 58, 62, 68, 75
Team 2: 44, 51, 59, 64, 68
Team 3: 59, 64, 68, 74, 91
Team 4: 62, 68, 74, 76, 79
Example 4: In the above example, which teams had game scores that would be outliers?
Example 5: The following distances were recorded by a company’s fleet vehicles:
330, 402, 350, 382, 31, 412, 375, 363
It turns out that the 31 was a mistake and should be 310. How does that change affect the mean, median, IQR, range, and standard deviation?

Two-Way Tables and Conditional Probability

Two-way tables show the relationship between two variables.
Marginal Probability – the probability along the edges of the table, in the TOTAL column and row.
Joint Probability – the probability of the cells inside the table, based on the table TOTAL.
Conditional Probability – the probability of a cell based only on the total of the row or column specified.
Independent – the probability of event A is the same as the probability of A given B.

         L.H.   R.H.   Total
Blue      12     25      37
Green      8     17      25
Brown     35     71     106
Total     55    113     168

Example 1: What is the probability of being left handed?
Example 2: What is the probability of having green eyes and being right handed?
Example 3: What is the probability of having green eyes, given that someone is right handed?
Example 4: What is the probability of having green eyes?
Example 5: Compare your answers from Examples 3 and 4. Are they about the same? Does that mean green eyes are dependent on or independent of right handedness?

         Girls   Boys   Total
10th        0      4       1
11th        2      6       8
12th       13     15      28
Total      15     25      37

Example 6: What is the probability of being a girl in this class?
Example 7: What is the probability of being in 11th grade?
Example 8: What is the probability of being a girl given that a student is in 11th grade?
Example 9: What is the probability of being in 11th grade given that the student is a girl?
Example 10: Do these two variables appear to be dependent or independent?

         Pepsi   Coke   Total
Dogs                       25
Cats                       75
Total       40     60     100

Example 11: If pet preference is independent of soda preference, how many cat lovers should prefer Coke?

Conditional Probability

Two events are INDEPENDENT if P(A) is the same, with or without event B. So P(A) = P(A|B).
For independent events:
P(A and B) = P(A)*P(B)
P(A or B) = P(A) + P(B) – P(A)*P(B)

Two events are MUTUALLY EXCLUSIVE if they cannot happen at the same time. These are NOT independent.
For mutually exclusive events:
P(A and B) = 0
P(A or B) = P(A) + P(B)

For any two events:
P(A|B) = P(A and B) / P(B)
P(A and B) = P(A|B) * P(B)

Example 1: P(A) = 0.7 and P(B) = 0.4 and the probability of both happening together is 0.15. What is the conditional probability of event A given event B?
ANSWER: P(A|B) = P(A and B) / P(B) = 0.15 / 0.4 = 0.375
Example 2: P(A) = 0.6 and P(B) = 0.2 and the probability of both happening together is 0.17. What is the conditional probability of B given A?
Example 3: The probability of rain is 40%, the probability of temperatures over 90 is 20%, and the probability of both is 5%. What is the probability of the temperature being over 90 given that it’s raining?
Example 4: In a certain bloodline, the probability of gray-eyed goats is 15% and the probability of blue-eyed goats is 25%. These are mutually exclusive. What is the probability of a goat having blue OR gray eyes?
Example 5: In an experiment, you flip a coin and roll a standard die. What is the probability of the coin landing on heads OR the die showing an even number? These events are independent.
Example 6: The probability that I go to the beach on Saturday is 0.75. The probability that I go to the beach and get sunburn is 0.30. What is the probability that I get sunburn given that I am going to the beach?
Example 7: P(A) = 0.3 and P(B|A) = 0.3. Find the probability of (A and B). Are these events mutually exclusive? Are they independent?

Normal Distribution

Calculator: STAT > TESTS > Z-Test. Type in the mean and standard deviation of the population. The number you are testing is entered as x-bar, with n = 1. You can find the probability of a number being greater or less than your chosen value. Either CALCULATE or DRAW will give you p (the probability associated with the number).

1) An insurance company finds that the ages of motorcyclists killed in crashes are normally distributed with a mean of 26.9 years and a standard deviation of 8.4 years.
If we randomly select one such motorcyclist, find the probability that he or she was under 25 years old.
2) A population of 700 scores has a mean of 5.40, a standard deviation of 1.20, and its distribution is approximately normal. If a score is randomly selected, find the probability that it is greater than 5.00.
3) A sociologist finds that for a certain segment of the population, the numbers of years of formal education are normally distributed with a mean of 13.20 years and a standard deviation of 2.95 years.
a) For a person randomly selected from this group, find the probability that he or she has between 13.20 and 13.50 years of education.
b) For a person randomly selected from this group, find the probability that he or she has at least 12.00 years of education.
4) Scores on a standard IQ test are normally distributed with a mean of 100 and a standard deviation of 15. Find the probability that a randomly selected subject will achieve a score between 90 and 120.

Regression & Correlation

The regression line is the line of “best fit” through a set of data points.
The slope of the line is m = r (sy / sx).
The line also passes through the point (x̄, ȳ).
r = correlation coefficient, which measures how closely Y follows a straight-line relationship with X.
residual: the actual data value minus the value predicted by the equation.
Remember: Variance = (standard deviation)²

Calculator:
Enter numbers into STAT: EDIT.
Use STAT: CALC: 1-Var Stats or 2-Var Stats for easy calculations.
Use STAT: CALC: LinReg(a+bx) to get the regression equation and correlation information (b = slope).

Example 1: Two different tests were designed to measure understanding of a topic. The two tests were given to ten students with the following results:
Find the equation of the regression line; round to the nearest hundredth.
Example 2: Using the data above, find the variance of each of the test versions. Is there a difference in variance between the tests?
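The slope formula m = r (sy / sx) and the fact that the line passes through (x̄, ȳ) are enough to write a regression line from summary statistics alone. The Python sketch below is only an illustration: the function names and the summary values are made up and are not data from this review sheet. It assumes the usual convention that a residual is the actual value minus the predicted value.

```python
def regression_from_summary(mean_x, sd_x, mean_y, sd_y, r):
    """Return (a, b) for the line y-hat = a + b*x, from summary statistics."""
    b = r * (sd_y / sd_x)      # slope: m = r * (sy / sx)
    a = mean_y - b * mean_x    # intercept: the line passes through (x-bar, y-bar)
    return a, b


def residual(actual_y, x, a, b):
    """Residual = actual value minus the value predicted by the line."""
    return actual_y - (a + b * x)


# Hypothetical summary statistics, chosen only to keep the arithmetic easy.
a, b = regression_from_summary(mean_x=10.0, sd_x=2.0, mean_y=50.0, sd_y=8.0, r=0.5)
print(a, b)                        # 30.0 2.0
print(residual(60.0, 12.0, a, b))  # 6.0 (predicted 54, actual 60)
```

A positive residual means the actual point lies above the line; a negative residual means it lies below.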
Example 3: The accompanying data illustrate the number of movie theaters showing a popular film and the film's weekly gross earnings, in millions of dollars.
Theaters: mean = 616.75, standard deviation = 205.08
Gross Earnings: mean = 4.63, standard deviation = 2.13
Correlation = 0.9807
a) Find the slope of the regression equation.
b) Write the appropriate regression equation.
c) When the movie was shown in 530 theaters, the actual income was $4.05 million. Find the residual.

Sampling Methods & Bias

Methods to know:
Convenience Sample: The sample is chosen simply because its members are easy to contact.
Simple Random Sample: Each member of the population has an equal chance to be chosen.
Cluster Sample: The population is divided into clusters, each of which could represent the population, and one or a few of the clusters are randomly selected to include in the sample.
Stratified Sample: The population is separated into distinct groups and a random sample is taken from each group so that some members of each group are included in the sample.
Census: A survey of the entire population.

BIAS
Response: Questions are asked in a way that influences the answers, or respondents are led toward non-truthful answers.
Non-response: The results of a survey are altered because a large number of people cannot or will not respond, especially if there is a common reason for their non-response.
Voluntary Response: Survey members are self-selected volunteers.
Underrepresentation: Members of the population are inadequately represented in the sample. This is usually a result of convenience sampling.

Example 1: I want to survey the students at UHS to see how they feel about having a final exam week. Identify the types of sampling represented:
a. I ask the first 20 students in building 3 in my hallway in the morning.
b. I ask 10 freshmen, 10 sophomores, 10 juniors, and 10 seniors.
c. I list all students in alpha code order and use a random number generator to pick 50 students.
d. I ask all the students in my 4th period class.
e. I put a box in the cafeteria for students to enter their responses on paper if they choose.
Example 2: In each of the samples above, what type of bias may occur?

Random Variable (Expected Value)

A random variable includes the possible values of the variable and the probability of each value.
The probabilities of all options must add up to 1.
The mean (expected value) of the variable is the sum of each possible value times its probability:
E(X) = ∑ n · P(n)
The variance of the random variable is:
Var(X) = E(X²) – [E(X)]²

Example 1:
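The expected-value and variance formulas in this section can be checked numerically. The Python sketch below is not one of the handout's examples: the random variable (number of heads in two fair coin flips) and the function names are assumptions chosen for illustration.

```python
def expected_value(dist):
    """E(X) = sum of n * P(n); dist maps each value n to its probability P(n)."""
    return sum(n * p for n, p in dist.items())


def variance(dist):
    """Var(X) = E(X^2) - [E(X)]^2."""
    mean = expected_value(dist)
    mean_of_squares = sum(n ** 2 * p for n, p in dist.items())
    return mean_of_squares - mean ** 2


# Hypothetical random variable: number of heads in two fair coin flips.
heads = {0: 0.25, 1: 0.5, 2: 0.25}

# The probabilities of all options must add up to 1.
assert abs(sum(heads.values()) - 1.0) < 1e-9

print(expected_value(heads))  # 1.0
print(variance(heads))        # 0.5
```

The standard deviation of the random variable is the square root of the variance, about 0.707 for this distribution.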