Introductory Statistics: A Problem-Solving Approach by Stephen Kokoska
Chapter 1: An Introduction to Statistics and Statistical Inference
Copyright 2011 by W. H. Freeman and Company. All rights reserved.

Statistics
1. Statistics is the science of collecting and interpreting data. (Newer names and directions: data science, analytics, data mining, machine learning.)
2. Examples of statistics in everyday life: online shopping and recommender systems (e.g., Netflix); national election polling; marketing research; actuarial work; clinical trials for medicine.
3. Statistics helps us make decisions, assess risk, and draw conclusions.

Descriptive Statistics
Graphical and numerical methods used to describe, organize, and summarize data.

Inferential Statistics
Def: Techniques and methods used to analyze a small, specific set of data in order to draw a conclusion about a large, more general collection of data.

Questions:
T/F: Descriptive statistics are used to indicate how the data are collected.
T/F: Inferential statistics are used to draw a conclusion about a population.
Fill in the blank: The entire collection of objects being studied is called the ----.
Fill in the blank: A small subset from the set of all 2013 minivans is called a ----.
Fill in the blank: Consider the amount of sugar in breakfast cereals. This characteristic of breakfast cereal is called a ----.

Population, Sample, Variable
A population is the entire collection of individuals or objects to be considered or studied.
A sample is a subset of the entire population, a small selection of individuals or objects taken from the entire collection.
A variable is a characteristic of an individual or object in a population of interest.

Example (marketing and consumer behavior): Magazines, newspapers, and books have become more readily available in digital format.
In addition, the quality of e-readers, such as the Kindle, Nook, and iPad, has improved. A recent study suggests that 21% of adults in the US read an e-book within the past year. Suppose a subset of 500 US adults is obtained. Describe the population and the sample in this problem. Write a probability question and a statistics question involving the population and the sample.

Probability vs. Statistics
In a probability problem, certain characteristics of a population are assumed known. We then answer questions concerning a sample from that population.
In a statistics problem, we assume very little about a population. We use the information in a sample to answer questions concerning the population.
[Figure: Probability vs. Statistics illustration]

Observational Study
In an observational study, we observe the response for a specific variable for each individual or object.
Examples: polls; the e-book example above; retrospective case-control studies of smoking habits and health risks.

More on Sampling
If a sample is not random, it is likely to be biased. (In a Toronto mayoral election, what would happen if we collected samples only from Etobicoke?)
Other types of bias: non-response bias; self-selection bias.
If the population is infinite, the number of possible simple random samples is also infinite.

Experimental Study
In an experimental study, we investigate the effects of certain conditions on individuals or objects in the sample.
Examples: clinical trials; agricultural experiments.
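Returning to the e-book example, the difference between a probability question and a statistics question can be made concrete in a few lines of Python (the slides use R; Python is used here). The probability question assumes the population proportion is known (p = 0.21) and asks about a sample of 500 using the binomial distribution. The statistics question goes the other way and estimates the unknown proportion from a sample. The cutoff of 90 readers and the observed count of 118 are hypothetical values chosen only for illustration; they are not data from the study.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summing the pmf term by term."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


n = 500  # sample size from the example

# Probability question: the population proportion is ASSUMED KNOWN.
p_population = 0.21                 # from the study quoted in the example
expected = n * p_population         # expected number of e-book readers in the sample
# "What is the probability that at most 90 of the 500 read an e-book?"
# (90 is a HYPOTHETICAL cutoff chosen for illustration)
p_at_most_90 = binom_cdf(90, n, p_population)

# Statistics question: the population proportion is UNKNOWN.
observed = 118                      # HYPOTHETICAL count of e-book readers in the sample
p_hat = observed / n                # sample proportion, used to estimate p

print(f"expected readers: {expected:.1f}")
print(f"P(X <= 90) if p = 0.21: {p_at_most_90:.4f}")
print(f"estimate of p from the sample: {p_hat:.3f}")
```

In the first computation the population is fixed and the sample is random; in the second, the sample is fixed (already observed) and the population proportion is what we want to learn about.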

Simple Random Sample (SRS)
A (simple) random sample (SRS) of size n is a sample selected in such a way that every possible sample of size n has the same chance of being selected.
Question: Suppose a population consists of 5 individuals: A, B, C, D, E. If we draw an SRS of size 2, what is the probability that the sample consists of A and B? What is the probability that individual A is in the sample?
Question: How do we select an SRS in practice?

Method 1: Random number table
How do we select an SRS of size 5?
• Suppose the population subjects are labeled 00, 01, …, 99.
• Start at any location in the table and read in any direction.
• Assume we start from the first row. Record two digits at a time, discarding repeated labels, until we have 5 unique two-digit labels: 11, 74, 26, 93, 81.
The sample is {11, 74, 26, 93, 81}.

Method 2: Use a computer, e.g., R at www.r-fiddle.org:
    x <- sample(0:99, 5)
Try it a few times. Do you get a different SRS each time?

Statistical Inference Procedure
The process of checking a claim can be divided into four parts:
1. Claim
2. Experiment
3. Likelihood
4. Conclusion

Statistical Inference Procedure: Claim
This is a statement of what we assume to be true.

Statistical Inference Procedure: Experiment
To check the claim, we conduct a relevant experiment.

Statistical Inference Procedure: Likelihood
Consider the likelihood of the observed experimental outcome, assuming the claim is true. We will use many techniques to determine whether the experimental outcome is a reasonable observation (subject to reasonable variability) or a rare occurrence.
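The SRS questions from earlier (a population of five individuals A through E, samples of size 2) can be answered by listing every possible sample, since under an SRS each one is equally likely. The Python sketch below does this with `itertools.combinations`, and also mimics Method 1 (reading random digits two at a time and skipping repeats) and Method 2 (`random.sample`, Python's counterpart to R's `sample`). The helper names `srs_from_digits` and `srs_by_computer` are hypothetical, and the digit string passed in is made up for illustration; the slide's actual random number table is not reproduced here.

```python
import itertools
import random
from fractions import Fraction

# SRS question: population {A, B, C, D, E}, SRS of size 2.
population = ["A", "B", "C", "D", "E"]
samples = list(itertools.combinations(population, 2))  # all possible samples of size 2

# Under an SRS every sample in `samples` is equally likely.
p_sample_is_AB = Fraction(sum(1 for s in samples if set(s) == {"A", "B"}), len(samples))
p_A_in_sample = Fraction(sum(1 for s in samples if "A" in s), len(samples))


def srs_from_digits(digits: str, size: int) -> list[int]:
    """Method 1 (hypothetical helper): read two-digit labels 00-99 from a string
    of random digits, skipping repeats, until `size` distinct labels are found."""
    chosen: list[int] = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        if label not in chosen:
            chosen.append(label)
            if len(chosen) == size:
                return chosen
    raise ValueError("not enough digits to draw a sample of this size")


def srs_by_computer(size: int, seed=None) -> list[int]:
    """Method 2 (hypothetical helper): draw an SRS of labels 00-99 without
    replacement, analogous to x <- sample(0:99, 5) in R."""
    rng = random.Random(seed)
    return rng.sample(range(100), size)


print(len(samples), p_sample_is_AB, p_A_in_sample)  # 10 1/10 2/5
# Made-up digit string: the second "11" is a repeat and is skipped.
print(srs_from_digits("117411269381", 5))  # [11, 74, 26, 93, 81]
print(srs_by_computer(5))  # changes from run to run unless a seed is given
```

The enumeration confirms that the sample {A, B} has probability 1/10, while individual A appears in 4 of the 10 possible samples, giving probability 2/5.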

Statistical Inference Procedure: Conclusion
There are only two possible conclusions:
1. If the outcome is reasonable, we have no reason to doubt the claim. We usually write, “There is no evidence to suggest that the claim is false.”
2. If the outcome is rare, we disregard the lucky alternative and question the claim. A rare outcome is a contradiction: it shouldn't happen (often) if the claim is true. In this case we write, “There is evidence to suggest that the claim is false.”

Example
The Wireless Emporium ships a box containing 1000 cell phone chargers and claims that 999 are in perfect condition and only 1 is defective. Upon receipt of the shipment, a quality control inspector reaches into the box, mixes the chargers around a bit, and selects one at random. It is defective!

Example (continued)
Claim: There were 999 good cell phone chargers and 1 defective charger in the box.
Experiment: Randomly select one charger; it is defective.
Likelihood: If the claim is true, the probability of selecting the defective charger is 1/1000 = 0.001.
Conclusion: The experimental outcome is extremely rare and unlikely. We disregard the lucky alternative and question the claim. There is evidence to suggest that the claim is false.
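The Likelihood step of the charger example can be checked directly, and a short simulation shows how seldom the inspector would draw the defective charger if the claim were true. This is a minimal Python sketch; the number of simulated inspections and the random seed are arbitrary choices and not part of the example.

```python
import random
from fractions import Fraction

# Claim: 999 good chargers and 1 defective charger in the box.
GOOD, DEFECTIVE = 999, 1
box = ["good"] * GOOD + ["defective"] * DEFECTIVE

# Likelihood: probability of drawing the defective charger, assuming the claim is true.
p_defective = Fraction(DEFECTIVE, GOOD + DEFECTIVE)


def simulate_inspections(trials: int, seed: int = 0) -> float:
    """Repeat the inspector's experiment `trials` times under the claim
    and return the fraction of draws that turned up the defective charger."""
    rng = random.Random(seed)
    hits = sum(rng.choice(box) == "defective" for _ in range(trials))
    return hits / trials


print(p_defective, float(p_defective))  # 1/1000 0.001
print(simulate_inspections(100_000))    # close to 0.001
```

Because only about 1 draw in 1000 is defective when the claim holds, observing a defective charger on the first draw is the kind of rare outcome that leads us to question the claim.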