ST2002

Lecture Hours: Wednesdays, 11.00 – 12.00 (LB01) and 3.00 – 4.00 (107)
Lecture notes: http://www.cs.tcd.ie/Rozenn.Dahyot/

Rozenn Dahyot
http://www.cs.tcd.ie/Rozenn.Dahyot/
[email protected]
Room 128 Lloyd Institute

Contents
• Statistical inference
• Analysis of Variance
• Regression
• Sampling

(Some) References
• Stuart, M., An Introduction to Statistical Analysis for Business and Industry: a problem solving approach, 2003.
• Hamilton, Lawrence C., Regression with Graphics: a second course in applied statistics, 1992.
• Mann, P.S., Introductory Statistics, 5th edition.

Quantitative analysis
In domains such as business, the following steps are important:
• identifying the information requirements for solving the problem;
• producing the necessary data;
• analysing the data to determine key patterns and relationships;
• quantifying the uncertainty involved;
• reporting the results in readily interpretable form;
• advising on interpretation and on solution implementation.

What is statistics?
Statistics is a group of methods used to collect, analyse, present, and interpret data and to make decisions.
• Descriptive statistics: methods for organizing, displaying and describing data (using tables, graphs and summary measures).
• Inferential statistics: methods that use sample results to help make decisions or predictions about a population.

Population Vs Sample
• A population consists of all elements whose characteristics are being studied.
• A sample is a portion of the population selected for study.
• A survey that includes every member of the population is called a census.
• When only a portion of the population is surveyed, we have a sample survey.

Sampling
• Random sample: each member of the population has an equal and known chance of being selected.
• Systematic sampling: a sample drawn at regular intervals from a list of population members. As long as there is no hidden order in the list, this is as good as random sampling.
• Stratified sampling: the population is divided into subsets and a random sample is drawn from each subset.
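The random, systematic and stratified schemes described above can be sketched in a few lines of Python. This is only an illustrative sketch: the population of 100 numbered members, the two strata ("staff" and "managers"), the 10% sampling fraction and all function names are assumptions made for the example, not part of the course material.

```python
import random


def simple_random_sample(population, n, rng):
    """Draw n distinct members; every member has the same chance."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Pick a random start, then take every step-th member of the list."""
    step = len(population) // n          # regular interval between picks
    start = rng.randrange(step)          # random starting position
    return [population[start + i * step] for i in range(n)]


def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the given fraction from each stratum."""
    sample = {}
    for name, members in strata.items():
        k = max(1, round(len(members) * fraction))
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical population: members numbered 1..100 (assumption).
population = list(range(1, 101))
rng = random.Random(0)                   # fixed seed so runs are repeatable

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)

# Hypothetical strata: 60 "staff" and 40 "managers" (assumption).
strata = {"staff": population[:60], "managers": population[60:]}
strat = stratified_sample(strata, 0.10, rng)

print("simple random:", sorted(srs))
print("systematic:   ", sys_sample)
print("stratified:   ", {k: sorted(v) for k, v in strat.items()})
```

The systematic sample always has a constant gap of 10 between consecutive members, which is why a hidden periodic order in the list would bias it, while the stratified sample keeps the 60/40 split of the strata (6 and 4 members).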
A representative sample is a sample that captures as closely as possible the characteristics of the population.

Sampling
• Convenience sampling: used in (inexpensive) exploratory research. The sample members are selected because they are convenient to reach.
• Judgement sampling: the selection of the sample is guided by prior knowledge.
• Quota sampling: the population is divided into subsets (as in stratified sampling) and the proportion of each subset is assessed. However, the sample within each subset is not randomly selected.

Population distribution
The population distribution is the probability distribution of the population data.

Sampling distribution
The probability distribution of a sample statistic is called its sampling distribution. Parameters of the sampling distribution can be computed.

Errors
• Sampling error is the difference between the value of a sample statistic and the value of the corresponding population parameter.
• Nonsampling errors are errors occurring in the collection, recording, and tabulation of data.

Errors: Exercise
Five students took a test and their scores are: 70, 78, 80, 80, 95. Suppose one sample of three scores (70, 80, 95) is selected from this population.
(a) Find the sampling error.
(b) Now suppose that one selected score has been mistakenly recorded as 82 instead of 80. Find the nonsampling error.

Mean and standard deviation of \(\bar{x}\)
\[ \mu_{\bar{x}} = \mu \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

Shape of the sampling distribution
The distribution \(f(\bar{x})\) is normal:
• If \(f(x)\) is normal, then \(\bar{x}\) is normal, because a sum of independent normal random variables is normal.
• If \(f(x)\) is not normal, then the central limit theorem justifies the choice of normality for the sampling distribution of \(\bar{x}\), provided the sample size \(n\) is large enough.

Sampling distribution: Exercise
The mean wage per hour for all 5000 employees who work at a large company is $17.5 and the standard deviation is $2.9. Find the mean and standard deviation (i.e. standard error) of \(\bar{x}\) for a sample size of
(a) 30
(b) 75
(c) 200

Application of the sampling distribution
You can calculate:
\[ P(\mu_{\bar{x}} - \sigma_{\bar{x}} \le \bar{x} \le \mu_{\bar{x}} + \sigma_{\bar{x}}) = .68 \]
\[ P(\mu_{\bar{x}} - 2\sigma_{\bar{x}} \le \bar{x} \le \mu_{\bar{x}} + 2\sigma_{\bar{x}}) = .95 \]
\[ P(\mu_{\bar{x}} - 3\sigma_{\bar{x}} \le \bar{x} \le \mu_{\bar{x}} + 3\sigma_{\bar{x}}) = .997 \]
Use the standard normal distribution table with:
\[ z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \]
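The two exercises and the three probabilities above can be checked numerically. The sketch below is one possible way to work them, not an official solution: variable and function names are illustrative, the nonsampling error is taken as the part of the total error not explained by sampling, and the standard error uses the plain formula without a finite population correction (an assumption that seems reasonable here because each sample is at most 200 out of 5000 employees).

```python
import math
from statistics import NormalDist, mean

# Errors exercise: population of five test scores.
scores = [70, 78, 80, 80, 95]
mu = mean(scores)                                # population mean, 80.6

sample = [70, 80, 95]                            # the selected sample
sampling_error = mean(sample) - mu               # about 1.07

recorded = [70, 82, 95]                          # 80 mistakenly recorded as 82
total_error = mean(recorded) - mu                # about 1.73
nonsampling_error = total_error - sampling_error  # about 0.67


def standard_error(sigma, n):
    """Standard deviation of the sample mean for sample size n."""
    return sigma / math.sqrt(n)


# Sampling distribution exercise: hourly wages, mu = 17.5, sigma = 2.9.
wage_mu, wage_sigma = 17.5, 2.9
for n in (30, 75, 200):
    # The mean of the sample mean equals the population mean for every n.
    print(f"n={n}: mean={wage_mu}, standard error={standard_error(wage_sigma, n):.3f}")

# Probability that the sample mean lies within k standard errors of its mean,
# assuming its sampling distribution is normal.
std_normal = NormalDist()
for k in (1, 2, 3):
    prob = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} standard error(s): {prob:.4f}")

print(f"sampling error={sampling_error:.2f}, nonsampling error={nonsampling_error:.2f}")
```

The standard error shrinks as the sample size grows (roughly 0.529, 0.335 and 0.205 for n = 30, 75 and 200), while the mean of the sampling distribution stays at 17.5.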