Definitions

Population: A collection, or set, of individuals, objects, or events whose properties are to be analyzed.
Sample: A subset of the population. We desire knowledge about an entire population, but it is most often the case that studying all of it is prohibitively expensive, so we select a representative sample from the population and study the individual items in the sample.
Descriptive Statistics: The collection, presentation, and description of the sample data.
Inferential Statistics: The technique of interpreting the values resulting from the descriptive techniques and making decisions and drawing conclusions about the population.
(Section 1.1, Page 4)

Definitions

Parameter: A numerical value summarizing all the data of a population. For example, the average high school grade point of all Shoreline students is 3.20. We often use Greek letters to identify parameters: μ = 3.20.
Statistic: A numerical value summarizing the sample data. For example, the average grade point of a sample of Shoreline students is 3.18. We would use the symbol x̄ = 3.18. The statistic corresponds to the parameter. We usually don't know the value of the parameter, so we take a sample and estimate it with the corresponding statistic.
Sampling Variation: While the parameter of a population is considered a fixed number, the corresponding statistic will vary from sample to sample. Also, different populations give rise to more or less sampling variability. Considering the variable age, samples of 60 students from a community college would have less variability than samples of 60 people from a Seattle neighborhood.
(Section 1.1, Page 4)

Variables

Variable: A characteristic of interest about each element of a population.
Data: The set of values collected for the variable from each of the elements that belong to the sample.
Variability: The extent to which data values for a particular variable differ from each other.
Numerical or Quantitative Variable: A variable that quantifies an element of the population.
The HS grade point of a student is a numerical variable. Numerical variables are numbers for which math operations make sense. The average grade point of a sample makes sense.
Continuous Numerical Variable: The variable can take on an uncountable number of values between two points on the number line. An example is the weight of people.
Discrete Numerical Variable: The variable can take on a countable number of values between two points on a number line. An example is the price of statistics textbooks.
(Section 1.1, Page 8)

Variables (2)

Categorical or Qualitative Variable: A variable that describes or categorizes an element of a population. The gender of a person would be a categorical variable. The categories are male and female.
Nominal Categorical Variable: A categorical variable that describes or names an element of a population; the name may be a number. An example is a telephone area code. It is a number, but not a numerical variable used in math operations. The average area code does not make sense.
Ordinal Categorical Variable: A categorical variable that incorporates an ordered position or ranking. An example would be a survey response that ranks "very satisfied" ahead of "satisfied" ahead of "somewhat satisfied." Limited math operations may be done with ordinal variables.
(Section 1.1, Page 8)

Problems
(Problems, Page 19)

Problems
(Problems, Page 18)

Problems
(Section 1.3, Page 20)

Observational Studies and Experiments

Observational Study: Researchers collect data without modifying the environment or controlling the process being observed. Surveys and polls are observational studies. Observational studies cannot establish causality.
Example: For a randomly selected high school, researchers collect two pieces of data on each student, the grade point and whether the student has music training, to see if there is a relationship between the two variables.
Experiments: Researchers collect data in a controlled environment.
The investigator controls or modifies the environment and observes the effect of a variable under study. Experiments can establish causality.
Example: Randomly divide a sample of people with migraine headaches into a control group and a treatment group. Give the treatment group an experimental medication and the control group a placebo, and then measure and compare the reduction in frequency and severity of headaches for both groups.
(Section 1.3, Page 12)

Single-Stage Sampling Methods

Single-stage sampling: A sample design in which the elements of the sampling frame are treated equally and there is no subdividing or partitioning of the frame.
Simple Random Sample: A sample selected in such a way that every element of the population has an equal probability of being selected and all samples of size n have an equal probability of being selected.
Example: Select a simple random sample of 6 students from a class of 30.
1. Number the students from 1 to 30 on the roster.
2. Get 6 non-recurring random numbers between 1 and 30.
3. The six students who match the six random numbers are the sample.
(Section 1.3, Page 13)

Multistage Sampling Designs

Multistage Sampling: A sample design in which the elements of the sampling frame are subdivided and the sample is chosen in more than one stage.
Stratified Random Sampling: A sample is selected by stratifying the population, or sampling frame, and then selecting a number of items from each of the strata by means of a simple random sampling technique. The strata are usually subgroups of the sampling frame that are homogeneous but different from each other.
Example: Select a sample of six students from a class of 30 so that the sample contains an equal number of males and females.
1. List the males and females separately.
2. Take a simple random sample of 3 students from each group.
3. The six students selected are the sample.
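The two sampling procedures described in the examples can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the roster of numbers 1 to 30, the 14/16 split into strata, the seed, and the function names are all assumptions made for the example.

```python
import random


def simple_random_sample(roster, n, rng):
    """Draw n distinct elements; every subset of size n is equally likely."""
    return rng.sample(roster, n)


def stratified_sample(strata, n_per_stratum, rng):
    """Take a simple random sample of n_per_stratum from each stratum."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


rng = random.Random(2024)  # fixed seed only so the sketch is repeatable

# Hypothetical roster: students numbered 1 to 30.
roster = list(range(1, 31))
srs = simple_random_sample(roster, 6, rng)

# Hypothetical split of the same class into two strata (14 and 16 students).
strata = {"male": roster[:14], "female": roster[14:]}
stratified = stratified_sample(strata, 3, rng)

print("simple random sample:", sorted(srs))
print("stratified sample:", sorted(stratified))
```

`random.sample` draws without replacement, which corresponds to the "non-recurring random numbers" in step 2 of the simple random sample example.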
(Section 1.3, Page 15)

Multistage Sampling Designs

Cluster Sample: A sample obtained by stratifying the population, or sampling frame, and then selecting some or all of the items from some, but not all, of the strata. The strata are usually easily identified subgroups of the sampling frame that are similar to each other. This is often the most economical way to sample a large population.
Example: Take a sample of 300 Catholics in the Seattle area.
1. Get a list of the Catholic parishes in the Seattle area.
2. Take a random sample of 3 parishes.
3. In each parish, select a simple random sample of 100 parishioners.
(Section 1.3, Page 16)

Problem

a. Find the mean, variance, and standard deviation.
b. Find the 5-number summary.
c. Make a box-and-whisker display and label the numbers.
d. Calculate the interquartile range and the range.
e. Describe the shape of the distribution.
(Problems, Page 50)

Summary of Probability Formulas

Equally Likely Outcomes: P(A) = n(A)/n
Complement: P(A) = 1 − P(not A); P(not A) = 1 − P(A)
General Addition Rule: P(A or B) = P(A) + P(B) − P(A and B)
If A and B are disjoint, P(A and B) = 0. Then the Special Addition Rule: P(A or B) = P(A) + P(B)
General Multiplication Rule: P(A and B) = P(A) × P(B|A)
If A and B are independent, P(B|A) = P(B). Then the Special Multiplication Rule: P(A and B) = P(A) × P(B)
Odds: If the odds for A are a:b, then the odds against A are b:a. The probability of A is a/(a+b). The probability of not A is b/(a+b).
(Chapter 4)

Problems
(Problems, Page 95)

Problem
(Problems, Page 95)

Problems
(Problems, Page 97)

Problems
(Problems, Page 99)

Z Score Problems
(Problems, Page 52)

Problems
(Problems, Page 132)

Problems

6.51 IQ scores are normally distributed with a mean of 100 and a standard deviation of 16. Find the following:
a. The 66th percentile.
b. The 80th percentile.
c. The minimum score required to be in the top 10%.
d. The minimum score to be in the top 25%.
6.52 Find the two z-scores that bound the middle 30% of the standard normal distribution.
(Problems, Page 133)

Problems
(Problems, Page 149)

Problems
(Problems, Page 151)

Problems
(Problems, Page 50)

Problems
(Problems, Page 179)

Problems

Test the claim that the BMI of the cardiovascular technologists is different from the BMI of the general population. Use α = .05. Assume the population of the BMI of the cardiovascular technologists is normal.
a. State the necessary hypotheses.
b. Is the sampling distribution normal? Why?
c. Find the p-value.
d. State your conclusion.
e. If you made an error, what type of error did you make?
(Problems, Page 181)

Problems
(Problems, Page 179)

Problems

a. Find the 98% confidence interval.
b. Find the critical value.
c. Find the margin of error.
d. Find the standard error.
e. What assumption must we make about the population to have a t-sampling distribution?
f. What are the proper words to describe the confidence interval?
g. If you wanted to have a margin of error of one minute and the 98% confidence interval for this data, how large must the sample be?
(Problems, Page 205)

Problems

a. Find the p-value.
b. State your conclusion.
c. What is the name of the probability model used for the sampling distribution?
d. What is the mean of the sampling distribution?
e. What is the value of the standard error?
f. If your conclusion is in error, what type of error is it?
(Problems, Page 205)

Problems
(Problems, Page 208)

Problems

a. Check the conditions for a normal sampling distribution.
b. State the hypotheses.
c. Find the p-value.
d. State your conclusion.
e. If you make an error in your conclusion, what type is it?
f. Find the mean of the sampling distribution.
g. Find the standard error of the sampling distribution.
(Problems, Page 207)

Dependent and Independent Samples
(Section 10.1, Page 208)

Problems

a. Test the hypothesis that the people increased their knowledge. Use α = .05 and assume normality.
State the appropriate hypotheses.
b. Find the p-value and state your conclusion.
c. Find the 90% confidence interval for the mean estimate of the increase in test scores.
(Problems, Page 231)

Problems

a. State the hypotheses (assume normality).
b. Find the p-value and state your conclusion.
c. Find the 95% confidence interval for the difference of the means, Gouda − Brie.
d. Find the mean and standard error of the sampling distribution.
(Problems, Page 232)

Problems

a. State the appropriate hypotheses.
b. Find the p-value and state your conclusion.
c. What model is used for the sampling distribution, and what are the mean of the sampling distribution and its standard error?
d. Find the 98% confidence interval for the difference in proportions, men − women.
(Problems, Page 234)

Summary of Chi-Square Applications

Goodness of Fit Test
Given one categorical variable with a fixed set of proportions for the categories.
Ha: The observed data do not fit the proportions.
Calculate expected values (Ho true proportion × total observations).
Observed and expected data in List Editor; PRGM: GOODFIT

Test for Independence
Given two categorical variables measured on the same population.
Ha: The variables are not independent (they are related).
Observed data in Matrix Editor; Stat-Tests-χ² Test

Test for Homogeneity
Given one categorical variable and two or more populations.
Ha: The proportions for the categories are not the same for all populations.
Observed data in Matrix Editor; Stat-Tests-χ² Test
(Chapter 12, Summary)

Chi-Square Distribution
Fair Die Example

Now we need a sampling distribution for the χ² statistic = 2.2, so we can calculate the probability of getting χ² ≥ 2.2 when the true proportions are all equal to 1/6.
χ² Distribution for 5 df: This is a distribution of all possible χ² statistics calculated from all possible samples of 60 observations when there are 6 proportions or cells. Note that the degrees of freedom equal the number of proportions − 1.
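For readers without the calculator program, the upper-tail probability for the fair die example (χ² = 2.2 with 5 df) can also be computed with the Python standard library. The sketch below is an illustration, not the method used in the slides: the function name `chi2_upper_tail` is an assumption, and it relies on the standard recurrence for the regularized upper incomplete gamma function, which covers integer degrees of freedom only.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom >= x), integer df >= 1, x >= 0."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)               # Q(1, z) = e^(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))    # Q(1/2, z) = erfc(sqrt(z))
    # Recurrence: Q(a + 1, z) = Q(a, z) + z^a * e^(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Fair die example: chi-square statistic 2.2, 6 cells, so df = 6 - 1 = 5.
p_value = chi2_upper_tail(2.2, 5)
print(f"p-value = {p_value:.4f}")
```

The result can be compared with the calculator output for the same lower bound and degrees of freedom.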
Finding the p-value on the TI-83, given the χ² statistic and df:
PRGM – CHI2DIST
LOWER BOUND: 2.2
UPPER BOUND: 2ND E99
df: 5
Output: P-VALUE = 0.8208
The null hypothesis cannot be rejected.
(Section 11.2, Page 240)

Problems

a. Perform a hypothesis test to see if the preferences are not all the same. State the hypotheses.
b. Find the p-value and state your conclusion.
c. What is the name of the model used for the sampling distribution?
(Problems, Page 252)

Problems

a. Test the hypothesis that the size of community reared in is independent of the size of community residing in. State the appropriate hypotheses.
b. Find the p-value and state your conclusion.
c. What is the name of the sampling distribution?
d. What are the necessary conditions, and are they satisfied? What is the value of the smallest expected cell?
(Section 11.3, Page 254)

The F-Distribution

4. Each sample must be from a normal distribution.
(Sec 10.5, Page 226)

Problem

Set up the problem so that the F-statistic > 1.
a. State the necessary hypotheses.
b. Find the p-value and state your conclusion.
c. What is the name of the model used for the sampling distribution?
(Problems, Page 234)

Problems

a. State the necessary hypotheses.
b. Sketch the side-by-side box plots. Does it appear that the means are all the same?
c. Find the p-value and state your conclusion.
d. What is the name of the model used for the sampling distribution?
(Sec 12.1, Page 268)

Problems

City           Sample Size   Sample Mean   Sample St. Dev.
Atlanta        6             24.67         7.76
Boston         7             33.00         9.56
Dallas         7             30.86         7.58
Philadelphia   5             32.20         7.47
Seattle        5             27.40         9.40
St. Louis      6             25.83         10.03

a. Test the hypothesis that the mean commute times are not all the same. State the appropriate hypotheses.
b. Find the p-value and state your conclusion.
c. What is the name of the sampling distribution?
d. What is the F-statistic, the df numerator, and the df denominator?
(Problems, Page 268)
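The F-statistic and its degrees of freedom in part d can be obtained from the summary statistics alone. The following is a minimal sketch under the usual one-way ANOVA formulas (between-group and within-group mean squares); the function name `anova_from_summary` and the tuple layout are assumptions for illustration. It stops short of the p-value, since the Python standard library has no F-distribution.

```python
def anova_from_summary(groups):
    """groups: list of (n, mean, sd) per sample. Returns (F, df_num, df_den)."""
    k = len(groups)
    total_n = sum(n for n, _, _ in groups)
    # Grand mean is the size-weighted average of the sample means.
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_num, df_den = k - 1, total_n - k
    f_stat = (ss_between / df_num) / (ss_within / df_den)
    return f_stat, df_num, df_den


# (sample size, sample mean, sample st. dev.) for the six cities in the table.
commute = [
    (6, 24.67, 7.76),   # Atlanta
    (7, 33.00, 9.56),   # Boston
    (7, 30.86, 7.58),   # Dallas
    (5, 32.20, 7.47),   # Philadelphia
    (5, 27.40, 9.40),   # Seattle
    (6, 25.83, 10.03),  # St. Louis
]

f_stat, df_num, df_den = anova_from_summary(commute)
print(f"F = {f_stat:.3f}, df numerator = {df_num}, df denominator = {df_den}")
```

The p-value is the area to the right of this F value under the F-distribution with the same two degrees of freedom, which a calculator or statistics package can supply.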