STA2023 Statistical Methods NOTES – Prof L. Blanchette
Textbook: Elementary Statistics: A Step by Step Approach, a Brief Version, 5th Edition, Allan G. Bluman, McGraw-Hill, 2010

CHAP 1: This chapter is an introduction to statistics. Read through carefully; focus on the broad concepts and pay special attention to the following:

Statistics: the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data.
Data: measurements, counts, outcomes, or observations.
Variable: a characteristic or attribute that can assume different values [such as temperature, height, etc.]
Random Variable: a variable whose value is determined by chance [such as the face value of a die.]
Descriptive Statistics: used to describe what was or is actually observed.
Inferential Statistics: used to generalize from samples to populations, to estimate, to predict, to perform hypothesis testing, or to determine relationships; uses probability.
Population: all subjects of the study. The population must be clearly defined.
Sample: a subset of the population; a group from the population. A sample gives useful, reliable information about the population if it is selected correctly and if it is large enough.
Probability: has to do with the chance or likelihood of an event (or a specific outcome) occurring; enables us to predict future occurrences.
Qualitative Variable: non-numeric; descriptive [such as color, gender, etc.]
Quantitative Variable: numerical [such as age, time, etc.]
Discrete Variable: assumes values that can be counted; there is a finite or countable number of possible outcomes. [Such as number of students, number of cars, etc.]
Continuous Variable: can assume infinitely many values between any two specific possible values; obtained by measuring. [Such as weight, volume, length, etc.]
Continuous data values must be rounded: for example, 7 inches long implies a length from 6.5 inches up to, but not including, 7.5 inches (7.5 would actually round up to 8 inches). The boundaries of 7 are written (for convenience) as 6.5-7.5 inches.

Levels of Measurement:
(1) Nominal: think naming. Nominal data classifies data into mutually exclusive (non-overlapping), exhaustive categories in which no order or ranking can be imposed; differences between data values and averages are not meaningful. [Examples: zip codes, gender, etc.]
(2) Ordinal: think ordering. Ordinal data classifies data into categories that can be ordered or ranked; differences between data values and averages are not meaningful. [Examples: Small, Medium, Large; A, B, C letter grades, etc.]
(3) Interval: think no natural zero. Interval data classifies data into categories that can be ordered or ranked, in which precise differences between data values and averages of values are meaningful, but in which there is no meaningful zero; that is, zero does not imply “none.” [Examples: temperature, the calendar year, IQ scores, etc.]
(4) Ratio: think: does “twice as much” make sense? Ratio data are like interval data except the zero is meaningful. Ratios are also meaningful. [Examples: height, weight, time, age, cost, etc.]

Collecting Data: some possible concerns involve time, cost, poorly worded questions, bias of the question or interviewer, order of the questions, emphasis, ethical considerations, moral considerations, effect on the subject (the object may need to be broken in order to get the data result), etc. Often the entire population cannot be accessed and a sample is used instead.
Bias: tending towards a particular result or outcome.
Unbiased Sample: an unbiased sample is representative of the population; each subject in the population has an equal chance of being selected.
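The rounding rule for continuous data stated earlier in these notes (a recorded 7 inches covers 6.5 up to, but not including, 7.5) can be sketched in Python. This is only an illustration, not part of the textbook: the function names boundaries and falls_within are hypothetical, and the measurement is passed as text so that the number of recorded decimal places is known.

```python
from decimal import Decimal
from typing import Tuple


def boundaries(recorded: str) -> Tuple[Decimal, Decimal]:
    """Return the (lower, upper) boundaries of a rounded measurement.

    `recorded` is text such as "7" or "7.2" so the precision is preserved.
    """
    value = Decimal(recorded)
    # One unit in the last recorded decimal place: 1 for "7", 0.1 for "7.2".
    unit = Decimal(1).scaleb(value.as_tuple().exponent)
    half = unit / 2
    return value - half, value + half


def falls_within(recorded: str, actual: str) -> bool:
    """True if `actual` lies within the boundaries of `recorded`.

    The lower boundary is included and the upper boundary is excluded,
    matching the "not including 7.5" convention in the notes.
    """
    lower, upper = boundaries(recorded)
    return lower <= Decimal(actual) < upper


print(boundaries("7"))            # boundaries of a length recorded as 7
print(falls_within("7", "6.5"))   # is 6.5 inside the boundaries of 7?
print(falls_within("7", "7.5"))   # is 7.5 inside the boundaries of 7?
```

The same half-unit idea applies to values recorded to more decimal places: a length recorded as 7.2 has boundaries half of one tenth below and above it.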
Four Basic Methods for Sampling:
(1) Random Sampling: uses chance methods or random numbers generated by a calculator or computer. With a printed table of random numbers, close your eyes and point to a number in the table, begin there, and use the next few values according to the desired sample size.
(2) Systematic Sampling: list all the subjects (if possible), randomly select a place to begin, and then choose every kth subject for the sample. [Example: use every 12th subject from the list.]
(3) Stratified Sampling: divide the population into groups (strata) according to some important characteristic (like age, gender, race, etc.) and then randomly select subjects from each group.
(4) Cluster Sampling: divide the population into groups (clusters) according to some criterion such as geographical or organizational factors; randomly select a number of these clusters and use all subjects from these clusters. [Example: specific elementary schools, specific districts, etc.]

Convenience Sampling: when subjects are selected for the sample based on convenience; this sample is most likely biased and does not represent the population. [Example: asking only your best friends or surveying only shoppers at one mall close to your home.]
Observational Study: a study in which you observe and record, summarize, analyze, and interpret (draw conclusions) based on what has occurred or is occurring, without any input or interference on your part. Advantages include: natural setting, may be more ethical, may be less expensive. Disadvantages include: cannot directly control the study, may take more time, and may involve data collected by others which is of unknown reliability.
Experimental Study: a study in which you manipulate one of the variables and see what happens. Advantages include: the researcher selects the subjects and directly manipulates the variable, the study is in a controlled environment like a lab or a test tube, and the subject may be an animal rather than a human.
Disadvantages include: the experiment may be costly, the manipulation may be unethical or immoral, and the setting is often not a natural one, so the results in “real life” may not duplicate the results found in this artificial setting.
Independent Variable: the variable being manipulated.
Dependent Variable: the resultant (outcome) variable; this is the variable being studied.
Treatment Group: this group receives the treatment (the variable that is manipulated).
Control Group: this group receives no treatment or receives a placebo (a fake treatment that they think is the real thing).
Hawthorne Effect: when the subjects change their behavior because they know they are being studied.
Confounding Variable: a different variable, not controlled or accounted for, which influences the outcome.
Suspect Samples: samples that are too small and/or incorrectly selected; they may be self-selected, may be made up of volunteers, may be from one particular group only, may be a convenience sample, or may have too few subjects to yield meaningful data.
Overgeneralization: be careful to clearly define the population, identify the sampling method used, and do not overgeneralize the results; the results may not apply to another region, another culture, another time period, etc.
Ambiguous Averages: when the mean, median, mode, or midrange is referred to simply as an “average” without saying which one is meant.
Misleading Graphs: graphs that imply relationships, proportions, or differences that are not correct.
Implied Connections: when two variables are implied to be related in a significant way by using words such as “may help,” “suggest,” “in some cases,” or “up to…”
Faulty Survey Questions: some problems include leading questions, the order of the questions, and the emphasis placed on certain aspects of a question.
Computers, Calculators, and Software Programs: technology helps with numerical computations, saves time, and allows us to process huge databases.
Remember to enter the data carefully, use appropriate commands or menu options, and interpret the results correctly. Do not merely record an answer derived from a calculator or computer without justifying the answer and showing proper steps towards the conclusions.
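As one illustration of using software with the ideas in these notes, the four basic sampling methods can be sketched in Python. This is a minimal sketch using only the standard random module; the function names, the roster of 60 student IDs, and the school names are hypothetical and do not come from the textbook.

```python
import random


def simple_random_sample(population, n, rng):
    """Random sampling: every subject has an equal chance of selection."""
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    """Systematic sampling: random start among the first k, then every kth."""
    start = rng.randrange(k)
    return population[start::k]


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Stratified sampling: group by a characteristic, sample within each group."""
    strata = {}
    for subject in population:
        strata.setdefault(stratum_of(subject), []).append(subject)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Cluster sampling: randomly choose whole clusters, keep all their subjects."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [subject for name in chosen for subject in clusters[name]]


rng = random.Random(2023)          # fixed seed so a run can be repeated
students = list(range(1, 61))      # hypothetical roster of 60 student IDs
schools = {                        # hypothetical clusters
    "School A": [1, 2, 3],
    "School B": [4, 5],
    "School C": [6, 7, 8, 9],
}

print(simple_random_sample(students, 5, rng))
print(systematic_sample(students, 12, rng))
print(stratified_sample(students, lambda s: "odd" if s % 2 else "even", 3, rng))
print(cluster_sample(schools, 2, rng))
```

The selected subjects depend on the seed, so the printed values are not listed here. As the notes advise, the output should be interpreted rather than merely recorded: for instance, check that the systematic sample really contains every 12th subject and that the cluster sample contains whole schools.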