An Introduction to Statistical Inference
Glossary

µd: Population or long-run parameter for the average of the differences (p. 7-5)
2×2 table: A two-way table where the explanatory and response variables each have two categories (p. 5-6)
2SD Method: Approximating a confidence interval by taking the statistic and the standard deviation of the statistic (from simulation or formula) and extending two standard deviations in each direction from the statistic. (pp. 2-15, 2-19)
3S Strategy: A framework for evaluating the strength of evidence against the chance model (null hypothesis). The 3 S’s are: Statistic, Simulate, and Strength of Evidence (pp. 1-8, 1-16)
alternative hypothesis: The "not by chance" or "there is an effect" explanation; it is our research conjecture. (pp. 1-18, 1-26)
ANOVA test: Analysis of variance test; an overall test of multiple means that compares the variation between groups to the variation within groups (p. 9-21)
association: Two variables are associated or related if the distribution of the response variable differs across the values of the explanatory variable (pp. 4-3, 4-5)
bar graph: A graphical display of the distribution of a categorical variable (p. 1-10)
biased: A sampling method is biased if the results from different samples consistently overestimate or consistently underestimate the population parameter of interest. (pp. 3-1, 3-4, 3-13)
binary variable: Categorical variable with only two outcomes (pp. 1-17, 1-18)
boxplots: A graphical display of the five-number summary (pp. 9-4, 9-10)
causation: Inferring that the explanatory variable is causing the effect seen in the response variable (p. 6-27)
cause-and-effect: In well-designed studies (randomized experiments), we can conclude that the explanatory variable is causing the effect seen in the response variable (p. 4-3)
cell contributions: Contribution of a cell in a two-way table to the chi-square statistic. Helpful in determining where the observed data differ most from what would be expected if the null hypothesis were true (p. 8-22)
cells: Entries in two-way tables (p. 5-3)
census: When data are gathered on all individuals in the population (p. 3-2)
chance models: A real or computerized process to generate data according to a well-understood set of conditions (pp. 1-5, 1-16)
Chi-square distribution: A non-negative, right-skewed distribution used in the theory-based test for an association between two categorical variables (p. 8-21)
Chi-square statistic: Theory-based test statistic used to evaluate the strength of evidence for an association between two categorical variables with multiple categories (p. 8-20)
coefficient of determination: Denoted by r² or R². It is interpreted as the percentage of the variability in the response variable that is explained by the least-squares regression on the explanatory variable. The coefficient of determination is equal to the square of the correlation coefficient. (pp. 10-23, 10-30)
conditional proportion: Proportion of the response variable for a given category of the explanatory variable (pp. 5-4, 5-7)
confidence intervals: An inference tool used to estimate the value of the parameter, with an associated measure of uncertainty due to the randomness in the sample data (p. 2-2)
confidence level: A statement of reliability in the confidence interval method (pp. 2-7, 2-11, 2-33)
confounding variable: A variable that is related to both the explanatory and response variables in such a way that its effects on the response variable cannot be separated from those of the explanatory variable. (pp. 4-4, 4-6)
convenience samples: A non-random sample of a population (p. 3-4)
correlation coefficient: Statistic that measures the direction and strength of a linear relationship between two quantitative variables (pp. 10-4, 10-9)
data table: Format for storing data values (p. 3-2)
distribution: The characteristics of a variable’s behavior (p. 6-2)
double-blind study: Both the researchers and subjects are blind to which treatment each subject receives (p. 6-19)
estimation: Using the sample statistic to create a confidence interval to estimate the parameter of interest (p. 6-26)
expected counts: Number of observational units you would expect to observe in each cell of the two-way table if the null hypothesis of no association were true (p. 8-30)
experiment: A study in which researchers actively assign subjects to treatment groups (p. 4-10)
experimental units: What observational units are called in an experiment (pp. 4-10, 4-15)
explanatory variable: The variable that, if the alternative hypothesis is true, is explaining changes in the response variable; sometimes known as the independent or predictor variable (pp. 4-3, 4-6)
extrapolation: Predicting values for the response variable for given values of the explanatory variable that are outside of the range of the original data (pp. 10-24, 10-29)
F-distribution: Theory-based approximation for the simulated null distribution of the F-statistic; it is non-negative and skewed right (pp. 9-20, 9-26)
five-number summary: Minimum, lower quartile, median, upper quartile, maximum (p. 6-3)
follow-up analysis: A second step in the analysis process that follows a significant ANOVA test. A follow-up test tells where significant differences between pairs of groups are found. This is usually presented as confidence intervals for the difference in each pair of means (pp. 9-23, 9-27)
F-statistic: Ratio of the variation between the groups to the variation within the groups (pp. 9-19, 9-25, 9-29)
generalize, generalization: Extension of conclusions from a sample to a population; this is only valid when the sample is representative of the population. (pp. 3-1, 3-7, 6-26)
H0: Denotes the null hypothesis (p. 1-33)
Ha: Denotes the alternative hypothesis (p. 1-33)
histogram: A graph used with quantitative variables (pp. 3-3, 3-35)
independent groups design: Each individual in a group is unrelated to all the other individuals in the study. Each individual provides one response value. (pp. 4-22, 4-25)
independent samples: The data recorded on one sample are unrelated to those recorded on the other sample. In other words, if the data from the samples can be rearranged without affecting the outcome, then the samples are independent. (pp. 7-4, 7-10)
influential observations: An observation is considered influential if removing it from the data set dramatically changes the correlation coefficient or regression line. Influential observations often have extreme x values. (pp. 10-5, 10-25, 10-28)
inter-quartile range (IQR): The difference between the upper quartile and the lower quartile (p. 6-3)
lower quartile: The value for which 25% of the data lie below (p. 6-3)
MAD: A statistic for testing an association between two categorical variables with more than two categories. (M)ean of the (A)bsolute values of the (D)ifferences in the conditional proportions (pp. 8-6, 8-12)
margin-of-error: The half-width of a confidence interval (pp. 2-15, 2-19)
matched-pairs design: Each subject receives both treatments, and the order in which each subject receives them is randomized (p. 7-3)
mean squares error: Denominator of the F-statistic. Measures the within-group variation. It is similar to averaging the standard deviations across the groups being compared. (p. 9-21)
mean squares for treatment: Numerator of the F-statistic. Measures the variation between the groups. (p. 9-21)
model: A mathematical or probabilistic conceptualization meant to closely match reality, but always making assumptions about the reality which may or may not be true (p. 1-5)
n: A symbol used to indicate the sample size (p. 1-19)
no association: General statement of the null hypothesis when two or more variables are involved. (p. 5-11)
non-sampling errors: Reasons why the statistic may not be close to the parameter that are separate from how the sample was selected from the population (p. 3-44)
null distribution: Distribution of simulated statistics that represent what could have happened in the study assuming the null hypothesis was true (pp. 1-20, 1-27)
null hypothesis: The "by chance alone" or "no effect" explanation; a hypothesis that can be modeled by simulation. (pp. 1-18, 1-26)
observational study: A study in which researchers observe individuals and measure variables of interest, but do not intervene in order to attempt to influence responses. (pp. 4-10, 4-15)
one-proportion z-test: Name for the theory-based approach with one proportion (p. 1-62)
outliers: Observations with a large residual; not necessarily influential (p. 10-5)
paired data: Data collected on paired samples consist of two sets of observations on the response variable that are recorded on the same set of observational units (p. 7-4)
paired design: Study design that allows for the comparison of two groups on a response variable by comparing two measurements on each observational unit instead of on completely separate groups of individuals. This serves to reduce variability in the response variable. (pp. 4-22, 4-25)
parameter: A number calculated from the underlying process or population from which the sample was selected (pp. 3-2, 3-12)
p-hat: The proportion or percentage of observational units that have a particular characteristic based on a measured variable. A statistic. (p. 1-19)
plausible value: A parameter value tested under the null hypothesis where, based on the data gathered, we do not find strong evidence against the null (p. 2-5)
plausible: A term used to indicate that the chance model is a reasonable/believable explanation for the data we observed (p. 1-9)
population: The entire set of observational units we want to know about (p. 3-1)
predictor: Another word for explanatory variable, often used in correlation/regression settings (p. 10-2)
process: A situation which we think of as a random selection from an underlying set of possible outcomes (p. 3-18)
p-value: The proportion of statistics in the null distribution that are at least as extreme as the value of the statistic actually observed in the study. (pp. 1-21, 1-28)
quantitative variable: Measures on an observational unit for which arithmetic operations (e.g., adding, subtracting) make sense (p. 3-2)
quasi-experiments: Experiments that manipulate the explanatory variable, but not randomly (p. 4-14)
r: Symbol for the correlation coefficient; values range from -1 to 1 and are unit-less. Values close to -1 and 1 denote a strong linear relationship, while values close to 0 denote a weak or no linear relationship (p. 10-4)
random digit dialing: A common sampling technique when a sampling frame is unavailable. It involves a computer randomly dialing phone numbers within a certain area code by randomly selecting the digits to be dialed after the area code (p. 3-42)
random sampling: Using a probability device to select observational units from a population or process (pp. 3-7, 3-47)
randomized experiment: An experiment where experimental units are randomly assigned to two or more treatment conditions and the explanatory variable is actively imposed on the subjects. (pp. 4-10, 4-15)
relative risk: The ratio of conditional proportions (pp. 5-5, 5-8)
representative: Describes a sample with statistics similar to the parameters in the entire population. Simple random samples are representative; convenience samples may not be representative (pp. 3-1, 3-4)
residuals: The vertical distances between a point and the least squares regression line (pp. 10-24, 10-26)
resistant: A statistic is resistant if its value does not change considerably when extreme observations are removed from a data set (pp. 3-25, 3-34)
response rate: Of those selected to be in the sample, the percentage that respond. (p. 3-43)
response variable: The variable that, if the alternative hypothesis is true, is impacted by the explanatory variable; sometimes known as the dependent variable. (p. 4-3)
sample: The subgroup of the population on which we record data (pp. 1-4, 3-1, 3-2)
sample size: The number of observational units in the sample (p. 1-4)
sampling frame: A list of all of the members of the population of interest (pp. 3-4, 3-14)
sampling variability: The amount that a value changes as it is observed repeatedly (p. 3-6)
segmented bar graphs: Graphical display of conditional proportions from a two-way table (p. 5-4)
significance: Whether the sample results are unlikely to have arisen by chance alone (p. 6-26)
significance level: A value used as a criterion for deciding how small a p-value needs to be to provide convincing evidence against the null hypothesis (pp. 2-6, 2-10)
simple random sample: When you randomly choose individuals from the sampling frame, so that each individual has the same probability of being selected into the sample (pp. 3-4, 3-7, 3-14)
skewed distribution: The bulk of observations tend to fall on one side of the distribution (p. 3-24)
slope: Change in predicted response variable divided by change in explanatory variable (pp. 10-22, 10-28)
SSE: Sum of squared errors. It is the sum of all the squared residuals. (p. 10-27)
standard deviation of p-hat: The standard deviation of the distribution of sample proportions can be shown mathematically to follow the formula $\sqrt{\pi(1 - \pi)/n}$. (p. 1-55)
standard error: Estimate for the standard deviation of the null distribution of the statistic (pp. 2-21, 2-26, 5-37)
standardize: To standardize an observation, compute the distance of the observation from the mean and divide by the standard deviation of the distribution. (pp. 1-35, 1-39)
statistic: A number calculated from the observed data which summarizes information about the variable or variables of interest (pp. 1-4, 3-2)
statistical significance: Results unlikely to have arisen by chance alone (p. 1-2)
statistically significant: Unlikely to occur just by random chance (p. 1-3)
strength of evidence: How much evidence we have against the null hypothesis (p. 1-2)
subjects: Study participants that are human (p. 1-25)
test of significance: A procedure for measuring the strength of evidence against a null hypothesis about the parameter of interest (p. 1-17)
transform: Express data on a different scale, such as logarithmic; often used to meet validity conditions (p. 10-46)
t-standardized statistic: A standardized statistic (used with means) that follows a theoretical t distribution when the null hypothesis is true. (p. 6-35)
two by two (2 × 2): A two-way table where the explanatory and response variables each have two categories (p. 5-4)
two-sided test: Estimates the p-value by considering results that are at least as extreme as our observed result in either direction (pp. 1-46, 1-52)
two-way table: A tabular summary of two categorical variables, also called a contingency table (p. 5-3)
Type I Error: Rejecting the null hypothesis when it is actually true (false alarm) (pp. 2-36, 2-39)
Type II Error: Failing to reject a null hypothesis that is actually false (missed opportunity) (pp. 2-36, 2-39)
unbiased: A sampling method that, on average across many random samples, produces statistics whose average is the value of the population parameter (pp. 3-6, 3-7)
upper quartile: The value for which 25% of the data lie above (p. 6-3)
validity condition: Check to see that certain conditions are met that render the theory-based approach valid. Often these conditions deal with sample size and the shape and variability of distributions. (p. 1-55)
validity condition for one-sample z-procedures: At least 10 successes and at least 10 failures (p. 2-16)
validity conditions for chi-square test: Each cell of the two-way table must have at least 10 observations (pp. 8-22, 8-26)
validity conditions for one-sample t-test: A sample size of at least 20, or the distribution of the quantitative variable is not highly skewed (pp. 3-31, 3-39)
validity conditions for paired t-test: Sample size of pairs is at least 20 OR the differences follow a normal distribution (p. 7-27)
validity conditions for two-sample z-procedures: At least 10 observations in each of the cells of the 2×2 table (p. 5-35)
xbar: The sample average of a quantitative variable (p. 3-5)
xbard: Observed sample average of the differences (p. 7-5)
z-statistic: Synonymous with the standardized sample proportion; also called the standardized statistic. (p. 1-55)
π (pi): Greek symbol used for the unknown underlying process probability or true population proportion. A parameter. (p. 1-19)
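
Several of the entries above (null distribution, p-value, standard deviation of p-hat, standardize, 2SD Method) describe calculations, so a small worked illustration may help. The Python sketch below is only an illustration and is not taken from the text: the data (60 successes in 100 trials), the null value of 0.5, the seed, the number of repetitions, and all function names are assumptions chosen for the example. The 2SD interval here uses the formula-based standard deviation evaluated at p-hat, which is one of the two options the 2SD Method entry allows.

```python
import math
import random


def sd_of_p_hat(pi, n):
    """Theory-based standard deviation of a sample proportion: sqrt(pi(1 - pi)/n)."""
    return math.sqrt(pi * (1 - pi) / n)


def simulate_null_distribution(pi_null, n, reps, rng):
    """Simulate `reps` sample proportions under the null (chance) model."""
    return [sum(rng.random() < pi_null for _ in range(n)) / n for _ in range(reps)]


def one_sided_p_value(null_stats, observed):
    """Proportion of simulated statistics at least as large as the observed one."""
    return sum(s >= observed for s in null_stats) / len(null_stats)


def two_sd_interval(statistic, sd):
    """2SD Method: extend two standard deviations either side of the statistic."""
    return statistic - 2 * sd, statistic + 2 * sd


# Hypothetical study (assumed values): 60 successes in n = 100 trials, null pi = 0.5.
n, successes, pi_null = 100, 60, 0.5
p_hat = successes / n

rng = random.Random(1)  # fixed seed so the sketch is repeatable
null_stats = simulate_null_distribution(pi_null, n, reps=2000, rng=rng)

p_value = one_sided_p_value(null_stats, p_hat)
z = (p_hat - pi_null) / sd_of_p_hat(pi_null, n)  # standardized statistic
interval = two_sd_interval(p_hat, sd_of_p_hat(p_hat, n))

print(f"p-hat = {p_hat:.2f}, simulated p-value = {p_value:.3f}, z = {z:.2f}")
print(f"2SD interval: ({interval[0]:.3f}, {interval[1]:.3f})")
```

The simulated p-value is one-sided (counting results at least as large as the observed p-hat); a two-sided test would also count results equally extreme in the other direction.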