A BRIEF SUMMARY
A population contains all the items of interest, whereas a sample contains only a portion of the items in the population. A statistic is a summary measure describing a sample, whereas a parameter is a summary measure describing an entire population. Descriptive statistical methods deal with the collection, presentation, summarization, and analysis of data, whereas inferential statistical methods deal with decisions arising from the projection of sample information to the characteristics of a population.
Categorical random variables yield categorical responses, such as yes or no answers. Numerical random variables yield numerical responses, such as your height in inches. Discrete random variables produce numerical responses that arise from a counting process. Continuous random variables produce numerical responses that arise from a measuring process. An operational definition is a universally accepted meaning that is clear to all associated with an analysis; without one, confusion can occur.
A bar chart is useful for comparing categories. A pie chart is useful when examining the portion of the whole that is in each category. The bar chart for categorical data is plotted with the categories on the vertical axis and the frequencies or percentages on the horizontal axis, with a separation between categories. The histogram is plotted with the class groupings on the horizontal axis and the frequencies or percentages on the vertical axis, with no gaps between classes, which makes it easier to determine the distribution of the data. In a Pareto chart, the categories are arranged according to frequency or importance, which allows the user to focus attention on the categories that have the greatest frequency or importance.
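The idea of arranging categories by frequency so that the largest ones stand out can be sketched in a few lines of Python. This is only an illustration: the function name `pareto_table` and the response data are hypothetical and do not come from the notes above.

```python
from collections import Counter


def pareto_table(responses):
    """Return (category, count, percent, cumulative percent) rows, most frequent first."""
    counts = Counter(responses)
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    # most_common() yields categories in descending order of frequency.
    for category, count in counts.most_common():
        percent = 100.0 * count / total
        cumulative += percent
        rows.append((category, count, percent, cumulative))
    return rows


# Hypothetical categorical responses, made up for illustration.
responses = ["late"] * 5 + ["damaged"] * 3 + ["wrong item"] * 2
for row in pareto_table(responses):
    print(row)
```

The cumulative percentage column shows how much of the total is accounted for by the most frequent categories, which is the comparison an ordered chart is meant to support.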
Percentage breakdowns according to the total percentage, the row percentage, and/or the column percentage allow the data in a two-way contingency table to be interpreted from several different perspectives.
The first quartile is the value below which ¼ of the ranked observations fall. The median is the value that divides the ranked observations into two equal halves. The third quartile is the value above which ¼ of the ranked observations fall.
Variation is the amount of dispersion, or "spread," in the data. The Z score measures how many sample standard deviations an observation in a data set is away from the sample mean. The range is a simple measure, but it only measures the difference between the extremes. The interquartile range measures the range of the middle fifty percent of the data. The standard deviation measures variation around the mean, while the variance measures the squared variation around the mean; unlike the range and the interquartile range, both take every observation into account. The coefficient of variation measures the variation around the mean relative to the mean. The range, standard deviation, variance, and coefficient of variation are all sensitive to outliers, while the interquartile range is not.
The empirical rule relates the mean and standard deviation to the percentage of values that will fall within a certain number of standard deviations of the mean. The Chebyshev rule applies to any type of distribution, while the empirical rule applies only to data sets that are approximately bell-shaped. For such data, the empirical rule approximates the concentration of data around the mean more accurately than the Chebyshev rule does. Shape is the manner in which the data are distributed. The shape of a data set can be symmetrical or asymmetrical (skewed).
Census: A survey to collect data on the entire population.
Data: The facts and figures collected, analyzed, and summarized for presentation and interpretation.
Data set: All the data collected in a particular study.
Descriptive statistics: Tabular, graphical, and numerical summaries of data.
Elements: The entities on which data are collected.
Observation: The set of measurements obtained for a particular element.
Population: The set of all elements of interest in a particular study.
Qualitative data: Labels or names used to identify an attribute of each element. Qualitative data use either the nominal or ordinal scale of measurement and may be nonnumeric or numeric.
Qualitative variable: A variable with qualitative data.
Quantitative data: Numeric values that indicate how much or how many of something. Quantitative data are obtained using either the interval or ratio scale of measurement.
Quantitative variable: A variable with quantitative data.
Sample: A subset of the population.
Sample survey: A survey to collect data on a sample.
Statistical inference: The process of using data obtained from a sample to make estimates or test hypotheses about the characteristics of a population.
Statistics: The art and science of collecting, analyzing, presenting, and interpreting data.
Variable: A characteristic of interest for the elements.
Bar graph: A graphical device for depicting qualitative data that have been summarized in a frequency, relative frequency, or percent frequency distribution.
Class midpoint: The value halfway between the lower and upper class limits.
Cumulative frequency distribution: A tabular summary of quantitative data showing the number of data values that are less than or equal to the upper class limit of each class.
Cumulative percent frequency distribution: A tabular summary of quantitative data showing the percentage of data values that are less than or equal to the upper class limit of each class.
Cumulative relative frequency distribution: A tabular summary of quantitative data showing the fraction or proportion of data values that are less than or equal to the upper class limit of each class.
Dot plot: A graphical device that summarizes data by the number of dots above each data value on the horizontal axis.
Frequency distribution: A tabular summary of data showing the number (frequency) of data values in each of several non-overlapping classes.
Histogram: A graphical presentation of a frequency distribution, relative frequency distribution, or percent frequency distribution of quantitative data constructed by placing the class intervals on the horizontal axis and the frequencies, relative frequencies, or percent frequencies on the vertical axis.
Percent frequency distribution: A tabular summary of data showing the percentage of data values in each of several non-overlapping classes.
Pie chart: A graphical device for presenting data summaries based on subdivision of a circle into sectors that correspond to the relative frequency for each class.
Qualitative data: Labels or names used to identify categories of like items.
Quantitative data: Numerical values that indicate how much or how many.
Relative frequency distribution: A tabular summary of data showing the fraction or proportion of data values in each of several non-overlapping classes.
Stem-and-leaf display: An exploratory data analysis technique that simultaneously rank orders quantitative data and provides insight about the shape of the distribution.
Chebyshev's theorem: A theorem that can be used to make statements about the proportion of data values that must be within a specified number of standard deviations of the mean.
Coefficient of variation: A measure of relative variability computed by dividing the standard deviation by the mean and multiplying by 100.
Empirical rule: A rule that can be used to compute the percentage of data values that must be within one, two, and three standard deviations of the mean for data that exhibit a bell-shaped distribution.
Grouped data: Data available in class intervals as summarized by a frequency distribution. Individual values of the original data are not available.
Interquartile range (IQR): A measure of variability, defined to be the difference between the third and first quartiles.
Mean: A measure of central location computed by summing the data values and dividing by the number of observations.
Median: A measure of central location provided by the value in the middle when the data are arranged in ascending order.
Mode: A measure of location, defined as the value that occurs with greatest frequency.
Outlier: An unusually small or unusually large data value.
Percentile: A value such that at least p percent of the observations are less than or equal to this value and at least (100 - p) percent of the observations are greater than or equal to this value. The 50th percentile is the median.
Point estimator: The sample statistic, such as x̄, s², and s, when used to estimate the corresponding population parameter.
Population parameter: A numerical value used as a summary measure for a population (e.g., the population mean, μ, the population variance, σ², and the population standard deviation, σ).
Quartiles: The 25th, 50th, and 75th percentiles, referred to as the first quartile, the second quartile (median), and third quartile, respectively. The quartiles can be used to divide a data set into four parts, with each part containing approximately 25% of the data.
Range: A measure of variability, defined to be the largest value minus the smallest value.
Sample statistic: A numerical value used as a summary measure for a sample (e.g., the sample mean, x̄, the sample variance, s², and the sample standard deviation, s).
Standard deviation: A measure of variability computed by taking the positive square root of the variance.
Variance: A measure of variability based on the squared deviations of the data values about the mean.
Weighted mean: The mean obtained by assigning each observation a weight that reflects its importance.
z-score: A value computed by dividing the deviation about the mean (xᵢ - x̄) by the standard deviation s.
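Several of the numerical measures defined above (mean, variance, standard deviation, coefficient of variation, range, interquartile range, and z-scores) can be computed together in a short Python sketch. The function name `describe` and the sample values are hypothetical. The sketch assumes sample (n - 1) formulas and relies on the default quartile convention of Python's `statistics.quantiles`; other textbooks and packages use slightly different quartile rules, so quartile values may differ.

```python
import statistics


def describe(sample):
    """Compute common sample measures of location and variability."""
    mean = statistics.mean(sample)
    variance = statistics.variance(sample)   # sample variance, n - 1 divisor
    std_dev = statistics.stdev(sample)       # positive square root of the variance
    # Quartile cut points; the interpolation rule is an assumption (library default).
    q1, q2, q3 = statistics.quantiles(sample, n=4)
    return {
        "mean": mean,
        "median": statistics.median(sample),
        "variance": variance,
        "std_dev": std_dev,
        "cv_percent": 100.0 * std_dev / mean,  # standard deviation relative to the mean
        "range": max(sample) - min(sample),
        "iqr": q3 - q1,
        # z-score: deviation about the mean divided by the standard deviation
        "z_scores": [(x - mean) / std_dev for x in sample],
    }


# Hypothetical sample values, made up for illustration.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
summary = describe(sample)
for name, value in summary.items():
    print(name, value)
```

Because each z-score is a deviation about the mean divided by the same constant, the z-scores of a sample always sum to zero, which is a quick sanity check on the computation.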
A z-score is also referred to as a standardized value and denotes the number of standard deviations xᵢ is from the mean.
Complement of A: The event consisting of all sample points that are not in A.
Conditional probability: The probability of an event given that another event already occurred. The conditional probability of A given B is P(A | B) = P(A ∩ B)/P(B).
Event: A collection of sample points.
Experiment: A process that generates well-defined outcomes.
Independent events: Two events A and B where P(A | B) = P(A) or P(B | A) = P(B); that is, the events have no influence on each other.
Intersection of A and B: The event containing the sample points belonging to both A and B.
Joint probability: The probability of two events both occurring; that is, the probability of the intersection of two events.
Marginal probability: The values in the margins of a joint probability table that provide the probabilities of each event separately.
Mutually exclusive events: Events that have no sample points in common; that is, A ∩ B is empty and P(A ∩ B) = 0.
Probability: A numerical measure of the likelihood that an event will occur.
Sample point: An element of the sample space. A sample point represents an experimental outcome.
Sample space: The set of all experimental outcomes.
Tree diagram: A graphical representation that helps in visualizing a multiple-step experiment.
Union of A and B: The event containing all sample points belonging to A or B or both.
Venn diagram: A graphical representation for showing symbolically the sample space and operations involving events in which the sample space is represented by a rectangle and events are represented as circles within the sample space.
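The definitions of joint, marginal, and conditional probability, and the test for independence, can be illustrated with a small joint probability table. The sketch below is a minimal example under stated assumptions: the table values are made up, and the helper names `marginal` and `conditional` are hypothetical. Exact fractions are used so that the equality check for independence is not affected by floating-point rounding.

```python
from fractions import Fraction

# Hypothetical joint probability table for events A and B (values are made up).
# Each key is (status of A, status of B); the four probabilities sum to 1.
joint = {
    ("A", "B"): Fraction(1, 5),
    ("A", "not B"): Fraction(1, 5),
    ("not A", "B"): Fraction(3, 10),
    ("not A", "not B"): Fraction(3, 10),
}


def marginal(table, index, label):
    """Marginal probability: sum the joint probabilities that match one label."""
    return sum(p for key, p in table.items() if key[index] == label)


def conditional(table, a_label, b_label):
    """Conditional probability P(A | B) = P(A and B) / P(B)."""
    return table[(a_label, b_label)] / marginal(table, 1, b_label)


p_a = marginal(joint, 0, "A")                # P(A)
p_b = marginal(joint, 1, "B")                # P(B)
p_not_a = 1 - p_a                            # complement of A
p_a_given_b = conditional(joint, "A", "B")   # P(A | B)
independent = p_a_given_b == p_a             # independence: P(A | B) = P(A)

print(p_a, p_b, p_not_a, p_a_given_b, independent)
```

In this made-up table, P(A | B) equals P(A), so A and B are independent; changing any single cell (and rebalancing the others so the total stays 1) would generally break that equality.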