A BRIEF SUMMARY

A population contains all the items of interest whereas a sample contains only
a portion of the items in the population.

A statistic is a summary measure describing a sample whereas a parameter is
a summary measure describing an entire population.

Descriptive statistical methods deal with the collection, presentation,
summarization, and analysis of data whereas inferential statistical methods
deal with decisions arising from the projection of sample information to the
characteristics of a population.

Categorical random variables yield categorical responses, such as yes or no
answers.

Numerical random variables yield numerical responses, such as your height in
inches.

Discrete random variables produce numerical responses that arise from a
counting process.

Continuous random variables produce numerical responses that arise from a
measuring process.

An operational definition is a universally accepted meaning that is clear to all
associated with an analysis. Without an operational definition, confusion can
occur.

A bar chart is useful for comparing categories.

A pie chart is useful when examining the portion of the whole that is in each
category.

The bar chart for categorical data is plotted with the categories on the vertical
axis and the frequencies or percentages on the horizontal axis. In addition,
there is a separation between categories.

The histogram is plotted with the class grouping on the horizontal axis and the
frequencies or percentages on the vertical axis. This allows one to more easily
determine the distribution of the data. In addition, there are no gaps between
classes in the histogram.

A Pareto chart arranges the categories in descending order of frequency or importance,
which allows the user to focus attention on the categories that have the greatest
frequency or importance.
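Here is a minimal Python sketch of the Pareto ordering, assuming made-up category counts. `pareto_table` is a hypothetical helper name, not a library function. It sorts categories by descending frequency. For each category it reports the percentage and the running cumulative percentage, which is what a Pareto chart typically plots as bars and a cumulative line.

```python
def pareto_table(counts):
    """Sort categories by descending frequency and attach cumulative percentages."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    cumulative = 0
    for category, count in ordered:
        cumulative += count
        # (category, frequency, percent of total, cumulative percent)
        rows.append((category, count, 100 * count / total, 100 * cumulative / total))
    return rows

# Hypothetical defect counts, for illustration only
defects = {"Scratch": 12, "Dent": 45, "Misaligned": 28, "Other": 15}
for category, count, pct, cum_pct in pareto_table(defects):
    print(f"{category:<11}{count:>4}{pct:>7.1f}%{cum_pct:>7.1f}%")
```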

Percentage breakdowns according to the total percentage, the row
percentage, and/or the column percentage allow the interpretation of data in a
two-way contingency table from several different perspectives.
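The three percentage breakdowns can be sketched in Python as follows. The two-way table and its labels are invented for illustration, and `percentage_breakdowns` is a hypothetical helper. Each cell count is divided by the grand total, by its row total, and by its column total.

```python
def percentage_breakdowns(table):
    """Total, row, and column percentages for every cell of a two-way table.

    `table` maps each row label to a dict of {column label: count}.
    """
    columns = list(next(iter(table.values())))
    grand_total = sum(sum(row.values()) for row in table.values())
    column_totals = {c: sum(row[c] for row in table.values()) for c in columns}
    breakdown = {}
    for row_label, row in table.items():
        row_total = sum(row.values())
        for c in columns:
            count = row[c]
            breakdown[(row_label, c)] = {
                "total": 100 * count / grand_total,
                "row": 100 * count / row_total,
                "column": 100 * count / column_totals[c],
            }
    return breakdown

# Hypothetical counts, for illustration only
table = {
    "Group A": {"Yes": 30, "No": 20},
    "Group B": {"Yes": 40, "No": 10},
}
for cell, pcts in percentage_breakdowns(table).items():
    print(cell, {k: round(v, 1) for k, v in pcts.items()})
```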


The first quartile is the value below which ¼ of the ranked observations fall.
The median is the value that divides the ranked observations into two equal halves.

The third quartile is the value above which ¼ of the ranked observations fall.
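As a sketch, Python's standard `statistics.quantiles` function can compute the three quartiles. Several quartile conventions exist. The `"exclusive"` method used here places the i-th quartile at position i(n + 1)/4 in the ranked data and interpolates between neighbouring values. Other software or textbooks may therefore report slightly different numbers. The data are made up.

```python
import statistics

# Hypothetical sample, for illustration only
data = [2, 4, 5, 7, 8, 10, 12, 15, 18]

# method="exclusive" (the default) locates the i-th quartile at position i*(n+1)/4
# in the ranked data, interpolating between neighbouring values when needed.
q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
print(q1, median, q3)  # 4.5 8.0 13.5
```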

Variation is the amount of dispersion, or “spread,” in the data.

The Z score measures how many sample standard deviations an observation in a data
set is away from the sample mean.
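Below is a short Python sketch of the Z score computation, using made-up data and a hypothetical `z_scores` helper. It standardizes each value with the sample mean and the sample standard deviation (n − 1 divisor), as `statistics.stdev` computes it.

```python
import statistics

def z_scores(data):
    """Standardize each value using the sample mean and sample standard deviation."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    return [(x - mean) / s for x in data]

# Hypothetical sample, for illustration only
data = [10, 12, 14, 16, 18]
print([round(z, 3) for z in z_scores(data)])  # [-1.265, -0.632, 0.0, 0.632, 1.265]
```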

The range is a simple measure, but it measures only the difference between the largest and smallest values.

The interquartile range measures the spread of the middle 50% of the data.

The standard deviation measures variation around the mean, while the variance measures
the squared variation around the mean. Unlike the range and the interquartile range, both
take every observation into account.
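The sample variance and standard deviation can be sketched directly from their definitions. This assumes the sample versions, which divide the sum of squared deviations by n − 1; the population versions divide by N instead. The helper names are hypothetical and the data are made up.

```python
import math

def sample_variance(data):
    """Sum of squared deviations about the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_std_dev(data):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(data))

# Hypothetical sample, for illustration only
data = [10, 12, 14, 16, 18]
print(sample_variance(data), sample_std_dev(data))  # 10.0 3.1622776601683795
```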

The coefficient of variation measures the variation around the mean relative to the mean.
The range, standard deviation, variance and coefficient of variation are all sensitive to
outliers while the interquartile range is not.
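The following sketch illustrates the sensitivity to outliers, using hypothetical data and a hypothetical helper name. It computes the range, interquartile range, sample standard deviation, and coefficient of variation. It does so twice: once on the original data and once after replacing the largest value with an extreme one. Under the `"exclusive"` quartile convention of `statistics.quantiles`, the interquartile range stays at 9.0 while the other three measures grow.

```python
import statistics

def spread_summary(data):
    """Range, interquartile range, sample standard deviation, and coefficient of variation (%)."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default "exclusive" method
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return {
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "std_dev": s,
        "cv_percent": 100 * s / mean,
    }

# Hypothetical data, for illustration only
clean = [2, 4, 5, 7, 8, 10, 12, 15, 18]
with_outlier = clean[:-1] + [90]  # replace the largest value with an extreme one

for label, values in [("clean", clean), ("with outlier", with_outlier)]:
    print(label, {k: round(v, 2) for k, v in spread_summary(values).items()})
```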

The empirical rule relates the mean and standard deviation to the approximate percentage of
values that fall within one, two, and three standard deviations of the mean (about 68%, 95%,
and 99.7%, respectively).

The Chebyshev rule applies to any type of distribution while the empirical rule applies
only to data sets that are approximately bell-shaped.

For approximately bell-shaped data, the empirical rule approximates the concentration of
data around the mean more accurately than the Chebyshev rule does.
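The gap between the two rules can be sketched numerically. Chebyshev's theorem guarantees that at least 1 − 1/k² of the values of any data set lie within k standard deviations of the mean (k > 1). The empirical rule's figures of roughly 68%, 95%, and 99.7% hold only for approximately bell-shaped data. The function and constant names are illustrative.

```python
def chebyshev_minimum(k):
    """Chebyshev's theorem: at least 1 - 1/k**2 of the values lie within k standard deviations."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for k > 1")
    return 1 - 1 / k ** 2

# Approximate empirical-rule proportions for bell-shaped data
EMPIRICAL = {1: 0.68, 2: 0.95, 3: 0.997}

for k in (2, 3):
    print(f"k={k}: Chebyshev at least {chebyshev_minimum(k):.1%}, "
          f"empirical rule about {EMPIRICAL[k]:.1%}")
```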

Shape is the manner in which the data are distributed. The shape of a data set can be
symmetrical or asymmetrical (skewed).
Census
A survey to collect data on the entire population.
Data
The facts and figures collected, analyzed, and summarized for presentation and
interpretation.
Data set
All the data collected in a particular study.
Descriptive statistics
Tabular, graphical, and numerical summaries of data.
Elements
The entities on which data are collected.
Observation
The set of measurements obtained for a particular element.
Population
The set of all elements of interest in a particular study.
Qualitative data
Labels or names used to identify an attribute of each element. Qualitative data use
either the nominal or ordinal scale of measurement and may be nonnumeric or
numeric.
Qualitative variable
A variable with qualitative data.
Quantitative data
Numeric values that indicate how much or how many of something. Quantitative data
are obtained using either the interval or ratio scale of measurement.
Quantitative variable
A variable with quantitative data.
Sample
A subset of the population.
Sample survey
A survey to collect data on a sample.
Statistical inference
The process of using data obtained from a sample to make estimates or test hypotheses about
the characteristics of a population.
Statistics
The art and science of collecting, analyzing, presenting, and interpreting data.
Variable
A characteristic of interest for the elements.
Bar graph
A graphical device for depicting qualitative data that have been summarized in a frequency,
relative frequency, or percent frequency distribution.
Class midpoint
The value halfway between the lower and upper class limits.
Cumulative frequency distribution
A tabular summary of quantitative data showing the number of data values that are less than or
equal to the upper class limit of each class.
Cumulative percent frequency distribution
A tabular summary of quantitative data showing the percentage of data values that are less than
or equal to the upper class limit of each class.
Cumulative relative frequency distribution
A tabular summary of quantitative data showing the fraction or proportion of data values that are
less than or equal to the upper class limit of each class.
Dot plot
A graphical device that summarizes data by the number of dots above each data value on the
horizontal axis.
Frequency distribution
A tabular summary of data showing the number (frequency) of data values in each of
several non-overlapping classes.
Histogram
A graphical presentation of a frequency distribution, relative frequency distribution, or percent
frequency distribution of quantitative data constructed by placing the class intervals on the
horizontal axis and the frequencies, relative frequencies, or percent frequencies on the vertical
axis.
Percent frequency distribution
A tabular summary of data showing the percentage of data values in each of several non-overlapping classes.
Pie chart
A graphical device for presenting data summaries based on subdivision of a circle into sectors
that correspond to the relative frequency for each class.
Qualitative data
Labels or names used to identify categories of like items.
Quantitative data
Numerical values that indicate how much or how many.
Relative frequency distribution
A tabular summary of data showing the fraction or proportion of data values in each of several
non-overlapping classes.
Stem-and-leaf display
An exploratory data analysis technique that simultaneously rank orders quantitative data and
provides insight about the shape of the distribution.
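This sketch shows how the frequency, relative frequency, percent frequency, and cumulative distributions defined above fit together. It assumes integer-valued data and equal-width classes with inclusive limits (for example 10-14, 15-19), so each class midpoint is halfway between its lower and upper limits. The data and the `frequency_tables` helper are hypothetical.

```python
def frequency_tables(data, first_lower_limit, width, num_classes):
    """Frequency, relative, percent, and cumulative distributions for integer data.

    Class i has lower limit first_lower_limit + i * width and upper limit
    lower + width - 1, so the classes do not overlap.
    """
    counts = [0] * num_classes
    for x in data:
        i = int((x - first_lower_limit) // width)
        if not 0 <= i < num_classes:
            raise ValueError(f"value {x} falls outside the classes")
        counts[i] += 1

    n = len(data)
    rows, cumulative = [], 0
    for i, f in enumerate(counts):
        lower = first_lower_limit + i * width
        upper = lower + width - 1
        cumulative += f  # number of values <= this class's upper limit
        rows.append({
            "limits": (lower, upper),
            "midpoint": (lower + upper) / 2,
            "frequency": f,
            "relative": f / n,
            "percent": 100 * f / n,
            "cumulative": cumulative,
            "cumulative_relative": cumulative / n,
            "cumulative_percent": 100 * cumulative / n,
        })
    return rows

# Hypothetical integer data, for illustration only
times = [11, 14, 16, 18, 19, 21, 22, 22, 25, 27, 31, 33, 12, 17, 24, 20]
for row in frequency_tables(times, first_lower_limit=10, width=5, num_classes=5):
    lower, upper = row["limits"]
    print(f"{lower}-{upper}  midpoint={row['midpoint']}  f={row['frequency']}  "
          f"rel={row['relative']:.4f}  pct={row['percent']:.2f}  cum={row['cumulative']}")
```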
Chebyshev’s theorem
A theorem that can be used to make statements about the proportion of data
values that must be within a specified number of standard deviations of the
mean.
Coefficient of variation
A measure of relative variability computed by dividing the standard deviation by the
mean and multiplying by 100.
Empirical rule
A rule that can be used to approximate the percentage of data values within one, two, and
three standard deviations of the mean for data that exhibit a bell-shaped distribution.
Grouped data
Data available in class intervals as summarized by a frequency distribution. Individual
values of the original data are not available.
Interquartile range (IQR)
A measure of variability, defined to be the difference between the third and first
quartiles.
Mean
A measure of central location computed by summing the data values and dividing by
the number of observations.
Median
A measure of central location provided by the value in the middle when the data are
arranged in ascending order.
Mode
A measure of location, defined as the value that occurs with greatest frequency.
Outlier
An unusually small or unusually large data value.
Percentile
A value such that at least p percent of the observations are less than or equal to this
value and at least (100 – p) percent of the observations are greater than or equal to
this value. The 50th percentile is the median.
Point estimator
The sample statistic, such as x̄, s², and s, when used to estimate the corresponding
population parameter.
Population parameter
A numerical value used as a summary measure for a population (e.g., the population mean, μ, the
population variance, σ², and the population standard deviation, σ).
Quartiles
The 25th, 50th, and 75th percentiles, referred to as the first quartile, the second quartile
(median), and third quartile, respectively. The quartiles can be used to divide a data set into four
parts, with each part containing approximately 25% of the data.
Range
A measure of variability, defined to be the largest value minus the smallest value.
Sample statistic
A numerical value used as a summary measure for a sample (e.g., the sample mean, x̄, the
sample variance, s², and the sample standard deviation, s).
Standard deviation
A measure of variability computed by taking the positive square root of the variance.
Variance
A measure of variability based on the squared deviations of the data values about the mean.
Weighted mean
The mean obtained by assigning each observation a weight that reflects its importance.
z-score
A value computed by dividing the deviation about the mean (xᵢ − x̄) by the standard deviation s.
A z-score is referred to as a standardized value and denotes the number of standard deviations xᵢ
is from the mean.
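Several of the measures defined above can be sketched with Python's standard `statistics` module plus two small hypothetical helpers, `weighted_mean` and `meets_percentile_definition`. Textbooks and software use different rules for computing percentiles. The percentile helper therefore only checks the glossary's two-sided definition for a candidate value rather than computing a percentile. With an even number of observations, more than one value can satisfy that definition. The data are made up.

```python
import statistics

def weighted_mean(values, weights):
    """Weighted mean: sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

def meets_percentile_definition(data, value, p):
    """True if at least p% of data are <= value and at least (100 - p)% are >= value."""
    n = len(data)
    at_or_below = sum(1 for x in data if x <= value)
    at_or_above = sum(1 for x in data if x >= value)
    # Compare counts scaled by 100 against p * n instead of dividing by n
    return 100 * at_or_below >= p * n and 100 * at_or_above >= (100 - p) * n

# Hypothetical sample, for illustration only
data = [3, 7, 7, 2, 9, 7, 4, 5]

mean = statistics.mean(data)      # sum of values / number of observations
median = statistics.median(data)  # average of the two middle values, since n is even
mode = statistics.mode(data)      # most frequent value
print(mean, median, mode)         # 5.5 6.0 7

# The median satisfies the 50th-percentile definition; 4 does not
print(meets_percentile_definition(data, median, 50),
      meets_percentile_definition(data, 4, 50))  # True False

# Hypothetical grade points weighted by credit hours
print(weighted_mean([4.0, 3.0, 2.0], [3, 4, 2]))  # 28/9, about 3.111
```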
Complement of A
The event consisting of all sample points that are not in A.
Conditional probability
The probability of an event given that another event already occurred. The conditional
probability of A given B is P(A | B) = P(A ∩ B)/P(B).
Event
A collection of sample points.
Experiment
A process that generates well-defined outcomes.
Independent events
Two events A and B where P(A | B) = P(A) or P(B | A) = P(B); that is, the events have no
influence on each other.
Intersection of A and B
The event containing the sample points belonging to both A and B.
Joint probability
The probability of two events both occurring; that is, the probability of the intersection of two
events.
Marginal probability
The values in the margins of a joint probability table that provide the probabilities of
each event separately.
Mutually exclusive events
Events that have no sample points in common; that is, A ∩ B is empty and P(A ∩ B) = 0.
Probability
A numerical measure of the likelihood that an event will occur.
Sample point
An element of the sample space. A sample point represents an experimental outcome.
Sample space
The set of all experimental outcomes.
Tree diagram
A graphical representation that helps in visualizing a multiple-step experiment.
Union of A and B
The event containing all sample points belonging to A or B or both.
Venn diagram
A graphical representation for showing symbolically the sample space and operations involving
events in which the sample space is represented by a rectangle and events are represented as
circles within the sample space.
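The probability definitions above can be checked on a small experiment. This sketch assumes two fair dice, so the sample space has 36 equally likely sample points and each event is simply a set of them. `prob` and `conditional` are hypothetical helper names. The last line illustrates the addition law, P(A ∪ B) = P(A) + P(B) − P(A ∩ B), a standard result that is not itself stated in the glossary.

```python
from fractions import Fraction
from itertools import product

# Experiment: roll two fair dice. Sample space = 36 equally likely (first, second) points.
sample_space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a subset of the sample space) under equal likelihood."""
    return Fraction(len(event), len(sample_space))

def conditional(a, b):
    """Conditional probability P(A | B) = P(A ∩ B) / P(B)."""
    return prob(a & b) / prob(b)

first_six = {pt for pt in sample_space if pt[0] == 6}
second_even = {pt for pt in sample_space if pt[1] % 2 == 0}
sum_seven = {pt for pt in sample_space if sum(pt) == 7}
doubles = {pt for pt in sample_space if pt[0] == pt[1]}
sum_at_least_ten = {pt for pt in sample_space if sum(pt) >= 10}

# Complement: P(not A) = 1 - P(A)
print(prob(sample_space - sum_seven) == 1 - prob(sum_seven))        # True

# Independence: P(A | B) = P(A)
print(conditional(first_six, second_even) == prob(first_six))       # True (independent)
print(conditional(first_six, sum_at_least_ten) == prob(first_six))  # False (dependent)

# Mutually exclusive events: empty intersection, so P(A ∪ B) = P(A) + P(B)
print(sum_seven & doubles == set())                                 # True
print(prob(sum_seven | doubles) == prob(sum_seven) + prob(doubles))  # True

# Addition law for overlapping events
print(prob(first_six | sum_seven)
      == prob(first_six) + prob(sum_seven) - prob(first_six & sum_seven))  # True
```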