Download Statistics and Data Analysis

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Central limit theorem wikipedia , lookup

Transcript
Statistics and Data Analysis
(1B, 1D, 1E, 1F, 2C, 3A, 3D, 4C, 5C, 6A, 6C, 7D, 8D, 9A)
Data: information, especially numerical information that is organized.
Discrete data is exact data that can be counted. (Ex. Acres,
bushels, livestock) Scatter plots, bar graphs and circle
graphs show this type of data well.
Continuous data is measurements that are taken over a
period of time. (Ex. Temperatures, gallons of fuel
consumed, weight gain in livestock) Line graphs show this
type of data well.
Statistics: numbers used to describe a set of data. Three different statistics are used
to measure the central tendency of a distribution. A measure that
summarizes a whole set of measurements by representing the approximate
center of a distribution is called a measure of central tendency.
The mean of a distribution is the arithmetic average of the scores.
The mean is the measure found by adding together all of
the scores of a distribution and dividing by the total
number of scores.
The median of a distribution is the middle number after the scores are arranged
in numerical order.
Scores may be arranged in either ascending or descending
order. If there is an even number of scores, the median is
determined by the average of the two middle scores.
The mode of a distribution is the score that occurs most frequently.
It is possible that there may not be a mode. It is also
possible to have more than one mode.
Correlation coefficient is statistic that is used to determine how closely the points in a
scatter plot cluster about a line. It is used to predict future outcomes.
The correlation coefficient measures the strength of the
linear relationship between two variables. A high
correlation does not necessarily imply a cause-and-effect
relationship.
Measures of dispersion: values that indicate how a distribution is scattered.
The range is the difference between the largest and smallest numbers in a set.
The deviation is the difference between a single score and the mean of the set of
scores in a distribution.
The variance is the mean of the squared deviations in a distribution.
The further it is from the mean, the more pronounced its effect on the size
of the variance.
Variance gives a better measure of scatter than does the range because
variance takes into account how much each score deviates from the
mean. If most scores cluster near the mean, the variance will be small.
But if many scores deviate widely from the mean, the variance will be
large.
The standard deviation is the square root of the variance: that is, the square root
of the mean of the sum of the squared deviations in a distribution.
Standard deviation is much more commonly used than variance as a measure of
scatter. This is because it is expressed in the same unit as the scores, where the
variance is expressed in square units.
In a normal distribution, approximately 68% (about 2/3rds) of all the
measurements lie within one standard deviation on either side of the mean,
approximately 95% of all measurements lie with two standard deviations on
either side of the mean, and approximately 99% of all the measurements lie
within three standard deviations of the mean.
Standard deviation gives a good indication of how widely the scores in a
distribution are scattered.
A standard deviation of 0 means every one has the same score.
A large standard deviation means the scores are spread out.
A small standard deviation indicates a much narrower
distribution of the scores.
The normal distribution is a symmetrical frequency distribution in which most scores or
measures occur in the intervals closest to the center of the distribution and fewer scores
are found in the intervals farther from the center. In a true normal distribution, the data is
continuous – the highest frequency occurs in the center of the distribution and the lowest
frequencies are found at the extremes. When such a distribution is graphed, the result is
a “bell-shaped curve” called the normal curve. (Normal curve serves as a mathematical
model.)