Download File - Glorybeth Becker

Unit 1—Top 10 List
Vocabulary: case--an individual person or thing for which values of a variable are recorded
distribution--the pattern of variation of a variable; shows the values the variable takes and how often each occurs
variable—any measurable or observable characteristic of a group of people or objects
quartiles
percentiles
Types of variables: Categorical--values place each case into a group or category
Binary categorical—can take on only 2 values
Quantitative (measurement)—takes on a numerical value
Types of distributions:
symmetric
uniform
skewed right
skewed left
clustered
Measures of central tendency:
Mean—numerical average
Median—the middle number when the data is arranged in numerical order
Mode—the value that occurs most often (bimodal—2 values that occur most often)
Measures of variability (spread):
Range = maximum – minimum
IQR (interquartile range)=Q3 – Q1
Standard deviation-a measure of the spread from the mean
$s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$
(Variance $= s^2$)
Note: $\sum (x_i - \bar{x}) = 0$
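The standard deviation formula and the note about deviations summing to zero can be checked numerically. The following is a minimal sketch: the data values and the helper name `sample_std` are made up for illustration and are not part of the original notes.

```python
import math

def sample_std(values):
    """Sample standard deviation using the n - 1 divisor (hypothetical helper)."""
    n = len(values)
    mean = sum(values) / n
    squared_devs = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_devs) / (n - 1))

data = [2, 4, 4, 5, 10]  # made-up illustration data
mean = sum(data) / len(data)
deviations = [x - mean for x in data]  # each x_i minus the mean

s = sample_std(data)
variance = s ** 2

print(mean)             # 5.0
print(sum(deviations))  # 0.0, deviations from the mean always cancel
print(s)                # 3.0
```

Because the raw deviations always sum to zero, they are squared before averaging; that is why the formula uses squared deviations.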
Choosing appropriate measures of center and spread:
For symmetric data, report the mean and standard deviation
For skewed data, use the 5-number summary
In a skewed distribution the mean is pulled farthest into the tail (starting from the tail and moving
inward, the order is mean, median, mode)
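The ordering of mean, median, and mode in a skewed distribution can be seen in a small example. This sketch uses a made-up right-skewed data set (a long tail of large values); the numbers are assumptions chosen only for illustration.

```python
from statistics import mean, median, mode

# Made-up right-skewed data: most values are small, a few are large.
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 20]

center = {
    "mode": mode(data),      # most frequent value
    "median": median(data),  # middle of the sorted data
    "mean": mean(data),      # pulled toward the long right tail
}
print(center)
```

Here the mode is smallest, the median is in the middle, and the mean is largest, so the mean sits closest to the right tail. The median resists the extreme values, which is why the 5-number summary is preferred for skewed data.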
Displaying data:
Dot plot
Box and whisker plot-- full plot (does not show outliers); modified plot (shows outliers)
5 number summary: min, Q1, median, Q3, max
1.5(IQR) rule for determining outliers: a value is an outlier if it is more than 1.5(IQR) below Q1 or above Q3
Advantage: can show multiple distributions
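The 5-number summary and the 1.5(IQR) rule can be sketched in code as follows. This assumes the quartile convention in which Q1 and Q3 are the medians of the lower and upper halves of the sorted data (the median itself is left out when the count is odd); other software may use slightly different quartile rules. The function names and the data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + (n % 2):]  # values above the median position
    return min(data), median(lower), median(data), median(upper), max(data)

def outliers(values):
    """Values more than 1.5(IQR) below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

data = [1, 3, 4, 5, 6, 7, 8, 9, 30]  # made-up illustration data
summary = five_number_summary(data)
flagged = outliers(data)

print(summary)  # (1, 3.5, 6, 8.5, 30)
print(flagged)  # [30]
```

A modified box plot would draw 30 as a separate point and end the upper whisker at 9, the largest value that is not an outlier.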
Stem-and-Leaf plot
Stem may be multi-digit; leaf is just one digit
Include key
Back-to-back plot
Advantages: Preserves all data
Can show 2 distributions
Quick to construct for small data sets
Disadvantages: cumbersome for large data sets
Histogram
Advantage: easy to read, works well with large data sets
Disadvantage: doesn’t show all data values
Time plot
Displays change over time
Analyzing Graphs:
Center, shape, spread (variability), outliers, look for patterns
Linear Transformations: a+bx
Adding a constant “a” to all values in a data set increases the mean and median by “a”; the range, IQR, and
standard deviation do not change
Multiplying all values in a data set by a constant “b” multiplies the mean and median by “b” and multiplies
the IQR and standard deviation by |b|
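The effect of a linear transformation a+bx on center and spread can be checked directly. This is a small sketch with made-up data and made-up constants `a` and `b`; a negative `b` is used on purpose to show that the standard deviation scales by the absolute value of `b`.

```python
from statistics import mean, median, stdev

data = [2, 4, 4, 5, 10]  # made-up illustration data
a, b = 10, -2            # hypothetical constants for a + bx

transformed = [a + b * x for x in data]

# Measures of center follow the transformation exactly.
print(mean(transformed), a + b * mean(data))
print(median(transformed), a + b * median(data))

# Spread ignores the shift a and scales by |b|.
print(stdev(transformed), abs(b) * stdev(data))
```

Each printed pair should match: adding `a` shifts the center without changing the spread, and multiplying by `b` rescales both.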
Density curve:
The density curve is an idealized description of the distribution of data. For the idealized
distribution, the mean is $\mu$ and the standard deviation is $\sigma$. ($\bar{x}$ and $s$ are the mean and standard
deviation computed from the actual data.)
Describes the overall pattern of a distribution (a relative frequency distribution)
Properties: 1) lies on or above the x-axis
2) area under the curve equals 1
The area under the curve and above the x-axis for any range of data (x) values is the proportion of all
observations that fall in that range.
The median is the point that separates the area into equal areas; the mean is the point of balance
Normal distributions: $N(\mu, \sigma)$
Represent one class of density curves
Described completely by its mean and standard deviation
Characteristics: symmetric, mound-shaped (bell-shaped), mean = median, inflection points are one
standard deviation on either side of the mean, area under the curve is 1 (true for any density curve)
Note: Not all bell-shaped distributions are normal!
Empirical Rule:
In a normal distribution, approximately:
68% of the data lies within 1 standard deviation of the mean: $P(\mu - \sigma < x < \mu + \sigma) \approx 0.68$
95% of the data lies within 2 standard deviations of the mean: $P(\mu - 2\sigma < x < \mu + 2\sigma) \approx 0.95$
99.7% of the data lies within 3 standard deviations of the mean: $P(\mu - 3\sigma < x < \mu + 3\sigma) \approx 0.997$
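The three percentages in the Empirical Rule can be reproduced from the normal curve itself. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `proportion_within` is an assumption introduced for illustration.

```python
from statistics import NormalDist

def proportion_within(k, mu=0.0, sigma=1.0):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
# prints 0.6827, 0.9545, 0.9973 for k = 1, 2, 3
```

The result does not depend on the particular mean and standard deviation, which is why the rule holds for every normal distribution.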
Assessing normality:
Normal probability plot: approximately linear when the data come from a normal distribution
Ratio of IQR to the standard deviation in a normal distribution is approximately 1.3 (more precisely, about 1.35):
$\dfrac{IQR}{\sigma} \approx 1.3$
Standard score (z)
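Can be computed for any distribution, but using it with the standard normal table to find proportions
applies only to normal distributions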
Refers to the number of standard deviations an observation is from the mean
$z = \dfrac{\text{observed value} - \text{mean}}{\text{standard deviation}} = \dfrac{x - \mu}{\sigma}$
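As a quick worked example of the standard score, the sketch below standardizes one observation. The function name `z_score` and the numbers (a mean of 100 and a standard deviation of 15) are made up for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

z_above = z_score(130, 100, 15)  # observation above the mean
z_below = z_score(85, 100, 15)   # observation below the mean

print(z_above)  # 2.0
print(z_below)  # -1.0
```

A positive z means the observation is above the mean and a negative z means it is below.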

Standard normal distribution
The normal distribution with a mean of 0 and standard deviation of 1
Any normal distribution can be transformed to a standard normal distribution
Standard normal distribution table:
For any z score, the table shows the proportion of observations that are less than or equal to that
score.
Percentiles: A data point is at the nth percentile if n% of the data lies below that point
Calculator:
Stat plots
1-var stats
Be able to calculate the percent of data under any part of a normal density curve
normalcdf(lower bound, upper bound)
For a standard normal curve, -5 and 5 can stand in for the min and max (negative and positive infinity)
Be able to calculate the data point that cuts off a given percent of the area under the curve
invNorm(area to the left of the data point, written as a decimal)
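For checking calculator answers, the two calculator commands can be imitated with Python's standard library. This is only a sketch: the Python functions `normalcdf` and `inv_norm` below are hypothetical stand-ins written for this example, built on `statistics.NormalDist`, and their default of mean 0 and standard deviation 1 is an assumption matching the standard normal curve.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the normal curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(area_left, mu=0.0, sigma=1.0):
    """Data point with the given area (as a decimal) to its left."""
    return NormalDist(mu, sigma).inv_cdf(area_left)

whole_curve = normalcdf(-5, 5)      # nearly all of the area
middle_95 = normalcdf(-1.96, 1.96)  # about 0.95
cutoff = inv_norm(0.975)            # about 1.96

print(round(whole_curve, 6), round(middle_95, 4), round(cutoff, 2))
```

The first result shows why -5 and 5 work as bounds on the standard normal curve: the area outside them is negligible. The last two results show that the two commands undo each other.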