Thank you:
William Lim
Rachel Velasco
Terry Woo
INTRODUCTION
Individuals and Variables
Individuals are the objects described by a set of data. Individuals may be people, but they may also be animals or things.
A variable is any characteristic of an individual. A variable can take different values for different individuals.
Variables
A categorical variable places an individual into one of several groups or categories.
A quantitative variable takes numerical values for which arithmetic operations such as adding and averaging make sense.
The distribution of a variable tells us what values the variable takes and how often it takes these values.
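As a small illustration of "what values the variable takes and how often", the sketch below tabulates a categorical variable as counts and percents. The data and the function name `distribution` are made up for the example.

```python
from collections import Counter

def distribution(values):
    """Return {value: (count, percent)} for a list of observed values."""
    counts = Counter(values)
    n = len(values)
    return {v: (c, 100 * c / n) for v, c in counts.items()}

# Hypothetical data: eye colour recorded for eight individuals.
eye_color = ["brown", "blue", "brown", "green", "brown", "blue", "brown", "blue"]

for value, (count, percent) in sorted(distribution(eye_color).items()):
    print(f"{value:6s} {count:2d} {percent:5.1f}%")
```

The counts are the frequencies and the percents are the relative frequencies that a bar graph or pie chart would display.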
1.1
BAR GRAPHS
Compares the sizes of the groups or categories
More flexible
Sizes can be measured as frequencies or percents
PIE CHARTS
Compares what part of the whole the group is
Must include all categories that make up the whole
Sizes can be measured as frequencies or percents
DOTPLOTS
Shows the range of the data and how the values are spread along a number line
Useful for small sets of quantitative data
1. Draw a horizontal line and
label it with the variable.
2. Mark a dot above the
number on the horizontal
axis corresponding to
each data value.
STEMPLOTS
1. Separate each value into a “stem” made up of all but
the rightmost digit and a “leaf”, the final digit.
2. Write the “stems” vertically in increasing order from
top to bottom and draw a vertical line to the right of
the “stems”. Write each “leaf” to the right of its “stem”.
3. Rearrange the “leaves” in increasing order out from the
stem.
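The three steps above can be sketched in code. This is only an illustration, assuming non-negative whole-number data; the function name `stemplot` is made up, and printing stems that have no leaves is an added assumption so that gaps in the data stay visible.

```python
from collections import defaultdict

def stemplot(values):
    """Return the lines of a stemplot for non-negative integers."""
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)   # step 1: split off the rightmost digit
        leaves[stem].append(leaf)
    lines = []
    # Step 2: stems in increasing order from top to bottom.
    # Assumption: stems with no leaves are still listed.
    for stem in range(min(leaves), max(leaves) + 1):
        # Step 3: leaves in increasing order out from the stem.
        row = " ".join(str(d) for d in sorted(leaves.get(stem, [])))
        lines.append(f"{stem:>2} | {row}".rstrip())
    return lines

# Hypothetical data
for line in stemplot([12, 25, 31, 18, 25, 47, 30, 19, 22]):
    print(line)
```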
TIPS FOR CONSTRUCTING STEMPLOTS
• Be sure each stem is assigned an equal number of possible leaf digits
• Too few stems: skyscraper-shaped plot
• Too many stems: flat “pancake” graph
• Five stems is a good minimum
• Get flexible by rounding data so the final digit after rounding is suitable as a leaf
• Do this when the data have too many digits
Useful for comparing quantitative
distributions.
HISTOGRAMS
• Most common graph of the distribution of one quantitative variable
• Five classes is a good minimum
• Avoid skyscraper/pancake
• Choose classes all the same width to compare area
OGIVE (CUMULATIVE RELATIVE FREQUENCY
PLOT)
• Use instead of a histogram to show relative standing of an individual observation
• Percentile: the pth percentile of a distribution is the value such that p percent of the observations fall at or below it
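The percentile definition above can be tried out on a small data set. The sketch below computes the percent of observations at or below a value, which is the height of the ogive at that value. The scores and the function names are hypothetical.

```python
def percent_at_or_below(data, x):
    """Percent of the observations in data that fall at or below x."""
    return 100 * sum(1 for v in data if v <= x) / len(data)

def ogive_points(data):
    """(value, cumulative percent) pairs, one per distinct data value."""
    return [(v, percent_at_or_below(data, v)) for v in sorted(set(data))]

# Hypothetical test scores
scores = [62, 70, 70, 75, 81, 81, 81, 88, 93, 97]

print(percent_at_or_below(scores, 81))  # percent of scores at or below 81
for value, cum_percent in ogive_points(scores):
    print(value, cum_percent)
```

In this data set 7 of the 10 scores are at or below 81, so a score of 81 sits at the 70th percentile.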
TIME PLOTS
• Plots each observation against the time at which it was measured
• Mark the time scale on the horizontal axis and the variable of interest on the vertical axis
• Connecting the points by lines helps show the patterns of changes over time if there aren’t too many points
• Trend: a long-term upward or downward
movement over time
• Seasonal variation: a pattern that repeats itself
at regular time intervals
DISTRIBUTION
• Shape
  • Symmetric: right and left sides are approximately mirror images of each other
  • Skewed right: right side of the distribution extends much farther out than the left
  • Skewed left: left side of the distribution extends much farther out than the right
  • Clusters or several distinct peaks or gaps
  • Uniform: roughly the same frequency for every value of x
• Center: separates the values roughly in half
  • Median and mean
• Spread: scope of the values from smallest to largest
• Outliers: extreme values
1.2
MEASURING CENTER
Mean: add the values up and divide by
the number of observations
\bar{x} = \frac{1}{n} \sum x_i
Median: midpoint of a distribution
Resistant
MEASURING SPREAD
Range: difference between
the largest and smallest
values
Interquartile Range (IQR)
IQR = Q3 − Q1
Q1: 25th percentile
Q2: 50th percentile
(median)
Q3: 75th percentile
Resistant
Variance: averaging the squared differences of all the values from the mean
s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}
Degrees of freedom: n − 1
Standard deviation: square root of the variance
s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}
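A short sketch of the variance and standard deviation calculations, dividing by n − 1 as described above. The data are hypothetical, and the results are compared with Python's standard `statistics` module, which uses the same n − 1 divisor for `variance` and `stdev`.

```python
import math
import statistics

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_sd(data):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(data))

# Hypothetical data: mean is 5, squared deviations sum to 32
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(sample_variance(data), statistics.variance(data))  # both 32/7
print(sample_sd(data), statistics.stdev(data))
```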
MEASURING SPREAD
Variance
Large if the observations
are widely spread about
their mean
Small if the observations
are all close to the mean
Sum of the deviations of the observations from their mean will always be zero
Standard Deviation
s = 0 when there is no
spread
s > 0: as spread
increases, s gets larger
Nonresistant
Strongly influenced
by outliers or
skewness
OUTLIERS
An observation is a suspected outlier if it falls
below Q1 − (1.5 × IQR) or
above Q3 + (1.5 × IQR)
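The two cutoffs above can be applied as in the sketch below. It assumes the common textbook convention that Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the overall median when n is odd; software packages often use other quartile rules and can give slightly different fences. The data and function names are hypothetical.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (assumed convention)."""
    s = sorted(data)
    half = len(s) // 2
    return median(s[:half]), median(s[-half:])

def outliers(data):
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

# Hypothetical data: Q1 = 7.5, Q3 = 12.5, IQR = 5, cutoffs at 0 and 20
data = [5, 7, 8, 9, 10, 11, 12, 13, 30]
print(quartiles(data))
print(outliers(data))
```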
FIVE-NUMBER SUMMARY
Minimum
Q1
Median
Q3
Maximum
BOXPLOTS
Graph of a five
number summary
Modified boxplot:
outliers are plotted individually (the lines extend to the most extreme data points that aren’t outliers; they don’t show “the fence”)
LINEAR TRANSFORMATIONS
When you add constant a to all the values, the mean and median increase by a; the IQR and standard deviation do not change
When you multiply all the values by a positive constant b, the mean, median, IQR, and standard deviation are multiplied by b
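A quick numerical check of these two rules, using hypothetical data and hypothetical constants a = 10 and b = 2. Adding a shifts the center but leaves the spread alone; multiplying by a positive b scales both center and spread.

```python
import statistics

data = [3, 5, 8, 10, 14]   # hypothetical data: mean 8, median 8
a, b = 10, 2               # hypothetical constants

shifted = [x + a for x in data]
scaled = [x * b for x in data]

for name, values in [("original", data), ("plus a", shifted), ("times b", scaled)]:
    print(name,
          statistics.mean(values),
          statistics.median(values),
          round(statistics.stdev(values), 3))
```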
• Normal Curves/Distributions: a type of density curve that is symmetric, single-peaked, and bell-shaped
• Changing mu moves the curve from side to side along the horizontal axis
• Changing sigma changes the spread of the curve
• Normal distributions follow the 68-95-99.7 rule
  • 68% of observations fall within 1 standard deviation of the mean
  • 95% of observations fall within 2 standard deviations of the mean
  • 99.7% of observations fall within 3 standard deviations of the mean
• Abbreviate normal distributions with the notation N(mean, standard deviation)
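The 68-95-99.7 percentages are rounded values, and they can be checked numerically. The sketch below builds the normal cumulative proportion from the standard library's `math.erf`; the function names and the example mean and standard deviation are illustrative only.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Proportion of a N(mu, sigma) distribution that lies at or below x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def within_k_sd(k, mu=0.0, sigma=1.0):
    """Proportion of a N(mu, sigma) distribution within k standard deviations of mu."""
    return normal_cdf(mu + k * sigma, mu, sigma) - normal_cdf(mu - k * sigma, mu, sigma)

# The proportions do not depend on which mu and sigma are used.
for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4), round(within_k_sd(k, mu=64.5, sigma=2.5), 4))
```

The exact proportions are about 0.6827, 0.9545, and 0.9973, which round to the 68%, 95%, and 99.7% in the rule.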
2.2 STANDARD NORMAL CALCULATIONS
• To standardize an observation x from a normal distribution with mean mu and standard deviation sigma: z = (x − mu) / sigma
• The standardized value is called a z-score
  • Z-score tells us how many standard deviations the observation is from the mean
• Standardizing a variable that is normally distributed produces a new variable that has a "standard normal distribution"
  • This "standard normal distribution" is described N(0,1) with mean 0 and standard deviation 1
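A minimal sketch of a standard normal calculation: subtract the mean and divide by the standard deviation to get the z-score, then look up the proportion below that z in N(0,1). The N(64.5, 2.5) distribution and the value 68 are hypothetical, and `standard_normal_cdf` is an illustrative helper built on `math.erf` in place of a printed table.

```python
import math

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

def standard_normal_cdf(z):
    """Proportion of the N(0,1) distribution at or below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical example: an observation of 68 from N(64.5, 2.5)
z = z_score(68, 64.5, 2.5)
print(z)                                  # 1.4 standard deviations above the mean
print(round(standard_normal_cdf(z), 4))   # proportion of observations below 68
```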