stat1010 - faculty.georgebrown.ca

Data
• Quantitative data are numerical
observations.
– Age of students in a class. Age is
quantitative data because it quantifies the
age of a person.
• Qualitative data are categorical
observations.
– Marital status: married, single, divorced,
widowed.
Continued
• Variable: a characteristic that assumes
different values.
– Gender is a variable whose values include
male and female.
• Grouping data
– Use more than 5 and fewer than 15 classes.
• Use Sturges' rule to determine the number of
classes.
• Use a bar graph to describe qualitative
data.
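Sturges' rule is usually stated as k = 1 + log2(n), where n is the number of observations and k is the suggested number of classes. The slides do not give the formula, so the sketch below assumes this common form and rounds up to a whole number; the function name is illustrative only.

```python
import math


def sturges_classes(n: int) -> int:
    """Suggested number of classes for n observations (Sturges' rule).

    Assumes the common form k = 1 + log2(n), rounded up.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 + math.ceil(math.log2(n))


# 50 observations: log2(50) is about 5.64, so the rule suggests 7 classes.
print(sturges_classes(50))
```

The result is a starting point; the guideline of more than 5 and fewer than 15 classes still applies.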
Frequency Distribution
• Relative frequency: frequency of a
given data value / total frequency.
– Relative frequency is a fraction of the total
frequency.
– Example: 25% of the total number of students
passed the test.
• Cumulative frequency is used to show
the number of observations below or
above a certain value.
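A small sketch of how relative and cumulative frequencies can be computed from raw data. The quiz scores are made-up values for illustration, and the cumulative column here counts observations at or below each value ("less than or equal to").

```python
from collections import Counter
from itertools import accumulate

# Hypothetical quiz scores (illustrative data, not from the course).
scores = [1, 2, 2, 3, 3, 3, 3, 4]

freq = Counter(scores)
values = sorted(freq)                       # distinct values in ascending order
frequencies = [freq[v] for v in values]     # frequency of each value
total = sum(frequencies)                    # total frequency

relative = [f / total for f in frequencies]  # fraction of the total
cumulative = list(accumulate(frequencies))   # running total of frequencies

for v, f, r, c in zip(values, frequencies, relative, cumulative):
    print(v, f, r, c)
```

The relative frequencies always add up to 1, and the last cumulative frequency equals the total number of observations.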
Measure of Central Tendency
• Extreme values: very high or low
compared to the rest of the data.
– Draw a box plot to determine the number
of extreme values (outliers).
• When there are no extreme values, use the
mean for the central tendency. x̄ (x-bar) is
used to denote a sample mean.
• When there are extreme values, use the
median for the central tendency.
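The choice between mean and median can be sketched in code. The slides do not say how a box plot flags outliers, so this example assumes the common convention that a value is an outlier when it lies more than 1.5 × IQR below Q1 or above Q3. The function name and the ages are hypothetical.

```python
import statistics


def choose_center(data):
    """Return (measure name, value, outliers) using the 1.5 * IQR convention.

    The 1.5 * IQR fences are an assumption; the slides only say to use a box plot.
    """
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    if outliers:
        # Extreme values present: the median is less affected by them.
        return "median", statistics.median(data), outliers
    return "mean", statistics.mean(data), outliers


# Hypothetical ages in a class; 90 is far from the rest.
print(choose_center([20, 21, 22, 23, 24, 25, 90]))
```

Quartile values depend on the interpolation method, so other software may place the fences slightly differently.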
Measure of variation in data
• Range is the simplest measure of
variation in data: range = highest value
- lowest value.
– Not a good measure because it uses
only two values of the data set.
• Standard deviation measures the
dispersion of data from the mean of the
data set.
– Better than range because it uses all data
in the data set.
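A short sketch comparing the two measures on made-up data. The slides do not say whether the sample or the population formula is meant, so both are shown: `statistics.stdev` divides by n - 1 (sample) and `statistics.pstdev` divides by n (population).

```python
import statistics

# Hypothetical data set for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)        # uses only the two extreme values
sample_sd = statistics.stdev(data)        # sample standard deviation (n - 1)
population_sd = statistics.pstdev(data)   # population standard deviation (n)

print(data_range, sample_sd, population_sd)
```

Changing any single value in the middle of the data changes the standard deviation but leaves the range untouched, which is why the standard deviation is the more informative measure.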
Mean of data
• When data are given without a frequency
column, add all values of x and then divide by
the total number of data values.
• For repeat data, multiply each data
value by its frequency and then add
the products. Divide the sum by the sum of all
frequencies.
• For grouped data, get the class mark of
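The first two cases, raw data and data with a frequency column, can be sketched as follows. The function names and sample values are illustrative only.

```python
def mean_raw(values):
    """Mean of raw data: sum of all values divided by how many there are."""
    return sum(values) / len(values)


def mean_with_frequencies(values, frequencies):
    """Mean of repeat data: sum of (value * frequency) divided by sum of frequencies."""
    weighted_sum = sum(x * f for x, f in zip(values, frequencies))
    return weighted_sum / sum(frequencies)


print(mean_raw([2, 4, 6]))                          # (2 + 4 + 6) / 3
print(mean_with_frequencies([1, 2, 3], [1, 2, 1]))  # (1*1 + 2*2 + 3*1) / 4
```

Both functions give the same answer when the repeat data are written out in full, for example [1, 2, 2, 3] for the second call.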
Median of data
• Arrange the data in ascending order
and find the middle number. This applies
to an odd number of data values. Formula:
(n+1)/2 indicates the position of the
number.
• When the number of data values is even, find
the two middle numbers and take their
arithmetic mean.
• n/2 and n/2 + 1 indicate the positions
of the two middle numbers.
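The position rules above translate directly into code. Positions in the slides count from 1, while Python lists count from 0, so each position is reduced by one. The function name is illustrative.

```python
def median(values):
    """Median using the position rules: (n+1)/2 for odd n, n/2 and n/2 + 1 for even n."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        position = (n + 1) // 2       # 1-based position of the middle number
        return ordered[position - 1]
    lower = ordered[n // 2 - 1]       # 1-based position n/2
    upper = ordered[n // 2]           # 1-based position n/2 + 1
    return (lower + upper) / 2        # arithmetic mean of the two middle numbers


print(median([7, 1, 5]))     # odd n: middle of 1, 5, 7
print(median([4, 1, 3, 2]))  # even n: mean of 2 and 3
```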
Mode
• Mode is the value that occurs most
often in the data set.
• For the mode of repeat and grouped data,
follow the formulas given in class.
• If there are more than two modes in a
given set of data, the sense of central
tendency is obscured.
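For raw data, the mode can be found by counting how often each value occurs. This sketch returns every value tied for the highest count, which makes it easy to see when a data set has several modes. The function name is illustrative, and the class formulas for grouped data are not covered.

```python
from collections import Counter


def modes(values):
    """Return all values that occur most often, in ascending order."""
    if not values:
        raise ValueError("mode of empty data")
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 2, 3]))     # one mode
print(modes([1, 1, 2, 2, 3]))  # two modes
```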
Quartiles
• The first quartile (Q1) cuts off a lower-tail
area of 25% of the distribution. If the
first quartile mark of test #1 is 60,
25% of the students received less than
60.
• The second quartile (Q2) cuts off the lower
50% of the distribution.
• The third quartile (Q3) separates the lower
75% of the distribution from the upper
25%.
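A minimal sketch of computing the three quartiles with Python's standard library. The marks are made-up values. `statistics.quantiles` with `n=4` returns the three cut points; its default interpolation method may differ slightly from the one taught in class or used by other software.

```python
import statistics

# Hypothetical test marks for illustration.
marks = [50, 60, 65, 70, 75, 80, 90]

q1, q2, q3 = statistics.quantiles(marks, n=4)

print(q1, q2, q3)
```

The second quartile is the same as the median of the data.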
Skewness
• The distribution is skewed to the right
when the right tail is longer than the
left one. Typically, mean > median > mode.
• The distribution is skewed to the left
when the left tail is longer than the
right one. Typically, mean < median < mode.
• The distribution is symmetric when the
left side is the mirror image of the right.
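The mean and median comparison above can serve as a quick rule of thumb for the direction of skew. The sketch below is only that, a rough check rather than a formal skewness measure, and it can mislead on small or unusual data sets. The function name and data are illustrative.

```python
import statistics


def skew_direction(data):
    """Rough direction of skew from comparing mean and median (rule of thumb only)."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        return "right"   # a long right tail pulls the mean above the median
    if mean < median:
        return "left"    # a long left tail pulls the mean below the median
    return "symmetric or undetermined"


print(skew_direction([1, 2, 3, 4, 20]))     # mean 6, median 3
print(skew_direction([1, 17, 18, 19, 20]))  # mean 15, median 18
```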
Empirical Rule
• For a distribution that is symmetric
and unimodal, about 68% of the data
fall within ±1 standard deviation of
the mean.
• About 95% of the data fall within ±2 standard
deviations of the mean.
• About 99.7% of the data fall within ±3
standard deviations of the mean.
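The percentages in the empirical rule come from the normal (bell-shaped) distribution. As a check, this sketch uses `statistics.NormalDist` from Python's standard library to compute the share of a normal distribution that lies within k standard deviations of the mean. The function name is illustrative.

```python
from statistics import NormalDist


def share_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
```

The computed values are roughly 68%, 95%, and 99.7%, which is why the rule is also called the 68-95-99.7 rule. Real data only follow it approximately, and only when the distribution is close to bell-shaped.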