stat1010 - faculty.georgebrown.ca

Data
• Quantitative data are numerical observations.
– Example: the ages of students in a class. Age is quantitative because it is recorded as a number.
• Qualitative data are categorical observations.
– Example: marital status (married, single, divorced, widowed).
Data (continued)
• Variable: a characteristic that can assume different values.
– Gender is a variable whose values include male and female.
• Grouping data
– Use more than 5 and fewer than 15 classes (5 < number of classes < 15).
• Use Sturges' rule to determine the number of classes.
• Use a bar graph to describe qualitative data.
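As a rough illustration of the class-count guideline above, Sturges' rule is commonly written as k = 1 + log2(n), where n is the number of observations. The sketch below assumes that form and rounds up; the function name `sturges_classes` is only illustrative, and some textbooks round to the nearest integer instead.

```python
import math


def sturges_classes(n: int) -> int:
    """Suggested number of classes for n observations: 1 + log2(n), rounded up."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.ceil(1 + math.log2(n))


# Sample sizes chosen only for illustration.
for n in (30, 100, 500):
    print(n, "observations ->", sturges_classes(n), "classes")
```

For these sample sizes the suggested counts fall inside the 5-to-15 range recommended above.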
Frequency Distribution
• Relative frequency: the frequency of a given data value divided by the total frequency.
– Relative frequency is a fraction of the total frequency.
– Example: 25% of the total number of students passed the test.
• Cumulative frequency shows the number of observations below or above a certain value.
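The definitions above can be sketched with a small frequency table. The marks below are made up for illustration, and the cumulative column here counts observations at or below each value, which is one of the two directions the slide mentions.

```python
from collections import Counter
from itertools import accumulate

marks = [60, 70, 70, 80, 80, 80, 90, 90]  # hypothetical test marks

freq = Counter(marks)              # frequency of each distinct value
values = sorted(freq)              # distinct values in ascending order
total = sum(freq.values())         # total frequency

# Relative frequency = frequency of a value / total frequency.
relative = {v: freq[v] / total for v in values}

# Cumulative frequency = number of observations at or below each value.
cumulative = dict(zip(values, accumulate(freq[v] for v in values)))

for v in values:
    print(v, freq[v], f"{relative[v]:.3f}", cumulative[v])
```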
Measures of Central Tendency
• Extreme values: values that are very high or very low compared to the rest of the data.
– Draw a box plot to identify the extreme values (outliers).
• When there are no extreme values, use the mean as the measure of central tendency. x̄ (x-bar) denotes a sample mean.
• When there are extreme values, use the median as the measure of central tendency.
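To see why the median is preferred when extreme values are present, the sketch below flags outliers and compares the mean with the median. It assumes the common box-plot convention that a value is an outlier if it lies more than 1.5 × IQR beyond the quartiles; that cutoff is not stated in the slides, and the ages are hypothetical.

```python
import statistics

ages = [21, 22, 23, 24, 25, 26, 95]  # hypothetical data; 95 is the extreme value

# Quartiles as computed by Python's statistics module (default "exclusive" method).
q1, q2, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1

# Assumed convention: box-plot fences at 1.5 * IQR beyond Q1 and Q3.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in ages if x < lower_fence or x > upper_fence]

mean_age = statistics.mean(ages)
median_age = statistics.median(ages)

print("outliers:", outliers)
print("mean:", round(mean_age, 2), "median:", median_age)
```

The single extreme value pulls the mean well above most of the data, while the median stays in the middle of the ordinary values.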
Measures of Variation in Data
• Range is the simplest measure of variation in data: range = highest value − lowest value.
– Not a good measure because it uses only two values of the data set.
• Standard deviation measures the dispersion of the data around the mean of the data set.
– Better than the range because it uses all the values in the data set.
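A minimal sketch of both measures, using made-up data. The slides do not say whether the sample or the population standard deviation is intended, so both are shown; which one applies is an assumption to check against the course notes.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

# Range uses only the two most extreme values.
data_range = max(data) - min(data)

# Standard deviation uses every value's distance from the mean.
population_sd = statistics.pstdev(data)  # divides by n
sample_sd = statistics.stdev(data)       # divides by n - 1

print("range:", data_range)
print("population sd:", population_sd)
print("sample sd:", round(sample_sd, 3))
```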
Mean of data
• When data are given without a frequency column, add all values of x and then divide by the total number of values.
• For repeated data, multiply each data value by its frequency and add the products. Divide the sum by the sum of all frequencies.
• For grouped data, get the class mark of
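The two fully stated cases above (raw data, and repeated data with a frequency column) can be sketched as follows. The function names and the numbers are illustrative only; the frequency table is written as a value-to-frequency mapping.

```python
def mean_raw(values):
    """Mean of raw data: sum of all values divided by the number of values."""
    return sum(values) / len(values)


def mean_from_frequencies(table):
    """Mean of repeated data: sum of value * frequency divided by the sum of frequencies."""
    total_frequency = sum(table.values())
    return sum(x * f for x, f in table.items()) / total_frequency


raw = [2, 3, 3, 5, 5, 5]      # hypothetical raw data
table = {2: 1, 3: 2, 5: 3}    # the same data as value -> frequency

print(mean_raw(raw))
print(mean_from_frequencies(table))
```

Both calls describe the same six observations, so they give the same mean.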
Median of data
• Arrange the data in ascending order and find the middle number. This applies to an odd number of data values. Formula: (n+1)/2 indicates the position of the middle number.
• When the number of data values is even, find the two middle numbers and take their arithmetic mean.
• n/2 and (n/2)+1 indicate the positions of the two middle numbers.
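The position formulas above translate directly into code. This is a small sketch with an illustrative function name; positions are 1-based as on the slide, so 1 is subtracted when indexing the sorted list.

```python
def median_by_position(values):
    """Median found via the position formulas for odd and even n."""
    data = sorted(values)          # arrange in ascending order
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        position = (n + 1) // 2    # single middle position
        return data[position - 1]
    lower = n // 2                 # first middle position
    upper = n // 2 + 1             # second middle position
    return (data[lower - 1] + data[upper - 1]) / 2


print(median_by_position([7, 1, 5]))      # odd n
print(median_by_position([4, 1, 3, 2]))   # even n
```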
Mode
• The mode is the value that occurs most often in the data set.
• For the mode of repeated and grouped data, follow the formulas given in class.
• If there are more than two modes in a given set of data, the sense of central tendency is obscured.
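For raw data, the mode can be found by counting occurrences. The sketch below uses `statistics.multimode` (available in Python 3.8 and later), which returns every value tied for the highest count; the data are hypothetical, and the class formulas for grouped data are not reproduced here.

```python
import statistics

one_mode = [1, 2, 2, 3, 4]
two_modes = [1, 2, 2, 3, 3, 4]

# multimode returns a list of all most frequent values, in order of first appearance.
modes_a = statistics.multimode(one_mode)
modes_b = statistics.multimode(two_modes)

print(modes_a)
print(modes_b)
```

A long list of modes is a sign that the mode is not describing a single center of the data.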
Quartiles
• The first quartile (Q1) cuts off the lower-tail area of 25% of the distribution. If the first quartile mark of test #1 is 60, 25% of the students received less than 60.
• The second quartile (Q2) cuts off the lower 50% of the distribution.
• The third quartile (Q3) separates the lower 75% of the distribution from the upper 25%.
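Quartiles can be computed with `statistics.quantiles`. Several conventions exist for interpolating quartiles in small samples, so the values from this sketch may differ slightly from a hand method taught in class; the marks are hypothetical and the default "exclusive" method is used.

```python
import statistics

marks = [45, 52, 58, 60, 63, 67, 70, 74, 78, 81, 88]  # hypothetical test marks

# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(marks, n=4)

print("Q1:", q1)
print("Q2:", q2)
print("Q3:", q3)

# The second quartile is the median.
print("median:", statistics.median(marks))
```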
Skewness
• The distribution is skewed to the right when the right tail is longer than the left one. In that case, typically mean > median > mode.
• The distribution is skewed to the left when the left tail is longer than the right one. In that case, typically mean < median < mode.
• The distribution is symmetric when the left side is the mirror image of the right.
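The ordering of mean, median, and mode can serve as a quick check of skew direction. The sketch below applies that ordering to made-up data; it is a rule of thumb rather than a formal skewness measure, and the function name `skew_direction` is illustrative.

```python
import statistics


def skew_direction(values):
    """Classify skew from the ordering of mean, median, and (first) mode."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    mode = statistics.multimode(values)[0]
    if mean > median > mode:
        return "right"
    if mean < median < mode:
        return "left"
    return "symmetric or unclear"


right_tailed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]   # long right tail
left_tailed = [16 - x for x in right_tailed]     # mirror image: long left tail

print(skew_direction(right_tailed))
print(skew_direction(left_tailed))
```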
Empirical Rule
• For a distribution that is symmetric and unimodal (bell-shaped), about 68% of the data fall within one standard deviation of the mean.
• About 95% of the data fall within 2 standard deviations of the mean.
• About 99.7% of the data fall within 3 standard deviations of the mean.
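The percentages above can be checked by simulation. The sketch below draws values from a normal distribution, which is an assumption standing in for "symmetric and unimodal" data, and counts how many fall within 1, 2, and 3 standard deviations of the mean. The mean of 50, standard deviation of 10, and the seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
data = [rng.gauss(50, 10) for _ in range(20_000)]

mean = statistics.fmean(data)
sd = statistics.pstdev(data)


def share_within(k):
    """Fraction of the data lying within k standard deviations of the mean."""
    return sum(1 for x in data if abs(x - mean) <= k * sd) / len(data)


for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.3f}")
```

With a sample this large, the three shares should land close to 0.68, 0.95, and 0.997.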