Distribution of Data
The following terms are commonly used to describe the distribution of data.
Describing Data:
The left and right side look the same, so the data is
symmetric. There is a cluster from 31-39. There are no
gaps or outliers, and the peak is at 35.
The distribution of a set of data shows the arrangement of the values.
It is described by its center, spread (a description of the variation of
the values within the set), and overall shape. The terms symmetric, gaps, clusters,
and outlier are used to describe the shape of the data.
Center: mean, median, or mode
Spread: mean deviation or IQ range
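The shape terms above can be checked by counting how often each value occurs. The following is a minimal sketch, assuming whole-number data like a dot plot. The function name `describe_shape` is made up for illustration, and "symmetric" here means the counts mirror each other exactly, which is stricter than judging a plot by eye. The sample is the set of 19 values that appears later in this transcript.

```python
from collections import Counter


def describe_shape(values):
    """Return the peak, the gaps, and a strict symmetry check for whole-number data."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    # Peak: the value that occurs most often (first one found if there is a tie).
    peak = max(counts, key=counts.get)
    # Gaps: whole numbers between the minimum and maximum that never occur.
    gaps = [v for v in range(lo, hi + 1) if counts[v] == 0]
    # Symmetric: the count at each step from the left matches the count
    # at the same step from the right.
    symmetric = all(counts[lo + i] == counts[hi - i] for i in range(hi - lo + 1))
    return {"peak": peak, "gaps": gaps, "symmetric": symmetric}


# The 19 values listed later in this transcript.
sample = [10, 10, 10, 11, 11, 11, 11, 11, 12, 12,
          12, 12, 12, 12, 13, 13, 13, 14, 19]
shape = describe_shape(sample)
print(shape)
```

For this sample the peak is 12, the gap runs from 15 to 18, and the data is not symmetric.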
The data on the right is not symmetric.
It has gaps at 9 and 13. There is a cluster from 10-12.
There are no outliers.
Is this data symmetric?
Are there outliers?
The data is roughly symmetric.
There are no outliers.
Where are the gaps?
The gap is from 19-21.
Where is the peak?
The peak is at 22.
A box plot does not show gaps, and we cannot
determine the peak from it.
No outliers are marked beyond the whiskers.
The whiskers are about the same size, so
the data is symmetrical.
We can see the data is not symmetrical.
There are no gaps.
The cluster of the data is from :01 to 7:30.
The peak is the :01 – 2:30 time interval.
What is being measured?
What units are used?
The data is not symmetric.
There is an outlier.
What would be the preferred measure of center?
The median.
The data is not symmetric. This, along with the
presence of the outlier, makes the interquartile
range the best way to describe the spread of the data.
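The choice described here can be written as a small rule. This is only a sketch of the reasoning in this lesson. The function name `choose_measures` is hypothetical, and whether the data counts as symmetric or has outliers still has to be judged from the plot.

```python
def choose_measures(symmetric, has_outliers):
    """Pick a measure of center and a measure of spread from the shape of the data."""
    if symmetric and not has_outliers:
        # Symmetric data with no outliers: the mean is a fair center.
        return ("mean", "absolute mean deviation")
    # Skewed data or data with outliers: use measures the extremes do not pull.
    return ("median", "interquartile range")


print(choose_measures(symmetric=False, has_outliers=True))
print(choose_measures(symmetric=True, has_outliers=False))
```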
What are the measures of variation? There are 19 values.
10 10 10 11 11 11 11 11 12 12 12 12 12 12 13 13 13 14 19
Median: 12
Lower Quartile: 11
Upper Quartile: 13
IQ Range: 2
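These answers can be checked with a short script. It is a minimal sketch, assuming the quartiles are found as the medians of the lower and upper halves, leaving the middle value out when the count is odd. The 1.5 × IQR fence used to flag the outlier is a common convention and is not stated in this lesson. The helper name `quartiles` is made up for illustration.

```python
from statistics import median


def quartiles(values):
    """Return (lower quartile, median, upper quartile) using medians of the halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # With an odd count, the middle value belongs to neither half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)


values = [10, 10, 10, 11, 11, 11, 11, 11, 12, 12,
          12, 12, 12, 12, 13, 13, 13, 14, 19]
q1, med, q3 = quartiles(values)
iqr = q3 - q1

# Assumed convention: a value is an outlier if it lies more than
# 1.5 * IQR below the lower quartile or above the upper quartile.
outliers = [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

print(q1, med, q3, iqr, outliers)
```

With these 19 values the script gives a lower quartile of 11, a median of 12, an upper quartile of 13, and an IQ range of 2, and it flags 19 as the outlier.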
How many people
responded to this
survey?
Clusters: 0-49 minutes
Peak: 20-29
Gap: 50-89
Outlier: 90-99
The data is NOT symmetrical. The outlier is 20. Notice the way the outlier is indicated on
this box plot. You would use the median and IQ Range as measures to represent this data.
a) The data is symmetrical. There are no gaps or outliers. There is a peak at 4. This
distribution would indicate that the mean and absolute mean deviation would
be the best measures to represent this data.
We will find both measures.
The mean and absolute mean deviation would be the
appropriate measures to describe the
center and spread of the distribution.
$$\text{Mean} = \frac{\text{sum}}{17} = \frac{1 + 4 + 9 + 20 + 15 + 12 + 7}{17} = \frac{68}{17} = 4$$

$$\text{Absolute mean deviation} = \frac{3 + 4 + 3 + 0 + 3 + 4 + 3}{17} = \frac{20}{17} \approx 1.2$$
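The same arithmetic can be checked in code. The dot plot itself is not in this transcript, so the counts below are an assumption: they are worked back from the sums shown (1 + 4 + 9 + 20 + 15 + 12 + 7 = 68, over 17 values), which fit the values 1 to 7 occurring 1, 2, 3, 5, 3, 2, and 1 times.

```python
# Assumed counts for each value, reconstructed from the sums in the lesson.
counts = {1: 1, 2: 2, 3: 3, 4: 5, 5: 3, 6: 2, 7: 1}

# Expand the counts into the full list of 17 values.
values = [value for value, count in counts.items() for _ in range(count)]

# Mean: add up every value and divide by how many there are.
mean = sum(values) / len(values)

# Absolute mean deviation: the average distance of the values from the mean.
total_distance = sum(abs(v - mean) for v in values)
mean_deviation = total_distance / len(values)

print(len(values), sum(values), mean)
print(total_distance, round(mean_deviation, 1))
```

Under that assumption the script gives 17 values with a sum of 68 and a mean of 4. The distances add up to 20, so the absolute mean deviation is 20 ÷ 17, which rounds to 1.2.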
What have we done?
• We have described the shape of data.
• Based on the shape of the data, we have chosen the appropriate
measures of center and spread.