June 17, 2013
Dear Student:
Welcome to Advanced Placement Statistics! I commend you on your decision to enroll in this college level
class.
Since this is an Advanced Placement course, pace of instruction and expectations are modeled after
those found at the collegiate level. Consequently, to ensure adequate time to cover the required
curriculum and to prepare for the necessary exams, your first assignment for the upcoming school year is
attached. It is designed to cover the material from Chapters P (What is Statistics?) and 1 (Exploring Data).
Included in this packet are several assignments that you will need to complete. This assignment should take
the average student several hours to complete as there are some topics which you may need to research and
review in order to complete them. It is my recommendation that you do not wait until the last part of the
summer to complete this assignment, but rather work on it regularly throughout the vacation. The
assignment must be completed by and submitted on the first day of class. It will be graded based
upon completeness and accuracy of solutions. Therefore, it is expected that you will show the work needed to
get the indicated answers. This review assignment will count toward a significant portion of your first
quarter grade. Moreover, you should expect to be assessed on all the material located in the assignment
within the first few days of school.
Please feel free to email me with any questions or concerns during the week prior to the beginning of school. I
will get back to you ASAP. I look forward to working with you during the upcoming school year. I hope you
are prepared for a rewarding, yet challenging, year. Enjoy the summer!
Respectfully,
Kelly McCabe
AP Statistics Instructor
[email protected]
AP STATISTICS SUMMER ASSIGNMENT
(2013-2014)
In order to be ready to hit the ground running in September, you have some preliminary work
(Chapters P & 1 in the textbook) to complete on your own. Please take these assignments seriously, as
we will use the material presented in the first two chapters throughout the entire course. Though you
may email me with questions ([email protected]), what you turn in is to be YOUR OWN
WORK, and any indication of identical commentary or presentation will be treated as a cheating
incident. THIS ASSIGNMENT WILL BE DUE ON THE FIRST DAY OF CLASS. A TEN POINT
PENALTY WILL BE DEDUCTED FROM EACH ASSIGNMENT FOR EVERY DAY IT IS PAST
DUE! Some general helpful hints:
1) Please read the textbook carefully. This course is based on vocabulary and understanding of concepts (not
just calculations) presented in the text, so it should be read and reread to ensure maximum comprehension.
If you only skim or skip reading altogether, you will probably miss some detail needed to correctly answer
a question. Pay close attention to the examples that are worked out in each section.
2) The textbook is not the only resource you have. Search on-line for sites that help you understand the
material best. A good one to get you started is www.stattrek.com.
3) Graphs (any visual display of data) SHOULD BE DONE ON GRAPH PAPER. You can find graph
paper on-line at http://incompetech.com/graphpaper/plain/ or at
http://mathbits.com/mathbits/studentresources/graphpaper/graphpaper.htm.
4) This is as much a writing course as it is a math course. Explaining in complete thoughts (sentences) is
required on this assignment and throughout the course. Often, questions will require you to comment on what
your graph tells you (so again, write clearly and in complete sentences when applicable). A mnemonic device
that might help you to remember the 4 major areas that need to be addressed when asked to describe your data
is “SOCS” (see last written page of summer assignment for specifics). In fact, forgetting any one can be a
critical omission! Also, DON’T JUST SPOUT numbers, USE NUMBERS IN CONTEXT (what they mean to
that particular problem using appropriate units like feet or $, for example).
5) You may want to make a copy of your summer assignment (for yourself). We will be going over the
assignment during the first week of class, but you will not have access to your own work the first few days of
class.
6) You will need to have a TI-83 or TI-84 (or equivalent) the first week of class. (It will also be useful
when completing the summer assignment).
Remember, don’t start the assignment too late, or you will feel the effects
in grade form and in understanding the necessary concepts for the very
first week of AP Statistics!!! Enjoy your summer!
PART 1: VOCABULARY
Read Chapter P: What Is Statistics? and Chapter 1: Exploring Data
You need to know the following vocabulary. On a separate sheet of paper, write the meaning of each
word.
Statistics
Population
Sample
Surveys
Census
Observational study
Experiment
Data analysis
Individuals
Variables
Categorical variable
Quantitative variable
Distribution
Bar graph
Side-by-side bar graph
Dotplot
Probability
Statistical inference
Roundoff error
Pie chart
Stemplots
Stem
Leaf
Back-to-back stemplot
Splitting stems
Trimming
Histogram
Frequency
Frequency table
Overall pattern
Deviations
Shape
Center
Spread
Outlier
Mode
Unimodal
Symmetric
Skewed
Skewed right
Skewed left
Ogive
Time plot
Seasonal variation
Mean
Median
Resistant measure
Range
Pth percentile
Quartiles
First quartile
Third quartile
Five-number summary
Boxplot
Interquartile range
1.5IQR Rule for Outliers
Modified boxplot
Variance
Standard deviation
Degrees of freedom
Linear transformation
PART 2: DATA TYPES
There are two types of data: qualitative (or categorical) and quantitative.
Qualitative variables or categorical variables are variables that categorize individuals (place them in
groups). These variables may take on values that are labels for categories. Examples are eye color (blue,
hazel, etc.), gender (male or female), method of transportation to school (bike, car, bus, etc.), class rank
(senior, junior, etc.). A specific type of qualitative variable is a binary variable. A binary variable is a
qualitative variable that has only two outcomes. Examples include gender, approve or disapprove of the
president’s handling of the war in Iraq, outcome of a coin toss, outcome of a die roll (when restricted to a
four or not a four), the response to the question “Do you play basketball?”
Quantitative variables are numerical variables that represent an amount or quantity. There are two kinds
of these: discrete and continuous.
Discrete variables are quantitative variables that assume only a countable number of values. Examples of
these include shoe size (…, 6, 6 ½ , 7, 7 ½ , …), score on a test, class size, number of cans collected for a
food drive.
Continuous variables are quantitative variables that can assume an infinite number of values. In the case
of continuous variables, the values can generally assume any decimal quantity within a small range of
values (even though we may round the answer like when we measure our height). These are typically
values that result from some kind of measurement. The units of measurement are pounds/ inches/ Kelvin/
degrees/ feet/ etc. Examples are height, weight, surface area of oranges, ERA in baseball (3.23, 2.78, etc.),
GPA.
Just because your variable’s values are numbers, don’t assume that it’s quantitative. For example 9, 10,
11, and 12 are labels for different class rankings at BCHS. Class rank is a qualitative variable (even
though it may be answered with a 9, 10, 11, or 12). Social security number is another example of a
numerical output that is not a quantitative variable. SSN doesn’t stand for any type of numerical quantity
(you are not the 412,327,642nd person born in the US!). Phone number is not a quantitative variable either.
The 901 area code is a designation for a geographic region; it is not a numerical quantity.
Data types at a glance:
qualitative: binary, or more than 2 categories
quantitative: discrete or continuous
Answer the following questions and then decide if the data is qualitative or quantitative.
Then decide if it is also binary, discrete, or continuous.
Question | Answer | Type
1. In which year did you take Algebra I? | ________ | _______________________
2. How many CDs do you own? | ________ | _______________________
3. What is your zip code? | ________ | _______________________
4. Choose a random integer from 1 to 20. | ________ | _______________________
5. How many siblings do you have? | ________ | _______________________
6. Do you like chocolate? | ________ | _______________________
7. Who is your favorite musician? | ________ | _______________________
8. What gender are you? | ________ | _______________________
9. How tall are you (in inches)? | ________ | _______________________
10. Where did you eat your last meal? (1=home, 2=restaurant, 3=other) | ________ | _______________________
11. How many AP classes will you be taking this year? | ________ | _______________________
12. How far away from school do you live? | ________ | _______________________
13. How many miles per gallon does your vehicle get while driving in the city? | ________ | _______________________
14. What grade did you earn in your last math class: A, B, C, D, or F? | ________ | _______________________
15. What time is it? | ________ | _______________________
16. How fast can you run “the 40”? | ________ | _______________________
PART 3: DATA AND LISTS
Quantitative data can be stored on the TI-83 in lists. There are several ways to create a list. From the home
screen braces can be used to define a data set, which then can be stored in one of the list names L1
through L6 (Figure 1.1). Alternately, use STAT 1: Edit to go to the list editor and enter the data into
columns (Figure 1.2).
In either case, new lists can be created from existing lists, such as L2+5 (Figure 1.3). Make sure when you
enter the new list that you are on the L3 icon and not within the list of numbers.
On the TI-83, lists may also be given their own names and will be retained in memory
until deleted. This is particularly useful for data that will be used repeatedly.
Example 1: Create a named list for the following set of running speeds in mph for various
animals:
Cheetah 70
Warthog 30
Lion 50
Cat 30
Coyote 43
Man 27.89
Hyena 40
Pig 11
Greyhound 39.35
Tortoise 0.17
Rabbit 35
Snail 0.03
Source: 1996 Information Please Almanac.
Solution:
To create a named list, go to the list editor and move to the right past L6. A “Name” prompt will appear
and the list name can be typed (Figure 1.6). The values can be entered in the usual way.
Example 2:
Create a new list showing the speeds in feet per second.
Solution:
New lists can be created from named lists on either the home screen or in the list editor. On the home
screen, the speeds in mph can all be converted to ft/sec and stored in a list named FTSEC by a single
command (Figure 1.10). In order for the TI-83 to distinguish a user defined list name from other symbols
it is necessary to preface a list name with the special small-L character (ʟ) that is located in the LIST OPS sub-menu. The ʟ
character may also be found in the CATALOG.
To organize named lists in the list editor, use STAT 5:SetUpEditor followed by the names in the order they
are to appear (Figure 1.11). The lists will appear in columns as requested.
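For readers who want to check the conversion away from the calculator, here is a minimal Python sketch of the same idea: one operation applied to every value in the list. The names speeds_mph, speeds_ftsec, and mph_to_ftsec are made up for this illustration; only the speeds come from Example 1. It assumes 1 mile = 5280 feet and 1 hour = 3600 seconds.

```python
# Speeds in mph from Example 1 (source: 1996 Information Please Almanac).
speeds_mph = {
    "Cheetah": 70, "Lion": 50, "Coyote": 43, "Hyena": 40,
    "Greyhound": 39.35, "Rabbit": 35, "Warthog": 30, "Cat": 30,
    "Man": 27.89, "Pig": 11, "Tortoise": 0.17, "Snail": 0.03,
}

FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600


def mph_to_ftsec(mph):
    """Convert a speed from miles per hour to feet per second."""
    return mph * FEET_PER_MILE / SECONDS_PER_HOUR


# Apply the conversion to every entry at once, the way a single
# calculator command transforms a whole list.
speeds_ftsec = {
    animal: round(mph_to_ftsec(mph), 2) for animal, mph in speeds_mph.items()
}

for animal, ftsec in speeds_ftsec.items():
    print(f"{animal}: {ftsec} ft/sec")
```

A quick sanity check: 30 mph works out to exactly 44 ft/sec, so the Warthog and Cat entries should both read 44.0.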
Exercises –
17. Create a list L1 using {4, 7, 9, 11, 14, 17, 20}.
Create new lists
A. L1 – 7: ___________________________________________
B. 2* L1: ____________________________________________
C. L1^2: ____________________________________________
D. ln(L1): ___________________________________________
PART 4: NUMERICAL DESCRIPTIONS OF QUANTITATIVE DATA
There are two categories of numbers that are used to describe a set of data: measures of center and
measures of spread.
Measures of Center:
1. The mean is the average number. It is the sum of all the data values divided by the number (n) of
values.
2. The median is the value that separates the bottom 50% of data from the top 50% of data. It is the
middle element of an ordered set of data that is odd in number. It is the average of the two middle
elements of an ordered set of data that is even in number.
3. The mode is the value that occurs most often in a set of data. If every value occurs with the
same frequency, then there is no mode. If two (or more) values tie for occurring the most, then each of
them is a mode.
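The three measures of center can be sketched in a few lines of Python. This is only an illustration under stated assumptions: both data sets are hypothetical, and the helper modes is a made-up name written to follow the "no mode when everything ties" rule given above (Python's built-in mode functions do not follow that rule).

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """Return the most frequent value(s); an empty list if every value ties."""
    counts = Counter(values)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []  # every value occurs equally often: no mode
    return sorted(v for v, c in counts.items() if c == top)


even_set = [2, 3, 3, 5, 7, 10]  # hypothetical data, n = 6
odd_set = [2, 3, 3, 5, 7]       # hypothetical data, n = 5

print(mean(even_set))           # sum 30 divided by n = 6
print(median(even_set))         # average of the two middle values, 3 and 5
print(median(odd_set))          # the single middle value
print(modes(even_set))          # [3]
print(modes([1, 2, 3]))         # [] because every value occurs once
print(modes([1, 1, 2, 2, 3]))   # [1, 2] because two values tie for most frequent
```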
Measures of Spread:
1. The range is a measure of the spread of the entire data. It is calculated by subtracting
the minimum value from the maximum value.
Ex. {4, 36, 10, 22, 9} in order is {4, 9, 10, 22, 36}, so range = 36 – 4 = 32
2. The interquartile range (IQR) is a measure of the spread of the middle 50% of the data. It is
calculated by subtracting the 25th percentile (Q1) from the 75th percentile (Q3). Q1 is the median of the
lower half of the data. It separates the bottom 25% of values from the top 75% of values. Q3 is the median
of the upper half of the data. It separates the top 25% of values from the bottom 75% of values. In neither
of these cases is the median considered in the top half or the bottom half of the data.
3. The standard deviation is the measure of spread around the mean. It is calculated using the following
formula, which you will be expected to be able to use on the AP exam:
$$ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} $$
For the data in the range example above, {4, 9, 10, 22, 36}, the mean is 16.2 and s ≈ 12.89.
This means that a typical value differs from the mean by about 12.89 units. The smaller the standard
deviation, the closer the data should be clustered around the mean.
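The three measures of spread can be checked with a short Python sketch on the data set from the range example, {4, 36, 10, 22, 9}. The helper quartiles is a made-up name; it assumes the convention stated above, where Q1 and Q3 are the medians of the lower and upper halves and the overall median belongs to neither half. It needs at least two data values.

```python
from statistics import median, stdev


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    When n is odd, the overall median is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median position
    upper = ordered[-half:]   # values above the median position
    return median(lower), median(upper)


data = [4, 36, 10, 22, 9]

data_range = max(data) - min(data)
q1, q3 = quartiles(data)
iqr = q3 - q1
s = stdev(data)  # sample standard deviation (divides by n - 1)

print(data_range)    # 32
print(q1, q3, iqr)   # 6.5 29.0 22.5
print(round(s, 2))   # 12.89
```

The last line reproduces the 12.89 quoted in the text, which suggests the original worked example used this same data set.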
To see statistical results including the quartiles and standard deviation, use STAT CALC 1:1-Var Stats
(Figure 3.1), followed by the list name (Figure 3.2).
If you push the down arrow key, then you can see the rest of the statistics (Figure 3.4).
Exercises
Here is a list of parents’ ages at the time their sons were born
Dad: 41, 27, 34, 27, 25, 34, 23, 34, 34, 31, 27, 35, 30, 26, 33, 28, 26, 32, 32, 32, 43, 35, 25, 27, 34, 33
Mom: 39, 24, 34, 26, 24, 35, 23, 33, 26, 30, 24, 31, 28, 23, 33, 24, 23, 32, 32, 23, 38, 30, 23, 24, 35, 29
Enter these as two lists in your calculator and use the 1-Var Stat option to calculate the following:
18. Find the mean and median for the Dad data:
Mean____________ Median____________ Which is larger?________________
19. Find the mean and median for the Mom data:
Mean____________ Median____________ Which is larger?________________
20. Now compare the two means you calculated. Which is larger?_____________
Is this what you expected?_______. Explain why or why not.
________________________________________________________________________
________________________________________________________________________
21. Calculate the standard deviations for both sets of data: Dad________ Mom________
Why might these values be different? Explain.
________________________________________________________________________
________________________________________________________________________
22. Find Q1 and Q3 and the IQR for the Dad data. Q1_____ Q3_____ IQR________
Find Q1 and Q3 and the IQR for the Mom data. Q1_____ Q3_____ IQR________
23. A company has two machines that fill cans of soft drinks. Samples from each machine show the
following number of ounces per can:
Machine A:
11.1, 12.0, 11.4, 12.1, 11.7, 11.5, 12.2, 11.4, 11.3, 11.9
Machine B:
10.9, 12.4, 12.7, 11.8, 12.3, 11.9, 12.0, 12.5, 12.7, 11.6
Find the mean and standard deviation for both machines.
Machine A: x̄ =_________ s =___________
Machine B: x̄ =_________ s =___________
24. Using your answer to #23, explain which machine is “better” at filling soft drink cans.
________________________________________________________________________
________________________________________________________________________
PART 5: ASSESSING THE SHAPE OF A GRAPH
When describing a set of data we look at the following features:
Shape
Outliers
Center
Spread
OR SOCS!!!!
We have several terms that we use to describe the shape but this packet will concentrate
on only two: symmetric and skewed.
One can tell if a graph is symmetric if a vertical line in the “center” divides the graph into two fairly
congruent shapes. The following sets of data can be described as symmetric
The mean and the median are approximately the same in a symmetric set of data.
One can tell if a graph is skewed if the graph has a big clump of data on either the left (skewed right) or
the right (skewed left) with a tendency to get flatter and flatter as the values of the data increase (skewed
right) or decrease (skewed left). A common misconception is that the “skewness” occurs at the big clump.
The following sets of data can be described as skewed:
Skewed right:
The mean is typically larger than the median in a skewed right set of data. The mean is pulled further
along the “tail”.
Skewed left:
The mean is typically smaller than the median in a skewed left set of data. The mean is pulled further
along the “tail”.
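A small Python sketch can make the mean-versus-median pattern concrete. Both data sets are hypothetical: skewed_right has a clump of small values and a long tail of large ones, and skewed_left is its mirror image.

```python
from statistics import mean, median

# Hypothetical data: clump on the left, long tail to the right.
skewed_right = [1, 2, 2, 3, 3, 3, 4, 5, 9, 18]

# Mirror image of the same data: clump on the right, tail to the left.
skewed_left = [20 - x for x in skewed_right]

for name, data in [("skewed right", skewed_right), ("skewed left", skewed_left)]:
    print(f"{name}: mean = {mean(data)}, median = {median(data)}")
```

In the first set the two large values (9 and 18) drag the mean up to 5 while the median stays at 3. In the mirrored set the mean drops to 15 while the median stays at 17.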
Exercises
25. For the following graphs, find the shape, center (just do the median), and spread (find only the range).
If there are any other notable features evident in the graph (clusters, gaps, or outliers), then say where they
are. Otherwise do not comment on clusters, gaps or outliers.
(Note: To find the center of these graphs, use the frequencies found on the y-axis. Count how many are in
each bar. Add these up and divide by two. This tells you where the median is located. Which bar is this
value in? That's the median. For graph A, n = 21, so the middle position is 10.5, which means the median
is the 11th value. Starting with the first bar, count 1 + 2 + 4 + 3 + 6... The first four bars hold only 10
values, so the median is in the fifth bar, the one that pushes the count past 10.5.
That bar is 30. So, the median is 30.)
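The counting procedure in the note can be sketched in Python. The graphs are not reproduced here, so the data below is an assumption: the first five bar counts (1, 2, 4, 3, 6), n = 21, and the median of 30 come from the note, while the bar labels and the last two counts are hypothetical values chosen to be consistent with it. The function name median_bar is also made up for this illustration.

```python
from itertools import accumulate


def median_bar(bar_values, counts):
    """Return the value of the bar that contains the median observation."""
    n = sum(counts)
    middle = (n + 1) / 2  # position of the median in the ordered data
    for value, running_total in zip(bar_values, accumulate(counts)):
        if running_total >= middle:
            return value


# Hypothetical histogram consistent with the note for graph A.
bar_values = [10, 15, 20, 25, 30, 35, 40]
counts = [1, 2, 4, 3, 6, 3, 2]  # sums to n = 21

print(sum(counts))                     # 21
print(median_bar(bar_values, counts))  # 30
```

When n is even and the two middle values sit in different bars, this sketch returns the upper of the two bars, so read the graph directly in that case.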
A. Shape_________________
Center________________
Spread________________
Clusters?______________
B. Shape_________________
Center________________
Spread________________
Clusters?______________
C. Shape_________________
Center________________
Spread________________
Clusters?______________
D. Shape_________________
Center________________
Spread________________
Clusters?______________
E. Shape_________________
Center________________
Spread________________
Clusters?______________
F. Shape_________________
Center________________
Spread________________
Clusters?______________
G. Shape_________________
Center________________
Spread________________
Clusters?______________
H. Shape_________________
Center________________
Spread________________
Clusters?______________
I. Shape_________________
Center________________
Spread________________
Clusters?______________
SUBTLE LESSONS from CHAPTER 1 --“SOCS”
1) SHAPE
a) can be symmetric, skewed left, or skewed right (or bimodal)
b) remember to remove outliers before commenting on the shape, as outliers
should not be the sole reason for a skew (for example, it’s better to say “fairly
symmetric (without the high outlier)” than “skewed right (because of high
outlier)”)
c) don’t just state skews; tell what it means in terms of your data in the context of
the variable you’re measuring
2) OUTLIERS
a) math MUST be shown even if there are no outliers (it’s the only way to judge
whether you ever formally checked!) (Remember: Q3+1.5IQR and Q1-1.5IQR!)
b) always use modified box plots (showing outliers) over regular boxplots
(because outliers are shown!)
3) CENTER
a) address the center of your data early and specifically in your analysis (graphs
and number summaries don’t speak for themselves!)
b) don’t just state your mean/median; tell what it says about the central
tendencies of your data (in context!)
c) when the data is skewed, don’t use the mean (or standard deviation); the
median is the better measure of center
4) SPREAD
a) don’t just state your 5-number summary (or how they were calculated), but use
these numbers to discuss what it means about your data IN THE CONTEXT
OF THE PROBLEM ANALYSIS (for example, “an IQR of $2 shows that the
middle 50% of the data are relatively compact”)
b) include statements when there are notably different spreads for different
quartile ranges, not just the min/max, range, or IQR (for example, “my data
is increasingly spread as the number of feet increases” is better than just “the
spread is 10 feet”)
c) Q1 and Q3 are numbers, not ranges, so make this distinction in your
discussion (for example, say “between the min and Q1”, “between Q1 and the
median”, etc.)
d) for relatively symmetric data, standard deviation can be used but always with
the mean; for relatively skewed data, it’s better to use the 5 # summary with
the median
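The outlier check from item 2 (Q3+1.5IQR and Q1-1.5IQR) can be sketched in Python as a way to double-check hand work. The data set prices is hypothetical, and the helper names are made up for this illustration. Quartiles are taken as the medians of the lower and upper halves, matching the definition used in Part 4, and the sketch needs at least two data values.

```python
from statistics import median


def outlier_fences(values):
    """Return (lower fence, upper fence) from the 1.5*IQR rule."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])    # median of the lower half
    q3 = median(ordered[-half:])   # median of the upper half
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values):
    """List every value that falls outside the two fences."""
    low, high = outlier_fences(values)
    return [v for v in values if v < low or v > high]


prices = [3, 4, 4, 5, 5, 6, 6, 7, 15]  # hypothetical prices in dollars

print(outlier_fences(prices))  # (0.25, 10.25)
print(find_outliers(prices))   # [15]
```

Here Q1 = $4 and Q3 = $6.50, so the IQR is $2.50 and the fences sit at $0.25 and $10.25. The $15 price is the only value beyond a fence, which is the math the assignment asks you to show even when no outliers turn up.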
PART 6: MORE WORK WITH EXPLORING DATA
Complete the following examples from the book on a separate sheet of paper. Again,
all work must be included to receive full credit. Solutions for the odd numbered
problems appear in the back of the textbook. You should check your answers. These
problems may take longer than you think, so please do not wait until the last minute
to begin this assignment.
Chapter 1: 1.5, 1.6, 1.7, 1.8, 1.11, 1.12, 1.13, 1.14, 1.23, 1.27, 1.31, 1.32, 1.33, 1.37, 1.38,
1.39, 1.40, 1.41, 1.45, 1.50, 1.53, 1.58, 1.63