Download 1 Dubie

Statistical Analysis



State that error bars are a graphical representation
of the variability of data.
To answer an IB question involving 1.1.1,
simply state that error bars are a graphical
representation of the variability of data.
The variability of data refers to how close to or far
from the mean most data values are. A
high standard deviation indicates high
variability of data, and a low standard
deviation indicates low variability of data.
1.1.1
The following is an example of
error bars.
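As a rough sketch, assuming each error bar spans the mean plus or minus 1 sample standard deviation (error bars can also show standard error or confidence intervals), the following Python computes where each bar would start and end. The helper name and the plant-height data are hypothetical, invented for illustration.

```python
import statistics

def error_bar_limits(values):
    """Return (mean, lower, upper) for an error bar of +/- 1 sample standard deviation."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return mean, mean - s, mean + s

# Hypothetical plant heights (cm), invented for illustration only
group_a = [12.0, 14.0, 13.0, 15.0, 16.0]
group_b = [10.0, 18.0, 12.0, 20.0, 15.0]

for name, data in (("A", group_a), ("B", group_b)):
    mean, low, high = error_bar_limits(data)
    print(f"Group {name}: mean = {mean:.2f}, bar from {low:.2f} to {high:.2f}")
```

Group B's values are more variable, so its error bar is longer than group A's.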
Key vocabulary list
1. Error bars
2. Variability
3. Data
4. Graphical
Calculate the mean and standard deviation of a set of values.






(Students should specify the standard deviation (s), not the population
standard deviation. Students will not be expected to know the formulas
for calculating these statistics. They will be expected to use the standard
deviation function of a graphic display or scientific calculator.
Aim 7: Students could also be taught how to calculate standard deviation
using a spreadsheet computer program.)
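The mean is the average data value: the sum of the values divided by
the number of values.
The sample standard deviation, noted as “s”, measures how far data
values typically lie from the mean when the data are a sample rather
than a whole population. It is calculated with n - 1 (one less than
the sample size) in the denominator.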
A statistic is a characteristic or measure obtained by using the data
values from a sample, as opposed to a parameter, which is a
characteristic or measure obtained by using all the data values
from a specific population.
A set of values consists of data collected from multiple subjects and
is either a sample or a population.
1.1.2
The following is an example of how
to calculate the mean.
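A minimal worked sketch of the mean in Python: add the values and divide by how many there are. The function name and the leaf-length data are hypothetical.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many values there are."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

# Hypothetical leaf lengths (mm), invented for illustration
leaf_lengths = [52, 48, 55, 50, 45]
print(mean(leaf_lengths))  # (52 + 48 + 55 + 50 + 45) / 5 = 250 / 5 = 50.0
```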
1.1.2
The following is an example of how to calculate the sample
standard deviation, which is used when the data are a sample
rather than a whole population. The sample size (n) is the
number of data values in the data set whose standard
deviation is being calculated.
The formula for the sample standard deviation is
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
where x is each data value and $\bar{x}$ is the sample mean.
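The sample standard deviation can be followed step by step in code: find the mean, square each deviation from it, add the squares, divide by n - 1, and take the square root. This is a minimal Python sketch; the function name and data are hypothetical, and Python's standard `statistics.stdev` gives the same result.

```python
import math

def sample_standard_deviation(values):
    """s = sqrt( sum((x - x_bar)^2) / (n - 1) ), the sample standard deviation."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    x_bar = sum(values) / n                                # sample mean
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))    # divide by n - 1, not n

# Hypothetical leaf lengths (mm): mean 50, squared deviations sum to 58
leaf_lengths = [52, 48, 55, 50, 45]
print(round(sample_standard_deviation(leaf_lengths), 3))  # sqrt(58 / 4) = sqrt(14.5)
```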

To find the sample standard deviation (s) and the sample
mean on a TI calculator, press the “STAT” button, select
“1:Edit” under the “EDIT” menu, and input the data into L1
(list 1), pressing “ENTER” after each data value. After all of
the data values have been typed in, press “STAT” again,
press the right arrow button to go to the “CALC” menu, and
select “1:1-Var Stats” by pressing “ENTER”, which takes you
to the home screen. Then press “ENTER” one last time, and
the statistics will appear on the home screen.
The sample mean is represented by x̄ (an x with a line over
it), and the sample standard deviation is represented by the
symbol Sx. Find the two values that correspond to those
symbols on the home screen and you will have found the
sample mean and the sample standard deviation.
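For checking calculator output, Python's standard `statistics` module gives the same quantities. The helper below is hypothetical (it is not a TI API) and mirrors a few of the 1-Var Stats outputs. The calculator also lists σx, the population standard deviation, which divides by n; the IB guidance asks for Sx. The data are invented.

```python
import statistics

def one_var_stats(values):
    """Hypothetical stand-in for a few TI 1-Var Stats outputs."""
    return {
        "x_bar": statistics.mean(values),      # sample mean (x̄)
        "Sx": statistics.stdev(values),        # sample standard deviation, divides by n - 1
        "sigma_x": statistics.pstdev(values),  # population standard deviation (σx), divides by n
        "n": len(values),
    }

results = one_var_stats([52, 48, 55, 50, 45])
for key, value in results.items():
    print(key, value)
```

Sx is always slightly larger than σx for the same data, which is a quick way to tell them apart on the screen.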
Key vocabulary list
1. Mean
2. Standard deviation
3. Values
4. Sample mean
5. Sample standard deviation
6. Population mean
7. Statistic
8. Sample size
State that the term standard deviation is used to summarize
the spread of the values around the mean and that 68% of the
values fall within one standard deviation of the mean.


(For normally distributed data, about 68% of all
values lie within ±1 standard deviation (s or σ) of
the mean. This rises to about 95% for ±2 standard
deviations.)
To answer an IB question involving this, simply
write that the term standard deviation is used
to summarize the spread of the values around
the mean and that 68% of the values fall within
one standard deviation of the mean.





1.1.3 refers to the empirical rule, which states that for normally
distributed data about 68% of all data values lie within 1 standard
deviation of the mean, about 95% lie within 2 standard deviations of the
mean, and about 99.7% lie within 3 standard deviations of the mean.
In normally distributed data the mean, median, and mode are practically
the same, and the distribution is unimodal and symmetric (bell-shaped).
The empirical rule does not apply to data that are not normally
distributed. For such data, finding how many values lie within ±1, ±2,
and ±3 standard deviations of the mean requires other calculation
methods. Those methods will not be discussed because IBO will not ask
you to do anything that involves them.
A standard normal distribution has a mean of 0 and a standard deviation
of 1.
The spread of values about the mean refers to how far, numerically, the
values in a set of data typically differ from the mean.
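The percentages in the empirical rule can be checked with Python's `statistics.NormalDist` (available from Python 3.8). This sketch uses a hypothetical helper, `share_within`, to compute the proportion of a normal distribution lying within k standard deviations of its mean.

```python
from statistics import NormalDist

def share_within(k, mu=0.0, sigma=1.0):
    """Proportion of a normal distribution within k standard deviations of its mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# Standard normal distribution: mean 0, standard deviation 1
for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.1%}")

# The proportion does not depend on the mean or SD (hypothetical mean 50, SD 4)
print(f"within ±1 SD (mean 50, SD 4): {share_within(1, mu=50, sigma=4):.1%}")
```

It prints roughly 68.3%, 95.4%, and 99.7%, and the last line shows the same 68.3% for a normal distribution with a different mean and standard deviation.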
Key vocabulary list
1. Mean
2. Standard deviation
3. Spread
4. Normally distributed
5. Normal distribution curve (bell curve)
Explain how the standard deviation is useful for comparing
the means and the spread of data between two or more samples.

(A small standard deviation indicates that the data
is clustered closely around the mean value.
Conversely, a large standard deviation indicates a
wider spread around the mean.)



If one sample of data has a large standard
deviation and another sample of data has a
small standard deviation, then the sample with
the larger standard deviation is more variable
than the sample with the smaller standard
deviation.
The standard deviation measures the typical
spread of values about the mean.
The variance is the square of the standard
deviation.
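A short Python sketch comparing two invented samples that share a mean of 10 but differ in spread. It also shows that the variance is the standard deviation squared.

```python
import statistics

# Hypothetical samples with the same mean but different spread (invented data)
sample_1 = [9, 10, 10, 10, 11]
sample_2 = [4, 7, 10, 13, 16]

for name, data in (("sample_1", sample_1), ("sample_2", sample_2)):
    s = statistics.stdev(data)         # sample standard deviation
    var = statistics.variance(data)    # sample variance, equal to s squared
    print(f"{name}: mean = {statistics.mean(data)}, s = {s:.2f}, variance = {var:.2f}")
```

Both means are 10, so comparing the means alone hides the difference; the much larger s of sample_2 shows that its values are spread more widely around that mean.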
Key vocabulary list
1. Standard deviation
2. Spread
3. Sample
4. Clustered
5. Around
Deduce the significance of the difference between two sets of data
using calculated values for t and the appropriate tables.

(For a t-test to be applied, the data must have a normal
distribution and a sample size of at least 10. The t-test
can be used to compare two sets of data and measure the
amount of overlap. Students will not be expected to
calculate values of t. Only a two-tailed, unpaired t-test
is expected. Aim 7: While students are not expected to
calculate a value for the t-test, students could be shown
how to calculate such values using a spreadsheet
program or the graphic display calculator. TOK: The
scientific community defines an objective standard by
which claims about data can be made.)




If knowledge of degrees of freedom is needed to answer an IB Biology SL
or HL test question, all that one needs to know is that degrees of
freedom is abbreviated d.f. and that, when constructing a confidence
interval for a single mean, d.f. equals the sample size minus 1. (For an
unpaired t-test comparing two samples, d.f. equals the sum of the two
sample sizes minus 2.)
Two-tailed means that the test allows for a difference in either
direction: the variable may be greater than or less than the value
presumed in the null hypothesis, written H0. For a two-tailed test or
confidence interval, the level of significance, denoted by the Greek
letter alpha (α), is divided by 2 so that α/2 lies in each tail. For
one-tailed tests the level of significance is unchanged and lies
entirely in one tail.
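A minimal Python sketch of a two-tailed confidence interval for a mean, using d.f. = n - 1 as described above. The helper name and data are hypothetical. The critical value 2.262 is the standard t-table value for d.f. = 9 with α = 0.05 split as 0.025 in each tail; for other sample sizes or confidence levels it must be looked up again.

```python
import math
import statistics

def t_confidence_interval(values, t_critical):
    """Two-tailed confidence interval for a mean: x_bar ± t_critical * s / sqrt(n).

    t_critical comes from a t-table using d.f. = n - 1 and alpha / 2 in each tail.
    """
    n = len(values)
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)                 # sample standard deviation
    margin = t_critical * s / math.sqrt(n)       # half-width of the interval
    return x_bar - margin, x_bar + margin

# Hypothetical data: 10 values, so d.f. = 10 - 1 = 9
data = [4.1, 3.8, 4.5, 4.0, 4.2, 3.9, 4.4, 4.3, 3.7, 4.1]
low, high = t_confidence_interval(data, t_critical=2.262)  # alpha = 0.05, two-tailed
print(f"95% confidence interval for the mean: {low:.3f} to {high:.3f}")
```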
The sample size must be at least 10 and less than 30, and the
population standard deviation must be unknown, in order for a t-test
to be used. If those conditions are not met, a z-test must be used, but
IBO will not ask you to do a z-test.
To deduce the significance, calculations involving t-values need to be
done, which involves the use of a formula, and the resulting t-value is
compared with a value from a t-table. So any IB question involving 1.1.5
should mainly involve a formula and several calculations: it is
numerical rather than verbal.
A table of critical t-values may need to be used on an IB test to answer a question that involves 1.1.5.
The table gives t-values by degrees of freedom and by the probability that chance alone could
produce the difference; that probability is 1 minus the confidence level in decimal form (for example, 1 - 0.95 = 0.05).
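Although students are not expected to calculate t, a sketch of how an unpaired t-value could be computed and then compared with a table value may make the procedure clearer. This assumes Student's pooled-variance form of the two-tailed unpaired t-test (Welch's form, which does not assume equal variances, gives slightly different results). The function names and data are hypothetical; 2.101 is the standard two-tailed t-table value for d.f. = 18 at the 0.05 level.

```python
import math
import statistics

def unpaired_t(sample_a, sample_b):
    """Student's unpaired (equal-variance) t statistic and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    # Pooled variance: each sample's variance weighted by its own degrees of freedom
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (mean_a - mean_b) / standard_error
    return t, n_a + n_b - 2

def is_significant(t, t_critical):
    """Two-tailed decision: the difference is significant if |t| exceeds the table value."""
    return abs(t) > t_critical

# Hypothetical measurements from two groups of 10 (invented data)
group_a = [12, 14, 13, 15, 16, 14, 13, 15, 14, 14]
group_b = [16, 18, 17, 15, 19, 17, 16, 18, 17, 17]

t, df = unpaired_t(group_a, group_b)
print(f"t = {t:.3f}, d.f. = {df}, significant at 0.05: {is_significant(t, 2.101)}")
```

Here |t| is well above 2.101, so the difference between the two means would be judged significant at the 0.05 level; a |t| below the table value would mean the difference could plausibly be due to chance.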
Key vocabulary list
1. Confidence interval
2. Level of significance (alpha)
3. Degrees of freedom
4. T-distribution
5. Sample standard deviation
6. Sample mean
7. Probability
8. Population mean
9. Two-tailed
10. Unpaired (in reference to t-tests)
11. Normal distribution
Explain that the existence of a correlation does not establish
that there is a causal relationship between two variables.

(Aim 7: While calculations of such values are not
expected, students who want to use r and r² values
in their practical work could be shown how to
determine such values using a spreadsheet
program.)




When a mathematical correlation test is used, the values of r
range from -1 to 1. An r-value of 1 implies that there is a
completely positive correlation. An r-value of -1 implies that
there is a completely negative correlation. An r-value of 0
implies that there is no correlation.
If the r-values show that there is a correlation between the
two variables, an experiment needs to be performed in order
to know whether there is a causal relationship between the two
variables.
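A minimal Python sketch of the Pearson correlation coefficient r, using hypothetical data. It measures only how closely paired values follow a straight line, so even r = 1 says nothing about which variable, if either, causes the other.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data; r lies between -1 and 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # completely positive correlation
print(pearson_r(x, [10, 8, 6, 4, 2]))  # completely negative correlation
```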
A variable is a characteristic or attribute that can assume
different values.
In a question that involves 1.1.6, an example needs to be
mentioned to support the points made by the IB Biology SL
or HL student. On the next slide is an excellent example that
could be used.
Africanized honey bees
“The story of Africanized honey bees (AHBs) invading the USA includes an
interesting correlation. In 1990, a honey bee swarm was found outside a small
town in southern Texas. They were identified as AHBs. These bees were brought
from Africa to Brazil in the 1950s, in the hope of breeding a bee adapted to the
South American tropical climate. But by 1990, they had spread to the southern
US. Scientists predicted that AHBs would invade all the southern states of the US,
but this hasn’t happened. Look at Figure 1.5: the bees have remained in the
southwest states (area shaded in yellow) and have not travelled to the southeastern states. The edge of the areas shaded in yellow coincides with the point at which
there is an annual rainfall of 137.5cm (55 inches) spread evenly throughout the year. This
level of year-round wetness seems to be a barrier to the movement of the bees and
they do not move into such areas.”
The example shows that the existence of the presumed correlation did not prove that there was a
causal relationship between the two variables, which is the most important aspect of 1.1.6.
Key vocabulary list
1. Correlation
2. Causal relationship
3. Completely positive correlation
4. Completely negative correlation
5. Experiment
6. Africanized honey bees
7. Variable



IBO makes reference to the use of spreadsheet programs in the
topic 1 detailed syllabus. Good spreadsheet programs are
Microsoft Excel and Minitab. Minitab is a statistical program that
can be downloaded online, and there is a free 30-day trial for it. It
is much better than Microsoft Excel. If you ever need to use
Minitab as a spreadsheet program, go to
http://www.minitab.com/Downloads/
If you are unsure how to use Minitab, you can use its very
detailed help feature. One of the examples in this PowerPoint
presentation, the SHOW confidence interval data example for
1.1.5, was created using Minitab. By using the help feature for
Minitab you should be able to do anything involving a
spreadsheet that you need for IB Biology SL or HL. The help
feature for Microsoft Excel can also be used, but Minitab is much
better software than Microsoft Excel.
Microsoft Excel free trials can be downloaded at
http://us1.trymicrosoftoffice.com/default.aspx?WT.srch=1&WT.mc_id=78C4B07A-6906-484D-B4DD-47E2084740A6