STATISTICS FOR PSYCH
MATH REVIEW GUIDE
ORDER OF OPERATIONS
Although remembering the order of operations as BEDMAS may seem simple, it is definitely
worth reviewing in a new context such as statistics formulae.
Each letter in the word “BEDMAS” stands for a mathematical operation. When an expression
involves more than one mathematical operation, they must be done in the correct order to obtain
the right answer. Perform multiple operations in the following order:
B - Compute all expressions inside brackets first, e.g. (3 + 4) should be simplified to 7 before
performing any other operations. Note that although the expression (4)(3) uses
brackets, in this case they indicate multiplication and should not be treated as the brackets in
BEDMAS.
E - The second step is to simplify any exponents in the expression,
e.g. $5^3 = 125$.
D/M - The next step is to complete any division or multiplication in the question. Do these in
the order (left to right) in which they appear in the question.
A/S - The last step is to complete any addition or subtraction in the question. Do these in the
order (left to right) in which they appear in the question.
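The BEDMAS steps above can be traced one operation at a time. The sketch below uses a made-up expression (not from this guide) and relies on Python's arithmetic operators following the same order of operations.

```python
# Hypothetical expression chosen only to exercise every BEDMAS step.
expression = (3 + 4) * 5 ** 2 / 5 - 1

# The same calculation, simplified one BEDMAS step at a time.
after_brackets = 7 * 5 ** 2 / 5 - 1   # B: (3 + 4) becomes 7
after_exponents = 7 * 25 / 5 - 1      # E: 5 ** 2 becomes 25
after_div_mult = 35.0 - 1             # D/M, left to right: 7 * 25 = 175, then 175 / 5 = 35.0
after_add_sub = 34.0                  # A/S: 35.0 - 1

print(expression)  # 34.0
```

Every intermediate line evaluates to the same value, which is a quick way to check hand calculations.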
In statistics, common BEDMAS errors can lead to incorrect results. For example, it is
important to note that
$(\sum x)^2$ means add all the x values first and then square the final answer,
while
$\sum x^2$ means square each of the individual x values first and then add the squares.
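A short numerical check makes the difference between the two expressions concrete. The data values below are hypothetical and chosen only for illustration.

```python
# Hypothetical sample data.
x = [1, 2, 3]

square_of_sum = sum(x) ** 2                # (sum of x) squared: add first, then square
sum_of_squares = sum(v ** 2 for v in x)    # sum of x squared: square each value, then add

print(square_of_sum)   # 36
print(sum_of_squares)  # 14
```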
COMMON SYMBOLS USED IN STATISTICS
Greek letters are commonly used in mathematics to denote special values. Some of the most
common ones and their meanings are listed below.
∑
The Greek upper case letter “sigma” is used to indicate that values should be summed
or added.
$\sum_{i=1}^{4} x_i$
This is “sigma notation”. It is a short form commonly used by mathematicians to show
that the indicated values should be added together. The notation shown here means
$x_1 + x_2 + x_3 + x_4$.
$\sum_{i=1}^{5} x_i$
means the sum $x_1 + x_2 + x_3 + x_4 + x_5$.
µ
The Greek letter “mu” indicates the mean for an entire population of data.
σ
The Greek lower case letter “sigma” indicates the standard deviation for an entire
population of data.
σ²
Sigma squared represents the variance for an entire population of data.
λ
“Lambda”. Used in Poisson distributions, lambda is the average rate of events. The
mean of the distribution, µ, is equal to λt, where t is the length of the interval being considered.
Other symbols commonly used in statistics are:
x̄
“x bar”. The mean for a sample from a population of data.
s
The standard deviation for a sample from a population of data.
z
The “z score” is the number of standard deviations a specific data value is from the mean.
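A few of the symbols above can be tried out numerically. This is a minimal sketch with hypothetical numbers throughout. The z score formula used, z = (x - µ) / σ, is the standard definition and is assumed here, since this guide describes the z score only in words.

```python
# Sigma notation: the sum of x_i for i = 1 to 4 (hypothetical values x1..x4).
x = [4, 8, 6, 2]
total = sum(x[i - 1] for i in range(1, 5))  # x1 + x2 + x3 + x4
print(total)  # 20

# z score: number of standard deviations a value lies from the mean.
mu = 70       # hypothetical population mean
sigma = 10    # hypothetical population standard deviation
value = 85    # hypothetical data value
z = (value - mu) / sigma
print(z)  # 1.5

# Poisson mean: average rate (lambda) multiplied by the interval length t.
lam = 3       # hypothetical rate: 3 events per hour
t = 2         # hypothetical interval: 2 hours
poisson_mean = lam * t
print(poisson_mean)  # 6
```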
MEASURES OF CENTRAL TENDENCY
Measuring central tendency means that we are interested in finding the centre of the data (the
value around which most of the data are located). There are three common measures of central
tendency; the MEAN, the MEDIAN and the MODE. The calculation of each is described
below.
MEAN ($\bar{x}$) = (sum of all the data) / (the number of data)
Symbolically, $\bar{x} = \frac{\sum x}{n}$
MEDIAN
The median is the middle number of the
data when the numbers are written in order from lowest to highest. The steps to
find the median are as follows:
i) rewrite the data in order from lowest to highest
ii) If there is an odd number of data, the median is the middle number
iii) If there is an even number of data, the median is the mean of the two middle
numbers (add the two middle numbers together and divide by 2).
MODE
The mode is the number that occurs the most frequently in the list of data
(remember MOST and MODE both begin with the same two letters). If two different
numbers tie for the highest frequency, the data has two modes
and is referred to as “bimodal”. If none of the data occur more than once, the data
set has no mode.
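The three measures described above can be sketched in a few lines. The function names are illustrative choices, not part of this guide, and `modes` returns an empty list for the "no mode" case as one possible convention.

```python
from collections import Counter

def mean(data):
    """Sum of all the data divided by the number of data."""
    return sum(data) / len(data)

def median(data):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(data):
    """All values tied for the highest frequency; empty list if nothing repeats."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value occurs more than once, so there is no mode
    return sorted(value for value, count in counts.items() if count == highest)

data = [3, 4, 5, 4, 9]
print(mean(data))              # 5.0
print(median(data))            # 4
print(modes(data))             # [4]
print(median([1, 2, 3, 4]))    # 2.5
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (bimodal)
print(modes([1, 2, 3]))        # []      (no mode)
```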
FINDING THE WEIGHTED MEAN
The weighted mean is used when:
i) some data are more significant than others (for example, if test marks count for 30% of the
term mark while quizzes count for 20%)
or
ii) there are multiple data with the same value (for example, a class of 25 students with a
mean of 70% can be considered the same as 25 students each with a mark of 70%).
In case i, if a student scores 76% on a test worth 30% (0.3) of the term mark and 82% on a quiz
with quizzes worth 20% (0.2) of the term mark, her mean mark would be calculated as follows:
$\text{weighted mean} = \frac{(0.3 \times 76) + (0.2 \times 82)}{0.3 + 0.2} = \frac{39.2}{0.5} = 78.4$
where the denominator, 0.3 + 0.2, is the sum of the weightings.
In case ii, if class A has 20 students and a mean of 75% and class B has 30 students and a mean
of 80%, then the weighted mean of the two classes is:
$\text{weighted mean} = \frac{(20 \times 75) + (30 \times 80)}{20 + 30} = \frac{3900}{50} = 78$
where the denominator, 50, is the total number of students.
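Both cases use the same calculation: multiply each value by its weight, add the products, and divide by the sum of the weights. A minimal sketch follows (the function name `weighted_mean` is an illustrative choice).

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    weighted_total = sum(v * w for v, w in zip(values, weights))
    return weighted_total / sum(weights)

# Case i: a test mark of 76 weighted 0.3 and a quiz mark of 82 weighted 0.2.
term_mark = weighted_mean([76, 82], [0.3, 0.2])
print(round(term_mark, 1))  # 78.4 (rounded to hide floating-point noise)

# Case ii: class means weighted by the number of students in each class.
combined_mean = weighted_mean([75, 80], [20, 30])
print(combined_mean)  # 78.0
```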
QUARTILES, DECILES & PERCENTILES
Quartiles, Deciles, and Percentiles are used to measure the spread of data.
QUARTILES -
Divide the data into 4 equal parts. The dividing values are denoted by Q1, Q2,
and Q3. Quartiles are found in a similar way to finding the median (as a
matter of fact, the median is Q2). To find these values:
i) Rewrite the data in order from lowest to highest.
ii) Find the median (see MEDIAN under MEASURES OF CENTRAL TENDENCY). This is Q2.
iii) Find the median of the lower half of the data. This is Q1.
iv) Find the median of the upper half of the data. This is Q3.
DECILES -
Divide the data into 10 equal parts (D1, D2, ..., D9). Find deciles in a
similar way to quartiles. The third decile (D3) is the number $\frac{3}{10}$
of the way through the data (when written in order) and so on.
e.g. In a set of 150 numbers, find the position of D3 by multiplying
$150 \times \frac{3}{10} = 45$.
D3 is the 45th number when the numbers are ordered from lowest to
highest (or arranged in a cumulative frequency table).
PERCENTILES -
Divide the data into 100 equal parts (P1, P2, ..., P99), e.g. P65 is the number $\frac{65}{100}$
of the way through the data (when written in order) and so on.
NOTE: P50 = Q2 = D5 = median
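The quartile steps and the position calculation for deciles and percentiles can be sketched as follows. Two assumptions are made that this guide does not spell out: when the number of data is odd, the middle value is left out of both halves, and the data set has at least two values. The names `quartiles` and `position` are illustrative.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves steps."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n, skip the middle value when forming the upper half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)

def position(n, k, parts):
    """Position of the k-th dividing value when n ordered data are split into `parts`."""
    return n * k / parts

print(quartiles([1, 3, 4, 6, 7, 9, 10, 12]))  # (3.5, 6.5, 9.5)
print(position(150, 3, 10))                   # 45.0 -> D3 is the 45th number

# P50, D5 and Q2 all sit at the same position, as in the NOTE above.
print(position(150, 50, 100), position(150, 5, 10), position(150, 2, 4))  # 75.0 75.0 75.0
```

When the computed position is not a whole number, some rounding or interpolation rule is needed. This guide does not specify one, so the sketch leaves the position as computed.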
MEAN DEVIATION, STANDARD DEVIATION & VARIANCE
Both mean deviation and standard deviation are used to describe the spread of data about its
central location.
MEAN DEVIATION - A deviation is a difference. In this calculation, the deviation tells us
how far each of the original numbers is from the mean of the data.
Mean deviation tells us to take the mean of the deviations for all of the
data. To calculate mean deviation:
i) find the mean of the data
ii) subtract each number of the data from the mean. If the value is
negative, change it to positive (this is the “absolute value” of that
number).
iii) add all of the values (deviations) from step ii)
iv) divide this answer by n (the number of data)
Ex. For the data 3, 4, 5, 4, 9
i) the mean is $\frac{3 + 4 + 5 + 4 + 9}{5} = 5$
ii) deviations from the mean are
5 - 3 = 2
5 - 4 = 1
5 - 5 = 0
5 - 4 = 1
5 - 9 = -4 (change to 4)
iii) sum of the deviations = 2 + 1 + 0 + 1 + 4 = 8
iv) mean deviation = $\frac{8}{n} = \frac{8}{5} = 1.6$
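The four steps translate directly into code. This sketch reuses the example data above (the function name `mean_deviation` is an illustrative choice).

```python
def mean_deviation(data):
    n = len(data)
    mean = sum(data) / n                        # step i: mean of the data
    deviations = [abs(mean - x) for x in data]  # step ii: absolute deviations from the mean
    total = sum(deviations)                     # step iii: add the deviations
    return total / n                            # step iv: divide by the number of data

print(mean_deviation([3, 4, 5, 4, 9]))  # 1.6
```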
STANDARD DEVIATION - The standard deviation (s) is calculated as follows:
i) find the average of the squares of each of the data: $\frac{\sum x^2}{n}$
ii) subtract the square of the mean of the data: $\frac{\sum x^2}{n} - (\bar{x})^2$
iii) take the square root of the answer from ii): $s = \sqrt{\frac{\sum x^2}{n} - (\bar{x})^2}$
VARIANCE -
The variance (var) is calculated by squaring the standard deviation.
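The standard deviation steps and the variance can be sketched as below, using the data 3, 4, 5, 4, 9 purely for illustration. The code follows the steps as written, dividing by n. A formula that divides by n - 1 would give a different value, so treat this as a sketch of the procedure described here rather than of every textbook's convention.

```python
import math

def standard_deviation(data):
    n = len(data)
    mean_of_squares = sum(x ** 2 for x in data) / n  # step i: average of the squares
    mean = sum(data) / n
    difference = mean_of_squares - mean ** 2         # step ii: subtract the square of the mean
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(difference, 0.0))           # step iii: square root

def variance(data):
    """Variance is the square of the standard deviation."""
    return standard_deviation(data) ** 2

data = [3, 4, 5, 4, 9]
print(round(standard_deviation(data), 2))  # 2.1
print(round(variance(data), 2))            # 4.4
```

For this data the sum of squares is 147, so step i gives 29.4, step ii gives 29.4 - 25 = 4.4, and step iii gives a standard deviation of about 2.10.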
CREATING GRAPHS FOR STATISTICS
FREQUENCY TABLE -
organizes data by “tallying” the number of times each number
appears in the list of data.
HORIZONTAL AXIS -
the bottom axis of a graph. This axis is also called the independent
axis because the data displayed on it do not depend on any other
variable. For example, time always appears on the horizontal axis
because time will continue to move forward without being affected
by any other variable.
VERTICAL AXIS -
the up-and-down axis of a graph. This axis is used to display the
frequency from the frequency table and is referred to as the
dependent axis. For example, if you were measuring the amount of rainfall
in each month of the year, the rainfall measurement would appear
on the vertical axis since the amount changes depending on which
month of the year is being considered.
RELATIVE FREQUENCY -
The relative frequency is calculated by dividing the frequency
from the frequency table (described above) by the
total number of observations. This is often converted to a
percentage by multiplying the answer by 100.
CUMULATIVE FREQUENCY -
Cumulative frequencies are calculated by adding each frequency to the
total of all the frequencies before it in the table.
Ex.
Attempts at Bar Exam    Frequency    Cumulative Frequency
1                       5            5
2                       3            8
3                       4            12
4                       2            14
INTERVAL -
When creating a frequency diagram (graph), it is useful to group
the data into intervals (e.g. 10-20, 20-30, etc.). To choose intervals
for a frequency diagram:
i) group the data into between 5 and 20 intervals.
ii) make all the intervals the same length.
iii) choose intervals so that there are no gaps between them and so that
none of the data lie on an interval boundary (e.g. 9.5 - 10.5,
10.5 - 11.5, etc.).
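The frequency, relative frequency, and cumulative frequency calculations from this section can be sketched with the bar exam example. The raw list of attempts below is reconstructed from the frequencies in the table and is an assumption for illustration.

```python
from collections import Counter
from itertools import accumulate

# Raw data assumed from the table: 5 people needed 1 attempt, 3 needed 2, and so on.
attempts = [1] * 5 + [2] * 3 + [3] * 4 + [4] * 2

counts = Counter(attempts)                        # frequency table (tally of each value)
values = sorted(counts)                           # [1, 2, 3, 4]
frequencies = [counts[v] for v in values]         # [5, 3, 4, 2]
total = sum(frequencies)                          # 14 observations

relative = [f / total for f in frequencies]       # frequency / total number of observations
percentages = [round(100 * r, 1) for r in relative]
cumulative = list(accumulate(frequencies))        # running total of the frequencies

print(frequencies)   # [5, 3, 4, 2]
print(percentages)   # [35.7, 21.4, 28.6, 14.3]
print(cumulative)    # [5, 8, 12, 14]
```

The last cumulative frequency always equals the total number of observations, which is a useful check on the table.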