S1 - NLCS Maths Department

FOR ALL CALCULATIONS WRITE OUT FULL CALCULATOR DISPLAY AS YOUR ANSWER. DON’T ROUND.
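Graphical Representation
HISTOGRAMS:
$\text{Area of Bar} \propto \text{Frequency}$
$\text{Frequency Density} = \dfrac{\text{Frequency}}{\text{Class Width}}$
Averages & Measures of Dispersion:
MEAN: $\bar{x} = \dfrac{\sum x}{n}$ or $\bar{x} = \dfrac{\sum fx}{\sum f}$
STD DEV: $\sqrt{\dfrac{\sum x^2}{n} - \bar{x}^2}$ or $\sqrt{\dfrac{\sum fx^2}{\sum f} - \bar{x}^2}$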
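As a rough numerical check of the mean and standard deviation formulas for a frequency table, the sketch below computes Σf, Σfx and Σfx² directly. The function name `grouped_mean_sd` and the small data set are illustrative assumptions, not part of the original sheet.

```python
from math import sqrt

def grouped_mean_sd(values, freqs):
    """Mean and standard deviation from a frequency table (hypothetical helper)."""
    total_f = sum(freqs)                                      # Σf
    sum_fx = sum(f * x for x, f in zip(values, freqs))        # Σfx
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))   # Σfx²
    mean = sum_fx / total_f
    sd = sqrt(sum_fx2 / total_f - mean ** 2)
    return mean, sd

# Assumed example data: x = 1, 2, 3 with frequencies 1, 2, 1
mean, sd = grouped_mean_sd([1, 2, 3], [1, 2, 1])
print(mean, sd)
```

For raw data, the same function can be called with every frequency set to 1.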
ADDITION LAW: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
MULTIPLICATION LAW: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$
MUTUALLY EXCLUSIVE:
NO INTERSECTION
$P(A \cup B) = P(A) + P(B)$
$P(A \cap B) = 0$
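The addition and multiplication laws can be checked on a small sample space. The sketch below assumes a fair six-sided die and two made-up events; the names `outcomes`, `A`, `B` and `prob` are illustrative only.

```python
from fractions import Fraction

outcomes = set(range(1, 7))   # assumed example: one roll of a fair die
A = {2, 4, 6}                 # "even"
B = {4, 5, 6}                 # "greater than 3"

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

# Addition law: P(A or B) = P(A) + P(B) - P(A and B)
p_union = prob(A) + prob(B) - prob(A & B)

# Multiplication law rearranged: P(A given B) = P(A and B) / P(B)
p_a_given_b = prob(A & B) / prob(B)

print(p_union, p_a_given_b)
```

Using `Fraction` keeps the answers exact, in the spirit of not rounding.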
MEDIAN & QUARTILES for raw data:
Median = $\dfrac{n + 1}{2}$th value ALWAYS.
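A minimal sketch of the raw-data rule, assuming the convention used on this sheet that the lower (upper) quartile is the median of the values below (above) the median. When (n + 1)/2 is not a whole number, the two neighbouring values are averaged. The function names are illustrative, and the sketch assumes at least two data values.

```python
def median(sorted_values):
    """The (n + 1)/2 th value of an already sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]            # (n + 1)/2 is a whole number
    # (n + 1)/2 ends in .5, so average the two middle values
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Return (LQ, median, UQ) using 'median of the values below/above the median'."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                   # values below the median
    upper = data[half + 1:] if n % 2 == 1 else data[half:]  # values above the median
    return median(lower), median(data), median(upper)

print(quartiles([1, 3, 5, 7, 9, 11, 13]))
```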
Used for CONTINUOUS DATA only.
BOX & WHISKER DIAGRAMS:
[Diagram: box and whisker plot drawn against a % scale marked 10 to 60, labelled Lowest Value, LQ, Median, UQ and Highest Value]
Lower Quartile = median of the values BELOW the median.
Upper Quartile = median of the values ABOVE the median.
MEDIAN by INTERPOLATION: Median $= b + \dfrac{c}{f_c}\left(\tfrac{1}{2}n - CF\right)$
CODING: used to simplify the arithmetic of finding the mean / standard
deviation of a Grouped Frequency Distribution:
-VE SKEW: $Q_2 - Q_1 > Q_3 - Q_2$; mean < median < mode.
+VE SKEW: $Q_2 - Q_1 < Q_3 - Q_2$; mode < median < mean.
SYMMETRICAL: $Q_2 - Q_1 = Q_3 - Q_2$.
Median by interpolation $= b + \dfrac{c}{f_c}\left(\tfrac{1}{2}n - CF\right)$, where b = lower class boundary; c = class width; $f_c$ = class frequency; CF = cumulative frequency up to the ‘median class’.
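The interpolation formula can be sketched as below, with b, c, f_c and CF as defined above. The sketch assumes equal class widths and takes the median class to be the first class whose cumulative frequency reaches n/2. The function name and the example table are assumptions for illustration.

```python
def median_by_interpolation(lower_boundaries, class_width, freqs):
    """Median = b + (c / f_c) * (n/2 - CF) for a grouped frequency table."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("the table has no data")
    cf = 0  # cumulative frequency up to (not including) the current class
    for b, f_c in zip(lower_boundaries, freqs):
        if cf + f_c >= n / 2:            # this is the median class
            return b + (class_width / f_c) * (n / 2 - cf)
        cf += f_c
    raise ValueError("median class not found")

# Assumed example: classes 0-10, 10-20, 20-30 with frequencies 5, 10, 5
estimate = median_by_interpolation([0, 10, 20], 10, [5, 10, 5])
print(estimate)
```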
Advantages:
Highlights trends.
Illustrates skewness.
Highlights average.
Highlights outliers.
Disadvantages:
Time consuming to draw.
Does not retain original data.
CODING formula: $y = \dfrac{\text{class midpoint} - \text{midpoint of middle class}}{\text{(uniform) class width}}$
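A small sketch of coding and decoding, assuming uniform class widths. It codes each midpoint with the formula above and finds the mean and standard deviation of the coded values. It then decodes as the sheet describes: multiply by class width and add the midpoint of the middle class for the mean, and only multiply by class width for the standard deviation. The function name and data are illustrative assumptions.

```python
from math import sqrt

def coded_mean_sd(midpoints, freqs, class_width):
    """Mean and standard deviation of grouped data found via coding."""
    middle = midpoints[len(midpoints) // 2]                  # midpoint of middle class
    coded = [(m - middle) / class_width for m in midpoints]  # y values
    total_f = sum(freqs)
    mean_y = sum(f * y for y, f in zip(coded, freqs)) / total_f
    sd_y = sqrt(sum(f * y * y for y, f in zip(coded, freqs)) / total_f - mean_y ** 2)
    mean_x = mean_y * class_width + middle   # decode the mean: reverse the code
    sd_x = sd_y * class_width                # decode the SD: multiply only
    return mean_x, sd_x

# Assumed example: midpoints 5, 15, 25 (class width 10) with frequencies 1, 2, 1
mean_x, sd_x = coded_mean_sd([5, 15, 25], [1, 2, 1], 10)
print(mean_x, sd_x)
```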
To decode:
- Mean: reverse the code, i.e. multiply by class width, then add on midpoint of middle class.
- Standard Deviation: ONLY multiply by class width.
Probability:
PROOF OF INDEPENDENCE: $P(A \mid B) = P(A)$
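The independence test can be sketched as a direct comparison of P(A | B) with P(A). The example assumes a fair six-sided die with equally likely outcomes; the helper names and events are hypothetical.

```python
from fractions import Fraction

def prob(event, space):
    """Probability of an event when all outcomes in `space` are equally likely."""
    return Fraction(len(event & space), len(space))

def is_independent(a, b, space):
    """True when P(A | B) equals P(A)."""
    p_a_given_b = prob(a & b, space) / prob(b, space)
    return p_a_given_b == prob(a, space)

die = set(range(1, 7))      # assumed example sample space
even = {2, 4, 6}
at_most_four = {1, 2, 3, 4}
at_least_four = {4, 5, 6}

print(is_independent(even, at_most_four, die))    # P(even | at most 4) = 1/2 = P(even)
print(is_independent(even, at_least_four, die))   # P(even | at least 4) = 2/3
```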
Random Variables:
$E(X) = \sum x\,p(x)$
$E(X^2) = \sum x^2\,p(x)$
$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$
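These three formulas translate almost line for line into code. The sketch below assumes the distribution is given as a dictionary mapping each x to p(x); the distribution shown is a made-up example.

```python
from fractions import Fraction

def expectation_and_variance(dist):
    """E(X) and Var(X) for a discrete distribution given as {x: p(x)}."""
    e_x = sum(x * p for x, p in dist.items())        # E(X) = Σ x p(x)
    e_x2 = sum(x * x * p for x, p in dist.items())   # E(X²) = Σ x² p(x)
    return e_x, e_x2 - e_x ** 2                      # Var(X) = E(X²) - [E(X)]²

# Assumed example distribution
dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
e_x, var_x = expectation_and_variance(dist)
print(e_x, var_x)
```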
THE DISCRETE UNIFORM DISTRIBUTION:
X takes the values 1, 2, …, n, each with $p(x) = \dfrac{1}{n}$.
STEM & LEAF DIAGRAMS:
Advantages:
Retains original data.
Highlights trends (resembles a Bar Chart).
Illustrates skewness.
Disadvantages:
Time consuming to draw.
Does not highlight any averages.
EQN of REGRESSION LINE:
$y = bx + a$ where $b = \dfrac{s_{xy}}{s_{xx}}$, $a = \bar{y} - b\bar{x}$ and
$s_{xy} = \sum xy - \dfrac{\sum x \sum y}{n}$, $s_{xx} = \sum x^2 - \dfrac{(\sum x)^2}{n}$, $s_{yy} = \sum y^2 - \dfrac{(\sum y)^2}{n}$
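A minimal sketch of the regression calculation from the summary statistics above. The function name and the data are illustrative assumptions; the points are chosen to lie exactly on y = 2x + 1 so the result is easy to check by hand.

```python
def regression_line(xs, ys):
    """Return (a, b) for the regression line y = bx + a."""
    n = len(xs)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b = s_xy / s_xx                      # gradient
    a = sum(ys) / n - b * sum(xs) / n    # intercept: y-bar minus b times x-bar
    return a, b

# Assumed example data lying exactly on y = 2x + 1
a, b = regression_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```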
Discrete uniform distribution on 1, 2, …, n: $E(X) = \dfrac{n + 1}{2}$
PMCC: $r = \dfrac{s_{xy}}{\sqrt{s_{xx}\,s_{yy}}}$ where $-1 \le r \le 1$
Perfect negative correlation
Very Strong negative correlation
Strong negative correlation
Moderate negative correlation
Weak negative correlation
No correlation
Weak positive correlation
Moderate positive correlation
Strong positive correlation
Very Strong positive correlation
Perfect positive correlation
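The PMCC can be computed from the same three sums of squares and products. This is a sketch with made-up data; the function name `pmcc` is an assumption.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient r = s_xy / sqrt(s_xx * s_yy)."""
    n = len(xs)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return s_xy / sqrt(s_xx * s_yy)

# Assumed example data
r_mixed = pmcc([1, 2, 3], [1, 3, 2])
r_falling = pmcc([1, 2, 3, 4], [9, 7, 5, 3])   # points on a falling straight line
print(r_mixed, r_falling)
```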
Discrete uniform distribution: $\mathrm{Var}(X) = \dfrac{1}{12}(n - 1)(n + 1)$
SCALING RANDOM VARIABLES:
Normally used when a Discrete Uniform Distribution has been ‘scaled’, i.e. x = 1, 2, 3, …, n changed to y = 3, 5, 7, …, 2n + 1.
$E(aX + b) = aE(X) + b$
$\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$
$E(aX + bY) = aE(X) + bE(Y)$
$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y)$ (for independent X and Y)
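The scaling rules can be checked on the example in the text, x = 1, 2, …, n changed to y = 2x + 1. The sketch assumes n = 5 and works out the mean and variance of both lists directly, treating every value as equally likely. The helper name `mean_var` is hypothetical.

```python
from fractions import Fraction

def mean_var(values):
    """E and Var for equally likely values, from E(X) and E(X²)."""
    p = Fraction(1, len(values))
    e_x = sum(p * v for v in values)
    e_x2 = sum(p * v * v for v in values)
    return e_x, e_x2 - e_x ** 2

n = 5                                   # assumed example size
xs = list(range(1, n + 1))              # 1, 2, ..., n
ys = [2 * x + 1 for x in xs]            # 3, 5, ..., 2n + 1 (a = 2, b = 1)

e_x, var_x = mean_var(xs)
e_y, var_y = mean_var(ys)
print(e_x, var_x, e_y, var_y)
```

Here E(Y) should equal 2E(X) + 1 and Var(Y) should equal 4Var(X). E(X) and Var(X) should also match the discrete uniform results (n + 1)/2 and (n - 1)(n + 1)/12.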
The Normal Distribution:
STANDARDISING: $Z = \dfrac{X - \mu}{\sigma}$ where $X \sim N(\mu, \sigma^2)$ and $Z \sim N(0, 1)$. NB. Mean & Variance are parameters.
Useful Regions: 1 - Ф(z)
TABLES: Ф(z)
REMEMBER WHEN ASKED YOU MUST ALWAYS INTERPRET THE
GRADIENT IN CONTEXT.
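PMCC value (r) and interpretation:
-1: Perfect negative correlation
-1 to -0.9: Very Strong negative correlation
-0.9 to -0.7: Strong negative correlation
-0.7 to -0.4: Moderate negative correlation
-0.4 to -0.2: Weak negative correlation
-0.2 to 0.2: No correlation
0.2 to 0.4: Weak positive correlation
0.4 to 0.7: Moderate positive correlation
0.7 to 0.9: Strong positive correlation
0.9 to 1: Very Strong positive correlation
1: Perfect positive correlation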
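The bands above can be turned into a simple lookup. Because neighbouring bands share an endpoint (for example 0.4 appears in two of them), the sketch has to assume where a boundary value goes. Here a boundary value is placed in the stronger band, which is an assumption and not something the sheet specifies.

```python
def describe_pmcc(r):
    """Describe a PMCC value using the bands above (boundary handling is assumed)."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size < 0.2:
        return "No correlation"
    sign = "positive" if r > 0 else "negative"
    if size == 1:
        strength = "Perfect"
    elif size >= 0.9:
        strength = "Very Strong"
    elif size >= 0.7:
        strength = "Strong"
    elif size >= 0.4:
        strength = "Moderate"
    else:
        strength = "Weak"
    return f"{strength} {sign} correlation"

print(describe_pmcc(0.5))
print(describe_pmcc(-0.95))
```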
Discrete uniform distribution: for the E(X) and Var(X) formulas, the X values (outcomes) MUST begin at 1 and be consecutive.
Regression & Correlation:
SYMMETRICAL: mode = median = mean.
Useful Regions: Ф(z1) + Ф(z2) - 1; 1 - Ф(z); Ф(z1) - Ф(z2)
Values in tables are areas to the LEFT of z values (LESS THAN).
THINK about the sign of Z: negative if left of centre.
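As a sketch of how the regions relate to table values, the code below uses Python's standard-library `statistics.NormalDist` in place of printed tables: its `cdf` gives the area to the left of z, i.e. Ф(z). The distribution X ~ N(50, 10²) and the helper names are assumptions made for the example.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()   # Z ~ N(0, 1)

def phi(z):
    """Area to the LEFT of z under the standard normal curve."""
    return STANDARD_NORMAL.cdf(z)

def standardise(x, mu, sigma):
    """Z = (X - mu) / sigma."""
    return (x - mu) / sigma

mu, sigma = 50, 10                       # assumed example: X ~ N(50, 10^2)
z = standardise(60, mu, sigma)           # right of centre, so z is positive
p_below = phi(z)                         # P(X < 60) = Ф(z)
p_above = 1 - phi(z)                     # P(X > 60) = 1 - Ф(z)
p_between = phi(z) - phi(-z)             # P(40 < X < 60) = Ф(z1) - Ф(z2)
p_between_alt = phi(z) + phi(z) - 1      # same region as Ф(z1) + Ф(z2) - 1

print(z, p_below, p_above, p_between)
```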
WORKING BACKWARDS:
Remember to ADD any simultaneous equations so that μ’s cancel.
Lower tail z values cannot be looked up directly: look up 1 minus the probability and remember to make z negative.
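Working backwards from two probabilities to μ and σ can be sketched as follows. Each probability gives an equation of the form x = μ + zσ, and combining the two equations eliminates μ. The sketch uses `statistics.NormalDist.inv_cdf` in place of reverse table lookup; it returns a negative z for lower-tail probabilities automatically. The function name and the test distribution are assumptions.

```python
from statistics import NormalDist

def solve_mu_sigma(x1, p1, x2, p2):
    """Find mu and sigma given P(X < x1) = p1 and P(X < x2) = p2."""
    z1 = NormalDist().inv_cdf(p1)        # z value with area p1 to its left
    z2 = NormalDist().inv_cdf(p2)
    # x1 = mu + z1*sigma and x2 = mu + z2*sigma; eliminate mu to get sigma
    sigma = (x2 - x1) / (z2 - z1)
    mu = x1 - z1 * sigma
    return mu, sigma

# Assumed check: start from a known N(50, 10^2) and recover its parameters
known = NormalDist(50, 10)
mu, sigma = solve_mu_sigma(40, known.cdf(40), 65, known.cdf(65))
print(mu, sigma)
```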