RESEARCH &
DATA ANALYSIS
SCIENTISTS COLLECT
STATISTICAL DATA
FROM EXPERIMENTS
 STATISTICAL OR
NUMERICAL DATA ALLOWS
FOR MORE ACCURATE
ANALYSIS & EVALUATION OF
THE RESULTS FROM
EXPERIMENTS
STATISTICS DEAL WITH
COLLECTING, ANALYZING,
AND INTERPRETING
INFORMATION OR RESULTS
TYPES OF DATA:
QUANTITATIVE DATA –
AMOUNTS, MEASUREMENTS
OR NUMERICAL DATA
QUALITATIVE DATA – NON-NUMERICAL
IN NATURE (CHARACTERISTICS –
COLOR, SHAPE, ETC.)
TYPES OF DATA COLLECTED:
• POPULATION: 100% OF DATA ARE
COLLECTED (CAN BE EXACT)
(GREEK LETTERS USED TO
ABBREVIATE QUANTITIES)
• SAMPLE: A SMALLER ESTIMATE OR
REPRESENTATION IS COLLECTED
(ENGLISH LETTERS STAND FOR
QUANTITIES SURVEYED)
EXPERIMENTS ARE
CONDUCTED USING
MULTIPLE REPLICATIONS
• REPLICATION
ENSURES MORE
RELIABLE /
ACCURATE
RESULTS
DATA IS COLLECTED FROM
MULTIPLE EXPERIMENTS AND
AN AVERAGE IS DETERMINED
FROM THOSE RESULTS
• THE LARGER THE
SAMPLE SIZE, THE
BETTER THE
REPRESENTATION
OF THE TRUE
VALUE.
MEASURES OF CENTRAL
TENDENCY INCLUDE:
• MEAN AVERAGE *
• MEDIAN
• MODE
* USUALLY THE BEST CHOICE FOR
GETTING CENTRAL TENDENCY
THE AVERAGE FOR A SET OF
DATA / NUMBERS IS ALSO
CALLED THE MEAN
EXAMPLE:
10 + 12 + 8 + 11 + 12 = 53
53 / 5 = 10.6 MEAN (AVG)
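The arithmetic in this example can be checked with a short Python sketch. The variable names are illustrative only; the five values are the ones on the slide.

```python
scores = [10, 12, 8, 11, 12]      # values from the example
total = sum(scores)               # add all values together
average = total / len(scores)     # divide by how many values there are

print(total)    # 53
print(average)  # 10.6
```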
MEDIAN VALUE IS THE
MIDDLE VALUE IN A SAMPLE
OF VALUES
 THERE MUST
BE THE SAME
NUMBER ABOVE
AND BELOW THE
MEDIAN
SAMPLE: 6, 9, 10, 11, 12, 14, 18
MEDIAN VALUE = 11
MEAN / AVERAGE = 11.43
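A minimal Python sketch of the same sample, assuming the standard library `statistics` module is available. It picks the middle value of the sorted list and compares it with the mean.

```python
from statistics import mean, median

sample = [6, 9, 10, 11, 12, 14, 18]

# With an odd number of values, the median is the single middle value
# of the sorted list: three values lie below it and three above it.
middle = sorted(sample)[len(sample) // 2]

print(middle)                  # 11
print(median(sample))          # 11
print(round(mean(sample), 2))  # 11.43
```

When a sample has an even number of values there is no single middle value; `statistics.median` then returns the average of the two middle values.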
MODE IS THE VALUE THAT
OCCURS MOST OFTEN IN A
SAMPLE
• SAMPLE: 10, 8, 11, 12, 14, 8, 11, 11
MODE = 11
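One way to find the mode is to count how often each value occurs. The sketch below uses `collections.Counter` on the sample values; the variable names are illustrative.

```python
from collections import Counter

sample = [10, 8, 11, 12, 14, 8, 11, 11]

counts = Counter(sample)                           # value -> number of occurrences
mode_value, mode_count = counts.most_common(1)[0]  # most frequent value and its count

print(mode_value, mode_count)  # 11 3
```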
Which Example Below is a More Accurate Average: MODE, MEAN, or MEDIAN?
WHY IS IT MORE ACCURATE?
MEASURES OF VARIATION
RANGE = HIGHEST SCORE – LOWEST SCORE in the set of numbers
STANDARD DEVIATION *: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
* BEST CHOICE - USES ALL NUMBERS IN THE LIST
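The standard deviation formula (square root of the sum of squared differences from the mean, divided by n - 1) can be sketched directly in Python. The function name `sample_sd` is an illustrative choice, and the data are the five values used in the mean example earlier in the slides.

```python
import math

def sample_sd(values):
    """Sample standard deviation: sqrt( sum((x - mean)^2) / (n - 1) )."""
    n = len(values)
    x_bar = sum(values) / n                            # the mean
    squared_devs = [(x - x_bar) ** 2 for x in values]  # (x - mean)^2 for each value
    return math.sqrt(sum(squared_devs) / (n - 1))

scores = [10, 12, 8, 11, 12]
print(round(sample_sd(scores), 3))  # 1.673
```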
RANGE OF A SET OF DATA=
HIGHEST VALUE – LOWEST VALUE
EXAMPLE:
6, 7, 8, 11,12,14,14,15,15, 16, 19, 20
20 – 6 = 14 (RANGE)
RANGE HAS LIMITED USE
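A short sketch of the range calculation, with one added illustration of why range has limited use: it depends only on the two extreme values. The extra reading of 60 is hypothetical and is not part of the slide's data.

```python
data = [6, 7, 8, 11, 12, 14, 14, 15, 15, 16, 19, 20]

data_range = max(data) - min(data)   # highest value minus lowest value
print(data_range)                    # 14

# A single extreme value changes the range sharply, even though the
# other twelve values are unchanged (60 is a hypothetical reading).
with_outlier = data + [60]
outlier_range = max(with_outlier) - min(with_outlier)
print(outlier_range)                 # 54
```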
10% RULE:
Some researchers
consider data to be valid
and representative if the
values fall within 10%
of the mean
EXAMPLE: 10% RULE
VALUE    DIFFERENCE FROM MEAN
10       0.6
12       1.4
8        2.6
11       0.4
12       1.4
SUM = 53 (10.6 MEAN)
10% of 10.6 (mean) is + / - 1.06
or a range of
9.54 – 11.66
THE RANGE (9.54 – 11.66)
REPRESENTS A VALID RANGE
FOR ACCEPTING THE DATA
EXAMPLE:
10, 12, 8, 11, 12 (SUM = 53)
NOTE: USING THE 10% RULE
& THE RANGE (9.54 – 11.66)
WHICH VALUES WOULD BE
CONSIDERED OUT OF RANGE?
8 & 12 (BOTH 12s)
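The 10% rule check can be sketched as follows. The 10% tolerance and the data come from the slides; the variable names are illustrative.

```python
scores = [10, 12, 8, 11, 12]

mean_value = sum(scores) / len(scores)   # 10.6
tolerance = 0.10 * mean_value            # 10% of the mean, about 1.06
low = mean_value - tolerance
high = mean_value + tolerance

# Keep every value that falls outside the accepted range
out_of_range = [x for x in scores if not (low <= x <= high)]

print(round(low, 2), round(high, 2))  # 9.54 11.66
print(out_of_range)                   # [12, 8, 12]
```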
STANDARD DEVIATION (SD)
IS A MEASUREMENT OF THE
VARIATION FROM THE MEAN
SD CONSIDERS THE NUMBER
OF VALUES THAT ARE OUT
OF RANGE AND
HOW FAR OUT OF
RANGE THEY ARE
STANDARD DEVIATION
REPRESENTS HOW CLOSELY
DATA ARE
CLUSTERED
AROUND
THE MEAN
STANDARD DEVIATION TERMS:
$\bar{x}$ = MEAN
$x$ = INDIVIDUAL SCORES IN THE SET
$\sum x$ = SUM OF ALL SCORES / VALUES
$n$ = TOTAL NUMBER OF SCORES OR
VALUES IN THE SET
Calculating a Standard Deviation
Take a sample problem with the following values:
There are eight data points total, with a mean (or average) value of 5:
To calculate the standard deviation, compute the difference of each data point from the mean,
then square the result:
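Next divide the sum of these values by the number of values, then take the square root to get the
standard deviation. Dividing by the number of values gives the population standard deviation; for a
sample, divide by n - 1 instead: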
The standard deviation of this example is 2.
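The transcript does not preserve the eight values of this example. The sketch below uses an assumed data set (2, 4, 4, 4, 5, 5, 7, 9) chosen only because it matches the stated facts: eight data points, a mean of 5, and a standard deviation of 2 when dividing by the number of values. It should not be read as the original slide's data.

```python
import math

# ASSUMED values: not from the slide, picked to match "eight data points,
# mean 5, standard deviation 2".
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = sum(values) / len(values)                 # 5.0
squared_diffs = [(x - mean_value) ** 2 for x in values]

# Divide by the number of values (population standard deviation)
population_sd = math.sqrt(sum(squared_diffs) / len(values))
print(mean_value, population_sd)   # 5.0 2.0

# Dividing by n - 1 instead gives the slightly larger sample standard deviation
sample_sd_value = math.sqrt(sum(squared_diffs) / (len(values) - 1))
print(round(sample_sd_value, 3))   # 2.138
```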
FINDING STANDARD DEVIATION
CAN BE CONFUSING &
DIFFICULT IN SOME SITUATIONS
• PROCEDURES VARY DEPENDING ON THE
PURPOSE & TYPE OF DATA RECORDED
• COMPUTER PROGRAMS & SCIENTIFIC
CALCULATORS WILL MAKE
THE TASK EASIER
Use of Standard Deviation:
For normally distributed data, the range within one standard deviation of
the mean in either direction contains around 68 % of the population in this
group. The range within two standard deviations of the mean accounts for
roughly 95 % of the population, and the range within three standard
deviations accounts for about 99.7 % of the population.
If the curve were flatter and more spread out, the standard deviation
would be larger in order to account for 68 % of the population. So
standard deviation can tell you how spread out the examples in a set are
from the mean.
This is useful if you are comparing results for different things (drugs,
equipment, etc.). Standard deviation will tell you how diverse the test
scores are for each specific thing being measured.
NORMAL DISTRIBUTION OR
A BELL CURVE
[Figure: bell curve centered on the MEAN. Each colored band has a width of one standard deviation.]
GAUSSIAN CURVE
• SCORES ARE PLOTTED ON A
GRAPH
• ALSO KNOWN AS:
NORMAL DISTRIBUTION CURVE
NORMAL DISTRIBUTION OR
Gaussian Curve
Shows Normal Frequency:
– 68.27% of the values are within 1 standard
deviation of the mean.
– 95.45% of the values are within 2 standard
deviations of the mean. (Common Choice)
– 99.73% of the values are within 3 standard
deviations of the mean.
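These percentages can be reproduced from the normal curve itself. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`. It computes the share of values lying within 1, 2, and 3 standard deviations of the mean.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)   # mean 0, standard deviation 1

coverage = {}
for k in (1, 2, 3):
    # Area under the curve between -k and +k standard deviations
    coverage[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(k, round(100 * coverage[k], 2))
# 1 68.27
# 2 95.45
# 3 99.73
```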
STANDARD DEVIATION:
EXAMPLE: 53 / 5 = 10.6 MEAN
[Figure: columns of values (10, 12, 10, 12, 13, 9, 11, 14) with a brace labeled "2 SD FROM MEAN"]
• MOST RESEARCHERS CONSIDER DATA WITHIN +/- 2 SD
OF THE MEAN VALID / ACCEPTABLE DATA
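A minimal sketch of the +/- 2 SD check, applied to the five values whose mean is 10.6 (10, 12, 8, 11, 12). It assumes the sample standard deviation (n - 1 in the denominator), as computed by `statistics.stdev`.

```python
from statistics import mean, stdev

scores = [10, 12, 8, 11, 12]

center = mean(scores)                  # 10.6
sd = stdev(scores)                     # sample SD, about 1.673
low = center - 2 * sd
high = center + 2 * sd

# Values falling outside mean +/- 2 SD
outside = [x for x in scores if not (low <= x <= high)]

print(round(low, 2), round(high, 2))   # 7.25 13.95
print(outside)                         # []
```

For this data set every value falls within 2 SD of the mean, so none would be rejected under this criterion.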
ACCURACY IS HOW CLOSE A
RESULT IS TO THE TRUE VALUE
WHEREAS
PRECISION REFERS TO THE
REPRODUCIBILITY OF RESULTS,
OR HOW CLOSE THE RESULTS ARE
TO EACH OTHER
LABORATORY INSTRUMENTS
MUST BE PRECISE AS WELL
AS ACCURATE
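The difference between accuracy and precision can be illustrated numerically. In the sketch below all readings and the true value of 100 are hypothetical. Accuracy is taken as the distance of the mean reading from the true value, and precision as the spread (sample standard deviation) of the readings.

```python
from statistics import mean, stdev

TRUE_VALUE = 100.0  # hypothetical known reference value

# Hypothetical repeated readings from two instruments
instrument_a = [99.8, 100.1, 100.0, 99.9, 100.2]    # near the true value
instrument_b = [104.9, 105.1, 105.0, 105.0, 105.0]  # tightly grouped but offset

def describe(readings, true_value):
    """Return (error, spread) for a list of readings."""
    error = abs(mean(readings) - true_value)  # accuracy: smaller is more accurate
    spread = stdev(readings)                  # precision: smaller is more precise
    return error, spread

for name, readings in (("A", instrument_a), ("B", instrument_b)):
    error, spread = describe(readings, TRUE_VALUE)
    print(name, round(error, 2), round(spread, 2))
# A 0.0 0.16
# B 5.0 0.07
```

Instrument B is the more precise of the two (smaller spread) but is not accurate, which is why an instrument needs both properties.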
Coefficient of Variation
• Precision of a new instrument will be compared
to the precision of old instrument
• $CV = \dfrac{\text{STANDARD DEVIATION}}{\text{MEAN AVERAGE}} \times 100\%$
OR
$\%\ \text{DIFFERENCE} = \dfrac{\text{LOW \#} - \text{HIGH \#}}{\text{HIGH \#}} \times 100\%$
COEFFICIENT OF VARIATION (CV) IS THE STANDARD
DEVIATION RELATIVE TO THE
MEAN OF THAT SAMPLE
• MAY BE
EXPRESSED AS A
% OF THE MEAN
$CV = \dfrac{s}{\bar{x}} \times 100$
PURPOSE OF DETERMINING
COEFFICIENT OF VARIATION
IS TO COMPARE VARIATION OF
TWO DIFFERENT SAMPLES OR
PRECISION OF TWO DIFFERENT
INSTRUMENTS OR METHODS
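A sketch of a CV comparison between two instruments. Both sets of readings are hypothetical, and `coefficient_of_variation` is an illustrative helper name. Because CV is relative to the mean, it allows a comparison even when the two instruments report on different scales.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV as a percentage: (sample SD / mean) x 100."""
    return stdev(values) / mean(values) * 100

# Hypothetical repeated readings; the two instruments use different scales
old_instrument = [98, 102, 100, 97, 103]
new_instrument = [49.5, 50.5, 50.0, 49.0, 51.0]

cv_old = coefficient_of_variation(old_instrument)
cv_new = coefficient_of_variation(new_instrument)

print(round(cv_old, 2), round(cv_new, 2))  # 2.55 1.58
```

In this made-up case the new instrument has the lower CV, so its readings vary less relative to their mean.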
Chi Square Analysis
• A STATISTICAL MEASURING INSTRUMENT THAT
DETERMINES HOW WELL A SET OF DATA
SUPPORT THE HYPOTHESIS OR EXPECTED
VALUES
• [MAJOR USE IS IN GENETICS]
• [EMPLOYS THE PUNNETT SQUARE]
• PREDICTIONS ARE BASED ON PROBABILITY
Chi Square Analysis
• TESTS WHETHER ITEMS IN VARIOUS
CATEGORIES DEVIATE OR ARE THE SAME
• NULL HYPOTHESIS MEANS THE DATA MEET
EXPECTATIONS OR SHOW LITTLE DIFFERENCE
• A PROBABILITY OF 0.05 OR LESS SHOWS
AN EXTREME DIFFERENCE FROM THE
EXPECTED OBSERVATION / HYPOTHESIS
• THE SMALLER THE CHI-SQUARE VALUE,
THE GREATER THE LIKELIHOOD THE DATA
SUPPORT THE HYPOTHESIS
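A minimal sketch of a chi-square calculation for a genetics cross. The observed counts are hypothetical; the expected 3:1 ratio is what a Punnett square predicts for a simple monohybrid cross. The statistic is the sum of (observed - expected)^2 / expected over the categories, compared against the standard table value of 3.841 for a probability of 0.05 with 1 degree of freedom.

```python
# Hypothetical offspring counts from a monohybrid cross
observed = {"dominant": 290, "recessive": 110}
expected_ratio = {"dominant": 3, "recessive": 1}   # 3:1 from a Punnett square

total = sum(observed.values())               # 400 offspring
ratio_sum = sum(expected_ratio.values())     # 4 parts
expected = {k: total * r / ratio_sum for k, r in expected_ratio.items()}

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_square = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

CRITICAL_0_05_DF1 = 3.841   # table value for p = 0.05, 1 degree of freedom

print(round(chi_square, 3))            # 1.333
print(chi_square > CRITICAL_0_05_DF1)  # False
```

Because the statistic is below the critical value, the probability is greater than 0.05 and these hypothetical counts are consistent with the expected 3:1 ratio.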
IN SUMMARY:
• THERE ARE A VARIETY OF DATA
ANALYSIS INSTRUMENTS
• EACH INSTRUMENT IS BEST SUITED
TO MEASURE CERTAIN PARAMETERS
OF DATA
• SCIENTISTS AND RESEARCHERS USE
THE INSTRUMENTS TO
INTERPRET TEST
RESULTS