FINAL EXAM REVIEW
STATISTICS
WHEN TO USE EACH TEST
• REGRESSION:
• CHECKS IF ONE VARIABLE INFLUENCES, PREDICTS, OR IS RELATED TO THE OTHER VARIABLE.
• TWO-SIDED T TEST
• COMPARES TWO AVERAGES TO SEE IF THERE IS A DIFFERENCE.
• ONE-SIDED T TEST
• COMPARES TWO AVERAGES TO SEE IF ONE AVERAGE IS HIGHER THAN, OR LOWER THAN, THE OTHER ONE.
• CONFIDENCE INTERVAL
• COMPARES THE AVERAGE OF THE DATA TO A SPECIFIC NUMBER.
HOW TO DO A REGRESSION TEST
• GET THE DATA
• GO TO DATA ANALYSIS
• CHOOSE REGRESSION
HOW TO DO A TWO-SIDED TTEST
• ORGANIZE DATA (IF NEEDED)
• TO GET P-VALUE:
• =TTEST(ARRAY1, ARRAY2, 2, 3) (2 = TWO TAILS, 3 = TWO SAMPLES WITH UNEQUAL VARIANCE)
HOW TO DO A ONE-SIDED TTEST
• ORGANIZE DATA (IF NEEDED)
• CHECK IF THE AVERAGES OF THE DATA AGREE WITH THE CLAIM.
• IF THEY DO NOT, DO NOT PERFORM THE TTEST
• IF THEY DO, GO AHEAD AND FIND THE P-VALUE
• TO GET P-VALUE:
• =TTEST(ARRAY1, ARRAY2, 1, 3)
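The two TTEST recipes above can be sketched outside Excel. The block below is a minimal sketch, assuming that type 3 means the unequal-variance (Welch) two-sample test and that the one-tailed result is based on the absolute value of the t statistic, which would explain why the averages must be checked against the claim first. The function names and the sample data are hypothetical.

```python
import math
from statistics import mean, variance

def t_upper_tail(t, df, steps=2000):
    """P(T > |t|) for Student's t with df degrees of freedom (Simpson's rule)."""
    t = abs(t)
    # Normalising constant of the t density, via lgamma to avoid overflow.
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    coef /= math.sqrt(df * math.pi)

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3  # area beyond |t| on one side

def welch_ttest(a, b, tails=2):
    """P-value of the unequal-variance two-sample t test (tails = 1 or 2)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return tails * t_upper_tail(t, df)

# Hypothetical samples, for illustration only.
group_1 = [24, 26, 25, 27, 23, 26]
group_2 = [30, 33, 31, 29, 34, 32]
print("two-sided:", welch_ttest(group_1, group_2, tails=2))
print("one-sided:", welch_ttest(group_1, group_2, tails=1))
```

Because the tail area is taken beyond |t|, the one-sided value is always half of the two-sided one, whichever group has the higher average.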
HOW TO DO A CONFIDENCE INTERVAL
• ORGANIZE DATA (IF NEEDED)
• FIND THE STANDARD DEVIATION
• =STDEV(DATA)
• FIND THE SIZE
• =COUNT(DATA)
• FIND THE MARGIN OF ERROR
• ALPHA = 0.05
• =CONFIDENCE.T(ALPHA, STDEV, SIZE)
• FIND AVERAGE
• =AVERAGE(DATA)
• FIND THE CONFIDENCE INTERVAL [LOWER BOUND, UPPER BOUND]
• LB:
= AVERAGE – MARGIN OF ERROR
• UB:
= AVERAGE + MARGIN OF ERROR
IN THE STATE OF MISSOURI, THE SALARIES OF 20 RANDOMLY
SELECTED BASKET WEAVERS WERE RECORDED. THE
AVERAGE OF THESE SALARIES WAS FOUND TO BE $28,000,
WITH A STANDARD DEVIATION OF $4,000. CONSTRUCT A
95% CONFIDENCE INTERVAL FOR THE TRUE AVERAGE
SALARY OF BASKET WEAVERS IN MISSOURI. ROUND YOUR
ANSWER TO TWO DECIMAL PLACES. (HINT: YOUR ANSWER SHOULD BE
EXACTLY THE SAME AS THE ONE GIVEN)
[ 26127.94, 29872.06 ]
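The interval above can be reproduced by hand from the steps on the confidence interval slide. This is a minimal sketch, assuming the margin of error is the two-tailed 95% critical value of Student's t with 19 degrees of freedom times STDEV / sqrt(SIZE). The critical value is hard-coded here (in Excel it would come from =T.INV.2T(0.05, 19)), and the function name is hypothetical.

```python
import math

# Two-tailed 95% critical value of Student's t with 20 - 1 = 19 degrees of freedom.
T_CRIT_95_DF19 = 2.093024

def t_confidence_interval(average, stdev, size, t_crit):
    """Return (lower bound, upper bound, margin of error)."""
    margin = t_crit * stdev / math.sqrt(size)
    return average - margin, average + margin, margin

lower, upper, margin = t_confidence_interval(28_000, 4_000, 20, T_CRIT_95_DF19)
print(f"margin of error: {margin:.2f}")
print(f"[{lower:.2f}, {upper:.2f}]")
```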
IT IS BELIEVED THAT MERCEDES MODELS
HAVE LOWER HIGHWAY GAS MILEAGE (ON
AVERAGE) THAN TOYOTA MODELS. GIVEN
THE DATA, YOUR JOB IS TO CONFIRM OR
DISPROVE THIS ASSERTION. (USE CARS04-1 DATA)
WHAT TEST TO PERFORM?
One-sided TTest
WHAT IS THE P-VALUE/MARGIN OF ERROR?
8.406E-06
STATISTICAL INTERPRETATION?
Since the P-value is small, we are
confident that the Mercedes average
is lower than the Toyota average.
CONCLUSION?
We are confident that this was a
reasonable claim.
IT IS BELIEVED THAT THE MORE BEDROOMS A
HOUSE HAS, THE HIGHER THE SELLING PRICE IS
GOING TO BE. GIVEN THE DATA, YOUR JOB IS
TO CONFIRM OR DISPROVE THIS ASSERTION.
(USE HOUSE_PRICE5VARIABLES DATA)
WHAT TEST TO PERFORM?
Regression
WHAT IS THE P-VALUE/MARGIN OF ERROR?
0.000166
STATISTICAL INTERPRETATION?
Since the P-value is small, we are confident
that the number of bedrooms influences
the selling price of the house.
CONCLUSION?
We are confident that the above
assertion is correct.
TEST WHETHER A HIGHER NUMBER OF ASSAULTS
IN FLORIDA INCREASES THE NUMBER OF
MURDERS THAT ARE EXPECTED IN FLORIDA.
(USE US_CRIME DATA)
WHAT TEST TO PERFORM?
Regression
WHAT IS THE P-VALUE/MARGIN OF ERROR?
1.998E-05
STATISTICAL INTERPRETATION?
Since the P-value is small, we are confident
that the slope of the regression line is not
zero.
CONCLUSION?
We are confident that the above
assertion is correct.
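The regression examples rest on testing whether the slope of the fitted line differs from zero. The block below is a minimal sketch of that calculation for one predictor, using ordinary least squares. The function name and the small data set are hypothetical and are not taken from the course files. A large t statistic (in absolute value) corresponds to a small P-value in the regression output.

```python
import math

def slope_test(x, y):
    """Least-squares slope, intercept, and t statistic for H0: slope = 0."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope, with n - 2 degrees of freedom.
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, intercept, slope / std_err

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept, t_stat = slope_test(x, y)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, t={t_stat:.3f}")
```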
(USE DATA_HURICANES_COMPREHENSIVE DATA)
WHAT TEST TO PERFORM?
One-sided TTest
WHAT IS THE P-VALUE/MARGIN OF ERROR?
0.10033
STATISTICAL INTERPRETATION?
Since the P-value is too large, the test is
inconclusive.
CONCLUSION?
We are not confident that this was a
reasonable claim.
IS IT REASONABLE TO CLAIM THAT
STUDENTS WITH 8 OR MORE VISITS TO OPEN
LAB SESSIONS HAVE AN AVERAGE FINAL
GRADE GREATER THAN 78? (USE LABVISITS DATA)
WHAT TEST TO PERFORM?
Confidence Interval
WHAT IS THE P-VALUE/MARGIN OF ERROR?
3.622127
STATISTICAL INTERPRETATION?
The confidence interval for the true
average final grade of students with 8 or
more visits to open lab is [78.64, 85.89].
CONCLUSION?
We are confident that students with 8 or
more visits to open lab sessions have an
average final grade greater than 78.
(USE MANBODYNEW21116 DATA)
WHAT TEST TO PERFORM?
Confidence Interval
WHAT IS THE P-VALUE/MARGIN OF ERROR?
1.10354
STATISTICAL INTERPRETATION?
The confidence interval is [18.844,
21.051].
CONCLUSION?
No, we cannot claim that the above
assertion is correct.
(USE PRELAW_NURSING DATA)
WHAT TEST TO PERFORM?
Two-sided TTest
WHAT IS THE P-VALUE/MARGIN OF ERROR?
0.000629
STATISTICAL INTERPRETATION?
Since the P-value is small, we are confident
that the averages are different.
CONCLUSION?
We are confident that this was a
reasonable claim.
Double Bell
855
55-60
Stdev < Median < Average
Bachelor’s Degree
Professional Degree
Professional Degree
High School Completion
Professional Degree
A RANDOM EXPERIMENT WAS
CONDUCTED WHERE PERSON A TOSSED
SEVEN COINS AND RECORDED THE
NUMBER OF “HEADS”. PERSON B ROLLED
FOUR DICE AND RECORDED THE SMALLEST
NUMBER OUT OF THE FOUR DICE. SIMULATE
THIS SCENARIO (USE 10000 LONG
COLUMNS) AND ANSWER THE QUESTIONS.
WHICH OF THE TWO PERSONS (A OR B) IS MORE LIKELY TO GET THE NUMBER 2?
Person B
WHAT IS THE PROBABILITY THAT PERSON B OBTAINS THE NUMBER “3” OR “4”?
Around 18%
WHICH OF THE TWO PERSONS WILL HAVE HIGHER VARIATION IN THEIR OUTCOMES?
Person A
WHICH OF THE TWO PERSONS WILL ON AVERAGE GET A HIGHER NUMBER?
Person A
WHAT IS THE PROBABILITY OF PERSON A GETTING A NUMBER STRICTLY BETWEEN 3 AND 6 (THAT IS, 4 OR 5)?
Around 44%
WHICH OF THE PERSONS HAS A HIGHER PROBABILITY OF GETTING THE NUMBER 3 OR SMALLER?
Person B
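The coin and dice scenario can also be simulated outside a spreadsheet. The block below is a minimal sketch that runs 10000 trials per person and, as a cross-check, enumerates every outcome to get exact probabilities. The function names and the fixed seed are assumptions made for the illustration.

```python
import random
from collections import Counter
from itertools import product

def simulate(trials=10_000, seed=0):
    """Person A: heads in 7 coin tosses. Person B: smallest of 4 dice."""
    rng = random.Random(seed)
    a = [sum(rng.randint(0, 1) for _ in range(7)) for _ in range(trials)]
    b = [min(rng.randint(1, 6) for _ in range(4)) for _ in range(trials)]
    return a, b

def exact():
    """Exact outcome probabilities by listing every equally likely outcome."""
    a = Counter(sum(coins) for coins in product((0, 1), repeat=7))
    b = Counter(min(dice) for dice in product(range(1, 7), repeat=4))
    pa = {k: v / 2 ** 7 for k, v in a.items()}
    pb = {k: v / 6 ** 4 for k, v in b.items()}
    return pa, pb

a, b = simulate()
pa, pb = exact()
print("P(A = 2):", a.count(2) / len(a), "exact", round(pa[2], 4))
print("P(B = 2):", b.count(2) / len(b), "exact", round(pb[2], 4))
print("P(B = 3 or 4), exact:", round(pb[3] + pb[4], 4))
print("P(A = 4 or 5), exact:", round(pa[4] + pa[5], 4))
```

The simulated frequencies vary slightly from run to run (or seed to seed), which is why the answers above are stated as "around" a percentage.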
• ABOUT 68% OF DATA
• AVERAGE ± STDEV
• ABOUT 95% OF DATA
• AVERAGE ± 2*STDEV
• ABOUT 99.7% OF DATA
• AVERAGE ± 3*STDEV
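The share of a normal distribution inside each band can be checked directly. This is a small sketch using the standard normal distribution from Python's standard library; the variable names are chosen for the illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: average 0, standard deviation 1

# Share of the data within k standard deviations of the average.
coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, share in coverage.items():
    print(f"average ± {k}*stdev: {share:.2%}")
```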
IN THE STATE OF FLORIDA, THE
STARTING SALARY OF MECHANICAL
ENGINEERS FOLLOWS A NORMAL
DISTRIBUTION WITH MEAN 62,000 AND
STANDARD DEVIATION 5,000. WITH THE
ABOVE INFORMATION SIMULATE 10000
STARTING ENGINEERS.
WHAT WOULD BE A RANGE [A TO B], WHICH WOULD CONTAIN 95% OF THE STARTING
SALARIES OF MECHANICAL ENGINEERS?
Between 52,000 and 72,000
What is the approximate probability that a randomly picked starting
engineer will have a salary below 57,000?
Around 15%
WHAT IS THE APPROXIMATE PROBABILITY THAT A RANDOMLY PICKED STARTING
ENGINEER WILL HAVE A SALARY OF 73,000 AND ABOVE?
Around 1%
What is the approximate probability that a randomly picked starting
engineer will have a salary between 65,000 and 72,000?
Around 25%
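The salary answers above come from simulation, so they are approximate. As a cross-check, the same quantities can be computed from the normal distribution itself. This is a minimal sketch using the stated mean of 62,000 and standard deviation of 5,000; the variable names are chosen for the illustration.

```python
from statistics import NormalDist

salary = NormalDist(mu=62_000, sigma=5_000)

# Central range holding 95% of starting salaries (close to average ± 2*stdev).
low, high = salary.inv_cdf(0.025), salary.inv_cdf(0.975)
below_57k = salary.cdf(57_000)
above_73k = 1 - salary.cdf(73_000)
between_65k_72k = salary.cdf(72_000) - salary.cdf(65_000)

print(f"95% range: {low:,.0f} to {high:,.0f}")
print(f"below 57,000: {below_57k:.1%}")
print(f"73,000 and above: {above_73k:.1%}")
print(f"between 65,000 and 72,000: {between_65k_72k:.1%}")
```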
WHAT VALUES ARE DESIGNED TO DESCRIBE THE CENTER OF THE
VARIABLE?
Mean and Median
Which number describes the variation of the
variable?
Standard Deviation
Which two values are influenced by outliers?
Mean and Standard Deviation
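A short experiment shows why the mean and standard deviation react to outliers while the median barely moves. The numbers below are hypothetical and serve only as an illustration.

```python
from statistics import mean, median, stdev

values = [10, 11, 12, 13, 14]   # hypothetical data
with_outlier = values + [100]   # same data plus one extreme value

for label, data in (("without outlier", values), ("with outlier", with_outlier)):
    print(f"{label}: mean={mean(data):.2f}, "
          f"median={median(data):.2f}, stdev={stdev(data):.2f}")
```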