Sheet 20, Dr. Mahmoud

Community lec no. 22
 In today’s lecture we will start discussing “Descriptive Statistics”.
 We have already had 3 lectures as an introduction to Biostatistics.
 We will divide Descriptive Biostatistics into Stat 1, Stat 2, Stat 3
and Stat 4:
 Stat 1: we will explain all of descriptive statistics in general.
 Stat 2: we will explain the shapes and the normal distribution.
 Stat 3: we will explain relative risk and “arboration” (not sure about
this word exactly), and there will be an answer for the sample statistic.
 Stat 4: we will give one type from each different analysis.
 By this we will have covered a nominal test for 2 groups, an ordinal
test for 2 groups and a relational interval test for 2 groups.
Now what is Biostatistics and what does it mean?
Biostatistics: It is a branch of applied math that deals with collecting,
organizing and interpreting data using well-defined procedures and
techniques.
 We have 2 types of Biostatistics:
1. Descriptive Statistics
 It involves organizing, summarizing & displaying data to make
them more understandable.
2. Inferential Statistics
 It reports the degree of confidence of the sample statistic that
predicts the value of the population parameter.
Here we take a sample, and through this sample we can generalize to
the rest of the population.
 There are certain things that we have to take into consideration while
collecting the data for a statistical study:
1. Accuracy
The data must be accurate, which we ensure by the following:
 Look for missing data
 Investigate the results
 Consider the results of other studies
 Perform data analysis
Hypotheses can be either the null hypothesis or the alternative
hypothesis.
 Null Hypothesis
It states that there is no relationship between the variables (the
dependent variable and the independent variable); the Arabic term
literally means “the zero hypothesis”.
In other words, any relationship we observe is taken to have arisen
only by chance.
A simple example of this hypothesis: “Your attendance at a lecture
does not affect your mark in the exam” (no relation).
It is presumed to be true until statistical evidence nullifies it in
favor of an alternative hypothesis.
Type 1 and Type 2 errors are both defined with respect to this
hypothesis. {We will discuss the errors shortly}
The null hypothesis can also be called the “statistical hypothesis”.
 Alternative Hypothesis
It states that there is a relationship between the variables.
An example of this hypothesis: “How much you study the biostatistics
subject is reflected in your performance in the exam” (there is a
relationship).
There are two types of error that the researcher may encounter during
research (Type 1 error and Type 2 error):
Type 1 Error
o We either reject or do not reject the null hypothesis according to
alpha (α).
o Alpha is the term used to express the level of significance we will
accept.
o Alpha is the limit on chance by which we accept or reject the null
hypothesis (e.g. the researcher sets the probability at 5% or 1%).
o The researcher's results may appear by chance. (The Dr demonstrated
this with flipping a coin: by chance we can get either side.)
o Each time we flip the coin there is a 50% chance of heads and a 50%
chance of tails.
o If we flip the coin twice, the probability of getting one given side
both times is 25%, and so on (for each extra flip we multiply the
probability by 0.5).
o The null hypothesis is related to probabilities.
o Usually the researcher sets α at a certain value in order to rule out
“chance” where probabilities play a role.
o Suppose alpha was 1% {the researcher determines his own value}.
o If the probability was 1% or more: the researcher cannot reject the
null hypothesis, because the result may be due to chance.
o If the probability was less than 1%: the researcher rejects the null
hypothesis; the result is “for the tweet” (most likely “for the
treatment”, i.e. not due to chance) {Sorry, I didn’t know what the Dr
meant by this}.
o The null hypothesis: there is no relationship between two measured
phenomena.
o Suppose you have a medicine that you need to try on a group of
people: there is a slight chance that some will get better by chance
(not because of the medicine).
o Type 1 error (alpha error): the incorrect rejection of a true null
hypothesis.
o Type 2 error (beta error): the failure to reject a false null
hypothesis.
o The alternative hypothesis (research hypothesis): the claim that the
results of your research are expected to support.
o The null and the alternative are two rival hypotheses. In your
research you should report the null hypothesis by writing the
percentage under the heading “data that do not support the research
hypothesis”, while if you are rejecting the null hypothesis you say
that “data support the research hypothesis”.
o Example: (you may get confused, but I tried to write down what the
doctor said; the most important part is the last 2 lines)
o We are testing 1000 bulbs to see whether they work properly, assuming
that each should work for 1000 hrs. We took 100 for testing and set the
chance of error (sampling error) at 5%. The results gave us 850 hrs,
not 1000 hrs. In this case we are not sure that the mean for the 100 is
the same as the mean for the 1000, so we use something called the
standard error of the mean to check whether the sample mean equals the
population mean. The null hypothesis says that there is no difference
between the population mean and the sample mean, while the alternative
hypothesis says they are different. Now you do your calculation, and
the probability under the null hypothesis is 6%, which is bigger than
5%: we cannot reject the null hypothesis, and rejecting it anyway risks
a Type 1 error. If it was less than 5%, we reject the null hypothesis,
and failing to reject it when it is false would be a Type 2 error.
o 1 - beta = power of the study.
Alpha is the level of significance, i.e. the accepted probability of a
Type 1 error; beta is the probability of a Type 2 error.
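The bulb example can be sketched in Python. This is only an illustration under stated assumptions: the notes do not record the standard deviation of bulb lifetimes or the doctor's actual calculation, so the two standard deviations below are hypothetical, and a two-sided z-test is assumed as the way to turn the standard error of the mean into a probability. The function name `one_sample_z_test` is also made up for this sketch.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, claimed_mean, sd, n, alpha=0.05):
    """Two-sided z-test of a sample mean against a claimed population mean."""
    standard_error = sd / sqrt(n)  # standard error of the mean
    z = (sample_mean - claimed_mean) / standard_error
    # Probability of a gap at least this large if the null hypothesis is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject_null = p_value < alpha
    return standard_error, z, p_value, reject_null


# Bulb example: 100 bulbs tested, sample mean 850 hrs, claimed mean 1000 hrs.
# Both standard deviations are hypothetical, chosen only for illustration.
for sd in (800, 500):
    se, z, p, reject = one_sample_z_test(850, 1000, sd=sd, n=100)
    decision = "reject the null" if reject else "cannot reject the null"
    print(f"sd={sd}: SE={se:.0f} hrs, z={z:.2f}, p={p:.3f} -> {decision}")
```

With the larger hypothetical standard deviation the probability comes out at about 6%, so the null hypothesis cannot be rejected at alpha = 5%; with the smaller one it falls well below 5% and the null hypothesis is rejected.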
Power of the study: how far we can say that the independent variable is
the cause of the change in the dependent variable. Effect size
(recorded in the notes as “big size”): how large is the significant
difference produced by the intervention?
Meaning of the result: translating the result from numbers into words
and making it understandable. Importance: translating the significant
finding into a practical finding, e.g. how we interpret it when we say
the probability is less than 0.05 and we reject the null hypothesis,
and so on. Generalization: how can we make the findings apply to the
whole population (how can we generalize)? Implication: what have we
learned in relation to what was used during the study? All of this will
be explained point by point in detail later on.
Definitions:
Data: any type of information.
First step: raw data are the data as they were collected (e.g. from
filling in the questionnaire).
Second step: organized data are data that have been arranged in
ascending or descending order, or grouped.
Example (slide 4):
Weight in pounds of 57 school children at a day-care center: these
numbers are not arranged, and information like this is called raw data.
If we arrange them from largest to smallest, we call them arranged
data.
Descriptive Statistics:
-In this lecture we will discuss descriptive statistics, as opposed to
inferential statistics.
-Here we are interested only in summarizing the data in front of us,
without assuming that it represents anything more. For example, when we
talk about the exam results of dental students, we do not generalize to
all students of the University of Jordan.
-We will look at both quantitative and graphical techniques (high
charge, medium charge, low charge).
-The basic overall idea is to turn data into information.
So what we will cover is that numerical data may be summarized
according to several characteristics: numbers collected from
questionnaires or interviews can be summarized in several ways, by
measures of location, measures of dispersion, or measures of shape
(skewness).
Measures of location: we take
-Measures of central tendency: mean, median, mode.
-Measures of non-central tendency: quantiles, which include quartiles,
quintiles and percentiles.
Measures of dispersion: we take
range, interquartile range, variance, standard deviation and
coefficient of variation.
Measures of shape: we take
positive skew, negative skew, Beta, the normal distribution, the
standardized normal distribution and standardizing data as scores.
First: Measures of location (mean, mode, median)
-Measures of location place the data set on the scale of real
numbers.
-A measure of location refers to the location of a typical data value:
the data value around which the other scores tend to cluster.
-Measures of central tendency (i.e., central location) help find the
approximate center of the dataset. We want to see how the data
concentrate around the center, so we are talking about the average,
but we avoid saying “the average” because the meaning of the word is
not specific.
-Researchers usually do not use the term average, because there
are three alternative types of average: the mean, the median and the
mode.
-In a perfect world, if the data are perfectly normally distributed,
the mean, median and mode would be the same (equal).
-However, the world is not perfect, and very often the mean,
median and mode are not the same.
Let us talk about summation signs (sigma, for example if we want to sum
from 1 to 10).
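As a sketch of the notation, summing ten values with the sigma sign can be written as follows. The symbol x_i for the i-th value is an assumption here; the notes do not name it.

```latex
\sum_{i=1}^{10} x_i = x_1 + x_2 + \dots + x_{10}
```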
Now we will talk about the mean.
Mean:
It is the sum of the values of the data divided by the number of
observations. If you have the data (1, 2, 2, 4, 5, 10):
Mean = (1 + 2 + 2 + 4 + 5 + 10)/6 = 24/6 = 4
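The same small data set can be checked with Python's standard `statistics` module. The median and mode are included only for comparison with the mean; the notes work out the mean alone for these numbers.

```python
from statistics import mean, median, mode

data = [1, 2, 2, 4, 5, 10]

data_mean = mean(data)      # (1 + 2 + 2 + 4 + 5 + 10) / 6
data_median = median(data)  # average of the two middle values, 2 and 4
data_mode = mode(data)      # the most frequent value

print(data_mean, data_median, data_mode)
```

Here the three measures of central tendency differ, which is what the notes lead us to expect for data that are not perfectly normally distributed.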
The mean is sensitive to extreme values.
 E.g.: if the Dr gave us an exam and the marks were as follows:
- two of us got 100
- one of us got 90
- the rest got between 70 and 80.
 The mean will be pulled above the 70-80 range, because we have two
extreme values of 100 (i.e. the mean ends up closer to the extremes).
But if all of the students got 70-80 and someone got 10, the mean will
be pulled down (to about 60 in the Dr's example).
So the mean is a misleading summary of skewed data.
 E.g.: if most people have an income between 500 and 800, but 2% have
an income of 5000-10000, then when we put the people whose income is
more than 5000 together with the people whose income is 500-800, the
data will be positively skewed (to the right), so the mean will be
bigger than the median.
 An extreme value pulls the mean toward it (here, to a larger value),
so the data are called skewed.
 E.g.: suppose the mean age at death is 75. If some people die below
the age of one year, the mean will decrease and shift to the left
(negatively), so the mean will be smaller than the median and the data
will be negatively skewed.
 If the data are normally distributed, the mean, the median and the
mode are the same; if the data are not normally distributed, they
differ.
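The income and age-at-death examples can be illustrated with a small sketch. All of the numbers below are hypothetical, chosen only to follow the pattern described in the notes (a few very high incomes, a few very early deaths).

```python
from statistics import mean, median

# Hypothetical incomes: 98 people at 650 and 2% of people at 5000 and 10000.
incomes = [650] * 98 + [5000, 10000]
# Hypothetical ages at death: 98 people at 75 and two deaths before age one.
ages_at_death = [75] * 98 + [0, 0]

for label, values in (("income", incomes), ("age at death", ages_at_death)):
    m, med = mean(values), median(values)
    # Mean above the median suggests positive skew, below suggests negative.
    direction = "positive (right)" if m > med else "negative (left)"
    print(f"{label}: mean={m}, median={med}, skew direction: {direction}")
```

In both cases the median stays at the typical value while the mean is dragged toward the extreme values.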
In a normal distribution:
- about 34% of the data lie within one standard deviation above the mean.
- about 34% lie within one standard deviation below the mean.
- about 95% lie within two standard deviations of the mean.
- if the data do not look like this, they are positively (+) or
negatively (-) skewed.
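These percentages can be checked against the standard normal distribution using `NormalDist` from Python's standard library. This is a minimal sketch; the variable names are chosen only for this illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

one_sd_above = z.cdf(1) - z.cdf(0)     # between the mean and +1 SD
within_one_sd = z.cdf(1) - z.cdf(-1)   # between -1 SD and +1 SD
within_two_sd = z.cdf(2) - z.cdf(-2)   # between -2 SD and +2 SD

print(f"{one_sd_above:.1%} {within_one_sd:.1%} {within_two_sd:.1%}")
```

The first value is the 34% on each side of the mean, and the last is the share of the data within two standard deviations.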
 Sample mean: also called the sample average (although that is a wrong
name) or the arithmetic mean. It differs from the geometric mean (a
more advanced approach for data that are not normally distributed,
where we transform the data; it is calculated differently from the
arithmetic mean).
 Mean for the sample = X̄ (X-bar) or M; mean for the population = mu (μ).
 It is sensitive to extreme values, which means one data point can
make a great change in the sample mean.
 E.g.: we have a sample:
1 1 1 1 51
- the mean = 55/5 = 11 (but this is not a representative mean, because
there is an extreme value. In analytic research these values are
called outliers, and we should use an adjusted mean if the data are
not normally distributed).
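The effect of the single extreme value in this sample can be seen in a short sketch. The notes do not say which adjustment the doctor had in mind, so dropping the outlier and comparing with the median are used here only as illustrations, not as the method from the lecture.

```python
from statistics import mean, median

sample = [1, 1, 1, 1, 51]
# Illustration only: the same sample with the extreme value left out.
without_outlier = [x for x in sample if x != 51]

sample_mean = mean(sample)                 # 55 / 5
sample_median = median(sample)             # middle value of the sorted sample
trimmed_mean = mean(without_outlier)       # mean of the four remaining values

print(sample_mean, sample_median, trimmed_mean)
```

One data point out of five moves the mean from 1 to 11, while the median is unaffected.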
Good luck and sorry for any mistakes or misunderstood points
Done by:
Sumaya Abuodeh
Khaldoon AlQaddumi
Tareq Al-Amad
Alaa Ali
Alaa Mohammed Yousif