Sampling Techniques
Types of samples
• Different sampling techniques
Sampling Terms
• Population
• The entire group of people of interest from
whom the researcher needs to obtain
information.
• It depends on the objective of your research
• It should be identified properly
• Individual units/elements
• Appropriate inclusion-exclusion criteria should
be identified
• Defined target population
• A population can be defined as the set or
collection of all people or items with the
characteristic one wishes to understand.
• Because there is very rarely enough time or
money to gather information from everyone
or everything in a population, the goal
becomes finding a representative sample (or
subset) of that population.
Contd….
• Note also that the population from which the sample is
drawn may not be the same as the population about
which we actually want information. Often there is a
large but not complete overlap between these two
groups, due to sampling-frame issues, etc.
• Sometimes they may be entirely separate; for
instance, we might study rats in order to get a better
understanding of human health, or we might study
records from people born in 2008 in order to make
predictions about people born in 2009.
Contd….
• Target population – the population about which we want
information; it could be much larger than the study
population.
• Sampling frame – the complete list of the
population units (in the finite-population case)
• Sampling units – the elements or units
considered for inclusion in the sample
Sampling process
• The sampling process comprises several stages:
– Defining the population of concern
– Specifying a sampling frame, a set of items or events
possible to measure
– Specifying a sampling method for selecting items or
events from the frame
– Determining the sample size
– Implementing the sampling plan
– Sampling and data collecting
– Reviewing the sampling process
Sampling
• The process of obtaining information from a subset
(sample) of a larger group (population)
• The results for the sample are then used to make estimates
for the larger group
• Faster and cheaper than asking the entire population
• Two keys:
– Selecting the right people: they have to be selected
scientifically so that they are representative of the population
– Selecting the right number of the right people, to minimize
sampling error, i.e. choosing the wrong people by chance
Characteristics of a good sample
• Representative of the population
• Accessible
• Cost effective
• Of the right size
• Obtained with minimum sampling error
• Suitable for analysis as per the study design
Types of samples/Sampling procedures
• Probability sampling:
• Scientific approach to select representative part
of the population.
• Every possible sample has a probability of
selection, which could be equal or unequal but is
predetermined.
• Inclusion probabilities of sampling units are
defined.
• Prejudiced/biased selection of units is
avoided.
• A probability sampling scheme is one in which
every unit in the population has a chance (greater
than zero) of being selected in the sample, and
this probability can be accurately determined.
• When every element in the population does
have the same probability of selection, this is
known as an 'equal probability of selection' (EPS)
design. Such designs are also referred to as 'self-weighting'
because all sampled units are given the same weight.
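To make inclusion probabilities and weights concrete, here is a minimal Python sketch. It assumes stratified sampling with a simple random sample drawn inside each stratum, so a unit in stratum h is included with probability n_h/N_h and receives design weight N_h/n_h. The stratum names and sizes are made up purely for illustration.

```python
from fractions import Fraction

def inclusion_probabilities(stratum_sizes, sample_sizes):
    """Inclusion probability n_h / N_h for every unit in stratum h (SRS within strata)."""
    return {h: Fraction(sample_sizes[h], stratum_sizes[h]) for h in stratum_sizes}

def design_weights(stratum_sizes, sample_sizes):
    """Design weight = 1 / inclusion probability = N_h / n_h."""
    probs = inclusion_probabilities(stratum_sizes, sample_sizes)
    return {h: 1 / p for h, p in probs.items()}

# Hypothetical population: stratum A has 60 units, stratum B has 40 units.
sizes = {"A": 60, "B": 40}

# Equal allocation (5 + 5): units in A and B have different probabilities, so not EPS.
equal = inclusion_probabilities(sizes, {"A": 5, "B": 5})

# Proportional allocation (6 + 4): every unit has probability 1/10, so the design is EPS.
proportional = inclusion_probabilities(sizes, {"A": 6, "B": 4})

print("equal allocation:", equal)
print("proportional allocation:", proportional)
print("weights under proportional allocation:", design_weights(sizes, {"A": 6, "B": 4}))
```

Under proportional allocation every sampled unit carries the same weight (10), which is exactly what "self-weighting" means.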
Contd….
– simple random sampling
– systematic sampling
– stratified sampling
– cluster sampling
– multistage and multiphase sampling
Note: Prepare an assignment which should cover
details of each technique, when it is used,
examples and a relative comparison of the
different techniques (to submit on 15/09).
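As a starting point for the comparison, the four single-stage techniques listed above can be sketched with Python's standard `random` module. This is an illustrative sketch, not a survey library: the frame of 100 numbered students, the two strata and the five sections are hypothetical, and multistage/multiphase designs (which chain these steps) are not shown.

```python
import random

def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely (sampling without replacement)."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Random start within the first interval, then every k-th unit (k = N // n)."""
    k = len(population) // n          # sampling interval
    start = rng.randrange(k)          # random start in 0 .. k-1
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, sizes, rng):
    """Independent simple random sample of sizes[h] units from each stratum h."""
    return {h: rng.sample(units, sizes[h]) for h, units in strata.items()}

def cluster_sample(clusters, m, rng):
    """Select m whole clusters at random and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), m)
    return [unit for c in chosen for unit in clusters[c]]

rng = random.Random(2024)             # fixed seed so the run is reproducible
students = list(range(1, 101))        # hypothetical frame: students numbered 1..100

srs = simple_random_sample(students, 10, rng)
sys_s = systematic_sample(students, 10, rng)
strata = {"year1": students[:60], "year2": students[60:]}                    # hypothetical strata
strat = stratified_sample(strata, {"year1": 6, "year2": 4}, rng)
clusters = {f"section{j}": students[j * 20:(j + 1) * 20] for j in range(5)}  # hypothetical sections
clus = cluster_sample(clusters, 2, rng)

print("Simple random:", srs)
print("Systematic:   ", sys_s)
print("Stratified:   ", strat)
print("Cluster:      ", clus)
```

Note the contrast in what is randomised: individual units (simple random), only the starting point (systematic), units within each stratum (stratified), or whole groups (cluster).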
Simple random sampling: illustration
• Select a random sample of 15 students from a
class of 100 students
• Using a random number table
• A university is testing the effectiveness of two
different medications. They have 20 volunteers.
To conduct the study, researchers randomly
assign the number 1 or 2 to each volunteer
(using random numbers). Volunteers who are assigned
number 1 get Treatment 1 and volunteers who are
assigned number 2 get Treatment 2.
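Both illustrations can be reproduced in software, with a seeded pseudo-random generator playing the role of the random number table. This is a sketch under that assumption; the seed value is arbitrary.

```python
import random

rng = random.Random(15)               # arbitrary seed; stands in for a random number table

# 1) Simple random sample of 15 students from a class with roll numbers 1..100
sample = sorted(rng.sample(range(1, 101), 15))
print("Selected roll numbers:", sample)

# 2) Assign the number 1 or 2 at random to each of the 20 volunteers
assignment = {volunteer: rng.choice([1, 2]) for volunteer in range(1, 21)}
treatment1 = [v for v, t in assignment.items() if t == 1]
treatment2 = [v for v, t in assignment.items() if t == 2]
print("Treatment 1:", treatment1)
print("Treatment 2:", treatment2)

# Alternative (a different, balanced scheme): shuffle and split into two groups of 10
volunteers = list(range(1, 21))
rng.shuffle(volunteers)
group1, group2 = sorted(volunteers[:10]), sorted(volunteers[10:])
print("Balanced groups:", group1, group2)
```

Assigning 1 or 2 independently to each volunteer can leave the two treatment groups with unequal sizes. The shuffle-and-split variant guarantees 10 per group; which one to use is a study-design decision.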
Self study exercise
• Non-probability sampling
– When do you use it?
– What are the drawbacks?
– Can we use such sample data for detailed
statistical analysis?
• Explain the different non-probability sampling
techniques
For future discussion: sample size determination
Data and types of data
• Data are facts/information collected together
in raw or unorganized form, usually as
numbers, that refer to or represent the
observations on characteristics of
interest/study.
• Data – plural, datum – singular
• Ex: BP of patients: 90, 110, 110, 140, 120, 110, 90, 130, …
Contd…
• Marks: 87, 45, 35, 65, 68, 58, 30, …
• Height (in cm): 167, 158, 148, 152, 160, 145, …
• Eye color: black, blue, black, grey, blue, black, …
• Income status: high, low, low, middle, low, middle, high
• Pain: mild, mild, severe, mild, moderate, moderate, severe
Types of data
• We get data by making observations/measurements
on ‘characteristics’ of interest
• Can you identify the various characteristics of
interest in your research study?
• In Statistics, all such characteristics are called
‘variables’
• Ex: height, weight, BMI, eye color, level of
pain, duration of recovery, concentration of a
chemical, …
Contd…
• A variable is either qualitative or quantitative
• A quantitative variable is either discrete or continuous
Scales of measurement
• Measurement scales are used to categorize and/or quantify
variables.
• There are four scales of measurement that are commonly used in
statistical analysis: nominal, ordinal, interval, and ratio
scales.
• Each scale of measurement satisfies one or more of the
following properties of measurement:
• Identity. Each value on the measurement scale has a unique
meaning.
• Magnitude. Values on the measurement scale have an
ordered relationship to one another. That is, some values are
larger and some are smaller.
• Equal intervals. Scale units along the scale are
equal to one another.
• This means, for example, that the difference
between 1 and 2 would be equal to the
difference between 19 and 20.
• A minimum value of zero. The scale has a true
zero point, below which no values exist.
• Nominal Scale of Measurement
• The nominal scale of measurement only satisfies the
identity property of measurement. Values assigned to
variables represent a descriptive category, but have no
inherent numerical value with respect to magnitude.
• Gender is an example of a variable that is measured on
a nominal scale. Individuals may be classified as "male"
or "female", but neither value represents more or less
"gender" than the other.
• Religion, political affiliation, marital status and eye color
are other examples of variables that are normally
measured on a nominal scale.
• Ordinal Scale of Measurement
• The ordinal scale has the property of both
identity and magnitude. Each value on the
ordinal scale has a unique meaning, and it has
an ordered relationship to every other value
on the scale.
• Examples of ordinal scales: stage of a
disease, severity of pain, level of satisfaction
• Variables measured on nominal and ordinal scales are
called ‘categorical variables’.
• Interval Scale of Measurement
• The interval scale of measurement has the properties of identity, magnitude,
and equal intervals.
• A perfect example of an interval scale is the Fahrenheit scale to measure
temperature. The scale is made up of equal temperature units, so that the
difference between 40 and 50 degrees Fahrenheit is equal to the difference
between 50 and 60 degrees Fahrenheit.
• With an interval scale, you know not only whether different values are
bigger or smaller, you also know how much bigger or smaller they are.
For example, suppose it is 60 degrees Fahrenheit on Monday and 70
degrees on Tuesday. You know not only that it was hotter on Tuesday,
you also know that it was 10 degrees hotter.
• Ratio Scale of Measurement
• The ratio scale of measurement satisfies all four of the
properties of measurement: identity, magnitude, equal
intervals, and a minimum value of zero.
• The weight of an object would be an example of a ratio
scale. Each value on the weight scale has a unique
meaning, weights can be rank ordered, units along the
weight scale are equal to one another, and the scale
has a minimum value of zero.
• Weight scales have a minimum value of zero because
an object can have zero weight, but it cannot have
negative weight.
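The practical difference between interval and ratio scales is whether ratios mean anything. The sketch below brings in the Kelvin scale (not part of the notes above; it is used only because it has a true zero) to check the Fahrenheit claims, and uses an approximate kg-to-lb factor of 2.20462 for the weight comparison.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvin, a temperature scale with a true zero."""
    return (f - 32) * 5 / 9 + 273.15

# Interval scale: equal Fahrenheit differences stay equal after conversion.
d1 = fahrenheit_to_kelvin(50) - fahrenheit_to_kelvin(40)
d2 = fahrenheit_to_kelvin(60) - fahrenheit_to_kelvin(50)
print(f"40->50 F and 50->60 F expressed in kelvin: {d1:.3f} K and {d2:.3f} K")

# But ratios are not meaningful: 80 F is not "twice as hot" as 40 F.
ratio_k = fahrenheit_to_kelvin(80) / fahrenheit_to_kelvin(40)
print(f"80 F / 40 F measured in kelvin: {ratio_k:.3f}")

# Ratio scale: a ratio of weights does not depend on the unit.
KG_TO_LB = 2.20462                    # approximate conversion factor
ratio_kg = 80 / 40
ratio_lb = (80 * KG_TO_LB) / (40 * KG_TO_LB)
print(f"80 kg / 40 kg = {ratio_kg}; same weights in pounds: {ratio_lb:.3f}")
```

Differences survive the change of units on both scales, but only the ratio-scale variable (weight) keeps its "twice as much" interpretation.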
Organisation of data
• Data may be available in raw/unorganized
form, which needs to be arranged in a
systematic way. This task is called organisation
of data.
• It may involve:
– preprocessing/cleaning
– coding the variables, if required
– preparing metadata (data on data)
– preparing tables/cross-tabulations
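Coding a variable and recording its metadata can be as simple as a lookup table. This sketch uses the pain responses listed earlier and the codes used for the pain data in these notes (mild = 0, moderate = 1, severe = 2); the layout of the `codebook` dictionary is a hypothetical choice, not a standard format.

```python
# Code book (metadata) for the pain variable; the structure is illustrative only.
PAIN_CODES = {"mild": 0, "moderate": 1, "severe": 2}
codebook = {
    "variable": "pain",
    "scale": "ordinal",
    "codes": PAIN_CODES,
}

raw_pain = ["mild", "mild", "severe", "mild", "moderate", "moderate", "severe"]

# Light cleaning (trim spaces, lower-case) before looking up each code
coded_pain = [PAIN_CODES[level.strip().lower()] for level in raw_pain]

print(codebook)
print(coded_pain)
```

An unexpected spelling raises a `KeyError`, which is a useful signal during cleaning that a response needs attention.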
How to prepare tables?
• Simple tables:
– Just one variable (qualitative/quantitative)
– Listing the values with variable description
– Preparing frequency distributions
– Cross-tabulation (bivariate data)
– Stem and leaf plots (read & work out!)
Preparing frequency distributions
• Frequency distribution of a categorical
variable/discrete variable
• It is a table of frequencies of the different values
of the categorical/discrete variable.
• Ex: The data below present the level of
pain (coded) experienced by patients. The
codes are: mild = 0, moderate = 1, severe = 2
• 1,2,0,0,2,0,1,2,2,0,2,1,1,0,0,2,1,0,2,0,0,1,2,0,1,
2,0,2,1,0,2,1,0,2,1,1,2,1,0,2,2,1,1,2,0,0,0,0,1,0
Distribution of pain levels experienced by patients

Pain level    No. of patients
0             19 (38%)
1             15 (30%)
2             16 (32%)
Total         50

0 = mild, 1 = moderate, 2 = severe
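The frequency table above can be checked by counting the 50 coded values directly. This is a small sketch with `collections.Counter`; the print layout is just one possible arrangement.

```python
from collections import Counter

pain = [1, 2, 0, 0, 2, 0, 1, 2, 2, 0, 2, 1, 1, 0, 0, 2, 1, 0, 2, 0, 0, 1, 2, 0, 1,
        2, 0, 2, 1, 0, 2, 1, 0, 2, 1, 1, 2, 1, 0, 2, 2, 1, 1, 2, 0, 0, 0, 0, 1, 0]
labels = {0: "mild", 1: "moderate", 2: "severe"}

freq = Counter(pain)                  # frequency of each pain code
n = len(pain)

print("Pain level      No. of patients")
for level in sorted(freq):
    pct = 100 * freq[level] / n       # percentage shown alongside the count
    print(f"{level} = {labels[level]:<9}  {freq[level]:>3} ({pct:.0f}%)")
print(f"Total            {n:>3}")
```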
Remember this…
• A table should be neatly drawn.
• It should have a title, a table number, and row
and column headings. Row/column totals should be
shown (depending on the problem).
• A footnote can be added to give details of
codes and any other special features noted.
Contd…
• Usually when we prepare a frequency
distribution for categorical data, we show the
% values along with the frequencies.
• Suppose we have to consider the gender of
the patient along with pain level; then we
cross-tabulate gender vs. pain level. How will
you do this?
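One way to answer the question is to count each (gender, pain level) pair. The sketch below does that with `collections.Counter`. The notes do not give gender data, so the six records used here are a made-up toy input for illustration only, not data from any study; `crosstab` is a hypothetical helper name.

```python
from collections import Counter

def crosstab(row_values, col_values):
    """Two-way frequency table: rows = first variable, columns = second variable."""
    counts = Counter(zip(row_values, col_values))
    rows = sorted(set(row_values))
    cols = sorted(set(col_values))
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return table, rows, cols

# Made-up toy records (gender, coded pain level), for illustration only
gender = ["F", "M", "F", "F", "M", "M"]
pain = [0, 2, 1, 0, 0, 2]

table, rows, cols = crosstab(gender, pain)
print("Gender", *[f"pain={c}" for c in cols], "Total", sep="\t")
for r in rows:
    counts = [table[r][c] for c in cols]
    print(r, *counts, sum(counts), sep="\t")
```

Row totals are printed as the last column; column totals and percentages can be added the same way.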
Frequency distribution of continuous data
• The following formulae can be used to decide the number of
class intervals (bins):
• Determine the range of the sample data:
R = max. – min.
• Square root formula: k = √n, where n is the number of
observations and k is the number of class intervals.
• k = R/h, where R is the range and h is the suggested bin
width; k is rounded to the nearest integer
(e.g., for R = 43, h = 5: k = 8.6, rounded to 9).
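These two rules are one-line calculations. The sketch below adds a small `nearest_int` helper (a hypothetical name) because Python's built-in `round()` rounds halves to the nearest even number, whereas hand calculations usually round halves up.

```python
import math

def nearest_int(x):
    """Round half up (Python's built-in round() uses round-half-to-even)."""
    return math.floor(x + 0.5)

def k_square_root(n):
    """Square root rule: k = sqrt(n)."""
    return math.sqrt(n)

def k_from_width(data_range, h):
    """k = R / h for a suggested bin width h."""
    return data_range / h

# Example from the notes: R = 43, h = 5 gives k = 8.6, rounded to 9
print(k_from_width(43, 5), nearest_int(k_from_width(43, 5)))

# Square root rule for, say, n = 100 observations
print(k_square_root(100), nearest_int(k_square_root(100)))
```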
Contd…
• Sturges’ formula: $k = 1 + \log_2 n$, or equivalently $k = 1 + 3.322 \log_{10} n$
This formula works well for n > 30. For n ≤ 30, it fails to
reflect any trend. It is poor if the data are non-normal.
Ex: If n = 100, then log₁₀ 100 = 2, so k = 1 + 3.322 × 2 = 7.644 ~ 8
After finding the number of bins, we determine the class width
(bin width) using the formula w = R/k, where w is the bin
width and R is the range. Usually we adjust w to a
convenient round figure.
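Sturges' rule and the class-width formula can be sketched as follows. The check confirms that the log₂ form and the 3.322 log₁₀ form agree (3.322 approximates 1/log₁₀ 2). The value R = 43 passed to `bin_width` is reused from the earlier example purely for illustration.

```python
import math

def k_sturges(n):
    """Sturges' rule: k = 1 + log2(n), approximately 1 + 3.322 * log10(n)."""
    return 1 + math.log2(n)

def bin_width(data_range, k):
    """Class width w = R / k, before adjusting to a convenient round figure."""
    return data_range / k

k = k_sturges(100)
print(f"k = {k:.3f} -> {math.floor(k + 0.5)} classes")
print(f"same rule with 3.322 log10: {1 + 3.322 * math.log10(100):.3f}")
print(f"w for R = 43 and k = 8: {bin_width(43, 8)}")
```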
Example
• In a data set min. value is 18.7 and max. value
is 68.8. If there are 180 observations ,
determine the number of classes and class
intervals using the different formulae.
• Using the square root formula, k = √180 = 13.42 ~
13. Class width w = R/k = (68.8 – 18.7)/13 = 50.1/13
= 3.85 ~ 4. Hence the class intervals are 18-22,
22-26, 26-30, 30-34, 34-38, 38-42, 42-46, 46-50, 50-54,
54-58, 58-62, 62-66, 66-70
Contd…
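• Sturges’ formula: k = 1 + 3.322 × log 180 = 1 + 3.322 × 2.255 = 8.49 ~ 8
w = 50.1/8 = 6.26, rounded up to 7 so that the classes reach the maximum value
The classes are: 18-25, 25-32, 32-39, 39-46, 46-53, 53-60, 60-67, 67-74
Alternatively, you can use inclusive classes: 18 – 25, 26 – 33, 34 – 41, 42 – 49, 50 – 57,
58 – 65, 66 – 73
Take bin width w = 6: then k = 50.1/6 = 8.35, rounded up to 9.
Classes are: 18-24, 24-30, 30-36, 36-42, 42-48, 48-54, 54-60, 60-66, 66-72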
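The whole worked example (n = 180, min 18.7, max 68.8) can be recomputed in a few lines. Note that 1 + 3.322 × log₁₀ 180 evaluates to about 8.49, i.e. 8 classes. Two choices here are assumptions rather than rules from the notes: classes start at the whole number below the minimum (18), and class widths are rounded *up* (`math.ceil`) so that the last class always reaches the maximum.

```python
import math

def nearest_int(x):
    """Round half up."""
    return math.floor(x + 0.5)

def class_limits(start, width, k):
    """k consecutive classes [start + i*w, start + (i+1)*w)."""
    return [(start + i * width, start + (i + 1) * width) for i in range(k)]

n, lo, hi = 180, 18.7, 68.8
R = hi - lo                                     # range, about 50.1
start = math.floor(lo)                          # begin classes at a round figure (18)

# Square root rule
k_sqrt = nearest_int(math.sqrt(n))              # 13.42 -> 13
w_sqrt = math.ceil(R / k_sqrt)                  # 3.85 -> 4
classes_sqrt = class_limits(start, w_sqrt, k_sqrt)

# Sturges' rule
k_st = nearest_int(1 + 3.322 * math.log10(n))   # 8.49 -> 8
w_st = math.ceil(R / k_st)                      # 6.26 -> 7
classes_st = class_limits(start, w_st, k_st)

# Fixed bin width w = 6
w6 = 6
k6 = math.ceil(R / w6)                          # 8.35 -> 9
classes_6 = class_limits(start, w6, k6)

for name, cl in [("sqrt", classes_sqrt), ("Sturges", classes_st), ("w=6", classes_6)]:
    assert cl[-1][1] >= hi, "classes must reach the maximum value"
    print(name, len(cl), cl)
```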
Know these concepts : inclusive and exclusive classes, class
limits, class boundaries, frequency , cumulative frequency ,
relative frequency
Points to remember
• A rule of thumb for deciding the number of class intervals is to
use not fewer than six classes and not more than 15
classes. With fewer than six classes there will be too much
summarisation, and more than 15 classes would mean not
enough summarisation.
• The number of class intervals (k) given by the different formulae
need not be taken as final, but only as a guidance value. The
actual number of class intervals may be taken around that
guidance value.
• When appropriate, we can select classes with class width 5
or 10 and use class limits beginning and ending with 5 and
its multiples (or multiples of 10), e.g. 0 – 5, 5 – 10, 10 – 15, … or 0 – 10, 10 – 20, …
Homework (for presentation and discussion)
• Select a data set, preferably related to a biological or
medical example, and construct frequency tables using
the different formulae.
• Obtain relative frequencies and cumulative frequencies.
• Prepare a brief report on the major observations you
can highlight.
• NOTE: Do not select the
common data set.
Diagrams/charts
• Bar charts/diagrams
– Simple, multiple, component, percentage
• Pie chart
• Homework: prepare detailed notes on the above
topics based on the discussions held in class.
Coverage: need for different types of charts,
context of use, interpretation, dos and don’ts,
etc. Learn how to use Excel to draw the
charts/diagrams.
Histogram
• Used to summarize continuous data.
• Visualisation of a frequency distribution
• Vertical bars representing frequencies are
drawn over class intervals.
• Usually equal width classes are considered
• Histograms are useful to understand
symmetry or asymmetry of data
• Prepare detailed notes and workout examples.
Examples
• People were asked to state the number of
hours they exercise in a seven day period. The
results of the survey are listed below. Make a
frequency table and histogram to display the
data.
• 8, 2, 4, 7.5, 10, 11, 5, 6, 8, 12, 11, 9, 6.5, 10.5,
13, 8.5, 3, 4.5, 6, 5.5, 7.5, 8, 10, 11, 10.5, 4.5,
8, 7, 5, 6.5, 4.5, 6.5, 7.5, 8, 10, 13, 9.5, 3.5,
4.5, 5, 5.5, 6.5, 7.5, 8, 10
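One possible solution sketch for the exercise: the 45 responses are grouped into classes of width 2 hours starting at 2 (the choice of width and start is an assumption; other choices are equally valid), with each class including its lower limit and excluding its upper limit. A row of asterisks serves as a text histogram.

```python
hours = [8, 2, 4, 7.5, 10, 11, 5, 6, 8, 12, 11, 9, 6.5, 10.5,
         13, 8.5, 3, 4.5, 6, 5.5, 7.5, 8, 10, 11, 10.5, 4.5,
         8, 7, 5, 6.5, 4.5, 6.5, 7.5, 8, 10, 13, 9.5, 3.5,
         4.5, 5, 5.5, 6.5, 7.5, 8, 10]

start, width = 2, 2                           # classes 2-4, 4-6, ... (upper limit excluded)
k = int((max(hours) - start) // width) + 1    # enough classes to hold the maximum

freq = [0] * k
for h in hours:
    freq[int((h - start) // width)] += 1      # index of the class containing h

cum = 0
print("Hours        f   cum  histogram")
for i, f in enumerate(freq):
    lower = start + i * width
    cum += f
    print(f"{lower:>2} to <{lower + width:<3} {f:>3}  {cum:>4}  {'*' * f}")
```

Relative frequencies follow by dividing each `f` by `len(hours)`.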
Stem and leaf plot
• Bears strong resemblance to a histogram and
serves the same purpose
• Used to show the distributional structure of
quantitative data
• It preserves the information contained in
individual observations
• It can be constructed along with the tallying
process when a frequency distribution is
constructed
Method of construction
• Each measurement (observation) is
partitioned into two parts: a stem and a leaf.
• For each stem value, write the leaves horizontally in
increasing order against that stem. Do this for all the
stem values. Such an arrangement looks like a
histogram with horizontal bars of numbers.
• Learn the construction using an example.
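As an example, here is a small stem-and-leaf sketch for two-digit whole numbers (stem = tens digit, leaf = units digit). It uses only the seven marks actually listed earlier (87, 45, 35, 65, 68, 58, 30); the original list continues beyond those, so this is a partial illustration. Decimal or three-digit data would need a different stem/leaf split.

```python
def stem_and_leaf(values):
    """Stems = tens digit, leaves = units digit (non-negative integers only)."""
    leaves = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves.setdefault(stem, []).append(leaf)
    lo, hi = min(leaves), max(leaves)
    # Keep empty stems so that gaps in the distribution stay visible
    return {s: leaves.get(s, []) for s in range(lo, hi + 1)}

marks = [87, 45, 35, 65, 68, 58, 30]
plot = stem_and_leaf(marks)
for stem, lv in plot.items():
    print(f"{stem} | {' '.join(str(x) for x in lv)}")
```

Every original value can be read back from the plot (stem 6 with leaves 5 and 8 means 65 and 68), which is the advantage over a histogram noted above.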