CHAPTER 1: INTRODUCTION TO STATISTICS
What is statistics?
Statistics is the set of scientific procedures and methods for collecting, organizing,
summarizing, presenting and analyzing data, and for obtaining useful information,
drawing valid conclusions and making effective decisions based on that analysis.
Types of statistics
(a) Descriptive statistics – covers compiling, organizing, summarizing and presenting data
in suitable visual forms that are easy to understand and use. Various
tables, charts and diagrams are used to exhibit the information obtained from the data.
(b) Inferential statistics – makes generalizations about a population by analyzing
samples. The procedure is to select a sample from the population, measure the variables of
interest, analyze the data, interpret the output and draw conclusions based on the data
analysis.
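As a minimal sketch of the difference between the two types, the Python snippet below first
summarizes a small sample (descriptive) and then uses the sample mean as an estimate of the
population mean (inferential). The height values are made up for illustration, and the
variable names are assumptions rather than anything prescribed by these notes.

```python
import statistics

# Hypothetical sample: heights (cm) of eight students (made-up values).
heights = [152.0, 163.5, 158.0, 170.2, 165.0, 149.5, 172.3, 160.0]

# Descriptive statistics: organize and summarize the data we actually have.
summary = {
    "n": len(heights),
    "mean": statistics.mean(heights),
    "median": statistics.median(heights),
    "stdev": statistics.stdev(heights),  # sample standard deviation
    "min": min(heights),
    "max": max(heights),
}

# Inferential statistics: generalize from the sample to the population.
# Here the sample mean is used as a point estimate of the unknown
# population mean (the simplest possible inference).
estimated_population_mean = summary["mean"]

print(summary)
print(f"Estimated population mean height: {estimated_population_mean:.1f} cm")
```

The summary describes only these eight students; the last line is a claim about all students,
which is only as good as the sample is representative.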
Terms and definitions
Parameter
A numerical measurement describing some characteristic of a population
Statistic
A numerical measurement describing some characteristic of a sample
Population
A population is the entire collection of objects we want to study and from which we may
collect data. These could be people, animals, plants and so on. For example: all new
students enrolled in the 2008/2009 intake at Twintech College Sarawak form the population.
Sample
A sample is a group of units that is a subset of the population. For example: the first-year
students who are in the Multimedia Program are chosen.
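The four terms above can be tied together in a short Python sketch. The population here is
simulated (500 made-up heights), and the sizes, seed and variable names are assumptions
chosen only for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: heights (cm) of all 500 new students (simulated).
population = [round(random.gauss(162, 8), 1) for _ in range(500)]

# Parameter: a numerical measurement computed from the whole population.
population_mean = statistics.mean(population)

# Sample: a subset of the population, here 30 students drawn at random.
sample = random.sample(population, 30)

# Statistic: the same measurement computed from the sample only.
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean:.2f}")
print(f"Statistic (sample mean):     {sample_mean:.2f}")
```

The two values are usually close but not equal; a different sample would give a different
statistic, while the parameter stays fixed.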
Random
Randomness means unpredictability. One of the requirements in a sampling process is to
conform to randomness. Hence the variable being measured is called a random variable.
Data
Data are basically numbers derived from measuring or observing outcomes of random variables.
For example: for the random variable height we get data values such as 152 cm, 163.5 cm, etc.
On the other hand, data can also be non-numeric. For example: data on previous school could
be SMK St Teresa, SMK St Joseph, etc.
(a) Primary data – data collected directly from primary sources or from samples. For example:
a researcher may go to the supermarket and observe the buying habits of the public during
festive seasons. Normally, primary data are more accurate and consistent with the objectives
of the research.
(b) Secondary data – normally published data collected by other parties. For
example: Bank Negara, the Department Of Statistics, and other agencies publish their data
regularly and provide secondary sources of data to researchers. In addition, bulletins,
journals, newspapers and other publications also provide useful secondary data to
researchers.
Variable
A variable is a particular characteristic of the object being studied. This characteristic can
take on different values as we measure or record it from one object to another. For example:
the new students have to provide information about their weight, height, previous school and
parents' income. These are variables.
Types of variables
(a) Quantitative random variables – numerical data
    Continuous random variables – numerical responses that arise from a measuring
    process and can take any value, including fractions, decimals and irrational numbers.
    For example: weight, height, atmospheric pressure, time.
    Discrete random variables – numerical responses that arise from a counting
    process and produce whole-number data. For example: number of students in a
    class, number of children in a family.
(b) Qualitative random variables – non-numeric data (categorical)
For example: in a survey you might give an answer of Yes or No if asked the question “Did
you come to the class yesterday?”, or the outcome of an experiment in the Chemistry Laboratory
might be Yellow, Orange, Blue, etc. from a chemical reaction between two enzymes.
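The classification above can be illustrated with a hypothetical student record in Python.
The `Student` class, its field names and the `variable_type` helper are assumptions made up
for this sketch; classifying by Python type is only a rough rule of thumb, not a general method.

```python
from dataclasses import dataclass, fields


@dataclass
class Student:
    height_cm: float          # continuous: arises from measuring
    weight_kg: float          # continuous: arises from measuring
    siblings: int             # discrete: arises from counting
    previous_school: str      # qualitative: a category
    attended_yesterday: bool  # qualitative: Yes/No answer


def variable_type(value):
    """Classify a value by its Python type (a rough rule of thumb only)."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, (bool, str)):
        return "qualitative"
    if isinstance(value, int):
        return "quantitative (discrete)"
    if isinstance(value, float):
        return "quantitative (continuous)"
    raise TypeError(f"unsupported value: {value!r}")


student = Student(163.5, 55.2, 3, "SMK St Teresa", True)
types = {f.name: variable_type(getattr(student, f.name)) for f in fields(student)}

for name, kind in types.items():
    print(f"{name}: {kind}")
```

The rule breaks down when categories are stored as number codes: a code is an integer in the
data file, yet the variable it represents is still qualitative.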
Measurement Scales
Nominal scale – categorical data, classifying data into various distinct categories such as the
type of school you went to (urban, rural), your favourite soft drink (Coke, Pepsi) or your
gender (male, female). Numbers can be assigned to these data as labels, for example
male = 1, female = 2. These numbers cannot be manipulated arithmetically: they
cannot be meaningfully added or subtracted. Adding male (1) and female (2) does not give a
meaningful result.
Ordinal scale – represents levels or order, so inequality signs can be used when comparing
the values of the variable. These are values such as the first, second and third place in a
competition (1, 2, 3) or ratings of the canteen operators on campus such as bad = 1,
satisfactory = 2, good = 3 and excellent = 4.
Interval scale – involves numerical data but does not have a true zero point. Differences
between values are meaningful, but the data cannot be meaningfully multiplied or divided.
For example: a temperature of 30°C is warmer than 15°C, but it is not twice as warm as 15°C.
Ratio scale – involves a true zero point and covers most numerical measures such as salary,
height, weight, etc. A person who has RM100 has twice as much as someone who has only RM50.
And a person who has zero ringgit in his pocket truly has no money!
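One way to see the difference between the interval and ratio scales is to change the unit and
check whether a ratio survives. The sketch below is only an illustration: the helper function
names are assumptions, and kelvin is used simply as another way to express the same two
temperatures.

```python
def celsius_to_kelvin(c):
    """Same temperature, different zero point."""
    return c + 273.15


def ringgit_to_sen(rm):
    """Same amount of money, different unit (1 ringgit = 100 sen)."""
    return rm * 100


# Interval scale: the ratio 30/15 = 2 is an accident of where 0 °C sits.
ratio_celsius = 30 / 15
ratio_kelvin = celsius_to_kelvin(30) / celsius_to_kelvin(15)

# Differences are still meaningful on an interval scale.
diff_celsius = 30 - 15
diff_kelvin = celsius_to_kelvin(30) - celsius_to_kelvin(15)

# Ratio scale: zero ringgit means no money, so the ratio survives a unit change.
ratio_ringgit = 100 / 50
ratio_sen = ringgit_to_sen(100) / ringgit_to_sen(50)

print(f"Temperature ratio: {ratio_celsius:.3f} in Celsius, {ratio_kelvin:.3f} in kelvin")
print(f"Temperature difference: {diff_celsius:.1f} in Celsius, {diff_kelvin:.1f} in kelvin")
print(f"Money ratio: {ratio_ringgit:.1f} in ringgit, {ratio_sen:.1f} in sen")
```

The temperature ratio changes with the unit while the difference does not, which is why only
addition and subtraction make sense on an interval scale. The money ratio is 2 in either unit.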
Data Collection
Data do not just appear; we have to collect them. We have to plan how to collect data and we
must be clear about what we want to investigate in our study.
If data are collected from every unit in a population, the process is called a census. This is
normally performed only by government agencies, because the population is usually very large
and the data collection process requires a lot of time, energy and money. Most of
the time data are collected from samples only.
Data can be collected from an experimental or an observational study.
Experimental Study – a treatment is applied to the objects and its effect is measured. For
example: two different teaching methods are used and the students’ performance under the
two methods is compared.
Observational Study – data are collected through observation without applying any treatment
to the objects. This can be through surveys or by just observing the behavior of customers in a
hypermarket to find out how they choose what to buy.
For both experimental and observational studies, proper sampling plans must be used to make
sure the sample represents the population.
Sampling methods:
(a) Simple Random Sample – a sample is chosen randomly from the population. This can be
done by using random numbers generated using software, a table or just your calculator.
(b) Systematic Sample – every kth element from a population is chosen, starting from a
randomly selected element. This method can be used if every element can be sequentially
numbered.
(c) Stratified Random Sample – the population is divided into various strata based on some
condition. Then, subsamples are taken from each stratum using the simple random or systematic
method. The subsamples are then combined to form the sample. For example, we want to do a
survey of students about campus facilities. We divide the students according to their academic
year (1st, 2nd and 3rd year) and then take subsamples from each group.
(d) Cluster Sample – when the population can be divided into clusters, which most often occur
naturally, we take subsamples from the clusters; this is called cluster sampling. Sometimes not
all available clusters are sampled. For example, we want to study teaching and learning skills
in schools. We might choose five states: Pahang, Sarawak, Kelantan, Johor and Perak. Then
we randomly select a few schools from each state. The states in Malaysia are the clusters.
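The four sampling methods above can be sketched in Python using the standard `random` module.
This is a minimal illustration under stated assumptions: the function names, the student IDs
and the school names are hypothetical, and the cluster example picks its states at random,
which the notes do not require.

```python
import random


def simple_random_sample(population, n, rng):
    """(a) Pick n elements at random, without replacement."""
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    """(b) Pick every kth element, starting from a random position among the first k."""
    start = rng.randrange(k)
    return population[start::k]


def stratified_sample(strata, n_per_stratum, rng):
    """(c) Take a simple random subsample from every stratum and combine them."""
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, n_per_stratum))
    return sample


def cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """(d) Choose some clusters at random, then subsample within each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], n_per_cluster))
    return sample


rng = random.Random(2008)  # seeded so the example is repeatable

# Hypothetical sampling frame: 60 sequentially numbered students.
students = [f"S{i:03d}" for i in range(1, 61)]

# Strata: academic year (20 students per year, for illustration).
by_year = {"1st": students[:20], "2nd": students[20:40], "3rd": students[40:]}

# Clusters: states, each with ten hypothetical schools.
states = ["Pahang", "Sarawak", "Kelantan", "Johor", "Perak", "Sabah", "Kedah"]
schools_by_state = {s: [f"{s} school {j}" for j in range(1, 11)] for s in states}

srs = simple_random_sample(students, 6, rng)
systematic = systematic_sample(students, 10, rng)
stratified = stratified_sample(by_year, 2, rng)
clustered = cluster_sample(schools_by_state, 5, 3, rng)

for label, result in [("Simple random", srs), ("Systematic", systematic),
                      ("Stratified", stratified), ("Cluster", clustered)]:
    print(f"{label}: {result}")
```

Note that the stratified sample takes units from every stratum, whereas the cluster sample
leaves two of the seven states out entirely.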