CHAPTER 1: INTRODUCTION TO STATISTICS

What is statistics?

Statistics comprises the scientific procedures and methods for collecting, organizing, summarizing, presenting and analyzing data, as well as for obtaining useful information, drawing valid conclusions and making effective decisions based on the analysis.

Types of statistics

(a) Descriptive statistics – covers compiling, organizing, summarizing and presenting data in suitable visual forms that are easy to understand and use. Various tables, charts and diagrams are used to exhibit the information obtained from the data.

(b) Inferential statistics – makes generalizations about a population by analyzing samples. The procedure is to select a sample from the population, measure the variables of interest, analyze the data, interpret the output and draw conclusions based on the data analysis.

Terms and definitions

Parameter – a numerical measurement describing some characteristic of a population.

Statistic – a numerical measurement describing some characteristic of a sample.

Population – the entire collection of objects we want to study, from which we may collect data. These could be people, animals, plants and so on. For example: all new students enrolled in the 2008/2009 intake at Twintech College Sarawak form the population.

Sample – a group of units that is a subset of the population. For example: the first-year students in the Multimedia Program are chosen.

Random – randomness means unpredictability. One of the requirements of a sampling process is that it conforms to randomness. Hence the variable being measured is called a random variable.

Data – basically, numbers derived from measuring or observing the outcomes of random variables. For example: for the random variable height we get data values such as 152 cm, 163.5 cm, etc. On the other hand, data can also be non-numeric. For example: data on previous school could be SMK St Teresa, SMK St Joseph, etc.
(a) Primary data – data collected from primary sources or from samples. For example: a researcher may go to the supermarket and observe the buying habits of the public during festive seasons. Normally, primary data are more accurate and more consistent with the objectives of the research.

(b) Secondary data – normally published data collected by other parties. For example: Bank Negara, the Department of Statistics and other agencies publish their data regularly and provide secondary sources of data to researchers. In addition, bulletins, journals, newspapers and other publications also provide useful secondary data to researchers.

Variable – a particular characteristic of the object being studied. This characteristic can take on different values as we measure or gather it from one object to another. For example: the new students have to provide information about their weight, height, previous school and parents' income. These are variables.

Types of variables

(a) Quantitative random variables – numerical data.

Continuous random variables – numerical responses that arise from a measuring process and can take any value, including fractions, decimals and irrational numbers. For example: weight, height, atmospheric pressure, time.

Discrete random variables – numerical responses that arise from a counting process, which produces whole numbers. For example: number of students in a class, number of children in a family.

(b) Qualitative random variables – non-numeric (categorical) data. For example: in a survey you might answer Yes or No to the question "Did you come to the class yesterday?", or the outcome of an experiment in the Chemistry Laboratory might be Yellow, Orange, Blue, etc. from a chemical reaction between two enzymes.
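The distinction between continuous, discrete and qualitative variables can be illustrated with a small Python sketch. The function name `classify_variable`, the type-based rule and the student records are all assumptions made for illustration; real data sets often store counts as decimals or categories as number codes, so a rule like this is only a starting point.

```python
def classify_variable(values):
    """Classify a variable from the Python type of its observed values.

    Assumed simplification: text values are qualitative, whole numbers
    (int) are discrete counts, and anything else is continuous.
    """
    if any(isinstance(v, str) for v in values):
        return "qualitative"
    if all(isinstance(v, int) for v in values):
        return "discrete"
    return "continuous"


# Hypothetical records for three new students (illustrative values only).
students = [
    {"height_cm": 152.0, "children_in_family": 2, "previous_school": "SMK St Teresa"},
    {"height_cm": 163.5, "children_in_family": 1, "previous_school": "SMK St Joseph"},
    {"height_cm": 158.2, "children_in_family": 4, "previous_school": "SMK St Teresa"},
]

# Collect the observed values of each variable and classify them.
for name in students[0]:
    column = [student[name] for student in students]
    print(name, "->", classify_variable(column))
```

Here height is treated as continuous because it comes from measuring, the number of children as discrete because it comes from counting, and previous school as qualitative because its values are names rather than numbers.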
Measurement Scales

Nominal scale – categorical data, classified into various distinct categories such as the type of school you went to (urban, rural), your favourite soft drink (Coke, Pepsi) or your gender (male, female). Numbers can be assigned to these data as labels, for example male = 1, female = 2. These numbers cannot be manipulated arithmetically: they cannot be added or subtracted in any meaningful way. For instance, 1 + 1 = 2 does not mean that male plus male equals female.

Ordinal scale – represents levels or order, and inequality signs can be used when comparing the values of the variable. Examples are the first, second and third place in a competition (1, 2, 3), or ratings of the canteen operators on campus such as bad = 1, satisfactory = 2, good = 3 and excellent = 4.

Interval scale – involves numerical data but does not have a true zero point, so the data cannot be meaningfully multiplied or divided. For example: a temperature of 30°C is warmer than 15°C, but it is not twice as warm as 15°C.

Ratio scale – involves a true zero point and covers most numerical measures such as salary, height, weight, etc. A person who has RM100 has twice as much as someone who has only RM50, and a person who has zero ringgit in his pocket truly has no money!

Data Collection

Data do not just appear; we have to collect them. We have to plan how to collect the data, and we must be clear about what we want to investigate in our study.

If data are collected from every unit in a population, the process is called a census. This is normally performed only by government agencies, because the population is usually very large and the data collection process requires a lot of time and energy as well as a high cost. Most of the time, data are collected from samples only.

Data can be collected from an experimental or an observational study.

Experimental Study – a treatment is applied to the objects and its effect is measured. For example: two different teaching methods are conducted and the students' performance under the two methods is compared.
Observational Study – data are collected through observation without applying any treatment to the object. This can be done through surveys, or by just observing the behavior of customers in a hypermarket to find out how they choose what to buy.

For both experimental and observational studies, proper sampling plans must be used to make sure the sample represents the population.

Sampling methods:

(a) Simple Random Sample – a sample is chosen randomly from the population. This can be done by using random numbers generated by software, a table or just your calculator.

(b) Systematic Sample – every kth element from a population is chosen, starting from a randomly selected element. This method can be used if every element can be sequentially numbered.

(c) Stratified Random Sample – the population is divided into various strata based on some condition. Then subsamples are taken from each stratum using the simple random or systematic method. The subsamples are then combined to form the sample. For example: we want to do a survey of students about campus facilities. We divide the students according to their academic year (1st, 2nd and 3rd year) and then take subsamples from each group.

(d) Cluster Sample – when the population can be divided into clusters, which most often occur naturally, we take subsamples from the clusters. Sometimes not all available clusters are sampled. For example: we want to study teaching and learning skills in schools. We might choose five states: Pahang, Sarawak, Kelantan, Johor and Perak. Then we randomly select a few schools from each state. The states in Malaysia are the clusters.
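The four sampling methods above can be sketched in Python using only the standard `random` module. This is a minimal illustration, not a prescribed procedure: the function names, the 60 numbered students, the school names and the sample sizes are all hypothetical, and the cluster function assumes the two-stage reading of the example (pick some states, then pick a few schools within each).

```python
import random


def simple_random_sample(population, n, rng):
    """Choose n distinct units at random from the population."""
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    """Choose every kth unit, starting from a random position in the first k."""
    start = rng.randrange(k)
    return population[start::k]


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Split the population into strata, then sample randomly within each one."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


def cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Pick some clusters at random, then sample units inside each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        members = clusters[name]
        sample.extend(rng.sample(members, min(n_per_cluster, len(members))))
    return sample


rng = random.Random(2008)  # fixed seed so the sketch is repeatable

# Hypothetical population: 60 students as (student number, academic year).
students = [(number, 1 + (number - 1) % 3) for number in range(1, 61)]

# Hypothetical clusters: four made-up schools in each of the five states.
states = ["Pahang", "Sarawak", "Kelantan", "Johor", "Perak"]
schools = {state: [f"{state} School {i}" for i in range(1, 5)] for state in states}

print(simple_random_sample(students, 10, rng))
print(systematic_sample(students, 6, rng))  # every 6th student
print(stratified_sample(students, lambda s: s[1], 4, rng))  # 4 per academic year
print(cluster_sample(schools, 3, 2, rng))  # 3 states, 2 schools in each
```

Passing the random number generator `rng` into each function keeps the randomness in one place, so the same seed reproduces the same samples when the sketch is rerun.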