Chapter 5 Doing secondary research

Suggested solutions to questions and exercises

1. What are secondary data?

Secondary data are data that already exist in some form. They have not been created specifically for the purpose at hand (unlike primary research data) but were originally collected for another purpose and are now being put to a ‘second’ use.

2. What is meant by the term ‘secondary data analysis’?

Secondary data can be used in further or secondary analysis. Hakim’s (1982) definition of secondary data analysis is ‘any further analysis of an existing dataset which presents interpretations, conclusions or knowledge additional to, or different from, those presented in the first report on the inquiry and its main results’. The aim of secondary analysis therefore is to extract new findings and insights from existing data.

3. Describe the main uses of secondary data.

- To answer the research problem without the need for primary research
- To get a better understanding of the issues and the wider context of the problem
- To help define the problem
- To help in the development and formulation of hypotheses
- To help determine the nature of the evidence required to address the problem
- To help formulate an effective research design
- To enrich the interpretation of the primary data
- To set the findings from the primary research into a wider context

4. What questions would you ask in evaluating the usefulness of secondary data?

Take a look at the ‘Internet Detective tutorial’ at the Social Science Information Gateway (SOSIG) website (www.Sosig.ac.uk/desire/internet-detective.html). The aim of the tutorial is to raise awareness of the quality of information found on the Internet and to encourage you to evaluate it critically before using it in your work.

- Why were the data collected?
- What were the original research objectives?
- Who commissioned the research?
- Who conducted the research?
- How accurate are the data?
- What quality standards were employed in the research process?
- What was the research design?
- What sampling procedure was used?
- What was the sample size?
- What method(s) of data collection were used?
- What was the response rate?
- How good was the design of the questionnaire or discussion guide?
- How were the parameters or variables defined? (For example, definitions of social class, income and family, as well as other key variables, may vary.)
- How were the data processed and analysed?
- Are the data weighted? If so, what is the basis of the weighting procedure?
- How were missing values handled?
- How old are the data? When were they first collected? Are they out of date?
- How useful is the information?

5. Describe the main sources of secondary data useful to the market or social researcher.

There are two main sources of secondary data: internal and external. Internal data are those generated by the organisation, for example, data from previous research, financial data and, crucial to the marketing function, sales data. External data are data gathered by those outside the organisation.

Internal secondary data include those captured at the point of interaction with the customer. Internal secondary data can be stored in and retrieved from databases and data warehouses designed to function as Management Information Systems (MIS) or Marketing Information Systems (MkIS). Such systems are often referred to as Decision Support Systems. They are structured in a way that allows users to search for and retrieve secondary data.

External data, which can also be integrated into an organisation’s Decision Support System, are data generated by those outside the organisation. External secondary data produced by government departments, agencies and related bodies are sometimes referred to as official statistics. Those produced by trade bodies, commercial research organisations and business publishers are sometimes called unofficial statistics.
Most of these data are available in hard copy format, from the publisher or source or from a library, or via online and offline (CD-ROM) databases.

6. What is a data warehouse? What are the key characteristics of a well-designed data warehouse?

A data warehouse is a repository for data. In effect, it is a very large database that contains data usually from more than one source. It is a central storage facility that takes the concept of a data archive one step further, in that different datasets within the warehouse are integrated and elements in one set can be related to elements in another set (known as a relational database). Data stored in the warehouse are data that are useful for supporting management decision-making, for example, for marketing and sales management, or customer relationship management. Data warehouses (and the tools used to extract information from them) are sometimes called Decision Support Systems (DSS).

The data warehouse is designed or structured, and data in it given context, in order to enhance this decision support role and to make access to the data in the warehouse fast and efficient. The technology used in data warehousing allows quick and easy retrieval of data derived from different internal and external sources, even those using different formats or platforms. As with a data archive, data can be retrieved remotely from the warehouse via a networked workstation and interrogated and analysed using tools designed to deal with very large volumes of data.

The key characteristics of a well-designed data warehouse are that:

- it can store ever-increasing volumes of data without affecting processing performance
- it is user-friendly
- everyone has access to it regardless of location
- lots of users can use it at once with little effect on processing speeds
- it facilitates analysis of data from a variety of perspectives
- the speed of analysis and query answering is so fast that the search does not get in the way of thinking about the problem.

7. Why are databases and data warehouses useful to market and social researchers?

Databases and data warehouses – effectively storage facilities for data – are useful to researchers primarily because they allow access to data previously collected, enabling them to be put to further use. The value or usefulness of a set of data is rarely exhausted on its initial or primary application; the data may be useful in the same context at a later date, or in a different context. Data stored in warehouses and databases are often useful for supporting management decision-making, for example, for marketing and sales management, or customer relationship management. Data warehouses and databases (and the tools used to extract information from them) are sometimes called Decision Support Systems (DSS) for this reason.

Databases and data warehouses are a rich source of secondary data, providing detailed current and historic information about consumer behaviour, helping the researcher and the decision-maker reach a different view of the market than that provided by traditional market research. For example, databases can be analysed in order to identify sales patterns by different outlet types and by different regions and patterns of buying behaviour among customers. Analysis can also reveal the characteristics, demographic or geodemographic, for example, that are associated with different behaviour patterns. These patterns and characteristics can be used to build profiles of customers and outlets and to identify market segments, and gaps in the market. ‘Shopping basket analysis’ can show what sets of products or brands are bought together among the different segments, for example, and which ones rarely occur together. By examining trends in behaviour over time, the researcher can build models to predict behaviour, and sales volumes and revenue.
This information can be used to understand, for example, how profitable different groups of customers or different types of outlet are, and what type of promotions works best for which group.

Data stored in a database or warehouse may be retrieved or interrogated, for example, to help the researcher do the following:

- Answer the research problem without the need for primary research
- Develop a better understanding of the issues and the wider context of the problem
- Define the problem
- Develop and formulate hypotheses
- Determine the nature of the evidence required to address the problem
- Formulate an effective research design
- Enrich the interpretation of primary research data
- Set the findings from primary research into a wider context

Using data already collected can be much cheaper than carrying out primary research. Data stored in databases or warehouses are also relatively quick and easy to get hold of: unlike primary data, they are already available.

8. What is data fusion? What are some of the potential problems you might come across in fusing data from different sources?

Data fusion is the merging of data from different sources, for example, data held in databases with data derived from other sources, including surveys and consumer panel data, or data derived from different surveys. The aim of data fusion is to obtain insights that could not be obtained from the sources individually.

The process of data fusion depends on being able to match individual records in one dataset, usually according to demographic or geodemographic details, with comparable records in another dataset. The idea is that data collected from person X1 about attitudes or buying behaviour, say, can be combined with data collected on media usage from person X2, who is similar in his or her demographic or geodemographic characteristics to person X1.
The fused data record (X1 plus X2) contains details of attitudes or behaviour and media usage for what is assumed to be the same person.

Some of the potential problems in fusing data from different sources include problems of format and software platform incompatibility and problems arising from the content of the datasets, which can be harder to overcome than technological differences. If two sets of data are to be fused it is essential that there are variables common to each set. Common variables, say on demographics or product purchase, should be defined in the same way and coded in the same way, so that they measure the same thing and the analysis program treats them as the same thing. This has implications at the research design stage, in particular for the design of the data collection instrument. If you know that two sets of data may be merged, it is important to identify and define common variables before data collection starts. If this is not possible, variables can sometimes be manipulated and redefined at the processing or analysis stage.

9. What is geodemographic classification? Why might such a classification be useful?

A geodemographic classification is a classification based on a combination of geographic data – derived, for example, from postal addresses – and demographic data, derived from the Census of Population. Consumers can be classified according to their geodemographic characteristics. A geodemographic classification can provide a better understanding of consumer behaviour than demographic data alone. Such classifications are often used as the basis of market segmentation systems, for targeting marketing activity, and for planning store locations, distribution patterns and the location of public services.

10. What is data mining? What are the main uses of data mining? Give examples.
Data mining, also known as Knowledge Discovery in Databases (KDD), is the process by which information and knowledge useful to decision-makers is mined or extracted from very large databases using automated techniques and parallel computing technology. Some of the techniques used in data mining are similar to those used in standard and multivariate data analysis. A data mining program can manipulate the data, combining variables, for example, and allowing the user to select elements or sections of the database for analysis; it can provide basic descriptive statistics, look for associations and relationships between variables, and perform cluster analysis.

Where data mining differs from other data analysis techniques is in the volume of data it can process and analyse, and in its ability to discover patterns and relationships that cannot be detected with standard analysis techniques. And it does this at high speed, producing answers to queries or searches almost immediately, by using parallel computing technology. The data mining system divides the workload between a set of parallel processors, enabling streams of data to be processed simultaneously, in parallel. Speed of processing can be further enhanced if the database is structured in a particular way, for example, if it is divided up or ‘partitioned’ into smaller units or packets; the data mining program works on each partition in parallel.

There are two approaches to data mining: verification and discovery. In the verification approach, you already have an idea about patterns of behaviour or relationships between variables – you have formulated a hypothesis, and you want to test the hypothesis in the data. You take the discovery approach, on the other hand, if you have no clear idea about patterns and you want to find out what hidden treasures exist amongst the mass of data. You get the computer to search and explore the database in order to find patterns and relationships.
The computer program searches the database for these patterns and relationships by getting to know the data, and learning the rules that apply within the database, identifying how all the elements relate to each other and what networks exist within the data.

Data mining can be used to uncover information and insight about buying behaviour. For example, it could be used to answer the following:

- How many units of brands X and Y did we sell in the UK, India and Australia in the last financial year?
- What was the split between pack sizes in each market for each brand?
- What is the customer profile of each brand in each market?
- What is the customer profile of each brand pack size in each market?
- If a customer buys brand X what is the likelihood that he or she also buys brand Y?
- What other brands does the customer buy besides X or Y?
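
As an illustration of the question about the likelihood that a customer who buys brand X also buys brand Y, the following is a minimal Python sketch. It treats the likelihood as a simple proportion: of all baskets containing brand X, the share that also contain brand Y. The function name `co_purchase_likelihood` and the basket data are hypothetical and chosen only for illustration; a real data mining system would work on far larger volumes of data and would not necessarily use this method.

```python
from typing import Iterable, Set


def co_purchase_likelihood(baskets: Iterable[Set[str]], given: str, target: str) -> float:
    """Return the share of baskets containing `given` that also contain `target`."""
    # Keep only the baskets in which the 'given' brand was bought.
    with_given = [basket for basket in baskets if given in basket]
    if not with_given:
        raise ValueError(f"no basket contains {given!r}")
    # Count how many of those baskets also include the 'target' brand.
    both = sum(1 for basket in with_given if target in basket)
    return both / len(with_given)


# Hypothetical baskets, one set of brands per customer (illustrative data only).
baskets = [
    {"X", "Y"},
    {"X"},
    {"X", "Y", "Z"},
    {"Y"},
    {"X", "Z"},
]

likelihood = co_purchase_likelihood(baskets, given="X", target="Y")
print(f"Likelihood of buying Y given X: {likelihood:.2f}")
```

Swapping the `given` and `target` arguments answers the reverse question (how likely a buyer of Y is to buy X), which will generally give a different figure.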