Chapter 5
Doing secondary research
Suggested solutions to questions and exercises
1. What are secondary data?
Secondary data are data that already exist in some form. They have not been created
specifically for the purpose at hand (unlike primary research data) but were originally
collected for another purpose and are now being put to a ‘second’ use.
2. What is meant by the term ‘secondary data analysis’?
Secondary data can be used in further or secondary analysis. Hakim’s (1982) definition
of secondary data analysis is ‘any further analysis of an existing dataset which presents
interpretations, conclusions or knowledge additional to, or different from, those presented
in the first report on the inquiry and its main results’. The aim of secondary analysis
therefore is to extract new findings and insights from existing data.
3. Describe the main uses of secondary data.
• To answer the research problem without the need for primary research
• To get a better understanding of the issues and the wider context of the problem
• To help define the problem
• To help in the development and formulation of hypotheses
• To help determine the nature of the evidence required to address the problem
• To help formulate an effective research design
• To enrich the interpretation of the primary data
• To set the findings from the primary research into a wider context
4. What questions would you ask in evaluating the usefulness of secondary data?
Take a look at the ‘Internet Detective tutorial’ at the Social Science Information Gateway
(SOSIG) website (www.Sosig.ac.uk/desire/internet-detective.html). The aim of the
tutorial is to raise awareness of the quality of information found on the Internet and to
encourage you to evaluate it critically before using it in your work.
• Why were the data collected? What were the original research objectives?
• Who commissioned the research?
• Who conducted the research?
• How accurate are the data?
• What quality standards were employed in the research process?
• What was the research design?
• What sampling procedure was used?
• What was the sample size?
• What method(s) of data collection were used?
• What was the response rate?
• How good was the design of the questionnaire or discussion guide?
• How were the parameters or variables defined? (For example, definitions of social
class, income and family, as well as other key variables, may vary.)
• How were the data processed and analysed?
• Are the data weighted? If so, what is the basis of the weighting procedure?
• How were missing values handled?
• How old are the data? When were they first collected? Are they out of date?
• How useful is the information?
5. Describe the main sources of secondary data useful to the market or social researcher.
There are two main sources of secondary data: internal and external secondary data.
Internal data are those generated by the organisation, for example, data from previous
research, financial data and, crucial to the marketing function, sales data. External data
are data gathered by those outside the organisation.
Internal secondary data include those captured at the point of interaction with the
customer. Internal secondary data can be stored in and retrieved from databases and data
warehouses designed to function as Management Information Systems (MIS) or
Marketing Information Systems (MkIS). Such systems are often referred to as Decision
Support Systems. They are structured in a way that allows users to search for and
retrieve secondary data.
External data, which can also be integrated into an organisation’s Decision Support
System, are data generated by those outside the organisation. External secondary data
produced by government departments, agencies and related bodies are sometimes
referred to as official statistics. Those produced by trade bodies, commercial research
organisations and business publishers are sometimes called unofficial statistics. Most of
these data are available in hard copy, from the publisher or source or from a
library, or via online and offline (CD-ROM) databases.
6. What is a data warehouse? What are the key characteristics of a well-designed data
warehouse?
A data warehouse is a repository for data. In effect, it is a very large database that
contains data usually from more than one source. It is a central storage facility that takes
the concept of a data archive one step further, in that different datasets within the
warehouse are integrated and elements in one set can be related to elements in another set
(known as a relational database). Data stored in the warehouse are data that are useful for
supporting management decision-making, for example, for marketing and sales
management, or customer relationship management. Data warehouses (and the tools
used to extract information from them) are sometimes called Decision Support Systems
(DSS). The data warehouse is designed or structured, and data in it given context, in
order to enhance this decision support role and to make access to the data in the
warehouse fast and efficient. The technology used in data warehousing allows quick and
easy retrieval of data derived from different internal and external sources, even those
using different formats or platforms. As with a data archive, data can be retrieved
remotely from the warehouse via a networked workstation and interrogated and analysed
using tools designed to deal with very large volumes of data.
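As a minimal sketch of what relating elements in one dataset to elements in another can look like, the Python example below uses the standard-library sqlite3 module with an in-memory database. The table names, column names and figures are invented for illustration; a real warehouse would be far larger and would use its own schema.

```python
import sqlite3

# In-memory database standing in for a (very small) warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
""")
# Hypothetical records from two different internal sources.
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "North"), (2, "South"), (3, "North")])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [(10, 1, 20.0), (11, 2, 35.0), (12, 3, 15.0), (13, 1, 5.0)])

# Relate the two datasets through the shared customer_id key.
sales_by_region = dict(conn.execute("""
    SELECT c.region, SUM(s.amount)
    FROM sales AS s
    JOIN customers AS c ON c.customer_id = s.customer_id
    GROUP BY c.region
"""))
conn.close()
print(sales_by_region)
```

The join on a shared key is what lets a question about sales be answered in terms of a customer attribute (here, region) that is held in a separate dataset.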
The key characteristics of a well-designed data warehouse are that:
• it can store ever-increasing volumes of data without affecting processing performance
• it is user-friendly
• everyone has access to it regardless of location
• lots of users can use it at once with little effect on processing speeds
• it facilitates analysis of data from a variety of perspectives
• the speed of analysis and query answering is so fast that the search does not get in the
way of thinking about the problem.
7. Why are databases and data warehouses useful to market and social researchers?
Databases and data warehouses – effectively storage facilities for data – are useful to
researchers primarily because they allow access to data previously collected, enabling
them to be put to further use. The value or usefulness of a set of data is rarely exhausted
on its initial or primary application; the data may be useful in the same context at a later
date, or in a different context. Data stored in warehouses and databases are often useful
for supporting management decision-making, for example, for marketing and sales
management, or customer relationship management. Data warehouses and databases
(and the tools used to extract information from them) are sometimes called Decision
Support Systems (DSS) for this reason.
Databases and data warehouses are a rich source of secondary data, providing detailed
current and historic information about consumer behaviour, helping the researcher and
the decision-maker reach a different view of the market than that provided by traditional
market research. For example, databases can be analysed in order to identify sales
patterns by different outlet types and by different regions and patterns of buying
behaviour among customers. Analysis can also reveal the characteristics, demographic or
geodemographic, for example, that are associated with different behaviour patterns.
These patterns and characteristics can be used to build profiles of customers and outlets
and to identify market segments, and gaps in the market. ‘Shopping basket analysis’ can
show what sets of products or brands are bought together among the different segments,
for example, and which ones rarely occur together. By examining trends in behaviour
over time, the researcher can build models to predict behaviour, and sales volumes and
revenue. This information can be used to understand, for example, how profitable
different groups of customers or different types of outlet are, and what type of
promotion works best for which group.
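To make the idea of ‘shopping basket analysis’ concrete, here is a minimal Python sketch that counts how often pairs of products appear in the same basket. The baskets and product names are invented for illustration, and simple pair counting is only one possible approach, not necessarily the method a commercial system would use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: each set holds the products bought in one shopping trip.
baskets = [
    {"tea", "milk", "biscuits"},
    {"tea", "milk"},
    {"coffee", "milk"},
    {"tea", "biscuits"},
]

pair_counts = Counter()
for basket in baskets:
    # Sort so that each pair is always counted under the same key.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Pairs bought together most often come first; pairs never seen have count 0.
print(pair_counts.most_common(3))
```

Pairs with high counts are candidates for products that are bought together, while pairs with a count of zero are those that rarely or never occur together in this (toy) data.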
Data stored in a database or warehouse may be retrieved or interrogated, for example, to
help the researcher do the following:
• Answer the research problem without the need for primary research
• Develop a better understanding of the issues and the wider context of the problem
• Help define the problem
• Help in the development and formulation of hypotheses
• Help determine the nature of the evidence required to address the problem
• Help formulate an effective research design
• Enrich the interpretation of primary research data
• Set the findings from primary research into a wider context
Using data already collected can be much cheaper than carrying out primary research.
Data stored in databases or warehouses are also relatively quick and easy to get hold of:
unlike primary data, they already exist and do not have to be collected first.
8. What is data fusion? What are some of the potential problems you might come across in
fusing data from different sources?
Data fusion is the merging or integration of data from different sources, for example,
data held in databases with data derived from other sources, including surveys and
consumer panel data, or data derived from different surveys. The aim of data fusion is
to obtain insights that could not be obtained from the sources individually.
The process of data fusion depends on being able to match individual records in one
dataset, usually according to demographic or geodemographic details, with comparable
records in another dataset. The idea is that data collected from person X1 about attitudes
or buying behaviour, say, can be combined with data on media usage collected from
person X2, who is similar to person X1 in his or her demographic or geodemographic
characteristics. The fused data record (X1 plus X2) contains details of attitudes or
behaviour and media usage for what is assumed to be the same person.
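The matching step can be sketched in a few lines of Python. This is an illustration under simplifying assumptions: the two surveys, their field names and their values are invented, and records are matched only when the common variables are identical, whereas fusion in practice may accept the most similar record rather than an exact match.

```python
# Hypothetical survey A holds attitudes; hypothetical survey B holds media usage.
survey_a = [
    {"id": "A1", "age_band": "25-34", "region": "North", "attitude": "positive"},
    {"id": "A2", "age_band": "55-64", "region": "South", "attitude": "negative"},
]
survey_b = [
    {"id": "B1", "age_band": "55-64", "region": "South", "media": "newspapers"},
    {"id": "B2", "age_band": "25-34", "region": "North", "media": "online"},
]

MATCH_KEYS = ("age_band", "region")  # the variables common to both datasets


def fuse(recipients, donors, keys=MATCH_KEYS):
    """Attach the first donor with identical matching variables to each recipient."""
    index = {}
    for donor in donors:
        index.setdefault(tuple(donor[k] for k in keys), donor)
    fused = []
    for rec in recipients:
        donor = index.get(tuple(rec[k] for k in keys))
        if donor is not None:  # recipients without a match are left out
            fused.append({**rec, "media": donor["media"], "donor_id": donor["id"]})
    return fused


fused_records = fuse(survey_a, survey_b)
for record in fused_records:
    print(record)
```

Each fused record combines attitudes from one respondent with media usage from a different respondent, which is why the result describes what is only assumed to be the same person.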
Some of the potential problems in fusing data from different sources include problems of
format and software platform incompatibility and problems arising from the content of
the datasets, which can be harder to overcome than technological differences. If two sets
of data are to be fused it is essential that there are variables common to each set.
Common variables, say on demographics or product purchase, should be defined in the
same way and coded in the same way, so that they measure the same thing and the
analysis program treats them as the same thing. This has implications at the
research design stage, in particular for the design of the data collection instrument. If you
know that two sets of data may be merged, it is important to identify and define common
variables before data collection starts. If this is not possible, variables can sometimes be
manipulated and redefined at the processing or analysis stage.
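Redefining a variable at the processing stage might look like the following Python sketch, in which two datasets code the same variable differently and both are recoded to one shared scheme before merging. The coding schemes, field names and records are hypothetical.

```python
# Hypothetical coding schemes: dataset 1 codes sex as 1/2, dataset 2 as "M"/"F".
RECODE_1 = {1: "male", 2: "female"}
RECODE_2 = {"M": "male", "F": "female"}


def harmonise(records, field, mapping):
    """Return copies of the records with `field` recoded to the shared scheme."""
    return [{**r, field: mapping[r[field]]} for r in records]


dataset_1 = [{"id": 1, "sex": 1}, {"id": 2, "sex": 2}]
dataset_2 = [{"id": 7, "sex": "F"}, {"id": 8, "sex": "M"}]

clean_1 = harmonise(dataset_1, "sex", RECODE_1)
clean_2 = harmonise(dataset_2, "sex", RECODE_2)
print(clean_1, clean_2)
```

Recoding of this kind only works when the underlying definitions are compatible; if, say, two surveys define income or social class differently, no mapping of codes will make them measure the same thing.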
9. What is geodemographic classification? Why might such a classification be useful?
A geodemographic classification is a classification based on a combination of geographic
data – derived, for example, from postal addresses – and demographic data, derived from
the Census of Population. Consumers can be classified according to their
geodemographic characteristics. A geodemographic classification can provide a better
understanding of consumer behaviour than demographic data alone. Such classifications
are often used as the basis of market segmentation systems, for targeting marketing
activity, and for planning store locations, distribution patterns and the location of
public services.
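The toy Python sketch below shows the general shape of such a classification: a customer's postcode identifies a small area, and the area's census-style profile determines the segment. The postcode districts, profile figures, thresholds and segment names are all invented, and they do not correspond to any commercial classification system.

```python
# Hypothetical census-style profiles for small areas, keyed by postcode district.
area_profiles = {
    "AB1": {"median_age": 29, "pct_owner_occupied": 35},
    "CD2": {"median_age": 52, "pct_owner_occupied": 80},
}


def classify_area(profile):
    """Toy rule assigning an area to an invented segment."""
    if profile["median_age"] < 35 and profile["pct_owner_occupied"] < 50:
        return "young renters"
    if profile["pct_owner_occupied"] >= 70:
        return "established owners"
    return "mixed"


def classify_customer(postcode):
    """Classify a customer by the area in which their postcode falls."""
    district = postcode.split()[0]
    return classify_area(area_profiles[district])


segments = [classify_customer(p) for p in ("AB1 2XY", "CD2 9ZZ")]
print(segments)
```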
10. What is data mining? What are the main uses of data mining? Give examples.
Data mining, also known as Knowledge Discovery in Databases (KDD), is the process by
which information and knowledge useful to decision-makers is mined or extracted from
very large databases using automated techniques and parallel computing technology.
Some of the techniques used in data mining are similar to those used in standard and
multivariate data analysis. A data mining program can manipulate the data, combining
variables, for example, and allowing the user to select elements or sections of the
database for analysis; it can provide basic descriptive statistics, look for associations and
relationships between variables, and perform cluster analysis. Where data mining differs
from other data analysis techniques is in the volume of data it can process and analyse,
and in its ability to discover patterns and relationships that cannot be detected with
standard analysis techniques. And it does this at high speed, producing answers to
queries or searches almost immediately, by using parallel computing technology. The
data mining system divides the workload between a set of parallel processors, enabling
streams of data to be processed simultaneously, in parallel. Speed of processing can be
further enhanced if the database is structured in a particular way, for example, if it is
divided up or ‘partitioned’ into smaller units or packets; the data mining program works
on each partition in parallel.
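The partition-and-combine idea can be sketched in Python as follows. This is a small illustration only: the sales records are invented, and a standard-library thread pool stands in for the set of parallel processors that a real data mining system would use.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sales records: (brand, units sold).
records = [("X", 3), ("Y", 1), ("X", 2), ("Y", 4), ("X", 1), ("Y", 2)]


def partition(data, n_parts):
    """Split the data into n_parts roughly equal slices."""
    return [data[i::n_parts] for i in range(n_parts)]


def units_by_brand(part):
    """Summarise one partition independently of the others."""
    totals = {}
    for brand, units in part:
        totals[brand] = totals.get(brand, 0) + units
    return totals


# Each partition is handed to a separate worker.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_results = list(pool.map(units_by_brand, partition(records, 3)))

# Combine the per-partition summaries into one answer.
combined = {}
for partial in partial_results:
    for brand, units in partial.items():
        combined[brand] = combined.get(brand, 0) + units
print(combined)
```

Because each partition is summarised without reference to the others, the work can be shared out freely; only the final combining step needs to see all the partial results.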
There are two approaches to data mining: verification and discovery. In the verification
approach, you already have an idea about patterns of behaviour or relationships between
variables – you have formulated a hypothesis, and you want to test the hypothesis in the
data. You take the discovery approach, on the other hand, if you have no clear idea about
patterns and you want to find out what hidden treasures exist amongst the mass of data.
You get the computer to search and explore the database in order to find patterns and
relationships. The computer program searches the database for these patterns and
relationships by getting to know the data, and learning the rules that apply within the
database, identifying how all the elements relate to each other and what networks exist
within the data.
Data mining can be used to uncover information and insight about buying behaviour. For
example, it could be used to answer the following:
• How many units of brands X and Y did we sell in the UK, India and Australia in the last
financial year?
• What was the split between pack sizes in each market for each brand?
• What is the customer profile of each brand in each market?
• What is the customer profile of each brand pack size in each market?
• If a customer buys brand X what is the likelihood that he or she also buys brand Y?
• What other brands does the customer buy besides X or Y?
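The last two questions can be illustrated with a short Python sketch. The purchase records are invented, and the likelihood is estimated simply as the share of brand X buyers who also bought brand Y, which is one straightforward way of answering the question rather than a prescribed method.

```python
from collections import Counter

# Hypothetical purchase records: the set of brands each customer bought.
customers = [
    {"X", "Y"},
    {"X"},
    {"X", "Y", "Z"},
    {"Y"},
    {"X", "Z"},
]

buyers_of_x = [brands for brands in customers if "X" in brands]
also_bought_y = [brands for brands in buyers_of_x if "Y" in brands]

# Share of brand X buyers who also bought brand Y.
likelihood_y_given_x = len(also_bought_y) / len(buyers_of_x)

# Other brands bought by brand X buyers, besides X and Y.
other_brands = Counter(b for brands in buyers_of_x for b in brands - {"X", "Y"})

print(likelihood_y_given_x, other_brands)
```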