Data Mining 1
Data Mining Industry Report
Section 1: General information about the data mining industry
Professionals in the data mining industry are in demand now and will be for years to
come simply because data mining is one of the primary building blocks of the customer relationship management revolution. Employment prospects will be particularly
attractive if you combine the statistical techniques you learned in school with the "data
detective" skills that can only come from extensive, in-the-trenches experience.
The data mining analyst is the person who understands the information contained in the
data and can evaluate whether the output of the analytical or mining stages truly makes
sense in the specific business domain. The proper use of predictive models must be
carefully integrated into the actual business processes so the models can be properly
evaluated and updated. The various data mining tools have certain strengths and certain weaknesses. The tool and its use must be properly matched to the expertise of the
user and that person’s objectives in using the tool.
Different levels of analysis:
• Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.
• Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.
• Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID). CART and CHAID are decision tree techniques used for classification of a dataset. They provide a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome. CART segments a dataset by creating 2-way splits while CHAID segments using chi square tests to create multi-way splits. CART typically requires less data preparation than CHAID.
• Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.
• Rule induction: The extraction of useful if-then rules from data based on statistical significance.
• Data visualization: The visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.
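To make the nearest neighbor method in the list above concrete, here is a minimal sketch in Python. The function name, the two features (monthly visits and average basket value) and the sample records are hypothetical and chosen only for illustration; the sketch assumes numeric features, Euclidean distance and a simple majority vote among the k closest historical records.

```python
import math
from collections import Counter


def knn_classify(record, history, k=3):
    """Classify `record` by majority vote among its k nearest historical records.

    `history` is a list of (features, label) pairs with numeric feature tuples.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Rank historical records by Euclidean distance to the new record.
    nearest = sorted(history, key=lambda item: math.dist(record, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical historical dataset: (monthly_visits, average_basket_value) -> outcome
history = [
    ((1, 20.0), "no_response"),
    ((2, 25.0), "no_response"),
    ((1, 30.0), "no_response"),
    ((8, 90.0), "response"),
    ((9, 85.0), "response"),
    ((7, 95.0), "response"),
]

print(knn_classify((8, 88.0), history, k=3))
```

In practice the features would usually be rescaled first, because a field with a large numeric range (such as basket value) would otherwise dominate the distance.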
There are two basic kinds of data miners.
One has just enough programming proficiency to execute the statistical steps, or procedures, required for analytical projects. The other is able to manipulate data in complex
and sophisticated ways. Consider, for example, a predictive modeling project where the
data miner wishes to create a derived field to act as a potential predictor variable and
where the analysis file must be manipulated in a complex way to achieve this end.
A data miner with weak programming skills either will have to forgo this variable or depend on others for assistance. A data miner with strong skills, however, will meet this
challenge with ease.
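As an illustration of the kind of derived field described above, the following sketch builds two candidate predictor variables (days since last purchase and total spend) from raw transaction rows. The record layout, customer IDs, dates and function name are all hypothetical; a real analysis file would normally be manipulated in a package such as SAS, so this is only a sketch of the idea.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction records: (customer_id, purchase_date, amount)
transactions = [
    ("C1", date(2011, 1, 5), 40.0),
    ("C1", date(2011, 3, 20), 15.0),
    ("C2", date(2010, 11, 2), 120.0),
    ("C1", date(2011, 4, 1), 60.0),
    ("C2", date(2011, 2, 14), 30.0),
]


def derive_recency_and_spend(rows, as_of):
    """Return {customer_id: {"days_since_last": int, "total_spend": float}}."""
    last_purchase = {}
    total_spend = defaultdict(float)
    for customer_id, purchased_on, amount in rows:
        total_spend[customer_id] += amount
        # Keep the most recent purchase date seen for each customer.
        if customer_id not in last_purchase or purchased_on > last_purchase[customer_id]:
            last_purchase[customer_id] = purchased_on
    return {
        cid: {
            "days_since_last": (as_of - last_purchase[cid]).days,
            "total_spend": total_spend[cid],
        }
        for cid in last_purchase
    }


derived = derive_recency_and_spend(transactions, as_of=date(2011, 5, 1))
print(derived)
```

Each derived value can then be attached to the customer's row in the analysis file and tested as a potential predictor.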
Often, analysts with weak programming skills work at companies with large statistical
staffs, proprietary data mining systems and rigid processes. Some analysts equate data
mining with pushing buttons in the prescribed sequence indicated by the company
manual. They have little appreciation of the "eureka moments" that occur when laborious digging unearths a paradigm-shifting fact or market segment.
You do not want to become one of these individuals.
Section 2: Factors to consider which will influence your career
2.1 The size of the company
At a small company, you will have a greater effect on the organization. However, opportunities for growth might be more limited. Also, there may be fewer chances to latch on
to experienced mentors who will push you to achieve your potential.
2.2 Your ability to communicate effectively
Effective communication and an understanding of direct marketing are essential. As an ambitious data miner, you must develop an appreciation of how the results of analytical projects are
leveraged by marketers and fit into the company’s overall strategy. The astute analyst
soon will realize that it takes much more than just statistics and programming virtuosity
to break into the elite of the industry.
It will be critical to think of yourself as a quantitatively grounded direct marketer rather
than as just a technician. As you evolve into a well-rounded business professional, you
must develop the ability to communicate clearly and concisely. This will allow you to
work effectively with experienced professionals in marketing, sales and business development, many of whom will have MBAs and more years of experience than you have.
Do not be discouraged if you find it difficult to master the business and communications
side of direct marketing. This is understandable because you are accustomed to focusing on
numbers and code.
Section 3: Types of career paths for data mining analysts
Typically, data miners follow one of two very different career paths.
Some remain on the technical side and eventually either move up the ranks to manage
an entire staff of analysts or transition into the related field of data warehousing and
processing. Others evolve into generalists and become senior-level marketers or strategy consultants.
Regardless of the path you choose, you first must establish a solid foundation of statistical techniques and programming skills and a grounding in the basics of data mining. Strive to develop
deep expertise in core analytical techniques such as clustering and predictive modeling.
You should also work to become an excellent programmer in the widely used analytical
packages such as SAS. Below is a list of some types of data mining job opportunities.
3.1 Junior Data Mining Analyst
A junior analyst searches for the appropriate data and provides sufficient material to be
analyzed. Junior analysts also have to prepare statistical diagrams and flowcharts.
Duties and responsibilities
1. Understand internal client business needs and how Data Mining fits into their success: work with team members to identify business issues and research needs; support
business planning efforts.
2. Gain proficiency with data sources, tools (such as SAS), and methodologies: work with
team members to identify research objectives and applicable methodologies. Gain
hands-on experience with all appropriate data sources.
3. Implement projects and report findings: analyze database information in response to
data analysis requests. Activities include, but are not limited to:
• Provide business intelligence through data analysis and standard reporting, both ongoing and ad hoc.
• Compile, interpret, and analyze data. Deliver work output, including written reports and presentations, to clients and team members as needed to help facilitate business decision making.
• Conduct other data analysis as requested, including business metrics, client profiles, behavior statistics, campaign results and other data mining research.
• Aid in data management by actively monitoring data sources to ensure completeness and accuracy.
• Support sales and marketing efforts by building processes to analyze, measure, and track the results of those efforts.
Administrative
• Maintain project activity logs, regularly provide project updates, seek client and team member feedback about the effectiveness and quality of work output, and proactively participate in team meetings.
3.2 Senior Data Mining Analyst
A senior analyst has to consult and communicate with the client. He/she has to prepare
the final report and present it.
Duties and responsibilities
Identify cost of business opportunities via data mining. Investigate outliers and research
potential operational efficiencies.
Highly collaborative role working with network contract manager, regional directors and
provider relations to review cost of industry trends. Operating as the analytical leader for
a geographic region to understand delivery of care, spending, etc.
Able to interview business experts to understand operational challenges and develop
potential solutions. Conducts independent analysis of high complexity under minimal
supervision and guidance.
Clearly communicates analytical results in presentations, abstracts, graphs or summaries with minor editing and input from manager to various levels of management.
Using SAS and other tools, designs, builds and enhances data systems and analysis
methods so that they serve complex, high-level reporting requirements.
Critically reviews and revises existing analytical processes for efficiency.
Assesses business risks associated with analytical processes and data systems and
develops strategies to mitigate risks.
Manages projects and develops work plans, which may involve coordinating the activities of lower-level
analysts and collaborating with other analytical teams. Provides technical guidance to
less experienced staff.
3.3 Data Analysis Project Manager
Duties include project organization and methodology. The project manager has to audit
the reports and make sure that they fulfill the needs of the client.
Duties and responsibilities
• Lead and develop full-scale project plans and their execution.
• Take responsibility for more than one cross-company project at a time.
• Define the project scope of work, financial plan, goals and deliverables.
• Manage all aspects of the project business plan and budget.
• Manage the operational, financial and technological aspects of projects based on timelines and work plans.
• Identify resource requirements, assign responsibilities and coordinate project staff, directly and indirectly, to ensure successful completion of the project.
• Track project deliveries using project management tools.
• Manage the design of the project documents used to monitor project performance and data stored.
• Report on project progress and communicate relevant information to superiors.
• Resolve, trace and escalate critical issues to minimize project risk factors.
• Prepare the QA procedure of the project.
• Direct, supervise, support and coordinate the project staff.
• Communicate intensively with clients, sub-contractors and vendors to establish cordial and effective working relationships.
• Follow up with clients to verify satisfaction.
3.4 Data Warehouse Analyst
If you think you would like eventually to branch out beyond data mining but want to remain on the technical side of the business, you will be in an ideal position to transition
into the exploding field of data warehousing and processing. As a successful analyst,
you will have honed your logic and data detective skills. Also, you will have become an
accomplished programmer.
Stories of data warehousing disasters circulate throughout this industry. You will be in an
ideal position to avoid these pitfalls, and your employer and clients will recognize this.
An estimated 30,000 new jobs will be created in the direct marketing industry in the next
five years. Many will be in data warehousing and processing. Do not be concerned if
you need additional training to learn a new programming language, for example. There
is such a shortage of experienced personnel that many employers will contribute toward, or even pay all of, your tuition.
Duties and responsibilities
• Understand the business users’ requirements for information and communicate them to the rest of the data warehouse team;
• Lead and conduct interviewing tasks;
• Lead interview documentation;
• Assist the DW data analyst in analyzing existing reports and identifying iteration metrics;
• Lead preparation of the data warehouse requirements document;
• Assist the data analyst in mapping tasks;
• Analyze existing reports;
• Lead the identification and documentation of business metrics;
• Determine systems of record with the assistance of appropriate source system experts;
• Help identify potential sources of data for the data warehouse;
• Oversee testing of data acquisition processes and their implementation into production;
• Act as consultant to the ETL and front-end programmers.
Depending on how technical a business analyst is, he or she may also:
• Help data modelers prepare models, and
• Review models to ascertain that requirements are met.
3.5 Marketing Analyst
The marketing analyst is a data mining analyst who thoroughly understands, from a business perspective,
what the client wants to accomplish and who assists in translating those business objectives
into technical requirements to be used in the subsequent development of the data mining model(s).
Duties and responsibilities
As a marketing analyst, you'll gather consumer information and examine buying trends
to create marketing plans for companies. One of your primary job duties in this career is
to design surveys that identify consumer preferences and prospective markets for products. You'll conduct these surveys over the phone, on the Internet, through the mail and
in focus groups. A marketing analyst usually oversees a team that helps with the surveying process.
Once this research has been completed, you'll evaluate the feedback and organize it
into reports for company use. You'll also advise your employer on what products will be
most beneficial to produce, as well as on the design, distribution and promotion of these
products. With the information you provide, your employer is able to target the most
profitable markets in order to generate the maximum amount of revenue possible.
3.6 Data Mining Research Analyst
The Research Analytics team supports the Research department and executive management with strategic planning, business & market intelligence and data mining/modeling services.
Duties and responsibilities
• Perform market data research and analysis to identify and resolve data issues using advanced data mining techniques.
• Develop proprietary data mining tools and applications.
• Develop predictive models.
3.7 Data Mining Analyst Consultant
Help your clients develop quantitative models and strategies that give them a sound
process for approaching a problem. With these models and strategies your client
can solve problems quickly and effectively. As a data mining analyst consultant, you can
help your clients meet challenges effectively and capitalize on possible opportunities.
Duties and responsibilities
Modeling and Forecasting
Build predictive models using advanced statistical techniques, making use of the high-volume data available at the client (for example, a bank).
Business Strategy
Businesses are under tremendous pressure to generate revenues and increase profit.
As a consultant, you can advise on a portfolio of initiatives that drives long-term performance.
Market Research solutions
Market research enables companies to identify opportunities for growth and understand how to position themselves most effectively in the market so as to take full advantage of those opportunities.
Miscellaneous solutions
Converting business data into highly effective and insightful reports and presentations.
• Customer design and support
• Customer reporting
• Development of automation tools
Section 4: Types of industries which hire data mining analysts
Data mining jobs are found primarily in the technology, finance, healthcare and pharmaceutical fields. They can range from social media and digital media analysts who focus
on enterprise-level data mining to PhD-level quantitative analysts who mine millions of
data units for investment banks and hedge funds. In the pharmaceutical industry, data
mining analyst jobs tend to focus on statistical work involving analysis of pharmaceutical
marketing information and sales.
The ability to effectively cultivate product development capabilities is an important skill
to have for anyone considering data mining jobs in the technology field. Particularly in
the Internet realm, jobs in data mining are highly valued. Professionals in these positions support the immense data mining work that must be in effect for a consumer-facing technology company to succeed.
Many search engine companies and technology companies that build on search and
web crawler technologies, such as social media analytics firms, offer critical data mining
job opportunities for those who are qualified. Work with web analytics
platforms and with databases built using Structured Query Language (SQL) makes up the
bulk of the data mining jobs found in companies offering search engine technology.
Finance firms all over the world are also places where people with data mining skills are
in increasingly high demand. In finance, data mining professionals, or quants as they
are more commonly called, are charged with creating better ways to visualize prediction
curves, valuation models, and other important aspects of financial quantitative analysis.
The data mining job description for quants typically involves a great deal of programming work in C++, a popular computer programming language used in banking and enterprise information technology systems. In addition, a quantitative professional in finance or banking must have a strong grasp of Visual Basic for Applications (VBA) to
use in Excel modeling and analysis.
Although healthcare is not thought of as a particularly quantitative field, healthcare firms and large
pharmaceutical companies often present opportunities for data mining jobs. Using
statistics to predict future sales or to calculate the amount of risk involved in a product
launch or a branding change are some of the tasks that quantitative analysts in pharmaceutical companies are required to do. Usually a master's or a PhD degree in mathematics, statistics, economics, or another quantitative-based discipline is required for this
position.
Quantitative analysts for pharmaceutical firms provide much needed insight into which
drugs perform best on the market. Their work can also demonstrate why one product
performs better than another. Through analyzing product distribution channels as well
as constructing financial valuation models, the pharmaceutical quantitative analyst is
able to use data mining techniques to serve the firm's interests.
Below is a list of industries that employ data mining analysts.
Casinos
Communications
Education
Financial Services - especially banking, fraud detection, credit scoring, investment/stocks
Government/ Military/ Security/ Anti-terrorism
Health Care Providers
Health Insurance
Hotels
Insurance
Life Sciences
Manufacturing
Media Advertising
Oil & Gas
Retail
Social Policy/ Survey Analysis
Travel & Transportation
Utilities
Web usage mining
Section 5: Examples of the duties and responsibilities of data mining analysts
Data mining is primarily used today by companies with a strong consumer focus - retail,
financial, communication, and marketing organizations. It enables these companies to
determine relationships among "internal" factors such as price, product positioning, or
staff skills, and "external" factors such as economic indicators, competition, and customer demographics. And, it enables them to determine the impact on sales, customer
satisfaction, and corporate profits. Finally, it enables them to "drill down" into summary
information to view detailed transactional data.
With data mining, a retailer could use point-of-sale records of customer purchases to
send targeted promotions based on an individual's purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and
promotions to appeal to specific customer segments.
For example, Blockbuster Entertainment mines its video rental history database to recommend rentals to individual customers. American Express can suggest products to its
cardholders based on analysis of their monthly expenditures.
WalMart is pioneering massive data mining to transform its supplier relationships.
WalMart captures point-of-sale transactions from over 2,900 stores in 6 countries and
continuously transmits this data to its massive 7.5 terabyte Teradata data warehouse.
WalMart allows more than 3,500 suppliers to access data on their products and perform
data analyses. These suppliers use this data to identify customer buying patterns at the
store display level. They use this information to manage local store inventory and identify new merchandising opportunities. In 1995, WalMart computers processed over 1 million complex data queries.
The National Basketball Association (NBA) is exploring a data mining application that
can be used in conjunction with image recordings of basketball games. The Advanced
Scout software analyzes the movements of players to help coaches orchestrate plays
and strategies. For example, an analysis of the play-by-play sheet of the game played
between the New York Knicks and the Cleveland Cavaliers on January 6, 1995 reveals
that when Mark Price played the Guard position, John Williams attempted four jump
shots and made each one! Advanced Scout not only finds this pattern, but explains that
it is interesting because it differs considerably from the average shooting percentage of
49.30% for the Cavaliers during that game.
By using the NBA universal clock, a coach can automatically bring up the video clips
showing each of the jump shots attempted by Williams with Price on the floor, without
needing to comb through hours of video footage. Those clips show a very successful
pick-and-roll play in which Price draws the Knicks' defense and then finds Williams for
an open jump shot.
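The comparison behind this example can be worked through with simple arithmetic. The sketch below is not Advanced Scout's actual algorithm, which the text does not describe; it only contrasts the 4-for-4 subset with the 49.30% game average quoted above and, under the simplifying assumption that each shot is an independent attempt at that average rate, estimates how often four straight makes would happen by chance. The function names are hypothetical.

```python
def made_percentage(made, attempted):
    """Shooting percentage for a subset of shots."""
    return 100.0 * made / attempted


def chance_of_all_made(attempts, baseline_pct):
    """Probability of making every attempt if each shot independently
    succeeds at the baseline rate (a simplifying assumption)."""
    return (baseline_pct / 100.0) ** attempts


team_pct = 49.30                     # Cavaliers' average for the game, from the text
subset_pct = made_percentage(4, 4)   # Williams' jump shots with Price at guard
difference = subset_pct - team_pct
chance = chance_of_all_made(4, team_pct)

print(f"subset {subset_pct:.1f}% vs. team {team_pct:.2f}%, difference {difference:.2f} points")
print(f"chance of 4-for-4 at the team rate: {chance:.3f}")
```

With only four attempts the sample is very small, which is why such a pattern is best treated as a lead for the coach to check on video rather than as proof.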
These general forms illustrate what data mining can do.
Anomaly detection: In a large data set it is possible to get a picture of what the data
tends to look like in a typical case. Statistics can be used to determine if something is
notably different from this pattern. For instance, the IRS could model typical tax returns
and use anomaly detection to identify specific returns that differ from this for review and
audit.
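One simple statistical way to flag a value that is "notably different" is a z-score test, sketched below. This is an illustration only: the deduction amounts are invented, the threshold of two standard deviations is an arbitrary assumption, and nothing here reflects how the IRS actually selects returns.

```python
import statistics


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold`
    standard deviations from the mean of `values`."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


# Hypothetical deduction amounts claimed on ten tax returns
deductions = [4200, 3900, 4100, 4300, 4000, 3800, 4150, 4050, 3950, 19500]

print(find_anomalies(deductions))
```

Note that a single extreme value inflates the standard deviation it is measured against, so on small samples a more robust measure (such as distance from the median) may be preferable.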
Association learning: This is the type of data mining that drives the Amazon recommendation system. For instance, this might reveal that customers who bought a cocktail
shaker and a cocktail recipe book also often buy martini glasses. These types of findings are often used for targeting coupons/deals or advertising. Similarly, this form of data mining (albeit a quite complex version) is behind Netflix movie recommendations.
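Association findings of this kind are commonly summarized with two measures, support and confidence. The sketch below computes both for the cocktail example on a handful of invented shopping baskets; the basket contents and function names are hypothetical, and production recommendation systems are far more elaborate than this.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Estimated share of baskets containing `antecedent` that also contain `consequent`."""
    antecedent_support = support(baskets, antecedent)
    if antecedent_support == 0:
        return 0.0
    return support(baskets, set(antecedent) | set(consequent)) / antecedent_support


# Hypothetical purchase baskets
baskets = [
    {"cocktail shaker", "cocktail recipe book", "martini glasses"},
    {"cocktail shaker", "cocktail recipe book", "martini glasses"},
    {"cocktail shaker", "cocktail recipe book"},
    {"cocktail recipe book"},
    {"martini glasses", "coasters"},
]

rule_support = support(baskets, {"cocktail shaker", "cocktail recipe book", "martini glasses"})
rule_confidence = confidence(
    baskets, {"cocktail shaker", "cocktail recipe book"}, {"martini glasses"}
)
print(rule_support, rule_confidence)
```

A rule with high confidence but very low support describes too few customers to be worth a campaign, so both numbers are usually checked together.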
Cluster detection: One type of pattern recognition that is particularly useful is recognizing
distinct clusters or sub-categories within the data. Without data mining, an analyst would
have to look at the data and decide on a set of categories which they believe captures
the relevant distinctions between apparent groups in the data. This would risk missing
important categories. With data mining it is possible to let the data itself determine the
groups.
This is one of the black-box types of algorithm that are hard to understand. But in a
simple example - again with purchasing behavior - we can imagine that the purchasing
habits of different hobbyists would look quite different from each other: gardeners, fishermen and model airplane enthusiasts would all be quite distinct. Machine learning algorithms can detect all of the different subgroups within a dataset that differ significantly
from each other.
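A small sketch can make cluster detection less of a black box. The code below uses k-means (Lloyd's algorithm), which is only one of many clustering methods and is swapped in here for illustration; the text does not name a specific algorithm. The spending figures are invented, and the sketch assumes the number of clusters and the starting centroids are supplied by the analyst.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters


# Hypothetical annual spend per customer: (garden supplies, fishing gear)
points = [(500, 10), (520, 0), (480, 20), (15, 400), (0, 420), (30, 380)]

centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
```

Here the gardeners and the fishermen separate cleanly because their spending patterns barely overlap; real customer data is rarely this tidy, and the choice of the number of clusters is itself a judgment call.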
Classification: If an existing structure is already known, data mining can be used to
classify new cases into these pre-determined categories. Learning from a large set of
pre-classified examples, algorithms can detect persistent systemic differences between
items in each group and apply these rules to new classification problems. Spam filters
are a great example of this - large sets of emails that have been identified as spam have
enabled filters to notice differences in word usage between legitimate and spam messages, and classify incoming messages according to these rules with a high degree of
accuracy.
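The spam filter idea can be sketched with a tiny naive Bayes classifier, one common technique for learning word-usage differences from pre-classified examples. The four training messages are invented, the labels "spam" and "ham" and the function names are assumptions for illustration, and real filters are trained on vastly larger sets of emails.

```python
import math
from collections import Counter


def train(messages):
    """messages: list of (text, label). Returns per-label word counts and message counts."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Pick the label with the higher log-probability under a
    naive Bayes model with add-one smoothing."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_messages)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical pre-classified examples
training = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
]

model = train(training)
print(classify("claim your free prize", *model))
```

The add-one smoothing keeps a word that was never seen in one class from driving that class's probability to zero.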
Regression: Data mining can be used to construct predictive models based on many
variables. Facebook, for example, might be interested in predicting future engagement
for a user based on past behavior. Factors like the amount of personal information
shared, number of photos tagged, friend requests initiated or accepted, comments, likes
etc. could all be included in such a model. Over time, this model could be honed to include or weight things differently as Facebook compares how the predictions differ from
observed behavior. Ultimately these findings could be used to guide design in order to
encourage more of the behaviors that seem to lead to increased engagement over time.
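As a minimal illustration of a regression model, the sketch below fits a straight line by ordinary least squares using a single predictor. The variable names and the numbers are hypothetical (they are not Facebook data), and a realistic engagement model would combine many variables rather than one.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: comments posted last month -> sessions this month
comments = [0, 2, 4, 6, 8]
sessions = [3, 7, 11, 15, 19]  # constructed as 3 + 2 * comments for easy checking

intercept, slope = fit_line(comments, sessions)
predicted = intercept + slope * 10  # forecast for a user with 10 comments
print(intercept, slope, predicted)
```

Comparing such predictions with the behavior later observed is what allows the model to be re-weighted over time, as described above.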
The patterns detected and structures revealed by descriptive data mining are then
often applied to predict other aspects of the data. Amazon offers a useful example of
how descriptive findings are used for prediction. The (hypothetical) association between
cocktail shaker and martini glass purchases, for instance, could be used, along with
many other similar associations, as part of a model predicting the likelihood that a particular user will make a particular purchase. This model could match all such associations with a user's purchasing history, and predict which products they are most likely to
purchase. Amazon can then serve ads based on what that user is most likely to buy.
Section 6: Career Outlook for data mining analysts
The U.S. Bureau of Labor Statistics (BLS) projected that data communication analyst positions would increase 53% from 2008 to 2018. Data analysts in business settings are reported to have had a median salary of $55,053 in 2011, according to Salary.com. The
website also reports that experience as a data analyst might slightly increase annual
salaries. With less than one year of experience, data analysts had a salary range of $51,681-$55,532 in 2011, while four or more years of experience raised the salary range to
$53,704-$56,764 per year. With experience and specialization, a data mining analyst can
earn $120,000 per year.
The use of competitive intelligence by data scientists can pay big dividends to businesses who invest in these services. A May 2011 study by McKinsey Global Institute
suggests that retailers analyzing large data sets to their fullest could increase operating
margins by 60 percent and the health care industry could reduce annual costs by 8 percent or $200 billion.
However, the study also warns there is a significant shortage of qualified workers to analyze these data sets adequately. According to the report, a shortfall of about 140,000 to
190,000 individuals with analytical expertise is projected by 2018. The study also predicts a need for an additional 1.5 million managers and analysts by that same date to
fully engage the true potential of the currently available data.
While it may be conventional wisdom that data is growing exponentially, the actual
amount of that growth can be staggering to consider. A 2003 study conducted by the
University of California, Berkeley found that worldwide information production increased
30 percent each year from 1999 until 2002. In 2010, then-Google CEO Eric Schmidt
turned heads at the Techonomy Conference when he said people currently create
as much data every two days as was previously created in all of history up to 2003.
Section 7: Consultant
You may opt to freelance or establish a consulting firm that delivers analytical and statistical solutions and expert consulting services to identify new insights, drive strategic decisions, and create measurable results for your clients. To show clients your value, consider providing a ‘proof of concept’. A proof of concept helps ensure that the most important business objectives
are being met and that the investment in data mining is made in the most cost-effective manner.
The proof-of-concept period is used to answer the following questions.
· What is data mining?
· What do the data mining tools really do?
· How should my raw operational data be structured to be compatible with data
mining?
· Which data-mining tool, or suite of tools, is best suited to meet my business
objectives?
· Is there hard evidence that can be generated by mining my data that shows that my
company should invest in data mining and deploy it in my business?
The proof-of-concept process is as follows.
1. Define the business objectives. Start with at most three objectives in order to focus
the study.
2. Identify the corporate data that contains information related to those business
objectives.
3. Create a sample data set that contains all relevant information.
4. Identify a domain expert(s) to work with a group experienced in knowledge
discovery systems.
5. Install the data in a facility that has the computational power to handle the size of
the data being examined and which has a suite of knowledge discovery tools suitable
to meet the business objectives.
6. The domain expert(s) works with the data mining expert(s) to determine which data
mining tool(s) are best suited to meet the business objectives.
7. Extract relationships and patterns from the business data set.
8. The domain expert(s) works with the data mining expert(s) to determine which
patterns and relationships are really relevant to the business objectives. Experience
in the CDI on a number of data mining projects has shown that surprising results
may occur at this stage. Underlying assumptions about how a business works, how
the market works, or how the customer behaves may change.
9. Develop models to predict how data mining results can assist in meeting business
objectives.
10. The company then decides what level of investment to make in data mining
consistent with their business plan.
At this point, a company will have significant evidence of how data mining can be employed to achieve a competitive advantage, training in data mining, and the skeleton of
a development plan for using data mining in a cost-effective manner.
Section 8: See Emerging Trends & Opportunities PDF
Section 9: Data Mining Analyst Job Boards
1. KDnuggets http://www.kdnuggets.com/jobs/
2. Analytic Talent http://www.analytictalent.com/
3. iCrunchData http://www.icrunchdata.com/Statistician-Jobs.aspx
4. StatVista http://www.statvista.com/jobs/default.asp
Summary:
There's now an intellectual consensus in business that the only way to run an enterprise
is to use analytics with data scientists to find opportunities. Because of the immense
opportunity for strategic insight buried in all that data, corporations now have an unlimited demand for people with a background in quantitative analysis.