ANALYTICS AND DATA SCIENCE
Understanding the field & setting expectations

BACKGROUND
- Personal: International
- Academic: UNT alumnus (Mathematics); Economics & Mathematics
- Professional: Academic research, Hilton, Ansira, Sabre

ANALYTICS & DATA SCIENCE DEFINED
- Analytics: the discovery and communication of meaningful patterns in data.
- Data science: the novel application of algorithms and statistical techniques to solve business problems.
Reality:
- The terms mean different things at different companies.
- It is a relatively new field.
- The culture of the company determines the nature of the work you do.
- Most companies are still in the process of defining their analytics strategy.
- Titles common to the field: Data Scientist, Analytics Consultant, Statistical Modeler, Risk Analyst, Statistician.

TYPES OF PROBLEMS TYPICALLY ENCOUNTERED
- Forecasting
- "Predictive analytics": classification
  - Customer retention / churn modeling: who is likely to leave for a competitor
  - Fraud, customer acquisition
  - Techniques: logistic regression, SVM, random forest, gradient boosting
- Recommendation engines (e.g., the Netflix Challenge)
- Customer choice modeling: what will people buy (multinomial logit model)
- Optimization
- Market mix modeling
- Clustering / market basket analysis

DATABASES & BIG DATA
- Most companies house their data in relational databases: Oracle, Teradata, IBM DB2, Microsoft SQL Server.
- SQL queries are used to retrieve the data; SQL is a basic entry-level requirement for working in this field.
- Most tasks require a significant amount of time and energy spent combining tables and data.
- Hadoop: an open-source distributed framework for storing and processing large amounts of data (petabytes)
  - Java-based MapReduce
  - Pig; Hive (SQL-like syntax, from Facebook); Impala (SQL syntax); Spark
  - Spark: UTD offers a Spark course
- Also encountered: HTML, JSON

PROGRAMMING LANGUAGES
Statistical programming languages:
- R: open source, easy to learn, an unparalleled number of packages and amount of functionality; memory limitations.
- SAS: very common in businesses but losing popularity and market share to R; expensive; handles large data sets well.
- Python: versatile, a reasonable number of packages; R's biggest competitor.
- Matlab: more common in the engineering field.
General programming languages:
- Java: not knowing Java has cost me at least 4 jobs.
- C/C++: for writing faster R programs.
- Scala: Spark's native language; more common among people at the forefront of development.

INTERNATIONAL STUDENTS
- Search for positions you are overqualified for.
- State your visa status as soon as possible; some companies have policies against hiring international students.
- myvisajobs.com:
  - Find companies that are more likely to sponsor you
  - See which companies are sponsoring
  - See salaries for negotiation purposes
- Others.

THINGS YOU MUST HAVE UNDER YOUR BELT
- SQL
- Experience with large data sets: get exposure (10k records is not large)
- Java
- Linux experience
- Take courses (free courses at UNT)
- Specialize in something: be very strong in at least one area (optimization, forecasting, classification)
- SAS/R: a fundamental requirement. Learn it.
- Multiple projects (at least 3): code a research paper, apply a technique to company data, participate in Kaggle, do an internship.

RECRUITING
Universities:
- UTD: School of Management / Operations Research
- OSU (Oklahoma): Analytics and Data Mining programs
- UNT: Economics
- SMU: Statistics
- Relevant disciplines: Economics, Mathematics, Statistics, Operations Research, Computer Science, Engineering
Companies:
- AT&T, Sabre, Epsilon, Amazon
- AnalyticRecruiting.com (lots of phone interviews)
- Kforce.com (very promising; takes care of visa issues)

MISCELLANEOUS
- Kaggle.com: "The Home of Data Science"
  - Companies recruit there, and it pays winners
  - Many Kaggle winners manage analytics teams
  - Compete! Get recognized.
- Internships are extremely important.
- AT&T, Sabre, Epsilon, Amazon, Santander, Capital One in Plano
- Companies prefer to hire mathematicians.
- Never accept the first offer.
- Jumping around vs. staying at one company
- They always divide by 2.
- Network: Dallas R User Group, Meetup.com, INFORMS local chapter

BOOKS
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction
- The Art of R Programming
- The Theory and Practice of Revenue Management

THANK YOU!