Text Mining SEC Filings for Fraud Detection
Fletcher Glancy
ISQS 7342
12/2/2008

Research Issues
1. Can fraud be detected from SEC filings?
2. Can text mining provide a methodology for detecting potential fraud?
3. If text mining can provide an indication of potential fraud, which algorithm gives the best performance?

Brief Background
• Corporate governance fraud has been a major concern, e.g., Enron, WorldCom, HealthSouth.
• Detection has typically come only after many years of abuse.
• Most detection techniques involve ratio analysis.
• Churyk et al. used Context Analysis to detect fraud in the MD&A of 10-K filings.

Potential Strengths of Text Mining
• TM can be automated.
• The results can be used for further data mining.
• TM eliminates the researcher bias that is potentially present in Context Analysis.

Potential Problems/Weaknesses
• There is no context in text mining, only statistics.
• It is difficult to understand relationships from a document-term matrix.
• Text mining is unable to handle negatives or punctuation.

Narrow the Focus - Negatives
• Antonyms: word opposites.
• Negatives: “not good” = “bad”.
• Interference by articles: “Not a good day.”
• Interference by modifiers: “Not highly motivated.”

Possible Data Preparation Options
• Preprocess to remove articles.
• Convert punctuation to text: replace ‘;’ with “semicolon”.
• Combine “not” with the noun that follows it: “Not highly motivated” becomes “highly not_motivated”.
• Create not_noun and replace it with its antonym: not_dead is replaced with alive.

Testing Data Preparation Options
• Select/create a text database.
– 10-K Notes and MD&A.
– Firms that have received an AAER.
• Preprocess with each alternative individually and cumulatively.
• Create the document-term matrix and SVD.

Testing Data Preparation Options (continued)
• Calculate the variance of the document set using SVD.
• Create a logistic regression using the SVD of the set and calculate the variance.
• Test for predictability using a validation set.

Questions?
Welcome to my potential dissertation topic!
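
Illustrative sketch: Data Preparation Options
The four data preparation options listed earlier (removing articles, converting punctuation to text, merging “not” with the word it negates, and replacing not_ tokens with antonyms) could be prototyped roughly as follows. This is a minimal sketch, not the method used in the talk: the word lists, the antonym table, the function names, and the order of the steps are all assumptions made for illustration. In particular, “not” is merged before punctuation is converted so that it never attaches to a punctuation word.

```python
import re

# Hypothetical lookup tables, chosen only for illustration.
ARTICLES = {"a", "an", "the"}
MODIFIERS = {"highly", "very", "really"}
PUNCTUATION_NAMES = {";": "semicolon", ",": "comma", ".": "period"}
ANTONYMS = {"not_dead": "alive", "not_good": "bad"}


def tokenize(text):
    """Lowercase the text and split it into words and single punctuation marks."""
    return re.findall(r"[a-z0-9_']+|[^\sa-z0-9_']", text.lower())


def remove_articles(tokens):
    """Drop articles so they cannot sit between 'not' and its target."""
    return [t for t in tokens if t not in ARTICLES]


def merge_negation(tokens):
    """Attach 'not' to the next non-modifier word: not highly motivated -> highly not_motivated."""
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "not":
            j = i + 1
            while j < len(tokens) and tokens[j] in MODIFIERS:
                j += 1
            if j < len(tokens) and tokens[j].isalpha():
                out.extend(tokens[i + 1:j])      # keep the skipped modifiers
                out.append("not_" + tokens[j])   # merged negation token
                i = j + 1
                continue
        out.append(tokens[i])
        i += 1
    return out


def replace_antonyms(tokens):
    """Swap a merged not_ token for its antonym when one is listed."""
    return [ANTONYMS.get(t, t) for t in tokens]


def punctuation_to_text(tokens):
    """Replace punctuation marks with their names so they survive term counting."""
    return [PUNCTUATION_NAMES.get(t, t) for t in tokens]


def prepare(text):
    """Apply all four options cumulatively and return the cleaned text."""
    tokens = tokenize(text)
    tokens = remove_articles(tokens)
    tokens = merge_negation(tokens)
    tokens = replace_antonyms(tokens)
    tokens = punctuation_to_text(tokens)
    return " ".join(tokens)
```

Because each option is a separate function, the alternatives can be applied individually or cumulatively before the document-term matrix is built, which is what the testing plan calls for.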