This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.

CS 679: Advanced NLP
Lecture #1: Introduction to Text Mining

Objectives for Today
  1. Quick course info.
  2. Overview of Text Mining
  3. Discuss your applications of Text Mining
  4. Elements of Text Mining
  5. Introduce course objectives

Course Info.
  Office Hours: Tue & Thu. 3-4pm (without appointment) OR by appointment
  TA: TBD
  Web page: https://facwiki.cs.byu.edu/cs679
    Syllabus
    Regularly updated schedule: Due dates, Reading assignments, Projects guidelines, Lecture Notes
  Google Group “BYU CS 679”
  Email: ringger AT cs DOT byu DOT edu
  Grades: http://gradebook.byu.edu

Assignments
  Readings – with max. one-page reports
    Mostly research papers (see course web page for all hyperlinks)
    Usually one reading report per week
  Intro. Projects
    Presentation
    Report
  Semester Project
    Proposal
    Presentation
    Report

Course Policies
  Early
  Late
  Grades
  Other
  See Syllabus for details

Text Mining
  The process of discovering previously unknown information in large text collections
  Paraphrased from M. Hearst

Other Definitions
  Looking for patterns in unstructured text (Nahm)
  Text mining applies the same analytical functions of data mining to the domain of textual information (Doore)

“Search” versus “Discover”
                            Search (goal-oriented)   Discover (opportunistic)
  Structured Data           Data Retrieval           Data Mining
  Unstructured Data (Text)  Information Retrieval    Text Mining
  Credit: adapted from slide by Nathan Treloar, AvaQuest

Your Exciting Applications

F2011: Your Exciting Applications

W2011: Exciting Applications

2010: Exciting Applications

2009: Exciting Applications

Additional Applications
  News Mining
  Sentiment Detection
  Summarization
  Trend Analysis
  Association Detection

Course Objectives
  Acquire experience conducting exploratory data analysis on large collections of text
  Gain in-depth experience with and understanding of approaches to
    document classification
    sentiment classification
    feature engineering
    feature selection
    document clustering
    unsupervised topic identification
    visualization, including document summarization
  Build a foundation of techniques for approximate Bayesian reasoning for unsupervised text analysis

Course Objectives (2)
  Obtain experience with techniques for evaluating and visualizing the results of unsupervised learning processes
  Independent investigation of methods of your choice!
  Application of your methods to learn something important from a significant text corpus of your choice

Simplistic Text Mining Process
  Credit: NCSA

Methods
  Feature Engineering
  Feature Selection
  Information Extraction
  Categorization (Supervised)
  Clustering (Unsupervised)
  Topic Identification / Topic Modeling
  Visualization

Some Available Data Sets
  20 Newsgroups -- Usenet
  Reuters (1990s) newswire
  Del.icio.us bookmarked web pages
  Enron Email
  Movie Reviews
  Gamespot game reviews
  General Conference
  State of the Union
  Campaign Speeches
  …
  Yours!

Assignment
  Reading for next time:
    Course Syllabus
    "Tapping the Power of Text Mining" by Fan et al. (CACM 9/2006)
    "Text-Mining the Voice of the People" by Evangelopoulos et al. (CACM 2/2012)
    Skim: Alta Plana Text Analytics Report
  Reading Report #1
    % Completed
    Questions
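
Illustration: Feature Engineering and Feature Selection
  The Methods slide lists feature engineering and feature selection without showing what they look like in practice. The following is a minimal sketch only, not the course's own method: it assumes a bag-of-words representation and a document-frequency cutoff, and the function names (tokenize, select_features, vectorize), the min_df parameter, and the toy corpus are all hypothetical, introduced here purely for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def select_features(docs, min_df=2):
    """Keep terms that occur in at least min_df documents, sorted alphabetically."""
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document (document frequency).
        doc_freq.update(set(tokenize(doc)))
    return sorted(term for term, count in doc_freq.items() if count >= min_df)


def vectorize(doc, vocabulary):
    """Map one document to a list of term counts aligned with vocabulary."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocabulary]


# Toy corpus, invented purely for illustration.
docs = [
    "Text mining finds patterns in text.",
    "Data mining finds patterns in structured data.",
    "Text retrieval is goal-oriented search in text collections.",
]

vocabulary = select_features(docs, min_df=2)
vectors = [vectorize(doc, vocabulary) for doc in docs]

for doc, vector in zip(docs, vectors):
    print(vector, doc)
```

  With this toy corpus the selected vocabulary is finds, in, mining, patterns, text. Terms that appear in only one document are dropped. The resulting count vectors are the kind of input that the categorization and clustering methods listed on the Methods slide would consume.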