Introduction
LING 575
Week 1: 1/08/08
1
Plan for today
• General information
• Course plan
• HMM and n-gram tagger (recap)
• EM and forward-backward algorithm
2
Before next time
• Select papers that you’d like to present
– Reply to the 1st message at GoPost by noon
Saturday
• Read M&S 9.3.3
– Remember to hand in your questions next
time.
3
General information
4
General info
• Course url: http://courses.washington.edu/ling575x
– Syllabus (incl. slides, assignments, and papers):
updated every week.
– GoPost:
– CollectIt:
• Please check your email at least once per day.
5
Office hour
• Email:
– Email address: [email protected]
– Subject line should include “ling575”
– The 48-hour rule: it works both ways
• Office hour:
– Time: Fr: 10:30-11:30am
– Location: Padelford A-210G
6
Slides
• The slides will be online before class if
possible.
• The final version will be uploaded a few
hours after class.
7
Prerequisites
• CS 326 (Data Structures) or equivalent:
• Stat 391 (Prob. and Stats for CS) or equivalent: Basic concepts in
probability and statistics
• Programming in Perl, C, C++, Java, or Python
• LING570
• LING572
• Being comfortable with formulas
8
Grades for LING575
No midterm or final exams.
Graded:
• Assignments (5): 45-60%
• Presentation: 15-25%
Not graded:
• Reading: 5-10%
• Class participation: 10-20%
9
Assignments
• Assignments:
– Due at 2:30pm on Tuesdays
– 1% penalty for each hour after the due date. Nothing
accepted after 4 days.
– Submit via CollectIt
• Reading:
– Papers should be read before class.
– Bring at least two questions to class.
– Your questions will be checked but not graded.
10
Presentation
• Select your week by noon this Saturday (1/12) by
replying to the GoPost message:
– first come, first served
• If, for whatever reason, the week you selected no
longer works for you, it is your responsibility to find
someone to switch with.
• For your week, email Fei the slides by noon on the
Monday (i.e., the day before your presentation).
– 1% penalty for each hour after the due date.
11
Patas
• If you need a patas account, email
[email protected] right away.
• The directory for LING575:
~/dropbox/07-08/575x/
– hw1/, hw2/, ….: Assignments and solutions
– hmm/: A pre-existing HMM package
– misc_slides/: Solutions to exams and miscellaneous slides
that are not on the course url.
12
Course plan
13
Machine learning
• Supervised learning: LING572
• Semi-supervised learning:
– Some annotated data, plus a large amount of
unannotated data
– Ex: self-training (sketched after this slide),
co-training, transductive SVM
• Unsupervised learning:
– There are no annotated data
– Ex: EM
14
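
To make the semi-supervised idea above concrete, here is a minimal self-training sketch in Python: train on a small labeled seed set, then repeatedly fold the most confident automatic labels on unlabeled data back into the training set. The data matrices, the logistic-regression classifier, and the 0.9 confidence threshold are illustrative assumptions, not part of the course materials.

# Self-training sketch: train on a small labeled seed set, then repeatedly
# add the most confident predictions on unlabeled data back into the
# training set. Data, classifier, and threshold are illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.9, rounds=5):
    X, y, pool = X_labeled.copy(), y_labeled.copy(), X_unlabeled.copy()
    clf = LogisticRegression().fit(X, y)
    for _ in range(rounds):
        if len(pool) == 0:
            break
        probs = clf.predict_proba(pool)
        confident = probs.max(axis=1) >= threshold   # trust only confident labels
        if not confident.any():
            break
        # move confidently auto-labeled examples into the training set
        X = np.vstack([X, pool[confident]])
        y = np.concatenate([y, clf.predict(pool[confident])])
        pool = pool[~confident]
        clf = LogisticRegression().fit(X, y)         # retrain on the enlarged set
    return clf
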
Unsupervised learning
• No annotated data
• But the knowledge has to come from somewhere.
– Dictionary / lexicon
– Seed examples
–…
⇒ We choose unsupervised POS tagging as a case
study.
15
Supervised POS tagging
• It is a sequence labeling problem.
• Statistical approach:
– Sequence labeling algorithms: HMM, MEMM, CRF,
…
– Classification algorithms: decision tree, naïve Bayes,
MaxEnt, SVM, Boosting, ….
• Most unsupervised POS tagging algorithms use EM to
estimate HMM parameters.
16
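
As a quick recap of the HMM tagger mentioned in the slide above (and in the "HMM and n-gram tagger" item of the plan), here is a minimal bigram HMM with Viterbi decoding, sketched in Python; the tag set, probabilities, and example sentence are toy values invented for illustration, not from the course.

# Minimal bigram HMM POS tagger: Viterbi decoding in log space over toy,
# hand-set probabilities (all numbers below are invented for illustration).
import math

def viterbi(words, tags, start, trans, emit):
    """start[t] = P(t | <s>), trans[p][t] = P(t | p), emit[t][w] = P(w | t)."""
    V = [{t: math.log(start[t]) + math.log(emit[t].get(words[0], 1e-6)) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        V.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: V[i - 1][p] + math.log(trans[p][t]))
            V[i][t] = (V[i - 1][prev] + math.log(trans[prev][t]) +
                       math.log(emit[t].get(words[i], 1e-6)))
            back[i][t] = prev
    # follow the back-pointers from the best final tag
    best = max(tags, key=lambda t: V[-1][t])
    path = [best]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

tags = ["DT", "NN", "VB"]
start = {"DT": 0.6, "NN": 0.3, "VB": 0.1}
trans = {"DT": {"DT": 0.05, "NN": 0.90, "VB": 0.05},
         "NN": {"DT": 0.10, "NN": 0.30, "VB": 0.60},
         "VB": {"DT": 0.50, "NN": 0.40, "VB": 0.10}}
emit = {"DT": {"the": 0.9},
        "NN": {"dog": 0.4, "book": 0.3},
        "VB": {"barks": 0.3, "book": 0.1}}
print(viterbi(["the", "dog", "barks"], tags, start, trans, emit))   # ['DT', 'NN', 'VB']

In the supervised setting these transition and emission tables would be estimated by counting over annotated data; the unsupervised approaches that follow try to estimate them with EM instead.
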
Major approaches to
unsupervised tagging
• All assume a large amount of unannotated data
• Approach #1: use EM to estimate HMM
– No lexicon
– With full lexicon
– With filtered lexicon
17
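
To make approach #1 concrete, here is a sketch of one EM iteration for a bigram HMM over unannotated sentences: forward-backward computes expected counts, and the M-step renormalizes them. The variable names, the unscaled probabilities, and the fixed start distribution are simplifying assumptions of this sketch, not the exact formulation used in the papers.

# One EM (forward-backward) iteration for a bigram HMM on unannotated text.
# No numerical scaling, so this is only suitable for short toy sentences.
# Assumes every corpus word has a nonzero emission probability under at
# least one tag, and keeps the start probabilities fixed for brevity.
from collections import defaultdict

def forward_backward(sent, tags, start, trans, emit):
    n = len(sent)
    # forward probabilities: alpha[i][t] = P(w_1..w_i, tag_i = t)
    alpha = [{t: start[t] * emit[t][sent[0]] for t in tags}]
    for i in range(1, n):
        alpha.append({t: emit[t][sent[i]] *
                         sum(alpha[i - 1][p] * trans[p][t] for p in tags)
                      for t in tags})
    # backward probabilities: beta[i][t] = P(w_{i+1}..w_n | tag_i = t)
    beta = [None] * n
    beta[n - 1] = {t: 1.0 for t in tags}
    for i in range(n - 2, -1, -1):
        beta[i] = {t: sum(trans[t][q] * emit[q][sent[i + 1]] * beta[i + 1][q]
                          for q in tags)
                   for t in tags}
    z = sum(alpha[n - 1][t] for t in tags)       # P(sentence)
    return alpha, beta, z

def em_step(corpus, tags, start, trans, emit):
    tc = defaultdict(float)                      # expected transition counts
    ec = defaultdict(float)                      # expected emission counts
    for sent in corpus:
        alpha, beta, z = forward_backward(sent, tags, start, trans, emit)
        for i, w in enumerate(sent):
            for t in tags:                       # gamma_i(t) = P(tag_i = t | sent)
                ec[(t, w)] += alpha[i][t] * beta[i][t] / z
            if i + 1 < len(sent):
                for t1 in tags:                  # xi_i(t1,t2) = P(tag_i=t1, tag_{i+1}=t2 | sent)
                    for t2 in tags:
                        tc[(t1, t2)] += (alpha[i][t1] * trans[t1][t2] *
                                         emit[t2][sent[i + 1]] * beta[i + 1][t2] / z)
    # M-step: renormalize the expected counts into new probability tables
    new_trans = {t1: {t2: tc[(t1, t2)] / max(sum(tc[(t1, q)] for q in tags), 1e-12)
                      for t2 in tags}
                 for t1 in tags}
    vocab = {w for sent in corpus for w in sent}
    new_emit = {t: {w: ec[(t, w)] / max(sum(ec[(t, v)] for v in vocab), 1e-12)
                    for w in vocab}
                for t in tags}
    return new_trans, new_emit
# Repeating em_step until convergence is the Baum-Welch training loop.

The "with full lexicon" and "with filtered lexicon" variants can be imposed by zeroing emit[t][w] for tag/word pairs the lexicon disallows: EM never turns a zero emission probability back into a nonzero one.
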
Major approaches (cont)
• Approach #2: clustering the words based
on
– distributional cues
– morphological cues
• Approach #3: cross-lingual approach:
– It requires parallel data
– Seeds are created by projecting POS info
from one language to the other.
18
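
A sketch of the distributional side of approach #2, assuming each word is represented by counts of its immediate left and right neighbours and the vectors are clustered with k-means; the corpus format, feature choice, and number of clusters are illustrative, and morphological cues and the cross-lingual approach (#3) are not covered here.

# Distributional clustering sketch: represent each word by counts of its
# immediate left/right neighbours, then cluster the vectors with k-means.
# Corpus format, feature choice, and number of clusters are illustrative.
from collections import Counter, defaultdict
import numpy as np
from sklearn.cluster import KMeans

def cluster_words(sentences, n_clusters=3, n_context=50):
    # use the most frequent words as context features
    freq = Counter(w for s in sentences for w in s)
    context = [w for w, _ in freq.most_common(n_context)]
    idx = {w: i for i, w in enumerate(context)}
    vecs = defaultdict(lambda: np.zeros(2 * len(context)))
    for s in sentences:
        padded = ["<s>"] + s + ["</s>"]
        for i in range(1, len(padded) - 1):
            w, left, right = padded[i], padded[i - 1], padded[i + 1]
            if left in idx:
                vecs[w][idx[left]] += 1                  # left-neighbour count
            if right in idx:
                vecs[w][len(context) + idx[right]] += 1  # right-neighbour count
    words = sorted(vecs)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(
        np.vstack([vecs[w] for w in words]))
    return dict(zip(words, labels))                      # word -> cluster id
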
Major approaches (cont)
• Approach #4: Prototype learning:
– It requires a small number of prototypes: e.g.,
“book” is a noun, “the” is a determiner.
– Prototypes help to label other words.
19
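
A sketch of the prototype idea in approach #4, assuming each word already has a context vector (e.g., from the clustering sketch above): non-prototype words simply inherit the tag of the most similar prototype by cosine similarity. The prototype list and the similarity measure are illustrative assumptions; a full prototype-driven tagger would typically combine this signal with a sequence model rather than labeling words in isolation.

# Prototype-learning sketch: a handful of seed words carry POS labels, and
# every other word inherits the label of its most similar prototype.
# Assumes word_vectors covers the prototypes (e.g., from the clustering
# sketch above); the prototype list itself is illustrative.
import numpy as np

def cosine(a, b):
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / norm if norm > 0 else 0.0

def label_by_prototype(word_vectors, prototypes):
    """word_vectors: {word: np.ndarray}; prototypes: {word: tag}."""
    labels = {}
    for w, vec in word_vectors.items():
        if w in prototypes:
            labels[w] = prototypes[w]          # keep the seed label
        else:
            nearest = max(prototypes, key=lambda p: cosine(vec, word_vectors[p]))
            labels[w] = prototypes[nearest]    # inherit the nearest prototype's tag
    return labels

# e.g. label_by_prototype(vectors, {"the": "DT", "book": "NN", "run": "VB"})
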
In this course
• We will
– discuss the papers in each category
– explore various methods aimed at improving
the state of the art.
• Compared to last year’s ling573, this
course focuses
– more on machine learning
– less on search and rule writing
20