Computational Extraction of Social and
Interactional Meaning from Speech
Dan Jurafsky and Mari Ostendorf
Lecture 8:
Positioning & Authority
Mari Ostendorf
Announcements
 Reminder that course evaluations need to be done
today.
 HW 4 grades & comments will be emailed tonight.
 Lecture notes will be posted later
Outline
 Positioning: what’s the objective?
 Feature learning, selection & evaluation
 Partisan positioning -- visualization
 Social positioning – classification
Types of Positioning: Authority
 Institutional roles vs. emerging roles
 Institutional: assigned by a higher authority
 CEO, project manager, technical staff, …
 News anchor, reporter, invited guest, talk show host, …
 Professor vs. post-doc vs. grad student vs. undergrad student
 Wikipedia discussions: administrator, registered user, …
 Emerging: earned within the group
 Linguistics institute: happy hour organizer
 Student project: leader, note-taker, …
 Wikipedia discussions: page expert, mediator, …
 Managerial vs. expertise vs. social
 Managerial: running meetings, assigning tasks
 Expertise: thought leader, topic/skill expert
 Social: family roles, social class
Types of Positioning: Group Association
 Politics:
 Democrat vs. Republican
 pro-life vs. pro-choice
 Field politics
 pro-Chomsky vs. anti-Chomsky
 generative vs. discriminative modeler
Positioning: Roles vs. Relationships
 Roles:
 Actively establish one’s identity in the group
 Characteristic talk
 Relationships
 Confirm relative positions (politeness,
upspeak/downspeak)
Applications
 Classification/detection of…
 people in terms of roles
 power/status relationships
 successful control or persuasion
 Visualization
 Identifying traits of groups
 Position on a scale (e.g. how far “right”?)
 In either case, we might want to track changes over time
Applications (cont.)
 Detecting social roles/relationships
 Provides a mechanism for identifying leaders and group
structure
 Identifying the features that signal these roles and relationships provides groundwork for understanding
 How people establish their identities in online
communities
 How respect is earned (or lost) in social networks
 Communication barriers across roles
Outline
 Positioning: what’s the objective?
 Feature learning, selection & evaluation
 Partisan positioning -- visualization
 Social positioning – classification
Feature Terminology
 Extraction:
 Counting and optionally weighting word n-grams, POS n-
grams, phrases, syntactic constituents
 Learning phrase patterns
 Transformation: mapping raw features to new space
 Linear transformations (e.g. PCA, SVD, CCA, …)
 Kernels
 Selection: in/out decision on features
 Evaluation: feature score
 General (roughly task motivated) measures
 Direct impact on task objective
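To make the four stages concrete, here is a minimal scikit-learn sketch; the toy corpus, labels, and every stage parameter (n-gram range, k, number of components) are placeholders for illustration, not anything specified in the lecture. Selection is placed before the SVD here only because the chi-squared score requires non-negative counts.

# A minimal sketch of extraction -> selection -> transformation -> evaluation;
# all data and parameters below are toy placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

docs = ["let me know if you need anything",
        "we are confident we will be successful",
        "read this over and give me a call",
        "but that is not my problem"]
labels = [0, 0, 1, 1]  # e.g., UpSpeak vs. DownSpeak

pipe = Pipeline([
    # Extraction: count word unigrams and bigrams
    ("extract", CountVectorizer(ngram_range=(1, 2))),
    # Selection: in/out decision scored by a general measure (chi-squared)
    ("select", SelectKBest(chi2, k=10)),
    # Transformation: linear map (SVD) to a low-dimensional space
    ("transform", TruncatedSVD(n_components=2)),
    # Evaluation by direct impact on the task objective: classifier accuracy
    ("classify", LinearSVC()),
])
print(pipe.fit(docs, labels).score(docs, labels))  # training accuracy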
Features: What’s the Objective?
 Classifier/detection performance
 Goal of feature extraction is to have separable classes
 Goal of feature selection is to avoid over-training
(generalize to new domains) & reduce computational cost
 Evaluation in terms of classifier performance (objective)
 Visualization (conceptual or actual)
 Goal of feature extraction and selection is data reduction
(get a small number of human-interpretable factors)
 Evaluation in terms of utility to human analysts (subjective,
so can be difficult to measure)
More terminology
 Monroe et al.’s critique of classification is not about classification models per se, but about the usual objective of classification (minimum error)
 Generative models (which they advocate) can also be used for classification
Outline
 Positioning: what’s the objective?
 Feature learning, selection & evaluation
 Partisan positioning -- visualization
 Social positioning – classification
 More on feature learning
Fighting Words (Monroe et al.)
 Search for an intuitively meaningful score of words as
characteristic of groups
 Reinterpretation from a statistical learning perspective:
search for a good generative classification model
 Use a generative statistical model with smoothing
 Reduce impact of estimation error by sharing parameters
 Note: the issue of holding out independent test data is
not addressed here!
 No-model heuristic: relative difference in frequencies
 Estimate topic- and group-dependent word frequencies; take the difference
 Why is this foolish? High-frequency words dominate, and estimation errors are substantial when the data are divided by topic & group
 Unigram model: log-likelihood ratio (LLR)
 LLR is better at ruling out function words (a standard detection statistic)
 Example: frequencies (.11, .10) vs. (.01, .008) give diff = .01 vs. .002, but LLR = 1.1 vs. 1.25
 Estimation errors are still a problem for infrequent words
 Bayesian model: unigram w/ Dirichlet prior
 The prior provides cross-group smoothing, a form of parameter sharing
 Bayesian model: unigram w/ Laplace prior
 The Laplace prior acts as L1 regularization, driving small sub-group parameters to zero
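The statistic behind these comparisons can be computed in a few lines. Below is a minimal sketch in the spirit of Monroe et al.'s Dirichlet-smoothed, z-scored log-odds; the toy counts and the use of pooled counts as the prior are assumptions for illustration, not the lecture's data.

import math
from collections import Counter

def log_odds_dirichlet(counts_a, counts_b, prior):
    """Z-scored log-odds of each word in group a vs. group b, smoothed by
    a Dirichlet prior (here, word counts from the pooled corpus)."""
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    a0 = sum(prior.values())
    z = {}
    for w, aw in prior.items():
        ya, yb = counts_a[w], counts_b[w]
        delta = (math.log((ya + aw) / (n_a + a0 - ya - aw))
                 - math.log((yb + aw) / (n_b + a0 - yb - aw)))
        var = 1.0 / (ya + aw) + 1.0 / (yb + aw)   # approximate variance
        z[w] = delta / math.sqrt(var)             # + favors a, - favors b
    return z

dem = Counter("death tax death tax estate".split())
rep = Counter("estate tax estate tax death".split())
print(log_odds_dirichlet(dem, rep, prior=dem + rep))

With an informative prior, rare words no longer dominate the ranking the way they do under the raw difference or LLR heuristics.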
Visualizing Temporal Dynamics of Partisan Language
Comment
 The log-likelihood ratio is a decision criterion; the key advantage of the approach is better estimation of p(w|class)
 The class-conditional distributions could be used with other decision criteria, e.g., information gain
Outline
 Positioning: what’s the objective?
 Feature learning, selection & evaluation
 Partisan positioning -- visualization
 Social positioning – classification
 Power relationships
 Institutional role
Social Power Relationships (Bramsen et al.)
 Enron email
 Detect “lects” associated with power relationships
 UpSpeak: directed to someone with greater authority
 DownSpeak: directed to someone with less authority
 PeerSpeak: equal authority (saved for future work)
 Relevant sociolinguistic research
 Fairclough, 1989 – Language and Power
 Brown & Levinson, 1987 (recall politeness study on the
same data)
Register as Formality
 Brown & Levinson (1987) factors that influence communication techniques, and the Peterson et al. (2011) politeness-model mapping for Enron email:
 Symmetric social distance between participants → personal vs. business; frequency of social contact
 Asymmetric power/status difference between participants → rank difference (CEO > pres > VP > director …)
 Weight of an imposition → automatic request classifier; size of audience
Formality Detection (Peterson et al. 2011)
 Features:
 Informal words, including: interjections, misspellings and
words classified as informal, vulgar or offensive by
Wordnik
 Punctuation: !, …, absence of sentence-final punctuation
 Case: various measures of lower casing
 Classifier: maximum entropy
 Results: 81% acc, 72% F
 Punctuation is the single most useful feature; informal words and case have lower recall
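A rough sketch of these feature types follows; this is not the authors' implementation, and the small informal-word set stands in for the Wordnik-derived lexicon, which is an assumption here.

import re

INFORMAL = {"lol", "yeah", "gonna", "hey", "ur"}   # placeholder lexicon

def formality_features(text):
    words = [t.strip(".,!?") for t in text.split()]
    return {
        "informal_words": sum(w.lower() in INFORMAL for w in words),
        "exclamations": text.count("!"),
        "ellipses": text.count("..."),
        # absence of sentence-final punctuation
        "no_final_punct": int(not re.search(r"[.!?]\s*$", text)),
        # two of the many possible lower-casing measures
        "uncapitalized_i": sum(w == "i" for w in words),
        "all_lowercase": int(text == text.lower()),
    }

print(formality_features("hey, gonna be late... lol"))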
Formality ≠ Rank Difference
UpSpeak/DownSpeak Detection
 Data: Enron emails (selecting only those with rank
difference)
 Features:
 mixed word, POS and word-POS n-grams (frequency >
min, class ratio > min)
 n-gram binning by relative frequency ratios & length (one possible scheme is sketched after this list)
 Polite imperatives (human-engineered features)
 Classifier: SVM (also tried Adaboost, MaxEnt)
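The slide does not spell out the binning scheme, so the sketch below shows one plausible reading: each n-gram is mapped to a coarse bin by how skewed its up/down frequency ratio is, crossed with n-gram length, and a message is represented by per-bin counts. The bin boundaries and the two-way crossing (giving four bins rather than the slide's eight) are assumptions.

from collections import Counter

def bin_ngrams(up_freq, down_freq, eps=1e-6):
    """Map each n-gram to a coarse bin: up/down skew crossed with length."""
    bins = {}
    for ng in set(up_freq) | set(down_freq):
        ratio = (up_freq[ng] + eps) / (down_freq[ng] + eps)
        skew = "up" if ratio > 2 else "down" if ratio < 0.5 else None
        if skew:  # keep only clearly skewed n-grams
            length = "uni" if len(ng.split()) == 1 else "multi"
            bins[ng] = skew + "_" + length
    return bins

def message_features(tokens, bins, n=2):
    """Count, per bin, the binned n-grams appearing in one message."""
    grams = tokens + [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(bins[g] for g in grams if g in bins)

up = Counter({"let me": 9, "thanks": 6, "problem": 1})
down = Counter({"let me": 3, "thanks": 5, "problem": 8})
print(message_features("let me know thanks".split(), bin_ngrams(up, down)))
# -> Counter({'up_multi': 1})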
Up/DownSpeak Results
Features                         # of features   Acc (%)
Word unigrams + bigrams               7639         80.7
Above + tag unigrams/bigrams          9014         77.2
Binned n-grams                           8         77.2
Above + polite imperatives               9         78.9

Notes:
• F-score omitted, since not defined
• Weighted test omitted, since not matched to training

Observations:
• Most likely class baseline = 74%
• Too many features lead to over-training
• Too much binning loses information
• Polite imperatives help
Aside…
Classifiers: What’s the Problem?
 Accuracy: treats all errors as equal
 Train/test with a representative class distribution
 Known cost of different error types
 Incorporate cost (or weighting) in training and eval criteria
 Detection of infrequent events (unknown cost)
 Training strategies vary, may use sampled data
 Evaluate with precision-recall or detection-error tradeoff,
optionally w/ area under the curve to get 1 number
 Evaluate with F-measure (or micro/macro-F)
 Best results when train/test objectives are matched
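As a small illustration of the infrequent-event case, the sketch below sweeps a detector threshold to trace the precision-recall tradeoff and computes F-measure; the gold labels and detector scores are toy values.

def prf(gold, pred, target=1):
    """Precision, recall, F1 for one target class."""
    tp = sum(g == target and p == target for g, p in zip(gold, pred))
    fp = sum(g != target and p == target for g, p in zip(gold, pred))
    fn = sum(g == target and p != target for g, p in zip(gold, pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

gold = [0, 0, 0, 0, 1, 1]                    # infrequent positive class
scores = [0.1, 0.4, 0.2, 0.6, 0.7, 0.3]      # toy detector scores
for thresh in (0.25, 0.5, 0.65):             # sweeping the threshold traces
    pred = [int(s >= thresh) for s in scores]  # the precision-recall curve
    print(thresh, prf(gold, pred))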
Useful n-grams for power diffs
 UpSpeak
 Let me know if you need anything
 I’ll let you know the final results soon
 We’re confident we’ll be successful
 DownSpeak
 Read this over and give me a call
 But that is not my problem
Up/DownSpeak is about more than formality.
Social Role Detection
 Social role or institutional role?
 Most examples are better referred to as “institutional”
 Prior work has looked at multiple speech source types
 Broadcast news (anchor, reporter, soundbite)
 Talkshows (host, reporter, invited guest, soundbite, call-in)
 Movies (hero, friends of hero)
 ICSI meetings (actual: professor, senior researcher, student)
 AMI meetings (acted, details shortly)
 Also some work on text (e.g. wiki discussions)
Talk Show Example
(in American English)
Spoken Language Role Features
 Lexical (keywords & phrases, dialog acts)
 Host: “coming up,” “out of time,” questions
 Reporter: “back to”
 Call-in guest of person on the street: disfluencies
 Structural (turn duration, show time, overlaps, sequential speaker probabilities; a sketch follows this list)
 Soundbites have short show time, host has long
 Host has short turns, invited guest has long turns
 Guests have high probability of talking after host
 Prosodic (pitch range, pausing)
 Social network features (who talks when, in Garg et al.)
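To make the structural features concrete, here is a minimal sketch over a toy turn-taking log; the (speaker, start, end) format and the exact feature definitions are assumptions for illustration, not the cited systems' features.

from collections import Counter, defaultdict

# Toy turn-taking log: (speaker, start_sec, end_sec)
turns = [("host", 0, 10), ("guest", 10, 40), ("host", 40, 45), ("guest", 45, 80)]

def structural_features(turns):
    total = turns[-1][2] - turns[0][1]        # total show time
    talk = defaultdict(float)                 # seconds per speaker
    durs = defaultdict(list)                  # turn durations per speaker
    follows = Counter()                       # (previous, current) speaker pairs
    for i, (spk, start, end) in enumerate(turns):
        talk[spk] += end - start
        durs[spk].append(end - start)
        if i:
            follows[(turns[i - 1][0], spk)] += 1
    n_trans = len(turns) - 1
    return {spk: {
        "show_time_frac": talk[spk] / total,  # soundbites: low; hosts: high
        "mean_turn_dur": sum(d) / len(d),     # hosts: short; guests: long
        # fraction of transitions in which spk follows each other speaker
        "after": {prev: c / n_trans
                  for (prev, cur), c in follows.items() if cur == spk},
    } for spk, d in durs.items()}

print(structural_features(turns))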
Structural Features
 Who talks to whom; how much each person talks
 3-D projection using LDA: a small number of roles can be learned from unsupervised clustering (English: 85% accurate)
 Spectral clustering graph: similarity based on structural & lexical factors
Social Network Features
 Who talks when: actors participating in “events” (discussion topics)
Social Role in Project Meetings (Garg et al.)
 Data: AMI meetings
 Role-prescribed project meetings
 138 meetings (roughly 15 min)
 4 actors: project manager, marketing expert, user interface expert, industrial designer
 Features + classifier
 Lexical (word n-grams) + Boostexter
 Social network + Bernoulli model
 Classifier combination (linear combination of log probs; sketched below)
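A minimal sketch of the combination step, assuming each component classifier outputs per-role posteriors; the role abbreviations, toy probabilities, and the equal weight are placeholders, not the paper's values.

import math

def combine(log_p_lex, log_p_net, weight=0.5):
    """Pick the role maximizing w*log p_lexical + (1-w)*log p_network."""
    return max(log_p_lex, key=lambda r: weight * log_p_lex[r]
                                        + (1 - weight) * log_p_net[r])

log_p_lex = {r: math.log(p) for r, p in
             {"PM": 0.5, "ME": 0.2, "UI": 0.2, "ID": 0.1}.items()}
log_p_net = {r: math.log(p) for r, p in
             {"PM": 0.3, "ME": 0.4, "UI": 0.2, "ID": 0.1}.items()}
print(combine(log_p_lex, log_p_net))   # -> PM

In practice the weight would be tuned on held-out data rather than fixed at 0.5.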
Results
Summary
 Words and interaction patterns provide a lot of
information about social roles and relations (and
affiliations and emotions and …)
 Machine learning is very useful, but more so if used
with the right objectives
 Annotating social phenomena is not always easy, which presents evaluation challenges