An introduction to
CHILDES
Rianne Schippers
[email protected]
Outline
• What is CHILDES?
• Where do you find CHILDES?
• Why would you use CHILDES?
• How do you use CHILDES?
What is CHILDES?
• Child Language Data Exchange System
• Brian MacWhinney
• Online database of first and second language acquisition in
children.
– Written transcripts
– Audio
– Video
• Also contains data from children who are not typically developing.
What is CHILDES?
• Recorded, natural speech.
– Recorded in home setting
– Recorded at regular intervals
– Longitudinal data
• Typological variation.
– Germanic
– Romance
– Slavic
– Asian
Where is CHILDES?
Link: http://childes.psy.cmu.edu/
• “Databases”, i.e. the datasets.
• “Database manual” describing each dataset.
• Programs you can use to browse the databases
• Manuals that explain how to use the programs
Why use CHILDES?
• Answer questions about language acquisition
• Experimental studies
– Does child at age X know Y?
– Do 3-year-olds know passives?
– Do 2-year-olds know inflectional morphology?
– What interpretation do children at age X assign to Y?
– Do 4-year-olds understand binding?
– Do 5-year-olds understand scope freezing?
Why use CHILDES?
• Questions experiments cannot easily answer:
– Role played by input
– Order of acquisition
– Manner of acquisition
– Causality
• Longitudinal study
• Big, universal questions
– Lexical categories
– Inflectional morphology
– Argument structure
Why use CHILDES?
• Does the interaction between language type and
pronoun omission match the predictions of parameter-setting models?
• Are children with Down syndrome responsive to
maternal requests?
• How do children first learn mental state verbs such as
“remember” or “know”?
Why use CHILDES?
• Smaller, language specific questions
– Verb second
– Subjects (EPP)
– Particle verbs
• Comparative studies
– Acquisition of determiners
• Exploration
– Mean Length of Utterance, frequencies
How to use CHILDES?
• Download and install the dataset(s) you are interested in.
The “database manual” describes
• Language
• Age(s)
• Number of children
• Download and install CLAN (Computerized Language
Analysis):
– A search and statistics engine for CHILDES.
• Or use the CHILDES corpus reader in NLTK.
How to use CHILDES?
• All files are transcribed in CHAT format
– Codes for the Human Analysis of Transcripts
• Format
– Files start with @-headers: information about
participants and setting
– The rest of the file contains *-tiers and %-tiers
– *-tiers: specify the speaker (*CHI = child)
– %-tiers: are related to the previous *-tier and give
extralinguistic information
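The three line types described above can be told apart by their first character. The following is a minimal sketch, assuming a simplified CHAT file in which every tier fits on one line (real files may wrap long tiers onto continuation lines). The sample transcript and the function name parse_chat are invented for illustration.

```python
# Minimal sketch of reading CHAT-style text; the sample transcript is invented.
SAMPLE = """@Begin
@Participants:\tCHI Target_Child, MOT Mother
*CHI:\tI have a ball .
%mor:\tPRO|I&1S V|have-PRES DET|a&INDEF N|ball .
*MOT:\tyes you do .
@End
"""

def parse_chat(text):
    """Split CHAT text into header lines and utterances with their %-tiers."""
    headers, utterances = [], []
    for line in text.splitlines():
        if line.startswith("@"):
            # @-headers: information about participants and setting
            headers.append(line)
        elif line.startswith("*"):
            # *-tier: speaker code, colon, then the utterance
            speaker, _, words = line[1:].partition(":")
            utterances.append({"speaker": speaker, "text": words.strip(), "tiers": {}})
        elif line.startswith("%") and utterances:
            # %-tier: belongs to the most recent *-tier
            name, _, content = line[1:].partition(":")
            utterances[-1]["tiers"][name] = content.strip()
    return headers, utterances

headers, utterances = parse_chat(SAMPLE)
for u in utterances:
    print(u["speaker"], "|", u["text"], "|", u["tiers"])
```

Attaching each %-tier to the preceding *-tier mirrors the rule that a %-tier always relates to the previous *-tier.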
How to use CHILDES?
• %-tiers are also used for coding
• %pho for phonology
• %mor for morphology
*CHI: I have a ball
%mor: PRO|I&1S V|have-PRES DET|a&INDEF N|ball
• %syn for syntax
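A %mor tier like the one in the example can be taken apart mechanically. This is a rough sketch, assuming each token has the shape POS|stem followed by optional features introduced by "&" or "-". The function name parse_mor is hypothetical, and real %mor tiers contain more notation than is handled here.

```python
def parse_mor(mor_line):
    """Turn a %mor tier into (part_of_speech, stem, features) triples."""
    parsed = []
    for token in mor_line.split():
        if "|" not in token:
            continue  # skip anything without a POS label, e.g. final punctuation
        pos, _, rest = token.partition("|")
        # Features start at the first "&" or "-", whichever comes first.
        positions = [i for i in (rest.find("&"), rest.find("-")) if i != -1]
        cut = min(positions, default=len(rest))
        parsed.append((pos, rest[:cut], rest[cut:]))
    return parsed

for triple in parse_mor("PRO|I&1S V|have-PRES DET|a&INDEF N|ball"):
    print(triple)
```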
How to use CHILDES?
• Some more annotations
#        unfilled pause between words
6        schwa
&        phonological fragment
xxx      unintelligible speech
[/]      retracing without correction, e.g.: then [/] then
[//]     retracing with correction, e.g.: then [//] but
< >["]   quotation mark, used when the child literally repeats something
• All notation can be found in the CHAT manual
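Several of these codes are easy to count in a main tier. The sketch below is only an illustration: the function name describe_codes and the example utterance are invented, and it covers just five of the codes listed above.

```python
import re

def describe_codes(utterance):
    """Count a few CHAT codes in the text of one main tier."""
    return {
        "retrace_with_correction": utterance.count("[//]"),
        "retrace_no_correction": utterance.count("[/]"),
        "unintelligible": len(re.findall(r"\bxxx\b", utterance)),
        # "&" at the start of a token marks a phonological fragment
        "fragments": len(re.findall(r"(?<!\S)&\S+", utterance)),
        # "#" at the start of a token marks an unfilled pause
        "pauses": len(re.findall(r"(?<!\S)#", utterance)),
    }

print(describe_codes("then [/] then I [//] we xxx go # &ba ball"))
```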
How to use CHILDES?
• Go to the command window
• Every search starts with a command
– kwal: word search
– combo: combined search for 2 or more words
– freq: frequency counts
– mlu: mean length of utterance (MLU) counts
• A command is followed by search parameters
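To get a feel for what frequency and MLU counts measure, here is a small sketch in plain Python. It is not how CLAN computes them: this version counts words per utterance, whereas an MLU is often computed over morphemes. The data and function names are invented.

```python
from collections import Counter

UTTERANCES = [  # invented (speaker, text) pairs
    ("CHI", "want ball"),
    ("MOT", "you want the ball ?"),
    ("CHI", "want more ball"),
]

def words(text):
    """Keep alphabetic tokens only, dropping punctuation marks."""
    return [w for w in text.lower().split() if w.isalpha()]

def freq(utterances, speaker):
    """Word frequencies for one speaker."""
    return Counter(w for s, t in utterances if s == speaker for w in words(t))

def mlu_in_words(utterances, speaker):
    """Mean number of words per utterance for one speaker."""
    lengths = [len(words(t)) for s, t in utterances if s == speaker]
    return sum(lengths) / len(lengths) if lengths else 0.0

print(freq(UTTERANCES, "CHI"))
print(mlu_in_words(UTTERANCES, "CHI"))
```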
How to use CHILDES?
• Some standard CLAN parameters
+t   selects the utterances of a specified speaker
+s   selects a word to be searched
+u   specifies that all search results are stored in one file
+r   deals with the treatment of material between parentheses
+f   output is stored in the (specified) file(s)
• Not all commands have the same search parameters
– To see the parameters of a command, type the command on its own
in the command window and hit enter
How to use CHILDES?
• Searching with kwal
– Speaker(s)
– Word
– File(s)
• Command must come first, the order in which the search
parameters are given is irrelevant
• Every search parameter and the command must be separated
from each other by a space
How to use CHILDES?
• Setting the speaker parameter
– Identify the speaker(s)
+t = look for that specific speaker
-t = look for everyone but that specific speaker
• We are interested in the child
– command parameter-speaker-child
kwal +t*CHI
How to use CHILDES?
• Setting the word parameter
– Decide what word you want to look for
+s = look for that specific word
-s = look for everything except that specific word
• Let’s say we want to know whether the child has used the
auxiliary ‘want’.
– command speaker parameter-word-want
kwal +t*CHI +s"want"
How to use CHILDES?
• Specifying the file
• Two ways:
– Using the ‘file in’ button
– Specifying the file in the command line
• Let’s say we want to start our search in file sarah023.cha
– Command speaker word file
kwal +t*CHI +s"want" sarah023.cha
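The logic of such a search (one speaker, one word, one file) can be sketched in a few lines of Python. This only imitates the idea and is not CLAN itself. The function name kwal is borrowed for readability, and the transcript lines are invented. To search a real file you would read its lines from disk first.

```python
def kwal(lines, speaker, word):
    """Return main-tier lines by `speaker` that contain `word` as a whole token."""
    prefix = "*" + speaker + ":"
    hits = []
    for line in lines:
        if line.startswith(prefix):
            tokens = line[len(prefix):].lower().split()
            if word.lower() in tokens:
                hits.append(line)
    return hits

TRANSCRIPT = [  # invented example lines
    "*CHI:\tI want cookie .",
    "*MOT:\tdo you want a cookie ?",
    "*CHI:\tmore cookie .",
]
print(kwal(TRANSCRIPT, "CHI", "want"))
```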
How to use CHILDES?
Exercise:
– Discover whether the mother uses the auxiliary ‘want’ in file
sarah023.cha
Steps to take:
– Determine the command
– Identify the speaker
– Decide on the word
– Specify the file
kwal +t*MOT +s"want" sarah023.cha
How to use CHILDES?
• Searching for several words
– Make a list in .txt format
– Enter the list as the word you are looking for
• For example:
– A list with all auxiliaries
– Named auxiliary.txt
– Parameter: +s@auxiliary.txt
kwal +t*CHI +s@auxiliary.txt sarah023.cha
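Searching with a word list amounts to loading one word per line and checking each utterance against the whole set. A minimal sketch, assuming a plain-text list with one word per line. The helper names and the three listed words are invented, and the list file is created in a temporary folder so the example runs on its own.

```python
import os
import tempfile

def load_word_list(path):
    """Read one search word per line, ignoring blank lines."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}

def utterances_with_any(utterances, wordset):
    """Keep utterances that contain at least one word from the set."""
    return [u for u in utterances if wordset & set(u.lower().split())]

# Build a throwaway list file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as folder:
    list_path = os.path.join(folder, "auxiliary.txt")
    with open(list_path, "w", encoding="utf-8") as handle:
        handle.write("can\nwill\nmust\n")
    auxiliaries = load_word_list(list_path)

hits = utterances_with_any(["I can do it", "more juice", "you must go"], auxiliaries)
print(hits)
```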
How to use CHILDES?
• Output screen is limited
• Store the data in a separate file
– Parameter: +f
– File name has three letters
– For example: aux
• Command speaker word parameter-store-filename file
kwal +t*CHI +s"want" +faux sarah023.cha
How to use CHILDES?
• Retype the command: kwal +t*CHI +s"want" sarah023.cha
• Notice: some material is enclosed in parentheses
*CHI: wan(t) do (a)gain
• What does this mean?
– Child actually said ‘wan’ instead of ‘want’.
• By default, CLAN includes the material in parentheses.
– CLAN reads ‘wan(t)’ as ‘want’
How to use CHILDES?
• What does this mean?
– A search for ‘want’ will give you both ‘wan(t)’ and ‘want’.
• You can control whether the search includes material in parentheses.
• +r parameter
+r1 = default, include material in parentheses
+r2 = exclude material in parentheses
+r5 = exclude rephrased material
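The two readings of a form like wan(t) can be made concrete with a small sketch. It only illustrates including versus excluding the parenthesized material and is not tied to any particular +r setting. Both function names are invented.

```python
import re

def with_parenthesized(word):
    """'wan(t)' -> 'want': keep the material the transcriber added."""
    return word.replace("(", "").replace(")", "")

def without_parenthesized(word):
    """'wan(t)' -> 'wan': keep only what the child produced."""
    return re.sub(r"\([^)]*\)", "", word)

tokens = "wan(t) do (a)gain".split()
print([with_parenthesized(t) for t in tokens])
print([without_parenthesized(t) for t in tokens])
```

A search for ‘want’ matches the first reading of wan(t) but not the second.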
How to use CHILDES?
• Try out: kwal +t*CHI +s"want" +r2 sarah023.cha
• +r5 allows for exclusion of rephrased material
• What is rephrased material?
*CHI: I wanna [: want to] eat cereal
• By default, CLAN searches the rephrased form (‘want to’)
• The +r5 option allows you to look for the form as spoken, ‘wanna’.
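The difference between searching the rephrased form and the spoken form can be sketched as two ways of rewriting the same line. This is an illustration only, assuming the replacement always follows a single word in the form word [: replacement]. The function names are invented.

```python
import re

# A word followed by "[: replacement]"
REPLACEMENT = re.compile(r"(\S+)\s+\[:\s*([^\]]+)\]")

def use_replacement(text):
    """Substitute the rephrased form for the spoken one."""
    return REPLACEMENT.sub(lambda m: m.group(2).strip(), text)

def use_original(text):
    """Keep the spoken form and drop the rephrasing."""
    return REPLACEMENT.sub(lambda m: m.group(1), text)

line = "I wanna [: want to] eat cereal"
print(use_replacement(line))
print(use_original(line))
```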
How to use CHILDES?
• Searching with both +s and -s
• CLAN only allows you to specify either +s or -s in one command
• Imagine you want to look for all the forms of one verb, but
are not interested in other words that start the same way
• For example: all the verbal forms of ‘go’
• First of all: wild card
– The wildcard * matches any sequence of characters
How to use CHILDES?
• Adding the * to the word search
+s"go*"
• Words that this search will find are: go, gone, goes, going
• But also words such as: got, good, goat, god etc.
• Ideally, you want to specify both +s and -s
• Piping option
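The over-matching problem is easy to reproduce outside CLAN. The sketch below uses Python's fnmatch module, on the assumption that its * behaves like the wildcard described here (any sequence of characters, including none). The word list is invented.

```python
from fnmatch import fnmatchcase

words = ["go", "goes", "going", "gone", "got", "good", "goat", "play"]

# Everything that starts with "go" matches, wanted or not.
matches = [w for w in words if fnmatchcase(w, "go*")]
print(matches)

# The verb forms we are actually after.
wanted = {"go", "goes", "going", "gone"}
print([w for w in matches if w not in wanted])
```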
How to use CHILDES?
• Piping: the second command operates on the output of the first
command
• First command: look for ‘go*’; second command: exclude ‘good’,
‘got’, etc.
• In order for the second command to be able to operate on the
first, the first command must give an output in CHAT format
• +d option
How to use CHILDES?
• First command:
– Look for ‘go*’
– For the speaker *CHI
– Output must be in CHAT format
– In file sarah040.cha
kwal +t*CHI +s"go*" +d sarah040.cha
• Second command: exclude ‘got’
kwal -s"got"
How to use CHILDES?
• Piping the first and the second command
first command piping-operation second command
kwal +t*CHI +s"go*" +d sarah040.cha | kwal -s"got"
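Piping is a two-stage filter: the second stage sees only what the first stage let through. A minimal Python sketch of that idea, with invented utterances and invented helper names keep and drop:

```python
from fnmatch import fnmatchcase

def keep(utterances, pattern):
    """First stage: keep utterances with at least one word matching `pattern`."""
    return [u for u in utterances if any(fnmatchcase(w, pattern) for w in u.split())]

def drop(utterances, word):
    """Second stage: remove utterances that contain `word`."""
    return [u for u in utterances if word not in u.split()]

child = ["I go home", "she got it", "going now", "good dog", "more juice"]
stage_one = keep(child, "go*")
stage_two = drop(stage_one, "got")
print(stage_one)
print(stage_two)
```

Excluding only ‘got’ still leaves ‘good’ in the result, so each unwanted word needs its own exclusion.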
How to use CHILDES?
• Looking for more than one word at a time
• Searching with combo
– Speaker(s)
– Words
– File(s)
• Boolean operators:
^ = immediately followed by
* = any character
+ = or
! = not
How to use CHILDES?
• Setting the speaker parameter
combo +t*CHI
• Setting the word parameter
– Let’s look for the combination of ‘want’ and ‘to’
– ‘want’ immediately followed by ‘to’
combo +t*CHI +s"want^to"
How to use CHILDES?
• Specifying the file
– Let’s look in file sarah034.cha
combo +t*CHI +s"want^to" sarah034.cha
• Combo looks for the words in sequence by default
• The +x parameter allows you to look for two or more words in any
order
How to use CHILDES?
• Searching for ‘want’ directly followed by ‘to’ without +x only gives
‘want to’
combo +t*CHI +s"want^to" sarah034.cha
• Searching for ‘want’ directly followed by ‘to’ with +x gives both
‘want to’ and ‘to want’
combo +t*CHI +s"want^to" +x sarah034.cha
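The contrast between the two searches can be sketched as an adjacency check with an optional any-order switch. This follows the description on the slide rather than CLAN's actual implementation, so check the exact behaviour of +x in the CLAN manual. The function and parameter names are invented.

```python
def combo(utterances, first, second, any_order=False):
    """Find utterances where `first` is immediately followed by `second`.

    With any_order=True the reversed adjacent pair is accepted as well.
    """
    hits = []
    for text in utterances:
        tokens = text.lower().split()
        pairs = set(zip(tokens, tokens[1:]))  # all adjacent word pairs
        if (first, second) in pairs or (any_order and (second, first) in pairs):
            hits.append(text)
    return hits

child = ["I want to go", "to want more", "want a cookie"]
print(combo(child, "want", "to"))
print(combo(child, "want", "to", any_order=True))
```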
Pitfalls and limitations
• Cannot test for acceptability or ungrammaticality
• Be aware of:
– Routines
– Imitations
– Speech errors
– Mistranscriptions
Protocol
• CHILDES transcripts were collected with great effort and
are now freely available. In return for using them, you
reward the creators with citations.
• Cite the latest edition of MacWhinney’s book:
MacWhinney, B. (2000). The CHILDES project: Tools for
analyzing talk (3rd ed.). Mahwah, NJ: Lawrence
Erlbaum Associates.
• Cite the publication selected by the creator(s) of the
database(s) you have used.
– References can be found in the ‘database manuals’ on
the site