An introduction to CHILDES
Rianne Schippers
[email protected]

Outline
• What is CHILDES?
• Where do you find CHILDES?
• Why would you use CHILDES?
• How do you use CHILDES?

What is CHILDES?
• Child Language Data Exchange System
• Brian MacWhinney
• Online database of first and second language acquisition in children.
  – Written transcripts
  – Audio
  – Video
• Also contains data from children who are not typically developing.

• Recorded, natural speech.
  – Recorded in home setting
  – Recorded at regular intervals
  – Longitudinal data
• Typological variation.
  – Germanic
  – Romance
  – Slavic
  – Asian

Where is CHILDES?
Link: http://childes.psy.cmu.edu/
• “Databases”, i.e. the datasets.
• “Database manual” describing each dataset.
• Programs you can use to browse the databases
• Manuals that explain how to use the programs

Why use CHILDES?
• Answer questions about language acquisition
• Experimental studies
  – Does a child at age X know Y?
    – Do 3-year-olds know passives?
    – Do 2-year-olds know inflectional morphology?
  – What interpretation do children at age X assign to Y?
    – Do 4-year-olds understand binding?
    – Do 5-year-olds understand scope freezing?

• Questions experiments cannot easily answer:
  – Role played by input
  – Order of acquisition
  – Manner of acquisition
  – Causality
• Longitudinal study
• Big, universal questions
  – Lexical categories
  – Inflectional morphology
  – Argument structure

• Does the interaction between language type and pronoun omission match the predictions of parameter-setting models?
• Are children with Down syndrome responsive to maternal requests?
• How do children first learn mental state verbs such as “remember” or “know”?

• Smaller, language-specific questions
  – Verb second
  – Subjects (EPP)
  – Particle verbs
• Comparative studies
  – Acquisition of determiners
• Exploration
  – Mean Length of Utterance, frequencies

How to use CHILDES?
• Download and install the dataset(s) you are interested in.
  The “database manual” describes:
  – Language
  – Age(s)
  – Number of children
• Download and install CLAN (Computerized Language Analysis):
  – A search and statistics engine for CHILDES.
• Or use NLTK’s CHILDES module.

• All files are transcribed in CHAT format
  – Codes for the Human Analysis of Transcripts
• Format
  – Files start with @-headers: information about participants and setting
  – The rest of the file contains *-tiers and %-tiers
  – *-tiers: specify the speaker (*CHI = child)
  – %-tiers: relate to the preceding *-tier and give extralinguistic information

• %-tiers are also used for coding
• %pho for phonology
• %mor for morphology
      *CHI: I have a ball
      %mor: PRO|I&1S V|have-PRES DET|a&INDEF N|ball
• %syn for syntax

• Some more annotations
      #        unfilled pause between words
      6        schwa
      &        phonological fragment
      xxx      unintelligible speech
      [/]      retracing without correction, e.g.: then [/] then
      [//]     retracing with correction, e.g.: then [//] but
      < >["]   quotation mark, used when the child literally repeats something
• All notation can be found in the CHAT manual

• Go to the command window
• Every search starts with a command
  – kwal: word search
  – combo: combined search for 2 or more words
  – freq: frequency counts
  – mlu: MLU (mean length of utterance) counts
• A command is followed by search parameters

• Some standard CLAN parameters
      +t   selects the utterances of a specified speaker
      +s   selects a word to be searched
      +u   specifies that all search results are stored in one file
      +r   deals with the treatment of material between parentheses
      +f   output is stored in the (specified) file(s)
• Not all commands have the same search parameters
  – Type the command on its own in the command window and hit enter to see its parameters
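
The CHAT layout described above (@-headers, then *-tiers with their dependent %-tiers) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the sample transcript is made up, the names `SAMPLE`, `Utterance` and `parse_chat` are hypothetical, and continuation lines and most CHAT codes are ignored. It is not how CLAN or NLTK read the files.

```python
from dataclasses import dataclass, field

# Hypothetical miniature transcript in the CHAT layout; real CHILDES files
# carry more headers and codes than this.
SAMPLE = """\
@Begin
@Participants:\tCHI Target_Child, MOT Mother
*MOT:\twhat do you have ?
*CHI:\tI have a ball .
%mor:\tPRO|I&1S V|have-PRES DET|a&INDEF N|ball
@End
"""

@dataclass
class Utterance:
    speaker: str                                # e.g. "CHI" for the child
    text: str                                   # the main (*) tier
    tiers: dict = field(default_factory=dict)   # dependent (%) tiers by name

def parse_chat(transcript):
    """Split a CHAT-style transcript into @-headers and utterances."""
    headers, utterances = [], []
    for line in transcript.splitlines():
        if line.startswith("@"):
            headers.append(line)
        elif line.startswith("*"):
            speaker, _, text = line[1:].partition(":")
            utterances.append(Utterance(speaker.strip(), text.strip()))
        elif line.startswith("%") and utterances:
            name, _, value = line[1:].partition(":")
            # A %-tier belongs to the most recent *-tier.
            utterances[-1].tiers[name.strip()] = value.strip()
    return headers, utterances

headers, utterances = parse_chat(SAMPLE)
for u in utterances:
    print(u.speaker, "|", u.text, "|", u.tiers)
```

Attaching each %-tier to the last *-tier seen mirrors the rule that a %-tier relates to the preceding speaker line.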
• Searching with kwal
  – Speaker(s)
  – Word
  – File(s)
• The command must come first; the order in which the search parameters are given is irrelevant
• The command and every search parameter must be separated from each other by a space

• Setting the speaker parameter
  – Identify the speaker(s)
      +t = look for that specific speaker
      -t = look for everyone but that specific speaker
• We are interested in the child
  – command parameter-speaker-child
      kwal +t*CHI

• Setting the word parameter
  – Decide what word you want to look for
      +s = look for that specific word
      -s = look for everything except that specific word
• Let’s say we want to know whether the child has used the auxiliary ‘want’.
  – command speaker parameter-word-want
      kwal +t*CHI +s"want"

• Specifying the file
• Two ways:
  – Using the ‘file in’ button
  – Specifying the file in the command line
• Let’s say we want to start our search in file sarah023.cha
  – command speaker word file
      kwal +t*CHI +s"want" sarah023.cha

Exercise:
  – Discover whether the mother uses the auxiliary ‘want’ in file sarah023.cha
Steps to take:
  – Determine the command
  – Identify the speaker
  – Decide on the word
  – Specify the file
      kwal +t*MOT +s"want" sarah023.cha

• Searching for several words
  – Make a list in .txt format
  – Enter the list as the word you are looking for
• For example:
  – A list with all auxiliaries
  – Named auxiliary.txt
  – Parameter: +s@auxiliary.txt
      kwal +t*CHI +s@auxiliary.txt sarah023.cha
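
The logic of the speaker and word parameters can be mimicked in plain Python, which may help when checking what a search is expected to return. The sketch below is not CLAN: `kwal_like` is a hypothetical helper, the utterances are invented rather than taken from sarah023.cha, and the word list is held in memory instead of being read from a .txt file.

```python
# Hypothetical utterances as (speaker, text) pairs; not real CHILDES data.
UTTERANCES = [
    ("MOT", "do you want some juice ?"),
    ("CHI", "want juice ."),
    ("CHI", "I can do it ."),
    ("MOT", "you will like it ."),
]

def kwal_like(utterances, speaker=None, exclude_speaker=None, words=()):
    """Return utterances by the chosen speaker that contain any of `words`.

    `speaker` mimics +t, `exclude_speaker` mimics -t, and `words` mimics +s
    with one word or with a list of words. Matching is on whole
    whitespace-separated tokens, ignoring case.
    """
    targets = {w.lower() for w in words}
    hits = []
    for who, text in utterances:
        if speaker is not None and who != speaker:
            continue
        if exclude_speaker is not None and who == exclude_speaker:
            continue
        if targets & {tok.lower() for tok in text.split()}:
            hits.append((who, text))
    return hits

# Rough analogue of the exercise: the mother's utterances containing 'want'.
mother_want = kwal_like(UTTERANCES, speaker="MOT", words=["want"])

# Rough analogue of a word-list search, with an assumed list of auxiliaries.
auxiliaries = ["can", "will", "do"]
child_aux = kwal_like(UTTERANCES, speaker="CHI", words=auxiliaries)

print(mother_want)
print(child_aux)
```

As in the command line, the order of the keyword arguments does not matter; only the filter function itself has to come first.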
• Output screen is limited
• Store the data in a separate file
  – Parameter: +f
  – File name has three letters
  – For example: aux
• command speaker word parameter-store-filename file
      kwal +t*CHI +s"want" +faux sarah023.cha

• Retype the command:
      kwal +t*CHI +s"want" sarah023.cha
• Notice: some material is in between brackets
      *CHI: wan(t) do (a)gain
• What does this mean?
  – Child actually said ‘wan’ instead of ‘want’.
• By default, CLAN includes the material in between brackets.
  – CLAN will look for ‘want’

• What does this mean?
  – A search for ‘want’ will give you both ‘wan(t)’ and ‘want’.
• Control whether the search includes material in between brackets.
• +r parameter
      +r1 = default, include material in brackets
      +r2 = exclude material in brackets
      +r5 = exclude rephrased material

• Try out:
      kwal +t*CHI +s"want" +r2 sarah023.cha
• +r5 allows for exclusion of rephrased material
• What is rephrased material?
      *CHI: I wanna [: want to] eat cereal
• In the default setting, CLAN will look for rephrased material
• The +r5 option allows you to look for ‘wanna’.

• Searching with both +s and -s
• CLAN only allows you to specify either +s or -s
• Imagine you want to look for all the conjugations of one verb, but are not interested in other, similar-looking words
• For example: all the verbal forms of ‘go’
• First of all: wild card
  – The wild card * allows you to look for anything

• Adding the * to the word search
      +s"go*"
• Words that this search will find are: go, gone, goes, going
• But also words such as: got, good, goat, god, etc.
• Ideally, you want to specify both +s and -s
• Piping option

• Piping: the second command operates on the output of the first command
• First command: look for ‘go*’
• Second command: exclude ‘good’, ‘got’, etc.
• In order for the second command to be able to operate on the output of the first, the first command must give its output in CHAT format
• +d option

• First command:
  – Look for ‘go*’
  – For the speaker *CHI
  – Output must be in CHAT format
  – In file sarah040.cha
      kwal +t*CHI +s"go*" +d sarah040.cha
• Second command: exclude ‘got’
      kwal -s"got"

• Piping the first and the second command
  first command piping-operation second command
      kwal +t*CHI +s"go*" +d sarah040.cha | kwal -s"got"

• Looking for more than one word at a time
• Searching with combo
  – Speaker(s)
  – Words
  – File(s)
• Boolean operators:
      ^ = immediately followed by
      * = any character
      + = or
      ! = not

• Setting the speaker parameter
      combo +t*CHI
• Setting the word parameter
  – Let’s look for the combination of ‘want’ and ‘to’
  – ‘want’ immediately followed by ‘to’
      combo +t*CHI +s"want^to"

• Specifying the file
  – Let’s look in file sarah034.cha
      combo +t*CHI +s"want^to" sarah034.cha
• Combo looks for the words in sequence by default
• The +x parameter allows you to look for two or more words in any order

• Searching for ‘want’ directly followed by ‘to’ without +x only gives ‘want to’
      combo +t*CHI +s"want^to" sarah034.cha
• Searching for ‘want’ directly followed by ‘to’ with +x gives both ‘want to’ and ‘to want’
      combo +t*CHI +s"want^to" +x sarah034.cha

Pitfalls and limitations
• Cannot test for acceptability or ungrammaticality
• Be aware of:
  – Routines
  – Imitations
  – Speech errors
  – Mistranscriptions

Protocol
• CHILDES transcripts were collected with great effort and are now freely available. In return for using them, you reward the creators with citations.
• Cite the latest edition of MacWhinney’s book:
  MacWhinney, B. (2000). The CHILDES project: Tools for analyzing talk. Third Edition. Mahwah, NJ: Lawrence Erlbaum Associates.
• Cite the publication selected by the creator(s) of the database(s) you have used.
  – References can be found in the ‘database manuals’ on the site
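
As a closing illustration of the search walkthrough, the wildcard search for ‘go*’, the piped exclusion of ‘got’, the treatment of bracketed material and the ‘want^to’ sequence search can be imitated in Python. This is a sketch under stated assumptions, not CLAN itself: the utterances are invented, the helpers `tokens`, `search`, `exclude` and `adjacent` are hypothetical, and the bracket handling follows the slides' description of +r1 and +r2 rather than CLAN's actual implementation.

```python
from fnmatch import fnmatchcase
import re

# Hypothetical child utterances; not taken from the Sarah files.
CHILD = [
    "I going home .",
    "he got it .",
    "good goat .",
    "wan(t) go (a)gain .",
    "I want to go .",
    "to want it .",
]

def tokens(text, keep_parens=True):
    """Tokenise; keep or drop the material written in parentheses."""
    if keep_parens:
        text = text.replace("(", "").replace(")", "")   # wan(t) -> want
    else:
        text = re.sub(r"\([^)]*\)", "", text)            # wan(t) -> wan
    return text.lower().split()

def search(utterances, pattern, keep_parens=True):
    """Keep utterances with a token matching `pattern` (* = any characters)."""
    return [u for u in utterances
            if any(fnmatchcase(t, pattern) for t in tokens(u, keep_parens))]

def exclude(utterances, word):
    """Drop utterances that contain `word` as a whole token."""
    return [u for u in utterances if word not in tokens(u)]

def adjacent(utterances, first, second):
    """Keep utterances where `first` is immediately followed by `second`."""
    out = []
    for u in utterances:
        toks = tokens(u)
        if any(a == first and b == second for a, b in zip(toks, toks[1:])):
            out.append(u)
    return out

# Step 1: wildcard search, which also catches 'got', 'good' and 'goat'.
go_star = search(CHILD, "go*")

# Step 2: feed the result into exclusions, one unwanted word at a time.
piped = go_star
for unwanted in ("got", "good"):
    piped = exclude(piped, unwanted)

# Sequence search: 'want' immediately followed by 'to'.
want_to = adjacent(CHILD, "want", "to")

print(go_star)
print(piped)
print(want_to)
print(search(CHILD, "want", keep_parens=False))
```

Each exclusion works on the output of the previous step, which is the same idea as piping one kwal command into another.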