Workable Models of Standard Performance in English and Spanish

To justify a claim that a particular spoken performance has the characteristics of a certain language as it is commonly spoken, one needs a model of that language trained on an appropriate sample of speakers. This paper describes workable methods for collecting and modeling such data and presents results of this approach for American English and Western Hemisphere Spanish. The methods have been used extensively in developing automatically scored spoken language tests that are currently in operation. The model itself is implemented within HTK (the HMM Tool Kit of Young & Woodland), using data-driven rule-based language models.

As a worked example, suppose one wishes to estimate the ‘correctness’ or ‘acceptability’ of a spoken response to a test prompt. The following steps define a workable path to support validation that the test measures a construct of colloquial, educated, spoken American English:

1. An item (e.g. a sentence to be repeated or a question to be answered) is drafted using only the 7,000 most frequent lemmas in the Switchboard corpus.
2. The written form is reviewed by two linguists from each of the US, the UK, and Australia.
3. Prompts judged colloquial (as written) in all three communities are recorded by a diverse sample of talkers in recitation mode.
4. The recordings are presented to several hundred speakers recruited through a marketing firm that can deliver demographically targeted samples. In this case, the sample consists of college graduates, balanced for sex, representative in ethnicity (but with African Americans oversampled), and geographically distributed in accordance with US population density.
5. Any item that is not repeated or answered correctly by more than 90% of the sample respondents is excluded from further use.
6. The reference sample responses are transcribed and vetted for quality by judges working independently.
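The vocabulary constraint on drafted items can be checked mechanically. The sketch below is a minimal illustration, assuming a lemma list already sorted by descending Switchboard frequency and some lemmatizer; the names `top_n_lemmas`, `out_of_vocabulary`, and the toy lookup table are hypothetical stand-ins, not part of the system described here.

```python
from typing import Callable, Iterable


def top_n_lemmas(ranked_lemmas: Iterable[str], n: int = 7000) -> set[str]:
    """Collect the first n distinct lemmas from a list sorted by descending frequency."""
    vocab: set[str] = set()
    for lemma in ranked_lemmas:
        if len(vocab) >= n:
            break
        vocab.add(lemma.lower())
    return vocab


def out_of_vocabulary(prompt: str, vocab: set[str],
                      lemmatize: Callable[[str], str]) -> list[str]:
    """Return the words of a draft prompt whose lemma is outside the allowed vocabulary."""
    words = [w.strip(".,?!;:\"'").lower() for w in prompt.split()]
    return [w for w in words if w and lemmatize(w) not in vocab]


# Toy stand-ins (hypothetical): a real check would use Switchboard frequency
# counts and a proper lemmatizer instead of this lookup table.
ranked = ["the", "be", "i", "you", "to", "go", "do", "want", "store"]
toy_lemmas = {"went": "go", "wants": "want", "is": "be"}
vocab = top_n_lemmas(ranked, n=7000)


def lemmatize(word: str) -> str:
    return toy_lemmas.get(word, word)


print(out_of_vocabulary("Do you want to go to the store?", vocab, lemmatize))  # []
print(out_of_vocabulary("I went to the emporium.", vocab, lemmatize))  # ['emporium']
```

A draft item with a non-empty result would be revised before going to the linguist reviewers.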
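The 90% retention rule described above is a threshold filter over per-item results from the reference sample. A minimal sketch follows; the `ItemResult` record, the `retained_items` function, and the item IDs are hypothetical, and the strict inequality reflects the requirement that more than 90% of respondents succeed.

```python
from dataclasses import dataclass


@dataclass
class ItemResult:
    item_id: str
    n_correct: int    # reference respondents who repeated/answered the item correctly
    n_responses: int  # all reference responses collected for the item


def retained_items(results: list[ItemResult], threshold: float = 0.90) -> list[str]:
    """Keep only items answered correctly by more than `threshold` of the reference sample."""
    return [r.item_id for r in results
            if r.n_responses > 0 and r.n_correct / r.n_responses > threshold]


results = [
    ItemResult("rep-001", 285, 300),  # 95% correct: kept
    ItemResult("rep-002", 270, 300),  # exactly 90%: excluded, since more than 90% is required
    ItemResult("qa-003", 240, 300),   # 80% correct: excluded
]
print(retained_items(results))  # ['rep-001']
```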
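The text does not say how the independent judges' transcriptions are reconciled. One common approach, assumed here rather than taken from the paper, is to compare two transcripts of the same response with a word-level edit distance and send any disagreement to adjudication. The function names and the zero-tolerance default are hypothetical.

```python
def word_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over word tokens (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete wa
                            curr[j - 1] + 1,             # insert wb
                            prev[j - 1] + (wa != wb)))   # substitute, or match at no cost
        prev = curr
    return prev[-1]


def needs_adjudication(t1: str, t2: str, max_disagreement: float = 0.0) -> bool:
    """Flag a response whose two independent transcripts differ by more than the tolerance."""
    w1, w2 = t1.lower().split(), t2.lower().split()
    rate = word_edit_distance(w1, w2) / max(len(w1), len(w2), 1)
    return rate > max_disagreement


print(needs_adjudication("he went to the store", "He went to the store"))  # False
print(needs_adjudication("he went to the store", "he want to the store"))  # True
```

Raising `max_disagreement` would tolerate small differences; a zero default treats any mismatch as a quality problem.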
The models derived from the reference sample of educated native responses are used to estimate the appropriateness and quality of responses from the non-native candidates who take the tests. Because HMM scoring is inherently disjunctive, it accepts candidate responses that share acoustic, lexical, and syntactic characteristics with a Boston male, a Chicago female, or an African American speaker on an approximately equal footing.

This model building produces several kinds of data that may be of substantive or methodological interest. Examples of response distributions and score distributions will be presented and explained, including analyses of as many as 20,000 spoken responses to a single prompt. Methods for verifying the adequacy of the models for use on new geographic dialects will also be illustrated, for example, measuring Iberian Spanish with a model trained only on Western Hemisphere Spanish (see the figure). The paper focuses on clearly describing the methods used and presenting the kinds of results these methods make possible.

Figure: Cumulative distributions of spoken Spanish performance for models trained only on Western Hemisphere Spanish. Note that the speakers from Spain are among the highest-scoring groups. X axis: model fit metric. Y axis: cumulative density.
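The disjunctive property of the scoring can be illustrated without an HMM. In Viterbi decoding, a response is scored along the single best-matching path through the alternatives in the model, so matching any one attested variant is enough. The sketch below is only a lexical analogy, not HTK scoring: it scores a candidate by its best `difflib` similarity to any reference transcript (disjunctive) and contrasts this with averaging over all references, which would favor whichever variant is most frequent in the sample. The reference sentences and function names are invented for illustration.

```python
from difflib import SequenceMatcher
from statistics import mean


def similarity(candidate: str, reference: str) -> float:
    """Word-level similarity in [0, 1] from difflib's matching-blocks ratio."""
    return SequenceMatcher(None, candidate.lower().split(), reference.lower().split()).ratio()


def disjunctive_score(candidate: str, references: list[str]) -> float:
    """Score against the closest reference: matching any attested variant is enough."""
    return max(similarity(candidate, ref) for ref in references)


def averaged_score(candidate: str, references: list[str]) -> float:
    """Contrast case: averaging over references rewards the most frequent variant."""
    return mean(similarity(candidate, ref) for ref in references)


# Invented reference set in which one variant is three times as common as the other.
refs = ["I am going to the store"] * 3 + ["I'm gonna go to the store"]

for cand in ["I am going to the store", "I'm gonna go to the store"]:
    print(cand, disjunctive_score(cand, refs), averaged_score(cand, refs))
# Both candidates get 1.0 under disjunctive scoring; averaging ranks the rarer variant lower.
```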
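Curves like those in the figure are empirical cumulative distribution functions of a fit score, one per speaker group. A minimal sketch, assuming each group's scores are available as a list of floats: `ecdf` builds the step function, and `dominates` checks whether one group's curve never lies above another's (first-order stochastic dominance), which is one possible way to make "among the highest-scoring groups" precise. The function names are hypothetical, and the group scores below are made up, not data from the paper.

```python
from bisect import bisect_right
from typing import Callable


def ecdf(scores: list[float]) -> Callable[[float], float]:
    """Empirical CDF: F(x) is the fraction of scores less than or equal to x."""
    ordered = sorted(scores)
    n = len(ordered)

    def F(x: float) -> float:
        return bisect_right(ordered, x) / n

    return F


def dominates(higher: list[float], lower: list[float]) -> bool:
    """True if `higher`'s ECDF never lies above `lower`'s (first-order stochastic dominance).

    Both ECDFs are step functions that change only at observed scores,
    so checking the pooled sample points is sufficient.
    """
    f_hi, f_lo = ecdf(higher), ecdf(lower)
    return all(f_hi(x) <= f_lo(x) for x in set(higher) | set(lower))


# Made-up fit scores for two hypothetical groups (not data from the paper).
group_a = [0.5, 0.7, 0.9]
group_b = [0.2, 0.4, 0.6]
print(ecdf(group_b)(0.5))            # 0.6666666666666666
print(dominates(group_a, group_b))   # True
print(dominates(group_b, group_a))   # False
```

A group whose curve sits to the right of the others, as described for the speakers from Spain, reaches any given cumulative level only at a higher fit score.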