Workable Models of Standard Performance in English and Spanish
To justify a claim that a particular spoken performance has characteristics of a certain
language as it is commonly spoken, one needs a model of that language that has been
trained on an appropriate sample of speakers. This paper describes some workable
methods for collecting and modeling such data and presents results of this approach for
American English and Western Hemisphere Spanish. The methods have been used
extensively in the development of automatically scored spoken language tests that are
currently in operation. The model itself is implemented within HTK (the HMM Tool Kit
of Young & Woodland), using data-driven, rule-based language models.
To work through an example: if one wishes to estimate the ‘correctness’ or
‘acceptability’ of a spoken response to a test prompt, the following steps define a
workable path to support validation that the test measures the construct of colloquial
educated spoken American English. An item (e.g. a sentence to be repeated or a
question to be answered) is drafted within the top 7000 lemmas of the Switchboard
corpus. The written form is then reviewed by two linguists from each of the US, UK, and
Australia. Prompts that are judged colloquial (as written) in all three communities are
recorded by a diverse sample of talkers in a recitation mode and then presented to several
hundred speakers contacted through a marketing firm that can deliver demographically
targeted samples. In this case, the sample consists of college graduates, balanced
for sex and representative in ethnicity (but with African Americans oversampled), and
geographically distributed in accordance with US population density. Any item that
is not repeated or answered correctly by over 90% of the sample respondents is excluded
from further use. The reference sample responses are transcribed and vetted for quality
by judges working independently. The models derived from the reference sample of
educated native responses are used to estimate the appropriateness and quality of the
responses from non-native candidates who take the tests. Because HMM scoring is
inherently disjunctive, it accepts candidate responses that
share acoustic, lexical, and syntactic features with those of a Boston male, a Chicago
female, or an African American speaker on an approximately equal footing.
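Two of the steps above can be sketched in a few lines of Python: the exclusion of
items that are not answered correctly by over 90% of the reference sample, and the
disjunctive character of the scoring. This is only an illustrative sketch, not the
HTK implementation. The function names, the data layout, and the reading of
"disjunctive" as taking the best-matching reference variant are all assumptions
made for the example.

```python
def screen_items(responses, threshold=0.90):
    """Keep items answered correctly by MORE than `threshold` of respondents.

    `responses` is a hypothetical layout: item id -> list of booleans,
    one per reference-sample respondent (True = repeated/answered correctly).
    """
    kept = []
    for item_id, outcomes in responses.items():
        if not outcomes:
            continue  # no reference data, so the item cannot be validated
        correct_rate = sum(outcomes) / len(outcomes)
        if correct_rate > threshold:
            kept.append(item_id)
    return kept


def best_variant_score(variant_log_likelihoods):
    """Return (variant, log-likelihood) for the best-matching variant.

    Assumption: a response is scored against several reference variants
    (for example, regional pronunciations) and the best match is used, so
    matching any one variant is enough for a good score.
    """
    if not variant_log_likelihoods:
        raise ValueError("no variant scores given")
    best = max(variant_log_likelihoods, key=variant_log_likelihoods.get)
    return best, variant_log_likelihoods[best]
```

For instance, an item answered correctly by exactly 90 of 100 respondents is
excluded by `screen_items`, because the rule requires more than 90%.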
This model building produces several kinds of data that may be of substantive or
methodological interest. Examples of response distributions and score distributions will
be presented and explained. The examples will include analyses of as many as 20,000
spoken responses to a single prompt. Also, ways of verifying the adequacy of the
models for use on new geographic dialects will be illustrated; one example is the
measurement of Iberian Spanish with a model trained only on Western Hemisphere
Spanish (see page 2). The paper focuses on clearly describing the methods used and
presenting the kinds of results possible from these methods.
Figure: Cumulative Distributions of Spoken Spanish Performance for models trained
only on Western Hemisphere Spanish. Notice that the speakers from Spain are
among the highest-scoring groups.
The x-axis is the model fit metric; the y-axis is cumulative density.
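Curves of this kind can be computed as empirical cumulative distributions, one per
speaker group. The sketch below shows one way to do that in Python. The group
names and scores are invented purely for illustration and are not data from the
paper, and the direction of the fit metric (higher meaning a better fit) is an
assumption.

```python
from bisect import bisect_right


def ecdf(scores):
    """Return a function giving the fraction of `scores` that are <= x."""
    if not scores:
        raise ValueError("need at least one score")
    ordered = sorted(scores)
    n = len(ordered)

    def cdf(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(ordered, x) / n

    return cdf


# Invented fit scores for two hypothetical speaker groups.
groups = {
    "group_a": [52.0, 61.5, 70.0, 74.5, 80.0],
    "group_b": [40.0, 48.5, 55.0, 63.0, 71.0],
}
curves = {name: ecdf(values) for name, values in groups.items()}
```

Under the stated assumption about the metric, a group whose curve lies further to
the right (a lower cumulative value at a given score) is the higher-scoring group.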