Non-native speech database

{{Short description|Speech Database}}

A non-native speech database is a speech database of non-native pronunciations, most commonly of English. Such databases are used in the development of multilingual automatic speech recognition systems, text-to-speech systems, pronunciation trainers, and second-language learning systems.<ref>M. Raab, R. Gruhn and E. Noeth, Non-Native speech databases, in Proc. ASRU, Kyoto, Japan, 2007.</ref>

List

__FORCETOC__

{| class="wikitable" style="margin:1em auto; text-align:right;"
|+ Table 1: Abbreviations for languages used in Table 2
|-
| Arabic || A || Japanese || J
|-
| Chinese || C || Korean || K
|-
| Czech || Cze || Malaysian || M
|-
| Danish || D || Norwegian || N
|-
| Dutch || Dut || Portuguese || P
|-
| English || E || Russian || R
|-
| French || F || Spanish || S
|-
| German || G || Swedish || Swe
|-
| Greek || Gre || Thai || T
|-
| Indonesian || Ind || Vietnamese || V
|-
| Italian || I || ||
|}
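When the list is processed programmatically, the abbreviations in Table 1 can be held in a simple lookup. The following is a minimal sketch in Python; the names `LANGUAGE_CODES` and `expand_languages` are hypothetical and are not part of any of the listed corpora or their tooling.

```python
from typing import List

# Abbreviations copied from Table 1 of this article.
LANGUAGE_CODES = {
    "A": "Arabic", "C": "Chinese", "Cze": "Czech", "D": "Danish",
    "Dut": "Dutch", "E": "English", "F": "French", "G": "German",
    "Gre": "Greek", "Ind": "Indonesian", "I": "Italian", "J": "Japanese",
    "K": "Korean", "M": "Malaysian", "N": "Norwegian", "P": "Portuguese",
    "R": "Russian", "S": "Spanish", "Swe": "Swedish", "T": "Thai",
    "V": "Vietnamese",
}


def expand_languages(cell: str) -> List[str]:
    """Expand a space-separated cell of abbreviations (hypothetical helper).

    Tokens that are not listed in Table 1 are returned unchanged.
    """
    return [LANGUAGE_CODES.get(token, token) for token in cell.split()]


# Native languages of the ATR-Gruhn speakers as given in Table 2.
print(expand_languages("C G F J Ind"))
```

Cells that hold free text instead of abbreviations, such as "50 countries", pass through this helper token by token without being changed.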


Table 2 gives an overview of the individual databases.

{| class="wikitable sortable" style="margin:1em auto;"
|+ Table 2: Overview of non-native databases
|-
! Corpus !! Author !! Available at !! Languages !! #Speakers !! Native Language !! #Utt. !! Duration !! Date !! Remarks
|-
| AMI<ref>AMI Project, "AMI Meeting Corpus", [http://corpus.amiproject.org/].</ref> || || EU || E || || Dut and other || || 100h || || meeting recordings
|-
| ATR-Gruhn<ref>R. Gruhn, T. Cincarek, and S. Nakamura, "A multi-accent non-native English database", in ASJ, 2004.</ref> || Gruhn || ATR || E || 96 || C G F J Ind || 15000 || || 2004 || proficiency rating
|-
| BAS Strange Corpus 1+10<ref>University Munich, "Bavarian archive for speech signals strange corpus", [http://www.phonetik.uni-muenchen.de/Bas/].</ref> || || ELRA || G || 139 || 50 countries || 7500 || || 1998 ||
|-
| Berkeley Restaurant<ref>Jurafsky et al., "The Berkeley Restaurant Project", Proc. ICSLP 1994.</ref> || || ICSI || E || 55 || G I H C F S J || 2500 || || 1994 ||
|-
| Broadcast News<ref>L. Tomokiyo, Recognizing Non-native Speech: Characterizing and Adapting to Non-native Usage in Speech Recognition, Ph.D. thesis, Carnegie Mellon University, Pennsylvania, 2001.</ref> || || LDC || E || || || || || 1997 ||
|-
| Cambridge-Witt<ref>S. Witt, Use of Speech Recognition in Computer-Assisted Language Learning, Ph.D. thesis, Cambridge University Engineering Department, UK, 1999.</ref> || Witt || U. Cambridge || E || 10 || J I K S || 1200 || || 1999 ||
|-
| Cambridge-Ye<ref>H. Ye and S. Young, Improving the speech recognition performance of beginners in spoken conversational interaction for language learning, in Proc. Interspeech, Lisbon, Portugal, 2005.</ref> || Ye || U. Cambridge || E || 20 || C || 1600 || || 2005 ||
|-
| Children News<ref>L. Tomokiyo, Recognizing Non-native Speech: Characterizing and Adapting to Non-native Usage in Speech Recognition, Ph.D. thesis, Carnegie Mellon University, Pennsylvania, 2001.</ref> || Tomokiyo || CMU || E || 62 || J C || 7500 || || 2000 || partly spontaneous
|-
| CLIPS-IMAG<ref>T. P. Tan and L. Besacier, A French non-native corpus for automatic speech recognition, in LREC, Genoa, Italy, 2006.</ref> || Tan || CLIPS-IMAG || F || 15 || C V || || 6h || 2006 ||
|-
| CSLU<ref>T. Lander, CSLU: Foreign accented English release 1.2, Tech. Rep., LDC, Philadelphia, Pennsylvania, 2007.</ref> || || LDC || E || || 22 countries || 5000 || || 2007 || telephone, spontaneous
|-
| CMU<ref>Z. Wang, T. Schultz, and A. Waibel, Comparison of acoustic model adaptation techniques on non-native speech, in Proc. ICASSP, 2003.</ref> || || CMU || E || 64 || G || 452 || 0.9h || || not available
|-
| Cross Towns<ref>S. Schaden, Regelbasierte Modellierung fremdsprachlich akzentbehafteter Aussprachevarianten, Ph.D. thesis, University Duisburg-Essen, 2006.</ref> || Schaden || U. Bochum || E F G I Cze Dut || 161 || E F G I S || 72000 || 133h || 2006 || city names
|-
| Duke-Arslan<ref>L. M. Arslan and J. H. Hansen, Frequency characteristics of foreign accented speech, in Proc. of ICASSP, Munich, Germany, 1997, pp. 1123-1126.</ref> || Arslan || Duke University || E || 93 || 15 countries || 2200 || || 1995 || partly telephone speech
|-
| ERJ<ref>N. Minematsu et al., Development of English speech database read by Japanese to support CALL research, in ICA, Kyoto, Japan, 2004, pp. 557-560.</ref> || Minematsu || U. Tokyo || E || 200 || J || 68000 || || 2002 || proficiency rating
|-
| Fisher<ref>Christopher Cieri, David Miller, Kevin Walker, The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text, Proc. LREC 2004.</ref> || || LDC || E || || many || || 200h || || telephone speech
|-
| Fitt<ref>S. Fitt, The pronunciation of unfamiliar native and non-native town names, in Proc. of Eurospeech, 1995, pp. 2227-2230.</ref> || Fitt || U. Edinburgh || F I N Gre || 10 || E || 700 || || 1995 || city names
|-
| Fraenki<ref>G. Stemmer, E. Noeth, and H. Niemann, Acoustic modeling of foreign words in a German speech recognition system, in Proc. Eurospeech, P. Dalsgaard, B. Lindberg, and H. Benner, Eds., 2001, vol. 4, pp. 2745-2748.</ref> || || U. Erlangen || E || 19 || G || 2148 || || ||
|-
| Hispanic<ref>W. Byrne, E. Knodt, S. Khudanpur, and J. Bernstein, Is automatic speech recognition ready for non-native speech? A data-collection effort and initial experiments in modeling conversational Hispanic English, in STiLL, Marholmen, Sweden, 1998, pp. 37-40.</ref> || Byrne || || E || 22 || S || || 20h || 1998 || partly spontaneous
|-
| HLTC<ref>Y. Li, P. Fung, P. Xu, and Y. Liu, Asymmetric acoustic modeling for mixed language speech recognition, in ICASSP, Prague, Czech Republic, 2011, pp. 37-40.</ref> || || HKUST || E || 44 || C || || 3h || 2010 || available on request
|-
| IBM-Fischer<ref>V. Fischer, E. Janke, and S. Kunzmann, Recent progress in the decoding of non-native speech with multilingual acoustic models, in Proc. of Eurospeech, 2003, pp. 3105-3108.</ref> || || IBM || E || 40 || S F G I || 2000 || || 2002 || digits
|-
| iCALL<ref>Nancy F. Chen, Rong Tong, Darren Wee, Peixuan Lee, Bin Ma, Haizhou Li, iCALL Corpus: Mandarin Chinese Spoken by Non-Native Speakers of European Descent, in Proc. of Interspeech, 2015.</ref><ref>Nancy F. Chen, Vivaek Shivakumar, Mahesh Harikumar, Bin Ma, Haizhou Li, Large-Scale Characterization of Mandarin Pronunciation Errors Made by native Speakers of European Languages, in Proc. of Interspeech, 2013.</ref> || Chen || I2R, A*STAR || C || 305 || 24 countries || 90841 || 142h || 2015 || phonetic and tonal transcriptions (in Pinyin), proficiency ratings
|-
| ISLE<ref>W. Menzel, E. Atwell, P. Bonaventura, D. Herron, P. Howarth, R. Morton, and C. Souter, The ISLE corpus of non-native spoken English, in LREC, Athens, Greece, 2000, pp. 957-963.</ref> || Atwell || EU/ELDA || E || 46 || G I || 4000 || 18h || 2000 ||
|-
| Jupiter<ref>K. Livescu, Analysis and modeling of non-native speech for automatic speech recognition, M.S. thesis, Massachusetts Institute of Technology, Cambridge, MA, 1999.</ref> || Zue || MIT || E || unknown || unknown || 5146 || || 1999 || telephone speech
|-
| K-SEC<ref>S-C. Rhee, S-H. Lee, S-K. Kang, and Y-J. Lee, Design and Construction of Korean-Spoken English Corpus (K-SEC), Proc. ICSLP 2004.</ref> || Rhee || SiTEC || E || unknown || K || || || 2004 ||
|-
| LDC WSJ1<ref>L. Tomokiyo, Recognizing Non-native Speech: Characterizing and Adapting to Non-native Usage in Speech Recognition, Ph.D. thesis, Carnegie Mellon University, Pennsylvania, 2001.</ref> || || LDC || || 10 || || 800 || 1h || 1994 ||
|-
| LeaP<ref>U. Gut, Non-native Speech. A Corpus-based Analysis of Phonological and Phonetic Properties of L2 English and German, Frankfurt am Main: Peter Lang, 2009.</ref> || Gut || University of Münster || E G || 127 || 41 different ones || 73,941 words || 12h || 2003 ||
|-
| MIST<ref>TNO Human Factors Research Institute, MIST multi-lingual interoperability in speech technology database, Tech. Rep., ELRA, Paris, France, 2007, ELRA Catalog Reference S0238.</ref> || || ELRA || E F G || 75 || Dut || 2200 || || 1996 ||
|-
| NATO HIWIRE<ref>J.C. Segura et al., The HIWIRE database, a noisy and non-native English speech corpus for cockpit communication, 2007, [http://www.hiwire.org/].</ref> || || NATO || E || 81 || F Gre I S || 8100 || || 2007 || clean speech
|-
| NATO M-ATC<ref>S. Pigeon, W. Shen, and D. van Leeuwen, Design and characterization of the non-native military air traffic communications database, in ICSLP, Antwerp, Belgium, 2007.</ref> || Pigeon || NATO || E || 622 || F G I S || 9833 || 17h || 2007 || heavy background noise
|-
| NATO N4<ref>L. Benarousse et al., The NATO native and non-native (N4) speech corpus, in Proc. of the MIST workshop (ESCA-NATO), Leusden, Sep 1999.</ref> || || NATO || E || 115 || unknown || || 7.5h || 2006 || heavy background noise
|-
| Onomastica<ref>Onomastica Consortium, The ONOMASTICA interlanguage pronunciation lexicon, in Proc. Eurospeech, Madrid, Spain, 1995, pp. 829-832.</ref> || || || align="CENTER" | D Dut E F G Gre I N P S Swe || || || (121000) || || 1995 || only lexicon
|-
| PF-STAR<ref>C. Hacker, T. Cincarek, A. Maier, A. Hessler, and E. Noeth, Boosting of prosodic and pronunciation features to detect mispronunciations of non-native children, in Proc. of ICASSP, Honolulu, Hawaii, 2007, pp. 197-200.</ref> || || U. Erlangen || E || 57 || G || 4627 || 3.4h || 2005 || children's speech
|-
| Sunstar<ref>C. Teixeira, I. Trancoso, and A. Serralheiro, Recognition of non-native accents, in Proc. Eurospeech, Rhodes, Greece, 1997, pp. 2375-2378.</ref> || || EU || E || 100 || G S I P D || 40000 || || 1992 || parliament speech
|-
| TC-STAR<ref>H. Heuvel, K. Choukri, C. Gollan, A. Moreno, and D. Mostefa, TC-STAR: New language resources for ASR and SLT purposes, in LREC, Genoa, 2006, pp. 2570-2573.</ref> || Heuvel || ELDA || E S || unknown || EU countries || || 13h || 2006 || multiple data sets
|-
| TED<ref>L.F. Lamel, F. Schiel, A. Fourcin, J. Mariani, and H. Tillmann, The translanguage English database TED, in ICSLP, Yokohama, Japan, Sep 1994.</ref> || Lamel || ELDA || E || 40(188) || many || || 10h(47h) || 1994 || Eurospeech 93
|-
| TLTS<ref>N. Mote, L. Johnson, A. Sethy, J. Silva, and S. Narayanan, Tactical language detection and modeling of learner speech errors: The case of Arabic tactical language training for American English speakers, in Proc. of InSTIL, June 2004.</ref> || || DARPA || A || || E || || 1h || 2004 ||
|-
| Tokyo-Kikuko<ref>K. Nishina, Development of Japanese speech database read by non-native speakers for constructing CALL system, in ICA, Kyoto, Japan, 2004, pp. 561-564.</ref> || || U. Tokyo || J || 140 || 10 countries || 35000 || || 2004 || proficiency rating
|-
| Verbmobil<ref>University Munich, The Verbmobil project, [http://www.phonetik.uni-muenchen.de/Forschung/Verbmobil/VerbOverview.html].</ref> || || U. Munich || E || 44 || G || || 1.5h || 1994 || very spontaneous
|-
| VODIS<ref>I. Trancoso, C. Viana, I. Mascarenhas, and C. Teixeira, On deriving rules for nativised pronunciation in navigation queries, in Proc. Eurospeech, 1999.</ref> || || EU || F G || 178 || F G || 2500 || || 1998 || about car navigation
|-
| WP Arabic<ref>A. LaRocca and R. Chouairi, West Point Arabic speech corpus, Tech. Rep., LDC, Philadelphia, Pennsylvania, 2002.</ref> || LaRocca || LDC || A || 35 || E || 800 || 1h || 2002 ||
|-
| WP Russian<ref>A. LaRocca and C. Tomei, West Point Russian speech corpus, Tech. Rep., LDC, Philadelphia, Pennsylvania, 2003.</ref> || LaRocca || LDC || R || 26 || E || 2500 || 2h || 2003 ||
|-
| WP Spanish<ref>J. Morgan, West Point Heroico Spanish speech, Tech. Rep., LDC, Philadelphia, Pennsylvania, 2006.</ref> || Morgan || LDC || S || || E || || || 2006 ||
|-
| WSJ Spoke<ref>I. Amdal, F. Korkmazskiy, and A. C. Surendran, Joint pronunciation modelling of non-native speakers using data-driven methods, in ICSLP, Beijing, China, 2000, pp. 622-625.</ref> || || || E || 10 || unknown || 800 || || 1993 ||
|}

=Legend=

Table 2 uses abbreviations for language names; these are listed in Table 1. For each corpus, Table 2 gives the name of the corpus, the author, the institution where the corpus can be obtained (or where at least further information should be available), the language actually spoken by the speakers, the number of speakers, the native language of the speakers, the total number of non-native utterances the corpus contains, the duration in hours of the non-native part, the date of the first public reference to the corpus, free text highlighting special aspects of the database, and a reference to a publication. In most cases the reference is to the paper in which the original collectors describe the corpus. Where no such paper could be identified, a paper that uses the corpus is referenced instead.

Some entries are left blank and others are marked unknown. A blank entry means that the value could not be determined for this list. An unknown entry indicates that the database itself provides no information about the attribute. For example, the Jupiter weather database<ref>K. Livescu, Analysis and modeling of non-native speech for automatic speech recognition, M.S. thesis, Massachusetts Institute of Technology, Cambridge, MA, 1999.</ref> gives no information about the origin of its speakers, which makes the data less useful for verifying accent detection or similar tasks.
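The distinction between blank and unknown entries matters when the table is read by a program, because the two cases should not be merged into a single missing value. The following minimal Python sketch shows one way to keep them apart; the function name `classify_entry` and the three labels it returns are assumptions made for illustration only.

```python
UNKNOWN_MARKER = "unknown"  # literal text used in Table 2


def classify_entry(cell: str) -> str:
    """Classify one table cell (hypothetical helper).

    "blank"   : the value could not be determined for this list
    "unknown" : the database itself gives no information
    "value"   : an actual value is present
    """
    text = cell.strip()  # also removes non-breaking spaces
    if not text:
        return "blank"
    if text.lower() == UNKNOWN_MARKER:
        return "unknown"
    return "value"


# Cells of the Jupiter row: #Speakers, Native Language, #Utt., Duration.
jupiter = {
    "speakers": "unknown",
    "native_language": "unknown",
    "utterances": "5146",
    "duration": "",
}
for column, cell in jupiter.items():
    print(column, classify_entry(cell))
```

For the Jupiter row this marks the speaker attributes as unknown, the utterance count as a value, and the duration as blank.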

Where possible, the name given is the standard name of the corpus. Some of the smaller corpora had no established name, so an identifier had to be created; in such cases, a combination of the institution and the collector of the database is used.

Where a database contains both native and non-native speech, only the attributes of the non-native part are listed. Most of the corpora are collections of read speech. If a corpus consists partly or completely of spontaneous utterances, this is noted in the Remarks column.

References