Doctoral thesis in Physics
Supervised by Jean-Luc Gauvain and Martine Adda-Dekker.
Defended in 2007
at Paris 11, in partnership with Université de Paris-Sud. Faculté des Sciences d'Orsay (Essonne) (other partner).
An automatic language identification (LID) system aims to identify the language being spoken from a short speech sample produced by an unknown speaker. The LID problem can be viewed as a stochastic process of language generation. We propose an adaptation to the LID problem of the source-channel model commonly used for automatic speech transcription. An important challenge for LID researchers consists in developing approaches and methods that make limited use of explicit knowledge about the languages to be processed. To guarantee easy extension to additional languages, multilingual "phonemic" symbol sets were designed, and multilingual or language-independent acoustic models were estimated and tested using a restricted number of languages. The challenge is to have them cover the acoustic space corresponding to the capacities of the human vocal apparatus. The definition and use of multilingual phonemic inventories for acoustic modeling are major issues of our research. Within the framework of phonotactic approaches to LID, the decision about the identity of the spoken language depends on an automatically decoded multilingual phoneme stream. To optimize the quality of this phoneme stream, two methods are explored: 1) increasing acoustic model accuracy by taking multilingual triphone contexts into account; 2) widening the scope of the units by considering a multilingual syllabic unit. Syllables are longer units and hence less subject to coarticulation effects than phonemes at the acoustic level. We validated our work through experiments in language identification and automatic transcription, and through a detailed acoustic analysis of the vowels of eight languages (French, American English, German, Italian, Spanish, Portuguese, Arabic and Mandarin Chinese).
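The phonotactic decision step described in the abstract can be illustrated with a minimal sketch. It assumes that each language is represented by a bigram model over a shared multilingual phone inventory with add-one smoothing, and that the decoded phone stream is already available. The function names, the toy phone symbols and the two placeholder languages are hypothetical; this is not the system built in the thesis.

```python
import math
from collections import Counter


def train_bigram_model(phone_sequences, vocab, alpha=1.0):
    """Return a function giving add-alpha smoothed bigram log-probabilities."""
    bigrams = Counter()
    contexts = Counter()
    for seq in phone_sequences:
        padded = ["<s>"] + list(seq)  # "<s>" marks the start of an utterance
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    v = len(vocab)

    def logprob(prev, cur):
        # Smoothing keeps unseen phone pairs from receiving zero probability.
        return math.log((bigrams[(prev, cur)] + alpha) / (contexts[prev] + alpha * v))

    return logprob


def score(logprob, seq):
    """Log-likelihood of a decoded phone sequence under one language model."""
    padded = ["<s>"] + list(seq)
    return sum(logprob(prev, cur) for prev, cur in zip(padded, padded[1:]))


def identify(models, seq):
    """Pick the language whose phonotactic model best explains the sequence."""
    return max(models, key=lambda lang: score(models[lang], seq))


# Toy data, invented for illustration only.
vocab = {"a", "t", "k", "s", "o"}
training = {
    "lang_A": [["t", "a", "t", "a"], ["k", "a", "t", "a"]],
    "lang_B": [["s", "t", "o", "s"], ["o", "s", "t", "s"]],
}
models = {lang: train_bigram_model(seqs, vocab) for lang, seqs in training.items()}

decoded = ["t", "a", "k", "a"]  # stands in for the decoder's phone stream
best = identify(models, decoded)
print(best)
```

Because every language is scored on the same decoded stream, errors in that stream affect all hypotheses at once, which is one way to read the abstract's emphasis on improving the quality of the phoneme stream.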
Multilingual acoustic modeling for automatic language identification and speech recognition
No abstract available.