Incorporating prosody into neural speech processing pipelines

Öktem, Hamdi Alp

Supervised by:

Antonio Bonafonte Cávez (supervisor)
Mireia Farrús Cabeceran (co-supervisor)

University of defense: Universitat Pompeu Fabra

Date of defense: 25 February 2019

Committee:

David Escudero Mancebo (chair)
Francesc Alías Pujol (secretary)
Jordi Adell Mercado (member)

Type: Doctoral thesis

Teseo: 582833

Abstract

In this dissertation, I study the inclusion of prosody in two applications that involve speech understanding: automatic speech transcription and spoken language translation. For the former, I propose a punctuation generation method that uses an attention mechanism over parallel sequences of prosodic and morphosyntactic features. Results indicate an overall F1 score of 70.3% for punctuation generation. For the latter, I address the enhancement of spoken language translation with prosody. A neural machine translation system trained on movie-domain data is adapted with pause features using a prosodically annotated bilingual dataset. Results show that prosodic punctuation generation as a preliminary step to translation increases translation accuracy by 1% in terms of BLEU score. Encoding pauses as an extra input feature gives an additional 1% increase on top of this. The system is further extended to jointly predict pause features, so that they can be used as input to a text-to-speech system.
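As an illustration of the F1 metric mentioned above, the following minimal sketch computes micro-averaged precision, recall, and F1 over punctuation slots. The record does not describe the dissertation's exact evaluation protocol, so the function name `punctuation_f1`, the label names, the "O" (no punctuation) convention, and the toy data are all assumptions made for illustration only.

```python
from collections import Counter


def punctuation_f1(reference, predicted, no_punct="O"):
    """Micro-averaged precision, recall and F1 over punctuation slots.

    Both arguments are equal-length lists holding one label per word,
    i.e. the punctuation mark that follows that word, or `no_punct`.
    The labelling scheme is hypothetical, not taken from the thesis.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    counts = Counter()
    for ref, hyp in zip(reference, predicted):
        if hyp != no_punct and hyp == ref:
            counts["tp"] += 1          # correct mark in the correct slot
        else:
            if hyp != no_punct:
                counts["fp"] += 1      # spurious or wrong mark predicted
            if ref != no_punct:
                counts["fn"] += 1      # reference mark missed or mislabelled

    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy example (made-up labels, not data from the dissertation).
reference = ["O", "COMMA", "O", "PERIOD", "O", "QUESTION"]
predicted = ["O", "COMMA", "COMMA", "O", "O", "QUESTION"]
precision, recall, f1 = punctuation_f1(reference, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this toy case two marks are predicted correctly, one is spurious, and one is missed, so precision, recall, and F1 all equal 2/3. A substituted mark (for example a comma where the reference has a period) counts as both a false positive and a false negative under this convention.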