Incorporating prosody into neural speech processing pipelines

Öktem, Hamdi Alp

Supervised by:

Antonio Bonafonte Cávez Supervisor
Mireia Farrús Cabeceran Co-supervisor

Defending university: Universitat Pompeu Fabra

Defense date: 25 February 2019

Committee:

David Escudero Mancebo Chair
Francesc Alías Pujol Secretary
Jordi Adell Mercado Member

Type: Thesis

Teseo: 582833

Abstract

In this dissertation, I study the incorporation of prosody into two applications that involve speech understanding: automatic speech transcription and spoken language translation. For the former, I propose a punctuation generation method that uses an attention mechanism over parallel sequences of prosodic and morphosyntactic features. The method achieves an overall F1 score of 70.3% on punctuation generation. For the latter, I enhance spoken language translation with prosody. A neural machine translation system trained on movie-domain data is adapted with pause features using a prosodically annotated bilingual dataset. Results show that prosodic punctuation generation as a preliminary step to translation increases translation accuracy by 1% in terms of BLEU score. Encoding pauses as an additional feature yields a further 1% increase. The system is further extended to jointly predict pause features, which can then serve as input to a text-to-speech system.
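As an illustration of how an overall F1 score for punctuation generation can be computed, the following is a minimal sketch in Python. It assumes one label per word slot and micro-averaging over all punctuation marks; the label set, the `NO_PUNCT` symbol and the function name `punctuation_f1` are hypothetical, and the dissertation's exact evaluation protocol may differ.

```python
from collections import Counter

# Hypothetical label meaning "no punctuation after this word".
NO_PUNCT = "O"


def punctuation_f1(reference, predicted, no_punct=NO_PUNCT):
    """Return micro-averaged (precision, recall, F1) over punctuation slots.

    Both arguments are equal-length sequences with one label per word,
    e.g. "O", ",", "." or "?". A predicted mark counts as correct only
    if it matches the reference mark in the same slot.
    """
    if len(reference) != len(predicted):
        raise ValueError("sequences must have the same length")

    counts = Counter()
    for ref, hyp in zip(reference, predicted):
        if hyp != no_punct and hyp == ref:
            counts["tp"] += 1          # correct mark in the correct slot
        else:
            if hyp != no_punct:
                counts["fp"] += 1      # spurious or wrong mark predicted
            if ref != no_punct:
                counts["fn"] += 1      # reference mark missed or replaced

    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy example, not data from the dissertation.
    ref = ["O", ",", "O", ".", "O", "?"]
    hyp = ["O", ",", ",", ".", "O", "O"]
    print(punctuation_f1(ref, hyp))
```

Under this convention, a slot where the wrong mark is predicted counts as both a false positive and a false negative, so substitutions are penalized in precision and in recall.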