Incorporating prosody into neural speech processing pipelines

  1. Öktem, Hamdi Alp
Supervised by:
  1. Antonio Bonafonte Cávez (Supervisor)
  2. Mireia Farrús Cabeceran (Co-supervisor)

Defense university: Universitat Pompeu Fabra

Defense date: 25 February 2019

Examination committee:
  1. David Escudero Mancebo (Chair)
  2. Francesc Alías Pujol (Secretary)
  3. Jordi Adell Mercado (Member)

Type: Thesis

Teseo: 582833

Abstract

In this dissertation, I study the inclusion of prosody in two applications that involve speech understanding: automatic speech transcription and spoken language translation. For the former, I propose a punctuation generation method that uses an attention mechanism over parallel sequences of prosodic and morphosyntactic features. The method achieves an overall F1 score of 70.3% on punctuation generation. For the latter, I enhance spoken language translation with prosody: a neural machine translation system trained on movie-domain data is adapted with pause features using a prosodically annotated bilingual dataset. Results show that prosodic punctuation generation as a preliminary step to translation increases translation accuracy by 1% in terms of BLEU score. Encoding pauses as an extra feature yields a further 1% increase. The system is further extended to jointly predict pause features so that they can serve as input to a text-to-speech system.
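To make the punctuation metric in the abstract concrete, the following is a minimal sketch of how an overall F1 score for punctuation generation could be computed. It assumes one label per word naming the mark that follows it, and micro-averaging over all marks. The label names, the `punctuation_f1` helper, and the toy sequences are hypothetical illustrations; this record does not describe the scoring protocol actually used in the thesis.

```python
from collections import Counter

# Hypothetical label: "O" means no punctuation mark follows the word.
NO_PUNCT = "O"


def punctuation_f1(reference, predicted, no_punct=NO_PUNCT):
    """Micro-averaged precision, recall and F1 over punctuation slots.

    Both arguments are equal-length lists holding one label per word,
    naming the punctuation mark (if any) that follows that word.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    counts = Counter()
    for ref, pred in zip(reference, predicted):
        if pred != no_punct and pred == ref:
            counts["tp"] += 1      # correct mark in the correct slot
        else:
            if pred != no_punct:
                counts["fp"] += 1  # spurious or wrong mark
            if ref != no_punct:
                counts["fn"] += 1  # missed or wrong mark

    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data invented purely for illustration (not taken from the thesis).
reference = ["O", "COMMA", "O", "O", "PERIOD", "O", "QUESTION"]
predicted = ["O", "COMMA", "O", "COMMA", "PERIOD", "O", "PERIOD"]
precision, recall, f1 = punctuation_f1(reference, predicted)
```

In this sketch a wrong mark in a slot counts as both a false positive and a false negative, which is one common convention. Per-mark or macro-averaged scoring would give different numbers.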