Abstract
This paper presents FAIRSEQ S2, a FAIRSEQ extension for speech synthesis. We implement a number of autoregressive (AR) and non-AR text-to-speech models, and their multi-speaker variants. To enable training speech synthesis models with less curated data, a number of preprocessing tools are built and their importance is shown empirically. To facilitate faster iteration of development and analysis, a suite of automatic metrics is included. Apart from the features added specifically for this extension, FAIRSEQ S2 also benefits from the scalability offered by FAIRSEQ and can be easily integrated with other state-of-the-art systems provided in this framework. The code, documentation, and pre-trained models will be made available at https://github.com/pytorch/fairseq/tree/master/examples/speech_synthesis.
Original language | English |
---|---|
Title of host publication | EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing |
Subtitle of host publication | System Demonstrations |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 143-152 |
Number of pages | 10 |
ISBN (Electronic) | 9781955917117 |
State | Published - 2021 |
Externally published | Yes |
Event | 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 - Virtual, Punta Cana, Dominican Republic Duration: 7 Nov 2021 → 11 Nov 2021 |
Publication series
Name | EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations |
---|
Conference
Conference | 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 |
---|---|
Country/Territory | Dominican Republic |
City | Virtual, Punta Cana |
Period | 7/11/21 → 11/11/21 |
Bibliographical note
Publisher Copyright:© 2021 Association for Computational Linguistics.