FAIRSEQ S2: A Scalable and Integrable Speech Synthesis Toolkit

Changhan Wang, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Ann Lee, Peng-Jen Chen, Jiatao Gu, Juan Pino

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

12 Scopus citations

Abstract

This paper presents FAIRSEQ S2, a FAIRSEQ extension for speech synthesis. We implement a number of autoregressive (AR) and non-AR text-to-speech models, and their multi-speaker variants. To enable training speech synthesis models with less curated data, a number of preprocessing tools are built and their importance is shown empirically. To facilitate faster iteration of development and analysis, a suite of automatic metrics is included. Apart from the features added specifically for this extension, FAIRSEQ S2 also benefits from the scalability offered by FAIRSEQ and can be easily integrated with other state-of-the-art systems provided in this framework. The code, documentation, and pre-trained models will be made available at https://github.com/pytorch/fairseq/tree/master/examples/speech_synthesis.

Original language: English
Title of host publication: EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing
Subtitle of host publication: System Demonstrations
Publisher: Association for Computational Linguistics (ACL)
Pages: 143-152
Number of pages: 10
ISBN (Electronic): 9781955917117
State: Published - 2021
Externally published: Yes
Event: 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 - Virtual, Punta Cana, Dominican Republic
Duration: 7 Nov 2021 to 11 Nov 2021

Publication series

Name: EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Conference

Conference: 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
Country/Territory: Dominican Republic
City: Virtual, Punta Cana
Period: 7/11/21 to 11/11/21

Bibliographical note

Publisher Copyright:
© 2021 Association for Computational Linguistics.
