This paper introduces a large-scale, phonetically balanced English speech corpus developed at ATR for corpus-based speech synthesis. The corpus contains 16 hours of American English speech read by a professional male narrator in a "reading style." The prompt sentences are drawn mainly from news articles, travel conversations, and novels. They were selected from large text collections using a greedy algorithm that maximizes the coverage of linguistic units such as diphones and triphones. Several measures were taken during recording to control undesirable short-term (daily) and long-term (monthly) variations in voice quality. Statistics are presented for the full corpus and for the subsets provided for the Blizzard Challenge 2006 and 2007.
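The greedy prompt selection mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact scoring function, so this example assumes the simplest variant: each candidate sentence is represented by a phone transcription, its units are the distinct phone n-grams it contains (n=2 for diphones, n=3 for triphones), and at each step the sentence adding the most not-yet-covered unit types is chosen. The names `phone_ngrams` and `greedy_select` and the input format are hypothetical, not taken from the paper.

```python
from typing import Sequence


def phone_ngrams(phones: Sequence[str], n: int) -> set[tuple[str, ...]]:
    """Return the distinct phone n-grams in a transcription (n=2: diphones, n=3: triphones)."""
    return {tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)}


def greedy_select(
    corpus: dict[str, Sequence[str]], max_sentences: int, n: int = 2
) -> tuple[list[str], set[tuple[str, ...]]]:
    """Greedily pick sentences that add the most uncovered unit types.

    Assumption: unweighted coverage of unit types; real systems may weight
    units by frequency or combine several unit types.
    """
    remaining = {sid: phone_ngrams(phones, n) for sid, phones in corpus.items()}
    covered: set[tuple[str, ...]] = set()
    selected: list[str] = []
    while remaining and len(selected) < max_sentences:
        # Sentence with the largest number of new unit types;
        # max() returns the first candidate on ties.
        best_id = max(remaining, key=lambda sid: len(remaining[sid] - covered))
        gain = remaining[best_id] - covered
        if not gain:
            break  # no remaining sentence adds new coverage
        selected.append(best_id)
        covered |= gain
        del remaining[best_id]
    return selected, covered


# Toy example with hypothetical phone transcriptions.
toy = {
    "s1": ["h", "e", "l", "o"],  # diphones: (h,e), (e,l), (l,o)
    "s2": ["e", "l", "o"],       # diphones already covered by s1
    "s3": ["o", "k", "e"],       # diphones: (o,k), (k,e)
}
selected, covered = greedy_select(toy, max_sentences=3)
print(selected, len(covered))
```

On the toy input, "s1" is picked first (three new diphones), then "s3" (two new ones); "s2" adds nothing and the loop stops. Choosing the locally best sentence at each step is the standard greedy approximation to set cover, which scales to large candidate pools where an exact optimum is impractical.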
Bibliographic reference. Ni, Jinfu / Hirai, Toshio / Kawai, Hisashi / Toda, Tomoki / Tokuda, Keiichi / Tsuzaki, Minoru / Sakai, Shinsuke / Maia, Ranniery / Nakamura, Satoshi (2007): "ATRECSS -- ATR English speech corpus for speech synthesis", In BLZ3-2007, paper 002.