This project is part of the work of Carnegie Mellon University's
speech group aimed at advancing the state of speech synthesis.
- Jan 24th 2007: Blizzard Challenge 2007
The call for participation in the third Blizzard Challenge has been posted.
Blizzard is a multi-site challenge in which each participant builds a
synthetic voice from a common database; the voices are then evaluated by
a large number of listeners.
- Jan 21st 2007: Festvox-2.1 Release
Festvox-2.1 has been released. New in this release are Clustergen
Statistical Parametric Synthesis support, an EHMM acoustic labeler (no
longer dependent on Sphinx; thanks to Kishore), Voice Conversion
support (thanks to Tomoki Toda), and Cygwin support. See ANNOUNCE-2.1 for more
details.
- Dec 29 2006: HMM-Based Speech Synthesis System HTS-2.0 has been released.
HTS version 2.0 now supports adaptation and adaptive training
based on MLLR. MAP-based adaptation is also supported. This
version does not include any text analyzer yet, but you can use
the Festival Speech Synthesis System as a text analyzer. This
distribution includes demo scripts using the CMU ARCTIC database
(English). Six HTS voices for Festival 1.95 are also released.
They are based on our small synthesis engine which has been
included as a module of Festival. Each of the HTS voices can be
used without any other HTS tools.
- Sep 06: The Blizzard Challenge 2006
consisted of 14 different systems. Results of the evaluation and the
papers describing the systems are available here.
- June 04: 5th ISCA Speech Synthesis Workshop, June 14-16th 2004,
Carnegie Mellon University: Proceedings
The Festvox project aims to make the building of new synthetic
voices more systematic and better documented, making it possible
for anyone to build a new voice. Specifically, we offer:
- Documentation, including scripts
explaining the background and specifics for building new voices for
speech synthesis in new and supported languages.
- Specific scripts to build new voices in supported languages,
such as US and UK English.
- Aids to building synthetic voices for
limited domains
- Example speech databases
to help build new voices.
- Links, demos and a repository for new voices
FestVox version 2.0: Jan 2003: new in this release
- Better clunits general voice support
- Support for CMU Sphinx and SphinxTrain to build acoustic
models for labeling
- DocBook version of the documentation, with more general
background documentation
- Initial support for Mac OS X
- configure support to match Edinburgh Speech Tools
The documentation, tools and dependent software are all free without
restriction (commercial or otherwise). Licensing of voices built
by these techniques is the responsibility of the builders.
This work is firmly grounded within Edinburgh University's
Festival Speech Synthesis System and
Carnegie Mellon University's
small footprint Flite synthesis engine.
This work has been supported by various groups, including
Carnegie Mellon University, the US National Science Foundation
(NSF), and the US Defense Advanced Research Projects Agency (DARPA).
Requirements for building a voice
Note that the techniques and processes described here do not guarantee
that you'll end up with a high-quality, acceptable voice, but with a
little care you can likely build a new synthesis voice in a supported
language in a few days, or in a new language in a few weeks (more or
less depending on the complexity of the language and the desired
quality).
You will need:
- To read the documentation
- A Unix machine (e.g. Linux, FreeBSD or Solaris) with working
audio I/O. This may work on other platforms, but many scripts depend,
perhaps unnecessarily, on Unix utilities such as awk and sed.
- Installed versions of Edinburgh University's
Festival Speech Synthesis System and
Edinburgh Speech Tools (distributed with Festival).
- A waveform viewing/labeling program such as
emulabel, distributed as part of Macquarie University's
EMU speech database
system. Although automatic labeling software is included in
festvox, a display tool is necessary for diagnosis and debugging.
- Patience and care, and a little interest in the subject of speech
technology.
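
As a rough illustration of the software requirements above, the following
Python sketch checks whether a few command-line tools can be found on the
PATH. It is not part of festvox. The tool list is an assumption: awk and sed
are named above, while "festival" and "ch_wave" are assumed to be the
executable names installed by Festival and the Edinburgh Speech Tools on
your system, and the helper name missing_tools is hypothetical.

```python
import shutil

# Assumed executable names; adjust these for your own installation.
REQUIRED_TOOLS = ["awk", "sed", "festival", "ch_wave"]


def missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that `which` cannot locate on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

A tool reported as missing may still be installed outside the PATH, so treat
the output as a hint rather than a definitive check. It also says nothing
about whether audio input and output work.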