Limited Domain Synthesis
In some synthesis tasks the range of spoken output required is limited, even though the number of possible utterances may still be infinite. The aim of limited domain synthesis is to make the most common phrases sound the best.

At one extreme, one can pre-record every utterance required, but that is too restrictive. Rather than going to the other extreme of recording only diphones (or a general unit selection database), some words and phrases can be recorded and used alongside a diphone (or unit selection) database. This gives good quality for common phrases without ever leaving less common forms uncovered.

Speaking Clock Example
The following is merely a tutorial example within Festival showing how to record a (very) limited domain synthesizer. Twenty-four utterances were recorded (on a home PC, hence the background noise) and a speaking clock was then automatically built from the recordings, using our standard software tools. Although the results are not perfect, the ease of building such a synthesizer is quite attractive.

Recordings of the following utterances were made:

  "The time is now, exactly five past one, in the morning."
  "The time is now, just after ten past two, in the morning."
  "The time is now, a little after quarter past three, in the morning."
  "The time is now, almost twenty past four, in the morning."
  "The time is now, exactly twenty-five past five, in the morning."
  "The time is now, just after half past six, in the morning."
  "The time is now, a little after twenty-five to seven, in the morning."
  "The time is now, almost twenty to eight, in the morning."
  "The time is now, exactly quarter to nine, in the morning."
  "The time is now, just after ten to ten, in the morning."
  "The time is now, a little after five to eleven, in the morning."
  "The time is now, almost twelve."
  "The time is now, just after five to one, in the afternoon."
  "The time is now, a little after ten to two, in the afternoon."
  "The time is now, exactly quarter to three, in the afternoon."
  "The time is now, almost twenty to four, in the afternoon."
  "The time is now, just after twenty-five to five, in the afternoon."
  "The time is now, a little after half past six, in the evening."
  "The time is now, exactly twenty-five past seven, in the evening."
  "The time is now, almost twenty past eight, in the evening."
  "The time is now, just after quarter past nine, in the evening."
  "The time is now, almost ten past ten, in the evening."
  "The time is now, exactly five past eleven, in the evening."
  "The time is now, a little after quarter to midnight."
    
These were autoaligned and a cluster unit selection voice was built from them (a fully automatic process). The resulting synthesizer can then speak any time, provided it is phrased in the format of the database.
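
To make "the format of the database" concrete, here is a minimal Python sketch of how a clock time might be turned into a sentence of the same shape as the recorded prompts. It is not the Festival implementation: the function name clock_phrase, the rounding rule (minutes are rounded to the nearest five, with 0, 1, 2 and 3-4 minutes of offset mapped to "exactly", "just after", "a little after" and "almost"), and the hour boundaries for morning, afternoon and evening are all assumptions inferred from the twenty-four prompts above.

```python
# Hypothetical sketch: map a 24-hour clock time to a sentence in the
# format of the recorded prompts. All names and rules here are
# assumptions inferred from the prompt list, not Festival's own code.

QUALIFIERS = {0: "exactly", 1: "just after", 2: "a little after",
              3: "almost", 4: "almost"}

MINUTE_PHRASES = {
    0: "", 5: "five past", 10: "ten past", 15: "quarter past",
    20: "twenty past", 25: "twenty-five past", 30: "half past",
    35: "twenty-five to", 40: "twenty to", 45: "quarter to",
    50: "ten to", 55: "five to",
}

HOUR_NAMES = ["midnight", "one", "two", "three", "four", "five", "six",
              "seven", "eight", "nine", "ten", "eleven", "twelve"]


def clock_phrase(hour, minute):
    """Return a sentence such as 'The time is now, almost twelve.'"""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")

    # Round to the nearest five minutes; the offset picks the qualifier.
    offset = minute % 5
    rounded = minute - offset if offset <= 2 else minute + (5 - offset)

    # "to" forms (and a round-up to the full hour) name the next hour.
    spoken_hour = hour
    if rounded == 60:
        rounded = 0
        spoken_hour += 1
    elif rounded > 30:
        spoken_hour += 1
    spoken_hour %= 24

    # The prompts give no part of day for "twelve" or "midnight".
    if spoken_hour == 0 or spoken_hour == 12:
        hour_name = HOUR_NAMES[spoken_hour]
        period = ""
    elif spoken_hour < 12:
        hour_name = HOUR_NAMES[spoken_hour]
        period = "in the morning"
    elif spoken_hour < 18:
        hour_name = HOUR_NAMES[spoken_hour - 12]
        period = "in the afternoon"
    else:
        hour_name = HOUR_NAMES[spoken_hour - 12]
        period = "in the evening"

    parts = [QUALIFIERS[offset], MINUTE_PHRASES[rounded], hour_name]
    sentence = "The time is now, " + " ".join(p for p in parts if p)
    if period:
        sentence += ", " + period
    return sentence + "."


if __name__ == "__main__":
    for h, m in [(10, 35), (19, 54), (16, 47)]:
        print(clock_phrase(h, m))
```

Under these assumptions every generated sentence is built only from words and phrase positions that occur in the recorded prompts, which is what lets a voice built from so few recordings cover every time of day.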

Full instructions and scripts for building a limited domain synthesizer such as a speaking clock will be released with the next version of the Building Voices in Festival document.

What time is it?

This speaking clock was fully automatically generated from the recordings of the utterances listed above.

Some pre-synthesized examples:
  10:35 am
  7:54 pm
  4:47 pm
And real-time synthesized examples:
  The time is now ... (in Pittsburgh, GMT-5:00)
  The time is now ... (in Katmandu, GMT+5:45)

This page is maintained by Alan W Black (awb@cs.cmu.edu)
festvox.org is hosted on a machine donated by VA Linux Systems