Wednesday, 24th December, 2014

Full distribution: festvox-2.7.0-release.tar.gz
Document alone: bsv.ps.gz, bsv.pdf
Updates and news on FestVox will be posted on http://festvox.org

NOTE: this document is incomplete

Table of Contents

I. Speech Synthesis

Overview of Speech Synthesis

History
Uses of Speech Synthesis
General Anatomy of a Synthesizer

Speech Science

A Practical Speech Synthesis System

Basic Use
Utterance structure
Modules
Utterance access
Utterance building
Extracting features from utterances

II. Building Synthetic Voices

Basic Requirements

Hardware/software requirements
Voice in a new language
Voice in an existing language
Selecting a speaker
Who owns a voice
Recording under Unix
Extracting pitchmarks from waveforms

Limited domain synthesis

Designing the prompts
Customizing the synthesizer front end
Autolabeling issues
Unit size and type
Using limited domain synthesizers
Telling the time
Making it better

Text analysis

Non-standard words analysis
Token to word rules
Number pronunciation
Homograph disambiguation
TTS modes
Mark-up modes

Lexicons

Word pronunciations
Lexicons and addenda
Out of vocabulary words
Building letter-to-sound rules by hand
Building letter-to-sound rules automatically
Post-lexical rules
Building lexicons for new languages

Building prosodic models

Phrasing
Accent/Boundary Assignment
F0 Generation
Duration
Prosody Research
Prosody Walkthrough

Corpus development

Non-Latin-script languages

Waveform Synthesis

Diphone databases

Diphone introduction
Defining a diphone list
Recording the diphones
Labeling the diphones
Extracting the pitchmarks
Building LPC parameters
Defining a diphone voice
Checking and correcting diphones
Diphone check list

Unit selection databases

Cluster unit selection
Building a Unit Selection Cluster Voice
Diphones from general databases

Statistical Parametric Synthesis

Building a CLUSTERGEN Statistical Parametric Synthesizer
Making it better: Mixed excitation and Random Forests

Labeling Speech

Labeling with Dynamic Time Warping
Labeling with Full Acoustic Models
Prosodic Labeling

Evaluation and Improvements

Evaluation
Does it work at all?
Formal Evaluation Tests
Debugging voices

III. Interfacing and Integration

Markup
Concept-to-speech
Deployment

IV. Recipes

Grapheme-based Synthesizer

General Grapheme-based Voices
Building Indic voices
Creating support for new Indic languages

A Japanese Diphone Voice

US/UK English Diphone Synthesizer

ldom full example

Non-English ldom example

V. Concluding Remarks

Concluding remarks and future

Festival Details

Festival's Scheme Programming Language

Overview

Data Types

Functions

Core functions
List functions
Arithmetic functions
I/O functions
String functions
System functions
Utterance Functions
Synthesis Functions

Debugging and Help

Adding new C++ functions to Scheme

Regular Expressions

Some Examples

Edinburgh Speech Tools

Machine Learning

Resources

Festival resources
General speech resources

Tools Installation

English phone lists

US phoneset
UK phoneset

Building Synthetic Voices

Alan W Black and Kevin A. Lenzo