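A number of different intonation modules are available with varying levels of control. In general, intonation is generated in two steps: first, accents (and/or end tones) are predicted on a per-syllable basis; second, F0 target values are predicted.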
Reflecting this split, there are two main intonation modules that call
sub-modules depending on the desired intonation method. The
Intonation and Int_Targets modules are defined in Lisp
(`lib/intonation.scm') and call sub-modules which are (so far) in C++.
Default intonation is the simplest form of intonation and offers the modules
Intonation_Default and Int_Targets_Default, the
first of which actually does nothing at all.
Int_Targets_Default simply creates a target at the start
of the utterance and one at the end. By default their values
are 130 Hz and 110 Hz respectively. These values may be set through the
parameter duffint_params; for example, the following will
generate a monotone at 150 Hz.
(set! duffint_params '((start 150) (end 150)))
(Parameter.set 'Int_Method 'DuffInt)
(Parameter.set 'Int_Target_Method Int_Targets_Default)
The Simple intonation method uses the CART tree in
int_accent_cart_tree to predict
whether each syllable is accented or not. A predicted value of
NONE means no accent is generated by the corresponding
Int_Targets_Simple function. Any other predicted value will cause a `hat' accent to be
put on that syllable.
A default int_accent_cart_tree is available in the value
simple_accent_cart_tree in `lib/intonation.scm'. It simply
predicts accents on the stressed syllables of polysyllabic content
words, and on the only syllable of single-syllable content
words. Its form is
(set! simple_accent_cart_tree
 '((R:SylStructure.parent.gpos is content)
   ((stress is 1)
    ((Accented))
    ((position_type is single)
     ((Accented))
     ((NONE))))
   ((NONE))))
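Because the tree is an ordinary Scheme value, a different accent placement can be tried by binding another tree to int_accent_cart_tree. The following is a minimal sketch, not something taken from Festival's libraries: the name my_accent_cart_tree is hypothetical, and it relies only on the stress feature already used in the default tree. It accents every stressed syllable regardless of word class.

```scheme
;; Hypothetical tree (not shipped with Festival): accent every
;; stressed syllable, whatever the part of speech of its word.
(set! my_accent_cart_tree
 '((stress is 1)
   ((Accented))
   ((NONE))))

;; Make it the tree consulted during accent prediction.
(set! int_accent_cart_tree my_accent_cart_tree)
```

A voice definition may set int_accent_cart_tree itself, so an override like this is probably best evaluated after the voice has been selected.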
The function Int_Targets_Simple uses parameters in the a-list
held in the variable int_simple_params. There are two interesting
parameters: f0_mean, which gives the mean F0 for this speaker
(default 110 Hz), and
f0_std, which is the standard deviation of
F0 for this speaker (default 25 Hz). This second value is used
to determine the amount of variation to be put in the generated
targets.
For each Phrase in the given utterance an F0 is generated starting at
f0_mean+(f0_std*0.6), which declines
f0_std Hz over the
length of the phrase until the last syllable, whose end is set to
f0_mean-f0_std. An imaginary line called the baseline is
drawn from the start to the end (minus the final extra fall). For each
syllable that is accented (i.e. has an IntEvent related to it) three
targets are added. One at the start, one in mid vowel, and one at the
end. The start and end are at the
baseline value in Hz (as declined
for that syllable) and the mid vowel is set to
baseline+f0_std.
Note that this model is not supposed to be complex or comprehensive, but it offers a very quick and easy way to generate something other than a fixed line F0. Something similar to this has been used for Spanish and Welsh without (too many) people complaining. However, it is not designed as a serious intonation module.
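To make the arithmetic concrete, the sketch below recomputes these target values in plain Scheme. It is an illustration only, not the code Festival runs: simple_baseline and simple_accent_targets are hypothetical helper names, the phrase-initial value is taken to be the speaker mean plus 0.6 standard deviations, and the accent peak is assumed to sit f0_std Hz above the baseline. Times are in seconds and the phrase is assumed to have non-zero length.

```scheme
;; Speaker parameters, in the a-list form read by Int_Targets_Simple.
(set! int_simple_params '((f0_mean 110) (f0_std 25)))

;; Hypothetical helper: baseline F0 (Hz) at time POS within a phrase
;; running from PSTART to PEND.  The baseline starts at
;; f0_mean + 0.6*f0_std and falls linearly by f0_std Hz.
(define (simple_baseline f0_mean f0_std pstart pend pos)
  (- (+ f0_mean (* f0_std 0.6))
     (* f0_std (/ (- pos pstart) (- pend pstart)))))

;; Hypothetical helper: the three (time F0) targets for an accented
;; syllable spanning SSTART..SEND with vowel midpoint VMID.  The peak
;; height of baseline + f0_std is an assumption of this sketch.
(define (simple_accent_targets f0_mean f0_std pstart pend sstart vmid send)
  (list
   (list sstart (simple_baseline f0_mean f0_std pstart pend sstart))
   (list vmid (+ (simple_baseline f0_mean f0_std pstart pend vmid) f0_std))
   (list send (simple_baseline f0_mean f0_std pstart pend send))))
```

With the default values (f0_mean 110, f0_std 25) the baseline starts at 110 + 0.6*25 = 125 Hz and has fallen to 100 Hz by the end of the phrase; the end of the last syllable is then set to 110 - 25 = 85 Hz.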
The Tree intonation module is more flexible. Two different CART trees can be used to predict `accents' and `endtones'. Although at present this module is used for an implementation of the ToBI intonation labelling system, it could be used for many different types of intonation system.
The target module for this method uses a Linear Regression model to
predict start, mid-vowel, and end targets for each syllable using
arbitrarily specified features. This follows the work described in
black96. The LR models are held in the format described below
(see section 25.5 Linear regression). Three models are used, held in the variables
f0_lr_start, f0_lr_mid and f0_lr_end.
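As a rough illustration of what such a model can look like, the sketch below defines a toy start-target model in list form: an Intercept entry followed by feature and weight entries, where an optional third element lists the feature values that count as 1. This layout is how the sketch reads the linear regression format and should be checked against section 25.5. The weights are invented for illustration and are not trained values, the name toy_f0_lr_start is hypothetical, and f0_lr_start is the variable this sketch assumes the target module consults.

```scheme
;; Toy model with made-up weights (not trained on any data).
;; Prediction = Intercept + sum of (weight * feature value).
(set! toy_f0_lr_start
 '((Intercept 110.0)
   ;; stress is 0 or 1, so a stressed syllable adds 10 to the prediction
   (stress 10.0)
   ;; mapped feature: counts as 1 when gpos is "content", else 0
   (R:SylStructure.parent.gpos 5.0 (content))))

;; Assumed variable for the start-of-syllable model; the mid-vowel
;; and end models would be set up in the same way.
(set! f0_lr_start toy_f0_lr_start)
```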
Tilt description to be inserted.
As there seem to be a number of intonation theories that predict F0 contours by rule (possibly using trained parameters), the General intonation module aids the external specification of such rules for a wide class of intonation theories (though primarily those that might be referred to as the ToBI group). It is designed to be multi-lingual and to offer a quick way to port often pre-existing rules into Festival without writing new C++ code.
The accent prediction part uses the same mechanism as the Simple
intonation method described above, namely a decision tree for
accent prediction. Thus the tree in the variable
int_accent_cart_tree is used on each syllable to predict
an IntEvent.
The target part calls a specified Scheme function which returns
a list of target points for a syllable. In this way any arbitrary
tests may be done to produce the target points. For example,
here is a function which returns three target points
for each syllable with an
IntEvent related to it (i.e. accented syllables).
(define (targ_func1 utt syl)
  "(targ_func1 UTT STREAMITEM)
Returns a list of targets for the given syllable."
  (let ((start (item.feat syl 'syllable_start))
        (end (item.feat syl 'syllable_end)))
    (if (equal? (item.feat syl "R:Intonation.daughter1.name") "Accented")
        (list
         (list start 110)
         (list (/ (+ start end) 2.0) 140)
         (list end 100)))))
This function may be identified as the function to call by the following setup parameters.
(Parameter.set 'Int_Method 'General)
(Parameter.set 'Int_Target_Method Int_Targets_General)
(set! int_general_params
      (list
       (list 'targ_func targ_func1)))
An example implementation of a ToBI to F0 target module is included in `lib/tobi_rules.scm' based on the rules described in jilka96. This uses the general intonation method discussed in the previous section. This is designed to be useful to people who are experimenting with ToBI (silverman92), rather than general text to speech.
To use this method you need to load `lib/tobi_rules.scm' and
call setup_tobi_f0_method. The default is in a male's
pitch range, i.e. for
voice_rab_diphone. You can change
it for other pitch ranges by changing the following variables.
(Parameter.set 'Default_Topline 110)
(Parameter.set 'Default_Start_Baseline 87)
(Parameter.set 'Default_End_Baseline 83)
(Parameter.set 'Current_Topline (Parameter.get 'Default_Topline))
(Parameter.set 'Valley_Dip 75)
An example using this from STML is given in `examples/tobi.stml', but it can also be used from Scheme. For example, before defining an utterance you should execute the following, either from the command line or in some setup file
(voice_rab_diphone)
(require 'tobi_rules)
(setup_tobi_f0_method)
In order to allow specification of accents, tones, and break levels you must use an utterance type that allows such specification. For example
(Utterance Words
 (boy
  (saw ((accent H*)))
  the
  (girl ((accent H*)))
  in the
  (park ((accent H*) (tone H-)))
  with the
  (telescope ((accent H*) (tone H-H%)))))

(Utterance Words
 (The
  (boy ((accent L*)))
  saw
  the
  (girl ((accent H*) (tone L-)))
  with
  the
  (telescope ((accent H*) (tone H-H%)))))
You can display the synthesized form of these utterances in
Xwaves. Start an Xwaves and an Xlabeller and call the function
display on the synthesized utterance.
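Putting the pieces together, a session might look like the sketch below. It assumes voice_rab_diphone is installed and that Xwaves and Xlabeller are already running; utt1 is just an illustrative variable name.

```scheme
(voice_rab_diphone)
(require 'tobi_rules)
(setup_tobi_f0_method)

;; Build an utterance with explicit ToBI accents and tones.
(set! utt1
      (Utterance Words
       (The
        (boy ((accent L*)))
        saw the
        (girl ((accent H*) (tone L-)))
        with the
        (telescope ((accent H*) (tone H-H%))))))

(utt.synth utt1)   ; run the synthesis modules, including intonation
(display utt1)     ; show the synthesized utterance in Xwaves
```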