
25 Tools

A number of basic data manipulation tools are supported by Festival. These often make building new modules very easy and are already used in many of the existing modules. They typically offer a Scheme method for entering data, and Scheme and C++ functions for evaluating it.

25.1 Regular expressions

Regular expressions are a formal method for describing a certain class of mathematical languages. They may be viewed as patterns which match some set of strings. They are very common in many software tools such as scripting languages like the UNIX shell, PERL, awk, Emacs etc. Unfortunately the exact form of regular expressions often differs slightly between applications, which can make their use a little tricky.

Festival supports regular expressions based mainly on the form used in the GNU libg++ Regex class, though we have our own implementation of it. Our implementation (EST_Regex) is actually based on Henry Spencer's `regex.c' as distributed with BSD 4.4.

Regular expressions are represented as character strings which are interpreted as regular expressions by certain Scheme and C++ functions. Most characters in a regular expression are treated as literals and match only that character, but a number of others have special meaning. Some characters may be escaped with preceding backslashes to change them from operators to literals (or sometimes literals to operators).

.: matches any character.
$: matches end of string
^: matches beginning of string
X*: matches zero or more occurrences of X; X may be a character, range or parenthesized expression.
X+: matches one or more occurrences of X; X may be a character, range or parenthesized expression.
X?: matches zero or one occurrence of X; X may be a character, range or parenthesized expression.
[...]: a range matches any of the values in the brackets. The range operator "-" allows specification of ranges, e.g. a-z for all lower case characters. If the first character of the range is ^ then it matches any character except those specified in the range. If you wish - to be in the range you must put it first.
\\(...\\): treat contents of parentheses as a single object, allowing operators *, +, ? etc. to operate on more than single characters.
X\\|Y: matches either X or Y. X or Y may be single characters, ranges or parenthesized expressions.

Note that actually only one backslash is needed before a character to escape it, but because these expressions are most often contained within Scheme or C++ strings, the escape mechanism for those strings requires that the backslash itself be escaped. Hence you will most often be required to type two backslashes.

Some examples may help in understanding the use of regular expressions.

a.b: matches any three character string starting with an a and ending with a b.
.*a: matches any string ending in an a
.*a.*: matches any string containing an a
[A-Z].*: matches any string starting with a capital letter
[0-9]+: matches any string of digits
-?[0-9]+\\(\\.[0-9]+\\)?: matches any positive or negative real number. Note the optional preceding minus sign and the optional part containing the point and following digits. The point itself must be escaped, as a dot on its own matches any character.
[^aeiouAEIOU]+: matches any non-empty string which doesn't contain a vowel
\\([Ss]at\\(urday\\)?\\)\\|\\([Ss]un\\(day\\)?\\): matches Saturday and Sunday in various ways (e.g. Sat, saturday, Sun, Sunday)

The Scheme function string-matches takes a string and a regular expression and returns t if the regular expression matches the string and nil otherwise.
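The operator conventions above differ from those of many other tools, so a small translation sketch may help when moving patterns between them. The following Python is only an illustration and is not part of Festival: the names `festival_to_python` and the Python `string_matches` are hypothetical, the whole string is assumed to have to match (as the examples above suggest), and unescaped `(`, `)`, `|`, `{` and `}` are assumed to be literals. Patterns are written as Python raw strings, so they carry the single backslash the regular expression sees rather than the doubled one typed in a Scheme string.

```python
import re


def festival_to_python(pattern):
    """Translate a pattern in the style described above into Python `re` syntax."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "[":
            # Copy a bracket range verbatim. A "]" placed directly after
            # "[" or "[^" is a member of the range, not its end.
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1
            while j < n and pattern[j] != "]":
                j += 1
            out.append(pattern[i:j + 1])
            i = j + 1
        elif ch == "\\" and i + 1 < n:
            nxt = pattern[i + 1]
            # Escaped (, ) and | are operators; any other escaped
            # character is taken as a literal.
            out.append(nxt if nxt in "()|" else re.escape(nxt))
            i += 2
        elif ch in "()|{}\\":
            # Assumed to be literals when not escaped (a lone trailing
            # backslash is also treated as a literal).
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def string_matches(text, pattern):
    """Return True if the whole of `text` matches `pattern`."""
    return re.fullmatch(festival_to_python(pattern), text) is not None


print(string_matches("-3.14", r"-?[0-9]+\(\.[0-9]+\)?"))  # True
print(string_matches("a|b", "a|b"))                       # True
```

The translation only swaps which form of the parenthesis and bar characters is the operator; everything else is passed through, so any further dialect differences are not handled by this sketch.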

25.2 CART trees

One of the basic tools available with Festival is a system for building and using Classification and Regression Trees (breiman84). This standard statistical method can be used to predict both categorical and continuous data from a set of feature vectors.

The tree itself contains yes/no questions about features and ultimately provides either a probability distribution, when predicting categorical values (classification tree), or a mean and standard deviation, when predicting continuous values (regression tree). Well defined techniques can be used to construct an optimal tree from a set of training data. The program `wagon', developed in conjunction with Festival and distributed with the speech tools, provides a basic but increasingly powerful method for constructing trees.

A tree need not be automatically constructed. CART trees have the advantage over some other automatic training methods, such as neural networks and linear regression, in that their output is more readable and often understandable by humans. Importantly, this makes it possible to modify them. CART trees may also be fully hand constructed. This is used, for example, in generating some duration models for languages for which we do not yet have full databases to train from.

A CART tree has the following syntax

    CART ::= QUESTION-NODE || ANSWER-NODE
    QUESTION-NODE ::= ( QUESTION YES-NODE NO-NODE )
    YES-NODE ::= CART
    NO-NODE ::= CART
    QUESTION ::= ( FEATURE in LIST )
    QUESTION ::= ( FEATURE is STRVALUE )
    QUESTION ::= ( FEATURE = NUMVALUE )
    QUESTION ::= ( FEATURE > NUMVALUE )
    QUESTION ::= ( FEATURE < NUMVALUE )
    QUESTION ::= ( FEATURE matches REGEX )
    ANSWER-NODE ::= CLASS-ANSWER || REGRESS-ANSWER
    CLASS-ANSWER ::= ( (VALUE0 PROB) (VALUE1 PROB) ... MOST-PROB-VALUE )
    REGRESS-ANSWER ::= ( ( STANDARD-DEVIATION MEAN ) )

Note that answer nodes are distinguished by their car not being atomic.

The interpretation of a tree is with respect to a Stream_Item. The FEATURE in a tree is a standard feature (see section 14.6 Features).

The following example tree is used in one of the Spanish voices to predict variations from average durations.

(set! spanish_dur_tree
 '
   ((R:SylStructure.parent.R:Syllable.p.syl_break > 1 ) ;; clause initial
    ((R:SylStructure.parent.stress is 1)
     ((1.5))
     ((1.2)))
    ((R:SylStructure.parent.syl_break > 1)   ;; clause final
     ((R:SylStructure.parent.stress is 1)
      ((2.0))
      ((1.5)))
     ((R:SylStructure.parent.stress is 1)
      ((1.2))
      ((1.0))))))

It is applied to the segment stream to give a factor to multiply the average by.
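To make the traversal concrete, here is a minimal Python sketch of how a tree in the syntax above could be interpreted. It is not Festival's implementation: `cart_predict` is a hypothetical name, items are stood in for by a plain dictionary from feature names to values, and a question node is recognised by finding one of the six operators in the second position of its first element, which is an assumption of this sketch. The `matches` operator here uses Python regular expression syntax rather than Festival's.

```python
import re

# One test per QUESTION form in the grammar above.
OPERATORS = {
    "in": lambda value, arg: value in arg,
    "is": lambda value, arg: str(value) == str(arg),
    "=": lambda value, arg: float(value) == float(arg),
    ">": lambda value, arg: float(value) > float(arg),
    "<": lambda value, arg: float(value) < float(arg),
    "matches": lambda value, arg: re.fullmatch(arg, str(value)) is not None,
}


def is_question_node(node):
    """True for ( QUESTION YES-NODE NO-NODE ), False for an answer node."""
    head = node[0]
    return (len(node) == 3 and isinstance(head, list)
            and len(head) == 3 and head[1] in OPERATORS)


def cart_predict(tree, features):
    """Walk the tree for one item and return its predicted value."""
    node = tree
    while is_question_node(node):
        feature, op, arg = node[0]
        node = node[1] if OPERATORS[op](features[feature], arg) else node[2]
    if isinstance(node[-1], list):
        return node[0][-1]  # regression answer: last value of the inner list
    return node[-1]         # classification answer: MOST-PROB-VALUE


# The Spanish duration tree from above, transcribed as nested Python lists.
SPANISH_DUR_TREE = [
    ["R:SylStructure.parent.R:Syllable.p.syl_break", ">", 1],  # clause initial
    [["R:SylStructure.parent.stress", "is", 1], [[1.5]], [[1.2]]],
    [["R:SylStructure.parent.syl_break", ">", 1],              # clause final
     [["R:SylStructure.parent.stress", "is", 1], [[2.0]], [[1.5]]],
     [["R:SylStructure.parent.stress", "is", 1], [[1.2]], [[1.0]]]],
]

# A stressed, clause final segment (feature values invented for illustration).
segment = {
    "R:SylStructure.parent.R:Syllable.p.syl_break": 0,
    "R:SylStructure.parent.syl_break": 3,
    "R:SylStructure.parent.stress": 1,
}
print(cart_predict(SPANISH_DUR_TREE, segment))  # 2.0
```

For a regression answer the sketch returns the last value of the inner list, so a leaf written as `((1.5))` yields 1.5 and a leaf holding a standard deviation and a mean yields the mean.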

wagon is constantly improving and, with version 1.2 of the speech tools, may now be considered fairly stable for its basic operations. Experimental features are described in the help it gives. See the Speech Tools manual for a more comprehensive discussion of using `wagon'.

However, the above tree format is similar to those produced by many other systems, and hence it is reasonable to translate their formats into one which Festival can use.

25.3 Ngrams

Bigrams, trigrams, and general ngrams are used in the part of speech tagger and the phrase break predictor. An Ngram C++ class is defined in the speech tools library and some simple facilities are added within Festival itself.

Ngrams may be built from files of tokens using the program ngram_build which is part of the speech tools. See the speech tools documentation for details.

Within Festival ngrams may be named and loaded from files and used when required. The LISP function ngram.load takes a name and a filename as arguments and loads the Ngram from that file. For an example of its use once loaded see `src/modules/base/pos.cc' or `src/modules/base/phrasify.cc'.

25.4 Viterbi decoder

Another common tool is a Viterbi decoder. This C++ class is defined in the speech tools library in `speech_tools/include/EST_viterbi.h' and `speech_tools/stats/EST_viterbi.cc'. A Viterbi decoder requires two functions at declaration time. The first constructs candidates at each stage, while the second combines paths. A number of options are available (which may change).

The prototypical example of use is in the part of speech tagger, which uses standard Ngram models to predict probabilities of tags. See `src/modules/base/pos.cc' for an example.

The Viterbi decoder can also be used through the Scheme function Gen_Viterbi. This function respects the parameters defined in the variable get_vit_params. Like other modules this parameter list is an assoc list of feature name and value. The parameters supported are:

Relation: The name of the relation the decoder is to be applied to.
cand_function: A function that is to be called for each item that will return a list of candidates (with probabilities).
return_feat: The name of a feature that the best candidate is to be returned in for each item in the named relation.
p_word: The previous word to the first item in the named relation (only used when ngrams are the "language model").
pp_word: The previous previous word to the first item in the named relation (only used when ngrams are the "language model").
ngramname: The name of an ngram (loaded by ngram.load) to be used as a "language model".
wfstmname: The name of a WFST (loaded by wfst.load) to be used as a "language model". This is ignored if an ngramname is also specified.
debug: If specified more debug features are added to the items in the relation.
gscale_p: Grammar scaling factor.

Here is a short example to help make the use of this facility clearer.

There are two parts required for the Viterbi decoder: a set of candidate observations and some "language model". For the math to work properly the candidate observations must be reverse probabilities (for each candidate, the probability of the observation given the candidate, rather than the probability of the candidate given the observation). These can be calculated as the probability of the candidate given the observation divided by the probability of the candidate in isolation.
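The division just described follows from Bayes' rule: P(observation | candidate) equals P(candidate | observation) times P(observation), divided by P(candidate). Since P(observation) is the same for every candidate of a given observation, it can be dropped. A minimal Python sketch, with a hypothetical function name and invented probabilities, working in the log domain:

```python
import math


def reverse_log_probs(p_cand_given_obs, p_cand):
    """Map each candidate to log(P(cand | obs) / P(cand)).

    The result is proportional to P(obs | cand); the constant
    factor P(obs) is left out.
    """
    return {
        cand: math.log(p / p_cand[cand])
        for cand, p in p_cand_given_obs.items()
        if p > 0.0  # log(0) is undefined, so impossible candidates are dropped
    }


# Invented figures: the word is equally likely to be nn or vb,
# but vb is a much rarer tag overall.
rprobs = reverse_log_probs({"nn": 0.5, "vb": 0.5}, {"nn": 0.25, "vb": 0.05})
print(rprobs["vb"] > rprobs["nn"])  # True
```

The rarer tag receives the larger reverse probability, because seeing this particular word is a comparatively larger share of that tag's occurrences.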

For the sake of simplicity let us assume we have a lexicon mapping words to distributions of part of speech tags with reverse probabilities, and a tri-gram called pos-tri-gram over sequences of part of speech tags. First we must define the candidate function

(define (pos_cand_function w)
 ;; select the appropriate lexicon
 (lex.select 'pos_lex)
 ;; return the list of cands with rprobs
 (cadr 
  (lex.lookup (item.name w) nil)))

The returned candidate list would look something like

( (jj -9.872) (vbd -6.284) (vbn -5.565) )

Our part of speech tagger function would look something like this

(define (pos_tagger utt)
  (set! get_vit_params
        (list
         (list 'Relation "Word")
         (list 'return_feat 'pos_tag)
         (list 'p_word "punc")
         (list 'pp_word "nn")
         (list 'ngramname "pos-tri-gram")
         (list 'cand_function 'pos_cand_function)))
  (Gen_Viterbi utt)
  utt)

This will assign the optimal part of speech tags to each word in utt.
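For readers who want to see what the decoder is doing with the candidates and the language model, the following Python sketch shows the basic search. It is an illustration only and not the EST_Viterbi class: the function and variable names are hypothetical, the data is invented, and a bigram model is used instead of the tri-gram above to keep the code short. Every word is assumed to have at least one candidate.

```python
import math


def viterbi(candidates, lm_logprob, start_tag):
    """Return the best tag sequence for a list of words.

    candidates: one list per word of (tag, reverse_log_prob) pairs.
    lm_logprob(prev_tag, tag): log P(tag | prev_tag), a bigram model.
    start_tag: the tag assumed before the first word (cf. p_word).
    """
    # best maps each tag to (score, path) for the best path ending in it.
    best = {start_tag: (0.0, [])}
    for word_cands in candidates:
        step = {}
        for tag, rlogp in word_cands:
            # Extend every surviving path with this tag and keep the best one.
            score, path = max(
                ((s + lm_logprob(prev, tag) + rlogp, p)
                 for prev, (s, p) in best.items()),
                key=lambda pair: pair[0],
            )
            step[tag] = (score, path + [tag])
        best = step
    return max(best.values(), key=lambda pair: pair[0])[1]


# Toy bigram probabilities, invented for illustration only.
BIGRAM = {("punc", "dt"): 0.5, ("dt", "nn"): 0.8, ("dt", "vb"): 0.05}


def toy_lm(prev_tag, tag):
    # Unseen tag pairs get a small floor probability.
    return math.log(BIGRAM.get((prev_tag, tag), 1e-6))


cands = [
    [("dt", math.log(1.0))],
    [("nn", math.log(0.4)), ("vb", math.log(0.6))],
]
print(viterbi(cands, toy_lm, "punc"))  # ['dt', 'nn']
```

Taken alone, the second word's candidates favour vb, but the language model makes nn far more likely after dt, so the combined score selects nn. This is the interaction between candidate scores and "language model" that the parameters above configure.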

25.5 Linear regression

The linear regression module takes models built with some external package and computes a prediction from the features and their weights, i.e. the intercept plus the weighted sum of the feature values. A model consists of a list of features. The first should be the atom Intercept plus a value. The following entries in the list should each consist of a feature (see section 14.6 Features) followed by a weight. An optional third element may be a list of atomic values. If the result of the feature is a member of this list the feature's value is treated as 1, else it is 0. This third argument allows an efficient way to map categorical values into numeric values. For example, from the F0 prediction model in `lib/f2bf0lr.scm', the first few parameters are

(set! f2b_f0_lr_start
'(
   ( Intercept 160.584956 )
   ( Word.Token.EMPH 36.0 )
   ( pp.tobi_accent 10.081770 (H*) )
   ( pp.tobi_accent 3.358613 (!H*) )
   ( pp.tobi_accent 4.144342 (*? X*? H*!H* * L+H* L+!H*) )
   ( pp.tobi_accent -1.111794 (L*) )
   ...
))

Note the feature pp.tobi_accent returns an atom, and is hence tested with the map groups specified as third arguments.
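The evaluation of such a model can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Festival's own code: `linear_predict` is a hypothetical name, item features are stood in for by a plain dictionary, and only the first four entries of the model above are used, so the result is not a real F0 prediction.

```python
def linear_predict(model, features):
    """Return the intercept plus the weighted sum of the feature values."""
    total = 0.0
    for entry in model:
        name, weight = entry[0], entry[1]
        if name == "Intercept":
            total += weight
        elif len(entry) == 3:
            # Map group given: the feature counts as 1 if its value is
            # a member of the group and as 0 otherwise.
            total += weight * (1.0 if features[name] in entry[2] else 0.0)
        else:
            # No map group: the feature value is used as a number.
            total += weight * float(features[name])
    return total


# The first entries of the model shown above, truncated for illustration.
F0_MODEL_START = [
    ("Intercept", 160.584956),
    ("Word.Token.EMPH", 36.0),
    ("pp.tobi_accent", 10.081770, ["H*"]),
    ("pp.tobi_accent", 3.358613, ["!H*"]),
]

# Feature values invented for illustration.
item = {"Word.Token.EMPH": 0, "pp.tobi_accent": "H*"}
print(linear_predict(F0_MODEL_START, item))
```

With these values only the intercept and the H* entry contribute, so the result is the sum of 160.584956 and 10.081770.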

Models may be built from feature data (in the same format as `wagon') using the `ols' program distributed with the speech tools library.
