Does it work at all?

It is very easy to build a voice, get it to say a few phrases, and think that the job is done. As you build the voice it is worth testing each part as you build it, to ensure it basically performs as expected. But once it is all together, more general tests are needed. Before you submit the voice to any formal tests that you will use for benchmarking and grading progress, more basic tests should be carried out.

In fact it is worth stating such initial tests more concretely. Every voice we have ever built has had a number of mistakes in it that could be trivially fixed: for example, the MFCCs were not regenerated after the pitchmarks were fixed. Therefore you should go through each stage of the build procedure and ensure it really did do what you thought it should do, especially if you are totally convinced that section worked perfectly.
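
One cheap mechanical check is to confirm that every stage actually produced its output files. Below is a minimal sketch of such a check in Festival's Scheme, assuming the usual festvox layout and prompt file; the directory names (wav, pm, mcep), the extensions, and the file etc/txt.done.data are assumptions to be adjusted to your own build, and the check only confirms that files exist, not that they are up to date or correct.

;; A sketch only: for every utterance named in the prompt file,
;; report any missing file in each of the given directories.
;; The layout (wav/ pm/ mcep/) is an assumption; adjust as needed.
(define (check_build_outputs datafile)
  (mapcar
   (lambda (entry)
     (mapcar
      (lambda (d)
        (let ((fname (format nil "%s/%s%s" (car d) (car entry) (cadr d))))
          (if (not (probe_file fname))
              (format t "missing: %s\n" fname))))
      '(("wav" ".wav") ("pm" ".pm") ("mcep" ".mcep"))))
   (load datafile t)))

(check_build_outputs "etc/txt.done.data")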

Try to find around 100 to 500 sentences to play through the voice. It is amazing how many general problems are thrown up when you extend your test set. The next stage is to play some real text. That may be news text from the web, output from your speech translation system, or some email. Initially it is worth just synthesizing the whole set without even listening to it (a batch script for this is sketched below, after the inspection examples): problems in analysis, missing diphones, and so on may show up just in the processing of the text.

Then you want to listen to the output and identify problems. This may take some amount of investigation. What you want to do is identify where the problem lies: is it bad text analysis, a bad lexical entry, a prosody problem, or a waveform synthesis problem? You may need to synthesize parts of the text in isolation (e.g. using the Festival function SayText) and look at the structure of the utterance generated, e.g. using the function utt.features. For example, to see what words have been identified by the text analysis:

(utt.features utt1 'Word '(name))

Or to see the phones generated:

(utt.features utt1 'Segment '(name))
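
The same call can list several features at once; for example, segment names together with their end times (end is a standard feature of segment items, so this gives a rough check on the predicted durations too):

(utt.features utt1 'Segment '(name end))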

Thus you can view selected parts of an utterance and find out if it is being created as you intended. For some things a graphical display of the utterance may help.
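
As for the batch run suggested above, the sketch below synthesizes every sentence in a test file without playing anything, so that errors are exposed by the processing alone. It assumes the sentences are stored in the festvox data format used elsewhere in the build, one entry per line, such as ( test_0001 "This is a test sentence." ); the file name here is only an example.

;; Synthesize each sentence in turn, without playing it; problems in
;; text analysis, missing diphones, etc. are reported as each
;; utterance is processed.
(define (synth_test_set datafile)
  (mapcar
   (lambda (entry)
     (format t "synthesizing %s\n" (car entry))
     (utt.synth (eval (list 'Utterance 'Text (cadr entry)))))
   (load datafile t)))

(synth_test_set "etc/test_sentences.data")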

Once you have identified where the problem is, you need to decide how to fix it (or whether it is worth fixing at all). The problem may lie in a number of different places: in the text analysis, in the lexicon, in the prosody modules, or in the waveform synthesis itself.
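
For instance, when a word is mispronounced it is worth checking what the currently selected lexicon returns for it (falling back to the letter-to-sound rules if there is no explicit entry), and adding a corrected entry if necessary. The word and pronunciation below are purely illustrative; use the phone set of your own voice.

;; What does the current lexicon (or its letter-to-sound rules) give?
;; nil means no part-of-speech restriction.
(lex.lookup "edinburgh" nil)
;; If the returned entry is wrong, add a corrected one (illustrative):
(lex.add.entry '("edinburgh" n (((eh d) 1) ((ax n) 0) ((b ax r ax) 0))))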

Before rushing out and getting one hundred people to listen to your new synthetic voice, it is worth doing significant internal testing and evaluation, informally, to find and fix errors. Remember that the purpose of evaluation at this stage is to find errors and fix them; we are not, at least not yet, evaluating the voice on an abstract scale, for which unseen test data and blind testing are important.