         Flite: a small run-time speech synthesis engine
                      version 1.3-release
          Copyright Carnegie Mellon University 1999-2005
                      All rights reserved
                      http://cmuflite.org

Flite is a small, fast run-time speech synthesis engine.  It is part
of the suite of free software synthesis tools that includes the
University of Edinburgh's Festival Speech Synthesis System and
Carnegie Mellon University's FestVox project (tools, scripts and
documentation for building synthetic voices).  However, flite itself
does not require either of these systems to compile and run.

New in 1.3-release (October 2005)

    o fixes to lpc residual extraction to give better quality output
    o An updated lexicon (festlex_CMU from festival-2.0) with better
      compression: it is about 30% of the previous size, with about
      the same accuracy
    o Fairly substantial code movements to better support PalmOS and
      multi-platform cross compilation builds
    o A PalmOS 5.0 port with a small example talking app ("flop")
    o runs under x86_64 Linux

The core Flite library was developed by Alan W Black (mostly in his
so-called spare time) while employed in the Language Technologies
Institute at Carnegie Mellon University.  The name "flite", originally
chosen to mean "festival-lite", is perhaps doubly appropriate, as a
substantial part of the design and coding was done at over 30,000ft
while awb was travelling (and, usually, not in meetings).

The voices, lexicon and language components of flite, both their
compression techniques and their actual contents, were developed by
Kevin A. Lenzo and Alan W Black.

Flite is the answer to the complaint that Festival is too big, too
slow, and not portable enough.

    o Flite is designed for very small devices, such as PDAs, and
      also for large server machines with lots of ports.
    o Flite is not a replacement for Festival but an alternative run
      time engine for voices developed in the FestVox framework,
      where size and speed are crucial.
    o Flite is all in ANSI C.  It contains no C++ or Scheme, and thus
      requires more care in programming and is harder to customize
      at run time.
    o It is thread safe.
    o Voices, lexicons and language descriptions can be compiled
      (mostly automatically for voices and lexicons) into C
      representations from their FestVox formats.
    o All voices, lexicons and language model data are const and in
      the text segment (i.e. they may be put in ROM).  As they are
      linked in at compile time, there is virtually no startup delay.
    o Although the synthesized output is not exactly the same as the
      same voice in Festival, they are effectively equivalent.  That
      is, flite doesn't sound better or worse than the equivalent
      voice in festival, just faster, smaller and scalable.
    o For standard diphone voices, maximum run time memory
      requirements are less than roughly twice the memory required
      for the waveform generated.  For 32bit architectures this
      effectively means under 1M.  (Later versions will include a
      streaming option which will reduce this to less than one
      quarter.)
    o The flite program supports synthesis of individual strings or
      files (utterance by utterance) to direct audio devices or to
      waveform files.
    o The flite library offers simple functions suitable for use in
      specific applications.

Flite is distributed with a single 8KHz diphone voice (derived from
the cmu_us_kal voice), a pruned lexicon (derived from cmulex) and a
set of models for US English.  Here are comparisons with Festival
using basically the same 8KHz diphone voice:

                  Flite    Festival
    core code      60K       2.6M
    USEnglish     100K       ??
    lexicon       600K       5M
    diphone       1.8M       2.1M
    runtime       <1M        16-20M

On a 500MHz PIII, a timing test of the first two chapters of "Alice
in Wonderland" (doc/alice) was done.  This produces about 1300
seconds of speech.  With flite it takes 19.128 seconds (about 70.6
times faster than real time); with Festival it takes 97 seconds (13.4
times faster than real time).
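
The "times faster than real time" figures above are simply seconds of
speech produced divided by seconds of processing time.  The sketch
below shows that arithmetic in C.  It is not part of flite; the
function names real_time_factor and implied_speech_seconds are
hypothetical and exist only for this illustration.

```c
/* Hypothetical helpers for illustration; not part of the flite API. */

/* Real-time factor: seconds of speech produced per second of
 * processing time.  Returns 0.0 when the elapsed time is not
 * positive (a guard chosen for this sketch). */
double real_time_factor(double speech_seconds, double elapsed_seconds)
{
    if (elapsed_seconds <= 0.0)
        return 0.0;
    return speech_seconds / elapsed_seconds;
}

/* Inverse relation: the amount of speech implied by a quoted
 * real-time factor and a measured elapsed time. */
double implied_speech_seconds(double factor, double elapsed_seconds)
{
    return factor * elapsed_seconds;
}
```

With the Festival figures, real_time_factor(1300.0, 97.0) gives about
13.4.  Working backwards, implied_speech_seconds(70.6, 19.128) is
roughly 1350 seconds, which fits the "about 1300 seconds" quoted
above only loosely, so the speech duration should be read as an
approximation.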
On the iPAQ (with the 16KHz diphones) flite synthesizes 9.79 times
faster than real time.

Requirements:

    o A good C compiler.  Some of these files are quite large and
      some C compilers might choke on them; gcc is fine.  Sun CC 3.01
      has been tested too.  Visual C++ 6.0 is known to fail on the
      large diphone database files.  We recommend you use GCC under
      Cygwin or mingw32 instead.
    o GNU Make
    o An audio device isn't required, as flite can write its output
      to a waveform file.

Supported platforms:

We have successfully compiled and run on

    o Various Intel Linux systems (and iPAQ Linux), under various
      versions of GCC (2.7.2 to 4.x)
    o FreeBSD 3.x and 4.x
    o Solaris 5.7, and Solaris 9
    o Initial support for Mac OS X
    o Windows 2000/XP under Cygwin 1.3.5 and later
    o Some support for WinCE (2.11 and 3.0) is included but is not
      complete
    o PalmOS 5.x devices (Treo 600, Zire 31 and Tungsten C)
    o 64-bit Linux architectures
    o OSF1 V4.0 (gives an unimportant warning about sizes when
      compiling cst_val.c)

Other similar platforms should just work; we have also cross compiled
on a Linux machine for StrongARM.  However, note that new byte order
architectures may not work directly, as there are some careful byte
order constraints in some structures.  These are portable but may
require reordering of some fields.  Contact us if you are moving to a
new architecture.

Alan W Black                                 email: awb@cs.cmu.edu
Language Technologies Institute              http://www.cs.cmu.edu/~awb/
Carnegie Mellon University                   tel: +1-412-268-6299
5000 Forbes Ave, Pittsburgh PA, 15213, USA.  fax: +1-412-268-6298
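
When moving to a new architecture, a quick way to see which byte
order the target uses (the concern raised under "Supported
platforms") is a small probe like the one below.  This is a generic C
sketch, not code from flite, and it does not show how flite itself
lays out its structures; the names host_is_little_endian and swap16
are hypothetical.

```c
/* Hypothetical helpers for illustration; not part of the flite API. */

/* Returns 1 on a little-endian host and 0 otherwise, by looking at
 * the lowest-addressed byte of an integer whose value is 1. */
int host_is_little_endian(void)
{
    const unsigned int probe = 1u;
    return *(const unsigned char *)&probe == 1;
}

/* Swaps the two low-order bytes of a 16-bit value, as would be
 * needed when 16-bit data written on one byte order is read on the
 * other. */
unsigned short swap16(unsigned short v)
{
    return (unsigned short)(((v & 0x00ffu) << 8) | ((v & 0xff00u) >> 8));
}
```

For example, swap16(0x1234) yields 0x3412, and applying swap16 twice
returns the original value.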