
Executable Programs

This section gives a brief description of the executable programs available with the speech tools. Most of these programs are simple main() wrappers around library routines.

Many of these programs have man pages. Please consult the man pages for more detailed information. Most programs print a summary of their command line options when given the -help flag. Some programs are "finished", while others are still "in progress". The finished programs should be well documented and stable. The "in progress" programs are near completion but typically still require some work regarding user interfaces and documentation.

Data manipulation programs

ch_wave

Changes waveform file formats, performs re-sampling and scaling, prints information on waveform headers etc.

ch_track

Changes track file formats, converts track files into label files, smooths tracks, re-samples tracks. Tracks hold data such as F0, LPC coefficients and cepstra.

ch_lab

Changes label file formats, converts label files into track files, performs one-to-one mapping of labels from one set to another, performs context sensitive label re-writing.

Audio Playback

na_play

Plays arbitrary waveform files on a variety of hardware audio devices. Can perform re-sampling to match audio device capability. `na_play' has support for a number of audio devices. Compile time options specify which devices are supported. Note you must actually have these devices on your machine before `na_play' can play any waveform.

Depending on compile-time options, `na_play' supports the following audio devices, selected with the `-p' option.

sunaudio: 8 kHz u-law written directly to `/dev/audio', which is found on most Sun machines, under Linux and FreeBSD, and possibly on other systems. This is the default if neither netaudio nor a platform-specific mode is supported.
netaudio: NCD's network transparent audio system (NAS). This allows use of audio devices across a network. NAS supports Suns, Linux, FreeBSD, HPs and probably other machines by now.
sun16audio: Only available on newer Sun workstations, and only if enabled at compile time. Provides 16-bit linear PCM at various sample rates.
linux16audio: Only available on Linux workstations, and only if enabled at compile time. Provides 16-bit linear PCM at various sample rates.
freebsd16audio: Only available on workstations running FreeBSD, and only if enabled at compile time. Provides 16-bit linear PCM at various sample rates.
mplayeraudio: Only available under Windows NT 4.0 and Windows 95, and only if enabled at compile time. Provides 16-bit linear PCM at various sample rates.
win32audio: Only available under Windows NT 4.0 and Windows 95, and only if enabled at compile time. Provides 16-bit linear PCM at various sample rates, playing the audio directly rather than saving to a file as mplayeraudio does.
irixaudio: Audio support for SGI's IRIX 6.2.
Audio_Command: Allows the specification of an arbitrary UNIX command to play the waveform. This won't normally be used with na_play as you could just use the command directly but is necessary with some systems using the speech tools.

The default audio mode is netaudio if it is supported. If not, the platform-specific audio mode is the default (e.g. sun16audio, linux16audio, freebsd16audio or mplayeraudio). If none of these is supported, sunaudio is the default. The Audio_Command method is always an option.
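
The fallback order described above can be sketched as follows. This is an illustration only, not the speech tools source: the function name `default_audio_mode` and the idea of passing in a set of compiled-in modes are assumptions, as is the relative order of the platform-specific modes (normally only one of them would be available on a given machine).

```python
# Platform-specific 16-bit modes named in the text. Their relative order here
# is an assumption; normally only one applies to a given machine.
PLATFORM_MODES = ["sun16audio", "linux16audio", "freebsd16audio", "mplayeraudio"]


def default_audio_mode(supported):
    """Return the default mode name, given the set of compiled-in modes."""
    if "netaudio" in supported:
        return "netaudio"
    for mode in PLATFORM_MODES:
        if mode in supported:
            return mode
    # Last resort described in the text.
    return "sunaudio"


if __name__ == "__main__":
    print(default_audio_mode({"netaudio", "linux16audio"}))
    print(default_audio_mode({"linux16audio"}))
    print(default_audio_mode(set()))
```

Audio_Command is left out of the sketch because the text describes it as something the user selects explicitly, never as a default.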

Signal Processing

pda

Pitch tracker based on super resolution pitch determination (srpd). Takes waveforms (of any type) as input and produces F0 contours.

icda

Pitch tracker with smoothing based on super resolution pitch determination (srpd). Takes waveforms (of any type) as input and produces F0 contours. Smoothing involves median smoothing of the pda output and interpolation through unvoiced regions.
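
The two smoothing steps can be illustrated with a small sketch. It is not the code used by the program: the function names, the window width, the use of 0.0 to mark unvoiced frames, and the choice of linear interpolation are all assumptions made for the example.

```python
from statistics import median


def median_smooth(f0, width=5):
    """Median-smooth voiced frames; unvoiced frames (<= 0.0) stay at 0.0."""
    half = width // 2
    out = []
    for i, value in enumerate(f0):
        if value <= 0.0:
            out.append(0.0)
            continue
        # Only voiced neighbours take part in the median.
        window = [v for v in f0[max(0, i - half): i + half + 1] if v > 0.0]
        out.append(median(window))
    return out


def interpolate_unvoiced(f0):
    """Linearly interpolate across unvoiced gaps; hold the edge values."""
    voiced = [i for i, v in enumerate(f0) if v > 0.0]
    if not voiced:
        return list(f0)
    out = list(f0)
    # Leading and trailing unvoiced frames copy the nearest voiced value.
    for i in range(voiced[0]):
        out[i] = f0[voiced[0]]
    for i in range(voiced[-1] + 1, len(f0)):
        out[i] = f0[voiced[-1]]
    # Interior gaps are filled on a straight line between their neighbours.
    for left, right in zip(voiced, voiced[1:]):
        gap = right - left
        for i in range(left + 1, right):
            frac = (i - left) / gap
            out[i] = f0[left] + frac * (f0[right] - f0[left])
    return out
```

Calling `interpolate_unvoiced(median_smooth(contour))` applies the steps in the order the description gives them: smoothing first, then filling the unvoiced regions.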

sig2fv

Basic signal processing functions allowing generation of LPC coefficients, cepstra, mel cepstra etc. at pitch synchronous and fixed intervals. Also allows generation of delta and delta-delta coefficients.
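
As a rough illustration of what a delta coefficient is, the sketch below computes a central difference over neighbouring frames. The formula, the edge handling and the name `delta` are assumptions chosen for simplicity; the program may use a different window or a regression-based estimate.

```python
def delta(frames):
    """Central-difference delta of a list of equal-length coefficient vectors.

    The first and last frames reuse themselves as the missing neighbour.
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


if __name__ == "__main__":
    cepstra = [[1.0], [2.0], [4.0], [7.0]]
    first_order = delta(cepstra)
    second_order = delta(first_order)  # delta applied twice
    print(first_order, second_order)
```

Applying the function to its own output gives second-order coefficients under the same assumptions.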

sigfilter

Signal filter, used for generating LPC residuals, among other things.
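
For readers unfamiliar with the term, an LPC residual is what remains after subtracting the linear prediction from each sample. The sketch below shows that idea under stated assumptions: the sign convention e[n] = s[n] - sum_k a_k s[n-k], zero history before the first sample, and the function name `lpc_residual` are all choices made for this example and are not taken from the program.

```python
def lpc_residual(signal, coeffs):
    """Inverse-filter a signal: e[n] = s[n] - sum_k a_k * s[n - k].

    coeffs[k - 1] holds a_k; samples before the start are treated as zero.
    """
    residual = []
    for n, sample in enumerate(signal):
        predicted = 0.0
        for k, a_k in enumerate(coeffs, start=1):
            if n - k >= 0:
                predicted += a_k * signal[n - k]
        residual.append(sample - predicted)
    return residual


if __name__ == "__main__":
    # A straight line is predicted exactly by s[n] = 2*s[n-1] - s[n-2],
    # so only the first sample survives in the residual.
    print(lpc_residual([1.0, 2.0, 3.0, 4.0], [2.0, -1.0]))
```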

Speech Recognition

viterbi (in progress)

A straightforward Viterbi decoder, using an ngram language model (which can be estimated using build_ngram) and a sequence of observation probability vectors.
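
The general technique can be sketched as below, assuming a bigram model for simplicity. This is not the program's implementation: the function name, the dictionary-based inputs and the separate `initial` distribution are assumptions for the example, and the observation list is assumed to be non-empty.

```python
import math


def viterbi(obs_probs, bigram, initial):
    """Return the most probable state sequence.

    obs_probs: list of dicts, one per frame, mapping state -> P(obs | state)
    bigram:    dict mapping (prev_state, state) -> P(state | prev_state)
    initial:   dict mapping state -> P(state) for the first frame
    """
    def log(p):
        return math.log(p) if p > 0.0 else float("-inf")

    states = list(initial)
    # Log scores of the best path ending in each state at the first frame.
    score = {s: log(initial[s]) + log(obs_probs[0].get(s, 0.0)) for s in states}
    backptrs = []
    for frame in obs_probs[1:]:
        new_score, ptr = {}, {}
        for s in states:
            best_prev = max(
                states, key=lambda p: score[p] + log(bigram.get((p, s), 0.0))
            )
            new_score[s] = (
                score[best_prev]
                + log(bigram.get((best_prev, s), 0.0))
                + log(frame.get(s, 0.0))
            )
            ptr[s] = best_prev
        backptrs.append(ptr)
        score = new_score
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path
```

Working in log probabilities avoids numerical underflow on long observation sequences; a higher-order ngram would need the state to carry more history than a single previous label.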