Edinburgh Speech Tools  2.4-release
ch_wave

Audio file manipulation

Synopsis

ch_wave [input file0] [input file1] ... -o [output file] [-h] [-itype string] [-n int] [-f int] [-ibo string] [-iswap] [-istype string] [-c string] [-start float] [-end float] [-from int] [-to int] [-o ofile] [-otype string] [-F int] [-obo string] [-oswap] [-ostype string] [-scale float] [-scaleN float] [-lpfilter int] [-hpfilter int] [-forder int] [-fafter] [-info] [-add] [-pc string] [-key ifile] [-divide] [-ext string] [-extract string]

ch_wave is used to manipulate the format of a waveform file. Operations include:

  • file format conversion
  • resampling (changing the sampling frequency)
  • byte-swapping
  • combining multiple input files into a single multi-channel output file
  • combining multiple input files into one single-channel output file
  • extracting a single channel from a multi-channel waveform
  • scaling the amplitude of the waveform
  • low pass and high pass filtering
  • extracting a time-delimited portion of the waveform

ch_wave is an executable program that serves as a wrapper around the EST_Wave class and the basic wave manipulation functions. More advanced waveform processing is performed by the signal processing library.

Options

  • -h: Options help
  • -itype: string Input file type (optional). If set to raw, this indicates that the input file does not have a header. Other supported types can also be named, but this is rarely necessary because the type of every supported format can be determined automatically from the file's header. Unheadered input files are assumed to contain 16-bit shorts. Supported types are nist, est, esps, snd, riff, aiff, audlab, raw, ascii
  • -n: int Number of channels in an unheadered input file
  • -f: int Sample rate in Hertz for an unheadered input file
  • -ibo: string Input byte order in an unheadered input file. Possibilities are MSB, LSB, native or nonnative. Suns, HP, SGI Mips and M68000 are MSB (big endian); Intel, Alpha, DEC Mips and Vax are LSB (little endian).
  • -iswap: Swap bytes. (For use on an unheadered input file)
  • -istype: string Sample type in an unheadered input file: short, mulaw, byte, ascii
  • -c: string Select channels (numbering starts from 0). Waveforms can have multiple channels. This option keeps the selected channel, or a quoted list of channels, for processing and discards the rest.
  • -start: float Extract sub-wave starting at this time, specified in seconds
  • -end: float Extract sub-wave ending at this time, specified in seconds
  • -from: int Extract sub-wave starting at this sample point
  • -to: int Extract sub-wave ending at this sample point
  • -o: ofile Output filename. If not specified, output is written to stdout.
  • -otype: string Output file type (optional). If no type is specified, the type of the input file is used. Supported types are: nist, est, esps, snd, riff, aiff, audlab, raw, ascii
  • -F: int Output sample rate in Hz. If this differs from the input sample rate, the waveform is resampled.
  • -obo: string Output byte order: MSB, LSB, native or nonnative. Suns, HP, SGI Mips and M68000 are MSB (big endian); Intel, Alpha, DEC Mips and Vax are LSB (little endian).
  • -oswap: Swap bytes when saving to output
  • -ostype: string Output sample type: short, mulaw, byte or ascii
  • -scale: float Scaling factor. Increase or decrease the amplitude of the whole waveform by the factor given.
  • -scaleN: float Scaling factor with normalization. The waveform is first scaled up to its maximum level and then scaled by the factor given.
  • -lpfilter: int Low pass filter, with cutoff frequency in Hz. Filtering is performed by an FIR filter which is built at run time. The order of the filter can be given by -forder; the default order is 199.
  • -hpfilter: int High pass filter, with cutoff frequency in Hz. Filtering is performed by an FIR filter which is built at run time. The order of the filter can be given by -forder; the default order is 199.
  • -forder: int Order of FIR filter used for lpfilter and hpfilter. This must be ODD. Sensible values range from 19 (quick but with a shallow rolloff) to 199 (slow but with a steep rolloff). The default is 199.
  • -fafter: Do filtering after other operations such as resampling (default: filter before other operations).
  • -info: Print information about the file and its header, such as file length, sampling rate and number of channels. No output waveform is produced.
  • -add: A new single channel waveform is created by adding the corresponding sample points of each input waveform
  • -pc: string Combine input waveforms to form a single multichannel waveform. The argument controls the length of the new waveform. If it is LONGEST, the output wave is the length of the longest input wave, and shorter waves are padded with zeros at the end. If it is FIRST, the output wave is the length of the first file on the command line, and subsequent waves are padded or cut to this length.
  • -key: ifile Label file designating subsections, for use with -divide and -extract. The KEYLAB file is a label file which specifies where chunks (such as individual sentences) in a waveform begin and end. See the section on wave extraction.
  • -divide: Divide a single input waveform into multiple output waveforms. Each output waveform is extracted from the input waveform by using the KEYLAB file, which specifies the start and stop times for each chunk. The output files are named according to the filename in the KEYLAB file, with the extension given by -ext. See the section on wave extraction.
  • -ext: string File extension for divided waveforms
  • -extract: string Used in conjunction with -key to extract a single section of waveform from the input waveform. The argument is the name of a file given in the file column of the KEYLAB file.
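
The filter options above describe FIR low-pass and high-pass filters built at run time. The Python sketch below shows one common way to build such filters, a Hamming-windowed sinc. It is not the EST implementation. It assumes the -forder value is the number of taps. It also assumes the high-pass is formed by spectral inversion (a unit impulse minus the low-pass), which works only with an odd tap count because it needs a single centre tap. The function names are hypothetical.

```python
import math

def lowpass_taps(cutoff_hz: float, sample_rate: int, order: int = 199) -> list[float]:
    """Hamming-windowed sinc low-pass FIR; 'order' is assumed to be the tap count."""
    if order < 3 or order % 2 == 0:
        raise ValueError("order must be an odd number >= 3")
    if not 0 < cutoff_hz < sample_rate / 2:
        raise ValueError("cutoff must lie between 0 and the Nyquist frequency")
    fc = cutoff_hz / sample_rate              # cutoff in cycles per sample
    mid = order // 2
    taps = []
    for n in range(order):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (order - 1))
        taps.append(ideal * window)
    gain = sum(taps)                          # rescale so the DC gain is exactly 1
    return [t / gain for t in taps]

def highpass_taps(cutoff_hz: float, sample_rate: int, order: int = 199) -> list[float]:
    """Spectral inversion: unit impulse minus the low-pass (needs a centre tap)."""
    taps = [-t for t in lowpass_taps(cutoff_hz, sample_rate, order)]
    taps[order // 2] += 1.0
    return taps

def fir_filter(samples: list[float], taps: list[float]) -> list[float]:
    """Convolve, treating samples outside the wave as zero and removing the filter delay."""
    mid = len(taps) // 2
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for j, t in enumerate(taps):
            idx = i + mid - j
            if 0 <= idx < len(samples):
                acc += t * samples[idx]
        out.append(acc)
    return out
```

A longer filter gives a steeper rolloff but costs more multiply-adds per output sample. This matches the speed/rolloff trade-off described for -forder.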

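The -scale and -scaleN options can be sketched as below. The sketch makes three assumptions not stated above: samples are 16-bit, the "maximum level" used by -scaleN is 32767, and results are rounded and clipped to the 16-bit range. How ch_wave itself rounds and handles overflow is not described here. The function names are hypothetical.

```python
def scale(samples: list[int], factor: float) -> list[int]:
    """-scale: multiply every sample, then round and clip to 16 bits (clipping is assumed)."""
    return [max(-32768, min(32767, round(s * factor))) for s in samples]

def scale_normalized(samples: list[int], factor: float, full_scale: int = 32767) -> list[int]:
    """-scaleN: first bring the peak magnitude up to full_scale, then apply factor."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)                  # silence cannot be normalized
    return scale(samples, full_scale / peak * factor)
```

For example, scale_normalized(samples, 0.5) would leave the loudest sample at about half of full scale, whatever the input level was.
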
Making multiple waves into a single wave

If multiple input files are specified, by default they are concatenated into the output file.

$ ch_wave kdt_010.wav kdt_011.wav kdt_012.wav kdt_013.wav -o out.wav

In the above example, 4 single-channel input files are concatenated into one single-channel output file. Multi-channel waveforms can also be concatenated, provided they all have the same number of channels.
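
Concatenation can be sketched in Python by representing each wave as a list of frames, with one tuple of channel samples per frame. This representation is hypothetical and is not how EST_Wave stores data. Waves are joined end to end, and a mismatch in channel count is rejected.

```python
from typing import List, Tuple

Frame = Tuple[int, ...]            # one sample per channel
Wave = List[Frame]                 # hypothetical stand-in for a multi-channel wave

def concatenate(waves: List[Wave]) -> Wave:
    """Join waves end to end; every frame must have the same number of channels."""
    channel_counts = {len(frame) for wave in waves for frame in wave}
    if len(channel_counts) > 1:
        raise ValueError(f"inputs have different channel counts: {sorted(channel_counts)}")
    joined: Wave = []
    for wave in waves:
        joined.extend(wave)
    return joined
```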

Multiple input files can be made into a multi-channel output file by using the -pc option:

$ ch_wave kdt_010.wav kdt_011.wav kdt_012.wav kdt_013.wav -o out.wav -pc LONGEST

The argument to -pc can either be LONGEST, in which the output waveform is the length of the longest input file, or FIRST in which it is the length of the first input file.
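
The LONGEST and FIRST rules can be sketched as follows, assuming single-channel inputs given as lists of integer samples. The function name is hypothetical. Each output frame holds one sample from every input, in command-line order.

```python
def combine_channels(waves: list[list[int]], mode: str = "LONGEST") -> list[tuple[int, ...]]:
    """Build one multi-channel wave (a list of frames) from single-channel inputs."""
    if not waves:
        return []
    if mode == "LONGEST":
        length = max(len(w) for w in waves)
    elif mode == "FIRST":
        length = len(waves[0])
    else:
        raise ValueError("mode must be LONGEST or FIRST")
    # Pad short inputs with zeros at the end; with FIRST, longer inputs are cut.
    fitted = [(w + [0] * (length - len(w)))[:length] for w in waves]
    return list(zip(*fitted))
```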

Extracting channels from multi-channel waves

The -c option is used to specify channels which should be extracted from the input. If the input is a 4 channel wave,

$ ch_wave kdt_m.wav -o a.wav -c "0 2"

will extract channels 0 and 2 (counting starts from 0). The argument to -c can be either a single number or a list of numbers (wrapped in quotes).
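
Multi-channel sample data is commonly stored interleaved, frame by frame. Assuming that layout, the sketch below keeps the channels named in a -c style argument such as "0 2". The function name and the flat interleaved-list representation are assumptions for illustration.

```python
def select_channels(interleaved: list[int], n_channels: int, spec: str) -> list[int]:
    """Keep the channels listed in spec (e.g. "0 2", counting from 0); output stays interleaved."""
    wanted = [int(c) for c in spec.split()]
    if not wanted or any(not 0 <= c < n_channels for c in wanted):
        raise ValueError(f"bad channel list {spec!r} for a {n_channels}-channel wave")
    if len(interleaved) % n_channels:
        raise ValueError("sample count is not a multiple of the channel count")
    out = []
    for start in range(0, len(interleaved), n_channels):
        frame = interleaved[start:start + n_channels]
        out.extend(frame[c] for c in wanted)
    return out
```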

Extracting a single region from a waveform

There are several ways of extracting a region of a waveform. The simplest is to use the -start, -end, -from and -to options to delimit a sub-portion of the input wave. For example,

$ ch_wave kdt_010.wav -o small.wav -start 1.45 -end 1.768

extracts a subwave starting at 1.45 seconds and extending to 1.768 seconds.

Alternatively,

$ ch_wave kdt_010.wav -o small.wav -from 5000 -to 10000

extracts a subwave starting at sample 5000 and ending at sample 10000. Times and samples can be mixed in sub-wave extraction. The output waveform has the same number of channels as the input waveform.
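
A sketch of the sub-wave arithmetic follows. It assumes a time is converted to a sample index by rounding the time multiplied by the sample rate, and that the end point is exclusive; neither detail is specified above. At a 16 kHz sample rate (also an assumption here), -start 1.45 -end 1.768 corresponds to samples 23200 to 28288. The function name is hypothetical.

```python
from typing import Optional

def sub_wave(samples: list, sample_rate: int,
             start: Optional[float] = None, end: Optional[float] = None,
             from_: Optional[int] = None, to: Optional[int] = None) -> list:
    """Cut a region; start/end are in seconds, from_/to in samples, and the two may be mixed."""
    if from_ is not None:
        first = from_
    elif start is not None:
        first = round(start * sample_rate)
    else:
        first = 0
    if to is not None:
        last = to
    elif end is not None:
        last = round(end * sample_rate)
    else:
        last = len(samples)
    if not 0 <= first <= last <= len(samples):
        raise ValueError("region lies outside the waveform")
    return samples[first:last]                # end point treated as exclusive
```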

Extracting multiple regions from a waveform

Multiple regions can be extracted from a waveform. Because specifying all their start and end points on the command line would be unwieldy, a label file giving the region boundaries and output file names is used instead.

The file is called a key label file and in xwaves label format looks like:

separator ;
#
0.308272  121 sil ;     file kdt_010.01 ;
0.440021  121 are ;     file kdt_010.02 ;
0.512930  121 your ;    file kdt_010.03 ;
0.784097  121 grades ;  file kdt_010.04 ;
1.140969  121 higher ;  file kdt_010.05 ;
1.258647  121 or ;      file kdt_010.06 ;
1.577145  121 lower ;   file kdt_010.07 ;
1.725516  121 than ;    file kdt_010.08 ;
2.315186  121 nancy's ; file kdt_010.09 ;

Each line represents one region. The first column is the end time of that region and the start time of the next. The next two columns are a colour and an arbitrary name. The name of the file in which the output waveform is to be stored is given by the field called file in the last column. In this example, each region corresponds to a single word in the file.
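
The sketch below reads a key label file like the one above into (start, end, name, file) tuples. It makes three assumptions: the separator is the ';' shown, the header ends at the line containing only '#', and the first region starts at time 0. The function name is hypothetical.

```python
def parse_keylab(text: str) -> list[tuple[float, float, str, str]]:
    """Return (start, end, name, file) for each region in an xwaves-style key label file."""
    lines = text.splitlines()
    header_end = next((i for i, line in enumerate(lines) if line.strip() == "#"), None)
    if header_end is None:
        raise ValueError("no '#' line ending the header")
    regions = []
    prev_end = 0.0                            # assumed: the first region starts at time 0
    for line in lines[header_end + 1:]:
        if not line.strip():
            continue
        fields = [f.strip() for f in line.split(";")]
        end_time, _colour, name = fields[0].split(maxsplit=2)
        file_name = None
        for field in fields[1:]:
            key_value = field.split(maxsplit=1)
            if len(key_value) == 2 and key_value[0] == "file":
                file_name = key_value[1]
        if file_name is None:
            raise ValueError(f"no 'file' field in line {line!r}")
        regions.append((prev_end, float(end_time), name, file_name))
        prev_end = float(end_time)
    return regions
```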

If the above file is called "kdt_010.words.keylab", the command:

$ ch_wave kdt_010.wav -key kdt_010.words -ext .wav -divide

will divide the input waveform into 9 output waveforms called kdt_010.01.wav, kdt_010.02.wav ... kdt_010.09.wav. The -ext option specifies the extension of the new waveforms, and the -divide option specifies that the entire waveform is to be divided.
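
Division can be sketched as slicing the input at each region boundary and naming each piece after its file field plus the -ext extension. The sketch assumes regions are already available as (start, end, file) tuples in seconds and that times map to samples by rounding. The function names are hypothetical. A lookup by file name mirrors what -extract is described as doing.

```python
def divide(samples: list, sample_rate: int,
           regions: list[tuple[float, float, str]], ext: str) -> dict[str, list]:
    """Map each output name (file field + ext) to the samples of its region."""
    pieces = {}
    for start, end, file_name in regions:
        first, last = round(start * sample_rate), round(end * sample_rate)
        pieces[file_name + ext] = samples[first:last]
    return pieces

def extract_one(samples: list, sample_rate: int,
                regions: list[tuple[float, float, str]], wanted: str) -> list:
    """Return only the region whose file field equals wanted."""
    for start, end, file_name in regions:
        if file_name == wanted:
            return samples[round(start * sample_rate):round(end * sample_rate)]
    raise KeyError(wanted)
```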

If only a single region is required, the -extract option can be used. Its argument is the name given in the file column of the key label file.

$ ch_wave kdt_010.wav -key kdt_010.words -ext .wav -extract kdt_010.03 \
      -o kdt_010.03.wav

Note that an output filename should be specified with this option.

Adding headers and format conversion

It is usually a good idea for all waveform files to have headers, because byte order, sampling rate, etc. can then be handled safely. ch_wave provides a means of adding headers to raw files.
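
Headerless files are fragile because the same bytes decode to different sample values under different byte orders. The sketch below mirrors the -ibo choices (MSB, LSB, native, nonnative) and the byte swap behind -iswap/-oswap, using Python's struct module. It is an illustration, not ch_wave's code, and the function names are hypothetical.

```python
import struct
import sys

def resolve_order(order: str) -> str:
    """Turn native/nonnative into MSB or LSB for the machine running this code."""
    native = "MSB" if sys.byteorder == "big" else "LSB"
    if order == "native":
        return native
    if order == "nonnative":
        return "LSB" if native == "MSB" else "MSB"
    if order not in ("MSB", "LSB"):
        raise ValueError(f"unknown byte order {order!r}")
    return order

def decode_shorts(raw: bytes, order: str) -> list[int]:
    """Decode headerless signed 16-bit samples in the given byte order."""
    if len(raw) % 2:
        raise ValueError("16-bit data must have an even number of bytes")
    prefix = ">" if resolve_order(order) == "MSB" else "<"
    return list(struct.unpack(f"{prefix}{len(raw) // 2}h", raw))

def swap_bytes(raw: bytes) -> bytes:
    """Swap the two bytes of every 16-bit sample."""
    if len(raw) % 2:
        raise ValueError("16-bit data must have an even number of bytes")
    out = bytearray(len(raw))
    out[0::2] = raw[1::2]
    out[1::2] = raw[0::2]
    return bytes(out)
```

For instance, the bytes 01 00 decode to 1 as LSB but to 256 as MSB, so a file without a header gives no way to tell which reading is right.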

The following adds a header to a file of 16-bit shorts:

$ ch_wave kdt_010.raw1 -o kdt_010.h1.wav -otype nist -f 16000 -itype raw
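
For comparison, the sketch below wraps headerless 16-bit samples in a RIFF (WAV) header using Python's standard wave module. The example above writes a NIST header instead; RIFF is used here only because the standard library can write it. The input is assumed to be little-endian, because RIFF PCM data is little-endian. The function name is hypothetical.

```python
import io
import wave

def add_riff_header(raw: bytes, sample_rate: int, channels: int = 1) -> bytes:
    """Wrap headerless little-endian 16-bit PCM in a RIFF (WAV) header."""
    if len(raw) % (2 * channels):
        raise ValueError("data is not a whole number of 16-bit frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)                     # 16-bit shorts
        w.setframerate(sample_rate)
        w.writeframes(raw)                    # header sizes are filled in on close
    return buf.getvalue()
```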

The following downsamples the input to 8 kHz:

$ ch_wave kdt_010.raw1 -o kdt_010.h2.wav -otype nist -f 16000  \
          -F 8000 -itype raw
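
Downsampling by an integer factor is commonly done in two steps. First, low-pass filter at the new Nyquist frequency; then keep every n-th sample. This stops content above 4 kHz from aliasing when going from 16 kHz to 8 kHz. The sketch below follows that textbook approach with a 31-tap Hamming-windowed sinc. ch_wave's actual resampling method is not described here and may differ. Non-integer ratios, such as 8 kHz to 20 kHz, need a more general resampler. The function name is hypothetical.

```python
import math

def downsample(samples: list[float], factor: int, taps: int = 31) -> list[float]:
    """Low-pass at the new Nyquist frequency, then keep every factor-th sample."""
    if factor < 1 or taps < 3 or taps % 2 == 0:
        raise ValueError("factor must be >= 1 and taps an odd number >= 3")
    fc = 0.5 / factor                          # new Nyquist, in cycles per input sample
    mid = taps // 2
    h = []
    for n in range(taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        h.append(ideal * (0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))))
    gain = sum(h)
    h = [c / gain for c in h]                  # unity gain at DC
    out = []
    for i in range(0, len(samples), factor):   # only compute the samples that are kept
        acc = 0.0
        for j, c in enumerate(h):
            idx = i + mid - j
            if 0 <= idx < len(samples):
                acc += c * samples[idx]
        out.append(acc)
    return out
```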

The following takes an 8 kHz mu-law input file and produces a 16-bit, 20 kHz output file:

$ ch_wave kdt_010.raw2 -o kdt_010.h3.wav -otype nist -istype ulaw \
              -f 8000 -F 20000 -itype raw
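
In the last example, each 8-bit mu-law byte in the input stands for a 16-bit linear sample value. The sketch below uses the standard ITU-T G.711 mu-law decoding rule. EST may use an equivalent lookup table internally, and the function names are hypothetical.

```python
def ulaw_to_linear(code: int) -> int:
    """Decode one 8-bit G.711 mu-law byte to a linear 16-bit sample value."""
    code = ~code & 0xFF                       # mu-law bytes are stored with all bits inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84   # 0x84 (132) is the encoding bias
    return -magnitude if sign else magnitude

def decode_ulaw(data: bytes) -> list[int]:
    """Decode a whole buffer of mu-law bytes."""
    return [ulaw_to_linear(b) for b in data]
```

The largest magnitude this rule produces is 32124, slightly below the 16-bit maximum of 32767.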