
Introduction to the Edinburgh Speech Tools

This section gives some of the reasoning behind the design of the Edinburgh Speech Tools.

Why do we need such a thing?

Most speech researchers spend a considerable amount of time writing, developing and debugging code. In fact, many researchers spend most of their time doing this. The sad fact is that most of this time is spent on unnecessary tasks - time which could be better spent doing "real" research. The library is intended to provide the software that programmers need day to day, and to provide it in an easy-to-use form.

An additional problem arises with incompatibility. How many times have you had to convert one file format into another, or struggled to adapt a function written for one program to another? Such problems are often caused by lack of standardisation. The library provides easy standardisation and support for a variety of systems.

What is a Library?

The Edinburgh Speech Tools Library has two main parts: a software library and a set of programs which use the library.

A library is a single central place where useful software is kept. A UNIX library is a single file (in this case called libestools.a) which can be linked to an individual program. When writing a program, you can call any of the functions in the library, and they will automatically be linked into your program when you compile. The key point is that you never need look at the library itself or copy the code in it. That way you can write small programs, concentrate on the algorithms and not have to worry about any infrastructure issues.

The speech tools also provide a number of utility programs for things like playback, sampling rate conversion, file format conversion, etc. Usually these programs are just wrap-around executables based on standard speech tools library functions.

What does the library contain?

The library is written in C++. It contains the following:

Speech and Linguistic classes: There are C++ classes for commonly used speech and language data types, including waveforms, tracks, labels, grammars etc.
Audio playback: Easy to use routines to record and play audio data without any fuss.
Signal processing: Commonly used signal processing algorithms such as Cepstra and LPC.
Statistical functions.
Utility functions and classes: Useful classes such as lists, vectors, matrices and strings, and functions for reading files, parsing command lines etc.

How is it used?

All the necessary documentation should be provided on-line, but the code is well commented and the ultimate authority is always the source.

You may simply use the library with your programs as follows. Let us assume that the variable ESTDIR contains the pathname of your installed version of the speech tools (e.g. `/lusr/local/est').

gcc myfile.cc -I$ESTDIR/include -L$ESTDIR/lib -lestools -leststring

Why doesn't this library exist already?

One might think that it is strange that such a library doesn't (to our knowledge) exist already.

There are a number of existing speech libraries, but none of them are quite suitable for our purposes. Of course there will always be functions that you'd like which don't exist, but in addition to that, there are some key reasons why existing libraries are not of sufficient standard. In fact one can spot the same mistakes being made again and again. For example:

Proprietary software

There are some commercial packages available which duplicate some of the functionality of the library, but these are restricted in that access to the source code and redistribution are often not allowed.

File formats

Nearly all systems assume that the user will use the file format of the library and no other. This is a ridiculous assumption, and ensures that users will spend hours converting files from one format to another. While it is indisputable that everyone would benefit from standard file formats, there isn't a standard at present, and realistically there never will be.
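As an illustration of not assuming a single format, a reader can inspect the first bytes of a file and decide which format it is looking at. The following is only a minimal sketch: the names AudioFormat, has_magic and sniff_format are hypothetical and are not part of the speech tools library. The magic strings used are the well-known ones for RIFF/WAVE, Sun/NeXT audio and NIST SPHERE files.

```cpp
#include <cstddef>
#include <string>

// Hypothetical format tags for this sketch; not the speech tools' own names.
enum class AudioFormat { Riff, SunSnd, NistSphere, Unknown };

// True if `data` contains `magic` starting at byte `offset`.
static bool has_magic(const std::string& data, std::size_t offset,
                      const std::string& magic) {
    return data.size() >= offset + magic.size() &&
           data.compare(offset, magic.size(), magic) == 0;
}

// Guess the format from the leading bytes of a file instead of assuming one.
AudioFormat sniff_format(const std::string& header) {
    // RIFF/WAVE: "RIFF" at byte 0, "WAVE" at byte 8.
    if (has_magic(header, 0, "RIFF") && has_magic(header, 8, "WAVE"))
        return AudioFormat::Riff;
    // Sun/NeXT audio: ".snd" at byte 0.
    if (has_magic(header, 0, ".snd"))
        return AudioFormat::SunSnd;
    // NIST SPHERE: "NIST_1A" at byte 0.
    if (has_magic(header, 0, "NIST_1A"))
        return AudioFormat::NistSphere;
    return AudioFormat::Unknown;
}
```

A caller would read the first dozen or so bytes of a file into a string, pass them to sniff_format, and then hand the file to whichever reader matches the result.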

Bad programming style

A system like Audlab (CSTR's previous in-house speech library) has much of the functionality that one would require of a speech library. However, the programming style prevents easy use. The classic mistake is lack of modularity. It is common in Audlab code to see file i/o mixed in with array mallocing, mixed in with the algorithms themselves. While this may have been justifiable once, it is not now. The reason this is such a disaster relates to the previous problem. If one wanted to change an Audlab program from using Audlab files to another file type, it would require a fundamental restructuring of the entire program.
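The kind of modularity argued for here can be sketched as follows. This is an illustrative example only, not code from the speech tools or from Audlab: Signal, read_text_signal and rms_energy are hypothetical names, and the "file format" is a toy one (whitespace-separated sample values). The point is that the algorithm sees only samples, so supporting another file type means adding another reader, not restructuring the program.

```cpp
#include <cmath>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical container: holds samples only and knows nothing about files.
struct Signal {
    std::vector<double> samples;
};

// I/O layer: one reader per format. This toy reader parses whitespace-separated
// sample values from a text stream; another format would get its own reader.
Signal read_text_signal(std::istream& in) {
    Signal s;
    double value;
    while (in >> value) {
        s.samples.push_back(value);  // storage is managed by std::vector
    }
    return s;
}

// Algorithm layer: root-mean-square energy of a signal.
// No file handling and no manual memory allocation appear here.
double rms_energy(const Signal& s) {
    if (s.samples.empty()) {
        return 0.0;
    }
    double sum_of_squares = 0.0;
    for (double x : s.samples) {
        sum_of_squares += x * x;
    }
    return std::sqrt(sum_of_squares / static_cast<double>(s.samples.size()));
}
```

A program built this way would call a reader to obtain a Signal and then pass it to rms_energy; changing the input file type touches only the first call.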

Bad development setup

A library should not be a static thing, but rather should be added to as time goes on. This development must be done in a co-ordinated manner. What is disastrous is for every developer to have their own private version. If this happens, the different versions soon become incompatible and one programmer's improvements are unusable by another.

User-Friendliness

Libraries are often disliked because they are difficult to use. It goes without saying that a library which is unobtrusive and easy to use will be popular.

Lack of documentation

Documentation is one of the biggest problems in software engineering, for several reasons. Firstly, it is difficult to write - one has to imagine the kinds of situations in which users will need help. Secondly, as the software changes so must the documentation, and keeping it up to date is often difficult. And finally, perhaps the most relevant point is that most programmers hate writing documentation!

These points should make clear some of the mistakes that have been made in previous library development - mistakes which we hope to avoid with the speech tools. The library is still being written and certainly does not yet provide a complete solution to the above-mentioned problems. However, we hope it is a substantial step in the right direction.
