Creating support for new Indic languages

If your language is not among those with explicit support, you can add support for it quite easily. First, set up an Indic voice as described above and complete the steps up to prune_silence with your wav files. Three files in the festvox folder of the voice directory need your attention: indic_lexicon.scm, indic_utf8_ord_map.scm and unicode_sampa_map_new.scm.

As the names suggest, the indic_lexicon file is where many of the language-specific rules are specified. We will come back to this file later. The indic_utf8_ord_map file maps each Unicode character in the language to its ordinal. You will need to create this mapping for your language and add it to this file. You can generate the ordinals with something like the ord() function in Python.
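As a minimal sketch of generating the ordinals, the Python 3 snippet below lists every assigned character in a Unicode block together with the value ord() returns for it. The helper name char_ordinals is hypothetical, and the Tamil block (U+0B80 to U+0BFF) is used only as an example range; substitute the block for your own script. The snippet does not write the .scm file: format the output to match the entries already present in indic_utf8_ord_map.scm.

```python
import unicodedata


def char_ordinals(start, end):
    """Return (character, ordinal, name) for each assigned code point in [start, end].

    char_ordinals is a hypothetical helper, not part of Festvox.
    """
    rows = []
    for code_point in range(start, end + 1):
        ch = chr(code_point)
        # unicodedata.name() returns the default ("") for unassigned code points.
        name = unicodedata.name(ch, "")
        if name:
            rows.append((ch, ord(ch), name))
    return rows


# Example range: the Tamil block. Replace with the block for your script.
rows = char_ordinals(0x0B80, 0x0BFF)
for ch, ordinal, name in rows[:3]:
    # Print only the ordinal and the ASCII character name.
    print(ordinal, name)
```

The character names are included only so that you can check each ordinal against the character you expect before adding it to the map.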

Next, you need to map the characters in your language to the phonemes they correspond to. This is done in the unicode_sampa_map_new file. Map each of the ordinals that you generated earlier to its SAMPA phones and add the mapping to this file.

Lastly, add the language-specific rules to the indic_lexicon file. First, change the name of the language, which is Hindi by default. We will assume that our new language is called AnotherLang, and that it has Tamil's voicing rules and Bengali's final schwa deletion rules.

;; Set the language name (the default is Hindi).
(defvar lex:language 'AnotherLang)

;; Add AnotherLang to the languages that delete the final schwa, as Bengali does.
(define (delete_final_schwa)
  "(delete_final_schwa)
Returns t if final schwa is deleted in the current language"
  (member lex:language '(Hindi Gujarati Rajasthani Bengali Assamese AnotherLang)))

;; Apply Tamil's voicing rules to AnotherLang as well.
        (if (eq lex:language 'Tamil)
                (set! phones (tamil_voicing_postfixes phones)))
        (if (eq lex:language 'AnotherLang)
                (set! phones (tamil_voicing_postfixes phones)))

Next, you need to define the ranges of characters in your language that fall into categories such as independent vowels, consonants and nukta. Once again, you will use the ordinals that you mapped earlier. See the section of the file that begins with

(set! indic_char_type_ranges
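If you have sorted your ordinals into categories by hand, a small helper can collapse each category into contiguous ranges. The sketch below is only an illustration: to_ranges is a hypothetical name, the example ordinals are arbitrary, and the output is a plain Python list of (start, end) pairs that you would still need to write in the form the existing entries of indic_char_type_ranges use.

```python
def to_ranges(ordinals):
    """Collapse a collection of ordinals into sorted, inclusive (start, end) ranges.

    to_ranges is a hypothetical helper, not part of Festvox.
    """
    ranges = []
    for value in sorted(set(ordinals)):
        if ranges and value == ranges[-1][1] + 1:
            # The value extends the current run of consecutive ordinals.
            ranges[-1] = (ranges[-1][0], value)
        else:
            # The value starts a new run.
            ranges.append((value, value))
    return ranges


# Illustrative ordinals for one category; use the ones from your own mapping.
example = [2949, 2950, 2951, 2965, 2969, 2970]
print(to_ranges(example))
```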

You can also add support for saying English words that appear in your Indic language documents. To do this, map the SAMPA phonemes in your language to English phonemes. See the section of the file that begins with

;;; CMU SAMPA   Comments

Make any other language-specific changes in the same way as shown above. Then continue as usual from the build_prompts step to build your voice. If the build_prompts step reports errors, some characters are probably missing from your mapping.
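One way to track down missing characters is to compare the characters in your prompt text against the ordinals you have mapped. The Python 3 sketch below does this under two assumptions that you should check for your setup: that only non-ASCII characters need entries in the map, and that you supply the prompt text and the set of mapped ordinals yourself. The function name missing_ordinals is hypothetical.

```python
def missing_ordinals(text, mapped_ordinals):
    """Return the sorted ordinals of non-ASCII characters in text
    that do not appear in mapped_ordinals.

    missing_ordinals is a hypothetical helper, not part of Festvox.
    Assumption: ASCII characters (ordinals below 128) need no map entry.
    """
    found = {ord(ch) for ch in text if ord(ch) > 127}
    return sorted(found - set(mapped_ordinals))


# Illustrative input: a five-character Tamil word written with escapes,
# checked against a deliberately incomplete set of mapped ordinals.
sample_text = "\u0b85\u0bae\u0bcd\u0bae\u0bbe"
mapped = {2949, 2990}
for ordinal in missing_ordinals(sample_text, mapped):
    print(ordinal, hex(ordinal))
```

Each ordinal reported this way is a candidate for a new entry in both the ordinal map and the SAMPA map.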