Text-to-speech synthesis (TTS) turns a written text into an audio speech signal. Many commercial systems rely on human linguistic expertise and are limited to synthesizing speech in a single speaker's voice and speaking style. For speech synthesis to become universal in its usage and abilities, it must be easily customizable while being able to produce widely varied speech. This work has two objectives: 1) to study whether it is possible to alleviate the need for human linguistic expertise when building or modifying a TTS system; 2) to study whether it is possible to produce speech corresponding to different speakers, with their respective tone and regional accent.
This manuscript presents three contributions. First, we show that the embedding property of neural networks can be used to lower the amount of expertise required in unit selection speech synthesis.
Text-to-Speech Synthesis provides a complete, end-to-end account of the process of generating speech by computer. Giving an in-depth explanation of all aspects of current speech synthesis technology, it assumes no specialized prior knowledge. Introductory chapters on linguistics, phonetics, signal processing and speech signals lay the foundation, with subsequent material explaining how this knowledge is put to use in building practical systems that generate speech. It covers the very latest techniques, such as unit selection, hidden Markov model synthesis, and statistical text analysis, and also explains more traditional techniques such as formant synthesis and synthesis by rule. Weaving together the various strands of this multidisciplinary field, the book is designed for graduate students in electrical engineering, computer science, and linguistics. It is also an ideal reference for practitioners in the fields of human communication interaction and telephony.