Graduate School of Information Science and Technology, The University of Tokyo
Professors
2010/06/02

Enabling a machine to speak like a human

Professor Keikichi Hirose
(Department of Information and Communication Engineering)

Symbiosis between humans and machines can begin once a robot is able to speak with accent and intonation just as a human does. Professor Hirose pursues speech synthesis research aimed at realizing future spoken “dialogues” between humans and machines. His main research focus is enabling a machine to freely manipulate prosody, including accent and intonation.

Speech has two aspects: the “sound” of what is pronounced, and the “prosody” of how it is spoken, such as accent and intonation. Prosody is an important element in making speech sound spontaneous. Speech synthesis research addresses both aspects. If a machine can incorporate the mechanisms of prosody, which humans use effortlessly as a means of communication, and can carry on pleasant dialogues in which nothing sounds unnatural, it could become a true partner to humans.

The fundamental mechanisms by which sound is generated by the vibrating vocal cords and delivered from the mouth as speech are well understood. However, the structures of the human mouth and nose are complex and plastic, and they change the sound dynamically yet delicately. Because how this happens has not been fully captured, machine speech is still nowhere near human speech. This is a headache shared by all speech researchers. In pursuit of a breakthrough that would give machines humanlike speech abilities, Professor Hirose has provided insights into ensuring high-quality sound by adding a prosody model to the hidden Markov model (HMM), a statistical method used in speech recognition.

If a machine could speak with humanlike intonation, it would dramatically change the world of animation. Today, voice actors read their lines to match the characters' motion; if a machine could speak with exactly the voice the director expects, the impact on film and TV would be immeasurable. Speech synthesis technology is also applied to English language education for Japanese speakers and to Japanese language education for learners from abroad. Professor Hirose conducts all-round research in speech and carries the weight of expectations that his work will bring us new surprises through voice.

