Characterful Speech Synthesis

Dr Matthew Aylett, CTO, CereProc
Wednesday, 25 November, 2009 - 14:00
Sala de Documentação, Floor 2, UMa

Speech synthesis is a key enabling technology for pervasive computing and the personification of autonomous agents, as well as a key requirement for accessibility. In this talk I will present the current state-of-the-art speech synthesis technique, 'unit selection', and show how to integrate this synthesis technology into common applications.

I will go on to argue that current approaches to synthesis, and current commercial pressures, make it difficult for many systems to create characterful synthesis. We will present how CereProc's approach differs from the industry standard and how we have attempted to maintain and increase the characterfulness of CereVoice's output.

We will outline the expressive synthesis markup supported by the system and how it is expressed in the underlying digital signal processing and selection tags. Finally, we will present the concept of second-pass synthesis, where cues can be manually tweaked to allow direct control of intonation style, and where synthesis can be seamlessly mixed with pre-recorded prompts to produce extremely natural output.

We will also show how synthesis can be used to 'clone' celebrity voices, with a brief demonstration of voices copied from George W. Bush and James May of Top Gear (e.g. http://www.idyacy.com/cgi-bin/bushomatic.cgi).

Time permitting, I will also demonstrate some experiments looking at hybrid approaches to parametric/unit selection synthesis.

Bio:

Matthew Aylett has been involved in speech technology as a student and researcher since 1994. He obtained an MSc in speech and language processing (Distinction) from the University of Edinburgh in 1995. Subsequently he worked as a research associate on spoken dialogue whilst pursuing a PhD (awarded in 2000) focused on phonetic and prosodic analysis of spontaneous speech.

In April 2000, he joined the R&D team of Edinburgh University spin-out Rhetorical Systems Ltd. He played a fundamental role in both designing and building the rVoice speech synthesiser. Other key contributions included work on prosodic modelling and intelligibility. He continued to publish research work over this period at an international level.

In 2005 he took a research sabbatical at the prestigious International Computer Science Institute (ICSI), Berkeley, where he worked on the prosodic analysis of dialogue. He returned to Edinburgh and founded CereProc Ltd in 2006 with the aim of creating commercially available, characterful speech synthesis. In 2007 CereProc released the first commercial synthesis to allow modification of voice quality for adding underlying emotion to voices. He has remained active both commercially, where he dictates CereProc's technical strategy, and academically, as a research fellow at CSTR, focusing on novel speech synthesis techniques.

http://www.cogsci.ed.ac.uk/~matthewa and http://www.cereproc.com