On 17 June 2015, at 16:00 in TUT room ICT-638

Unsupervised Word Segmentation and Lexicon Discovery from Speech using Acoustic Word Embeddings

a seminar by Sharon Goldwater

School of Informatics, University of Edinburgh, UK


Abstract

The current generation of automatic speech recognition systems has been commercially successful in applications ranging from dictation software to speech interfaces for mobile devices. Yet, with current supervised techniques, developing a high-quality system for a new language requires hundreds of hours of transcribed speech and a large expert-produced pronunciation dictionary. While possible for some languages, this approach is untenable if we hope to develop speech technology for most of the 7,000 languages in the world (including systems that might help to document the many endangered languages).

This talk describes our initial work on developing *unsupervised* speech technology, where the learning is more analogous to what human infants do: from speech audio alone, the system must learn to segment the speech stream into word tokens and cluster repeated instances of the same word together to learn a lexicon of vocabulary items. Our approach combines ideas originally developed for cognitive modelling of human language acquisition with those from speech technology and machine learning. I will present our early results on a small-vocabulary English corpus (TIDIGITS) and discuss some of the challenges we will face in scaling up the system. I aim to make the talk accessible to listeners without much background in speech recognition.

Bio

Sharon Goldwater is a Reader in the Institute for Language, Cognition and Computation at the University of Edinburgh's School of Informatics. She worked as a researcher in the Artificial Intelligence Laboratory at SRI International from 1998 to 2000 before starting her Ph.D. at Brown University, supervised by Mark Johnson. She completed her Ph.D. in 2006 and spent two years as a postdoctoral researcher at Stanford University before moving to Edinburgh. Her current research focuses on unsupervised learning for automatic speech and language processing and computational modelling of language acquisition in children. She is particularly interested in Bayesian approaches to the induction of linguistic structure, ranging from phonetic categories to morphology and syntax.