February Fourier Talks 2012

Fred Goodman


Increasing Speaker Recognition Algorithm Agility and Effectiveness for “Unseen” Conditions


For more than 30 years, speaker recognition developers concentrated on telephone applications, for both practical and commercial reasons. In 2005, the focus began to shift to microphone collections in rooms, as technology changed (e.g., VoIP) and new applications appeared. Recognition performance improved, but only after a large training set was provided, a requirement that is impractical for all but the most controlled applications. Our task in recent years has been to wean developers from this extremely expensive data requirement. We have done so by encouraging:

1. The development of new features that are robust to wideband noise and room reverberation.
2. The use of simulated noise and channel data for development.
3. The collection of a wide range of speech variations (e.g., vocal effort).
4. The development of signal characterization techniques to determine speech, noise, and room conditions. This information can then be used to modify algorithm parameters at match time.
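
As an illustration of item 2, simulated noisy data is commonly produced by adding a noise recording to clean speech at a chosen signal-to-noise ratio (SNR). The sketch below is only a minimal example of that idea, not the procedure used in the work described here: the function names (power, mix_at_snr), the 8 kHz sample rate, the sine wave standing in for speech, and the Gaussian noise are all assumptions made for illustration.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power matches snr_db, then add it.

    Returns the noisy signal and the scaled noise (hypothetical helper).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise has zero power")
    # Noise power needed to reach the requested SNR in decibels.
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    scaled = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled)]
    return noisy, scaled


# Assumed stand-ins: a 200 Hz tone for "speech" and white Gaussian noise.
rng = random.Random(0)
sample_rate = 8000
speech = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

noisy, scaled_noise = mix_at_snr(speech, noise, snr_db=10.0)
achieved_snr_db = 10 * math.log10(power(speech) / power(scaled_noise))
```

Simulating channel or room effects would additionally require convolving the speech with an impulse response, which this sketch does not attempt.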

During the same period, dimension reduction techniques based on eigen-analysis (e.g., Joint Factor Analysis, PPCA) have also increased robustness substantially. These techniques began to appear when cell phones were introduced into the NIST evaluation in 2004, and they continue to have a major impact. The latest methods are computationally efficient and appear to be easing the “calibration” problem inherent in any biometric, i.e., setting a consistent yes/no threshold over a wide range of input conditions. In the future, we hope to incorporate both supervised and unsupervised training into speaker recognition systems, permitting fast performance optimization when encountering new environments.