Increasing Speaker Recognition Algorithm Agility and Effectiveness for “Unseen” Conditions.
For more than 30 years, speaker recognition developers concentrated on telephone applications, for both practical and commercial reasons. In 2005, the focus began to shift to microphone collections in rooms as technology changed (e.g., VoIP) and new applications appeared. Recognition performance improved, but only after a large training set was provided, which is impractical for all but the most controlled applications. Our task in recent years has been to wean developers from this extremely expensive data requirement. We have done so by encouraging:
1. The development of new features that are robust to wideband noise and room reverberation.
During the same period, dimension reduction techniques based on eigen-analysis (e.g., Joint Factor Analysis, PPCA) have also increased robustness substantially. These techniques began to appear when cell phones were introduced into the NIST evaluation in 2004, and they continue to have a major impact. The latest methods are computationally efficient and appear to be easing the “calibration” problem inherent in any biometric, i.e., setting a consistent yes/no threshold over a wide range of input conditions. In the future, we hope to incorporate both supervised and unsupervised training into speaker recognition systems, permitting fast performance optimization when encountering new environments.
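
The calibration problem mentioned above can be illustrated with a small sketch. It is not the method used by any system described here: the scores are made-up toy values, the function names `error_rates` and `impostor_normalize` are hypothetical, and the impostor-based score normalization is only one simple stand-in for a technique that makes a single yes/no threshold behave consistently across conditions.

```python
from statistics import mean, pstdev

def error_rates(target_scores, impostor_scores, threshold):
    """Return (false_reject_rate, false_accept_rate) for one yes/no threshold."""
    frr = sum(s < threshold for s in target_scores) / len(target_scores)
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return frr, far

def impostor_normalize(scores, impostor_scores):
    """Shift and scale scores by the impostor mean and standard deviation
    of the same condition (an illustrative assumption, not the source's method)."""
    mu = mean(impostor_scores)
    sigma = pstdev(impostor_scores)
    return [(s - mu) / sigma for s in scores]

# Toy scores for a "clean" condition (made-up values).
clean_targets = [2.0, 2.5, 3.0, 3.5]
clean_impostors = [-1.0, -0.5, 0.0, 0.5]

# The same scores shifted down by 2.0 to mimic a noisier, unseen condition.
noisy_targets = [s - 2.0 for s in clean_targets]
noisy_impostors = [s - 2.0 for s in clean_impostors]

# A threshold tuned on the clean condition does not transfer to the noisy one.
raw_threshold = 1.25
clean_raw = error_rates(clean_targets, clean_impostors, raw_threshold)
noisy_raw = error_rates(noisy_targets, noisy_impostors, raw_threshold)

# After per-condition normalization, one threshold serves both conditions.
norm_threshold = 2.5
clean_norm = error_rates(
    impostor_normalize(clean_targets, clean_impostors),
    impostor_normalize(clean_impostors, clean_impostors),
    norm_threshold,
)
noisy_norm = error_rates(
    impostor_normalize(noisy_targets, noisy_impostors),
    impostor_normalize(noisy_impostors, noisy_impostors),
    norm_threshold,
)

print("raw threshold:       ", clean_raw, noisy_raw)
print("normalized threshold:", clean_norm, noisy_norm)
```

In this toy case the raw threshold rejects most true speakers once the condition changes, while the normalized scores give the same error rates in both conditions. Real condition mismatch is rarely a pure shift, so this shows the nature of the problem rather than a solution to it.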