February Fourier Talks 2011

Jeffrey Woodard

Title:

Bag-of-Words Computer Vision Methods for Forensic Applications

Abstract:

This talk reviews a family of computer vision methods that require no human involvement such as explicit segmentation of objects, manual preprocessing, or supervised training.  The name of the statistical Bag-of-Words approach reflects its roots in text document retrieval, and also the fact that in computer vision tasks spatial information is discarded, as if one were throwing visual words into a “bag.”  The talk discusses the application of these methods mainly to the forensic domain of automated writer recognition from scanned documents.

In a three-phase approach, signal processing first represents images by local feature vectors that are distinctive and also reasonably resistant to moderate sources of image variation such as rotation and scale changes.  The local vectors are next vector quantized into visual “words” using an unsupervised clustering method.  Third and finally, unsupervised training and classification are performed either by a generative technique that originated in text document retrieval, called Probabilistic Latent Semantic Analysis, or by direct matching of visual word histograms.
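
The second and third phases can be illustrated with a minimal sketch. It is not the speaker's implementation: the abstract names neither the clustering method nor the histogram distance, so plain k-means and a chi-square distance are assumed here, and all function names are hypothetical. Only the direct histogram matching variant is shown, not Probabilistic Latent Semantic Analysis, and the local feature vectors from the first phase are taken as given.

```python
import numpy as np

def quantize(vectors, codebook):
    """Map each local feature vector to the index of its nearest visual word."""
    d2 = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns a (k, d) codebook."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(vectors), size=k, replace=False)
    codebook = vectors[start].astype(float)
    for _ in range(iters):
        labels = quantize(vectors, codebook)
        for j in range(k):
            members = vectors[labels == j]
            if len(members):  # keep the old centre if a cluster is empty
                codebook[j] = members.mean(axis=0)
    return codebook

def word_histogram(vectors, codebook):
    """Normalised visual word histogram; spatial layout is discarded."""
    counts = np.bincount(quantize(vectors, codebook), minlength=len(codebook))
    return counts / counts.sum()

def chi2_distance(h1, h2, eps=1e-12):
    """Chi-square distance between two histograms (an assumed choice)."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```

In use, a codebook would be built once from feature vectors pooled over the training images, each image would then be reduced to one histogram with `word_histogram`, and a query would be assigned to the writer of the reference histogram with the smallest `chi2_distance`.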

Significantly, the statistically based Bag-of-Words approach does not explicitly incorporate linguistic or other domain-specific knowledge.  Therefore, this automated approach can work reliably on a wide variety of photographic databases. The methods have been applied in the writer recognition task to varying languages and corpora without extensive and time-consuming re-engineering and, in fact, have been readily extended to non-linguistic tasks like insect species classification.

These methods have been rigorously evaluated on databases of Arabic and Dutch handwritten text. With a strict leave-one-out classification paradigm, the best result for the Arabic database of 51 writers is a Rank-1 retrieval accuracy of 98.7 %.  The best result to date for the more difficult Dutch database of 250 writers is a Rank-1 retrieval accuracy of 95 %. This far exceeds 84 %, the best published result on the same database.
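
For readers unfamiliar with the metric, the sketch below shows one common way to compute leave-one-out Rank-1 retrieval accuracy. The exact protocol used in the talk is not given in the abstract, so this is an assumption: each document is used in turn as the query, it is removed from the reference set, and a hit is counted when the nearest remaining document is by the same writer. The L1 distance and the function name are hypothetical choices.

```python
import numpy as np

def rank1_leave_one_out(histograms, writer_ids):
    """Fraction of documents whose nearest *other* document has the same writer."""
    H = np.asarray(histograms, dtype=float)
    ids = np.asarray(writer_ids)
    # Pairwise L1 distances between all visual word histograms.
    dist = np.abs(H[:, None, :] - H[None, :, :]).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # leave the query itself out
    nearest = dist.argmin(axis=1)
    return float(np.mean(ids[nearest] == ids))
```

This measure needs at least two documents per writer, since a writer with a single document can never be retrieved once that document is held out.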