February Fourier Talks 2018

Ernest Fokoue

Rochester Institute of Technology


Random Subspace Learning for Prediction in Ultra-High Dimensional Spaces


Ultra-high dimensional data sets, a kind of big data with p >> n, have recently come under considerable scrutiny in statistical machine learning and data science in general because of their ubiquity in many applications. DNA microarray gene expression data sets, hyperspectral image data sets, and corpora of texts in statistical document analysis, to name a few, all fall into this class of machine learning problems known as ultra-high dimensional. Typically, three main approaches are used to address the statistical learning tasks associated with this kind of data, namely (a) selection, (b) regularization, and (c) aggregation of ensembles of lower-dimensional base learners. In this talk, I will present a random subspace learning approach that cleverly exploits the individual a priori predictive "strength" of each dimension to build ensembles of base learners for highly accurate predictions in both classification and regression. One of the greatest advantages of our approach lies in its flexibility in the choice of base learners: unlike random forest, our method seamlessly allows any base learner, from linear regression to logistic regression to trees to support vector machines, just to name a few. I demonstrate the strengths of our method on artificial and real data.

Keywords: ultra-high dimensional spaces, random subspace learning, prediction, base learner
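The general idea described above can be sketched as follows. This is a minimal illustrative sketch, not the speaker's exact algorithm: each feature's a priori "strength" is taken here, purely as an assumption, to be the absolute correlation between that feature and the response, and that strength biases the probability of the feature entering each low-dimensional base learner. Logistic regression is used as the base learner simply to show that any learner can be plugged in.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic p >> n classification data: 200 observations, 500 features.
X, y = make_classification(n_samples=200, n_features=500,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Assumed a priori strength of each dimension: |corr(x_j, y)|.
strength = np.abs([np.corrcoef(X_tr[:, j], y_tr)[0, 1]
                   for j in range(X_tr.shape[1])])
probs = strength / strength.sum()

n_learners, d = 50, 20  # ensemble size and subspace dimension
ensemble = []
for _ in range(n_learners):
    # Sample a low-dimensional random subspace, favoring strong features.
    feats = rng.choice(X_tr.shape[1], size=d, replace=False, p=probs)
    clf = LogisticRegression(max_iter=1000).fit(X_tr[:, feats], y_tr)
    ensemble.append((feats, clf))

# Aggregate the base learners by majority vote.
votes = np.mean([clf.predict(X_te[:, f]) for f, clf in ensemble], axis=0)
y_hat = (votes >= 0.5).astype(int)
accuracy = np.mean(y_hat == y_te)
```

Swapping the base learner for a tree, a support vector machine, or a linear regression (with averaging instead of voting, for the regression case) changes only the two lines that fit and predict, which is the flexibility the abstract emphasizes.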
