Paradigm Change for Feature Selection
Speaker: Jiayang Sun, George Mason University/USDA
Abstract: Feature selection is essential for many applications, such as finding important biomarkers or discovering drug targets. The popular approach to feature selection in large regression datasets with sparse features is to use a penalized likelihood or shrinkage estimation, such as a procedure with a LASSO, SCAD, elastic net, or MCP penalty, or to apply a tree-based procedure. We first present our paradigm change for feature selection in regression, using a new subsampling method called the Subsampling Winner algorithm (SWA). We address the importance of controlling the false discovery rate while pursuing important features. We compare the SWA with the LASSO, SCAD, MCP, and random forest procedures and provide final recommendations. We then challenge the foundation of feature selection in regression and suggest another paradigm change: we consider not only a feature's effect on the outcome but also its relationship to all other co-existing features, and we present a network approach. Part of this talk is based on joint work with Y. Fan, J. Ma, X. Qiao, and A. Defeo.
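For readers who want a concrete point of reference, below is a minimal Python sketch of two of the baseline procedures the talk compares against, LASSO and random forest, using scikit-learn on synthetic sparse-regression data. This is illustrative only: the SWA itself is not shown because the abstract does not specify its steps, the SCAD and MCP penalties are not available in scikit-learn and would require specialized packages, and all dataset sizes and settings here are assumptions rather than anything from the talk.

```python
# Illustrative sketch (not the speaker's method): LASSO and random-forest
# feature selection on synthetic data with a sparse ground truth.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV

# Synthetic regression problem: 500 samples, 200 features, only 10 informative.
X, y = make_regression(n_samples=500, n_features=200, n_informative=10,
                       noise=5.0, random_state=0)

# LASSO with cross-validated penalty: features with nonzero coefficients
# are the ones "selected" by the shrinkage procedure.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
lasso_selected = np.flatnonzero(lasso.coef_)

# Random forest: rank features by impurity-based importance and keep the top 10.
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
rf_top = np.argsort(rf.feature_importances_)[::-1][:10]

print("LASSO selected features:", lasso_selected)
print("Random forest top-10 features:", sorted(rf_top))
```

Note that neither baseline above controls the false discovery rate by itself, which is one of the concerns the talk raises when comparing these procedures with the SWA.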