The series hosts a seminar every other week on current research topics. The seminar often features an invited guest speaker and occasionally local faculty members, students, or others affiliated with the department. The usual time of the seminar is 3:30-4:30 pm on Fridays. Professors Tatiyana V. Apanasovich, Qing Pan, and Emre Barut are the Seminar Series Coordinators.
Date: Friday, April 7th, 11:00am-12:00pm
Location: Duques Hall, Room 251
Title: Experimental Design Methods for Large-scale Statistical Computation and Distributed Computing
Speaker: Peter Qian, Professor of Statistics at the University of Wisconsin-Madison
Abstract: Big Data arise in a growing number of areas such as marketing, physics, biology, engineering, and the Internet. For example, every hour more than one million transaction records are stored in Walmart's database, and an HPC-based computer model can produce results from millions of runs. While large volumes of data offer more statistical power, they also bring computational challenges.
We first introduce an experimental-design-based algorithm, called orthogonalizing EM (OEM), intended for various least squares problems with large observational data. The main idea of the procedure is to orthogonalize an arbitrary model matrix by adding new rows and then to solve the original problem by embedding the augmented matrix in a missing-data framework. We demonstrate that OEM is highly efficient for large-scale least squares problems.
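For ordinary least squares, fully orthogonalizing the model matrix reduces the EM update on the augmented design to a simple fixed-point iteration. The sketch below is a minimal illustration of that idea under stated assumptions, not the speaker's implementation; the function name `oem_ls` and the choice of the orthogonalization constant `d` (an upper bound on the largest eigenvalue of X^T X) are our own.

```python
import numpy as np

def oem_ls(X, y, n_iter=500, tol=1e-10):
    """Least squares via a fully orthogonalized EM-style iteration (sketch)."""
    # d: the largest eigenvalue of X^T X, so that adding rows Delta with
    # Delta^T Delta = d*I - X^T X makes the augmented design orthogonal.
    d = np.linalg.norm(X, 2) ** 2
    XtX, Xty = X.T @ X, X.T @ y
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # EM update with the augmented rows' responses treated as missing:
        # impute them at the current fit, then solve the orthogonal problem.
        beta_new = (Xty + d * beta - XtX @ beta) / d
        if np.linalg.norm(beta_new - beta) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Each iteration costs only a matrix-vector product with the precomputed X^T X, which is what makes the scheme attractive when the number of observations is very large.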
We then present a reformulation and generalization of OEM that leads to a reduction in computational complexity for least squares and penalized least squares problems. The reformulation, named the GOEM (Generalized Orthogonalizing EM) algorithm, is further extended to a wider class of models including generalized linear models and Cox's proportional hazards model. Synthetic and real data examples are included to illustrate its efficiency compared with standard techniques.
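For penalized least squares, the same fully orthogonalized update combines the fixed-point step with the penalty's proximal map; for the lasso this is elementwise soft-thresholding. The sketch below illustrates that special case only (it is equivalent to a proximal-gradient step) and is not the GOEM implementation; the name `oem_lasso` and the parameter `lam` are assumptions for illustration.

```python
import numpy as np

def oem_lasso(X, y, lam, n_iter=2000, tol=1e-10):
    """Lasso via the fully orthogonalized iteration plus soft-thresholding (sketch)."""
    d = np.linalg.norm(X, 2) ** 2      # upper bound on eigmax(X^T X)
    XtX, Xty = X.T @ X, X.T @ y
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        u = Xty + d * beta - XtX @ beta
        # soft-threshold the orthogonalized update at the penalty level
        beta_new = np.sign(u) * np.maximum(np.abs(u) - lam, 0.0) / d
        if np.linalg.norm(beta_new - beta) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Setting `lam=0` recovers the unpenalized iteration, so one routine covers both the plain and the penalized least squares cases described above.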
Finally, we will discuss several new classes of space-filling designs, inspired by Samurai Sudoku, for conducting distributed computing or large-scale simulations. A growing trend in science and engineering is to distribute the runs of a large computer simulation across different groups, machines, or locations. Owing to the complexity of the hardware and the simulation code, some batches in such an experiment may malfunction or fail to converge. By ensuring that the analysis can be carried out both at the batch level and at the experiment level, these new designs provide a robust solution to this problem. We will also discuss applications of these designs to problems of optimization under uncertainty.
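To convey the flavor of batch-robust designs, the sketch below constructs a sliced Latin hypercube: each batch (slice) of n runs is itself a Latin hypercube on n levels, and the k pooled batches form a Latin hypercube on n*k levels, so the design stays usable whether the analysis is done per batch or over the whole experiment. This is a simpler, standard construction in the same spirit as, but not identical to, the Samurai-Sudoku-based designs of the talk; the function name `sliced_lhd` is hypothetical.

```python
import numpy as np

def sliced_lhd(n, k, dim, rng=None):
    """Sliced Latin hypercube design (sketch): k slices of n points in [0, 1)^dim.

    Each slice is a Latin hypercube on n coarse bins per dimension, and the
    pooled n*k points form a Latin hypercube on n*k fine bins per dimension.
    Returns an array of shape (k, n, dim).
    """
    rng = np.random.default_rng(rng)
    N = n * k
    out = np.empty((k, n, dim))
    for j in range(dim):
        # For each coarse bin, a permutation assigning the k slices
        # distinct fine sub-levels within that bin.
        sub = np.array([rng.permutation(k) for _ in range(n)])  # (n, k)
        for s in range(k):
            coarse = rng.permutation(n)          # slice s hits each coarse bin once
            fine = coarse * k + sub[coarse, s]   # distinct fine levels across slices
            out[s, :, j] = (fine + rng.random(n)) / N
    return out
```

Because every slice covers all coarse bins and the slices share out the fine sub-levels without collision, losing a batch still leaves each surviving batch space-filling on its own, which is the robustness property the abstract describes.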