COE-F-43
Empirical Bayes Regression Analysis with Many Regressors but
Fewer Observations
Muni S. Srivastava and Tatsuya Kubokawa
University of Toronto and University of Tokyo
September 8, 2004
In this paper, we consider the prediction problem in the multiple linear regression model when the number of predictor variables, $p$, is extremely large compared to the number of available observations, $n$. The least squares predictor based on a generalized inverse is not efficient. It is shown that no more than $n$ predictor variables, or $n$ linear combinations of the $p$ predictor variables, may be needed for efficient prediction. We propose six empirical Bayes estimators of the regression parameters used for prediction. Three of them are shown to have uniformly lower prediction error than the least squares predictor when the vector of regressor variables is assumed to be random with mean vector zero and covariance matrix $(1/n)X^{t}X$, where $X^{t} = (x_1, \ldots, x_n)$ is the $p \times n$ matrix of observations on the regressor vector, centered at their sample means. For the remaining estimators, we use simulation to show their superiority over the least squares predictor.
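The claim that at most $n$ linear combinations of the $p$ predictors are needed can be illustrated numerically. The following is a minimal sketch, not the estimators proposed in the paper: it uses simulated data, takes the Moore-Penrose inverse as the generalized inverse, and all names (`beta_ls`, `V_r`, `Z`, `gamma`) and the tolerance `1e-10` are assumptions made for illustration. It shows that the least squares coefficient vector lies in the row space of the centered design matrix, so regressing on $r \le n$ linear combinations of the predictors reproduces it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 200  # far more predictors than observations

# Simulated design matrix (n x p), centered at the column sample means.
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)

# Simulated response from a hypothetical sparse coefficient vector.
beta_true = np.zeros(p)
beta_true[:5] = 1.0
y = X @ beta_true + 0.1 * rng.standard_normal(n)
y_c = y - y.mean()

# Least squares via the Moore-Penrose generalized inverse
# (the minimum-norm solution of the normal equations).
beta_ls = np.linalg.pinv(X, rcond=1e-10) @ y_c

# Thin SVD: X = U diag(s) Vt. Centering removes one degree of freedom,
# so the numerical rank r is at most n - 1.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = int(np.sum(s > 1e-10 * s[0]))
V_r = Vt[:r].T            # p x r basis of the row space of X

# Regress on the r linear combinations Z = X V_r instead of all p predictors.
Z = X @ V_r               # n x r, full column rank
gamma = np.linalg.lstsq(Z, y_c, rcond=None)[0]
beta_from_Z = V_r @ gamma

# Prediction for a new regressor vector depends on x_new only through x_new @ V_r.
x_new = rng.standard_normal(p)
pred_full = x_new @ beta_ls
pred_reduced = (x_new @ V_r) @ gamma
```

In this sketch the two coefficient vectors and the two predictions agree up to rounding error, and the fit interpolates the centered responses exactly, which is one way to see why the unregularized least squares predictor can be improved upon by shrinkage.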