Estimation and Inference for High-dimensional Generalized Linear Models with Knowledge Transfer
Sai Li, Linjun Zhang, Tony Cai, and Hongzhe Li
Abstract:
Transfer learning provides a powerful tool for incorporating related data into a target study of interest. In epidemiology and medical studies, the classification of a target disease could borrow information across diseases and populations. In this work, we consider transfer learning for high-dimensional generalized linear models (GLMs). We propose a novel algorithm, TransGLM, that incorporates data from the target study as well as the auxiliary studies. The minimax rate of convergence for estimation is established, and the proposed estimator is shown to be rate-optimal.
Statistical inference for the target regression coefficients is also studied. Asymptotic normality of a debiased estimator is established, and confidence intervals are constructed. Numerical studies show significant improvements in estimation and inference accuracy. The proposed methods are applied to a real data study concerning the classification of colorectal cancer using microbiomes, and are shown to enhance classification accuracy in comparison to single-task methods.