Signal Classification for the Integrative Analysis of Multiple Sequences of Multiple Tests
Dongdong Xiang, Dave Zhao, and Tony Cai
Abstract:
The integrative analysis of multiple datasets is becoming increasingly important in many fields of research. When the same features are studied in several independent experiments, a common integrative approach is to jointly analyze the resulting multiple sequences of multiple tests. It is frequently necessary to classify each feature into one of several categories according to the null and non-null configuration of its corresponding test statistics across the experiments. This paper studies this signal classification problem, motivated by a range of applications in large-scale genomics. Two new types of misclassification rates are introduced, and both oracle and data-driven procedures are developed to control each type while also achieving the largest expected number of correct classifications. The proposed data-driven procedures are proved to be asymptotically valid and optimal under mild conditions. In numerical experiments they are nearly as powerful as the oracle procedures and achieve substantial gains in power over competing methods in many settings. In an application to psychiatric genetics, the proposed procedures are used to discover genetic variants that may affect both bipolar disorder and schizophrenia, as well as variants that may help distinguish between these conditions.