Large-Scale Multiple Testing under Dependency

Wenguang Sun and Tony Cai


  • Abstract: This article considers the problem of multiple testing under dependency in a compound decision-theoretic framework. The observed data are assumed to be generated from an underlying two-state hidden Markov model. We propose oracle and asymptotically optimal data-driven procedures that aim to minimize the false non-discovery rate (FNR) subject to a constraint on the false discovery rate (FDR). It is shown that the performance of a multiple testing procedure can be substantially improved by adaptively exploiting the dependency structure among hypotheses, and hence conventional FDR procedures that ignore this structural information are inefficient. Both the theoretical properties and the numerical performance of the proposed procedures are investigated. It is shown that the proposed procedures control the FDR at the desired level, enjoy certain optimality properties, and are especially powerful in identifying clustered non-null cases. The new procedure is applied to an influenza-like illness surveillance study for detecting the timing of epidemic periods.

  • Paper: pdf file.

  • R code for the data-driven testing procedure. Here is the readme file.

  • Other related papers:

    Sun, W. & Cai, T. (2007).
    Oracle and adaptive compound decision rules for false discovery rate control.
    J. American Statistical Association 102, 901-912.

    Jin, J. & Cai, T. (2007).
    Estimating the null and the proportion of non-null effects in large-scale multiple comparisons.
    J. American Statistical Association 102, 495-506.

    Cai, T., Jin, J. & Low, M. (2007).
    Estimation and confidence sets for sparse normal mixtures.
    The Annals of Statistics 35, 2421-2449.
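
As a rough illustration of the kind of procedure the abstract describes, the following Python sketch computes, for a two-state hidden Markov model with known parameters, the posterior probability that each hypothesis is null given all observations (the paper calls this quantity the local index of significance), and then rejects the hypotheses with the smallest values for as long as their running average stays at or below the target FDR level. This is a minimal sketch under stated assumptions, not the R code linked above: the function names lis_values and lis_step_up, the normal densities, the transition matrix and the toy data are all hypothetical, and the parameters are taken as known, whereas the data-driven procedure would have to estimate them.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def lis_values(x, pi0, trans, f0, f1):
    """Posterior P(state_i = null | all of x) by scaled forward-backward.

    pi0   : assumed probability that the first state is null
    trans : assumed 2x2 transition matrix, trans[j][k] = P(next = k | current = j)
    f0, f1: assumed null and non-null densities (callables)
    """
    m = len(x)
    dens = [(f0(xi), f1(xi)) for xi in x]

    # Forward pass, renormalised at every step to avoid underflow.
    first = [pi0 * dens[0][0], (1.0 - pi0) * dens[0][1]]
    total = sum(first)
    alpha = [[v / total for v in first]]
    for i in range(1, m):
        prev = alpha[-1]
        cur = [(prev[0] * trans[0][k] + prev[1] * trans[1][k]) * dens[i][k]
               for k in (0, 1)]
        total = sum(cur)
        alpha.append([v / total for v in cur])

    # Backward pass with the same kind of renormalisation.
    beta = [[1.0, 1.0] for _ in range(m)]
    for i in range(m - 2, -1, -1):
        cur = [sum(trans[j][k] * dens[i + 1][k] * beta[i + 1][k] for k in (0, 1))
               for j in (0, 1)]
        total = sum(cur)
        beta[i] = [v / total for v in cur]

    # Combine: posterior null probability at each position.
    out = []
    for i in range(m):
        w0 = alpha[i][0] * beta[i][0]
        w1 = alpha[i][1] * beta[i][1]
        out.append(w0 / (w0 + w1))
    return out


def lis_step_up(lis, level):
    """Reject the k smallest values, k being the largest count whose
    running mean of sorted values is at most `level`."""
    order = sorted(range(len(lis)), key=lambda i: lis[i])
    running, k = 0.0, 0
    for rank, idx in enumerate(order, start=1):
        running += lis[idx]
        if running / rank <= level:
            k = rank
    reject = [False] * len(lis)
    for idx in order[:k]:
        reject[idx] = True
    return reject


# Toy data (made up): three clustered large observations among small ones.
x = [0.1, -0.3, 3.2, 2.8, 3.5, 0.2, -0.1]
trans = [[0.95, 0.05], [0.20, 0.80]]  # assumed, not estimated
lis = lis_values(x, 0.95, trans,
                 lambda v: normal_pdf(v, 0.0, 1.0),
                 lambda v: normal_pdf(v, 3.0, 1.0))
reject = lis_step_up(lis, level=0.10)
print([round(v, 4) for v in lis])
print(reject)
```

Because the posterior null probability at each position borrows strength from its neighbours through the transition matrix, a run of moderately large observations is flagged more readily than the same values would be in isolation, which is the sense in which exploiting the dependency structure helps with clustered non-null cases.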


Last updated on April 9, 2008.