Instrumental Variables Estimation with Some Invalid Instruments and its Application to Mendelian Randomization
Hyunseung Kang, Anru Zhang, Tony Cai, and Dylan Small
Abstract:
Instrumental variables have been widely used for estimating the causal effect between exposure and outcome. Conventional estimation methods require complete knowledge about all the instruments' validity; a valid instrument must not have a direct effect on the outcome and must not be related to unmeasured confounders. Often, this is impractical, as highlighted by Mendelian randomization studies, where genetic markers are used as instruments and complete knowledge about the instruments' validity is equivalent to complete knowledge about the involved genes' functions. In this paper, we propose a method for estimating causal effects when this complete knowledge is absent. It is shown that identification and estimation are possible under the weaker requirement that more than 50% of the instruments are valid, without knowing precisely which instruments are the valid ones. Sharp identification limits with invalid instruments are given. A fast ℓ1-penalized estimation method, called sisVIVE, is introduced for estimating the causal effect without knowing which instruments are valid, with theoretical guarantees on its performance. The proposed method is demonstrated on simulated data and on a real Mendelian randomization study concerning the effect of body mass index on a health-related quality of life index. An R package, sisVIVE, is available on CRAN. Supplementary materials for this article are available online.
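To give some intuition for the "more than 50% valid" condition, the following minimal Python sketch simulates data with ten instruments, three of which affect the outcome directly, and compares two estimators. It is not the sisVIVE algorithm: as a simpler stand-in it takes the median of per-instrument ratio estimates, which stays close to the true effect as long as a majority of the instruments are valid. The function names (simulate, ratio_median, tsls), the sample size, and all parameter values are illustrative assumptions rather than anything taken from the paper, and the simulated variables are mean-zero so no intercept is fitted.

```python
import numpy as np


def simulate(n=5000, L=10, n_invalid=3, beta=1.0, seed=0):
    """Simulate outcome Y, exposure D, and L instruments Z (all values assumed)."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, L))
    gamma = np.ones(L)              # every instrument is relevant for D
    alpha = np.zeros(L)
    alpha[:n_invalid] = 2.0         # invalid instruments act directly on Y
    U = rng.standard_normal(n)      # unmeasured confounder of D and Y
    D = Z @ gamma + U + rng.standard_normal(n)
    Y = beta * D + Z @ alpha + U + rng.standard_normal(n)
    return Y, D, Z


def ratio_median(Y, D, Z):
    """Median of per-instrument ratios of reduced-form to first-stage slopes."""
    Gamma_hat = np.linalg.lstsq(Z, Y, rcond=None)[0]  # Y regressed on Z
    gamma_hat = np.linalg.lstsq(Z, D, rcond=None)[0]  # D regressed on Z
    return float(np.median(Gamma_hat / gamma_hat))


def tsls(Y, D, Z):
    """Two-stage least squares that treats every instrument as valid."""
    D_hat = Z @ np.linalg.lstsq(Z, D, rcond=None)[0]  # first-stage fitted values
    return float(D_hat @ Y / (D_hat @ D))


if __name__ == "__main__":
    Y, D, Z = simulate()
    print("true effect:         ", 1.0)
    print("median of ratios:    ", ratio_median(Y, D, Z))
    print("two-stage least sq.: ", tsls(Y, D, Z))
```

In this setup the valid instruments all give ratios near the true effect, while the invalid ones give shifted ratios; because the valid instruments form a majority, the median falls among them. Two-stage least squares, which pools all instruments as if they were valid, absorbs the direct effects and is biased.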
Software: The R package sisVIVE takes a candidate set of instruments, some of which may be invalid, selects the potentially invalid ones, and provides an estimate of the causal effect of the treatment on the outcome. Please see our paper Kang, Zhang, Cai, and Small (2015) for details.