Abstract:
Model search in very high-dimensional spaces raises computational
challenges, and standard approaches such as serial Markov chain Monte Carlo
(MCMC) methods are often ineffective. I introduce a novel shotgun
stochastic search (SSS) approach for model space exploration that is
inspired by existing MCMC approaches but identifies good models far more
rapidly as dimension increases. Parallel computing is
at the core of SSS methodology. Rather than simply parallelizing existing
MCMC stochastic search methods by simultaneously running multiple chains, I
describe a new stochastic search that differs in two key respects: (i) SSS
evaluates and records many candidate models in parallel at each iteration,
efficiently exploring neighborhoods of models; (ii) SSS is designed to move
towards and aggressively explore regions of model space that contain
multiple high-probability models. While serial approaches typically
traverse model space via pairwise model comparisons, the use of parallel
computing allows potentially tens of thousands of models in the
neighborhood of a given model to be considered simultaneously, yielding a
stochastic search whose properties differ from those of the usual serial
implementation. I highlight the relationship between standard MCMC
approaches and SSS and provide examples where SSS catalogues
high-probability models more rapidly than competing MCMC methods. I
present examples from cancer genomics that demonstrate the effectiveness of
SSS, where modeling goals include the identification of complex
multivariate patterns of association within sets of key genes and also
between sets of genes and observed patient outcomes.
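
As a rough illustration of points (i) and (ii), the following is a minimal sketch of a shotgun-style search loop, not the implementation described in this work. It rests on several assumptions that do not come from the text: a model is a set of variable indices, a neighborhood consists of all models that add or delete a single variable, the next model is drawn with probability proportional to its exponentiated score, and `toy_log_score` is a hypothetical stand-in for an unnormalized log posterior model probability. The names `neighbors`, `shotgun_search`, and `toy_log_score` are illustrative only.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def neighbors(model, p):
    """All models that differ from `model` by adding or deleting one variable."""
    return [model ^ frozenset([j]) for j in range(p)]


def toy_log_score(model, truth=frozenset({1, 4, 7})):
    """Hypothetical log score: penalizes each variable that disagrees with `truth`."""
    return -2.0 * len(model ^ truth)


def shotgun_search(log_score, p, n_iter=50, top_k=5, seed=0, workers=4):
    """Score whole neighborhoods in parallel and record every model evaluated."""
    rng = random.Random(seed)
    current = frozenset()  # assumption: start from the empty model
    visited = {current: log_score(current)}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(n_iter):
            candidates = neighbors(current, p)
            # (i) evaluate and record the entire neighborhood at once
            scores = list(pool.map(log_score, candidates))
            visited.update(zip(candidates, scores))
            # (ii) move towards high-scoring regions: sample the next model
            # with weight proportional to exp(score), stabilized by the max
            best = max(scores)
            weights = [math.exp(s - best) for s in scores]
            current = rng.choices(candidates, weights=weights, k=1)[0]
    # return the top_k highest-scoring models seen anywhere during the search
    return sorted(visited.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


top_models = shotgun_search(toy_log_score, p=10)
for model, score in top_models:
    print(sorted(model), score)
```

The point of the sketch is that the output is the catalogue of all models scored along the way rather than the path of a single chain. A thread pool is used here only for brevity; for a CPU-bound score function in Python, a process pool or a cluster scheduler would be the more realistic choice.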