Sparse Segment Identifications with Applications to DNA Copy Number Variation Analysis
X. Jessie Jeng, T. Tony Cai and Hongzhe Li
Abstract:
Copy number variations (CNVs) are alterations of the DNA of a genome that result in the cell having an abnormal number of copies of one or more sections of the DNA. Germline CNVs have been shown to be associated with many complex diseases. Detecting and identifying all the CNVs in a given sample or in multiple population-based samples is an important first step in many CNV analyses. In this chapter, we review statistical methods for CNV identification, focusing on recently developed methods for sparse segment identification in various settings. We review methods for optimal CNV identification in a single sample based on SNP allele intensity data, methods for robust CNV identification based on next-generation sequencing (NGS) data, and methods for detecting recurrent CNVs in a population when a large set of samples is available. Our review focuses on problem formulations and on the optimal statistical properties of the procedures. We illustrate these methods using data from the 1000 Genomes Project and data from a large genome-wide association study of neuroblastoma. Areas that need further research are also discussed.