Regressing Multivariate Gaussian Distribution on Vector Covariates for Co-expression Network Analysis
Arnab Auddy, Tony Cai, and Hongzhe Li
Abstract:
Population-level single-cell gene expression data captures the gene expression of thousands of cells for each individual within a sizable cohort. This data enables the construction of cell-type- and individual-specific gene co-expression networks by estimating the covariance matrices. It is important to understand how such co-expression networks are associated with individual-level covariates. This paper considers Fréchet regression with a multivariate Gaussian distribution as the outcome and vector covariates, where the Wasserstein distance between distributions replaces the Euclidean distance. A test statistic is defined based on the Fréchet mean and the covariate-weighted Fréchet mean. The asymptotic distribution of the test statistic is derived under the assumption of simultaneously diagonalizable covariance matrices. Although the proposed test statistic is motivated by considering the multivariate normal distribution as the outcome, it can also be used to test the association between covariance matrices and covariates, with permutation used to assess statistical significance. Simulations show that the proposed test maintains the correct type I error rate and has adequate power. Results from an analysis of large-scale single-cell data reveal an association between age and the co-expression network of genes in the nutrient sensing pathway, indicating that this co-expression network is perturbed as people age.
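To make the distance concrete, the following is a minimal sketch, assuming the Wasserstein distance referred to above is the 2-Wasserstein distance, for which two Gaussians N(m1, S1) and N(m2, S2) have the standard closed form W^2 = ||m1 - m2||^2 + tr(S1 + S2 - 2 (S1^{1/2} S2 S1^{1/2})^{1/2}). The function names `psd_sqrt` and `gaussian_w2_squared` are illustrative and do not come from the paper.

```python
import numpy as np


def psd_sqrt(S):
    """Symmetric square root of a positive semidefinite matrix (eigendecomposition)."""
    w, V = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues from round-off
    return (V * np.sqrt(w)) @ V.T


def gaussian_w2_squared(m1, S1, m2, S2):
    """Squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2).

    Assumes S1 and S2 are symmetric positive semidefinite.
    """
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    S1, S2 = np.asarray(S1, float), np.asarray(S2, float)
    r1 = psd_sqrt(S1)
    cross = psd_sqrt(r1 @ S2 @ r1)  # (S1^{1/2} S2 S1^{1/2})^{1/2}
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross))


# Toy covariances sharing the same eigenvectors (simultaneously diagonalizable).
S1 = np.diag([1.0, 4.0])
S2 = np.diag([9.0, 16.0])
d2 = gaussian_w2_squared(np.zeros(2), S1, np.zeros(2), S2)
```

When the covariance matrices commute, as under the simultaneous diagonalizability assumption, the trace term reduces to the squared Frobenius norm of S1^{1/2} - S2^{1/2}, so in the toy example with equal means the value is (1 - 3)^2 + (2 - 4)^2 = 8.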