Network Analysis of Gene Co-expression Data & Concordance Index for Multivariate Survival Outcomes

Friday, October 11, 2019 11:00 am - 7:59 pm

Speaker: Qing Pan, George Washington University

Abstract: When searching for gene pathways leading to specific disease outcomes, additional information on gene characteristics is often available. This information may help differentiate disease-related genes from irrelevant background genes when connections involving both types of genes are observed and their relationships to the disease are unknown. In the first part of the talk, we propose a method that singles out irrelevant background genes with the help of auxiliary information through a logistic regression and clusters relevant genes into cohesive groups using the adjacency matrix. The expectation-maximization algorithm is modified to maximize a joint pseudo-likelihood that assumes latent indicators for relevance to the disease, latent group memberships, and Poisson- or multinomial-distributed link numbers within and between groups. A robust version allowing arbitrary linkage patterns within the background is further derived. Asymptotic consistency of label assignments under the stochastic block model is proven. Superior performance and robustness in finite samples are observed in simulation studies. The proposed robust method identifies previously missed gene sets underlying autism-related neurological diseases using diverse data sources, including de novo mutations, gene expression, and protein-protein interactions.

In the second part of the talk, we propose an extension of Harrell’s concordance (C) index to evaluate the prognostic utility of biomarkers for diseases with multiple measurable outcomes that can be prioritized. Our prioritized concordance index measures the probability that, given a random subject pair, the subject with the worse disease status as of a time τ has the higher predicted risk. It takes the same approach as the win ratio, basing generalized pairwise comparisons on the most severe or clinically important comparable outcome. We use an inverse probability weighting technique to correct for study-specific censoring. Asymptotic properties are derived using U-statistic theory. We apply the prioritized concordance index to two types of disease processes with a rare primary outcome and a more common secondary outcome. Our simulation studies show that when a predictor is predictive of both outcomes, the new concordance index can gain efficiency and power in identifying true prognostic variables compared to using the primary outcome alone. Using the prioritized concordance index, we examine whether novel clinical measures can be useful in predicting risk of type II diabetes in patients with impaired glucose resistance whose disease status can also regress to normal glucose resistance.
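
The pairwise comparison behind a prioritized concordance index can be illustrated with a minimal sketch. This is not the speaker's estimator. It assumes no censoring, so the inverse probability weighting step is omitted, and it assumes each subject's status as of time τ is already encoded as a tuple of severities in priority order, with higher values meaning worse. The function name `prioritized_c_index` and the data layout are hypothetical.

```python
from itertools import combinations


def prioritized_c_index(outcomes, risk):
    """Illustrative prioritized concordance, assuming no censoring.

    outcomes[i]: tuple of severities for subject i as of time tau, listed
                 in priority order (primary first); higher means worse.
    risk[i]:     predicted risk score for subject i.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(risk)), 2):
        # Decide the pair on the highest-priority outcome where they differ.
        worse = None
        for a, b in zip(outcomes[i], outcomes[j]):
            if a != b:
                worse = i if a > b else j
                break
        if worse is None:
            continue  # tied on every outcome: pair is not comparable
        better = j if worse == i else i
        comparable += 1
        if risk[worse] > risk[better]:
            concordant += 1.0   # worse subject has the higher predicted risk
        elif risk[worse] == risk[better]:
            concordant += 0.5   # tied predictions count as half
    return concordant / comparable if comparable else float("nan")


# Toy data: (primary event, secondary event) indicators as of tau.
outcomes = [(1, 0), (0, 1), (0, 0), (0, 1)]
risk = [0.9, 0.5, 0.1, 0.1]
c_index = prioritized_c_index(outcomes, risk)
```

In the toy data, subjects 1 and 2 are tied on the primary outcome, so the pair is decided by the secondary outcome. That fallback is how a more common secondary outcome can contribute comparable pairs when the primary outcome is rare.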