High-dimensional statistical approaches for revealing signaling heterogeneity across human cancers



Differential Network Analysis


Network inference is subject to statistical uncertainty: observed differences between two networks inferred from two datasets may reflect noise and estimation variability rather than any true difference in the underlying network topology. Testing such differences for significance is a challenging statistical problem that involves high-dimensional estimation and the comparison of non-nested hypotheses. Our recently developed method "differential network" performs formal two-sample testing between high-dimensional Gaussian graphical models (GGMs) and is implemented in the R package DiffNet. We use this technique to test for signaling network heterogeneity across human cancers. For technical details of the approach we refer the reader to:

Städler, N. and Mukherjee, S. (2013). "Two-Sample Testing in High-Dimensional Models". Preprint arXiv:1210.4584

Städler, N. and Mukherjee, S. (2013). "Network-based multivariate gene-set testing". Preprint arXiv:1308.2771

Städler, N., Dondelinger, F., Hill, S., Kwok Shing Ng, P., Akbani, R., Werner, H., Shahmoradgoli, M., Lu, Y., Mills, G. and Mukherjee, S. "High-dimensional statistical approaches reveal heterogeneity in signaling networks across human cancers". Submitted
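The question posed above, whether a difference between two estimated networks exceeds what estimation noise alone would produce, can be illustrated with a simple permutation test. The Python sketch below is not the DiffNet procedure, which handles the high-dimensional, non-nested setting formally. It assumes few variables relative to samples, replaces penalized GGM estimation with a ridge-regularized inverse covariance, and uses the Frobenius distance between partial-correlation matrices as a hypothetical test statistic. All function names are illustrative.

```python
import numpy as np

def partial_correlations(x, ridge=1e-2):
    """Partial-correlation matrix from a ridge-regularized inverse covariance.

    Assumption: the ridge term stands in for the L1-penalized GGM estimator
    used in practice; this is only reasonable when p is small relative to n.
    """
    cov = np.cov(x, rowvar=False)
    prec = np.linalg.inv(cov + ridge * np.eye(cov.shape[0]))
    d = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

def network_distance(x1, x2):
    """Hypothetical statistic: Frobenius norm of the partial-correlation difference."""
    return np.linalg.norm(partial_correlations(x1) - partial_correlations(x2))

def permutation_test(x1, x2, n_perm=200, seed=0):
    """Permutation p-value for H0: both samples share the same network."""
    rng = np.random.default_rng(seed)
    observed = network_distance(x1, x2)
    pooled = np.vstack([x1, x2])
    n1 = x1.shape[0]
    exceed = 0
    for _ in range(n_perm):
        # Shuffle group labels, then recompute the statistic under H0.
        idx = rng.permutation(pooled.shape[0])
        stat = network_distance(pooled[idx[:n1]], pooled[idx[n1:]])
        exceed += int(stat >= observed)
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)

# Toy usage: group 2 couples variables 0 and 1; group 1 has no edges.
rng = np.random.default_rng(1)
x1 = rng.standard_normal((100, 5))
x2 = rng.standard_normal((100, 5))
x2[:, 1] = 0.9 * x2[:, 0] + 0.3 * rng.standard_normal(100)
p_value = permutation_test(x1, x2)
```

Permuting group labels simulates the null hypothesis that both groups share one network, so the observed distance is compared against distances that arise from sampling noise alone.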


  1. R Package DiffNet and Package Manual

    R-Code (and Example): the R script diffnet_groups.R contains code for running a differential network analysis between pairs of diseases. Its use is illustrated with a small example in diffnet_groups_test.R.
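The contents of diffnet_groups.R are not shown here, so the following is only a hedged Python sketch of what running a test "between pairs of diseases" involves: a two-sample test applied to every unordered pair of groups. `pairwise_tests`, `test_pair` and the labels are hypothetical. With k diseases there are k(k-1)/2 tests, so the sketch also adjusts p-values with the Benjamini-Hochberg procedure; this is an illustrative choice, and the source does not state which correction, if any, the R script applies.

```python
from itertools import combinations

def pairwise_tests(groups, test_pair):
    """Apply a two-sample network test to every unordered pair of groups.

    groups: dict mapping a disease label to that disease's data.
    test_pair: hypothetical callable (data_a, data_b) -> p-value.
    """
    return {
        (a, b): test_pair(groups[a], groups[b])
        for a, b in combinations(sorted(groups), 2)
    }

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy usage: placeholder "data" and fixed p-values stand in for a real test.
toy_p = {("A", "B"): 0.001, ("A", "C"): 0.04, ("B", "C"): 0.30}
results = pairwise_tests({"A": "A", "B": "B", "C": "C"}, lambda a, b: toy_p[(a, b)])
adjusted = dict(zip(results, benjamini_hochberg(list(results.values()))))
```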



Network-Based Clustering


The MixGLasso package is a tool for multivariate, network-based clustering of high-dimensional data. The approach simultaneously identifies clusters and learns a network topology for each cluster. MixGLasso is based on Gaussian mixture models. Estimation is performed using an EM-type algorithm that uses L1 penalization to regularize estimation and takes care of scaling and related issues that arise because the cluster assignments are unknown. We used this package to cluster proteomic data from 3467 patient samples spanning 11 diseases from The Cancer Genome Atlas (TCGA). For technical details we refer the reader to:

Städler, N. and Mukherjee, S. (2012). "Penalized estimation in high-dimensional hidden Markov models with state-specific graphical models". To appear in the Annals of Applied Statistics

Städler, N., Dondelinger, F., Hill, S., Kwok Shing Ng, P., Akbani, R., Werner, H., Shahmoradgoli, M., Lu, Y., Mills, G. and Mukherjee, S. "High-dimensional statistical approaches reveal heterogeneity in signaling networks across human cancers". Submitted
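The EM structure described above (alternating soft cluster assignments with penalized, cluster-specific estimation) can be sketched in Python. This is a minimal illustration under stated assumptions, not MixGLasso itself: in place of the L1 (graphical lasso) penalty it uses a trace penalty (λ/2)·tr(Ω_j) on each cluster's precision matrix Ω_j, which yields the closed-form M-step Ω_j = (S_j + (λ/n_j) I)^(-1), where S_j is the weighted covariance and n_j the effective cluster size. Function names, the farthest-point initialization and the default λ are illustrative choices.

```python
import numpy as np

def log_gaussian(x, mean, prec):
    """Row-wise log-density of N(mean, inv(prec)) for data x of shape (n, p)."""
    p = x.shape[1]
    diff = x - mean
    _, logdet = np.linalg.slogdet(prec)
    quad = np.einsum("ij,jk,ik->i", diff, prec, diff)
    return 0.5 * (logdet - quad - p * np.log(2 * np.pi))

def penalized_em(x, k, lam=1.0, n_iter=50, seed=0):
    """EM for a k-component Gaussian mixture with a trace penalty on each precision.

    Illustrative stand-in: MixGLasso uses an L1 (graphical lasso) penalty instead.
    Returns mixing weights, means, precision matrices and responsibilities.
    """
    rng = np.random.default_rng(seed)
    n, p = x.shape
    # Farthest-point initialization of the cluster means.
    means = [x[rng.integers(n)]]
    for _ in range(1, k):
        dist = np.min([((x - m) ** 2).sum(axis=1) for m in means], axis=0)
        means.append(x[np.argmax(dist)])
    means = np.array(means, dtype=float)
    precs = np.array([np.eye(p) for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities = posterior cluster-membership probabilities.
        logp = np.column_stack(
            [np.log(weights[j]) + log_gaussian(x, means[j], precs[j]) for j in range(k)]
        )
        logp -= logp.max(axis=1, keepdims=True)  # stabilize before exponentiating
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted estimates; nk is the effective size of each cluster.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        for j in range(k):
            means[j] = resp[:, j] @ x / nk[j]
            diff = x - means[j]
            cov = (resp[:, j, None] * diff).T @ diff / nk[j]
            # Maximizer of the penalized objective: inv(S_j + (lam / n_j) I).
            precs[j] = np.linalg.inv(cov + (lam / nk[j]) * np.eye(p))
    return weights, means, precs, resp

# Toy usage: two well-separated clusters of 100 samples in 3 dimensions.
rng = np.random.default_rng(2)
x = np.vstack([rng.standard_normal((100, 3)), rng.standard_normal((100, 3)) + 5.0])
weights, means, precs, resp = penalized_em(x, k=2)
labels = resp.argmax(axis=1)
```

Because λ is fixed while the data term grows with n_j, small clusters receive relatively stronger regularization. This hints at why scaling matters when cluster assignments are unknown, but the sketch does not reproduce MixGLasso's treatment of it.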


  1. R Package MixGLasso and Package Manual



