High-dimensional statistical approaches for revealing signaling heterogeneity across human cancers
Differential Network Analysis
Network inference is subject to statistical uncertainty, and observed differences between two networks inferred from two datasets may be due to noise and variability in estimation rather than any true difference in underlying network topology. Significance testing for network differences is a challenging statistical problem, involving high-dimensional estimation and comparison of non-nested hypotheses. Our recently developed method, "differential network", performs formal two-sample testing between high-dimensional Gaussian graphical models (GGMs) and is implemented in the R package DiffNet. We use this technique to test for signaling network heterogeneity across human cancers. For technical details of the approach, we refer the reader to:
Städler, N. and Mukherjee, S. (2013). "Two-Sample Testing in High-Dimensional Models". Preprint arXiv:1210.4584
Städler, N. and Mukherjee, S. (2013). "Network-based multivariate gene-set testing". Preprint arXiv:1308.2771
Städler N., Dondelinger F., Hill S., Kwok Shing Ng P., Akbani R., Werner H., Shahmoradgoli M., Lu Y., Mills G., Mukherjee S. "High-dimensional statistical approaches reveal heterogeneity in signaling networks across human cancers". Submitted

• R Package DiffNet and Package Manual
• R Code (and Example): The R script diffnet_groups.R contains code for running a differential network analysis between different pairs of diseases. A small example in diffnet_groups_test.R illustrates its use.
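To make the point about noise concrete, the following is a minimal sketch of a two-sample test for a network difference. It is not the DiffNet method and does not use the DiffNet API: it swaps in a plain permutation test on marginal correlations, which is only sensible in low dimensions, whereas DiffNet tests high-dimensional GGMs. All function names and the simulated data are hypothetical and exist only for this illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def max_corr_diff(group_a, group_b):
    """Largest absolute difference between the pairwise correlations of two groups.

    Each group is a list of samples; each sample is a list of p measurements.
    """
    p = len(group_a[0])
    cols_a = list(zip(*group_a))
    cols_b = list(zip(*group_b))
    return max(
        abs(pearson(cols_a[i], cols_a[j]) - pearson(cols_b[i], cols_b[j]))
        for i in range(p)
        for j in range(i + 1, p)
    )


def permutation_pvalue(group_a, group_b, n_perm=200, seed=0):
    """Permutation p-value for the null hypothesis that both groups share one distribution."""
    rng = random.Random(seed)
    observed = max_corr_diff(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        if max_corr_diff(pooled[:n_a], pooled[n_a:]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


def simulate(n, sign, rng):
    """Hypothetical toy data: three variables, with `sign` setting the x1-x2 dependence."""
    samples = []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = sign * x1 + rng.gauss(0, 0.3)
        x3 = rng.gauss(0, 1)
        samples.append([x1, x2, x3])
    return samples


rng = random.Random(1)
same_a, same_b = simulate(40, 1, rng), simulate(40, 1, rng)  # same dependence structure
diff_b = simulate(40, -1, rng)                               # x1-x2 dependence flipped

p_same = permutation_pvalue(same_a, same_b)
p_diff = permutation_pvalue(same_a, diff_b)
print("same structure:", p_same)
print("different structure:", p_diff)
```

Two groups drawn from the same distribution still give a nonzero observed difference, which is why the statistic has to be compared against a null distribution rather than read at face value.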
Network-Based Clustering
The MixGLasso package is a tool for multivariate, network-based clustering of high-dimensional data. The approach simultaneously identifies clusters and learns network topologies. MixGLasso is based on a Gaussian mixture model. Estimation is performed using an EM-type algorithm that uses L1-penalization to regularize estimation and handles scaling and related issues that arise because the cluster assignments are unknown. We used this package to cluster proteomic data from 3467 patient samples spanning 11 Cancer Genome Atlas diseases. For technical details, we refer the reader to:
Städler, N. and Mukherjee, S. (2012). "Penalized estimation in high-dimensional hidden Markov models with state-specific graphical models". To appear in the Annals of Applied Statistics
Städler N., Dondelinger F., Hill S., Kwok Shing Ng P., Akbani R., Werner H., Shahmoradgoli M., Lu Y., Mills G., Mukherjee S. "High-dimensional statistical approaches reveal heterogeneity in signaling networks across human cancers". Submitted
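As a rough orientation for the clustering approach described in this section, a generic L1-penalized Gaussian mixture objective can be written as below. The notation is introduced here for illustration and is an assumption, not taken from the package: K clusters, mixing proportions \pi_k, means \mu_k, precision (inverse covariance) matrices \Omega_k, and tuning parameter \lambda. The exact penalty used by MixGLasso, including how it is scaled, is specified in the references above rather than in this sketch.

```latex
% Generic L1-penalized negative log-likelihood of a K-component Gaussian mixture
\min_{\pi,\,\mu,\,\Omega}\;
  -\frac{1}{n}\sum_{i=1}^{n}
     \log\Big( \sum_{k=1}^{K} \pi_k\,
       \mathcal{N}\big(x_i \mid \mu_k,\ \Omega_k^{-1}\big) \Big)
  \;+\; \lambda \sum_{k=1}^{K} \lVert \Omega_k \rVert_1

% E-step of an EM-type algorithm: posterior probability that sample i belongs to cluster k
\gamma_{ik} =
  \frac{\pi_k\, \mathcal{N}\big(x_i \mid \mu_k,\ \Omega_k^{-1}\big)}
       {\sum_{l=1}^{K} \pi_l\, \mathcal{N}\big(x_i \mid \mu_l,\ \Omega_l^{-1}\big)}
```

In this generic form, zero off-diagonal entries of \Omega_k correspond to absent edges in the network of cluster k, so the L1 penalty yields sparse cluster-specific networks, while the weights \gamma_{ik} give the soft cluster assignments.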