Published on March 20, 2009
Probabilistic refinement of cellular pathway models
Cambridge Statistical Laboratory, Networks seminar series, 2009 Jan 21
Florian Markowetz, firstname.lastname@example.org
What is a signaling pathway?
- Environmental stimuli
- Receptor in cell membrane
- Protein cascade
- Transcription factors regulating target genes
(Figure: pathway drawn across the Protein, mRNA and DNA levels)
Pathway reconstruction
Signaling pathways are important
- Deregulation causes many diseases, incl. cancer
Signaling pathways are poorly understood
- Only parts lists exist; interactions within and between pathways are missing
Biological research
- So far mostly focused on individual genes
New genome-scale datasets
- Opportunity for data integration and novel methods
What data do we have?
Proteins:
- interactions between proteins
- binding to DNA
mRNA (bulk of data: microarray):
- Expression under different stimuli
Sequence:
- binding motifs
- epigenetic marks
Morphology
(Figure: data types arranged by level: Protein, mRNA, DNA)
Pathways as graphs
• Nodes are (mostly) known
• Goal: infer edges from data
• Data are heterogeneous
Edges
• co-expression between genes
• interactions between proteins
• binding motifs at genes
• binding of proteins to DNA
Nodes
• Protein domains
• Functional annotation
Paths
• Cause-effect data: changing environments, experimental perturbations
Pathway reconstruction
“Classical” statistical approaches: treat the genes/proteins as random variables and explore the correlation structure in the data:
– Correlation graphs
– Gaussian graphical models (partial correlation)
– Bayesian networks
Challenges/Problems/Opportunities
1. Correlation may be uninformative
2. Integrate heterogeneous, noisy, and complementary data sources
Review: Markowetz and Spang (2007)
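To make the difference between a correlation graph and a Gaussian graphical model concrete, here is a minimal sketch on simulated data. The three variables, the noise level, and the helper names `pearson` and `partial_corr` are assumptions for illustration and are not taken from the talk. Two variables driven by a common regulator are strongly correlated, but their partial correlation given the regulator is close to zero, so a GGM would drop the edge that a correlation graph draws.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def partial_corr(a, b, c):
    """First-order partial correlation of a and b given c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Hypothetical toy data: z regulates both x and y; x and y do not interact.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(2000)]
x = [zi + 0.5 * rng.gauss(0, 1) for zi in z]
y = [zi + 0.5 * rng.gauss(0, 1) for zi in z]

marginal = pearson(x, y)         # high: suggests an x-y edge
partial = partial_corr(x, y, z)  # near zero: the edge is explained by z
print(f"corr(x, y) = {marginal:.2f}, corr(x, y | z) = {partial:.2f}")
```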
– Part 1 – Nested Effects Models
Experimental perturbations
- Drugs
- Small molecules
- RNAi
- Stress
- Knockout
(Figure: perturbations acting at the Protein, mRNA and DNA levels)
Readout: Global gene expression measurements
Drosophila immune response
Columns: perturbed genes
Rows: effects on other genes
1. Silencing tak1 reduces expression of all LPS-inducible transcripts
2. Silencing rel (key) or mkk4/hep reduces expression of subsets of induced transcripts
(Boutros et al, Dev Cell 2002)
(!) Two types of entities
- Components of the signaling pathway, which are experimentally perturbed
- Downstream effect reporters
(!!) Only indirect information
- No direct observation of perturbation effects on other pathway components!
- Inference from observed perturbation effects on downstream reporters.
The information gap
Direct information: effects are visible at other pathway components
Indirect information: effects are only visible at down-stream reporters
Examples of down-stream reporters:
- Cell survival or death
- Growth rate
- downstream genes
(Figure: the same pathway on genes A, B, C, D shown for both cases)
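Correlation won’t do
“Classical” approach, applied to the pathway components themselves:
- Correlation
- Graphical models: Bayes Nets, GGMs
- Mutual Information
Nested Effects Models: applied to effects on downstream regulated genes
(Figure: pathway on genes A, B, C, D with downstream regulated genes)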
Nested Effects Models
INPUT
1. Set of candidate pathway genes
2. High-dimensional phenotypic profile, e.g. microarray
OUTPUT
Graph representation of information flow explaining the phenotypes
(Figure: phenotypic profiles, with gene perturbations A to H against effects, next to the inferred pathway on A to H)
NEM: model formulation
Pathway genes: X, Y, Z
• core topology
• to be reconstructed = Model M
Effect reporters: E1, …, E6
• states are observed = Data D
• positions in pathway unknown = Parameters θ
(Figure: a model on X, Y, Z with the expected and the observed effect patterns on E1, …, E6; observed entries that deviate from the expectation are marked FP or FN)
Posterior: $P(M \mid D) = \frac{1}{Z}\, P(D \mid M)\, P(M)$, where $P(D \mid M)$ is the marginal likelihood and $Z$ is the normalizing constant
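The expected effect pattern of a model can be sketched in a few lines. The chain X -> Y -> Z and the attachment of two reporters per gene are hypothetical choices for illustration, not the topology shown on the slide; the function names are likewise assumptions. A reporter is expected to show an effect whenever the perturbed gene reaches the gene it is attached to.

```python
def transitive_closure(genes, edges):
    """Map each gene to the set of genes it reaches, itself included."""
    reach = {g: {g} for g in genes}
    for source, target in edges:
        reach[source].add(target)
    changed = True
    while changed:
        changed = False
        for g in genes:
            for h in list(reach[g]):
                new = reach[h] - reach[g]
                if new:
                    reach[g] |= new
                    changed = True
    return reach

def expected_effects(genes, edges, reporters, attachment):
    """For each perturbed gene, the 0/1 effect expected at every reporter."""
    reach = transitive_closure(genes, edges)
    return {
        g: [1 if attachment[r] in reach[g] else 0 for r in reporters]
        for g in genes
    }

# Hypothetical example: chain X -> Y -> Z with two reporters per gene.
genes = ["X", "Y", "Z"]
edges = [("X", "Y"), ("Y", "Z")]
reporters = ["E1", "E2", "E3", "E4", "E5", "E6"]
attachment = {"E1": "X", "E2": "X", "E3": "Y", "E4": "Y", "E5": "Z", "E6": "Z"}

expected = expected_effects(genes, edges, reporters, attachment)
for g in genes:
    print(g, expected[g])
```

The effect sets are nested: the effects of perturbing Z are a subset of those of Y, which are a subset of those of X.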
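Likelihood P( D | M, θ )
Compare predictions with observations:
Prediction: E1=0, E2=1
Observation 1: E1=1, E2=1
Observation 2: E1=0, E2=1
Error probabilities: e.g. false NEG rate 20%, false POS rate 5%
A predicted 0 is observed as 1 with probability 0.05 and as 0 with probability 0.95; a predicted 1 is observed as 1 with probability 0.80 and as 0 with probability 0.20.
Lik = Pr( E1 = 1) ⋅ Pr( E2 = 1) ⋅ Pr( E1 = 0) ⋅ Pr( E2 = 1) = 0.05 ⋅ 0.80 ⋅ 0.95 ⋅ 0.80
(Figure: pathway genes X, Y, Z with attached reporters E1 and E2)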
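The comparison of predictions with observations can be written out as a short sketch. It assumes the usual reading of the error rates: the false positive rate is the probability of observing an effect where none is predicted, and the false negative rate is the probability of missing a predicted effect. The function names are illustrative only.

```python
def effect_prob(observed, predicted, fp=0.05, fn=0.20):
    """Probability of one observed 0/1 effect given the predicted state."""
    if predicted == 1:
        return 1 - fn if observed == 1 else fn
    return fp if observed == 1 else 1 - fp

def likelihood(observations, prediction, fp=0.05, fn=0.20):
    """Product over all observations and reporters, assumed independent."""
    lik = 1.0
    for obs in observations:
        for reporter, value in obs.items():
            lik *= effect_prob(value, prediction[reporter], fp, fn)
    return lik

prediction = {"E1": 0, "E2": 1}
observations = [
    {"E1": 1, "E2": 1},  # E1 is a false positive, E2 a true positive
    {"E1": 0, "E2": 1},  # E1 is a true negative, E2 a true positive
]
lik = likelihood(observations, prediction)
print(lik)
```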
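Marginal likelihood
$$P(D \mid M) = \int P(D \mid M, \Theta)\, P(\Theta \mid M)\, d\Theta = \frac{1}{n^m} \prod_{i=1}^{m} \sum_{j=1}^{n} \prod_{k=1}^{l} P(e_{ik} \mid M, \theta_i = j)$$
- $1/n^m$: uniform prior over positions
- $\prod_{i}$: product over all effect reporters
- $\sum_{j}$ (together with $1/n$): average over possible positions in the pathway
- $\prod_{k}$: product over replicate observations
- $P(e_{ik} \mid M, \theta_i = j)$: distribution of a single effect reporter with known position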
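A minimal sketch of this marginal likelihood for a toy screen follows. The data layout (one observation per perturbed gene and reporter), the error rates, and the two candidate models are assumptions for illustration. For every reporter the code averages, over all n possible attachment positions, the product of the probabilities of its observations.

```python
def effect_prob(observed, predicted, fp=0.05, fn=0.20):
    """Probability of one observed 0/1 effect given the predicted state."""
    if predicted == 1:
        return 1 - fn if observed == 1 else fn
    return fp if observed == 1 else 1 - fp

def marginal_likelihood(reach, data, fp=0.05, fn=0.20):
    """reach: gene -> set of genes it reaches (transitively closed, reflexive).
    data: reporter -> list of (perturbed gene, observed 0/1 effect)."""
    genes = list(reach)
    total = 1.0
    for observations in data.values():       # product over reporters i
        average = 0.0
        for position in genes:                # sum over positions j
            p = 1.0
            for perturbed, e in observations: # product over observations k
                predicted = 1 if position in reach[perturbed] else 0
                p *= effect_prob(e, predicted, fp, fn)
            average += p
        total *= average / len(genes)         # uniform prior 1/n per reporter
    return total

# Hypothetical noise-free data generated from the chain X -> Y -> Z.
data = {
    "E1": [("X", 1), ("Y", 0), ("Z", 0)],
    "E2": [("X", 1), ("Y", 1), ("Z", 0)],
    "E3": [("X", 1), ("Y", 1), ("Z", 1)],
}
chain = {"X": {"X", "Y", "Z"}, "Y": {"Y", "Z"}, "Z": {"Z"}}
empty = {"X": {"X"}, "Y": {"Y"}, "Z": {"Z"}}

ml_chain = marginal_likelihood(chain, data)
ml_empty = marginal_likelihood(empty, data)
print(ml_chain, ml_empty)
```

On this toy data the chain scores higher than the graph without edges, because only the chain predicts the nested effect sets.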
NEM: inference
Model space: all transitively closed directed graphs
- Exhaustive enumeration: score all models to find the one fitting the data best. Markowetz et al. Bioinformatics, 2005
- MCMC, Simulated Annealing: take small probabilistic steps to explore model space. With A Tresch; in preparation
- Divide and conquer: break a big model into smaller, manageable pieces and then re-assemble. Markowetz et al. ISMB 2007
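For a very small pathway, exhaustive enumeration can be sketched directly. This is an illustration under assumptions, not the published implementation: three genes, one reporter per true position, noise-free toy data, fixed error rates, and a uniform prior over models. Every transitively closed graph is scored by its marginal likelihood and the scores are normalized to a posterior.

```python
from itertools import product

def effect_prob(observed, predicted, fp=0.05, fn=0.20):
    """Probability of one observed 0/1 effect given the predicted state."""
    if predicted == 1:
        return 1 - fn if observed == 1 else fn
    return fp if observed == 1 else 1 - fp

def is_transitive(adj):
    n = len(adj)
    return all(adj[a][c] or not (adj[a][b] and adj[b][c])
               for a in range(n) for b in range(n) for c in range(n))

def all_models(n):
    """All transitively closed graphs on n genes, with self-loops fixed to 1."""
    pairs = [(a, b) for a in range(n) for b in range(n) if a != b]
    models = []
    for bits in product([0, 1], repeat=len(pairs)):
        adj = [[1 if a == b else 0 for b in range(n)] for a in range(n)]
        for (a, b), bit in zip(pairs, bits):
            adj[a][b] = bit
        if is_transitive(adj):
            models.append(adj)
    return models

def marginal_likelihood(adj, data):
    """data[k][i]: observed effect of perturbing gene k on reporter i."""
    n = len(adj)
    total = 1.0
    for i in range(len(data[0])):
        average = 0.0
        for j in range(n):  # candidate position of reporter i
            p = 1.0
            for k in range(n):
                p *= effect_prob(data[k][i], adj[k][j])
            average += p
        total *= average / n
    return total

# Hypothetical data from the chain X -> Y -> Z (rows: X, Y, Z perturbed).
data = [[1, 1, 1],
        [0, 1, 1],
        [0, 0, 1]]

models = all_models(3)
scores = [marginal_likelihood(m, data) for m in models]
posterior = [s / sum(scores) for s in scores]  # uniform prior over models
best = models[scores.index(max(scores))]
print(len(models), best, max(posterior))
```

The number of candidate models grows very quickly with the number of genes, which is why the slide lists MCMC, simulated annealing, and divide and conquer as alternatives.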
NEM: extensions
- Likelihood based on log-ratios of effects
- Drop transitivity requirement
- Feature selection to concentrate on informative effect reporters
Tresch and Markowetz (2008)
NEMs on Drosophila data
Summary of part 1
1. Gene perturbation screens with gene-expression readouts
2. Perturbation screens suffer from the information gap between pathways and reporters
3. Nested Effects Models reconstruct pathway features from subset relations between observed effects
– Part 2 – Data integration and probabilistic refinement of a signaling pathway hypothesis
Pathway refinement
1. Start from given pathway hypothesis. Even if our understanding of pathways is poor, that does not mean we have none at all!
2. Evaluate evidence for hypothesis in data
3. Identify weakly supported areas and likely extensions
Not reconstruction from scratch.
Step 1: assemble pathway hypothesis (KEGG, literature, …) for pheromone response pathway in Yeast
Edge data I Support for hypothesis in protein-protein interaction data
Edge data II Support for hypothesis in co-expression data
Edge data III Why is it so hard to reconstruct nuclear regulatory network from correlations?
Edge data IV Support for hypothesis in TF-DNA binding data
Paths: cause-effect data Expression profiling of knock-out mutants (Hughes et al., 2000) Result: transcriptional response to perturbation only visible on down-stream genes (information gap!)
Conclusion from data analysis • Every data source is informative for a specific compartment of the pathway • No data source is informative in all compartments • We expect these observations also to hold for other MAPK and signaling pathways. Need compartment-specific integrative model encompassing edge, node, and path data.
Integrative model
- Pathway graph as hidden/latent variables
- Conditional distributions for each data type
- Prior and parameters
- Different data types contribute to each compartment
- Graphical model defines posterior P(G|data) -> inference by Gibbs sampler
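The idea of treating the pathway graph as latent variables and sampling from P(G|data) can be illustrated with a deliberately simplified toy Gibbs sampler. Everything specific here is an assumption and not the model from the talk: candidate edges are independent binary indicators, each edge has two evidence scores with unit-variance Gaussian conditional distributions (mean 3 if the edge exists, 0 otherwise), the prior edge probability has a Beta(1, 1) prior, and compartments are ignored.

```python
import math
import random

def normal_pdf(x, mu, sd=1.0):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def gibbs_edges(evidence, mu_edge=3.0, mu_none=0.0,
                sweeps=2000, burn_in=500, seed=0):
    """Estimate posterior edge probabilities by Gibbs sampling.

    Alternates between sampling each edge indicator given the prior edge
    probability pi and its evidence, and sampling pi given all indicators.
    """
    rng = random.Random(seed)
    n = len(evidence)
    g = [rng.randint(0, 1) for _ in range(n)]
    pi = 0.5
    counts = [0] * n
    for sweep in range(sweeps):
        for e, scores in enumerate(evidence):
            w1, w0 = pi, 1 - pi
            for s in scores:  # one Gaussian factor per data type
                w1 *= normal_pdf(s, mu_edge)
                w0 *= normal_pdf(s, mu_none)
            g[e] = 1 if rng.random() < w1 / (w1 + w0) else 0
        # Conjugate update for the prior edge probability, Beta(1, 1) prior.
        pi = rng.betavariate(1 + sum(g), 1 + n - sum(g))
        if sweep >= burn_in:
            for e in range(n):
                counts[e] += g[e]
    return [c / (sweeps - burn_in) for c in counts]

# Hypothetical evidence: (interaction score, co-expression score) per edge.
evidence = [(3.1, 2.8), (2.9, 3.3), (0.1, -0.2), (-0.4, 0.3), (0.2, 0.1)]
probs = gibbs_edges(evidence)
print([round(p, 2) for p in probs])
```

In a compartment-specific model, the set of data types entering the inner loop would differ from edge to edge; the sampler structure stays the same.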
Evaluation
1. Fit model parameters on pheromone response pathway (training)
2. Use fitted model on other MAPK pathways (generalization to closely related examples)
3. Use fitted model on all other Yeast signaling pathways (generalization to everything else)
… work in progress …
Acknowledgements Nested Effects Models Rainer Spang (Univ. Regensburg) .:. Dennis Kostka (UC SF) .:. Achim Tresch (Gene Center Munich) .:. Holger Fröhlich (DKFZ Heidelberg) .:. Tim Beißbarth (Univ. Göttingen) .:. Josh Stuart, Charlie Vaske (UC SC) .:. Data integration Olga G. Troyanskaya (Princeton) .:. Edoardo Airoldi (Harvard) .:. David Blei (Princeton) .:.
Probabilistic refinement of cellular pathway models Thank you ! Florian Markowetz email@example.com