biosummer04 yang keynote

Education

Published on January 17, 2008

Author: Silvestre

Source: authorstream.com

Detecting adaptive protein evolution:
Ziheng Yang
Department of Biology, University College London

There are two main explanations for genetic variation observed within a population or between species:
Natural selection (survival of the fittest)
Mutation and drift (survival of the luckiest)
Gillespie, J.H. 1998. Population genetics: a concise guide. Johns Hopkins University Press, Baltimore.
Hartl, D.L., and A.G. Clark. 1997. Principles of population genetics. Sinauer Associates, Sunderland, Massachusetts.

Positive & negative selection:
Genotype     AA     Aa         aa
Frequency    p^2    2p(1-p)    (1-p)^2
Fitness      1      1+s        1+2s
(A: “wildtype-allele”; a: new mutant)
s is the selection coefficient:
s ≈ 0: neutral evolution
s < 0: negative (purifying) selection
s > 0: positive selection (adaptive evolution)

Positive & negative selection (continued):
Whether mutation or selection dominates the fate of the new allele depends on the size of |Ns| relative to 1, where N is the effective population size.
Ns < -3: fatal mutations
-3 < Ns < -1: unlucky losers
-1 < Ns < 1: nearly neutral
1 < Ns < 3: occasional hopefuls
Ns > 3: rare monsters

Theories of molecular evolution:
Akashi, H. (1999) Gene 238: 39-51

Detecting the effect of natural selection is useful for
(a) advancing evolutionary theory
(b) inferring functional significance from genomic data.

Evolutionary conservation means functional significance.
Thomas, et al. 2003. Nature 424:788-793

Fast-evolving genes or gene regions are also functionally important if the variability is driven by natural selection.

In protein-coding genes, we can distinguish between synonymous (silent) and nonsynonymous (replacement) mutations, and contrast their substitution rates to infer selection on the protein.

Synonymous & nonsynonymous substitutions

Definitions:
dS (KS): number of synonymous substitutions per synonymous site
dN (KA): number of nonsynonymous substitutions per nonsynonymous site
ω = dN/dS: nonsynonymous/synonymous rate ratio

The ω ratio measures selection at the protein level:
ω = 1: neutral evolution
ω < 1: negative (purifying) selection
ω > 1: positive (diversifying) selection

Data & information:
a2     GGC TCT CAC TCC ATG AGG TAT TTC TTC ACA TCC
a24    ... ..C ... ... ... ..T ... ... .A. ..C ...
a11    ... ..C ..A ... ... ... ... ... .A. ..C ...
aw24   ... ..C ... ... ... ... ... ... CA. ..C ...
aw68   ... ..C ... ... ... ..A ... ... .A. ..C ...
a3     ... ..T ..T ... ... ... ... ... C.. ..T ...

Early studies average synonymous and nonsynonymous rates over sites and have little power in detecting adaptive evolution.

Possible approaches:
Decide on which sites might be under selection and focus on them (Hughes & Nei 1988 Nature 335:167-170) (fixed-sites model)
Test each site for positive selection (Suzuki & Gojobori 1999 Mol. Biol. Evol. 16: 1315–1328)
Use a statistical distribution to model the ω variation (random-sites model, fishing expedition)

A simple approach (Fitch et al. 1997; Suzuki & Gojobori 1999):
[Figure: example tree with codons at the nodes (TTC, ATC, TTA, TAT, TTT) and nucleotide changes (T→A, C→T, C→A) mapped onto the branches]
3 nonsynonymous changes, 1 synonymous change

Use of codon models to detect amino acid sites under diversifying selection:
Likelihood Ratio Test (LRT) for sites under positive selection
Bayes calculation of posterior probabilities of sites under positive selection

Rates to CTG:
Synonymous
CTC (Leu) → CTG (Leu): π_CTG
TTG (Leu) → CTG (Leu): κπ_CTG
Nonsynonymous
GTG (Val) → CTG (Leu): ωπ_CTG
CCG (Pro) → CTG (Leu): κωπ_CTG

Rate matrix Q = {qij}:
(Goldman & Yang 1994 Mol Biol Evol 11:725-736; Muse & Gaut 1994 Mol Biol Evol 11:715-724)

LRT of sites under positive selection:
H0: there are no sites at which ω > 1
H1: there are such sites
Compare 2Δℓ = 2(ℓ1 - ℓ0) with a χ² distribution
(Nielsen & Yang 1998 Genetics 148:929-936; Yang, Nielsen, Goldman & Pedersen 2000. Genetics 155:431-449)

Two pairs of useful models:
M1a (Nearly Neutral)
Site class k:    0         1
pk:              p0        p1
ωk:              ω0 < 1    ω1 = 1
M2a (Positive Selection)
Site class k:    0         1         2
pk:              p0        p1        p2
ωk:              ω0 < 1    ω1 = 1    ω2 > 1
Modified from Nielsen & Yang (1998), where ω0 = 0 is fixed

M7 (beta, using 10 site classes): ω ~ beta(p, q)
M8 (beta&ω): p0 of sites from beta(p, q); p1 = 1 - p0 of sites with ωs > 1
From Yang et al. (2000)

Discretisation of a continuous distribution:
[Figure: M7 (beta); x-axis ω ratio from 0 to 1, y-axis sites]

Mixture distribution M8 (beta&ω):
[Figure: x-axis ω ratio, y-axis sites; a proportion p0 of sites from beta(p, q) between 0 and 1, plus a proportion p1 of sites at ω = 1.7]

Likelihood function and Empirical Bayesian inference of sites under selection (M2a):
Site class k:     0         1         2
Proportion pk:    p0        p1        p2
ω ratio ωk:       ω0 < 1    ω1 = 1    ω2 > 1

Bayes Empirical Bayes (BEB): M2a

Human MHC Class I data: 192 alleles, 270 codons:
Model          ℓ            Parameter estimates
M7 (beta)      -7,498.97    beta(0.10, 0.35)
M8 (beta&ω)    -7,232.68    p0 = 0.90, beta(0.17, 0.71), (p1 = 0.10), ωs = 5.12
Likelihood ratio test of positive selection: 2Δℓ = 2 × 266.29 = 532.58, P < 0.001, d.f. = 2

Posterior probabilities for MHC

25 sites identified by M8 (beta&ω) using both NEB & BEB

Comparison between NEB and BEB from real data analysis and computer simulation suggests that:
BEB is effective in correcting high false positive rates of NEB in small (non-informative) data sets.
BEB does not seem to cause a loss of power in large (informative) data sets.
Some wrong models are more useful than the true model.

A small data set (HTLV tax gene) (Suzuki & Nei 2004 MBE 21:914-921):
20 sequences, 181 codons.
23 singleton differences on star tree: 2 synonymous, 21 nonsynonymous
NEB: M0 (one-ratio), M2 (selection), M2a (PositiveSelection), M8 (beta&ω) all give ω = 4.87. Every site is under positive selection with P = 1
BEB: 21 sites have 0.91 < P < 0.93 under M2a and 0.96 < P < 0.97 under M8. Other sites have P ~ 57% or 70%.
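
The likelihood ratio test and the empirical Bayes step described in the slides above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the analyses: the function names are hypothetical, the MHC log-likelihoods are taken as negative (ℓ = -7,498.97 for M7 and ℓ = -7,232.68 for M8), and the site-class proportions and per-site likelihoods in the second example are made-up numbers. With d.f. = 2, as on the MHC slide, the χ² upper tail reduces to exp(-x/2), so no statistics library is needed.

```python
import math


def lrt_df2(lnl_null, lnl_alt):
    """Likelihood ratio test against a chi-square with 2 degrees of freedom.

    Returns (statistic, p_value). For d.f. = 2 the chi-square upper tail
    has the closed form exp(-x / 2).
    """
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return statistic, math.exp(-statistic / 2.0)


def class_posteriors(proportions, site_likelihoods):
    """Posterior probability of each site class for one site (Bayes' rule).

    proportions[k] is p_k; site_likelihoods[k] is the probability of the
    data at the site given omega_k, both evaluated at the parameter estimates.
    """
    weighted = [p * f for p, f in zip(proportions, site_likelihoods)]
    total = sum(weighted)
    return [w / total for w in weighted]


# M7 versus M8 for the MHC data (log-likelihoods assumed negative).
statistic, p_value = lrt_df2(lnl_null=-7498.97, lnl_alt=-7232.68)
print(f"2*delta_lnL = {statistic:.2f}, P = {p_value:.3g}")

# Hypothetical site under M2a: made-up proportions and site likelihoods
# for classes 0, 1 and 2 (these numbers are NOT from the slides).
posterior = class_posteriors([0.6, 0.3, 0.1], [1e-6, 2e-6, 8e-6])
print("P(class 2 | data) =", round(posterior[2], 3))
```

The second function plugs point estimates into Bayes' rule, which corresponds to the naive empirical Bayes (NEB) calculation; BEB additionally accounts for uncertainty in the parameter estimates and is not reproduced here.
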
Performance measures in simulation:
True positive = 50/80
False positive = 10/120
Accuracy = 50/60

Performance of BEB (NEB) in simulations (cutoff P = 95%)

Advantages of ML:
Accounts for the genetic code
Accounts for ts/tv rate bias and codon usage bias
Avoids bias in ancestral reconstruction
Uses probability theory to correct for multiple hits

Assumptions & Limitations:
Same selective pressure over all lineages
No recombination within the sequence
No variation in synonymous rate among sites
Same rate for all amino acid changes
No sequencing or alignment errors
The level of sequence divergence and the number of sequences are two major factors affecting accuracy and power. Data of only a few closely related sequences do not contain much information.

Adaptive molecular evolution:
proteins involved in immunity or defence (MHC, immunoglobulin VH, class 1 chitinase)
proteins involved in evading defence systems (HIV env, nef, gag, pol, etc., capsid in FMD virus, flu virus hemagglutinin gene)
proteins involved in male & female reproduction (abalone sperm lysin, sea urchin bindin, proteins in mammals)
Miscellaneous

Acknowledgments:
BBSRC
http://abacus.gene.ucl.ac.uk/

References:
Yang, Z., and J.P. Bielawski. 2000. Statistical methods for detecting molecular adaptation. Trends in Ecology and Evolution 15: 496-503.
Yang, Z. 2001. Adaptive molecular evolution, Chapter 12 (pp. 327-350) in Handbook of statistical genetics, eds. D. Balding, M. Bishop, and C. Cannings. Wiley, New York.
Yang, Z. 2002. Inference of selection from multiple species alignments. Current Opinion in Genetics and Development 12: 688-694.
Wong, W.S.W., et al. 2004. Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics 168: 1041-1051.
Yang, Z., et al. Submitted. Bayes empirical Bayes inference of amino acid sites under positive selection. Molecular Biology & Evolution.
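
The three fractions on the "Performance measures in simulation" slide can be reproduced from raw counts. The sketch below assumes, from the denominators alone, that 80 simulated sites were truly under positive selection, 120 were not, and 60 sites were identified in total (50 correctly, 10 falsely); the function name and dictionary keys are hypothetical.

```python
def performance(true_hits, false_hits, n_selected, n_not_selected):
    """Summarise a simulation in the three measures used on the slide.

    true_hits      sites under selection that were identified
    false_hits     sites not under selection that were identified
    n_selected     all simulated sites truly under selection
    n_not_selected all simulated sites not under selection
    """
    n_identified = true_hits + false_hits
    return {
        # share of truly selected sites that were found
        "true_positive": true_hits / n_selected,
        # share of non-selected sites that were wrongly flagged
        "false_positive": false_hits / n_not_selected,
        # share of flagged sites that are truly selected
        "accuracy": true_hits / n_identified if n_identified else float("nan"),
    }


# Counts inferred from the slide: 50/80, 10/120 and 50/60.
measures = performance(true_hits=50, false_hits=10, n_selected=80, n_not_selected=120)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Accuracy here is conditional on a site being flagged, so it can be high even when many selected sites are missed; the three measures are best read together.
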
