Bayesian Divergence Time Estimation – Lecture at Bodega 2014 Workshop


Published on March 14, 2014

Author: trayc7

Source: slideshare.net

Description

A lecture on Bayesian divergence-time estimation by Tracy A. Heath (http://phylo.bio.ku.edu/content/tracy-heath). This lecture was given at the 2014 Bodega Bay Applied Phylogenetics Workshop, hosted by the University of California, Davis (http://treethinkers.org/2014-workshop/). This lecture precedes a tutorial on the software BEAST v2 (http://treethinkers.org/divergence-time-estimation-using-beast/)

B D T E Tracy Heath Integrative Biology, University of California, Berkeley Ecology & Evolutionary Biology, University of Kansas 2014 Applied Phylogenetics Workshop Bodega Bay, CA USA

O Overview of divergence time estimation • Relaxed clock models – accounting for variation in substitution rates among lineages • Tree priors and fossil calibration break BEAST v2.1.1 Tutorial http://treethinkers.org/divergence-time-estimation-using-beast/ • Walk through: set up BEAST input file in BEAUti and execute BEAST MCMC analysis • On your own: complete analysis & summarize output lunch

A T-S  E Phylogenetic trees can provide both topological information and temporal information 0.2 expected substitutions/site Primates Carnivora Cetacea Simiiformes Artiodactyla Microcebus Homininae Hippo Fossil Calibrations PhylogeneticRelationshipsSequence Data Did Simiiformes experience accelerated rates of molecular evolution? What is the age of the MRCA of mouse lemurs (Microcebus)? 100 0.020.040.060.080.0 Equus Rhinoceros Bos Hippopotamus Balaenoptera Physeter Ursus Canis Felis Homo Pan Gorilla Pongo Macaca Callithrix Loris Galago Daubentonia Varecia Eulemur Lemur Hapalemur Propithecus Lepilemur Mirza M. murinus M. griseorufus M. myoxinus M. berthae M. rufus1 M. tavaratra M. rufus2 M. sambiranensis M. ravelobensis Cheirogaleus Simiiformes Microcebus Cretaceous Paleogene Neogene Q Time (Millions of years) Understanding Evolutionary Processes (Yang & Yoder Syst. Biol. 2003)

A T-S  E Phylogenetic divergence-time estimation • What was the spacial and climatic environment of ancient angiosperms? • Did the uplift of the Patagonian Andes drive the diversity of Peruvian lilies? • How has mammalian body-size changed over time? • Is diversification in Caribbean anoles correlated with ecological opportunity? • How has the rate of molecular evolution changed across the Tree of Life? (Antonelli & Sanmartin. Syst. Biol. 2011) (Lartillot & Delsuc. Evolution 2012) (Mahler, Revell, Glor, & Losos. Evolution 2010) (Nabholz, Glemin, Galtier. MBE 2008) Historical biogeography Molecular evolution Trait evolution Diversification Anolis fowleri (image by L. Mahler) Understanding Evolutionary Processes

D T E Goal: Estimate the ages of interior nodes to understand the timing and rates of evolutionary processes Model how rates are distributed across the tree Describe the distribution of speciation events over time External calibration information for estimates of absolute node times calibrated node 100 0.020.040.060.080.0 Equus Rhinoceros Bos Hippopotamus Balaenoptera Physeter Ursus Canis Felis Homo Pan Gorilla Pongo Macaca Callithrix Loris Galago Daubentonia Varecia Eulemur Lemur Hapalemur Propithecus Lepilemur Mirza M. murinus M. griseorufus M. myoxinus M. berthae M. rufus1 M. tavaratra M. rufus2 M. sambiranensis M. ravelobensis Cheirogaleus Simiiformes Microcebus Cretaceous Paleogene Neogene Q Time (Millions of years)

U H B “From East Gondwana to Central America: historical biogeography of the Alstroemeriaceae” (Chacón et al., J. Biolgeograpy 2012)

D T E Historical biogeography requires external calibration Model how rates are distributed across the tree Describe the distribution of speciation events over time External calibration information for estimates of absolute node times (Chacón et al., J. Biolgeograpy 2012)

D T E What about when the fossil record (or other types of calibration information) is poor or absent? Example: Despite the rich diversity of Anolis there are few fossils There are some amber fossils, but these fossils fall within a narrow time range Amber Anolis fossil (http://www.anoleannals.org/2012/03/06/the-hi-tech-world-of-anole-paleontology/)

D T E What about when the fossil record is poor or absent? Model how rates are distributed across the tree Describe the distribution of speciation events over time Estimation of relative divergence times Anolis hendersoni (Image courtesy of L. Mahler)

R T  D “Ecological opportunity and the rate of morphological evolution in the diversification of Greater Antillean Anoles” Anolis fowleri (image courtesy of L. Mahler) (Mahler, Revell, Glor, & Losos. Evolution 2010)

T D  I D Divergence time estimation of rapidly evolving pathogens provide information about spatial and temporal dynamics of infectious diseases Sequences sampled at different time horizons impose a temporal structure on the tree by providing ages for non- contemporaneous tips (Pybus & Rambaut. 2009. Nature Reviews Genetics.)

A T-S  E Phylogenetic trees can provide both topological information and temporal information 100 0.020.040.060.080.0 Equus Rhinoceros Bos Hippopotamus Balaenoptera Physeter Ursus Canis Felis Homo Pan Gorilla Pongo Macaca Callithrix Loris Galago Daubentonia Varecia Eulemur Lemur Hapalemur Propithecus Lepilemur Mirza M. murinus M. griseorufus M. myoxinus M. berthae M. rufus1 M. tavaratra M. rufus2 M. sambiranensis M. ravelobensis Cheirogaleus Simiiformes Microcebus Cretaceous Paleogene Neogene Q Time (Millions of years) Understanding Evolutionary Processes (Yang & Yoder Syst. Biol. 2003; Heath et al. MBE 2012)

T G M C Assume that the rate of evolutionary change is constant over time (branch lengths equal percent sequence divergence) 10% 400 My 200 My A B C 20% 10% 10% (Based on slides by Jeff Thorne; http://statgen.ncsu.edu/thorne/compmolevo.html)

T G M C We can date the tree if we know the rate of change is 1% divergence per 10 My A B C 20% 10% 10% 10% 200 My 400 My 200 My (Based on slides by Jeff Thorne; http://statgen.ncsu.edu/thorne/compmolevo.html)

T G M C If we found a fossil of the MRCA of B and C, we can use it to calculate the rate of change & date the root of the tree A B C 20% 10% 10% 10% 200 My 400 My (Based on slides by Jeff Thorne; http://statgen.ncsu.edu/thorne/compmolevo.html)

R  G M C Rates of evolution vary across lineages and over time Mutation rate: Variation in • metabolic rate • generation time • DNA repair Fixation rate: Variation in • strength and targets of selection • population sizes 10% 400 My 200 My A B C 20% 10% 10%

U A Sequence data provide information about branch lengths In units of the expected # of substitutions per site branch length rate × time 0.2 expected substitutions/site PhylogeneticRelationshipsSequence Data

P L f (D | V,θs,Ψ) V Vector of branch lengths θs Sequence model parameters D Sequence data Ψ Tree topology

R  T The expected # of substitutions/site occurring along a branch is the product of the substitution rate and time length = rate × time length = rate length = time Methods for dating species divergences estimate the substitution rate and time separately

S R Substitution rate: the rate at which mutations are fixed in a population Depends on: mutation rate, selection, population size, drift length = subst. rate Mutation rate measures the rate at which mutations occur over time and is affected by metabolic rate, generation time, DNA repair efficiency

R  T The sequence data provide information about branch length for any possible rate, there’s a time that fits the branch length perfectly 0 1 2 3 4 5 0 1 2 3 4 5 BranchRate Branch Time time = 0.8 rate = 0.625 branch length = 0.5 (based on Thorne & Kishino, 2005)

B D T E length = rate length = time R (r,r,r,...,rN−) A (a,a,a,...,aN−) N number of tips

B D T E length = rate length = time R (r,r,r,...,rN−) A (a,a,a,...,aN−) N number of tips

B D T E Posterior probability f (R,A,θR,θA,θs | D,Ψ) R Vector of rates on branches A Vector of internal node ages θR,θA,θs Model parameters D Sequence data Ψ Tree topology

B D T E f (R,A,θR,θA,θs | D) = f(D | R,A,θR,θA,θs)f(R,A,θR,θA,θs) f(D) f(D | R,A,θR,θA,θs) Likelihood f(R,A,θR,θA,θs) Joint prior density f(D) Marginal probability of the data

B D T E The likelihood depends on the node times and the rates of evolution, but not on the processes generating the rates and node times f (D | R,A,θR,θA,θs) = f (D | R,A,θs)

B D T E Assume that the process governing the ages of nodes operates independently of processes governing mutation, and that the process governing the total rates of substitutions is independent from the mutational parameters that determine relative rates of different substitutions: f(R,A,θR,θA,θs) = f(R | θR) f(A | θA) f(θR) f(θA) f(θs)

B D T E After enforcing these assumptions, the posterior distribution of the parameters and hyperparameters can be expressed as: f(R,A,θR,θA,θs | D) = f (D | R,A,θs) f(R | θR) f(A | θA) f(θR) f(θA) f(θs) f(D)

B D T E Estimating divergence times relies on 2 main elements: • Branch-specific rates: f (R | θR) • Node ages: f (A | θA,C)

M R V Some models describing lineage-specific substitution rate variation: • Global molecular clock (Zuckerkandl & Pauling, 1962) • Local molecular clocks (Hasegawa, Kishino & Yano 1989; Kishino & Hasegawa 1990; Yoder & Yang 2000; Yang & Yoder 2003, Drummond and Suchard 2010) • Punctuated rate change model (Huelsenbeck, Larget and Swofford 2000) • Log-normally distributed autocorrelated rates (Thorne, Kishino & Painter 1998; Kishino, Thorne & Bruno 2001; Thorne & Kishino 2002) • Uncorrelated/independent rates models (Drummond et al. 2006; Rannala & Yang 2007; Lepage et al. 2007) • Mixture models on branch rates (Heath, Holder, Huelsenbeck 2012) Models of Lineage-specific Rate Variation

G M C The substitution rate is constant over time All lineages share the same rate branch length = substitution rate low high Models of Lineage-specific Rate Variation (Zuckerkandl & Pauling, 1962)

G M C Assume the clock rate is gamma-distributed R = (r,r,...,r) r ∼ Gamma(α,λ) f (R | θR) = f (r | α,λ) rate density r rate prior distribution Models of Lineage-specific Rate Variation (Zuckerkandl & Pauling, 1962)

G M C The sampled rate is applied to every branch in the tree rate density r rate prior distribution Models of Lineage-specific Rate Variation (Zuckerkandl & Pauling, 1962)

R  G M C Rates of evolution vary across lineages and over time Mutation rate: Variation in • metabolic rate • generation time • DNA repair Fixation rate: Variation in • strength and targets of selection • population sizes 10% 400 My 200 My A B C 20% 10% 10%

R-C M To accommodate variation in substitution rates ‘relaxed-clock’ models estimate lineage-specific substitution rates • Local molecular clocks • Punctuated rate change model • Log-normally distributed autocorrelated rates • Uncorrelated/independent rates models • Mixture models on branch rates

L M C Rate shifts occur infrequently over the tree Closely related lineages have equivalent rates (clustered by sub-clades) low high branch length = substitution rate Models of Lineage-specific Rate Variation (Yang & Yoder 2003, Drummond and Suchard 2010)

L M C Most methods for estimating local clocks required specifying the number and locations of rate changes a priori Drummond and Suchard (2010) introduced a Bayesian method that samples over a broad range of possible random local clocks low high branch length = substitution rate Models of Lineage-specific Rate Variation (Yang & Yoder 2003, Drummond and Suchard 2010)

A R Substitution rates evolve gradually over time – closely related lineages have similar rates The rate at a node is drawn from a lognormal distribution with a mean equal to the parent rate low high branch length = substitution rate Models of Lineage-specific Rate Variation (Thorne, Kishino & Painter 1998; Kishino, Thorne & Bruno 2001)

A R R = (r,r,...,rN−) σ2 = φ ∗ ∆t μ = ln(rpi ) − σ2 2 ri ∼ Lognormal(μ,σ2) f (R | θR) = f (R | φ,A,rroot) φ is the variance parameter ∆t is the difference in time between the 2 nodes Density Models of Lineage-specific Rate Variation (Thorne, Kishino & Painter 1998; Kishino, Thorne & Bruno 2001)

A R The rate at a node is drawn from a lognormal distribution with a mean equal to the parent rate The rate for the branch is equal to the mean of the two subtending nodes Density Models of Lineage-specific Rate Variation (Thorne, Kishino & Painter 1998; Kishino, Thorne & Bruno 2001)

P R C Rate changes occur along lineages according to a point process At rate-change events, the new rate is a product of the parent’s rate and a Γ-distributed multiplier low high branch length = substitution rate Models of Lineage-specific Rate Variation (Huelsenbeck, Larget and Swofford 2000)

I/U R Lineage-specific rates are uncorrelated when the rate assigned to each branch is independently drawn from an underlying distribution low high branch length = substitution rate Models of Lineage-specific Rate Variation (Drummond et al. 2006)

I/U R In BEAST, the rates for the branches are drawn from a discretized lognormal distribution Density Rate Models of Lineage-specific Rate Variation (Drummond et al. 2006)

I/U R 0 2.01.51.00.5 Branch rate (r) Density Branch rates under the uncorrelated, discritized LN model Models of Lineage-specific Rate Variation (Drummond et al. 2006)

I/U R Density 0 2.01.51.00.5 Branch rate (r) Branch rates under the uncorrelated, discritized LN model Models of Lineage-specific Rate Variation (Drummond et al. 2006)

I M M Dirichlet process prior: Branches are partitioned into distinct rate categories branch length = substitution rate c5 c4 c3 c2 substitution rate classes c1 Models of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE)

T D P P (DPP) A stochastic process that models data as a mixture of distributions and can identify latent classes present in the data Branches are assumed to form distinct substitution rate clusters Efficient Markov chain Monte Carlo (MCMC) implementations allow for inference under this model branch length = substitution rate c5c4c3c2 substitution rate classes c1 DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)

T D P P (DPP) A stochastic process that models data as a mixture of distributions and can identify latent classes present in the data Random variables under the DPP informed by the data: • the number of rate classes • the assignment of branches to classes • the rate value for each class branch length = substitution rate c5c4c3c2 substitution rate classes c1 DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)

T D P P (DPP) Local molecular clock G0 5 c3 5 c2 rate classes branch length = substitution rate 8 c1 rate density ci r class-rate prior distribution DPP Model of Lineage-specific Rate Variation Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955.

T D P P (DPP) Global molecular clock G0 rate classes branch length = substitution rate 18 c1 rate density class-rate prior distribution ci r DPP Model of Lineage-specific Rate Variation Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955.

T D P P (DPP) Independent rates G0 1 c1 1 c18 1 c3 1 c2 rate classes branch length = substitution rate rate density ci r class-rate prior distribution DPP Model of Lineage-specific Rate Variation Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955.

T D P P (DPP) Global molecular clock 4 c3 6 c2 rate classes branch length = substitution rate 8 c1 Each of the 682,076,806,159 configurations has a prior weight DPP Model of Lineage-specific Rate Variation Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955.

M R V These are only a subset of the available models for branch-rate variation • Global molecular clock • Local molecular clocks • Punctuated rate change model • Log-normally distributed autocorrelated rates • Uncorrelated/independent rates models • Dirchlet process prior Models of Lineage-specific Rate Variation

M R V Are our models appropriate across all data sets? cave bear American black bear sloth bear Asian black bear brown bear polar bear American giant short-faced bear giant panda sun bear harbor seal spectacled bear 4.08 5.39 5.66 12.86 2.75 5.05 19.09 35.7 0.88 4.58 [3.11–5.27] [4.26–7.34] [9.77–16.58] [3.9–6.48] [0.66–1.17] [4.2–6.86] [2.1–3.57] [14.38–24.79] [3.51–5.89] 14.32 [9.77–16.58] 95% CI mean age (Ma) t2 t3 t4 t6 t7 t5 t8 t9 t10 tx node MP•MLu•MLp•Bayesian 100•100•100•1.00 100•100•100•1.00 85•93•93•1.00 76•94•97•1.00 99•97•94•1.00 100•100•100•1.00 100•100•100•1.00 100•100•100•1.00 t1 Eocene Oligocene Miocene Plio Plei Hol 34 5.3 1.823.8 0.01 Epochs Ma Global expansion of C4 biomass Major temperature drop and increasing seasonality Faunal turnover Krause et al., 2008. Mitochondrial genomes reveal an explosive radiation of extinct and extant bears near the Miocene-Pliocene boundary. BMC Evol. Biol. 8. Taxa 1 5 10 50 100 500 1000 5000 10000 20000 0100200300 MYA Ophidiiformes Percomorpha Beryciformes Lampriformes Zeiforms Polymixiiformes Percopsif. + Gadiif. Aulopiformes Myctophiformes Argentiniformes Stomiiformes Osmeriformes Galaxiiformes Salmoniformes Esociformes Characiformes Siluriformes Gymnotiformes Cypriniformes Gonorynchiformes Denticipidae Clupeomorpha Osteoglossomorpha Elopomorpha Holostei Chondrostei Polypteriformes Clade r ε ΔAIC 1. 0.041 0.0017 25.3 2. 0.081 * 25.5 3. 0.067 0.37 45.1 4. 0 * 3.1 Bg. 0.011 0.0011 OstariophysiAcanthomorpha Teleostei Santini et al., 2009. Did genome duplication drive the origin of teleosts? A comparative study of diversification in ray-finned fishes. BMC Evol. Biol. 9.

M R V These are only a subset of the available models for branch-rate variation • Global molecular clock • Local molecular clocks • Punctuated rate change model • Log-normally distributed autocorrelated rates • Uncorrelated/independent rates models • Dirchlet process prior Model selection and model uncertainty are very important for Bayesian divergence time analysis Models of Lineage-specific Rate Variation

B D T E Estimating divergence times relies on 2 main elements: • Branch-specific rates: f (R | θR) • Node ages: f (A | θA,C) http://bayesiancook.blogspot.com/2013/12/two-sides-of-same-coin.html

P  N T Relaxed clock Bayesian analyses require a prior distribution on node times f(A | θA) Different node-age priors make different assumptions about the timing of divergence events Node Age Priors

G N T P Assumed to be vague or uninformative by not making assumptions about biological processes Uniform prior: the time at a given node has equal probability across the interval between the time of the parent node and the time of the oldest daughter node (conditioned on root age) Node Age Priors

G N T P Assumed to be vague or uninformative by not making assumptions about biological processes Dirichlet prior: ages of the interior nodes on a single path spanning the age of the root node to one of the tip nodes are sampled from a flat Dirichlet distribution (conditioned on root age) Node Age Priors

S B P Node-age priors based on stochastic models of lineage diversification Yule process: assumes a constant rate of speciation, S, across lineages A pure birth process—every node leaves extant descendants (no extinction) Leads to an exponential waiting-time between speciation events f(A | S,N) Node Age Priors

S B P Node-age priors based on stochastic models of lineage diversification Constant-rate birth-death process: at any point in time a lineage can speciate at rate S or go extinct with a rate of E f(A | S,E,N) Node Age Priors

S B P Different values of S and E lead to different trees Bayesian inference under these models can be very sensitive to the values of these parameters Using hyperpriors on S and E accounts for uncertainty in these hyperparameters Node Age Priors

S B P Node-age priors based on stochastic models of lineage diversification Birth-death-sampling process: an extension of the constant-rate birth-death model that accounts for random sampling of tips Conditions on a probability of sampling a tip, ρ f(A | S,E,ρ,N) Node Age Priors

P  N T Sequence data are only informative on relative rates & times Node-time priors cannot give precise estimates of absolute node ages We need external information (like fossils) to calibrate or scale the tree to absolute time f(A | θA,C) Node Age Priors

C D T Fossils (or other data) are necessary to estimate absolute node ages There is no information in the sequence data for absolute time Uncertainty in the placement of fossils A B C 20% 10% 10% 10% 200 My 400 My

C D Bayesian inference is well suited to accommodating uncertainty in the age of the calibration node Divergence times are calibrated by placing parametric densities on internal nodes offset by age estimates from the fossil record A B C 200 My Density Age

F C Fossil and geological data can be used to estimate the absolute ages of ancient divergences Time (My) Calibrating Divergence Times

F C The ages of extant taxa are known Time (My) Calibrating Divergence Times

F C Fossil taxa are assigned to monophyletic clades Time (My)Minimum age Calibrating Divergence Times Notogoneus osculus (Grande & Grande J. Paleont. 2008)

F C Fossil taxa are assigned to monophyletic clades and constrain the age of the MRCA Minimum age Time (My) Calibrating Divergence Times

M B P Assume constant rates of speciation (S) and extinction (E) (20 extant taxa) 0175 255075100125150 Time Birth-death model

M B P Assume constant rates of speciation (S) and extinction (E) (20 extant taxa) 0175 255075100125150 Time Birth-death model

M T P Fossilization events were generated according to a Poisson process this example has 162 fossilization events 0175 255075100125150 Time Modeling the Process of Fossilization

M T P The fossil sampling rate was evolved under an autocorrelated Brownian motion model 0.2 1.05 Sampling Rate 0175 255075100125150 Time Modeling the Process of Preservation/Recovery

M T P The fossil sampling rate was evolved under an autocorrelated Brownian motion model 0.2 1.05 Sampling Rate 0175 255075100125150 Time Modeling the Process of Preservation/Recovery

M T P 18 fossils were “recovered” in proportion to their sampling rates 0.2 1.05 Sampling Rate 0175 255075100125150 Time Recovered fossil Modeling the Process of Preservation/Recovery

R F Assume we know the true phylogenetic placement of the recovered fossils 0175 255075100125150 Time Modeling the Process of Preservation/Recovery

C F Only the oldest fossil assigned to a given node can be used for calibration 0175 255075100125150 Time Fossil Calibration

C F Only the oldest fossil assigned to a given node can be used for calibration 0175 255075100125150 Time Fossil Calibration

C F Only the oldest fossil assigned to a given node can be used for calibration 0175 255075100125150 Time Fossil Calibration

C F Taphonomic bias • disparity in fossilization and preservation • geographical distribution • recovery bias • identification 0175 255075100125150 Time Fossil Calibration

A F  C Misplaced fossils can affect node age estimates throughout the tree – if the fossil is older than its presumed MRCA Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)

A F  C Crown clade: all living species and their most-recent common ancestor (MRCA) Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)

A F  C Stem lineages: purely fossil forms that are closer to their descendant crown clade than any other crown clade Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)

A F  C Fossiliferous horizons: the sources in the rock record for relevant fossils Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)

F C Age estimates from fossils can provide minimum time constraints for internal nodes Reliable maximum bounds are typically unavailable Minimum age Time (My) Calibrating Divergence Times

P D  C N Parametric distributions are typically off-set by the age of the oldest fossil assigned to a clade These prior densities do not (necessarily) require specification of maximum bounds Uniform (min, max) Exponential (λ) Gamma (α, β) Log Normal (µ, σ2) Time (My)Minimum age Calibrating Divergence Times

P D  C N Describe the waiting time between the divergence event and the age of the oldest fossil Minimum age Time (My) Calibrating Divergence Times

P D  C N Overly informative priors can bias node age estimates to be too young Minimum age Exponential (λ) Time (My) Calibrating Divergence Times

P D  C N Uncertainty in the age of the MRCA of the clade relative to the age of the fossil may be better captured by vague prior densities Minimum age Exponential (λ) Time (My) Calibrating Divergence Times

P D  C N Expectednodeage Min age (fossil) Density Node age - Fossil age 0 252015105 30 30 35 40 45 Lognormal prior density λ = 5-1 λ = 20-1 λ = 60-1 Density Node age - Fossil age 0 80604020 100 60 80 100 120 140 Exponential prior density Expectednodeage Min age (fossil) Calibrating Divergence Times

P  P   Hyperprior: place an higher-order prior on the parameter of a prior distribution Sample the time from the MRCA to the fossil from a mixture of different exponential distributions Account for uncertainty in values of λ Density Hyperparameter Hyperprior Density Parameter Prior

P D  C N Common practice in Bayesian divergence-time estimation: Estimates of absolute node ages are driven primarily by the calibration density Specifying appropriate densities is a challenge for most molecular biologists Uniform (min, max) Exponential (λ) Gamma (α, β) Log Normal (µ, σ2 ) Time (My)Minimum age Calibration Density Approach

I F C We would prefer to eliminate the need for ad hoc calibration prior densities Calibration densities do not account for diversification of fossils Domestic dog Spotted seal Giant panda Spectacled bear Sun bear Am. black bear Asian black bear Brown bear Polar bear Sloth bear Zaragocyon daamsi Ballusia elmensis Ursavus brevihinus Ailurarctos lufengensis Ursavus primaevus Agriarctos spp. Kretzoiarctos beatrix Indarctos vireti Indarctos arctoides Indarctos punjabiensis Giant short-faced bear Cave bear Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)

I F C We want to use all of the available fossils Example: Bears 12 fossils are reduced to 4 calibration ages with calibration density methods Domestic dog Spotted seal Giant panda Spectacled bear Sun bear Am. black bear Asian black bear Brown bear Polar bear Sloth bear Zaragocyon daamsi Ballusia elmensis Ursavus brevihinus Ailurarctos lufengensis Ursavus primaevus Agriarctos spp. Kretzoiarctos beatrix Indarctos vireti Indarctos arctoides Indarctos punjabiensis Giant short-faced bear Cave bear Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)

I F C We want to use all of the available fossils Example: Bears 12 fossils are reduced to 4 calibration ages with calibration density methods Domestic dog Spotted seal Giant panda Spectacled bear Sun bear Am. black bear Asian black bear Brown bear Polar bear Sloth bear Zaragocyon daamsi Ballusia elmensis Ursavus brevihinus Ailurarctos lufengensis Ursavus primaevus Agriarctos spp. Kretzoiarctos beatrix Indarctos vireti Indarctos arctoides Indarctos punjabiensis Giant short-faced bear Cave bear Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)

I F C Because fossils are part of the diversification process, we can combine fossil calibration with birth-death models Domestic dog Spotted seal Giant panda Spectacled bear Sun bear Am. black bear Asian black bear Brown bear Polar bear Sloth bear Zaragocyon daamsi Ballusia elmensis Ursavus brevihinus Ailurarctos lufengensis Ursavus primaevus Agriarctos spp. Kretzoiarctos beatrix Indarctos vireti Indarctos arctoides Indarctos punjabiensis Giant short-faced bear Cave bear Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)

I F C This relies on a branching model that accounts for speciation, extinction, and rates of fossilization, preservation, and recovery Domestic dog Spotted seal Giant panda Spectacled bear Sun bear Am. black bear Asian black bear Brown bear Polar bear Sloth bear Zaragocyon daamsi Ballusia elmensis Ursavus brevihinus Ailurarctos lufengensis Ursavus primaevus Agriarctos spp. Kretzoiarctos beatrix Indarctos vireti Indarctos arctoides Indarctos punjabiensis Giant short-faced bear Cave bear Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)

T F B-D P (FBD) Improving statistical inference of absolute node ages Eliminates the need to specify arbitrary calibration densities Better capture our statistical uncertainty in species divergence dates All reliable fossils associated with a clade are used 150 100 50 0 Time arXiv preprint: http://arxiv.org/abs/1310.2968 (Heath, Huelsenbeck, Stadler. in revision)

T F B-D P (FBD) Recovered fossil specimens provide historical observations of the diversification process that generated the tree of extant species 150 100 50 0 Time Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) The probability of the tree and fossil observations under a birth-death model with rate parameters: S = speciation E = extinction F = fossilization/recovery 150 100 50 0 Time Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) We assume that the fossil is a descendant of a specified calibrated node The time of the fossil: indicates an observation of the birth-death process after the age of the node 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) The fossil must attach to the tree at some time: 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) If it is the descendant of an unobserved lineage, then there is a speciation event at time on one of the 2 branches 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) If it is the descendant of an unobserved lineage, then there is a speciation event at time on one of the 2 branches 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) If = , the fossil is an observation of a lineage ancestral to the extant species 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) If = , the fossil is an observation of a lineage ancestral to the extant species 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) The probability of this realization of the diversification process is conditional on: S, E, and F 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) Using MCMC, we can sample the age of the calibrated node • while conditioning on S, E, and F other node ages and 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) MCMC allows us to consider all possible values of (marginalization) 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) MCMC allows us to consider all possible values of (marginalization) 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) MCMC allows us to consider all possible values of (marginalization) 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) MCMC allows us to consider all possible values of (marginalization) 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) The posterior samples of the calibrated node age are informed by the fossil attachment times 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) The FBD model allows multiple fossils to calibrate a single node 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) Given and , the new fossil can attach to the tree via speciation along either branch in the extant tree 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) Given and , the new fossil can attach to the tree via speciation along either branch in the extant tree 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) Given and , the new fossil can attach to the tree via speciation along either branch in the extant tree 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) Or the unobserved branch leading to the other calibrating fossil 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) If = , then the new fossil lies directly on a branch in the extant tree 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) If = , then the new fossil lies directly on a branch in the extant tree 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) Or it is an ancestor of the other calibrating fossil 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) The probability of this realization of the diversification process is conditional on: S, E, and F 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) Using MCMC, we can sample the age of the calibrated node while conditioning on S, E, and F other node ages and and 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) MCMC allows us to consider all possible values of (marginalization) 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (http://arxiv.org/abs/1310.2968)

B I U  FBD Implemented in: • DPPDiv: github.com/trayc7/FDPPDIV Available soon: • RevBayes: sourceforge.net/projects/revbayes • BEAST2: beast2.cs.auckland.ac.nz/ 150 100 50 0 Time 0250 50100150200 Time (My) FBD Implementation (http://arxiv.org/abs/1310.2968)

FBD: P S D FBD Model: Analyses of simulated datasets using 5 different sets of fossils: calibration fossils only, and random samples of 5%, 10%, 25%, and 50% of the fossils (simulated sequence data under GTR+Γ). Simulations: Analysis

I N F C [Figure: FBD Model. Coverage probability (y-axis, 0 to 1) against true node age (x-axis, log scale, 0.1 to 1000), by pct. total fossils: 50%, 25%, 10%, 5%] Simulations: Results (http://arxiv.org/abs/1310.2968)

I N F C [Figure: FBD Model. Credible interval width (y-axis, 0 to 90) against true node age (x-axis, log scale, 0.1 to 1000), by pct. total fossils: 50%, 25%, 10%, 5%] Simulations: Results (http://arxiv.org/abs/1310.2968)
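
The two summaries plotted in these simulation results, coverage probability and credible interval width, can be computed from posterior samples as sketched below. This is a minimal illustration assuming equal-tailed 95% intervals taken from sorted samples; the source does not say which interval type was used, and the function names and input layout are hypothetical.

```python
import math


def credible_interval(samples, mass=0.95):
    """Equal-tailed interval from posterior samples using sorted-sample indices."""
    ordered = sorted(samples)
    n = len(ordered)
    lower_index = math.floor((1.0 - mass) / 2.0 * (n - 1))
    upper_index = math.ceil((1.0 + mass) / 2.0 * (n - 1))
    return ordered[lower_index], ordered[upper_index]


def coverage_and_width(replicates, mass=0.95):
    """Summarize simulation replicates.

    replicates: list of (true_age, posterior_samples) pairs, one per node or
    per simulated dataset. Returns (coverage probability, mean interval width).
    """
    hits = 0
    widths = []
    for true_age, samples in replicates:
        low, high = credible_interval(samples, mass)
        if low <= true_age <= high:
            hits += 1  # the interval contains the true value
        widths.append(high - low)
    return hits / len(replicates), sum(widths) / len(widths)
```

Coverage near the nominal interval mass indicates well-calibrated intervals, while narrower widths at the same coverage indicate more precise estimates.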

L T T The FBD model provides an estimate of the number of lineages over time 050100150200 Time (My) 3.0 2.0 1.0 1.0 50100150 0200 Time (My) Numberoflineages

L T T Lineage diversity over time with fossils MCMC samples the times of lineages in the reconstructed tree and the times of the fossil lineages (for 1 simulation replicate with 10% random sample of fossils) Simulations: Results

L T T Lineage diversity over time with fossils Visualize extant and sampled fossil lineage diversification when using all available fossils (21 total) (for 1 simulation replicate with 10% random sample of fossils) Simulations: Results

L T T Lineage diversity over time with fossils Choosing only the oldest (calibration) fossils reduced the set of sampled fossils from 21 to 12, giving less information about diversification over time (for 1 simulation replicate with 10% random sample of fossils) Simulations: Results

B: D T Sequence data for extant species: • 8 Ursidae • 1 Canidae (dog) • 1 Phocidae (spotted seal) Fossil ages: • 12 Ursidae • 5 Canidae • 5 Pinnipedimorpha DPP relaxed clock model (Heath, Holder, Huelsenbeck MBE 2012) Fixed tree topology Sequence Data Fossil Data Phylogenetic Relationships fossil canids fossil pinnepeds stem fossil ursids fossil Ailuropodinae giant short-faced bear cave bear Empirical Analysis (Wang 1994, 1999; Krause et al. 2008; Fulton & Strobeck 2010; Abella et al. 2012)

B: D T Gray wolf Spotted seal Giant panda Spectacled bear Sun bear Am. black bear Asian black bear Brown bear Polar bear Sloth bear Ursidae Time (My)60 2040 0 Eocene Oligocene Miocene Plio Pleis Paleo 95% CI Empirical Analysis (http://arxiv.org/abs/1310.2968 ; silhouette images: http://phylopic.org/)

B: D T Am. black bear Gray wolf Spotted seal Giant panda Spectacled bear Sun bear Asian black bear Brown bear Polar bear Sloth bear Ursidae fossil Ailuropodinae stem fossil Ursidae Time (My)60 2040 0 Eocene Oligocene Miocene Plio Pleis Paleo fossil canids fossil pinnipeds fossil Arctodus fossil Ursus Empirical Analysis (http://arxiv.org/abs/1310.2968 ; silhouette images: http://phylopic.org/)

T F B-D P (FBD) Improved statistical inference of absolute node ages Biologically motivated models can better capture statistical uncertainty Am. black bear Gray wolf Spotted seal Giant panda Spectacled bear Sun bear Asian black bear Brown bear Polar bear Sloth bear Ursidae fossil Ailuropodinae stem fossil Ursidae Time (My)60 2040 0 Eocene Oligocene Miocene Plio Pleis Paleo fossil canids fossil pinnipeds fossil Arctodus fossil Ursus Modeling Diversification Processes (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) Improved statistical inference of absolute node ages Use all available fossils Eliminates arbitrary choice of calibration priors Am. black bear Gray wolf Spotted seal Giant panda Spectacled bear Sun bear Asian black bear Brown bear Polar bear Sloth bear Ursidae fossil Ailuropodinae stem fossil Ursidae Time (My)60 2040 0 Eocene Oligocene Miocene Plio Pleis Paleo fossil canids fossil pinnipeds fossil Arctodus fossil Ursus Modeling Diversification Processes (http://arxiv.org/abs/1310.2968)

T F B-D P (FBD) Improved statistical inference of absolute node ages Extensions of the FBD can account for stratigraphic sampling of fossils and shifts in rates of speciation and extinction 0175 255075100125150 Time 0.2 1.05 Preservation Rate Modeling Diversification Processes (http://arxiv.org/abs/1310.2968)

F T D Ideally, we would like to include all of the available data Account for uncertainty in the placement of fossil lineages Keep all fossil data, not just the oldest descendant for a given node Time

F T D Combining extant and fossil species Fossil AgeData Sequence Data © AntWeb.org Morphological Data © AntWeb.org 0% 20% 40% 60% 80% 100% Orthoptera Paraneoptera Neuroptera Raphidioptera Coleo Polyphaga Coleoptera Adephaga Lepidoptera Mecoptera Xyela Macroxyela Runaria Paremphytus Blasticotoma Tenthredo Aglaostigma Dolerus Selandria Strongylogaster Monophadnoides Metallus Athalia Taxonus Hoplocampa Nematinus Nematus Cladius Monoctenus Gilpinia Diprion Cimbicinae Abia Corynis Arge Sterictiphora Perga Phylacteophaga Lophyrotoma Acordulecera Decameria Neurotoma Onycholyda Pamphilius Cephalcia Acantholyda Megalodontes cephalotes Megalodontes skorniakowii Cephus Calameuta Hartigia Syntexis Sirex Xeris Urocerus Tremex Xiphydria Orussus Stephanidae A Stephanidae B Megalyridae Trigonalidae Chalcidoidea Evanioidea Ichneumonidae Cynipoidea Apoidea A Apoidea B Apoidea C Vespidae Grimmaratavites Ghilarella Aulidontes Protosirex Aulisca Karatavites Sepulca Onokhoius Trematothorax Thoracotrema Prosyntexis Ferganolyda Rudisiricius Sogutia Xyelula Brigittepterus Mesolyda Brachysyntexis Dahuratoma Pseudoxyelocerus Palaeathalia Anaxyela Syntexyela Kulbastavia Undatoma Abrotoxyela Mesoxyela mesozoica SpathoxyelaTriassoxyela Leioxyela Nigrimonticola Chaetoxyela Anagaridyela Eoxyela Liadoxyela Xyelotoma Pamphiliidae undescribed Turgidontes Praeoryssus Paroryssus Mesorussus Symphytopterus Cleistogaster Leptephialtites Stephanogaster 97 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100100 100 100 100 100 100 100 97 68 55 77 75 91 84 61 57 62 98 96 71 71 85 93 52 95 64 80 64 99 77 99 98 97 82 52 58 94 90 60 70 57 8361 70 60 350 300 250 200 150 100 50 0 million years before present % of morphological characters scored for each terminal (Ronquist, Klopfstein, et al. Syst. Biol. 2012.)

O Overview of divergence time estimation • Relaxed clock models – accounting for variation in substitution rates among lineages • Tree priors and fossil calibration break BEAST v2.1.1 Tutorial http://treethinkers.org/divergence-time-estimation-using-beast/ • Walk through: set up BEAST input file in BEAUti and execute BEAST MCMC analysis • On your own: complete analysis & summarize output lunch

D T E S Program Models/Method r8s Strict clock, local clocks, NPRS, PL ape (R) NPRS, PL multidivtime log-n autocorrelated (plus some others) PhyBayes OU, log-n autocorrelated (plus some others) PhyloBayes CIR, white noise (uncorrelated) (plus some others) BEAST Uncorrelated (log-n & gamma), local clocks TreeTime Dirichlet model, CPP, uncorrelated MrBayes 3.2 CPP, strict clock, autocorrelated, uncorrelated DPPDiv DPP, strict clock, uncorrelated

BEAST Bayesian Evolutionary Analysis Sampling Trees • population size • growth/decline in population • bottlenecks/transition points • gene trees/species trees • virus transmission dynamics • recombination • migration • founder effects • epidemiological tracking • phylogeography • trait evolution • dates of MRCAs • lineage rates • ancestral character state reconstruction • times of bottlenecks/transitions http://beast.bio.ed.ac.uk/ (Drummond, Suchard, Xie, & Rambaut, MBE, 2012)

BEAST Bayesian Evolutionary Analysis Sampling Trees • free, open-source, cross-platform software package for estimating evolutionary parameters on rooted trees • includes several utility programs for creating input files and summarizing output • relies on a verbose XML syntax for executing analysis http://beast.bio.ed.ac.uk/ (Drummond, Suchard, Xie, & Rambaut, MBE, 2012)

BEAST BEAUti • a GUI utility program for generating properly formatted BEAST XML input files http://beast.bio.ed.ac.uk/ (Drummond, Suchard, Xie, & Rambaut, MBE, 2012)

BEAST BEAST 1 XML: The eXtensible Markup Language • specifies the sequences, node calibrations, models, priors, output file names, etc. • dataset-specific issues can arise and some understanding of the BEAST-specific XML format is essential for troubleshooting • there are a number of interesting models and analyses available in BEAST that cannot be specified using the BEAUti utility • XML syntax help: http://beast.bio.ed.ac.uk/XML_format http://beast.bio.ed.ac.uk/ (Drummond, Suchard, Xie, & Rambaut, MBE, 2012)

BEAST BEAST 2 XML: The eXtensible Markup Language • specifies the sequences, node calibrations, models, priors, output file names, etc. • dataset-specific issues can arise and some understanding of the BEAST-specific XML format is essential for troubleshooting • there are a number of interesting models and analyses available in BEAST that cannot be specified using the BEAUti utility • XML syntax help: http://www.beast2.org/wiki/index.php/BEAST_2.0.x_XML http://beast.bio.ed.ac.uk/ (Drummond, Suchard, Xie, & Rambaut, MBE, 2012)

BEAST BEAST (analysis program) • reads the commands in the XML input file • performs MCMC and generates output files • <file_stem>.trees (contains the trees and branch rates sampled every n generations) • <file_stem>.log (contains the parameter samples for every n generations) • the main BEAST binary only runs a single chain; for MCMCMC use BEASTMC3 http://beast.bio.ed.ac.uk/ (Drummond, Suchard, Xie, & Rambaut, MBE, 2012)
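
To make the role of the .log file concrete, here is a minimal Python sketch of reading such a parameter log, discarding burn-in, and summarizing one column. It assumes the log is tab-delimited with a header row of column names and optional comment lines starting with "#"; the column name TreeHeight, the 25% burn-in, and the in-memory example contents are hypothetical, so check them against your own file.

```python
import csv
import io


def read_log(handle):
    """Read a tab-delimited MCMC log into {column_name: [float, ...]}."""
    # Skip blank lines and comment lines before handing rows to the csv reader.
    rows = (line for line in handle if line.strip() and not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    # A trailing tab in the header yields an empty column name; ignore it.
    columns = {name: [] for name in reader.fieldnames if name}
    for row in reader:
        for name in columns:
            columns[name].append(float(row[name]))
    return columns


def summarize(values, burn_in_fraction=0.1):
    """Drop the first fraction of samples, then return (mean, min, max)."""
    kept = values[int(len(values) * burn_in_fraction):]
    return sum(kept) / len(kept), min(kept), max(kept)


# Hypothetical log contents, standing in for a real <file_stem>.log file.
example_log = io.StringIO(
    "# comment line\n"
    "Sample\tposterior\tTreeHeight\t\n"
    "0\t-10.0\t100.0\t\n"
    "1000\t-9.5\t120.0\t\n"
    "2000\t-9.0\t140.0\t\n"
    "3000\t-8.5\t160.0\t\n"
)
columns = read_log(example_log)
tree_height_summary = summarize(columns["TreeHeight"], burn_in_fraction=0.25)
```

For a real analysis, pass an open file handle instead of the in-memory example, and consider replacing the minimum and maximum with a credible interval.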

BEAST LogCombiner • combines the log or trees files from multiple, independent MCMC runs http://beast.bio.ed.ac.uk/ (Drummond, Suchard, Xie, & Rambaut, MBE, 2012)

BEAST TreeAnnotator • reads in the trees file and summarizes the topology, branch times, and rates • places annotations on the tree that can easily be viewed in FigTree • SumTrees in the DendroPy package (Sukumaran & Holder, 2010) is an alternative to TreeAnnotator. SumTrees is a richer program and offers more options for summarizing topology and branch parameters http://beast.bio.ed.ac.uk/ (Drummond, Suchard, Xie, & Rambaut, MBE, 2012)

BEAST Tutorial • Walkthrough using BEAUti and executing the analysis in BEAST • Independently complete the tutorial and summarize the BEAST output http://beast.bio.ed.ac.uk/ (Drummond, Suchard, Xie, & Rambaut, MBE, 2012)

BEAST T This tutorial uses a simulated dataset of 10 taxa with 4 calibration points ML phylogeny R 1 2 3 Log-normal (ln(20), 0.75) Exponential (30-1 ) Normal (140, 10) Uniform (8, 32) 3 2 1 R T1 T2 T3 T4 T5 T6 T8 T7 T9 T10 20406080100120 0160 140200 180 Time 60 40 8 32 1R Calibrated nodes FossilsT mean = 140 mean = 90 median = 60 mean = 20
