Computational prediction and characterization of genomic islands: insights into bacterial pathogenicity


Published on January 5, 2009

Author: mlangill

Source: slideshare.net

Morgan G.I. Langille

Department of Molecular Biology & Biochemistry, Simon Fraser University

http://tinyurl.com/genomic-islands

Genomic Island History

In the early 1990s, clusters of virulence genes were found in E. coli (Hacker, et al., 1990)

Pathogenicity Islands (PAIs): clusters of genes that are associated with bacterial virulence

Genomic Islands (GIs) (Hacker, et al., 2000): segments of a genome that are thought to have originated from a horizontal transfer event

Genomic Island Interest

Pathogenicity Islands

Adhesins: fimbriae, intimin, etc.

Secretion systems: Type III and Type IV

Toxins: hemolysins, pertussis toxin

Invasins, modulins, and effectors

Antibiotic Resistance Islands

Metabolic Islands




Methods for Predicting GIs

Sequence based

Abnormal sequence composition: GC% bias, dinucleotide bias, codon bias, etc.

Genomic features associated with mobile genetic elements: direct repeats, IS elements, presence of tRNA genes and mobility genes (integrases, transposases, etc.)
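As a rough illustration of the "abnormal sequence composition" idea, the sketch below flags windows whose GC% deviates from the genome-wide GC%. The window size, step, and threshold are illustrative assumptions rather than values from the talk, and the prediction tools discussed later use richer statistics than this.

```python
def gc_fraction(seq):
    """Fraction of G or C bases in a DNA string (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_gc_windows(genome, window=1000, step=500, threshold=0.10):
    """Return (start, end, gc) for windows whose GC fraction differs from
    the genome-wide GC fraction by more than `threshold`.

    window, step, and threshold are hypothetical defaults for illustration.
    """
    genome_gc = gc_fraction(genome)
    hits = []
    # Slide a fixed-size window along the genome.
    for start in range(0, max(len(genome) - window + 1, 1), step):
        chunk = genome[start:start + window]
        gc = gc_fraction(chunk)
        if abs(gc - genome_gc) > threshold:
            hits.append((start, start + len(chunk), gc))
    return hits
```

A toy call such as `atypical_gc_windows("GC" * 1000 + "AT" * 250 + "GC" * 1000, window=500, step=500, threshold=0.2)` reports only the AT-rich window in the middle.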

Methods of Predicting GIs

Comparative genomics based

Identify genomic regions with anomalous phylogenetic patterns

Requires multiple genomes



Previous state of GI identification

Sequence based methods

Numerous methods and constant improvement of algorithm design

Not very user friendly, and accuracy of the various methods not well described

Comparative based methods

Used by many researchers, but with no established method (only in-house scripts)

Limited access to user friendly tools for this type of analysis

Outline

IslandPick: A comparative genomics approach for genomic island identification

Evaluating sequence composition based genomic island prediction methods

IslandViewer: An integrated interface for computational identification and visualization of genomic islands

The role of genomic islands in the virulent Pseudomonas aeruginosa Liverpool Epidemic Strain

CRISPRs and their association with genomic islands

IslandPick: A comparative genomics approach for genomic island identification

Mauve: whole genome aligner (Darling, et al., 2004)

Allows genome rearrangements and inversions

Fast: aligns two genomes in < 15 minutes

Command line accessible

http://gel.ahabs.wisc.edu/mauve/

IslandPick: Outline

[Flowchart. Inputs: Query Genome A and comparison Genomes B, C, and D. Steps shown: run Mauve on each pair (A & B, A & C, A & D), extract unique regions, identify overlapping unique regions, BLAST, putative genomic islands]
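The "identify overlapping unique regions" step can be sketched as an interval intersection. This is a minimal sketch, assuming the unique regions of the query genome have already been extracted from each pairwise alignment as sorted, non-overlapping, half-open (start, end) coordinates; parsing Mauve output and the BLAST step are not shown, and `min_length` is a hypothetical parameter.

```python
def intersect_two(a, b):
    """Intersect two sorted lists of half-open (start, end) intervals."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def regions_unique_in_all(unique_by_comparison, min_length=0):
    """Keep query regions that are unique in every pairwise comparison.

    unique_by_comparison: one list of (start, end) intervals per
    comparison genome (e.g. A vs B, A vs C, A vs D).
    """
    lists = [sorted(regions) for regions in unique_by_comparison]
    if not lists:
        return []
    common = lists[0]
    for regions in lists[1:]:
        common = intersect_two(common, regions)
    return [(s, e) for s, e in common if e - s >= min_length]
```

Intersecting, rather than taking the union, keeps only regions that are absent from every comparison genome, which is what makes them candidates for recent insertion in the query.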

Selecting Comparative Genomes

[Flowchart: the IslandPick outline with an added first step, comparative genome selection (using CVTree distances), which chooses Genomes B, C, and D for Query Genome A]

What genomes to use?

We want to compare the query genome to other comparative genomes within certain evolutionary distances

Need a phylogenetic tree or a distance matrix for all sequenced bacteria species

CVTree (Qi, et al., 2004)

Uses matching K-strings between the proteomes of two organisms

Constructs phylogenetic trees without alignment

Avoids choosing genes for phylogenetic reconstruction

Web Server http://cvtree.cbi.pku.edu.cn

Downloadable command line executable
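To make the alignment-free idea concrete, here is a simplified sketch of comparing two proteomes by their K-string content. It is not the CVTree implementation: it uses raw K-string counts and omits any background correction, and the distance form (1 - cosine) / 2 and the default `k=5` are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kstring_counts(proteins, k=5):
    """Count overlapping K-strings across a list of protein sequences."""
    counts = Counter()
    for protein in proteins:
        for i in range(len(protein) - k + 1):
            counts[protein[i:i + k]] += 1
    return counts


def composition_distance(proteome_a, proteome_b, k=5):
    """Distance from the cosine of two K-string count vectors.

    0 for identical compositions; 0.5 when no K-strings are shared
    (raw counts are non-negative, so the cosine cannot go below 0).
    """
    a = kstring_counts(proteome_a, k)
    b = kstring_counts(proteome_b, k)
    dot = sum(count * b[kmer] for kmer, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("proteome too short for the chosen k")
    return (1 - dot / (norm_a * norm_b)) / 2
```

Computing this for every pair of proteomes gives a distance matrix, with no need to pick marker genes or build alignments.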

Example: Pseudomonas Tree

[Figure: tree of P. fluorescens Pf-5, P. putida KT2440, P. fluorescens PfO-1, P. syringae tomato DC3000, P. syringae phaseolicola 1448A, P. syringae syringae B728a, P. aeruginosa PAO1, P. aeruginosa PA14, and Acinetobacter ADP1. Distance labels on the figure: 0.227, 0.256, 0.397, 0.393, 0.411, 0.428, 0.430, 0, 0.481]

Tree built using conserved genes, Omp85 & CarB, and maximum parsimony

CVTree distances from P.syringae B728a are shown

Determining Distance Cutoffs

Given the distances between any two species, how do we choose comparison genomes?

Maximum Distance Cutoff: eliminates the use of genomes that have diverged too much (noise)

Minimum Distance Cutoff: eliminates the use of genomes that have not diverged enough (very closely related strains)

Minimum Number of Genomes: eliminates the use of too few comparative genomes

Example: Pseudomonas Tree

Maximum Distance Cutoff = 0.42

Minimum Distance Cutoff = 0.10

Minimum Number of Genomes = 3

[Figure: the same Pseudomonas tree with the cutoffs marked. Distance labels on the figure: 0.227, 0.256, 0.397, 0.393, 0.411, 0.428, 0.430, 0, 0.481]
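The three cutoffs can be sketched as a small filter. The default cutoff values are the ones from the Pseudomonas example; the genome names used in the usage note are placeholders (which distance belongs to which strain is not given here), and treating the cutoffs as inclusive bounds is an assumption.

```python
def select_comparison_genomes(distances, min_dist=0.10, max_dist=0.42,
                              min_genomes=3):
    """Return genomes whose distance from the query lies within
    [min_dist, max_dist], nearest first.

    Returns an empty list when fewer than min_genomes qualify, meaning
    the query genome has too few usable comparison genomes.
    """
    chosen = sorted(
        (name for name, d in distances.items() if min_dist <= d <= max_dist),
        key=distances.get,
    )
    return chosen if len(chosen) >= min_genomes else []
```

With the nine distance labels from the figure assigned to placeholder names, five genomes fall between 0.10 and 0.42, so the minimum of three is met; the query itself (distance 0) and the three genomes beyond 0.42 are excluded.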

Predicting Similar Aged GIs

[Figure: GI insertion relative to the query genome; requirement shown: 1 genome < distance X]

Evaluating sequence composition based genomic island prediction methods

Accuracy of GI methods

Sequence based GI prediction methods

Only require a single genome

Can easily make false predictions: highly expressed genes

May miss predictions: amelioration of DNA to host genome; source genome has same composition as host genome

Accuracy is usually evaluated using simulated horizontal gene transfer events or small datasets of verified GIs

IslandPick is independent of sequence composition methods, so it was used to generate a “positive” dataset of islands

Developing a Negative Dataset

To identify false positives we need a “negative” dataset that does not contain GIs

Identify regions that are conserved across several genomes using Mauve whole genome alignment

Use the same genomes as selected by IslandPick with one additional cutoff

Negative Dataset

[Figure: GI insertion relative to the query genome; requirement shown: 1 genome > distance X]

IslandPick Cutoffs

[Figure labels: 736 chromosomes; 173 chromosomes; 118 chromosomes; 771 GIs; ~100 genes/strain] (Langille, et al., 2008)

GI Prediction Accuracy

[Figure: the entire genome with the positive, negative, and predicted datasets overlaid, marking the TP, FP, FN, and TN regions]

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)
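A minimal sketch of these measures, counted per nucleotide, is given below. It assumes the positive, negative, and predicted datasets are lists of half-open (start, end) intervals, that predictions falling outside both reference datasets are ignored, and that overall accuracy is (TP + TN) / (TP + FP + FN + TN); the last formula is an assumption, since only precision and recall are defined on the slide.

```python
def positions(intervals):
    """Expand half-open (start, end) intervals into a set of positions."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return covered


def gi_accuracy(positive, negative, predicted):
    """Per-nucleotide precision, recall, and overall accuracy."""
    pos = positions(positive)
    neg = positions(negative)
    pred = positions(predicted)
    tp = len(pred & pos)   # predicted and in the positive dataset
    fp = len(pred & neg)   # predicted but in the negative dataset
    fn = len(pos - pred)   # positive dataset missed by the prediction
    tn = len(neg - pred)   # negative dataset correctly left unpredicted
    total = tp + fp + fn + tn
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "overall_accuracy": (tp + tn) / total if total else 0.0,
    }
```

Expanding intervals into position sets is only practical for small examples; a genome-scale version would work on interval lengths instead.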

GI Prediction Accuracy (Langille, et al., 2008)

Tool | Average number of nucleotides in GIs per genome (kb) | Precision | Recall | Overall Accuracy
SIGI-HMM | 233 | 92 | 33.0 | 86
IslandPath/Dimob | 171 | 86 | 36 | 86
PAI IDA | 163 | 68 | 32 | 84
Centroid | 171 | 61 | 28 | 82
IslandPath/Dinuc | 444 | 55 | 53 | 82
Alien Hunter | 1265 | 38 | 77 | 71
Literature* | 639 | 100 | 87 | 96

IslandViewer: An integrated interface for computational identification and visualization of genomic islands

IslandViewer (Langille, et al., 2009)

Website that integrates the most accurate GI prediction programs: SIGI-HMM, IslandPath-DIMOB, and IslandPick

Genomic island prediction pre-calculated for all genomes

Automatically updated monthly

User genome submission available

IslandPick can be run using manually selected comparison genomes

Download data for a genomic island, a chromosome, or entire dataset

http://www.pathogenomics.sfu.ca/islandviewer/









IslandPick – Manual genome selection

User Genome Submission

The role of genomic islands in the virulent Pseudomonas aeruginosa Liverpool Epidemic Strain

Pseudomonas aeruginosa Liverpool Epidemic Strain (LES) (Salunkhe, et al., 2005)

Highly successful at colonizing cystic fibrosis (CF) patients

Has replaced previously established strains

Has caused infections in non-CF patients

Can cause greater morbidity in CF than other strains of P. aeruginosa

LES Analysis (Winstanley, Langille, et al., 2008)

Genome sequenced by Sanger Centre

I led annotation of the genome and analysis of GIs

6 Prophages

5 Genomic Islands

Signature-tagged mutagenesis (STM)

STM is a method to identify genes associated with pathogenesis

LES used in a chronic rat lung infection model

47 genes identified by STM

5 of these genes are within GIs and prophage regions

http://www.traill.uiuc.edu/uploads/porknet/papers/LitchtensteigerPaper.pdf

LES Prophage (Winstanley, Langille, et al., 2008)

LES Genomic Islands (Winstanley, Langille, et al., 2008)

LES in-vivo competitive index (Winstanley, Langille, et al., 2008)

Mutants grown for 7 days in rat lung with the wild type LES

A CI of less than 1 indicates attenuation of virulence

4 genes within prophage and GIs had strong impact on competitiveness
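For readers unfamiliar with the measure, a competitive index is commonly computed as the mutant to wild-type ratio recovered after infection divided by the same ratio in the inoculum. The sketch below assumes that common definition (the formula is not stated in the talk), and the parameter names and counts are hypothetical.

```python
def competitive_index(mutant_in, wild_in, mutant_out, wild_out):
    """Competitive index from colony counts (assumed common definition).

    mutant_in, wild_in: counts in the inoculum.
    mutant_out, wild_out: counts recovered after co-infection.
    A value below 1 means the mutant was outcompeted by the wild type.
    """
    if mutant_in <= 0 or wild_in <= 0 or wild_out <= 0:
        raise ValueError("inoculum counts and recovered wild-type count must be positive")
    input_ratio = mutant_in / wild_in
    output_ratio = mutant_out / wild_out
    return output_ratio / input_ratio
```

For example, a mutant inoculated 1:1 with the wild type and recovered at 1:50 has a CI of 0.02, which would be read as attenuated.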

CRISPRs and their association with genomic islands

Overview of CRISPRs

CRISPRs: Clustered regularly interspaced short palindromic repeats

Able to provide phage resistance and block conjugation

Thought to be similar to RNAi, except that DNA (instead of RNA) is thought to be the target

CRISPRs and HGT

Previous studies have shown some evidence of HGT of CRISPRs

Phylogenetic profiles of CAS genes (Haft, et al., 2005)

CRISPRs within 10 megaplasmids (Godde, et al., 2006)

CRISPRs within two prophage in Clostridium difficile (Sebaihia, et al., 2006)

Analysis of CRISPRs and GIs had not been conducted previously

CRISPRs within GIs

CRISPR predictions were obtained from CRISPRdb, http://crispr.u-psud.fr/crispr/CRISPRHomePage.php

GI predictions were taken from the union of IslandPick, IslandPath-DIMOB, and SIGI-HMM

The numbers of CRISPRs inside and outside GIs were compared

CRISPRs are over-represented in GIs

Domain of Life | Number of Genomes | Number of GIs | Proportion of Genome in GIs | Total Number of CRISPRs | Expected CRISPRs in GIs | Observed CRISPRs in GIs | Significance (Chi-square Test)*
Archaea | 49 | 298 | 3.7% | 206 | 7.7 | 14 | 0.020
Bacteria | 306 | 4874 | 6.4% | 837 | 53.3 | 114 | 8.1 x 10^-18
Archaea & Bacteria | 355 | 5172 | 6.1% | 1043 | 64.0 | 128 | 1.6 x 10^-16
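The inside-versus-outside comparison can be sketched as a two-bin chi-square goodness-of-fit test with one degree of freedom. This is an assumed form of the test, not a statement of how the original analysis was run; the function name is hypothetical and only the Python standard library is used.

```python
from math import erfc, sqrt


def chi_square_two_bins(observed_in, expected_in, total):
    """Chi-square goodness-of-fit test for 'inside GIs' vs 'outside GIs'.

    observed_in: items (e.g. CRISPRs) observed inside GIs.
    expected_in: items expected inside GIs if placement were proportional
                 to the fraction of the genome covered by GIs.
    total:       all items, inside and outside GIs.
    Returns (chi2, p_value) for 1 degree of freedom.
    """
    if not 0 < expected_in < total:
        raise ValueError("expected_in must lie strictly between 0 and total")
    observed_out = total - observed_in
    expected_out = total - expected_in
    chi2 = ((observed_in - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    # Survival function of the chi-square distribution with 1 d.o.f.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value
```

Called as `chi_square_two_bins(14, 7.7, 206)`, with 14 observed CRISPRs in GIs, 7.7 expected, and 206 in total, it returns a p-value close to 0.02.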

Phage genes within GIs

Many GIs are known to contain phage genes

What proportion of GI genes have links to phage?

Identified genes with “phage” in their annotation within GIs

35% of all ‘phage genes’ are within GIs (6% expected)

Phage genes are over-represented in GIs

Genomic Regions | Observed number of ‘phage genes’ | Expected number of ‘phage genes’³ | Total number of genes in region | Chi-Square Test
Inside GIs¹ | 6990 | 1264.22 | 165784 | ~0
Outside GIs¹ | 12868 | 18593.78 | 2438303 |

Archaea and CRISPRs

Prevalence of CRISPRs in Archaea genomes could result in reduced phage genes

Measure | Archaea | Bacteria
Genomes containing a CRISPR | 90% | 40%
Proportion of phage genes | 0.10% | 0.79%
Proportion of GIs with a phage gene | 5.1% | 17.6%

GIs with CRISPRs and phage genes

Is there evidence supporting that some CRISPRs are being transferred by phage?

GIs containing CRISPR(s) also contain an over-representation of phage genes, suggesting that some CRISPRs are transferred by phage

Genomic Regions | Observed number of ‘phage genes’ | Expected number of ‘phage genes’³ | Total number of genes in region | Chi-Square Test
GIs containing CRISPR(s)² | 13 | 4.5 | 1500 | 5.7 x 10^-5
Outside GIs² | 812 | 820.5 | 274073 |

CRISPR conclusions

CRISPR over-representation in GIs suggests that they are being horizontally transferred

Some GIs that contain CRISPRs may have phage origins

CRISPRs in Archaea could be limiting HGT by increasing resistance to phage

Conclusions

Several advances in GI computational prediction

IslandPick, a novel automated comparative genomics based GI prediction program

Analysis of the accuracy of several sequence based GI prediction methods

IslandViewer: An integrated interface for computational identification and visualization of genomic islands

Insights into GI evolution and their pathogenicity

P. aeruginosa LES – evidence that genomic islands and prophage regions contain genes that provide a competitive advantage for infection in a chronic rat infection model.

CRISPRs and their association with genomic islands

Acknowledgements

Supervisor: Dr. Fiona Brinkman

Supervisory Committee: Dr. Baillie, Dr. Pio

P. aeruginosa LES: Craig Winstanley, Roger Levesque, Bob Hancock, Nick Thomson
