UC Davis EVE161 Lecture 10 by @phylogenomics

Information about UC Davis EVE161 Lecture 10 by @phylogenomics
Education

Published on February 20, 2014

Author: phylogenomics

Source: slideshare.net

Lecture 10: EVE 161: Microbial Phylogenomics
Lecture #10: Era III: Genome Sequencing
UC Davis, Winter 2014
Instructor: Jonathan Eisen
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen, Winter 2014

Where we are going and where we have been
• Previous lecture: 9: rRNA Case Study - Built Environment
• Current lecture: 10: Genome Sequencing
• Next lecture: 11: Genome Sequencing II

1st Genome Sequence
Fleischma

Figure 1. Diagram depicting the steps in a whole-genome shotgun sequencing project.
1. Library construction
   (i) Isolate DNA
   (ii) Fragment DNA
   (iii) Clone DNA
2. Random sequencing phase
   (i) Sequence DNA (15,000 sequences per Mb)
3. Closure phase
   (i) Assemble sequences
   (ii) Close gaps
   (iii) Edit
   (iv) Annotation
4. Complete genome sequence

Text excerpts shown with the figure:
"... analysis of the genomes of two thermophilic bacterial species, Aquifex aeolicus and Thermotoga maritima, revealed that 20–25% of the genes in these species were more similar to genes from archaea than those from bacteria (refs 13, 14). This led to the suggestion of possible extensive gene exchanges between these species and archaeal ..."
"... be extensive, it is somehow constrained by phylogenetic relationships. Other evidence for a ‘core’ of particular lineages comes from the finding of a conserved core of euryarchaeal genomes (refs 21, 22) and another finding that some types of gene might be more prone to gene transfer than others (ref 23). It therefore seems likely that horizontal gene ..."

Complete Genome/Chromosome Progress

From http://genomesonline.org

TIGR

Why Completeness is Important
• Improves characterization of genome features
• Gene order, replication origins
• Better comparative genomics
• Genome duplications, inversions
• Presence and absence of particular genes can be very important
• Missing sequence might be important (e.g., centromere)
• Allows researchers to focus on biology not sequencing
• Facilitates large scale correlation studies

General Steps in Analysis of Complete Genomes
• Identification/prediction of genes
• Characterization of gene features
• Characterization of genome features
• Prediction of gene function
• Prediction of pathways
• Integration with known biological data
• Comparative genomics

General Steps in Analysis of Complete Genomes
• Structural Annotation
  – Identification/prediction of genes
  – Characterization of gene features
  – Characterization of genome features
• Functional Annotation
  – Prediction of gene function
  – Prediction of pathways
  – Integration with known biological data
• Evolutionary Annotation
  – Comparative genomics

Structural Annotation I: Genes in Genomes
• Protein coding genes
  – In long open reading frames
  – ORFs interrupted by introns in eukaryotes
  – Take up most of the genome in prokaryotes, but only a small portion of the eukaryotic genome
• RNA-only genes
  – Transfer RNA
  – Ribosomal RNA
  – snoRNAs (guide ribosomal and transfer RNA maturation)
  – Intron splicing
  – Guiding mRNAs to the membrane for translation
  – Gene regulation (this is a growing list)

Structural Annotation II: Other Features to Find
• Gene control sequences
  – Promoters
  – Regulatory elements
• Transposable elements, both active and defective
  – DNA transposons and retrotransposons
  – Many types and sizes
• Other repeated sequences
  – Centromeres and telomeres
  – Many with unknown (or no) function
• Unique sequences that have no obvious function

How to Find ncRNAs
• The most universal genes, such as tRNA and rRNA, are very conserved and thus easy to detect. Finding them first removes some areas of the genome from further consideration.
• One easy approach to finding common RNA genes is just looking for sequence homology with related species: a BLAST search will find most of them quite easily.
• Functional RNAs are characterized by secondary structure caused by base pairing within the molecule.
• Determining the folding pattern is a matter of testing many possibilities to find the one with the minimum free energy, which is the most stable structure.
• The free energy calculations are in turn based on experiments where short synthetic RNA molecules are melted.
• Related to this is the concept that paired regions (stems) will be conserved across species lines even if the individual bases aren’t conserved. That is, if there is an A-U pairing in one species, the same position might be occupied by a G-C in another species.
• This is an example of compensatory evolution (covariation): a deleterious mutation at one site is cancelled by a compensating mutation at another site.
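The stem-conservation idea above can be illustrated with a check on two columns of an alignment. This is a minimal sketch, not a method from the lecture: the function name `covarying_pair` and the toy sequences are assumptions made up for illustration.

```python
# Watson-Crick pairs plus the G-U wobble pair that is common in RNA.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def covarying_pair(alignment, i, j):
    """Return True if columns i and j can pair in every sequence
    and at least two different base pairs are observed (covariation)."""
    observed = set()
    for seq in alignment:
        pair = (seq[i], seq[j])
        if pair not in PAIRS:
            return False          # one sequence cannot form this pair
        observed.add(pair)
    return len(observed) >= 2     # pairing kept while the bases changed

# Hypothetical aligned RNA sequences (made up for this example).
alignment = [
    "AGGCAAGCU",
    "GGGCAAGCC",
    "CGGCAAGCG",
]
print(covarying_pair(alignment, 0, 8))  # columns pair as A-U, G-C, C-G
print(covarying_pair(alignment, 1, 7))  # always G-C: conserved, no covariation
```

Columns that pass this test are candidate stem positions; a column pair that is simply conserved gives no extra evidence for pairing.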

RNA Structure
• RNA differs from DNA in having fairly common G-U base pairs. Also, many functional RNAs have unusual modified bases such as pseudouridine and inosine.
• The pseudoknot, pairing between a loop and a sequence outside its stem, is especially difficult to detect: it is computationally intense and does not follow the normal rule that RNA base pairing is nested.
  – But pseudoknots seem to be fairly rare.
• Essentially, RNA folding programs start with all possible short sequences, then build to larger ones, adding the contribution of each structural element.
  – There is an element of dynamic programming here as well.
  – And, “stochastic context-free grammars”, something I really don’t want to approach right now!
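The "start short, build to larger" idea can be sketched with a Nussinov-style dynamic program. Note the substitution: this sketch maximizes the number of nested base pairs rather than minimizing free energy, so it is a simplified stand-in for what real folding programs do. The function names and the `min_loop` parameter are assumptions for illustration.

```python
def can_pair(a, b):
    """Watson-Crick pairs plus the G-U wobble pair."""
    return (a, b) in {("A", "U"), ("U", "A"), ("G", "C"),
                      ("C", "G"), ("G", "U"), ("U", "G")}

def max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs, requiring at least
    min_loop unpaired bases inside any hairpin loop."""
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]        # best[i][j]: optimum for seq[i..j]
    for span in range(min_loop + 1, n):       # short subsequences first
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]            # case 1: base j is unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]

print(max_pairs("GGGAAAUCC"))  # a small hairpin: three pairs around an AAA loop
```

Because the recursion only combines nested sub-solutions, it cannot represent pseudoknots, which is one reason they are hard to detect.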

Finding tRNAs
• tRNAs have a highly conserved structure, with 3 main stem-and-loop structures that form a cloverleaf structure, and several conserved bases.
• Finding such sequences is a matter of looking in the DNA for the proper features located the proper distance apart.
• Looking for such sequences is well-suited to a decision tree, a series of steps that the sequence must pass. In addition, a score is kept, rating how well the sequence passed each step. This allows a more stringent analysis later on, to eliminate false positives.

Bacteria / Archaeal Protein Coding Genes
• Bacteria use ATG as their main start codon, but GTG and TTG are also fairly common, and a few others are occasionally used.
  – Remember that start codons are also used internally: the actual start codon may not be the first one in the ORF.
• The stop codons are the same as in eukaryotes: TGA, TAA, TAG
  – Stop codons are (almost) absolute: except for a few cases of programmed frameshifts and the use of TGA for selenocysteine, the stop codon at the end of an ORF is the end of protein translation.
• Genes can overlap by a small amount. Not much, but a few codons of overlap is common enough so that you can’t just eliminate overlaps as impossible.
• Cross-species homology works well for many genes. It is very unlikely that non-coding sequence will be conserved.
  – But, a significant minority of genes (say 20%) are unique to a given species.
• Translation start signals (ribosome binding sites; Shine-Dalgarno sequences) are often found just upstream from the start codon
  – However, some aren’t recognizable
  – Genes in operons don’t always have a separate ribosome binding site for each gene
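The start and stop codon rules above can be sketched as a simple forward-strand ORF scan. This is an illustration under stated assumptions, not a real gene finder: `find_orfs` and `min_codons` are hypothetical names, the reverse strand is ignored, and the first candidate start is always taken even though (as noted above) the true start may be an internal one.

```python
STARTS = {"ATG", "GTG", "TTG"}   # start codons named on the slide
STOPS = {"TGA", "TAA", "TAG"}    # stop codons named on the slide

def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) tuples for forward-strand ORFs.
    Each ORF runs from the first start codon seen after the previous
    stop to the next in-frame stop; end is exclusive and includes the stop."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon in STARTS:
                start = pos                    # first candidate start
            elif start is not None and codon in STOPS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None                   # look for the next ORF
    return orfs

print(find_orfs("CCATGAAATTTTAGGG"))
```

A fuller version would scan the reverse complement too and would use ribosome binding sites or homology to choose among alternative start codons.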

Composition Methods
• The frequency of various codons is different in coding regions as compared to non-coding regions.
  – This extends to G-C content, dinucleotide frequencies, and other measures of composition. Dicodons (groups of 6 bases) are often used
  – Well documented experimentally.
• The composition varies between different proteins of course, and it is affected within a species by the amounts of the various tRNAs present
  – Horizontally transferred genes can also confuse things: they tend to have compositions that reflect their original species.
  – A second group with unusual compositions are highly expressed genes.
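The composition measures mentioned above are easy to compute. The following is a minimal sketch with hypothetical helper names (`gc_content`, `dicodon_counts`); it only counts features and does not implement any particular gene finder's scoring.

```python
from collections import Counter

def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def dicodon_counts(seq, frame=0):
    """Count in-frame dicodons (6-base windows), stepping one codon at a time."""
    seq = seq.upper()
    counts = Counter()
    for pos in range(frame, len(seq) - 5, 3):
        counts[seq[pos:pos + 6]] += 1
    return counts

print(gc_content("ATGC"))
print(dicodon_counts("ATGAAAAAATAG"))
```

In practice such counts would be collected separately from known coding and non-coding regions, and a candidate ORF would be scored by how much better its dicodons fit the coding table.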

Eukaryotic Genes Harder to Find
• Some fundamental differences between prokaryotes and eukaryotes:
• There is lots of non-coding DNA in eukaryotes.
  – First step: find repeated sequences and RNA genes
  – Note that eukaryotes have 3 main RNA polymerases. RNA polymerase 2 (pol2) transcribes all protein-coding genes, while pol1 and pol3 transcribe various RNA-only genes.
• Most eukaryotic genes are split into exons and introns.
• Only 1 gene per transcript in eukaryotes.
• No ribosome binding sites: translation starts at the first ATG in the mRNA
  – Thus, in eukaryotic genomes, searching for the transcription start site (TSS) makes sense.
• Many fewer eukaryotic genomes have been sequenced

Exons
• Exon sequences can often be identified by sequence conservation, at least roughly.
• Dicodon statistics, as used for prokaryotes, are also useful
  – Eukaryotic genomes tend to contain many isochores, regions of different GC content, and composition statistics can vary between isochores.
• The initial and terminal exons contain untranslated regions, and thus special methods are needed to detect them.
• Predicting splice junctions is a matter of collecting information about the sequences surrounding each possible GT/AG pair, then running this information through some combination of decision tree, Markov models, discriminant analysis, or neural networks, in an attempt to massage the data into giving a reliable score.
  – In general, sites are more likely to be correct if predicted by multiple methods
  – Experimental data from ESTs can be very helpful here.

Functional Annotation

Functional Classification I: GO
• The Gene Ontology (GO) consortium (http://www.geneontology.org/) is an attempt to describe gene products with a structured controlled vocabulary, a set of invariant terms that have a known relationship to each other.
• Each GO term is given a number of the form GO:nnnnnnn (7 digits), as well as a term name. For example, GO:0005102 is “receptor binding”.
• There are 3 root terms: biological process, cellular component, and molecular function. A gene product will probably be described by GO terms from each of these “ontologies”. (Ontology is a branch of philosophy concerned with the nature of being, and the basic categories of being and their relationships.)
  – For instance, cytochrome c is described with the molecular function term “oxidoreductase activity”, the biological process terms “oxidative phosphorylation” and “induction of cell death”, and the cellular component terms “mitochondrial matrix” and “mitochondrial inner membrane”
• The terms are arranged in a hierarchy that is a “directed acyclic graph” and not a tree. This means simply that each term can have more than one parent term, but the direction of parent to child (i.e. less specific to more specific) is always maintained.
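The difference between a tree and a directed acyclic graph can be shown with a small parent map. This is a sketch only: the term names and parent links in `PARENTS` are a made-up toy hierarchy, not actual GO relationships, and `ancestors` and `is_go_id` are hypothetical helper names.

```python
import re

# Toy hierarchy (NOT real GO data): each term lists its parent terms.
PARENTS = {
    "root": [],
    "membrane": ["root"],
    "organelle part": ["root"],
    "organelle membrane": ["membrane", "organelle part"],  # two parents: a DAG
    "inner membrane": ["organelle membrane"],
}

def ancestors(term, parents=PARENTS):
    """All less specific terms reachable by following parent links."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

def is_go_id(text):
    """Check the GO:nnnnnnn (7 digit) identifier format."""
    return re.fullmatch(r"GO:\d{7}", text) is not None

print(sorted(ancestors("inner membrane")))
print(is_go_id("GO:0005102"))
```

Annotating a gene product with a specific term implicitly annotates it with every ancestor, which is why the parent-to-child direction must always be maintained.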

Functional Classification II: Enzyme Nomenclature
• Enzyme functions: which reactants are converted to which products
• Enzyme functions are given unique numbers by the Enzyme Commission.
  – Across many species, the enzymes that perform a specific function are usually evolutionarily related. However, this isn’t necessarily true. There are cases of two entirely different enzymes evolving similar functions.
  – Often, two or more gene products in a genome will have the same E.C. number.
  – E.C. numbers are four integers separated by dots. The left-most number is the least specific
  – For example, the tripeptide aminopeptidases have the code "EC 3.4.11.4", whose components indicate the following groups of enzymes:
    • EC 3 enzymes are hydrolases (enzymes that use water to break up some other molecule)
    • EC 3.4 are hydrolases that act on peptide bonds
    • EC 3.4.11 are those hydrolases that cleave off the amino-terminal amino acid from a polypeptide
    • EC 3.4.11.4 are those that cleave off the amino-terminal end from a tripeptide
• Top level E.C. numbers:
  – E.C. 1: oxidoreductases (often dehydrogenases): electron transfer
  – E.C. 2: transferases: transfer of functional groups (e.g. phosphate) between molecules.
  – E.C. 3: hydrolases: splitting a molecule by adding water to a bond.
  – E.C. 4: lyases: non-hydrolytic addition or removal of groups from a molecule
  – E.C. 5: isomerases: rearrangements of atoms within a molecule
  – E.C. 6: ligases: joining two molecules using energy from ATP
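Because each E.C. number reads from least to most specific, its hierarchy can be recovered by splitting on the dots. A minimal sketch follows; `ec_lineage` and `ec_class` are hypothetical helper names, and the class table simply restates the six top-level classes listed above.

```python
TOP_LEVEL = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
}

def ec_lineage(ec):
    """Expand 'EC 3.4.11.4' into its increasingly specific groups."""
    parts = ec.replace("EC", "").strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a four-part E.C. number: {ec!r}")
    return ["EC " + ".".join(parts[:i]) for i in range(1, 5)]

def ec_class(ec):
    """Name of the top-level class for an E.C. number."""
    first = int(ec_lineage(ec)[0].split()[1])
    return TOP_LEVEL[first]

print(ec_lineage("EC 3.4.11.4"))
print(ec_class("EC 3.4.11.4"))
```

Two gene products that share a lineage prefix (for example EC 3.4) catalyze related reactions even if their full numbers differ.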

Functional Prediction
• BLAST searches
• HMM models of specific genes or gene families (Pfam, TIGRfam, FIGfam).
• Sequence motifs and domains. If the gene is not a good match to previously known genes, these provide useful clues.
• Cellular location predictions, especially for transmembrane proteins.
• Genomic neighbors, especially in bacteria, where related functions are often found together in operons and divergons (genes transcribed in opposite directions that use a common control region).
• Biochemical pathway/subsystem information. If an organism has most of the genes needed to perform a function, any missing functions are probably present too.
  – Also, experimental data about an organism’s capacities can be used to decide whether the relevant functions are present in the genome.

Functional Prediction II: Membrane Spanning
• Integral membrane proteins contain amino acid sequences that go through the membrane one or several times.
  – There are also peripheral membrane proteins that stick to the hydrophilic head groups by ionic and polar interactions
  – There are also some that have covalently bound hydrophobic groups, such as myristate, a 14 carbon saturated fatty acid that is attached to the N-terminal amino group.
• There are 2 main protein structures that cross membranes.
  – Most are alpha helices, and in proteins that span multiple times, these alpha helices are packed together in a coiled-coil. Length = 15-30 amino acids.
  – Less commonly, there are proteins with membrane spanning “beta barrels”, composed of beta sheets wrapped into a cylinder. An example: porins, which transport water across the membrane.
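One common way to look for candidate membrane-spanning helices is a sliding-window hydropathy scan. The slide does not name a method, so this is a swapped-in illustration: it uses the Kyte-Doolittle hydropathy scale with the conventional 19-residue window and 1.6 cutoff, and `hydrophobic_windows` is a hypothetical function name.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Start positions of windows whose mean hydropathy meets the threshold."""
    hits = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        mean = sum(KD[aa] for aa in segment) / window
        if mean >= threshold:
            hits.append(start)
    return hits

# Made-up test protein: a polar stretch, a leucine-rich stretch, a polar stretch.
toy = "DEKKDEKK" + "L" * 20 + "DEKKDEKK"
print(hydrophobic_windows(toy))
```

A scan like this only suggests alpha-helical spans; beta barrels have alternating hydrophobic residues and are not detected this way.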

Functional Prediction by Phylogeny
• Key step in genome projects
• More accurate predictions help guide experimental and computational analyses
• Many diverse approaches
• All improved by “phylogenomic” type analyses that integrate evolutionary reconstructions and understanding of how new functions evolve

Functional Prediction
• Identification of motifs
  – Short regions of sequence similarity that are indicative of general activity
  – e.g., ATP binding
• Homology/similarity based methods
  – Gene sequence is searched against a database of other sequences
  – If significantly similar genes are found, their functional information is used
• Problem
  – Genes frequently have similarity to hundreds of motifs and multiple genes, not all with the same function

Helicobacter pylori

H. pylori genome - 1997
“The ability of H. pylori to perform mismatch repair is suggested by the presence of methyl transferases, mutS and uvrD. However, orthologues of MutH and MutL were not identified.”

MutL ??
From http://asajj.roswellpark.org/huberman/dna_repair/mmr.html

Phylogenetic Tree of MutS Family
[Figure: tree of MutS homologs. Taxon labels: Yeast, Human, Mouse, Rat, Fly, Xenla, Celeg, Arath, Spombe, Neucr, Aquae, Strpy, Bacsu, Synsp, Deira, Helpy, Borbu, Metth, mSaco, Trepa, Chltr, Theaq, Thema, Ecoli, Neigo]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.

MutS Subfamilies
[Figure: MutS family tree with subfamilies labeled: MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5, MSH6]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.

Overlaying Functions onto Tree
[Figure: MutS family tree with subfamilies labeled: MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5, MSH6]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.

MutS Subfamilies
• MutS1: Bacterial MMR
• MSH1: Euk - mitochondrial MMR
• MSH2: Euk - all MMR in nucleus
• MSH3: Euk - loop MMR in nucleus
• MSH6: Euk - base:base MMR in nucleus
• MutS2: Bacterial - function unknown
• MSH4: Euk - meiotic crossing-over
• MSH5: Euk - meiotic crossing-over

Functional Prediction Using Tree
[Figure: MutS family tree with each subfamily labeled by function]
• MSH5 - Meiotic Crossing Over
• MSH6 - Nuclear Repair of Mismatches
• MutS2 - Unknown Functions
• MSH3 - Nuclear Repair of Loops
• MSH1 - Mitochondrial Repair
• MSH4 - Meiotic Crossing Over
• MSH2 - Eukaryotic Nuclear Mismatch and Loop Repair
• MutS1 - Bacterial Mismatch and Loop Repair
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.

Table 3. Presence of MutS Homologs in Complete Genome Sequences
Columns: Species; # of MutS Homologs; Which Subfamilies?; MutL Homologs
Bacteria: Escherichia coli K12; Haemophilus influenzae Rd KW20; Neisseria gonorrhoeae; Helicobacter pylori 26695; Mycoplasma genitalium G-37; Mycoplasma pneumoniae M129; Bacillus subtilis 168; Streptococcus pyogenes; Mycobacterium tuberculosis; Synechocystis sp. PCC6803; Treponema pallidum Nichols; Borrelia burgdorferi B31; Aquifex aeolicus; Deinococcus radiodurans R1
  # of MutS Homologs: 1 1 1 1 2 2 2 1 2 2 2
  Which Subfamilies?: MutS1 MutS1 MutS1 MutS2 MutS1,MutS2 MutS1,MutS2 MutS1,MutS2 MutS1 MutS1,MutS2 MutS1,MutS2 MutS1,MutS2
  MutL Homologs: 1 1 1 1 1 1 1 1 1 1
Archaea: Archaeoglobus fulgidus VC-16, DSM4304; Methanococcus jannaschii DSM 2661; Methanobacterium thermoautotrophicum ΔH
  1 MutS2 -
Eukaryotes:
  Saccharomyces cerevisiae: 6; MSH1-6; 3+
  Homo sapiens: 5; MSH2-6; 3+

Blast Search of H. pylori “MutS”
Sequences producing significant alignments:          Score (bits)   E Value
sp|P73625|MUTS_SYNY3 DNA MISMATCH REPAIR PROTEIN     117            3e-25
sp|P74926|MUTS_THEMA DNA MISMATCH REPAIR PROTEIN     69             1e-10
sp|P44834|MUTS_HAEIN DNA MISMATCH REPAIR PROTEIN     64             3e-09
sp|P10339|MUTS_SALTY DNA MISMATCH REPAIR PROTEIN     62             2e-08
sp|O66652|MUTS_AQUAE DNA MISMATCH REPAIR PROTEIN     57             4e-07
sp|P23909|MUTS_ECOLI DNA MISMATCH REPAIR PROTEIN     57             4e-07
• Blast search pulls up Syn. sp MutS#2 with a much higher score (and much lower E value) than other MutS homologs
• Based on this TIGR predicted this species had mismatch repair
Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.

High Mutation Rate in H. pylori
Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.

Phylogenomics
PHYLOGENETIC PREDICTION OF GENE FUNCTION
[Figure: flowchart with panels EXAMPLE A, METHOD, and EXAMPLE B. Example A uses genes 1A, 2A, 3A, 1B, 2B, 3B from Species 1, 2, and 3; Example B uses genes 1 to 6. A final panel shows the ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN), including a duplication.]
Method steps:
• Choose gene(s) of interest
• Identify homologs
• Align sequences
• Calculate gene tree
• Overlay known functions onto tree
• Infer likely function of gene(s) of interest
Based on Eisen, 1998 Genome Res 8: 163-167.
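The last two steps (overlay known functions, then infer) can be sketched on a toy gene tree. This is an illustration under loud assumptions, not the published method: the tree topology, the function labels, and the helper names `leaves` and `infer_function` are made up, and only the gene names 1A to 3B are borrowed from Example A of the figure.

```python
def leaves(tree):
    """List the leaf names of a tree written as nested tuples."""
    if isinstance(tree, str):
        return [tree]
    names = []
    for child in tree:
        names.extend(leaves(child))
    return names

def infer_function(tree, query, known):
    """Report the function found in the smallest clade that contains the
    query and at least one other annotated gene; 'ambiguous' if that
    clade holds more than one function; None if nothing is annotated."""
    if query not in leaves(tree):
        raise ValueError(f"{query} is not in the tree")
    result = None
    node = tree
    while not isinstance(node, str):
        funcs = {known[name] for name in leaves(node)
                 if name != query and name in known}
        if not funcs:
            break                      # smaller clades have no annotations either
        result = funcs.pop() if len(funcs) == 1 else "ambiguous"
        node = next(child for child in node if query in leaves(child))
    return result

# Hypothetical gene tree: an old duplication separates the A and B paralogs.
tree = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
known = {"1A": "function A", "3A": "function A",
         "1B": "function B", "2B": "function B"}
print(infer_function(tree, "2A", known))
print(infer_function(tree, "3B", known))
```

One limitation of this sketch: when the query's own subfamily has no annotated members, it falls back to a larger clade. A careful analysis would instead stop at the subfamily boundary and report the function as unknown, as with the MutS2 subfamily.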


Chemosynthetic Symbionts
Eisen et al. 1992. J. Bact. 174: 3416

Carboxydothermus hydrogenoformans
• Isolated from a Russian hot spring
• Thermophile (grows at 80°C)
• Anaerobic
• Grows very efficiently on CO (Carbon Monoxide)
• Produces hydrogen gas
• Low GC Gram positive (Firmicute)
• Genome Determined (Wu et al. 2005 PLoS Genetics 1: e65.)

Homologs of Sporulation Genes
Wu et al. 2005 PLoS Genetics 1: e65.

Carboxydothermus sporulates
Wu et al. 2005 PLoS Genetics 1: e65.

Non-Homology Predictions: Phylogenetic Profiling
• Step 1: Search all genes in organisms of interest against all other genomes
• Ask: Yes or No, is each gene found in each other species
• Cluster genes by distribution patterns (profiles)
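The three steps above can be sketched with a yes/no presence table. This is a toy illustration: the gene and genome names are made up, the presence calls are assumed to come from an earlier homology search, and `cluster_by_profile` is a hypothetical helper that groups only genes with identical profiles (real analyses usually allow near matches).

```python
from collections import defaultdict

def cluster_by_profile(presence, genomes):
    """Group genes that share the same presence/absence pattern.
    presence maps each gene to the set of genomes where a homolog was found."""
    clusters = defaultdict(list)
    for gene, found_in in presence.items():
        profile = tuple(genome in found_in for genome in genomes)  # yes/no per genome
        clusters[profile].append(gene)
    return dict(clusters)

# Hypothetical search results.
genomes = ["sp1", "sp2", "sp3", "sp4"]
presence = {
    "geneA": {"sp1", "sp3"},
    "geneB": {"sp1", "sp3"},
    "geneC": {"sp2"},
    "geneD": {"sp1", "sp2", "sp3", "sp4"},
}
for profile, genes in cluster_by_profile(presence, genomes).items():
    print(profile, genes)
```

Genes that share a profile, such as geneA and geneB here, become candidates for working in the same process, even when neither has a homolog of known function.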

Sporulation Gene Profile
Wu et al. 2005 PLoS Genetics 1: e65.

B. subtilis new sporulation genes

Functional Prediction III: Colocalization
• Operon structure is often maintained over fairly large taxonomic ranges.
  – Sometimes gene order is altered, and sometimes one or more enzymes are missing.
  – But in general, this phenomenon allows recognition or verification that widely diverged enzymes do in fact have the same function.
• This is an operon that contains part of the glycolytic pathway.
  – 1: phosphoglycerate mutase
  – 2: triosephosphate isomerase
  – 3: enolase
  – 4: phosphoglycerate kinase
  – 5: glyceraldehyde 3-phosphate dehydrogenase
  – 6: central glycolytic gene regulator

Metabolic Predictions

Comparative Genomics


Using the Core

"... between even related species. Our molecular picture of evolution for the past 20 years has been dominated by the small-subunit ribosomal RNA phylogenetic tree ... Analyses of complete genome sequences have led to many recent suggestions that the extent of horizontal gene exchange is much greater than was previously realized (refs 10–12). For example, an ..."

Table 2. Genome features from 24 microbial genome sequencing projects

Organism                               Genome size (Mbp)   No. of ORFs (% coding)   Unknown function   Unique ORFs
Aeropyrum pernix K1                    1.67                1,885 (89%)
A. aeolicus VF5                        1.50                1,749 (93%)              663 (44%)          407 (27%)
A. fulgidus                            2.18                2,437 (92%)              1,315 (54%)        641 (26%)
B. subtilis                            4.20                4,779 (87%)              1,722 (42%)        1,053 (26%)
B. burgdorferi                         1.44                1,738 (88%)              1,132 (65%)        682 (39%)
Chlamydia pneumoniae AR39              1.23                1,134 (90%)              543 (48%)          262 (23%)
Chlamydia trachomatis MoPn             1.07                936 (91%)                353 (38%)          77 (8%)
C. trachomatis serovar D               1.04                928 (92%)                290 (32%)          255 (29%)
Deinococcus radiodurans                3.28                3,187 (91%)              1,715 (54%)        1,001 (31%)
E. coli K-12-MG1655                    4.60                5,295 (88%)              1,632 (38%)        1,114 (26%)
H. influenzae                          1.83                1,738 (88%)              592 (35%)          237 (14%)
H. pylori 26695                        1.66                1,589 (91%)              744 (45%)          539 (33%)
Methanobacterium thermoautotrophicum   1.75                2,008 (90%)              1,010 (54%)        496 (27%)
Methanococcus jannaschii               1.66                1,783 (87%)              1,076 (62%)        525 (30%)
M. tuberculosis CSU#93                 4.41                4,275 (92%)              1,521 (39%)        606 (15%)
M. genitalium                          0.58                483 (91%)                173 (37%)          7 (2%)
M. pneumoniae                          0.81                680 (89%)                248 (37%)          67 (10%)
N. meningitidis MC58                   2.24                2,155 (83%)              856 (40%)          517 (24%)
Pyrococcus horikoshii OT3              1.74                1,994 (91%)              859 (42%)          453 (22%)
Rickettsia prowazekii Madrid E         1.11                878 (75%)                311 (37%)          209 (25%)
Synechocystis sp.                      3.57                4,003 (87%)              2,384 (75%)        1,426 (45%)
T. maritima MSB8                       1.86                1,879 (95%)              863 (46%)          373 (26%)
T. pallidum                            1.14                1,039 (93%)              461 (44%)          280 (27%)
Vibrio cholerae El Tor N16961          4.03                3,890 (88%)              1,806 (46%)        934 (24%)
Total                                  50.60               52,462 (89%)             22,358 (43%)       12,161 (23%)

© 2000 Macmillan Magazines Ltd. NATURE | VOL 406 | 17 AUGUST 2000 | www.nature.com

After the Genomes
• Better analysis and annotation
• Comparative genomics
• Functional genomics (Experimental analysis of gene function on a genome scale)
• Genome-wide gene expression studies
• Proteomics
• Genome-wide genetic experiments
