Metagenomics and Cloud Computing (London, January 2014, Era7 Bioinformatics)


Published on January 18, 2014

Author: era7bioinformatics

Source: slideshare.net

Description

Traditional microbial genome sequencing relies on clonal cultures, but the new era of genomics faces a new challenge: metagenomic analysis. In the next few years, metagenomics will probably be used in clinical diagnostic settings, where it has the potential to revolutionize pathogen detection in public health laboratories by allowing the simultaneous detection of all microorganisms in a clinical sample. For viruses, an unbiased high-throughput sequencing approach is useful for directly detecting pathogenic viruses without prior genetic information, and the use of metagenomics for virus discovery in clinical samples has opened new opportunities for understanding the aetiology of unexplained illness. For bacteria, it should be remembered that cultivated organisms represent only a small fraction of the phylogenetic diversity of Bacteria and Archaea. Hence, metagenomics will probably serve to identify new pathogens and new infections caused by bacterial consortia. In chronic infections, metagenomics will give us information about the relevance of biofilms and other bacterial organizations that may be important in such infections. For example, metagenomics applied to Mycobacterium infections has revealed multiple, previously undetected strains in the same patient. Microbiome analysis has been one of the most important applications of metagenomics.
Two major strategies have been applied in recent years for bacterial metagenomics: 16S and shotgun metagenomics. 16S metagenomics tells us about microbial diversity and the relative abundance of species and taxa. Shotgun metagenomics is a much more massive approach: it can reveal the functional profile of the genes present in the sample and can even yield assembled genomes if the sample is not very complex.
Metagenomics has brought new challenges to bioinformatics. Cloud computing can solve the problem of massive data analysis by providing scalable, real-time, on-demand computing for metagenomics data analysis. However, cloud computing infrastructure is not easy to manage, and publicly available software solutions are needed to extend the use of the cloud to the analysis of huge metagenomics data sets.
MG7 is a new system for analyzing metagenomics reads. It uses cloud computing to parallelize the BLAST similarity searches on which both the functional inference and the taxonomic assignment are based. A distinctive feature of MG7 is its use of a non-relational database: MG7 stores the analysis results in a graph database, which facilitates querying and accessing data organized in the hierarchical structure of the taxonomy tree. MG7 is an open source project licensed under the AGPLv3 license.

A New Era in Diagnostic Microbiology. Pathogen Genomics. Whole Genome Sequencing. 15 January 2014, The Royal College of Pathologists.
A New Cloud Computing System for Massive Analysis of Reads from Metagenomics Samples
http://ohnosequences.com www.era7bioinformatics.com

- A bit of context:
  • What is Era7
  • What is Oh no sequences! Research group
  • Research lines / Research projects
- Clonal cultures versus Metagenomics
- Microbiome
- Microbiome in health and disease
- Metagenomics in a clinical sample
- 16S and shotgun metagenomics
- Metagenomics for detection of viruses
- Metagenomics for detection of bacteria
- The metagenomics bioinformatics challenge:
  • High computational cost
  • Binning for reducing computation
  • Reducing reference database
- MG7:
  • Cloud computing
  • MG7 algorithms and pipeline
  • Lowest Common Ancestor assignment
  • MG7 uses Graph databases
  • MG7 uses NCBI taxonomy tree
MG7 for metagenomics analysis

A bit of context

What is Era7 Bioinformatics

• Research-driven SME
• Open Source
• Cloud Computing
• Next Generation Sequencing

• Bacterial Genomics projects
• Comparative Genomics
• Metagenomics
• Microbiome
• RNA-seq (and Dual RNA-seq)
• Cancer Genomics
• Big Data management and integration

What is Era7: Oh no sequences!

Research lines
Software (all of them are Open Source AGPLv3 projects):
• BG7
• Bio4j
• Nextmicro
• Statika
• Nispero
• MG7
Research projects:
• Algorithms for assembly
• Methods for bacterial genome annotation
• New Cloud Computing Architectures
• Graph Databases for Biological data
• Comparative genomics and bacterial evolution
• Genome Plasticity
• Big Data integration and visualization
• Host Immune System and infection

Traditional microbial genome sequencing relies on clonal cultures, but the new era of genomics faces a new challenge: metagenomic analysis.

Metagenomic approaches make microbiome analysis possible:
• Health and disease
• Therapeutic interventions
• Transplant
• Immune system

Microbiome in Health and Disease
• Inflammatory Bowel Disease
• Diabetes
• Obesity
• Cardiovascular Disease
• Colon Cancer

Modifying the Microbiome
• Prebiotics
• Probiotics
• Microbiome Transplant (Clostridium difficile)

For bacteria, it should be remembered that cultivated organisms represent only a small fraction of the phylogenetic diversity of Bacteria and Archaea.

Metagenomics has the potential to revolutionize pathogen detection in public health laboratories by allowing the simultaneous detection of all microorganisms in a clinical sample.

• Metagenomic analysis after PCR amplification of different gene regions
• Shotgun Metagenomics

Metagenomic analysis after PCR amplification of different gene regions:
• 16S rRNA
• Gyrase
• Ribosomal proteins
• Elongation Factors
• RNA Polymerase
• ……….
16S metagenomics tells us about microbial diversity and the relative abundance of species and taxa.
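To make "relative abundance" concrete: once each 16S read has been assigned to a taxon, a taxon's relative abundance is simply its share of the assigned reads. The sketch below also computes the Shannon index, one common diversity measure; it is shown only as an illustration and is not necessarily what MG7 reports. The taxon names in the usage lines are made-up example data.

```python
import math
from collections import Counter

def relative_abundance(assignments):
    """Return each taxon's fraction of the assigned reads."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

def shannon_index(assignments):
    """Shannon diversity H' = -sum(p * ln p) over taxa (one common measure,
    shown for illustration; not necessarily what MG7 reports)."""
    return -sum(p * math.log(p) for p in relative_abundance(assignments).values())

# Made-up example: the taxon assigned to each of four reads.
reads = ["Bacteroides", "Bacteroides", "Escherichia", "Faecalibacterium"]
print(relative_abundance(reads))
print(round(shannon_index(reads), 3))
```

A sample dominated by one taxon gives an index close to 0, while reads spread evenly over many taxa give a higher value.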

Shotgun Metagenomics
Shotgun metagenomics is a much more massive approach: it can reveal the functional profile of the genes present in the sample and can even yield assembled genomes if the sample is not very complex.

Technology
• 454 in the past
• Illumina today (approaches with overlapping paired reads)
• Preprocessing steps are very important

For viruses: an unbiased high-throughput sequencing approach is useful for directly detecting pathogenic viruses without prior genetic information. The use of metagenomics for virus discovery in clinical samples has opened new opportunities for understanding the aetiology of unexplained illness.

For bacteria: metagenomics will probably serve to identify new pathogens and new infections caused by bacterial consortia. In chronic infections, metagenomics will give us information about the relevance of biofilms and other bacterial organizations that may be important in such infections. Microbiome analysis has been one of the most important applications of metagenomics.

For bacteria: as an example, metagenomics applied to Mycobacterium infections has revealed multiple, previously undetected strains in the same patient.

The bioinformatics challenge: metagenomics has a high computational cost.
1. One approach is to reduce the amount of computation needed.
2. The other is to compute more efficiently.

The bioinformatics challenge: metagenomics has a high computational cost.
1. Reducing the computation:
• Binning (clustering) the reads, in both 16S and shotgun metagenomics. In 16S, the bins are Operational Taxonomic Units (OTUs).
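Binning can be sketched as greedy centroid clustering: each read joins the first existing cluster whose representative (centroid) it matches at or above an identity threshold (97% is a common OTU cutoff for 16S), or otherwise starts a new cluster. Only one representative per cluster then needs to be searched against the reference database. This is a simplified, hypothetical sketch: identity is computed position by position, so it assumes equal-length, already aligned reads, whereas real clustering tools use alignment-based or k-mer-based similarity. MG7 itself deliberately skips binning and compares every read.

```python
def identity(a, b):
    """Fraction of identical positions. Simplifying assumption: the reads are
    already aligned and of equal length (real tools align or use k-mers)."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length, aligned reads")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(reads, threshold=0.97):
    """Greedy centroid clustering: a read joins the first cluster whose
    centroid it matches at >= threshold, otherwise it founds a new cluster."""
    otus = []  # list of (centroid, members) pairs
    for read in reads:
        for centroid, members in otus:
            if identity(read, centroid) >= threshold:
                members.append(read)
                break
        else:  # no existing centroid was similar enough
            otus.append((read, [read]))
    return otus

reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT"]
otus = greedy_otus(reads, threshold=0.9)
# Only the centroids would need to be compared with the reference database.
print([centroid for centroid, _ in otus])
```

The result depends on input order, which is one reason clustering tools often sort reads (for example by abundance) before clustering.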

The bioinformatics challenge: metagenomics has a high computational cost.
1. Reducing the computation:
• Reducing the size of the reference database: in shotgun metagenomics it is common to use only the complete bacterial genomes.

The bioinformatics challenge: metagenomics has a high computational cost.
2. The other approach is to be more efficient: MG7

The bioinformatics challenge: cloud computing can solve the problem of massive data analysis by providing scalable, real-time, on-demand computing for metagenomics data analysis. However, cloud computing infrastructure is not easy to manage, and publicly available software solutions are needed to extend the use of the cloud to the analysis of huge metagenomics data sets.

MG7
• Based on Cloud Computing (AWS)
• Parallel computation
• Each read is compared with the complete database:
  • No binning: all the reads are analyzed
  • All the known sequences (nt database) for shotgun
• NCBI taxonomy
• Graph database for analyzing the assignment results

MG7 based on Cloud Computing (AWS)
• EC2
• S3
• SQS
• SNS
• ……

MG7 based on Cloud Computing (AWS): parallel computation
• A Cloud Master machine creates the tasks and sets up the queues.
• A set of cloud instances (hundreds, potentially thousands; usually micro EC2 instances) is launched.
• After the parallel computation, the results are modeled in a graph database, which allows further analysis.
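The master/worker pattern described on this slide can be simulated locally. In this sketch, Python threads stand in for EC2 instances and `queue.Queue` stands in for an SQS queue; `fake_blast` is a hypothetical placeholder for the real BLAST step, and the chunk size and worker count are arbitrary. It shows the shape of the design under those assumptions, not MG7's actual code.

```python
import queue
import threading

def make_tasks(reads, chunk_size):
    """Master step: split the reads into fixed-size chunks, one task each."""
    return [reads[i:i + chunk_size] for i in range(0, len(reads), chunk_size)]

def fake_blast(chunk):
    """Hypothetical placeholder for running BLAST on one chunk of reads."""
    return [(read, "hit_for_" + read) for read in chunk]

def worker(tasks, results):
    """Worker step (stands in for one EC2 instance): pull tasks until none remain."""
    while True:
        try:
            chunk = tasks.get_nowait()
        except queue.Empty:
            return  # queue drained: this "instance" can shut down
        results.put(fake_blast(chunk))

def run(reads, chunk_size=2, n_workers=4):
    tasks, results = queue.Queue(), queue.Queue()  # stand-ins for SQS queues
    for chunk in make_tasks(reads, chunk_size):
        tasks.put(chunk)
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    merged = []
    while not results.empty():
        merged.extend(results.get())
    return merged  # order depends on thread scheduling

hits = run([f"read{i}" for i in range(7)])
print(len(hits))
```

Because every task is enqueued before the workers start, a worker that finds the queue empty can simply stop, which mirrors shutting down an instance once there is no work left.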

https://github.com/pablopareja/MG7/wiki

Data model for the graph database (Neo4j)
https://github.com/pablopareja/MG7/wiki

MG7 based on Cloud Computing (AWS)
• Storage is another challenge. The AWS cloud is very useful:
  • S3 for immediate access
  • Glacier for archiving

MG7: each read is compared with the complete database.
• Direct assignment: Best BLAST Hit. It can be chosen by:
  • E-value
  • Similarity % and length of the hit
• Lowest Common Ancestor
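Direct Best BLAST Hit assignment can be sketched as: discard hits that fail the E-value, percent-identity, or alignment-length filters, then keep the lowest-E-value hit for each read. The `Hit` fields and the default thresholds (`1e-5`, 97%, 100 bases) are hypothetical choices for illustration, not MG7's settings.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    read_id: str
    taxon: str
    evalue: float
    identity: float  # percent identity of the alignment
    length: int      # alignment length in bases

def best_blast_hits(hits, max_evalue=1e-5, min_identity=97.0, min_length=100):
    """Direct assignment: per read, keep the best hit that passes all filters.
    Threshold defaults are hypothetical, not MG7's settings."""
    best = {}
    for h in hits:
        if h.evalue > max_evalue or h.identity < min_identity or h.length < min_length:
            continue  # hit fails the E-value, similarity or length filter
        current = best.get(h.read_id)
        # Lower E-value wins; ties are broken by higher percent identity.
        if current is None or (h.evalue, -h.identity) < (current.evalue, -current.identity):
            best[h.read_id] = h
    return {read_id: h.taxon for read_id, h in best.items()}

hits = [
    Hit("r1", "Escherichia", 1e-50, 99.0, 150),
    Hit("r1", "Salmonella", 1e-30, 99.5, 150),
    Hit("r2", "Bacillus", 1e-3, 99.0, 150),      # E-value too high
    Hit("r3", "Clostridium", 1e-40, 90.0, 150),  # identity too low
]
print(best_blast_hits(hits))
```

Reads whose hits all fail the filters stay unassigned; a Lowest Common Ancestor assignment instead keeps several good hits per read and assigns the read to their common ancestor in the taxonomy.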

MG7 Lowest Common Ancestor
First step: we start from a set of nodes of arbitrary size (4 in this example), spread through the taxonomy tree.

MG7 Lowest Common Ancestor
Second step: we then take the first node from the set and compute its whole ancestor list up to the root of the taxonomy.

MG7 Lowest Common Ancestor
Third step: now that we have the list, we take the second node of the set and check whether it is contained in the list. If not, we keep going up through its ancestors until we find a marked node (a node in the list). Once it has been found, we discard the elements that precede it in the list (if any), so that they are not taken into account in the next iterations of the algorithm.

MG7 Lowest Common Ancestor
Fourth step: we keep going through our node set, and node C also removes some elements of the list…

MG7 Lowest Common Ancestor
Fifth step: finally we reach the last node of our set, but no element is removed from the list as a result.

MG7 Lowest Common Ancestor
Here we have our lowest common ancestor: the first element remaining in the list.
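The five LCA steps translate almost line for line into code. This minimal sketch assumes the taxonomy is given as a hypothetical `parent` dictionary mapping each node to its parent (`None` for the root); the tree in the usage lines is a toy, heavily simplified taxonomy with intermediate ranks omitted. It illustrates the algorithm as described on the slides, not MG7's implementation, which works on a graph database.

```python
# Toy taxonomy, heavily simplified (intermediate ranks omitted): child -> parent.
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Firmicutes": "Bacteria",
    "Enterobacteriaceae": "Proteobacteria",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella": "Enterobacteriaceae",
    "Bacillus": "Firmicutes",
}

def ancestors(node, parent):
    """Path from node up to the root, inclusive: [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def lowest_common_ancestor(nodes, parent):
    if not nodes:
        raise ValueError("need at least one node")
    path = ancestors(nodes[0], parent)  # step 2: ancestor list of the first node
    for node in nodes[1:]:
        # Step 3: climb from this node until we reach a node in the list.
        while node not in path:
            node = parent[node]
        # Discard the list elements below that node; if it is already the
        # first element, nothing is removed (as in the fifth step).
        path = path[path.index(node):]
    return path[0]  # the first remaining element is the LCA

print(lowest_common_ancestor(["Escherichia", "Salmonella"], PARENT))
print(lowest_common_ancestor(["Escherichia", "Salmonella", "Bacillus"], PARENT))
```

Because every climb stops at the first node already in the list, each query node walks only as far as needed, and the list can only shrink from the front.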

MG7: all the known sequences (nt database) for shotgun
The nt database is the largest nucleotide database; it contains nucleotide sequences from all organisms. This is important to detect:
• Unexpected organisms
• Contamination

MG7: NCBI taxonomy
This taxonomy is probably the best and most comprehensive one. A graph database is very appropriate for modeling a taxonomy tree.
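One reason a tree (or graph) model helps: read counts can be rolled up the taxonomy, so each node gets both the reads assigned directly to it and a cumulative count that includes all of its descendants. The sketch below does this with a plain `parent` dictionary rather than a graph database such as Neo4j, so it illustrates the kind of hierarchical query involved, not MG7's data model; the direct/cumulative distinction and the toy taxonomy (intermediate ranks omitted) are assumptions for illustration.

```python
from collections import Counter

# Toy taxonomy, heavily simplified (intermediate ranks omitted): child -> parent.
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Firmicutes": "Bacteria",
    "Enterobacteriaceae": "Proteobacteria",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella": "Enterobacteriaceae",
    "Bacillus": "Firmicutes",
}

def direct_and_cumulative(assignments, parent):
    """Direct count: reads assigned to exactly that taxon.
    Cumulative count: reads assigned to the taxon or any of its descendants."""
    direct = Counter(assignments)
    cumulative = Counter()
    for taxon, n in direct.items():
        node = taxon
        while node is not None:  # walk up to the root, adding n at each level
            cumulative[node] += n
            node = parent[node]
    return direct, cumulative

# Made-up assignments, e.g. the output of a best-hit or LCA step.
assigned = ["Escherichia", "Escherichia", "Salmonella", "Enterobacteriaceae", "Bacillus"]
direct, cumulative = direct_and_cumulative(assigned, PARENT)
print(direct["Enterobacteriaceae"], cumulative["Enterobacteriaceae"])
```

In a graph database the same roll-up would be a traversal along parent relationships instead of a dictionary walk.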

Thanks for your attention!
Marina Manrique, Eduardo Pareja-Tobes, Pablo Pareja-Tobes, Raquel Tobes, Eduardo Pareja
epareja@era7.com
