HDF5 in Bioinformatics

Published on February 18, 2014

Author: HDFEOS



DNA sequencing workflows can be very complex and face a number of data management challenges. Typical workflows are characterized by diverse formats, highly redundant data, multiple levels of information, complex associations, repeated file processing, non-scalable storage, and a lack of persistence. Recent work has investigated the use of HDF5 to manage such data.

These studies exploit two particular strengths of HDF5: its ability to store and efficiently access very large arrays, and its ability to serve as a container for heterogeneous data. A candidate data model was developed to describe the objects involved in a genome experiment, and experiments were conducted to evaluate HDF5 for three applications: first, as a project file containing all the data involved in a genome experiment; second, for storing very large tables of haplotype data; and third, for creating, storing, and accessing a very large linkage disequilibrium (LD) matrix.
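
As a rough illustration of the three applications, the sketch below uses the h5py and NumPy Python packages (our choice; the original work does not name an API) to build one project file that holds both a haplotype table and an LD matrix. The function `build_project`, the group names, and the use of r² as the LD measure are assumptions for illustration, and the haplotypes are random placeholders rather than real data.

```python
import h5py
import numpy as np


def build_project(path, n_haplotypes=200, n_snps=50, seed=0):
    """Write a hypothetical project file (haplotype table + LD matrix) and return r^2."""
    rng = np.random.default_rng(seed)
    # Placeholder haplotypes (rows) x SNPs (columns), alleles coded 0/1; random, not real data.
    haplotypes = rng.integers(0, 2, size=(n_haplotypes, n_snps), dtype=np.uint8)

    # Pairwise LD as r^2: D = p_AB - p_A * p_B, r^2 = D^2 / (p_A(1-p_A) p_B(1-p_B)).
    x = haplotypes.astype(np.float64)
    p = x.mean(axis=0)                              # alternate-allele frequency per SNP
    d = (x.T @ x) / n_haplotypes - np.outer(p, p)   # D for every SNP pair
    var = p * (1.0 - p)
    denom = np.outer(var, var)
    # Monomorphic SNPs (var == 0) get r^2 = 0 instead of a division by zero.
    r2 = np.divide(d ** 2, denom, out=np.zeros_like(d), where=denom > 0)

    tile = min(n_snps, 16)
    with h5py.File(path, "w") as f:
        # A single file acts as the container for heterogeneous project data.
        f.attrs["description"] = "hypothetical genome experiment project"
        hap = f.create_group("haplotypes").create_dataset(
            "table", data=haplotypes, chunks=(n_haplotypes, tile), compression="gzip")
        hap.attrs["encoding"] = "0 = reference allele, 1 = alternate allele"
        # Square chunks let a tile of a large matrix be read without loading all of it.
        f.create_group("ld").create_dataset(
            "r2", data=r2.astype(np.float32), chunks=(tile, tile), compression="gzip")
    return r2
```

Reading `f["ld/r2"][0:16, 0:16]` afterwards touches only the chunk covering that tile, which is what makes chunked storage attractive for matrices too large to hold in memory. A real pipeline would likely compute and write a very large matrix block by block rather than in one piece as done here.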

Bioinformatics: Managing genomic data

DNA sequencing workflows
• Diverse formats
• Redundant data
• Repeated file processing
• In-core processing models
• Lack of persistence

Multiple levels of information
• SNP score
• Contig summaries
• Discrepancies
• Contig qualities
• Coverage depth
• Trace
• Reads
• Aligned bases
• Read quality
• Contig
• Percent match
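
One way to picture these levels in a single HDF5 file is as a group hierarchy. The sketch below, again assuming h5py and NumPy, stores a contig consensus, its aligned reads with offsets and read qualities, and two derived levels (coverage depth and per-read percent match). The function `write_contig` and the group layout are hypothetical and are not the data model from the original work; reads are assumed to be ungapped, of equal length, and entirely within the contig.

```python
import h5py
import numpy as np


def write_contig(f, name, consensus, reads, offsets, qualities):
    """Store a contig plus read-, base- and summary-level data under /contigs/<name>.

    reads: equal-length ASCII strings; offsets: 0-based start of each read on
    the contig; qualities: per-base quality scores, shape (len(reads), read_len).
    Assumes every read is ungapped and lies entirely within the contig.
    """
    contig = np.frombuffer(consensus.encode("ascii"), dtype=np.uint8)
    bases = np.array([np.frombuffer(r.encode("ascii"), dtype=np.uint8) for r in reads])
    offsets = np.asarray(offsets, dtype=np.int64)
    read_len = bases.shape[1]

    # Derived levels: per-position coverage depth and per-read percent match.
    depth = np.zeros(len(contig), dtype=np.int32)
    matches = np.zeros(len(reads), dtype=np.float64)
    for i, start in enumerate(offsets):
        depth[start:start + read_len] += 1
        matches[i] = 100.0 * np.mean(bases[i] == contig[start:start + read_len])

    g = f.require_group("contigs").create_group(name)
    g.create_dataset("consensus", data=contig)
    g.create_dataset("coverage_depth", data=depth)
    g.attrs["mean_depth"] = float(depth.mean())
    r = g.create_group("reads")
    r.create_dataset("bases", data=bases)
    r.create_dataset("offsets", data=offsets)
    r.create_dataset("quality", data=np.asarray(qualities, dtype=np.uint8))
    r.create_dataset("percent_match", data=matches)
    return depth, matches
```

Keeping raw reads and derived summaries side by side in one file means a downstream step can read `coverage_depth` directly instead of reprocessing the reads, which speaks to the "repeated file processing" problem listed above.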

HDF5 as a format for bioinformatics
