Biological Database Systems


Published on November 20, 2007

Author: denshe

Source: slideshare.net

Biological Database Systems

Denis Shestakov, University of Turku/Tampere

Course Information

Course structure:

Lectures: approx. 12 (plus today's introductory lecture and a review lecture at the end of the course)

Project work: details will be given next time

Exam: easy to pass if the project is done

URL:

Course Information

Dates:

Period 2: 27.11, 4.12, 11.12

Period 3: 10 meetings on Mondays/Wednesdays

Contact info:

Email:

ICT, B6019: 15-18 on Tuesdays

Course Information: Literature

Slides

References at the end of the slides

Books:

Bioinformatics: Managing Scientific Data by Lacroix & Critchlow, Morgan Kaufmann, 2003 ISBN-10: 155860829X

Database System Concepts, 5th edition by Silberschatz, Korth & Sudarshan, McGraw-Hill, 2005 ISBN-10: 0072958863

Articles:

Biological database design and implementation by Birney & Clamp (the Ensembl project), Briefings in Bioinformatics, 5(1):31-38, 2004

Biological Database Systems

1.1. Course Content

1.2. Course Objectives

1.3. Database and DBMS

1.4. Biological Databases

Course content: main topics

Database concepts, database design process

Relational data model

Introduction to SQL

XML and XML-based databases

Data structures for biological data: storage and querying

Model organism databases

Course content: main topics

LIMS, BioPostgres

Analysis workflows, web services

Integration of biological data

Integration of biological data: an example of an integration system

Research issues in scientific databases

* Project discussion, exam preparation

Course focus

Database issues:

Biology-specific

Representation of biological data

Design of biological databases

NOT about:

Usage of existing databases

Accessing/retrieving data from bio-databases

Course goal

Give basic knowledge of biological* database design

* for molecular biology

Do you need to know that?

Work in “wet” laboratory:

One bioinformatician and many biologists

Likely to be the IT guru for the others

Expect to answer IT-related questions

Work in bioinformatics lab:

Many bioinformaticians

The group may maintain several databases

Basics are helpful

Create/maintain biological databases

Start learning!

Ask for more information

Database? From the Merriam-Webster dictionary (http://www.merriam-webster.com/dictionary/database)

Database?

A collection of data:

structured

searchable (i.e., indexable)

updated

cross-referenced

Objective:

Transform "meaningless" raw data into useful information that can be accessed and analysed as effectively as possible

Database Management System (DBMS):

software designed to manage databases (access, insert, delete, update, etc.)
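To make "structured", "searchable (indexable)" and "cross-referenced" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The tables `gene` and `protein`, their columns, and the values (gene symbol `abcA`, identifiers `G1`/`P1`) are hypothetical examples chosen for illustration, not part of any real biological database.

```python
import sqlite3

# In-memory database; all table names, columns and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce cross-references

# Structured: each table has a fixed set of typed columns.
conn.execute("""
    CREATE TABLE gene (
        gene_id  TEXT PRIMARY KEY,
        symbol   TEXT NOT NULL,
        organism TEXT NOT NULL
    )""")

# Cross-referenced: every protein points to the gene that encodes it.
conn.execute("""
    CREATE TABLE protein (
        protein_id TEXT PRIMARY KEY,
        gene_id    TEXT NOT NULL REFERENCES gene(gene_id),
        length     INTEGER
    )""")

# Searchable: an index speeds up lookups by gene symbol.
conn.execute("CREATE INDEX idx_gene_symbol ON gene(symbol)")

conn.execute("INSERT INTO gene VALUES ('G1', 'abcA', 'E. coli')")
conn.execute("INSERT INTO protein VALUES ('P1', 'G1', 312)")

# Follow the cross-reference from a gene symbol to its protein(s).
rows = conn.execute("""
    SELECT g.symbol, p.protein_id, p.length
    FROM gene g JOIN protein p ON p.gene_id = g.gene_id
    WHERE g.symbol = ?""", ("abcA",)).fetchall()
print(rows)  # [('abcA', 'P1', 312)]

# A protein referring to a non-existent gene is rejected by the DBMS.
rejected = False
try:
    conn.execute("INSERT INTO protein VALUES ('P2', 'G_missing', 100)")
except sqlite3.IntegrityError:
    rejected = True
```

The foreign-key constraint is what turns a loose pile of records into a cross-referenced collection: the DBMS, not the user, guarantees that every reference points to an existing entry.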

DBMS

(Figure: USERS work with the Database through the DBMS operations Store, Extract and Modify)

A set of tools that:

Store

Extract

Modify
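The three DBMS operations can be sketched with Python's standard `sqlite3` module. This is an illustration under assumptions: the `sequence` table and the accession numbers `X00001`/`X00002` are hypothetical, and SQLite simply stands in for whatever DBMS a real project would use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequence (accession TEXT PRIMARY KEY, seq TEXT NOT NULL)")

# Store: insert new records (parameterised queries avoid SQL injection).
conn.executemany(
    "INSERT INTO sequence (accession, seq) VALUES (?, ?)",
    [("X00001", "ATGGCC"), ("X00002", "ATGTTTGA")],
)

# Extract: query the records that match a condition.
long_ones = conn.execute(
    "SELECT accession FROM sequence WHERE length(seq) > ? ORDER BY accession", (6,)
).fetchall()

# Modify: update one record and delete another.
conn.execute("UPDATE sequence SET seq = ? WHERE accession = ?", ("ATGGCCTAA", "X00001"))
conn.execute("DELETE FROM sequence WHERE accession = ?", ("X00002",))
conn.commit()

remaining = conn.execute("SELECT accession, seq FROM sequence").fetchall()
print(long_ones)  # [('X00002',)]
print(remaining)  # [('X00001', 'ATGGCCTAA')]
```

The user only states *what* to store, extract or modify; how the data is laid out on disk and found again is left to the DBMS.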

Biological Databases?

Explosive growth in biological data

E.g., a tremendous increase in nucleotide sequences (the first surge in data followed the development of the polymerase chain reaction (PCR) technique in 1983)

1980: 80 genes fully sequenced

…

Biological Databases?

EMBL Database Growth:

Total nucleotides (Nov 07: 188,490,792,445)

Number of entries (Nov 07: 106,144,026)
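To get a feel for the scale, the two November 2007 figures can be combined into an average entry length. This is simple arithmetic on the quoted numbers, not a statistic published by EMBL, and the mean hides the very wide spread of entry sizes.

```python
# EMBL figures for November 2007, as quoted on the slide.
total_nucleotides = 188_490_792_445
number_of_entries = 106_144_026

# Mean entry length in nucleotides (a rough indicator only).
mean_length = total_nucleotides / number_of_entries
print(f"about {mean_length:.0f} nucleotides per entry on average")
```

This gives roughly 1,800 nucleotides per entry: the database is large because of the sheer number of entries, not because individual entries are long.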

Biological Databases?

Data (genomic sequences, 3D structures, 2D gel analyses, microarrays, …) are submitted directly to databases

Databases are essential tools for biological research, as important as reading the relevant literature

Biological Databases: History

1965

Margaret Dayhoff et al. publish "Atlas of Protein Sequence and Structure"

1982

EMBL initiates DNA sequence databases, followed within a year by GenBank and in 1984 by the DNA Database of Japan

1988

EMBL/GenBank/DDBJ agree on common format for data elements

Biological Databases: some statistics

More than 1000 different databases

968 databases are listed in "The Molecular Biology Database Collection: 2007 update" by Galperin, Nucleic Acids Research, 2007, Vol. 35, Database issue, D3-D4

Metabase: a database of biological databases, http://biodatabase.org/index.php/Main_Page

Database sizes: <100kB to >100GB (EMBL >500GB)

DNA: >100GB

Protein: 1GB

3D structure: 5GB

Update frequency: daily to annually

Freely accessible (as a rule)

Some databases in the field of molecular biology

AATDB, AceDb, ACUTS, ADB, AFDB, AGIS, AMSdb, ARR, AsDb, BBDB, BCGD, Beanref, BioImage, BioMagResBank, BIOMDB, BLOCKS, BovGBASE, BOVMAP, BSORF, BTKbase, CANSITE, CarbBank, CARBHYD, CATH, CAZY, CCDC, CD40Lbase, CGAP, ChickGBASE, Colibri, COPE, CottonDB, CSNDB, CUTG, CyanoBase, dbCFC, dbEST, dbSTS, DDBJ, DGP, DictyDb, Dicty_cDB, DIP, DOGS, DOMO, DPD, DPInteract, ECDC, ECGC, ECO2DBASE, EcoCyc, EcoGene, EMBL, EMD db, ENZYME, EPD, EpoDB, ESTHER, FlyBase, FlyView, GCRDB, GDB, GENATLAS, GenBank, GeneCards, Genline, GenLink, GENOTK, GenProtEC, GIFTS, GPCRDB, GRAP, GRBase, gRNAsdb, GRR, GSDB, HAEMB, HAMSTERS, HEART-2DPAGE, HEXAdb, HGMD, HIDB, HIDC, HIVdb, HotMolecBase, HOVERGEN, HPDB, HSC-2DPAGE, ICN, ICTVDB, IL2RGbase, IMGT, Kabat, KDNA, KEGG, Klotho, LGIC, MAD, MaizeDb, MDB, Medline, Mendel, MEROPS, MGDB, MGI, MHCPEP, Micado, MitoDat, MITOMAP, MJDB, MmtDB, Mol-R-Us, MPDB, MRR, MutBase, MycDB, NDB, NRSub, O-GlycBase, OMIA, OMIM, OPD, ORDB, OWL, PAHdb, PatBase, PDB, PDD, Pfam, PhosphoBase, PigBASE, PIR, PKR, PMD, PPDB, PRESAGE, PRINTS, ProDom, Prolysis, PROSITE, PROTOMAP, RatMAP, RDP, REBASE, RGP, SBASE, SCOP, SeqAnalRef, SGD, SGP, SheepMap, Soybase, SPAD, SRNA db, SRPDB, STACK, StyGene, Sub2D, SubtiList, SWISS-2DPAGE, SWISS-3DIMAGE, SWISS-MODEL Repository, SWISS-PROT, TelDB, TGN, tmRDB, TOPS, TRANSFAC, TRR, UniGene, URNADB, V BASE, VDRR, VectorDB, WDCM, WIT, WormPep, YEPD, YPD, YPM, etc.

Find more at http://biodatabase.org

Categories of Biological Databases

Nucleotide sequences

Genomics

Mutation/polymorphism

Protein sequences

Protein domain/family

Proteomics (2D gel, MS)

Categories of Biological Databases

Microarray

Organism-specific

3D structure

Metabolism

Bibliography

Others


Biological Databases: special features

Autonomous: many independent maintainers

Heterogeneous data formats: e.g., different formats are used for the same data elements

Dynamic: frequent and continuous changes in data content (and, more importantly, in the data schema)

Broad domain knowledge

Workflow-oriented: databases + rich set of analysis tools

Information integration is essential: aggregate data from several databases
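Heterogeneity in practice: the same sequence may arrive as a FASTA record from one source and as an EMBL-style flat-file entry from another. The sketch below normalises both into one common `(identifier, sequence)` form. It is a hypothetical illustration: the record `X00001` is invented, and `parse_embl_like` reads only the `ID` and `SQ` blocks of a simplified EMBL-like layout; it is not a complete parser for the real EMBL format.

```python
import re
from typing import Tuple

# The same (hypothetical) record in two formats.
FASTA = """>X00001 hypothetical example
ATGGCCAAGT
TTGA
"""

EMBL_LIKE = """ID   X00001; SV 1; linear; DNA; STD; UNC; 14 BP.
SQ   Sequence 14 BP;
     atggccaagt ttga                                                   14
//
"""

def parse_fasta(text: str) -> Tuple[str, str]:
    """Return (identifier, sequence) from a single-record FASTA string."""
    lines = text.strip().splitlines()
    identifier = lines[0][1:].split()[0]  # first word after '>'
    return identifier, "".join(lines[1:]).upper()

def parse_embl_like(text: str) -> Tuple[str, str]:
    """Simplified reader for an EMBL-style record (ID and SQ blocks only)."""
    identifier, seq_parts, in_seq = "", [], False
    for line in text.splitlines():
        if line.startswith("ID"):
            identifier = line[5:].split(";")[0].strip()
        elif line.startswith("SQ"):
            in_seq = True
        elif line.startswith("//"):
            break
        elif in_seq:
            # Keep letters only: drops blanks and the trailing position counter.
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line))
    return identifier, "".join(seq_parts).upper()

print(parse_fasta(FASTA) == parse_embl_like(EMBL_LIKE))  # True
```

Every additional source format needs its own reader of this kind, which is one reason why aggregating data from several autonomous databases is costly.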

Biological Databases: integration

(Figure taken from Bioinformatics: Managing Scientific Data by Lacroix & Critchlow, p. 20)
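The figure itself is not reproduced here. As a rough illustration of the idea, the sketch below shows one simple style of integration, a mediator-like function that queries two independent sources on demand and merges their answers by following a cross-reference. Both sources, their field names and their contents are hypothetical, and this is only one of several integration approaches (data warehousing being another).

```python
from typing import Dict, List

# Two hypothetical, independently maintained sources with different schemas.
SEQUENCE_DB: Dict[str, Dict[str, object]] = {  # keyed by nucleotide accession
    "X00001": {"organism": "Escherichia coli", "length_bp": 14},
}
PROTEIN_DB: List[Dict[str, str]] = [  # records cross-reference a nucleotide accession
    {"protein_ac": "P0001", "xref_nucleotide": "X00001", "name": "hypothetical protein"},
]

def integrate(accession: str) -> Dict[str, object]:
    """Aggregate what both sources know about one nucleotide accession."""
    view: Dict[str, object] = {"accession": accession}
    seq_entry = SEQUENCE_DB.get(accession)
    if seq_entry is not None:
        view.update(seq_entry)
    # Follow cross-references from the protein source back to the accession.
    view["proteins"] = [
        rec["protein_ac"] for rec in PROTEIN_DB if rec["xref_nucleotide"] == accession
    ]
    return view

print(integrate("X00001"))
# {'accession': 'X00001', 'organism': 'Escherichia coli', 'length_bp': 14, 'proteins': ['P0001']}
```

Because the sources stay autonomous, the merged view is only as current and consistent as the cross-references each maintainer keeps.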
