Indy2007 Yunlong

Information about Indy2007 Yunlong
Education

Published on January 21, 2008

Author: Rinald

Source: authorstream.com

High-throughput Technologies in Genomic Studies:
2007 Indy Bioinformatics Conference Pre-meeting Workshop
Speaker: Yunlong Liu
Division of Biostatistics
The Center for Computational Biology and Bioinformatics
Center for Medical Genomics
Indiana University School of Medicine

Outline:
DNA -> (1. transcription) -> pre-mRNA -> (2. RNA processing) -> mRNA -> (3. translation) -> Protein
Figures from: 1. http://web.uconn.edu/mcb201, 2. Cheng, N Engl J Med, 2005
Topics:
  Expression microarray
  ChIP-on-chip analysis
  Promoter array, CpG island array, Methylation array…
  Tiling array
  High throughput sequencing
  microRNA microarray

Fishing expeditions vs. hypothesis-driven:
“It (the human genome project) was no more than a big fishing expedition, a mindless factory project that no scientists in their right minds would join.”
Data- and technology-driven studies are not alternatives to hypothesis-driven studies, but are complementary and iterative partners with them.

Hypothesis/data-driven research:
Kitano, Science 2002: Vol. 295, no. 5560, pp. 1662-1664

Microarray technology (expression array):
  Microarray platforms
  Microarray data analysis

Biological question:
Measure global gene expression patterns on a genome-wide scale:
  Identify differentially expressed genes before and after a biological perturbation;
  Compare gene expression profiles of two or more samples, such as normal tissue vs. cancerous tissue.

Two-color array vs. one-color array:
(figure slide)

Microarray data analysis:
Samples: control; experiment/treatment
Microarray experiment:
  Biological sample selection
  Biological replicates, # of replicates
  Sample pooling
  Array selection: Affymetrix arrays, cDNA arrays, …
Statistical analysis:
  Fold change
  p-value
  FDR (multiple hypothesis testing)
  …
Results interpretation:
  Gene ontology/KEGG pathway
  Network analysis (Ingenuity …)
  Gene set enrichment analysis (GSEA)
  MotifModeler
  …

Gene Set Enrichment Analysis:
Problems with single-gene analysis:
  No individual gene meets the threshold for statistical significance;
  Too many genes without an underlying biological theme;
  Reproducibility of the list of genes;
  Single-gene analysis may miss important effects on pathways;
  Determining the cut-off points for statistical criteria is difficult. Is p = 0.01 good? How about p = 0.011?
Gene sets: groups of genes that share common biological function, chromosomal location, or regulation.
Subramanian et al., PNAS, 2005
Developed by the Broad Institute of MIT

Cluster analysis:
Goal:
  Look for biomarkers for a specific disease;
  Group genes with similar function;
  Group samples with similar gene expression profiles;
  Co-expression of genes of known function with novel genes may provide leads to the functions of the unknowns.
Software:
  Bioconductor
  Matlab
  Cluster 3.0 (http://www.geo.vu.nl/~huik/)
Figure: Eisen et al. PNAS 1998. Clustered display of data from time course of serum stimulation of primary human fibroblasts. (A) cholesterol biosynthesis, (B) the cell cycle, (C) the immediate-early response, (D) signaling and angiogenesis, and (E) wound healing and tissue remodeling.

Transcriptional mechanisms:
Motif finding tools:
  Searching for over-represented motifs in the promoters of co-regulated genes: MEME, Consensus, MDScan, …
  Review and comparison in Tompa et al. Nature Biotech. 23, 137-144 (2005)
Evolutionarily conserved motifs
Expression patterns: REDUCE, MotifModeler, …

MotifModeler (www.motifmodeler.org):
Aim: Effect of alcohol exposure on fetal alcohol syndrome (FAS)-related abnormalities during critical periods of brain development.
Model system: Whole mouse embryos, cultured with ethanol for 44 hours beginning on gestational day 8.25.
Genes: 269 (94 up-regulated + 175 down-regulated), p < 0.05, FC > ±1.5
Figure from Zhou et al. Journal of Molecular Neuroscience, 2004
Through collaboration with Dr. Feng Zhou, Department of Anatomy and Cell Biology, IUSM
Questions:
  Which set of transcription factors is responsible for this change?
  Do microRNAs play a role in the alcohol-induced alteration of gene expression? If so, which ones?

MotifModeler:
Comparing with 741 PWMs in TRANSFAC
Comparing with 375 documented mouse microRNAs in the microRNA registry:
  microRNAs that contain a perfect match with a predicted 7-bp motif
  microRNAs that contain matches with two predicted 7-bp motifs (one G-U pair allowed)
Wang G. et al. Biocomp, 2007

MotifModeler:
Not likely to be caused by chance (1/3 of genes up-regulated and 2/3 down-regulated)
Alcohol disrupts the function of transcription factors to initiate transcription
Alcohol disrupts the function of microRNAs to degrade mRNA
Further hypothesis: are the up-regulated genes the ones that are expressed in the previous developmental stage and should be shut down at E8.25, but were not properly shut down because microRNA function is disabled under alcohol treatment?
Figure: Estimated functions of predicted motifs
Wang G. et al. Biocomp, 2007

Microarray data analysis:
Samples: control; experiment/treatment
Microarray experiment:
  Biological sample selection
  Biological replicates, # of replicates
  Sample pooling
  Array selection: Affymetrix arrays, cDNA arrays, …
Statistical analysis:
  Fold change
  p-value
  FDR (multiple hypothesis testing)
  …
Results interpretation:
  Gene ontology/KEGG pathway
  Network analysis (Ingenuity …)
  Gene set enrichment analysis (GSEA)
  MotifModeler
  …

ChIP-on-chip:
Chromatin Immuno-Precipitation (ChIP) on microarray technology (chip)
Biological question:
  Genome-wide identification of binding sites of DNA-binding proteins in vivo
  DIP-chip: DNA immunoprecipitation microarray (in vitro)
Applications:
  Transcription factors and other regulatory proteins
  Histone modification
  DNA methylation
Similar assays can be used to study RNA-binding proteins:
  RIP-chip (RNA-immunoprecipitation microarray)
  CLIP-chip (Cross-linked immunoprecipitation microarray)

ChIP-on-chip (wet lab portion):
Figure from Wikipedia.com

ChIP-on-chip (dry lab):
Two important components in dry lab:
Signal extraction (statistics): identify the enriched genomic regions
  Mann–Whitney U test
  Welch t test, following Hidden Markov Model
  TileMap (Stanford Univ)
  TiMAT (Lawrence Berkeley National Lab)
  MAT (Model-based Analysis of Tiling array, Harvard Univ)
Informatics extraction
Figure from Wikipedia.com

ChIP-on-chip (one example):
Reference: Carroll et al. Genome-wide analysis of estrogen receptor binding sites, Nature Genetics (2006)
Biological question: Genome-wide analysis of estrogen receptor binding sites in breast cancer cell lines (MCF7 cells)
Estrogen (E2: estradiol) affects gene expression in one of three ways:
  Direct binding on promoter*
  Indirect binding on promoter*
  Signaling pathways
What are the target genes?
Figure labels: Bioinformatics; Biochemistry

ChIP-on-chip (one example):
Reference: Carroll et al. Genome-wide analysis of estrogen receptor binding sites, Nature Genetics (2006)

Array platforms for ChIP-on-chip:
Whole genome tiling array:
  7 arrays for Affymetrix (2.0), resolution: 25 bp every 35 bp
  38 arrays for NimbleGen, resolution: 50 bp with spacing ~100 bp
Promoter array
CpG island array
Figure: NimbleGen workflow

ChIP-on-chip analysis:
Signal extraction (statistics)
Informatics extraction
No good open source software is available yet.

Pyro-sequencing technology:
Massively parallel pyro-sequencing:
  Sequences more than 20 million bases per 4.5-hour instrument run.
  One person can sequence a typical bacterial genome in days, without cloning and colony picking.
Short DNA fragments:
  Each fragment is ~100 bp (GS20-454); the newer version is ~250 bp
  Genome assembly is a big deal
  Study protein-DNA/RNA interactions as a follow-up to immunoprecipitation-based assays
Posters to look at: 28, 36, 59

ChIP-pyrosequencing:
Comparing with tiling array technology:
  Detection is not restricted by array platform
  Potentially identify signals in the repetitive regions:
    32% of p53 have ERV1
    19% of Oct4-Sox2 have ERVK
    18% of ER have MIR repeats

One example of studying RNA-binding protein:
LLLLLLLNNNNNN……NNNNNNLLLLLLL
Length with linkers
152,952 fragments

RNA fragments:
HMGN2 (high-mobility group nucleosomal binding domain)
Chromosome 1
17 fragments
Unpublished results, PI: Sanford

Improve the detection resolution?:
RNase digestion is sequence specific
Can we improve detection resolution under such circumstances?
Figure labels: amplicons; RNA transcripts
Wang X. et al. Biocomp, 2007

Yes, we have hope!:
By using the likelihood of RNase digestion within the local genomic region
All the detected amplicons should:
  Encompass the protein binding site
  Be 50-70 nt in length
Wang X. et al. Biocomp, 2007

Improve the detection resolution?:
Wang X. et al. Biocomp, 2007

Pyro-sequencing technology:
Signal extraction (statistics)
Informatics extraction
No good algorithm or open source software is available yet.

microRNA microarray:
Aim: Measure global microRNA expression profiles.
Platforms:
  Agilent (human microRNA)
  Illumina
  Invitrogen (human, plus additional human predicted miRNAs, mouse, rat, Drosophila, C. elegans, and zebrafish)
  miRCURY™ LNA Array
  Customized arrays
  pyrosequencing (Nature Methods - 4, 2007)

microRNA:
Small non-coding RNA (20~25 nt)
Endogenous
Accounts for ~1% of genome
326 human microRNAs documented; 234 are experimentally evaluated (microRNA registry 7.1)
Function: silencing genes
  Increase mRNA degradation
  Inhibit translation
  microRNA-dependent mRNA localization
Combinatorial regulation
Figure from Cheng, N Engl J Med, 2005

Hierarchical clustering of miRNA expression:
Lu et al. MicroRNA expression profiles classify human cancers. Nature 2005
Conclusion: microRNA microarrays are more effective in cancer classification than mRNA microarrays containing more than 16,000 protein-coding genes.

microRNA microarray:
Bioinformatics analysis of microRNA microarray data:
  Signal extraction (statistics)
  Informatics extraction
Combine multiple domain information including:
  Expression array data
  Tiling array data
  Protein-protein interaction information
  …

Outline:
DNA -> (1. transcription) -> pre-mRNA -> (2. RNA processing) -> mRNA -> (3. translation) -> Protein
Figures from: 1. http://web.uconn.edu/mcb201, 2. Cheng, N Engl J Med, 2005
Topics:
  Expression microarray
  ChIP-on-chip analysis
  Promoter array, CpG island array, Methylation array…
  Tiling array
  High throughput sequencing
  microRNA microarray

Interesting stuff I haven’t mentioned:
Copy number variation
Exon arrays to study alternative splicing
Figure label: Chr 1

Public data set:
Figures from wikipedia.com

Hypothesis/data-driven research:
Kitano, Science 2002: Vol. 295, no. 5560, pp. 1662-1664
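
Worked sketch for the "Statistical analysis" step (fold change, p-value, FDR):
The slides list fold change, p-value and FDR as the statistical filters for expression arrays but do not name a specific FDR procedure. The sketch below assumes the Benjamini-Hochberg procedure, which is one common choice, and may not be what the speaker used. The function names (`benjamini_hochberg`, `select_genes`) and the toy data are hypothetical. The default cutoffs reuse the 1.5-fold and 0.05 thresholds from the MotifModeler slide, applied here to the FDR-adjusted value rather than the raw p-value.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_genes(genes, log2_fc, pvalues, fc_cutoff=1.5, fdr_cutoff=0.05):
    """Keep genes passing both a fold-change and an FDR threshold.

    fc_cutoff is a linear fold change (assumed default 1.5); log2_fc holds
    log2 ratios, so up- and down-regulation are treated symmetrically.
    """
    qvalues = benjamini_hochberg(pvalues)
    min_abs_log2 = math.log2(fc_cutoff)
    return [
        gene
        for gene, lfc, q in zip(genes, log2_fc, qvalues)
        if abs(lfc) >= min_abs_log2 and q <= fdr_cutoff
    ]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    genes = ["A", "B", "C", "D"]
    log2_fc = [1.0, 0.2, -0.9, -0.1]
    pvalues = [0.01, 0.04, 0.03, 0.005]
    print(select_genes(genes, log2_fc, pvalues))
```

Adjusting p-values before thresholding also addresses the cut-off question raised on the Gene Set Enrichment Analysis slide only in part: the FDR cutoff is still a convention, which is one motivation for gene-set-level methods.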
