October 20, 2008

AI & Molecular Biology: A Growing Success Story

How has AI been successful in molecular biology?
  Wide, daily use of AI-based tools by biologists
  Thriving AI/MolBio community
    Intelligent Systems for Molecular Biology (ISMB) conference now 11 years old, with >1,000 attendees
  Significant scientific publications, e.g.
  Successful businesses based on AI techniques
    http://www.medicalscientists.com

Medical Scientists, Inc.
  Predictive modeling in health care cost domain
  Patented multistrategy constructive induction algorithm
  Privately held and profitable

MolBio even creeping into mainstream AI
  KDD Cup competition last two years involved learning in molecular biology domains
  TREC launched genomics track this year
  AI Magazine special issue on MolBio in Spring ’04

Why success in biology?
  Big open questions, e.g.
    Drug design, engineering novel organisms, evolution
  Rich sources of new information about life, e.g.
    Genome sequencing
    Expression array chips
  No common sense issues
    Everything anyone knows about MolBio is written down
  Significant community investment
    Biologists built a gene ontology, construct “curated” knowledge bases, and are eager consumers of software

The Irony of AI & MolBio
  Human understanding of the overwhelming complexity of our own genome will require partnership with biognostic machines

What is a biognostic machine?
From the Greek(life) and(knowing) Two kinds of biognostic machines: Instruments that produce data about a living things in molecular detail and with genomic breadth Bioinformatics systems that bring to bear existing knowledge in the computational analysis of data Biognostic instruments : Biognostic instruments High throughput SNP genotyping automation Finds millions of tiny genetic differences among people Gene chips read out the expression of each gene in a tissue sample 10,000+ genes/chip anddozens of chips per study Drinking from a firehose : Drinking from a firehose 150 published genomes, 19 Eukaryotes (human, mouse, wheat, rice, fruit fly, etc.); 798 ongoing projects (243 Eukaryotes) 12,661,480 articles in MedLine; 12,824 new in the last week; 372 journals provide free full text (>100,000 full text articles) What AI technologiesare used in bioinformatics? : What AI technologiesare used in bioinformatics? Some of the key AI technologies that have been broadly adopted in computational biology: Hidden Markov Models Ontologies and related knowledge-based computation Clustering, e.g. Self-Organizing Maps Supervised learning, e.g. Support Vector Machines Information extraction / natural language parsing HMMs in molecular biology : HMMs in molecular biology HMMs (trained with E/M) are the main mechanism used to represent patterns in DNA and protein sequences The Gene Ontology : The Gene Ontology Actively developed, community curated ontology http://geneontology.org About 12,000 defined concepts, in a DAG with two link types (part-of, is-a) under three roots: Cellular component Biological process Molecular function. Used as annotations for genes (>80,000 so far), HMMs of domain patterns, etc. 

A closer look at a biognostic instrument
  Gene expression arrays (“gene chips”)
    Produce 10,000+ measurements/chip, generally 10s-100s of chips/experiment
  Huge computational challenges
    Many novel statistical and data management issues
    Interpretation of results can be overwhelming: must transcend “one gene at a time” methods
    Linking data to prior knowledge is crucial

What is gene expression?
  Not all of the genes in a genome are used in all circumstances
  In order for a gene to play a role in a cell, it must be expressed
  A gene is expressed when the protein it encodes is synthesized
  Transcription of DNA to mRNA is the first step in protein production
  Measuring abundance of mRNA assays the level of gene expression

Expression is central because...
  Differentiation: All cells in a body have the same genome. Expression is what differentiates, e.g., brain cells from liver cells.
  Physiology: Cells do their business (dividing, sending signals, digesting, etc.) largely via changes in expression
  Response to stimuli: Environmental changes (like drugs or disease) often cause changes in expression
  Disease markers and drug targets: Changes in expression associated with disease can be diagnostic markers and/or suggest novel pharmaceutical approaches

Laboratory robotics, too…
  One form of expression array places controlled quantities (and shapes) of thousands of different DNA sequences on glass slides

Statistical challenges!
  Many basic tools for analysis of expression data (normalization, statistical tests, visualization, clustering) are open source in the R language; see http://bioconductor.org
  Novel approaches still needed, e.g. for multiple testing corrections, finding gene-gene interaction terms, etc.
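
To illustrate why multiple testing matters when thousands of genes are tested at once, here is a minimal Python sketch of one widely used correction, the Benjamini-Hochberg false discovery rate procedure. It is offered only as an example of the kind of correction meant; the slides do not name a specific method, and the function name and sample p-values are assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values from a differential expression test.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

With 10,000+ genes per chip, an unadjusted threshold of 0.05 would be expected to flag hundreds of genes by chance alone, so some adjustment of this kind is usually applied before interpreting a gene list.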

Clustering approaches
  Gene expression changes are coordinated, so levels should cluster meaningfully, but…
    Clusters change with situation (biclustering)
    Expression levels have complex correlational structure
    Distance measures unknown
  Approaches include
    SOMs (Slonim)
    PRMs (Koller & Friedman)
    Trajectory clustering

Discrimination tasks
  Given expression array results from e.g. tumors that were successfully treated vs. not, develop a predictive model
  High dimensionality, interactions, but
    Feature selection
    Support vector machines
    Interesting kernels!
  Meet FDA regulations?

Understanding expression changes in context
  Long lists of differentially expressed genes are difficult to interpret meaningfully
  Much knowledge about structure, function and interactions of genes
    Hundreds of public databases: http://nar.oupjournals.org/
    Best information in the literature
  Key computational challenge: Bring prior knowledge to bear on understanding expression (and other high-throughput) data

Data integration
  Just tracking down all of the information about a list of genes isn’t easy:
    Dozens of general and hundreds of specialized data sources available (many public & free)
    No universal IDs; sometimes heuristic key matching is necessary to link data sources
    Inference is often required (e.g. about the applicability of information from a different species)
    Rapid change as new information becomes available
    Errors and inconsistencies abound
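
As a rough illustration of heuristic key matching between data sources that share no universal ID, the Python sketch below links records whose gene names or aliases agree after a simple normalization. The record IDs, the two sources, and the normalization rule are all hypothetical; real integration needs far more careful handling of ambiguous and species-specific names.

```python
import re


def normalize_key(name):
    """Lowercase and drop punctuation/whitespace so 'IL-2' and 'il2' compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def link_records(source_a, source_b):
    """Pair each record in source_a with source_b records sharing a normalized name."""
    index = {}
    for rec_id, names in source_b.items():
        for name in names:
            index.setdefault(normalize_key(name), set()).add(rec_id)
    links = {}
    for rec_id, names in source_a.items():
        matches = set()
        for name in names:
            matches |= index.get(normalize_key(name), set())
        links[rec_id] = sorted(matches)
    return links


# Hypothetical records: each maps a source-specific ID to known names and aliases.
SOURCE_A = {"A1": ["TP53", "p53"], "A2": ["IL-2"], "A3": ["BRCA1"]}
SOURCE_B = {"B7": ["Tp53"], "B8": ["il2", "interleukin 2"], "B9": ["MYC"]}

print(link_records(SOURCE_A, SOURCE_B))
```

A heuristic like this trades precision for coverage: it recovers links that exact matching misses, but it can also merge distinct genes with similar names, which is one source of the errors and inconsistencies noted above.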

Semantic interpretation tools
  Mapping gene lists to the Gene Ontology…

Literature-based approaches
  Many active areas of research:
    Information extraction to transform the biomedical literature into more computationally useful form
    Information retrieval and presentation: making large collections of relevant documents comprehensible
    Document meta-analysis: finding potential linkages among biomolecules from patterns of use in documents
  Great resources:
    PubMed & NLM indexers (e.g. GeneRIFs)
    Growing full text repositories

Meta-analysis for gene-gene interactions

Towards The Biological Knowledge-base
  Inferential potential of a unified knowledge-base transcends human ability
    Even heroic bioscientists can’t keep up with the flood of information as disciplinary boundaries break down
  Integrated database search isn’t enough
    Semantic issues in integration
    Meta-analysis
    Making a compelling story from disparate bits of evidence
  A grand challenge for AI

Minsky, AI & Common Sense
  Marvin Minsky in the August ’03 Wired on “Why AI is brain dead”
    “There is no computer that has common sense. We're only getting the kinds of things that are capable of making an airline reservation.”
    “The elderly segment of the population is growing to the point where there won't be enough doctors, nurses, and nurses' aides. We should be working to get robots to pick up the slack.”
  I think Marvin has the right diagnosis, but the wrong prescription…

But AI isn’t psychology
  AI should be about general principles of intelligence; people are just one example
  Turing test: Is this program indistinguishable from a person?
    Human idiosyncrasies as the sine qua non of intelligence?
  My alternative approach: Is this a mind worth wanting to know?
    Also an approach to the “other minds” problem…

Pharmacology as a test of intelligence?
  Making a contribution to inventing a new drug as a test for computational theories of intelligence
    Lots of existing, declarative background knowledge
    Clear metric for success: FDA approval
    Credit assignment exists (but note Hollywood accounting)
    $$$ and improvements in human health riding on it
  Reasonable incremental tasks:
    Passing graduate pharmacology exams…
    Making contributions to subtasks

Pharmacology 101
  Find a target: a naturally occurring molecule to be enhanced or inhibited
  Find a lead: a “drug-like” molecule that interacts specifically with the target
  Optimize: find a compound in the same family as the lead that is specific and effective enough to be a drug
  ADMET: absorption, distribution, metabolism, excretion, and toxicity

Biognosticopoeia
  Our first steps…
    Integrate human-curated databases
    Exploit $10Ms + years of effort
    Requires dynamic and heuristic approaches
    Extend GO to many other relationships
    IE from literature using DMAP
    Explicit representation of procedural computation tasks
  IBM p690 w/ 8x Power4 processors & 64GB RAM “Lisp Machine”

Come visit!
  The UCHSC Center for Computational Pharmacology, http://compbio.uchsc.edu
  International Society for Computational Biology, http://iscb.org
  Medical Scientists, Inc., http://medicalscientists.com
  Larry Hunter, Larry.Hunter@uchsc.edu

