ICML / KDD 2003


Published on September 27, 2007

Author: Talya

Source: authorstream.com

Statistical Learning from Relational Data
Daphne Koller, Stanford University
Joint work with many, many people.

Relational Data is Everywhere
- The web: webpages (and the entities they represent), hyperlinks
- Social networks: people, institutions, friendship links
- Biological data: genes, proteins, interactions, regulation
- Bibliometrics: papers, authors, journals, citations
- Corporate databases: customers, products, transactions

Relational Data is Different
- Data instances are not independent: topics of linked webpages are correlated
- Data instances are not identically distributed: heterogeneous instances (papers, authors)
- No IID assumption; this is a good thing

New Learning Tasks
- Collective classification of related instances: labeling an entire website of related webpages
- Relational clustering: finding coherent clusters in the genome
- Link prediction and classification: predicting when two people are likely to be friends
- Pattern detection in networks of related objects: finding groups (research groups, terrorist groups)

Probabilistic Models
- Uncertainty model: a space of "possible worlds" and a probability distribution over this space
- Worlds are often defined via a set of state variables (medical diagnosis: diseases, symptoms, findings, ...); each world is an assignment of values to the variables
- The number of worlds is exponential in the number of variables: 2^n worlds for n binary variables

Outline
- Relational Bayesian networks*
- Relational Markov networks
- Collective classification
- Relational clustering
(* with Avi Pfeffer, Nir Friedman, Lise Getoor)

Bayesian Networks
- Nodes = variables; edges = direct influence
- The graph structure encodes independence assumptions: Letter is conditionally independent of Intelligence given Grade
[Diagram: network over Difficulty, Intelligence, Grade, SAT, Letter, Job]

BN Semantics
- Compact and natural representation: if each node has at most k parents, the network needs about 2^k * n parameters rather than 2^n
- Parameters are natural and easy to elicit
- Conditional independencies in the BN structure + local probability models = full joint distribution over the domain

Reasoning Using BNs
- The full joint distribution specifies the answer to any query: P(variable | evidence about others)
[Diagram: querying Intelligence and Difficulty given evidence on Letter and SAT]

Bayesian Networks: Problem
- Bayesian networks use a propositional representation
- The real world has objects, related to each other
- These "instances" are not independent
[Diagram: Intelligence, Difficulty, and Grade instances (grades A, C) for several students and courses]

Relational Schema
- Specifies the types of objects in the domain, the attributes of each type of object, and the types of relations between objects
[Diagram: classes Professor (Teaching-Ability), Student (Intelligence), Course (Difficulty); relations Teach, In, Take]

St. Nordaf University
[Diagram: an example world. Prof. Smith and Prof. Jones teach courses; George and Jane are registered in Welcome to CS101; registrations carry Grade and Satisfaction attributes]

Relational Bayesian Networks
- Universals: probabilistic patterns hold for all objects in a class
- Locality: represent direct probabilistic dependencies; links define potential interactions
[K. & Pfeffer; Poole; Ngo & Haddawy]

RBN Semantics
- Ground model: the variables are the attributes of all objects; the dependencies are determined by the relational links and the template model
[Diagram: ground network over Prof. Smith, Prof. Jones, George, Jane, and Welcome to CS101]

The Web of Influence
[Diagram: evidence (easy/hard course, low/high intelligence) propagating through the ground network]

Likelihood Function
- Likelihood of a BN with shared parameters
- The joint likelihood is a product of likelihood terms, one for each attribute X.A and its family
- For each X.A, the likelihood function aggregates counts from all occurrences x.A in the world
[Friedman, Getoor, K., Pfeffer, 1999]

Likelihood Function: Multinomials
[Equations: sufficient statistics and log-likelihood for multinomial CPDs]

RBN Parameter Estimation
- MLE parameters from aggregated sufficient statistics
- Bayesian estimation: a prior for each attribute X.A; the posterior uses aggregated sufficient statistics

Learning RBN Structure
- Define the set of legal RBN structures: those with legal class dependency graphs
- Define a scoring function (Bayesian score): a product of family scores, one for each X.A, using aggregated sufficient statistics
- Search for a high-scoring legal structure
[Friedman, Getoor, K., Pfeffer, 1999]

Learning RBN Structure (continued)
- All operations are done at the class level: the dependency structure gives the parents of X.A, acyclicity is checked using the class dependency graph, and the score is computed at the class level
- Individual objects only contribute to sufficient statistics, which can be obtained efficiently using standard DB queries

Outline
- Relational Bayesian networks*
- Relational Markov networks†
- Collective classification
- Relational clustering
(* with Avi Pfeffer, Nir Friedman, Lise Getoor; † with Ben Taskar, Pieter Abbeel)

Why Undirected Models?
- Symmetric, non-causal interactions (e.g., the web: categories of linked pages are correlated); direct edges cannot be introduced because of cycles
- Patterns involving multiple entities (e.g., "triangle" patterns on the web); directed edges are not appropriate
- "Solution": impose an arbitrary direction; but it is not clear how to parameterize a CPD for variables involved in multiple interactions, and doing so is very difficult within a class-based parameterization
[Taskar, Abbeel, K., 2001]

Markov Networks
- A Markov network is an undirected graph over a set of variables V
- The graph is associated with a set of potentials φi; each potential is a factor over a subset Vi
- The variables in Vi must form a (sub)clique in the network
[Diagram: example network over Laura, Noah, Mary, James, Kyle]

Relational Markov Networks
- Universals: probabilistic patterns hold for all groups of objects
- Locality: represent local probabilistic dependencies; sets of links give the possible interactions

RMN Semantics
[Diagram: ground network over the Grade and Intelligence attributes of George, Jane, and Jill, linked through the Geo and CS study groups of Welcome to CS101]

Outline
- Relational Bayesian networks
- Relational Markov networks
- Collective classification*: discriminative training, web page classification, link prediction
- Relational clustering
(* with Ben Taskar, Carlos Guestrin, Ming Fai Wong, Pieter Abbeel)

Collective Classification
- Pipeline: model + structure + training data -> learning -> probabilistic relational model; new data + inference -> conclusions
- Example: train on one year of student intelligence, course difficulty, and grades; given only the grades in the following year, predict all students' intelligence
- Training world: features ω.x, labels ω.y*; new world: features ω'.x, labels ω'.y

Learning RMN Parameters
- Template potential φ, parameterized as a log-linear model

Max Likelihood Estimation
- Estimation: maximize over w the joint likelihood of the training world (ω.x, ω.y*)
- Classification: argmax over y
- But we don't care about the joint distribution P(ω.x, ω.y)

Web -> KB
[Craven et al.]

Web Classification Experiments
- WebKB dataset: four CS department websites; bag of words on each page; links between pages; anchor text for links
- Experimental setup: train on three universities, test on the fourth; repeated for all four combinations

Standard Classification
- Classify each page independently from the words on it (e.g., "professor", "department", "computer science", "machine learning", ...)
- Categories: faculty, course, project, student, other
- 4-fold CV: trained on three universities, tested on the fourth
- Discriminatively trained naïve Markov = logistic regression
[Chart: test-set error]

Power of Context
- A page saying only "working with Tom Mitchell ..." could belong to a professor, a student, or a post-doc; the linked pages provide the missing context

Collective Classification
- Classify all pages collectively, maximizing the joint label probability
[Chart: test-set error]
[Taskar, Abbeel, K., 2002]

More Complex Structure
[Diagrams: richer relational patterns among pages and page sections]

Collective Classification: Results
- Models compared: logistic, links, section, link+section
- 35.4% error reduction over logistic regression
[Chart: test-set error]
[Taskar, Abbeel, K., 2002]

Max Conditional Likelihood
- Estimation: maximize over w the conditional likelihood of labels given features
- Classification: argmax over y
- But we don't care about the conditional distribution P(ω.y | ω.x)

Max Margin Estimation
- What we really want: correct class labels
- Maximize the margin, scaled by the number of labeling mistakes in y
- A quadratic program, but with exponentially many constraints
[Taskar, Guestrin, K., 2003] (see also [Collins, 2002; Hoffman, 2003])

Max Margin Markov Networks
- Use the structure of the Markov network to give an equivalent formulation of the QP, exponential only in the tree width of the network
- Complexity = max-likelihood classification
- Can solve approximately in networks where the induced width is too large (analogous to loopy belief propagation)
- Can use kernel-based features: SVMs meet graphical models
[Taskar, Guestrin, K., 2003]

WebKB Revisited
- 16.1% relative reduction in error compared to conditional-likelihood RMNs

Predicting Relationships
- Even more interesting: relationships between objects
[Diagram: Tom Mitchell (Professor), Sean Slattery (Student), WebKB Project]

Predicting Relations
- Introduce an exists/type attribute for each potential link
- Learn a discriminative model for this attribute; collectively predict its value in the new world
- 72.9% error reduction over a flat model
[Diagram: From-Page and To-Page words and categories, link words, and the Exists/Type attribute of the Relation]
[Taskar, Wong, Abbeel, K., 2003]

Outline
- Relational Bayesian networks
- Relational Markov networks
- Collective classification
- Relational clustering: movie data*, biological data†
(* with Ben Taskar, Eran Segal; † with Eran Segal, Nir Friedman, Aviv Regev, Dana Pe'er, Haidong Wang, Micha Shapira, David Botstein)

Relational Clustering
- Pipeline: model + structure + unlabeled relational data -> learning -> probabilistic relational model
- Example: given only students' grades, cluster similar students (clustering of instances)

Learning with Missing Data: EM
- The EM algorithm applies essentially unchanged
- E-step: compute expected sufficient statistics, aggregated over all objects in a class
- M-step: ML (or MAP) parameter estimation
- Key difference: in general the hidden variables are not independent, so computing the expected sufficient statistics requires inference over the entire network
[Diagram: easy/hard and low/high assignments in the ground network]
[Dempster et al., 1977]

Movie Data
- Internet Movie Database, http://www.imdb.com

Discovering Hidden Types
- Learn a model with a hidden Type attribute for each object class, using EM
[Diagrams: discovered types]
[Taskar, Segal, K., 2001]

Biology 101: Gene Expression
- Cells express different subsets of their genes in different tissues and under different conditions
[Diagram: transcription of DNA, regulated by Swi5]

Gene Expression Microarrays
- Measure the mRNA level of all genes in one condition; hundreds of experiments; highly noisy
[Diagram: genes-by-experiments matrix of induced/repressed expression of gene i in experiment j]

Standard Analysis
- Cluster genes by similarity of their expression profiles
- Manually examine the clusters to understand what is common to the genes in each

General Approach
- Expression level is a function of gene properties and experiment properties: observed properties (gene sequence, array condition, ...) and hidden properties (gene cluster)
- Learn the model that best explains the data: the assignment to hidden variables (e.g., module assignment) and expression level as a function of the properties

Clustering as a PRM
[Diagram: Gene (ID, Cluster), Experiment, and Expression (Level)]

Modular Regulation
- Learn functional modules: clusters of genes that are similarly controlled
- Learn a control program for each module: expression as a function of the control genes

Module Network PRM
[Diagram: a Gene's Cluster and the activity levels of control genes Control1, Control2, ..., Controlk in an Experiment determine the expression Level]
[Segal, Regev, Pe'er, Koller, Friedman, 2003]

Experimental Results
- Yeast stress data (Gasch et al.): 2355 genes that showed activity; 173 microarray experiments covering diverse environmental stress conditions (e.g., heat shock)
- Learned a module network with 50 modules: cluster assignments are hidden variables, the structure of the dependency trees is unknown, and the model was learned with the structural EM algorithm
[Segal et al., Nature Genetics, 2003]

Biological Evaluation
- Find sets of co-regulated genes (regulatory modules): 46/50
- Find the regulators of each module: 30/50
[Segal et al., Nature Genetics, 2003]

Experimental Results
- Hypothesis: regulator X regulates process Y
- Experiment: knock out X and rerun the experiment
[Segal et al., Nature Genetics, 2003]

Differentially Expressed Genes
[Chart: genes differentially expressed after the knockout]
[Segal et al., Nature Genetics, 2003]

Biological Experiments Validation
- Were the differentially expressed genes predicted as targets?
- Rank modules by enrichment for differentially expressed genes
[Segal et al., Nature Genetics, 2003]

Biology 102: Pathways
- Pathways are sets of genes that act together to achieve a common function

Finding Pathways: Attempt I
- Use protein-protein interaction data
- Problems: the data is very noisy, and structure is lost; the interaction graph has one large connected component (3527 of 3589 genes)

Finding Pathways: Attempt II
- Use expression microarray clusters
- Problems: expression is only a weak indicator of interaction, and interacting pathways are not separable

Finding Pathways: Our Approach
- Use both types of data to find pathways: find "active" interactions using gene expression, and find pathway-related co-expression using the interactions
[Diagram: Pathways I-IV]
[Segal, Wang, K., 2003]

Probabilistic Model
- The Pathway of each Gene determines its expression level in N arrays (Exp1, ..., ExpN); a compatibility potential over the Interacts relation couples the pathway assignments of genes whose protein products interact
- Cluster all genes collectively, maximizing the joint model likelihood
[Segal, Wang, K., 2003]

Capturing Protein Complexes
- Evaluation on an independent data set of interacting proteins
- 124 complexes covered at 50% by our method, vs. 46 for standard expression clustering
[Chart: number of complexes vs. complex coverage (%)]
[Segal, Wang, K., 2003]

RNAse Complex Pathway
- Recovered pathway: YHR081W, RRP40, RRP42, MTR3, RRP45, RRP4, RRP43, DIS3, TRM7, SKI6, RRP46, CSL4
- Includes all 10 known pathway genes; only 5 were found by clustering
[Segal, Wang, K., 2003]

Interaction Clustering
- Interaction clustering finds the RNAse complex only as part of a cluster with 138 genes
[Segal, Wang, K., 2003]

Truth in Advertising
- Huge graphical models: 3,000-50,000 hidden variables, hundreds of thousands of observed nodes, very densely connected
- Learning: multiple iterations of model updates, each requiring inference over the model
- Inference: exact inference is intractable, so we use belief propagation; a single inference iteration takes 1-6 hours
- Algorithmic ideas are key to scaling

Relational Data: A New Challenge
- Data consists of different types of instances, related in complex networks; instances are not independent
- New tasks for machine learning: collective classification, relational clustering, link prediction, group detection
- An opportunity

http://robotics.stanford.edu/~koller/
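The collective-classification idea in the slides (classify all linked pages jointly under a log-linear model with node and edge potentials) can be sketched in miniature. Everything below is invented for illustration: the label set, word weights, and link weights are toy values, and the argmax is brute-force enumeration rather than the belief-propagation or max-margin methods the talk describes.

```python
import itertools

# Toy collective classification with a pairwise log-linear (Markov network) model.
LABELS = ["faculty", "student", "course"]

# Node potentials: log-weight added when a page containing `word` takes `label`.
NODE_W = {
    ("advisor", "faculty"): 1.0,
    ("homework", "course"): 1.2,
    ("cv", "faculty"): 0.9,
    ("cv", "student"): 0.8,
}
# Edge potentials: linked pages prefer compatible label pairs,
# e.g. student pages often link to faculty pages.
EDGE_W = {("student", "faculty"): 0.7, ("faculty", "course"): 0.5}

def joint_score(labels, pages, links):
    """Unnormalized log-probability of a joint labeling."""
    s = 0.0
    for page, words in pages.items():
        for w in words:
            s += NODE_W.get((w, labels[page]), 0.0)
    for a, b in links:
        s += EDGE_W.get((labels[a], labels[b]), 0.0)
    return s

def collective_argmax(pages, links):
    """Brute-force argmax over all joint labelings (fine for toy graphs)."""
    names = list(pages)
    return max(
        (dict(zip(names, combo)) for combo in itertools.product(LABELS, repeat=len(names))),
        key=lambda lab: joint_score(lab, pages, links),
    )

pages = {"p1": ["cv"], "p2": ["advisor"], "p3": ["homework"]}
links = [("p1", "p2"), ("p2", "p3")]
print(collective_argmax(pages, links))
```

In isolation, p1's words slightly favor "faculty" (0.9 vs. 0.8), but its link to the clearly-faculty page p2 flips it to "student"; that flip, driven by maximizing the joint rather than the per-page score, is the collective effect the slides report on WebKB.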

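The likelihood slides note that, because parameters are shared across all objects in a class, each CPD's sufficient statistics are counts aggregated over every occurrence of the attribute. A toy sketch of that aggregation for a fully observed multinomial CPD P(Grade | Intelligence); the students and grades are invented, and with hidden variables the E-step of EM would supply expected rather than hard counts:

```python
from collections import Counter

# One CPD P(Grade | Intelligence) is shared by every (student, course)
# registration, so its sufficient statistics are counts pooled over all
# registrations of all students. Data is invented for illustration.
students = {"george": "high", "jane": "low"}   # Intelligence attribute
registrations = [                              # (student, grade) pairs
    ("george", "A"), ("george", "A"), ("george", "B"),
    ("jane", "B"), ("jane", "C"),
]

def mle_cpd(students, registrations):
    """Maximum-likelihood estimate of P(grade | intelligence) from pooled counts."""
    counts = Counter((students[s], g) for s, g in registrations)  # N(i, g)
    totals = Counter(students[s] for s, _ in registrations)       # N(i)
    return {(i, g): n / totals[i] for (i, g), n in counts.items()}

cpd = mle_cpd(students, registrations)
print(cpd[("high", "A")])  # pooled over all of George's registrations
```

This is the "individual objects only contribute to sufficient statistics" point from the structure-learning slides: the M-step never looks at objects individually, only at these aggregated counts, which is why they can be computed with standard DB queries.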