Predikin and PredikinDB: tools to predict protein kinase peptide specificity


Published on June 12, 2008

Author: neilfws

Source: slideshare.net

Description

Talk given at the Bioinformatics Australia 2007 meeting in Brisbane. Note: the ROC analyses are now out of date, but the conclusions still hold.

Outline of talk

Neil Saunders, School of Molecular and Microbial Sciences, University of Queensland

- Introduction to protein kinases
- Prediction of substrate specificity
- Predikin and PredikinDB
- Evaluation

Introduction to protein kinases

Biochemistry: protein-OH + ATP → protein-OPi + ADP (catalysed by the kinase)

- Two major (eukaryotic) types: (1) Ser/Thr; (2) Tyr
- ~2% of human genes encode a protein kinase
- At least 30-50% of human proteins are phosphorylated
- Kinases regulate essentially every cellular process

Complex signalling networks

How do protein kinases find their targets?

Kinase specificity: substrate recruitment

Substrate recruitment is any process that brings the substrate to the kinase:
- docking
- binding to scaffolding protein(s)
- colocalisation
- coregulation

[Figures: docking interactions, from Remenyi et al. (2006) Docking interactions in protein kinase and phosphatase networks. Curr Opin Struct Biol 16: 676-685; colocalisation, from the LOCATE entry for calcium/calmodulin-dependent protein kinase IV]

Kinase specificity: peptide specificity

[Figure: amino acid frequency in substrate sequences at X{7}[ST]X{7} sites for CK-2, PKA and MAPK]
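A frequency profile like these can be built by tallying the residue at each offset around known sites. The following is a minimal Perl sketch, assuming each site is supplied as a (sequence, 1-based position) pair. `site_profile` is a hypothetical helper, not part of Predikin.pm, and `KRRSTPE` in the usage line is a toy peptide.

```perl
use strict;
use warnings;

# Tally amino acid counts at each offset (-$flank .. +$flank) around
# known phosphorylation sites. Each site is [sequence, 1-based position].
# Sites too close to a terminus, or not centred on S/T, are skipped.
sub site_profile {
    my ($flank, @sites) = @_;
    my %profile;    # $profile{$offset}{$residue} = count
    my $width = 2 * $flank + 1;
    for my $site (@sites) {
        my ($seq, $pos) = @$site;
        my $start = $pos - 1 - $flank;    # 0-based start of the window
        next if $start < 0 || $start + $width > length $seq;
        my $window = substr($seq, $start, $width);
        next unless $window =~ /^[A-Z]{$flank}[ST][A-Z]{$flank}$/;
        for my $i (0 .. $width - 1) {
            $profile{ $i - $flank }{ substr($window, $i, 1) }++;
        }
    }
    return \%profile;
}

# Usage: flank 7 gives the X{7}[ST]X{7} windows on the slide;
# flank 3 (heptapeptides) keeps this toy example short.
my $p = site_profile(3, [ 'RRASIHD', 4 ], [ 'KRRSTPE', 4 ]);
printf "offset -2, Arg: %d\n", $p->{-2}{R};    # prints 2
```

Dividing each count by the number of sites used turns the counts into the frequencies plotted on the slide.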

Structural basis for peptide specificity

Substrate heptapeptide binding to protein kinase A (PKA).

[Figures: PKA surface with the bound heptapeptide RRASIHD; schematic of the heptapeptide and the PKA substrate-determining residues (SDRs)]

Accurate location of key residues using HMMER

[Figure: HMMER alignment of snf1p (residues 55-306) to the protein kinase HMM, with the anchor motifs GkG, AiK, GdL, DFG and APE highlighted, alongside the substrate heptapeptide positions -3 to +3: X X X [ST] X X X]
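The anchor idea depends on mapping alignment columns back to residue numbers. The following Perl sketch shows only that step for one alignment block and does not cover Predikin.pm's actual HMMER parsing. It assumes HMMER2-style lines with whitespace removed, where `.` in the consensus marks insert columns and `-` in the target marks deletions. `locate_anchor` is hypothetical. The test strings are the first snf1p block from the slide (residues 55-101).

```perl
use strict;
use warnings;

# Locate an HMM consensus motif (e.g. 'AiK') in one HMMER2-style
# alignment block and return the 1-based target position and residues
# aligned to it. Only letters in the target (including lowercase insert
# residues) advance the residue count; '-' deletions do not.
sub locate_anchor {
    my ($consensus, $target, $start, $motif) = @_;
    my $col = index $consensus, $motif;
    return if $col < 0;
    my $prefix = substr $target, 0, $col;
    my $residues_before = ($prefix =~ tr/A-Za-z//);    # count only
    return ($start + $residues_before, substr($target, $col, length $motif));
}

# First block of the snf1p alignment on the slide (residues 55-101).
my $hmm  = 'YellkklGkGaFGkVylardkktgrlvAiKvik..........eril';
my $snf1 = 'YQIVKTLGEGSFGKVKLAYHTTTGQKVALKIINkkvlaksdmqGRIE';
for my $anchor (qw(GkG AiK)) {
    my ($pos, $res) = locate_anchor($hmm, $snf1, 55, $anchor);
    print "$anchor -> $res at $pos\n";
}
# prints: GkG -> GEG at 62, then AiK -> ALK at 82
```

SDR positions can then be defined as fixed offsets from these anchors. The specific offsets Predikin uses are not shown here.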

Predikin: components

- PredikinDB: database of phosphorylation sites
- Predikin.pm: Perl module to process kinases
- Web server

Why not phospho.ELM?

phospho.ELM (http://phospho.elm.eu.org) is derived from SwissProt entries, e.g.:
FT   MOD_RES     26     26       Phosphoserine (by PKC).

A phospho.ELM entry:
+--------+-------------+----------+------+---------+--------------+--------+------------------------+
| acc    | sequence    | position | code | pmids   | kinases      | source | entry_date             |
+--------+-------------+----------+------+---------+--------------+--------+------------------------+
| P04083 | AMVSEFLK... | 20       | Y    | 2457390 | Abl;Src;EGFR | LTP    | 2004-12-31 00:00:00+01 |
+--------+-------------+----------+------+---------+--------------+--------+------------------------+

Problems:
- Incorrect/missing accession numbers
- Phosphoresidues not at the given positions
- Multiple kinase entries per substrate
- Inconsistent names for kinase families
- No way to link a kinase name with a kinase sequence
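Two of these problems can be detected mechanically. The following Perl sketch rejects an entry whose stated residue is not at the stated position and splits a multi-kinase field into one row per kinase. Field names follow the phospho.ELM columns above. The function `normalise_entry` and the toy entry are hypothetical and not real phospho.ELM data.

```perl
use strict;
use warnings;

# Return one [acc, position, code, kinase] row per kinase, or an empty
# list if the residue at the given 1-based position is not the stated one.
sub normalise_entry {
    my (%e) = @_;
    my $pos = $e{position};
    return () if $pos < 1 || $pos > length $e{sequence};
    return () unless substr($e{sequence}, $pos - 1, 1) eq $e{code};
    return map { [ $e{acc}, $pos, $e{code}, $_ ] }
        grep { length } split /\s*;\s*/, $e{kinases};
}

# Toy entry (not real phospho.ELM data).
my @rows = normalise_entry(
    acc => 'TOY1', sequence => 'MKSPEYLR', position => 6,
    code => 'Y', kinases => 'Abl;Src;EGFR',
);
print join("\t", @$_), "\n" for @rows;    # three rows, one per kinase
```

These checks do not address the naming problems. Linking a kinase name to a sequence still requires a lookup table, which is what PredikinDB adds.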

PredikinDB construction

Substrate UniProt entry:
ID   IF2A_MOUSE
AC   Q6ZWX6; Q3TIQ0;
OS   Mus musculus (Mouse).
FT   MOD_RES     49     49       Phosphoserine (by HRI) (By similarity).
FT   MOD_RES     52     52       Phosphoserine (by EIF2AK3, GCN2, HRI and PKR).

Entries in table_kinases that match kinase name and species:
Q9Z2R9  EIF2AK1  Eif2ak1; Hri
Q9Z2B5  EIF2AK3  Eif2ak3; Pek; Perk
Q9QZ05  EIF2AK4  Eif2ak4; Gcn2; Kiaa1338
Q03963  EIF2AK2  Eif2ak2; Pkr; Prkr; Tik

Entries in table_psites:
substrate_ac  residue  posn  hepta    conf  kinase_name  kinase_ac
Q6ZWX6        S        49    ILLSELS  2     HRI          Q9Z2R9
Q6ZWX6        S        52    SELSRRR  1     EIF2AK3      Q9Z2B5
Q6ZWX6        S        52    SELSRRR  1     GCN2         Q9QZ05
Q6ZWX6        S        52    SELSRRR  1     HRI          Q9Z2R9
Q6ZWX6        S        52    SELSRRR  1     PKR          Q03963

PredikinDB links phosphorylation sites to their specific kinase sequences.
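The step from an FT MOD_RES line to table_psites rows can be sketched as follows. This is a minimal Perl version that assumes the 2007-era free-text style shown above ("Phosphoserine (by X, Y and Z)."). Newer UniProt releases format these lines differently. The confidence coding (2 for "By similarity", 1 otherwise) is inferred from the example rows and is an assumption. `parse_mod_res` and `heptapeptide` are hypothetical helpers, and `MKRRASIHDL` in the test is a toy sequence.

```perl
use strict;
use warnings;

my %one_letter = (serine => 'S', threonine => 'T', tyrosine => 'Y');

# Parse one FT MOD_RES phosphorylation line into position, residue,
# kinase names and an (assumed) confidence flag.
sub parse_mod_res {
    my ($line) = @_;
    return unless $line =~
        /^FT\s+MOD_RES\s+(\d+)\s+\d+\s+Phospho(serine|threonine|tyrosine)(.*)$/;
    my ($pos, $residue, $rest) = ($1, $one_letter{$2}, $3);
    my @kinases;
    if ($rest =~ /\(by ([^)]+)\)/) {
        my $list = $1;
        @kinases = split /\s*,\s*|\s+and\s+/, $list;   # "A, B and C"
    }
    my $confidence = $rest =~ /\(By similarity\)/ ? 2 : 1;
    return { position => $pos, residue => $residue,
             kinases => \@kinases, confidence => $confidence };
}

# Heptapeptide centred on a 1-based position, as stored in table_psites.
sub heptapeptide {
    my ($seq, $pos) = @_;
    return if $pos < 4 || $pos + 3 > length $seq;
    return substr($seq, $pos - 4, 7);
}

my $site = parse_mod_res(
    'FT   MOD_RES     52     52       Phosphoserine (by EIF2AK3, GCN2, HRI and PKR).');
print "$site->{residue}$site->{position}: @{ $site->{kinases} }\n";
# prints "S52: EIF2AK3 GCN2 HRI PKR"
```

Each kinase name would then be matched against kinase_name and kinase_syn in table_kinases for the same species to obtain kinase_ac, giving one table_psites row per kinase.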

PredikinDB: table schema

Parentheses give the UniProt line or feature each field is taken from.

table_kinases: kinase_ac (AC), kinase_id (ID), domain, domain_seq, kinase_type, kinase_name (GN Name), kinase_syn (GN Synonyms), panther_name, panther_ac, panther_evalue, ksd_name, ksd_ac, ksd_evalue, species (OS), kingdom (OC), plus 38 SDR-related residues

table_psites: ID, substrate_ac (AC), residue (MOD_RES), position (MOD_RES), hepta, confidence (MOD_RES), kinase_name (MOD_RES), kinase_ac (AC)

table_substrates: substrate_ac (AC), substrate_id (ID), species (OS), kingdom (OC)

The Predikin Perl module

Dependencies:
- External tools: HMMER + HMM libraries; pantherScore; DisEMBL and TMHMM (filters)
- Bioperl libraries (http://www.bioperl.org)

Processing steps (from the slide's flowchart): protein kinase sequence; find catalytic domains; assign kinase type; locate SDRs; assign KSD family; assign PANTHER family; find substrate XXX[STY]XXX sites; make kinase scoring matrix; score XXX[STY]XXX sites.
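The "find substrate XXX[STY]XXX sites" step amounts to a pattern scan. The following Perl sketch is regex-based and is not taken from Predikin.pm. `find_sites` is a hypothetical name. Matching inside a zero-width lookahead lets overlapping candidate sites all be reported.

```perl
use strict;
use warnings;

# Return [1-based position of the central residue, heptapeptide] for
# every XXX[STY]XXX window in a sequence. Because the match is zero
# width, pos() is the 0-based window start, so the centre is pos + 4.
sub find_sites {
    my ($seq, $acceptors) = @_;
    $acceptors = 'STY' unless defined $acceptors;
    my @sites;
    while ($seq =~ /(?=(...[$acceptors]...))/g) {
        push @sites, [ pos($seq) + 4, $1 ];
    }
    return @sites;
}

printf "%d %s\n", @$_ for find_sites('GGGSGSGGG', 'ST');
# prints "4 GGGSGSG" then "6 GSGSGGG"
```

Residues within three positions of either terminus cannot form a full heptapeptide, so they are not reported.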

Scoring matrices: SDR method

Query kinase: GEL+1 = E, GEL+3 = F, GEL+4 = S; type = Ser/Thr

SQL query for heptapeptide position -3. This retrieves heptapeptides phosphorylated by Ser/Thr kinases whose SDRs for this position match the query kinase's residues or similar ones:

  select hepta from psites, kinases
  where kinase_type = 'Ser/Thr'
  and psites.kinase_ac = kinases.kinase_ac
  and GELp1 rlike '[DEN]'
  and GELp3 rlike '[FWY]'
  and GELp4 rlike '[ANST]'

Heptapeptides returned: QFSTVKG, EQFSTVK, RSVSEAA, RSGSSPN, RHDSGLD, RRMSDEF, ARGSFDA

Repeat for positions -2 to +3 with the corresponding SDRs, then: frequency matrix → PWM (weights) matrix → score substrates.
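The frequency → weight → score steps can be illustrated for a single column. The following Perl sketch uses the seven heptapeptides from the slide. The weighting scheme is an assumption and not Predikin's published formula: log2-odds with a pseudocount of 1 per residue type and a uniform 1/20 background. `column_weights` and `score_hepta` are hypothetical names. In the real method each column comes from its own SDR query. Here only the -3 column is filled in.

```perl
use strict;
use warnings;

my @AA = split //, 'ACDEFGHIKLMNPQRSTVWY';

# Log2-odds weights for one heptapeptide column (index 0..6 = positions
# -3..+3), assuming pseudocount 1 and a uniform 1/20 background.
sub column_weights {
    my ($index, @heptas) = @_;
    my %count = map { $_ => 0 } @AA;
    for my $h (@heptas) {
        my $aa = substr $h, $index, 1;
        $count{$aa}++ if exists $count{$aa};
    }
    my $n = 0;
    $n += $count{$_} for @AA;
    my %weight;
    for my $aa (@AA) {
        my $freq = ($count{$aa} + 1) / ($n + @AA);
        $weight{$aa} = log($freq * @AA) / log(2);    # log2(freq / (1/20))
    }
    return \%weight;
}

# Sum the weights of a heptapeptide over the defined columns; undefined
# columns (e.g. the central S/T at index 3) are skipped.
sub score_hepta {
    my ($matrix, $hepta) = @_;
    my $score = 0;
    for my $i (0 .. 6) {
        next unless defined $matrix->[$i];
        $score += $matrix->[$i]{ substr($hepta, $i, 1) } // 0;
    }
    return $score;
}

# Heptapeptides returned by the position -3 query on the slide.
my @minus3 = qw(QFSTVKG EQFSTVK RSVSEAA RSGSSPN RHDSGLD RRMSDEF ARGSFDA);
my @matrix;
$matrix[0] = column_weights(0, @minus3);
printf "weight of R at -3: %.3f\n", $matrix[0]{R};    # prints 1.889
```

Arg appears at -3 in four of the seven peptides, so it receives the largest positive weight. Residues never observed there get a small negative weight instead of minus infinity, which is the purpose of the pseudocount.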

Scoring matrices: filters and cutoffs

Residue  Phosphosites  Disordered (DisEMBL)  In TM helix (TMHMM)
S        24,637        23,081                16
T        5,405         4,898                 5
Y        4,285         3,318                 12
Total    34,327        31,297 (91.2%)        33 (0.1%)

- Most sites are disordered (DisEMBL prediction)
- Most sites are not in a TM helix (TMHMM prediction)

Evaluation of Predikin

A brief area under the ROC curve (AROC) primer

[Figure: score distributions of annotated and unannotated sites, with the TP, FP, TN and FN regions for a score cutoff, and the resulting ROC curve]

Outline of evaluation procedure:
- Obtain kinase-substrate pairs from PredikinDB
- Construct a scoring matrix for the kinase (excluding its own substrates)
- Score all XXX[ST]XXX sites in the corresponding substrate
- Label sites as 1 (known, annotated) or 0 (unknown, unannotated)
- Generate AROC values using the R package ROCR
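The evaluation used ROCR in R. The same quantity can be computed directly from its rank interpretation: AROC is the probability that a randomly chosen annotated site (label 1) scores higher than a randomly chosen unannotated site (label 0), with ties counting one half. This Perl sketch swaps in that formulation for illustration. `aroc` is a hypothetical name, and the scores in the usage line are toy values.

```perl
use strict;
use warnings;

# AROC from [score, label] pairs via pairwise comparison of annotated
# (1) and unannotated (0) sites; ties count 1/2. Quadratic in the
# number of sites, which is fine for the sites of one substrate.
sub aroc {
    my @scored = @_;
    my @pos = map { $_->[0] } grep { $_->[1] == 1 } @scored;
    my @neg = map { $_->[0] } grep { $_->[1] == 0 } @scored;
    return unless @pos && @neg;    # undefined without both classes
    my $wins = 0;
    for my $p (@pos) {
        for my $n (@neg) {
            $wins += $p > $n ? 1 : $p == $n ? 0.5 : 0;
        }
    }
    return $wins / (@pos * @neg);
}

printf "AROC = %.2f\n",
    aroc([ 0.9, 1 ], [ 0.8, 0 ], [ 0.4, 0 ], [ 0.7, 1 ]);    # 0.75
```

An AROC of 0.5 corresponds to random ranking and 1.0 to every annotated site outscoring every unannotated one.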

Evaluation of comparable methods

Comparison with existing methods is not easy:
- Existing tools take a substrate and score sites based on a kinase family
- Predikin takes kinase(s) + substrate(s) and scores sites based on the kinase sequence

Problems to solve:
- Determine the kinase families common to the other tools
- Relate those families to kinase sequences in PredikinDB
- Submit the corresponding substrates to each server (no API, standalone tools, web services...)
- Collate the scored XXX[ST]XXX sites common to all methods
- Format the data for AROC analysis

Example submission using HTML::Form (NetPhosK):

    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTML::Form;

    my $url = '...';    # NetPhosK submission page (URL not given on the slide)

    # get the form
    my $ua       = LWP::UserAgent->new;
    my $response = $ua->get($url);
    die $response->status_line unless $response->is_success;
    my @forms = HTML::Form->parse($response);

    # set the values
    $forms[0]->value('SEQSUB', 'myfile.fa');
    $forms[0]->value('threshold', '0.00');

    # submit the form
    my $output = $ua->request($forms[0]->click);
    # parse output

Evaluation results

- Predikin performance equals or exceeds that of existing methods
- Performance may depend on the type of kinase

Ser/Thr kinase substrates (AROC):
Method      Method score  Predikin SDR score  Sites used (+/-)
NetPhosK    0.93          0.90                75/5334
KinasePhos  0.80          0.88                76/5663
GPS         0.88          0.87                72/5307
PPSP        0.92          0.87                75/3778
Scansite    0.96          0.87                55/2936

CMGC kinase substrates (AROC):
Method      Method score  Predikin SDR score  Sites used (+/-)
NetPhosK    0.62          0.96                211/9146
KinasePhos  0.94          0.96                211/9106
GPS         0.93          0.96                211/9146
PPSP        0.94          0.96                208/8158
Scansite    0.95          0.97                175/5039

Sites used: annotated (+) / unannotated (-) sites common to both methods.

Usage cases

Substrates for CLA4 (top-scoring sites):
kinase  substrate  position  heptapeptide  score
CLA4 1  CLA4       727       KRATMVG       92.93
CLA4 1  YOL113W    541       KRATMVG       92.93
CLA4 1  YHL021C    129       KGSSFVS       91.87
CLA4 1  YKR010C    527       KRNSITE       91.70
CLA4 1  YNL049C    526       RATSFFG       90.14
CLA4 1  YDL056W    477       KRKSTTP       88.70
CLA4 1  YOL157C    527       KLFSFTK       88.25
CLA4 1  YBR198C    157       RAYSMLK       87.71
CLA4 1  YML076C    878       HRESMTG       87.62
CLA4 1  YOR181W    619       KRKTKVG       87.37

- CLA4: a PAK/STE-20 kinase in S. cerevisiae
- Phosphorylates its own activation loop at T727?
- Evidence for this in the literature

Kinases for acetyl-CoA carboxylase (COA1, S80):
kinase          substrate  position  heptapeptide  score
NP_001547 1     COA1       80        SSMSGLH       85.49
NP_001269 1     COA1       80        SSMSGLH       85.49
XP_042066 1     COA1       80        SSMSGLH       75.77
XP_001128827 1  COA1       80        SSMSGLH       75.77
NP_001013725 1  COA1       80        SSMSGLH       74.72
NP_004064 1     COA1       80        SSMSGLH       73.84
NP_006613 1     COA1       80        SSMSGLH       73.84
NP_001778 1     COA1       80        SSMSGLH       72.21
XP_001128005 1  COA1       80        SSMSGLH       72.21
NP_277021 1     COA1       80        SSMSGLH       72.21

- Known phosphorylation site at S80
- Phosphorylated in AMPK knockout mice
- Suggested alternate kinases: IKK α/β
- Experimental evidence (Bruce Kemp)

The Predikin webserver: implementation

http://predikin.biosci.uq.edu.au

[Architecture diagram components: client (browser); Apache server with PHP and perl.so; Predikin.pm; PredikinDB (MySQL); external tools DisEMBL, TMHMM, BLAST, pantherScore and HMMER]

The Predikin webserver: screenshots

[Screenshots: kinase sequence submission; frequency and weight matrices; scored sites]

Acknowledgements

- Funding & advice (UQ): Bostjan Kobe (ARC, NHMRC), Thomas Huber
- Testing: Bruce Kemp (SVI Melbourne), Brenda Andrews (U. Toronto)
- Predikin 1.0 (UQ): Ross Brinkworth, Robert Breinl
- General: Kobe Lab
