SUMOylation site prediction

Category: Education

Published on October 30, 2008

Author: allPowerde

Source: slideshare.net

Description

This presentation is about predicting the sites within the primary sequence of a protein that are involved in the SUMOylation process.

SUMOylation-site Prediction

Denis C. Bauer, Fabian A. Buske, Mikael Bodén

Overview

Background

SUMOylation: what is that?

Published predictors

Our approach

What makes SUMO hard to tackle

SUMO is not sumo (相撲)

Small Ubiquitin-related MOdifier (SUMO) is a small protein of 97 amino acids.

About 20% sequence identity to ubiquitin

Post-translational modification

Covalently attached to lysines

Involved in many pathways/mechanisms

Transcriptional regulation

Compartmentalisation

SUMOylation pathway

SUMOylation motif

One consensus motif, [ILV]K.E, covers about 60% of known sites

However

Not all [ILV]K.E sites are SUMOylated (false positives for a motif scan)

Not all SUMOylated sites have the consensus motif (false negatives for a motif scan)
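A motif scan of this kind can be sketched in a few lines of Python. This is only an illustration of scanning for [ILV]K.E, not the authors' scanner; the function name `scan_consensus_sites` and the toy sequence are made up for the example.

```python
import re

# Consensus motif from the slide: [ILV]K.E
# (I, L or V, then the lysine, then any residue, then glutamate).
# The zero-width lookahead makes the scan report every start position.
CONSENSUS = re.compile(r"(?=[ILV]K.E)")


def scan_consensus_sites(sequence: str) -> list[int]:
    """Return 1-based positions of lysines that sit inside an [ILV]K.E motif."""
    sequence = sequence.upper()
    # m.start() is the 0-based index of the [ILV] residue; the lysine is the
    # next residue, so its 1-based position is start + 2.
    return [m.start() + 2 for m in CONSENSUS.finditer(sequence)]


if __name__ == "__main__":
    toy = "MAIKEEVKAELKK"  # made-up sequence, not a real protein
    print(scan_consensus_sites(toy))  # [4, 8]
```

Every lysine reported this way is predicted as SUMOylated and every other lysine as not SUMOylated, which is why the two caveats above translate directly into false positives and false negatives.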

Baseline prediction

Method                      CC
Regular Expression scanner  0.68

Comparison with existing predictors

Method                      CC
Regular Expression scanner  0.68
SUMOpre +                   0.64
SUMOsp ‡                    0.26
SUMOplot †                  0.48

+ Xu J., BMC Bioinformatics 2008, 9:8
‡ Xue Y., Nucleic Acids Res 2006, W254-W257
† http://www.abgent.com/doc/sumoplot (commercial)

Case study: core histones in yeast

Identified SUMOylation sites +

H2B: K6/7, K16/17

H2A: K2, K126

H4: somewhere in the tail

No SUMOylation consensus site

Predictors to date are unable to predict even a single SUMOylation site in the histone sequences

+ Nathan D., Genes Dev 2006, 20(8):966-76

Our approach

Identify

window size

which ML method is best

Voilà: better predictor!

[Diagram: sequence window xxxxKxxxx, fed to ML, gives SUMOylation 1/0]

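Training in more detail

Imbalance in the dataset: more negatives than positives

[Figure: each lysine (K) in the protein sequence is taken with a window of w_U residues upstream and w_D residues downstream and passed to the ML method. Example targets T = 0 1 0 against predictions P = 1 1 0. Legend: SUMOylated K, not SUMOylated K.]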
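The training setup on this slide, one labelled window per lysine, could look roughly like the sketch below. The function name `lysine_windows`, the padding character `X` and the default window sizes are assumptions for illustration; the slides do not say how sequence ends were handled or which window sizes were chosen.

```python
def lysine_windows(sequence, sumo_sites, w_up=4, w_down=4, pad="X"):
    """Build one (position, window, label) example per lysine.

    sequence   -- protein sequence as a string of one-letter codes
    sumo_sites -- set of 1-based positions of known SUMOylated lysines
    w_up       -- residues taken upstream of the lysine (w_U on the slide)
    w_down     -- residues taken downstream of the lysine (w_D on the slide)
    pad        -- filler for windows that run past either end (an assumption)
    """
    padded = pad * w_up + sequence + pad * w_down
    examples = []
    for i, residue in enumerate(sequence):
        if residue != "K":
            continue
        # Residue i sits at index i + w_up in the padded string, so the
        # window starts at index i and spans w_up + 1 + w_down residues.
        window = padded[i : i + w_up + 1 + w_down]
        label = 1 if (i + 1) in sumo_sites else 0
        examples.append((i + 1, window, label))
    return examples


if __name__ == "__main__":
    # Made-up sequence with a made-up annotated site at position 4.
    for example in lysine_windows("MAIKEEVKAEL", {4}, w_up=2, w_down=2):
        print(example)
```

Because every lysine yields an example but only a few are SUMOylated, the resulting label list contains far more 0s than 1s, which is the imbalance noted on the slide.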

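Prediction in more detail

[Figure: windows (w_U upstream, w_D downstream) around each lysine (K) in the protein sequence are fed to the trained ML method, which outputs 1 (SUMOylated K) or 0 (not SUMOylated K). Example output: 1 1 0.]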

ML methods

Bidirectional Recurrent Neural Network (BRNN)

Uses information from flanking windows

Influence decays with distance from the center window

Prone to overfitting

Support Vector Machine (SVM)

Regularized

Requires a suitable kernel and feature representation

Standard kernels: linear, polynomial, RBF

String kernels: P-kernel, local-alignment kernel
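To make "kernel and feature representation" concrete, here is a minimal sketch of the three standard kernels applied to one-hot encoded sequence windows. The one-hot encoding, the parameter defaults (`degree`, `coef0`, `gamma`) and all function names are assumptions for illustration; the slides do not state which encoding or parameters were used, and the string kernels (P-kernel, local-alignment kernel) are not covered here.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(window):
    """Encode a window as 20 indicator values per residue.

    Characters outside the 20 standard residues (e.g. padding) become all zeros.
    """
    vec = []
    for residue in window:
        vec.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vec


def linear_kernel(x, y):
    """Dot product; on one-hot windows this counts identical positions."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """(x . y + coef0) ** degree"""
    return (linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.1):
    """exp(-gamma * squared Euclidean distance)"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


if __name__ == "__main__":
    a, b = one_hot("AIKEE"), one_hot("EVKAE")  # made-up windows
    print(linear_kernel(a, b))      # 2.0 (positions 3 and 5 agree)
    print(polynomial_kernel(a, b))  # 9.0
    print(rbf_kernel(a, b))
```

With this encoding all three kernels only see position-by-position identity, which is one motivation for string kernels that can score similar but shifted or substituted residues.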

Data set

Training/testing data

144 proteins with 241 SUMOylation sites

5,741 non-SUMOylated lysines

68% of the SUMOylated sites conform to the consensus motif

Hold-out

13 proteins with 27 SUMOylation sites

48% conform to the consensus motif

Source: Xu J., BMC Bioinformatics 2008, 9:8
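The training/testing counts above imply a strongly skewed class distribution, which the following arithmetic sketch makes explicit. The "balanced" class weights are shown only as one common way to compensate; the slides do not say whether the authors weighted the classes, and the function name `class_balance` is hypothetical.

```python
N_POS = 241    # SUMOylation sites in the training/testing data (from the slide)
N_NEG = 5741   # non-SUMOylated lysines in the training/testing data (from the slide)


def class_balance(n_pos, n_neg):
    """Summarise class imbalance and derive 'balanced' class weights."""
    total = n_pos + n_neg
    return {
        "positive_fraction": n_pos / total,
        "negatives_per_positive": n_neg / n_pos,
        # Accuracy of a classifier that always predicts "not SUMOylated".
        "majority_baseline_accuracy": n_neg / total,
        # Weights that give both classes the same total weight (assumption:
        # shown for illustration, not taken from the slides).
        "weight_pos": total / (2 * n_pos),
        "weight_neg": total / (2 * n_neg),
    }


if __name__ == "__main__":
    for name, value in class_balance(N_POS, N_NEG).items():
        print(f"{name}: {value:.4f}")
```

Roughly 4% of the lysines are positives, so a predictor that never predicts SUMOylation would already be about 96% accurate; accuracy alone is therefore a weak measure on this data.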

Evaluation

5-fold cross-validation

Matthews correlation coefficient (CC)

Sensitivity, Specificity, Accuracy

Area under the curve (AUC)
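The threshold-based measures listed here follow directly from the confusion matrix. The sketch below computes them for binary targets and predictions; the function names are hypothetical, returning 0.0 for undefined ratios is a convention chosen for this example, and AUC is left out because it needs ranked scores rather than 0/1 predictions.

```python
import math


def confusion_counts(targets, predictions):
    """Count true/false positives and negatives for 0/1 labels."""
    if len(targets) != len(predictions) or not targets:
        raise ValueError("targets and predictions must be non-empty and equally long")
    pairs = list(zip(targets, predictions))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def evaluate(targets, predictions):
    """Return sensitivity, specificity, accuracy and Matthews CC."""
    tp, fp, tn, fn = confusion_counts(targets, predictions)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # CC is set to 0.0 when any row or column of the matrix is empty.
        "cc": (tp * tn - fp * fn) / denominator if denominator else 0.0,
    }


if __name__ == "__main__":
    targets = [1, 1, 0, 0, 0, 0]      # made-up labels
    predictions = [1, 0, 1, 0, 0, 0]  # made-up predictions
    print(evaluate(targets, predictions))
```

In a 5-fold cross-validation these measures would be computed on each held-out fold (or on the pooled held-out predictions) rather than on the training data.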

Performance overview SUMOsvm

Comparison with existing methods

Quest to improve performance

Protein structural features and evolutionary features

Separating SUMOylation sites by species or compartment

Clustering for other motifs using kernel hierarchical clustering

Summary

Regular Expression Scanner is still the best classifier.

SUMO is more versatile than expected!

The road to better predictions

Are there other motifs?

Which features can discriminate?

Is the dataset biased?

Image: http://spot.colorado.edu/~colemab/Theatre_Resources/SumoBallerina.jpg

Acknowledgment

Predictor/Analysis: Mikael Bodén, Fabian Buske

Dataset: Xu et al.

PhD Supervisors: Tim Bailey, Andrew Perkins, Mikael Bodén

Other bioinformatics tools: STREAM, a practical workbench for modeling transcriptional regulation. www.bioinformatics.org.au/stream/
