Site classification at PDB scale

Classification of binding sites at PDB scale with MEDP-SiteClassifier

MED-SuMo takes an original approach to detecting structural and functional similarities between protein binding sites [1-3]. We decided to exploit this ability to classify datasets of structures. The resulting method is called MED-SuMo_Multi Approach (MED-SMA, since renamed MEDP-SiteClassifier) [4-6]. It compares all the binding sites of a dataset pairwise, builds a similarity matrix from the comparison scores, and then clusters this matrix with the Markov Clustering (MCL) algorithm.
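To illustrate the clustering step, the sketch below runs a plain Markov Clustering loop (expansion, inflation, column normalization) on a toy pairwise similarity matrix. It is not the MEDP-SiteClassifier implementation: the function names, the inflation value of 2.0, the convergence tolerance and the toy scores are assumptions chosen for illustration.

```python
# Minimal Markov Clustering (MCL) sketch on a small similarity matrix.
# Assumptions (hypothetical, not MEDP-SiteClassifier's code): scores are
# symmetric and non-negative, expansion uses power 2, inflation defaults to 2.0.
from typing import List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Rescale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(sim: Matrix, inflation: float = 2.0, max_iter: int = 100,
        tol: float = 1e-9, eps: float = 1e-6) -> List[List[int]]:
    n = len(sim)
    # Add self-loops, then turn similarity scores into transition probabilities.
    m = normalize_columns([[sim[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                           for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)                                          # expansion
        inflated = [[v ** inflation for v in row] for row in expanded]  # inflation
        new = normalize_columns(inflated)
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Rows that still carry flow are attractors; their non-zero columns form a cluster.
    clusters = set()
    for i in range(n):
        if max(m[i]) > eps:
            clusters.add(tuple(j for j in range(n) if m[i][j] > eps))
    return sorted(list(c) for c in clusters)


# Toy pairwise scores for five binding sites: sites 0-2 and 3-4 form two groups,
# joined only by a weak 0.1 similarity between sites 2 and 3.
toy_scores = [
    [0.0, 0.9, 0.8, 0.0, 0.0],
    [0.9, 0.0, 0.7, 0.0, 0.0],
    [0.8, 0.7, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.1, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.9, 0.0],
]
print(mcl(toy_scores))
```

In MCL, a higher inflation value generally produces finer clusters, so it is the main parameter for controlling cluster granularity.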
First, a list of proteins is selected. Two strategies can then be used to build the MED-SuMo database: (i) the database contains all binding sites of the selected proteins, i.e. binding sites whose co-crystallized ligands satisfy certain predefined rules (maximum or minimum number of atoms, number of residues if the ligand is a small peptide…); (ii) the database contains only specified binding sites (e.g. only ATP binding sites).
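The two database-building strategies can be sketched as a simple filter. This is a hypothetical illustration: the `LigandSite` record, the `select_sites` function, the placeholder identifiers P1 to P4 and the numeric thresholds (6 to 100 ligand atoms, at most 5 peptide residues) are assumptions, not MED-SuMo's actual rules.

```python
# Hypothetical sketch of the two database-building strategies.
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class LigandSite:
    site_id: str         # placeholder identifier, e.g. "P1"
    ligand: str          # ligand name or code
    n_atoms: int         # heavy atoms in the co-crystallized ligand
    n_residues: int = 1  # > 1 when the ligand is a small peptide


def select_sites(sites: Iterable[LigandSite],
                 only_codes: Optional[Set[str]] = None,
                 min_atoms: int = 6,
                 max_atoms: int = 100,
                 max_peptide_residues: int = 5) -> List[str]:
    """Strategy (ii) when only_codes is given, otherwise strategy (i)."""
    kept = []
    for s in sites:
        if only_codes is not None:
            # (ii) keep only the specified binding sites, e.g. ATP sites
            if s.ligand in only_codes:
                kept.append(s.site_id)
        elif min_atoms <= s.n_atoms <= max_atoms and s.n_residues <= max_peptide_residues:
            # (i) keep every site whose ligand satisfies the predefined rules
            kept.append(s.site_id)
    return kept


sites = [
    LigandSite("P1", "ATP", 31),
    LigandSite("P2", "HOH", 1),                     # too small: rejected by (i)
    LigandSite("P3", "peptide", 60, n_residues=8),  # peptide too long: rejected by (i)
    LigandSite("P4", "NAG", 15),
]
print(select_sites(sites))                      # ['P1', 'P4']
print(select_sites(sites, only_codes={"ATP"}))  # ['P1']
```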
Once the database is created, the pairwise comparison is run with the MED-SuMo comparison procedure. Each comparison identifies the similar surface chemical features (SCFs) shared by a pair of binding sites. Groups of matching SCFs between two binding sites are gathered into patches. Patches associated with the same binding sites are then analyzed: if two patches share enough SCFs (a threshold called the covering factor), they are merged into a multipatch. Multipatches represent the truly meaningful common regions of the binding sites. They guarantee two properties: (i) enough SCFs are shared, i.e. the binding sites are genuinely similar, and (ii) similarities between subpockets are captured. To fill the similarity matrix, the MED-SuMo score between matching multipatches is calculated. Finally, MCL processes the matrix and partitions the binding site dataset into clusters of sub-sites. A 2D plot of the clusters can be visualized with BioLayout [7-8].
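The patch-to-multipatch step can be illustrated with sets of SCF labels. Here the covering factor is assumed to be the number of shared SCFs divided by the size of the smaller patch, and merging is repeated until no pair of multipatches passes the threshold; both this definition and the 0.5 threshold are hypothetical, since the exact MED-SuMo formula is not given here.

```python
# Hypothetical sketch: merging patches into multipatches with a covering factor.
# Assumption: covering factor = shared SCFs / size of the smaller patch.
from typing import Iterable, List, Set


def covering_factor(a: Set[str], b: Set[str]) -> float:
    """Fraction of the smaller patch whose SCFs also occur in the other patch."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def merge_patches(patches: Iterable[Set[str]], threshold: float = 0.5) -> List[List[str]]:
    multipatches = [set(p) for p in patches]
    merged = True
    while merged:  # repeat until no pair passes the threshold
        merged = False
        for i in range(len(multipatches)):
            for j in range(i + 1, len(multipatches)):
                if covering_factor(multipatches[i], multipatches[j]) >= threshold:
                    multipatches[i] |= multipatches.pop(j)  # merge patch j into i
                    merged = True
                    break
            if merged:
                break
    return [sorted(m) for m in multipatches]


# Patches found between the same two binding sites (SCF labels are placeholders).
patches = [{"f1", "f2", "f3"}, {"f2", "f3", "f4"}, {"f8", "f9"}]
print(merge_patches(patches))  # [['f1', 'f2', 'f3', 'f4'], ['f8', 'f9']]
```

Normalizing by the smaller patch lets a small patch fully contained in a larger one be merged; dividing by the union size instead would be a stricter alternative.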

Application to protein binding sites that are clearly different structurally and functionally: serine proteases, kinases and lectins

The binding sites are correctly classified. On this subset of unrelated binding sites (serine proteases, kinases and lectins), MED-SuMo/MEDP-SiteClassifier perfectly separates the three families into 3 distinct clusters, in a short computing time (26 seconds on a single CPU).

Application to binding site families that are related because they bind the same ligand, ATP: HSP90, topoisomerase, HSP70, MutL, actin and kinesin

200 structures are classified in a short computing time (10 minutes on a single CPU). The families are grouped into clusters, some of which are interconnected.

Conclusion

The two case studies presented here highlight the efficiency of MED-SuMo/MEDP-SiteClassifier for classifying structural subsets. MED-SuMo not only separates unrelated families (first case study) but also reveals functional links between related ones (second case study). In the Protein Data Bank, topoisomerase and HSP90 are shown to share the same binding mode: two structures co-crystallized with the same ligand, radicicol, have been solved. MEDP-SiteClassifier recovers the link between these two families. Together with the short computing time (10 minutes on 1 CPU for 200 structures), this finding opens promising perspectives for our classification method.

In 2008/2009, we are applying this fast and accurate approach to classify all the binding sites of the PDB within the POPS project.

MEDP-SiteClassifier brochure

References

[1] Jambon M., Imberty A., Deleage G., Geourjon C. (2003) A new bioinformatic approach to detect common 3D sites in protein structures, Proteins, 52:137-145.

[2] Jambon M., Andrieu O., Combet C., Deleage G., Delfaud F., Geourjon C. (2005) The SuMo server: 3D search for protein functional sites, Bioinformatics, 21:3929-3930.

[3] Doppelt O., Moriaud F., Bornot A., de Brevern A.G. (2007) Functional annotation for protein structures, Bioinformation, 1:357-359.

[4] Doppelt O. et al. Classification of binding sites with MED-SuMo: application to the purinome, to be published.

[5] Doppelt O., Castillan J., Andrieu O., de Brevern A.G., Moriaud F. (2007) A new functional classification method based on local protein surface comparison using MED-SuMo software, GGMM 2007, Grenoble, France.

[6] Doppelt O., Castillan J., Delfaud F., de Brevern A.G., Moriaud F. Structural classification of diverse binding sites using 3D surface chemical features.

[7] Enright A.J., Ouzounis C.A. (2001) BioLayout--an automatic graph layout algorithm for similarity visualization, Bioinformatics, 17:853-854.

[8] Goldovsky L., Cases I., Enright A.J., Ouzounis C.A. (2005) BioLayout(Java): versatile network visualisation of structural and functional relationships, Appl Bioinformatics, 4:71-74.

