Structure-function predictions based on scores derived from Delaunay tessellations

For a detailed description of the method and results, please refer to:

- Mathe, E., Olivier, M., Kato, S., Ishioka, C., Vaisman, II. and Hainaut, P. (2006) Predicting the transactivation activity of p53 missense mutants using a four-body potential score derived from Delaunay tessellations. Hum Mutat, 27, 163-172.

Here is a brief description:

The Delaunay tessellation of a protein identifies all sets of four nearest-neighbor residues (quadruplets), where each residue is represented by the position of its center of mass (determined from the 3D coordinates extracted from a PDB structure file). Delaunay tessellations provide an objective and unambiguous definition of nearest-neighbor residues and a framework for calculating empirical potentials.
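The tessellation step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes SciPy's `scipy.spatial.Delaunay` as the tessellation routine, and the helper names `center_of_mass` and `quadruplets_from_centers` are hypothetical.

```python
import numpy as np
from scipy.spatial import Delaunay


def center_of_mass(atom_coords, atom_masses):
    """Mass-weighted mean position of one residue's atoms (hypothetical helper)."""
    coords = np.asarray(atom_coords, dtype=float)
    return np.average(coords, axis=0, weights=atom_masses)


def quadruplets_from_centers(centers):
    """Return the residue-index quadruplets of a 3D Delaunay tessellation.

    `centers` is an (N, 3) array with one center of mass per residue.
    Each simplex of a 3D tessellation is a tetrahedron, so its four
    vertices are one quadruplet of nearest-neighbor residues.
    """
    tess = Delaunay(np.asarray(centers, dtype=float))
    return sorted(tuple(sorted(int(i) for i in simplex))
                  for simplex in tess.simplices)


# Toy input: four corner points plus one interior point (made-up coordinates).
toy_centers = [(0, 0, 0), (4, 0, 0), (0, 4, 0), (0, 0, 4), (1, 1, 1)]
toy_quads = quadruplets_from_centers(toy_centers)
```

In this toy case the interior point (index 4) is a vertex of every tetrahedron, so it takes part in every quadruplet.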

The amino acid composition of quadruplets is evaluated to determine which clusters of four amino acids tend to be close together in folded protein structures. The log-likelihoods of the quadruplets derived from Delaunay tessellations were calculated from a training set of 1417 protein structures solved by X-ray crystallography (extracted from the PDB), with low sequence homology and high resolution. Previous analyses showed that the distribution of these log-likelihood scores is non-random. Quadruplets with the highest log-likelihoods contain cysteines, which are structurally important because they form disulfide bridges and are involved in metal-binding motifs [Vaisman, et al., 1998].
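The text above does not give the log-likelihood formula. The sketch below assumes the formulation commonly used in the cited four-body potential literature: q = log(f / p), where f is the observed frequency of a quadruplet composition and p = C * a_i * a_j * a_k * a_l is the frequency expected from the individual residue frequencies, with C counting the distinct orderings of the four residues. The base-10 logarithm and the function name `log_likelihoods` are assumptions made for illustration.

```python
import math
from collections import Counter


def log_likelihoods(observed_quads, residue_counts):
    """Log-likelihood score per quadruplet composition (assumed formulation).

    observed_quads: iterable of 4-letter strings, one per tessellation simplex.
    residue_counts: mapping from amino acid letter to its count in the training set.
    """
    # Composition is order-independent, so sort the letters to build a key.
    quad_counts = Counter("".join(sorted(q)) for q in observed_quads)
    n_quads = sum(quad_counts.values())
    n_residues = sum(residue_counts.values())

    scores = {}
    for key, count in quad_counts.items():
        observed = count / n_quads
        # C = 4! / product(t_n!), where t_n is the multiplicity of each letter.
        orderings = math.factorial(4)
        for multiplicity in Counter(key).values():
            orderings //= math.factorial(multiplicity)
        expected = float(orderings)
        for aa in key:
            expected *= residue_counts[aa] / n_residues
        scores[key] = math.log10(observed / expected)
    return scores


# Toy data with a two-letter alphabet (made-up counts, for illustration only).
toy_scores = log_likelihoods(["CCCC", "CCCC", "AACC", "AAAA"], {"A": 50, "C": 50})
```

A positive score means the composition occurs more often than expected by chance; a negative score means it is under-represented.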

The potential score of a residue is calculated by summing the log-likelihoods of the quadruplets in which that residue participates, while the potential score of the entire protein is the sum of the log-likelihoods of all quadruplets found in the protein. These scores have been used to identify tertiary packing motifs and functional signature motifs common to structures belonging to the same protein family [Tropsha, et al., 2003]. Furthermore, the distribution of the residue potential scores indicates that low scores are associated with surface residues, which have fewer structural neighbors, and high scores are associated with residues in the hydrophobic core, which are structurally important for maintaining the conformation of a protein [Carter, et al., 2001; Masso and Vaisman, 2003]. A potential score profile for a protein is a vector with N elements, where N is the number of amino acids in the protein and each element is the potential score of the corresponding residue.
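The two sums described above can be sketched in a few lines. The function name `potential_profile`, the 0-based residue indices, and the choice to score compositions missing from the table as 0.0 are assumptions made for illustration.

```python
def potential_profile(sequence, quads, log_likelihood):
    """Return (profile, total) for one protein.

    sequence: one-letter amino acid string of length N.
    quads: iterable of 4-tuples of 0-based residue indices (tessellation simplices).
    log_likelihood: mapping from sorted 4-letter composition to its score.
    """
    profile = [0.0] * len(sequence)
    total = 0.0
    for quad in quads:
        key = "".join(sorted(sequence[i] for i in quad))
        # Assumption: compositions absent from the table contribute nothing.
        score = log_likelihood.get(key, 0.0)
        total += score               # protein score: sum over all quadruplets
        for i in quad:
            profile[i] += score      # residue score: sum over its quadruplets
    return profile, total


# Toy example with made-up scores.
toy_table = {"ACCC": 0.5, "CCCC": 1.0}
toy_profile, toy_total = potential_profile(
    "ACCCC", [(0, 1, 2, 3), (1, 2, 3, 4)], toy_table)
```

Because every quadruplet contributes to four residues, the elements of the profile sum to four times the protein score.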

To simulate the introduction of a missense mutation, the amino acid letter code is modified at the appropriate residue position, which changes the amino acid composition of the quadruplets in which that residue participates. The potential score for the mutant is calculated using the Delaunay tessellation of the wild-type structure with the modified primary sequence. Potential score differences, called Residual Scores (RS), are then calculated for each residue or for the entire protein by subtracting the potential score of the mutant from that of the wild type. RS has been shown to correlate well with the stability of mutants [Carter, et al., 2001] and has also been successfully applied to classify non-synonymous SNPs as disease-associated or not [Barenboim, et al., 2005]. Similar to potential score profiles, Residual Score Profiles (RSP) are vectors of N elements (N being the number of amino acids in the protein), each representing the residual score of the corresponding amino acid in the protein under study.
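The mutation procedure can be sketched as follows: the wild-type quadruplets are reused unchanged and only the sequence letter differs. The names `score_profile` and `residual_scores`, the 0-based `position`, and the 0.0 default for compositions missing from the table are assumptions made for illustration.

```python
def score_profile(sequence, quads, table):
    """Per-residue potential scores and the whole-protein score."""
    profile = [0.0] * len(sequence)
    total = 0.0
    for quad in quads:
        key = "".join(sorted(sequence[i] for i in quad))
        score = table.get(key, 0.0)  # assumption: unseen compositions score 0
        total += score
        for i in quad:
            profile[i] += score
    return profile, total


def residual_scores(wild_type, position, new_aa, quads, table):
    """Return (RSP, RS) for a single substitution at a 0-based position."""
    mutant = wild_type[:position] + new_aa + wild_type[position + 1:]
    # Same tessellation (quads) for both sequences; only the letters change.
    wt_profile, wt_total = score_profile(wild_type, quads, table)
    mut_profile, mut_total = score_profile(mutant, quads, table)
    # Residual score = wild-type score minus mutant score.
    rsp = [w - m for w, m in zip(wt_profile, mut_profile)]
    return rsp, wt_total - mut_total


# Toy example with made-up scores: substitute the last cysteine with alanine.
toy_table = {"ACCC": 0.5, "CCCC": 1.0}
toy_rsp, toy_rs = residual_scores(
    "ACCCC", 4, "A", [(0, 1, 2, 3), (1, 2, 3, 4)], toy_table)
```

Residues that share no quadruplet with the mutated position have a residual score of zero, so the RSP is non-zero only for the mutated residue and its tessellation neighbors.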

References:

- Mathe, E., Olivier, M., Kato, S., Ishioka, C., Vaisman, II. and Hainaut, P. (2006) Predicting the transactivation activity of p53 missense mutants using a four-body potential score derived from Delaunay tessellations. Hum Mutat, 27, 163-172.

- Poupon, A. (2004) Voronoi and Voronoi-related tessellations in studies of protein structure and interaction. Curr Opin Struct Biol, 14, 233-241.

- Krishnamoorthy, B. and Tropsha, A. (2003) Development of a four-body statistical pseudo-potential to discriminate native from non-native protein conformations. Bioinformatics, 19, 1540-1548.

- Masso, M. and Vaisman, II. (2003) Comprehensive mutagenesis of HIV-1 protease: a computational geometry approach. Biochem Biophys Res Commun, 305, 322-326.

- Tropsha, A., Carter, C.W., Jr., Cammer, S. and Vaisman, II. (2003) Simplicial neighborhood analysis of protein packing (SNAPP): a computational geometry approach to studying proteins. Methods Enzymol, 374, 509-544.

- Carter, C.W., Jr., LeFebvre, B.C., Cammer, S.A., Tropsha, A. and Edgell, M.H. (2001) Four-body potentials reveal protein-specific correlations to stability changes caused by hydrophobic core mutations. J Mol Biol, 311, 625-638.

- Singh, R.K., Tropsha, A. and Vaisman, II. (1996) Delaunay tessellation of proteins: four body nearest-neighbor propensities of amino acid residues. J Comput Biol, 3, 213-221.

- Vaisman, II., Tropsha, A. and Zheng, W. (1998) Compositional preferences in quadruplets of nearest neighbor residues in protein structures: statistical geometry analysis. IEEE Symposia on Intelligence and Systems, 163-168.

- Cho, Y., Gorina, S., Jeffrey, P.D. and Pavletich, N.P. (1994) Crystal structure of a p53 tumor suppressor-DNA complex: understanding tumorigenic mutations. Science, 265, 346-355.