The prediction of the structure of a protein sequence is a
difficult problem. We have developed a number of methods to facilitate
and automate the current prediction schemes. These methods use
machine-learning techniques to assign different types of quality
measures to protein models (ProQ) or to parts of protein models
(ProQres). The score presented in the result list is the ProQ score. Each ProQ score is also linked
to the corresponding per-residue ProQres scores for that protein model.
What is a good score?
Currently two scores are reported: the global ProQ score (see below) and the sum of the predicted local S-scores (ProQres) divided by the query sequence length. The latter is 1 for a perfect model and 0 for a useless model. In CASP7 it was better to select models based on the sum of local S-scores, since this score tends to favor longer models (see below). For Pmodeller, the models are sorted by this summed ProQres score.
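The length-normalized ProQres score can be illustrated with a short Python sketch. It assumes the per-residue S-scores have already been predicted; the `Model` class, the function names and the example numbers are hypothetical and are not taken from the ProQres or Pmodeller code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Model:
    name: str              # hypothetical model identifier
    local_s: list[float]   # predicted S-score (0..1) for each modelled residue


def proqres_sum_score(model: Model, query_length: int) -> float:
    """Sum of predicted local S-scores divided by the query sequence length.

    Query residues missing from the model contribute nothing, so a model that
    covers only part of the query cannot reach 1.0.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    return sum(model.local_s) / query_length


def sort_models(models: list[Model], query_length: int) -> list[Model]:
    """Order models best-first by the length-normalized ProQres sum."""
    return sorted(
        models,
        key=lambda m: proqres_sum_score(m, query_length),
        reverse=True,
    )


if __name__ == "__main__":
    query_length = 10
    models = [
        Model("short", [0.9] * 5),   # covers half the query, well predicted
        Model("long", [0.6] * 10),   # covers the whole query, less accurate
    ]
    for m in sort_models(models, query_length):
        print(m.name, round(proqres_sum_score(m, query_length), 2))
```

Because the sum is divided by the query length rather than the model length, the full-length model (6.0 / 10) ranks above the half-length model (4.5 / 10) even though each of its residues is predicted to be less accurate.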
ProQ predicts -log10 of the LGscore measure. LGscore is essentially a P-value for the significance of a structural similarity match. The statistics have changed between different versions of LGscore, but for the version used to optimize ProQ a significant LGscore is 10^-1.5, corresponding to a ProQ score of 1.5. Thus, a score above 1.5 should indicate a good model.
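The relation between LGscore and ProQ can be written as a pair of conversion functions. This is a minimal sketch that assumes a base-10 logarithm, which follows from the stated correspondence between an LGscore of 10^-1.5 and a ProQ score of 1.5; the function names and the constant are hypothetical.

```python
import math

# Rule-of-thumb threshold from the text: LGscore 10^-1.5 <-> ProQ score 1.5.
PROQ_GOOD_THRESHOLD = 1.5


def lgscore_to_proq(lgscore_pvalue: float) -> float:
    """Map an LGscore P-value onto the -log10 scale that ProQ predicts."""
    if not 0.0 < lgscore_pvalue <= 1.0:
        raise ValueError("LGscore P-value must be in (0, 1]")
    return -math.log10(lgscore_pvalue)


def proq_to_lgscore(proq_score: float) -> float:
    """Invert the transform: turn a ProQ score back into an LGscore P-value."""
    return 10.0 ** (-proq_score)


def is_good_model(proq_score: float) -> bool:
    """True if the predicted ProQ score is above the 1.5 rule of thumb."""
    return proq_score > PROQ_GOOD_THRESHOLD


if __name__ == "__main__":
    for p in (1e-1, 10 ** -1.5, 1e-3):
        q = lgscore_to_proq(p)
        print(f"LGscore {p:.4g} -> ProQ {q:.2f}, good: {is_good_model(q)}")
```

Smaller P-values (more significant structural matches) map to larger ProQ scores, so the "above 1.5" rule corresponds to "LGscore below 10^-1.5".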
ProQres predicts the so-called S-score for each individual residue (you will get a plot of this score when you click the link for each score).
The S-score is a transformation of the ordinary RMSD for each residue
using the following formula: S_i = 1/sqrt(1 + RMSD_i^2/5), where RMSD_i
is the local deviation for residue i after a global superposition
that essentially tries to maximize the sum of S-scores over the whole model (in reality the superposition is a trade-off between a high sum of S-scores and the length of the structural alignment).
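The per-residue transformation can be sketched directly from the formula above. The sketch assumes the local deviations (taken to be in Å) have already been computed from the global superposition, which is not reproduced here; the function names are hypothetical.

```python
import math

D0_SQUARED = 5.0  # the constant 5 in S_i = 1/sqrt(1 + RMSD_i^2/5), assumed Å^2


def s_score(local_rmsd: float) -> float:
    """Per-residue S-score: 1 / sqrt(1 + RMSD_i^2 / 5).

    Equals 1.0 for a perfectly placed residue and decays towards 0 as the
    local deviation grows.
    """
    if local_rmsd < 0:
        raise ValueError("local RMSD cannot be negative")
    return 1.0 / math.sqrt(1.0 + local_rmsd ** 2 / D0_SQUARED)


def s_scores(local_rmsds: list) -> list:
    """Apply the S-score transformation to every residue of a model."""
    return [s_score(d) for d in local_rmsds]


if __name__ == "__main__":
    deviations = [0.0, 1.0, 3.0, 10.0]  # made-up local deviations in Å
    for d, s in zip(deviations, s_scores(deviations)):
        print(f"RMSD_i = {d:5.1f} -> S_i = {s:.3f}")
```

The transformation bounds every residue's contribution to the range (0, 1], so a few badly placed residues cannot dominate the sum the way large raw RMSD values would.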
It was shown in CASP7 that using the sum of predicted local S-scores (ProQres)
selected better models on average than using the global ProQ score. The reason for this
is that the ProQ score is designed to be independent of length. However, in many cases
where the differences in ProQ score are marginal, it is better to select a longer model.
This is especially true for proteins with more than one domain, where the set of plausible models
is a mixture of models covering only one of the domains and models covering several. In this case the models
containing fewer domains might actually get a better ProQ score, since they lack problematic linker regions and domain
interfaces, which might contain unfavorable interactions.
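The multi-domain case can be illustrated by comparing the two selection criteria on the same candidates. This is a sketch with invented scores: the `Candidate` class, the function names and all numbers are hypothetical and serve only to show how a length-independent score and a query-length-normalized sum can disagree.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    proq: float       # predicted global ProQ score (length independent)
    local_s: list     # predicted per-residue S-scores (ProQres)


def best_by_proq(candidates):
    """Pick the model with the highest global ProQ score."""
    return max(candidates, key=lambda c: c.proq)


def best_by_proqres_sum(candidates, query_length):
    """Pick the model with the highest summed S-score per query residue."""
    return max(candidates, key=lambda c: sum(c.local_s) / query_length)


if __name__ == "__main__":
    query_length = 200  # hypothetical two-domain target
    candidates = [
        # Covers one domain only: no linker or interface, slightly higher ProQ.
        Candidate("one-domain", proq=2.1, local_s=[0.8] * 100),
        # Covers both domains: marginally lower ProQ, twice the coverage.
        Candidate("two-domain", proq=2.0, local_s=[0.7] * 200),
    ]
    print("by ProQ:       ", best_by_proq(candidates).name)
    print("by ProQres sum:", best_by_proqres_sum(candidates, query_length).name)
```

With these numbers ProQ prefers the one-domain model (2.1 vs 2.0), while the ProQres sum prefers the two-domain model (140/200 = 0.70 vs 80/200 = 0.40), because residues absent from a model add nothing to the sum.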
Short description, links and references
- ProQ is a neural-network-based predictor that predicts the quality
of a protein model from a number of structural features. For more details
see: Can correct protein models be identified? Björn Wallner and
Arne Elofsson (2003). Protein Sci. 12(5):1073-1086.
- ProQres is a neural-network-based predictor that predicts the quality
of different parts of a protein model from a number of structural
features. For more details see:
Identification of correct regions in protein models using structural,
alignment and consensus information. Björn Wallner and Arne Elofsson
(2005). Protein Sci. 15(4):900-913.