Global distance test
{{short description|Measure of similarity between two protein structures}}
The global distance test (GDT), also written as GDT_TS to represent "total score", is a measure of similarity between two protein structures with known amino acid correspondences (e.g. identical amino acid sequences) but different tertiary structures. It is most commonly used to compare the results of protein structure prediction with experimentally determined structures obtained by X-ray crystallography, protein NMR, or, increasingly, cryo-electron microscopy.
The GDT metric was developed by Adam Zemla at Lawrence Livermore National Laboratory and originally implemented in the Local-Global Alignment (LGA) program.{{cite journal |author=Zemla A |year=2003 |title=LGA: A method for finding 3D similarities in protein structures |journal=Nucleic Acids Research |volume=31 |issue=13 |pages=3370–3374 |pmid=12824330 |pmc=168977 |doi=10.1093/nar/gkg571}}{{cite patent |country=US |number=8024127 B2 |status=patent |title=Local-Global Alignment for Finding 3D Similarities in Protein Structures |pubdate= |gdate=20 September 2011 |fdate= |pridate= |inventor=Adam Zemla |invent1= |invent2= |assign1=Lawrence Livermore National Security, LLC |assign2= |class= |url=https://patentimages.storage.googleapis.com/81/61/2f/85771c7d6df4b3/US8024127.pdf}} It is intended as a more robust measure than the common root-mean-square deviation (RMSD) metric, which is sensitive to outlier regions created, for example, by poor modeling of individual loops in a structure that is otherwise reasonably accurate. The conventional GDT_TS score is computed over the alpha carbon atoms and is reported as a percentage ranging from 0 to 100. In general, the higher the GDT_TS score, the more closely a model approximates a given reference structure.
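The outlier sensitivity of RMSD can be illustrated with a toy calculation. The sketch below is a simplification, not the LGA algorithm: it assumes the model and reference are already superimposed and that the per-residue alpha carbon deviations (in Å) are known, so it skips the superposition search that a real GDT calculation performs. The function names are hypothetical, and counting a deviation exactly equal to a cutoff as "within" it is an assumption.

```python
import math
from typing import Sequence

# Conventional GDT_TS cutoffs used in CASP, in angstroms.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def rmsd(deviations: Sequence[float]) -> float:
    """Root-mean-square of per-residue C-alpha deviations (angstroms)."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


def gdt_ts_fixed(deviations: Sequence[float]) -> float:
    """GDT_TS-style score for ONE fixed superposition (simplified: no search).

    For each cutoff, take the percentage of residues whose deviation is at or
    below the cutoff, then average those percentages over the four cutoffs.
    """
    n = len(deviations)
    per_cutoff = [100.0 * sum(1 for d in deviations if d <= c) / n
                  for c in GDT_TS_CUTOFFS]
    return sum(per_cutoff) / len(per_cutoff)


# Model A: nine residues are very close, one loop residue is badly misplaced.
model_a = [0.5] * 9 + [15.0]
# Model B: every residue is moderately off.
model_b = [1.5] * 10

print(f"A: RMSD={rmsd(model_a):.2f} A, GDT_TS={gdt_ts_fixed(model_a):.1f}")
print(f"B: RMSD={rmsd(model_b):.2f} A, GDT_TS={gdt_ts_fixed(model_b):.1f}")
# prints:
# A: RMSD=4.77 A, GDT_TS=90.0
# B: RMSD=1.50 A, GDT_TS=75.0
```

Model A has the worse RMSD but the higher GDT_TS: its single misplaced residue dominates the squared-error average, whereas in GDT it only costs one residue at each cutoff.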
GDT_TS is one of the major assessment criteria used to evaluate results in the Critical Assessment of Structure Prediction (CASP), a large-scale experiment in the structure prediction community dedicated to assessing current modeling techniques.{{cite journal |vauthors=Zemla A, Venclovas C, Moult J, Fidelis K |year=1999 |title=Processing and analysis of CASP3 protein structure predictions |journal=Proteins |volume=S3 |issue=S3 |pages=22–29 |pmid=10526349 |doi=10.1002/(SICI)1097-0134(1999)37:3+<22::AID-PROT5>3.0.CO;2-W |s2cid=29803757 }}{{cite journal |vauthors=Zemla A, Venclovas C, Moult J, Fidelis K |year=2001 |title=Processing and evaluation of predictions in CASP4 |journal=Proteins |volume=45 |issue=S5 |pages=13–21 |pmid=11835478 |doi=10.1002/prot.10052|s2cid=28166260 }} The metric was first introduced as an evaluation standard in the third iteration of the biennial experiment (CASP3) in 1998. Various extensions to the original method have been developed; variations that account for the positions of the side chains are known as global distance calculations (GDC).{{cite journal |author=Keedy, D.A. |year=2009 |last2=Williams |first2=CJ |last3=Headd |first3=JJ |last4=Arendall |first4=WB |last5=Chen |first5=VB |last6=Kapral |first6=GJ |last7=Gillespie |first7=RA |last8=Block |first8=JN |last9=Zemla |first9=A |last10=Richardson |first10=DC |last11=Richardson |first11=JS |title=The other 90% of the protein: Assessment beyond the α-carbon for CASP8 template-based and high-accuracy models |journal=Proteins |volume=77 |issue=Suppl 9 |pages=29–49 |pmid=19731372 |doi=10.1002/prot.22551 |pmc=2877634}}
Calculation
The GDT score is calculated as the size of the largest set of amino acid residues whose alpha carbon atoms in the model structure fall within a defined distance cutoff of their positions in the experimental structure, after iteratively superimposing the two structures. By the original design, the GDT algorithm calculates 20 GDT scores, one for each of 20 consecutive distance cutoffs (0.5 Å, 1.0 Å, 1.5 Å, ..., 10.0 Å). Structure similarity assessment is intended to use the GDT scores from several cutoff distances, and scores generally increase with increasing cutoff. A plateau in this increase may indicate extreme divergence between the experimental and predicted structures, such that no additional atoms are included at any reasonable cutoff distance. The conventional GDT_TS total score in CASP is the average of the scores at cutoffs of 1, 2, 4, and 8 Å.{{cite journal |last1=Kryshtafovych |first1=A |last2=Prlic |first2=A |last3=Dmytriv |first3=Z |last4=Daniluk |first4=P |last5=Milostan |first5=M |last6=Eyrich |first6=V |last7=Hubbard |first7=T |last8=Fidelis |first8=K |title=New tools and expanded data analysis capabilities at the Protein Structure Prediction Center. |journal=Proteins |date=2007 |volume=69 Suppl 8 |issue=S8 |pages=19–26 |doi=10.1002/prot.21653 |pmid=17705273 |pmc=2656758}}{{cite web |title=Results Table Help |url=https://predictioncenter.org/casp14/doc/help.html |website=14th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction |access-date=27 December 2020}}
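The averaging over cutoffs can be sketched as follows. This is a simplified illustration, not the LGA algorithm: it assumes a set of candidate superpositions has already been generated (LGA produces these by iterative superposition), each summarized as a list of per-residue alpha carbon deviations in Å, and for each cutoff it keeps whichever candidate places the most residues within that cutoff. Treating a deviation exactly equal to the cutoff as "within" it is an assumption, and the function names are hypothetical.

```python
from typing import Sequence

# CASP's conventional GDT_TS cutoffs, in angstroms.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_at_cutoff(candidates: Sequence[Sequence[float]], cutoff: float) -> float:
    """Best percentage of residues within `cutoff` over all candidate superpositions.

    Each candidate is a list of per-residue C-alpha deviations (angstroms) between
    model and reference under one superposition; all lists have the same length.
    """
    n_residues = len(candidates[0])
    best = max(sum(1 for d in devs if d <= cutoff) for devs in candidates)
    return 100.0 * best / n_residues


def gdt_ts(candidates: Sequence[Sequence[float]]) -> float:
    """Average of the best per-cutoff percentages at 1, 2, 4 and 8 angstroms."""
    scores = [gdt_at_cutoff(candidates, c) for c in GDT_TS_CUTOFFS]
    return sum(scores) / len(scores)


# Two hypothetical superpositions of the same four-residue model.
sup_a = [0.5, 0.9, 3.0, 9.0]  # fits the first two residues tightly
sup_b = [1.5, 1.8, 1.9, 7.0]  # fits all residues moderately

for c in GDT_TS_CUTOFFS:
    print(c, gdt_at_cutoff([sup_a, sup_b], c))
print("GDT_TS:", gdt_ts([sup_a, sup_b]))
# per-cutoff scores: 50.0, 75.0, 75.0, 100.0; GDT_TS: 75.0
```

Because the best superposition is chosen separately for each cutoff, the 1 Å score here comes from the superposition that fits a small core closely, while the 8 Å score comes from the one that keeps the whole chain roughly in place.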
Variations and extensions
The original GDT_TS is calculated from the superpositions and GDT scores produced by the Local-Global Alignment (LGA) program. A "high accuracy" version called GDT_HA uses smaller cutoff distances (half those of GDT_TS) and thus more heavily penalizes larger deviations from the reference structure. It was used in the high accuracy category of CASP7.{{cite journal |first=Randy J. |last=Read |author2=Chavali, Gayatri |title=Assessment of CASP7 predictions in the high accuracy template-based modeling category |journal=Proteins |volume=69 |issue=S8 |pages=27–37 |doi= 10.1002/prot.21662 |year=2007 |pmid=17894351|s2cid=33172629 |doi-access=free }} CASP8 defined a new "TR score", which is GDT_TS minus a penalty for residues clustered too close together; it is meant to penalize steric clashes in the predicted structure, which are sometimes introduced to game the cutoff measure of GDT.{{cite journal |last1=Shi |first1=S |last2=Pei |first2=J |last3=Sadreyev |first3=RI |last4=Kinch |first4=LN |last5=Majumdar |first5=I |last6=Tong |first6=J |last7=Cheng |first7=H |last8=Kim |first8=BH |last9=Grishin |first9=NV |title=Analysis of CASP8 targets, predictions and assessment methods. |journal=Database: The Journal of Biological Databases and Curation |date=2009 |volume=2009 |pages=bap003 |doi=10.1093/database/bap003 |pmid=20157476 |pmc=2794793}} [http://prodata.swmed.edu/CASP8/evaluation/Scores.htm Related page]{{cite journal |last1=Sadreyev |first1=RI |last2=Shi |first2=S |last3=Baker |first3=D |last4=Grishin |first4=NV |title=Structure similarity measure with penalty for close non-equivalent residues. |journal=Bioinformatics |date=15 May 2009 |volume=25 |issue=10 |pages=1259–63 |doi=10.1093/bioinformatics/btp148 |pmid=19321733 |pmc=2677741}}
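Halving the GDT_TS cutoffs of 1, 2, 4 and 8 Å gives GDT_HA cutoffs of 0.5, 1, 2 and 4 Å, so the same model generally scores lower on GDT_HA. The sketch below compares the two scores for a single fixed superposition; it deliberately omits the superposition search that LGA performs, the function name is hypothetical, and counting a deviation equal to a cutoff as "within" it is an assumption.

```python
from typing import Sequence

GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)
# GDT_HA halves each GDT_TS cutoff: (0.5, 1.0, 2.0, 4.0).
GDT_HA_CUTOFFS = tuple(c / 2 for c in GDT_TS_CUTOFFS)


def gdt_score(deviations: Sequence[float], cutoffs: Sequence[float]) -> float:
    """Average, over `cutoffs`, of the percentage of residues within each cutoff.

    `deviations` are per-residue C-alpha deviations (angstroms) under ONE fixed
    superposition; real GDT implementations search for better superpositions.
    """
    n = len(deviations)
    per_cutoff = [100.0 * sum(1 for d in deviations if d <= c) / n for c in cutoffs]
    return sum(per_cutoff) / len(per_cutoff)


deviations = [0.3, 0.8, 1.5, 3.0]
print("GDT_TS:", gdt_score(deviations, GDT_TS_CUTOFFS))  # GDT_TS: 81.25
print("GDT_HA:", gdt_score(deviations, GDT_HA_CUTOFFS))  # GDT_HA: 62.5
```

The residues deviating by 1.5 Å and 3.0 Å still count fully at the two largest GDT_TS cutoffs, but each loses credit at an extra GDT_HA cutoff, which is how GDT_HA penalizes moderate deviations more heavily.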
The primary GDT assessment uses only the alpha carbon atoms. To apply superposition-based scoring to the amino acid side chains, a GDT-like score called "global distance calculation for sidechains" (GDC_sc) was designed and implemented within the LGA program in 2008. Instead of comparing residue positions on the basis of alpha carbons, GDC_sc uses a predefined "characteristic atom" near the end of each residue's side chain to evaluate inter-residue distance deviations. An "all atoms" variant of the GDC score (GDC_all) is calculated using full-model information and is one of the standard measures used by CASP's organizers and assessors to evaluate the accuracy of predicted structural models.{{cite journal |vauthors=Modi V, Xu QF, Adhikari S, Dunbrack RL |year=2016 |title=Assessment of template-based modeling of protein structure in CASP11 |journal=Proteins |volume=84 |issue=Suppl 1 |pages=200–220 |pmid=27081927 |pmc=5030193 |doi=10.1002/prot.25049 }}
GDT scores are generally computed with respect to a single reference structure. In some cases, structural models with lower GDT scores to a reference structure determined by protein NMR are nevertheless better fits to the underlying experimental data.{{cite journal |last1=MacCallum |first1=Justin L. |last2=Hua |first2=Lan |last3=Schnieders |first3=Michael J. |last4=Pande |first4=Vijay S. |last5=Jacobson |first5=Matthew P. |last6=Dill |first6=Ken A. |title=Assessment of the protein-structure refinement category in CASP8 |journal=Proteins: Structure, Function, and Bioinformatics |date=2009 |volume=77 |issue=S9 |pages=66–80 |doi=10.1002/prot.22538|pmid=19714776 |pmc=2801025 |doi-access=free }} Methods have been developed to estimate the uncertainty of GDT scores due to protein flexibility and uncertainty in the reference structure.{{cite journal |last1=Li |first1=Wenlin |last2=Schaeffer |first2=R. Dustin |last3=Otwinowski |first3=Zbyszek |last4=Grishin |first4=Nick V. |title=Estimation of Uncertainties in the Global Distance Test (GDT_TS) for CASP Models |journal=PLOS ONE |date=5 May 2016 |volume=11 |issue=5 |pages=e0154786 |doi=10.1371/journal.pone.0154786|pmid=27149620 |pmc=4858170 |bibcode=2016PLoSO..1154786L |doi-access=free }}
See also
- Root mean square deviation (bioinformatics) — A different structure comparison measure.
- TM-score — A different structure comparison measure.
References
{{reflist|30em}}
External links
- [https://predictioncenter.org/casp14/results.cgi CASP14 results] - summary tables of the CASP experiment run in 2020, including example plots of GDT score as a function of cutoff distance
- [http://proteinmodel.org/AS2TS/LGA/lga.html GDT, GDC, LCS and LGA description] - services and documentation on structure comparison and similarity measures