Protein subcellular localization prediction
{{Short description|Prediction of where a protein resides in a cell}}
Protein subcellular localization prediction (or just protein localization prediction) involves the prediction of where a protein resides in a cell, its subcellular localization.
In general, prediction tools take as input information about a protein, such as its amino acid sequence, and produce as output a predicted location within the cell, such as the nucleus, endoplasmic reticulum, Golgi apparatus, extracellular space, or another compartment. The aim is to build tools that can accurately predict the outcome of protein targeting in cells.
Prediction of protein subcellular localization is an important component of bioinformatics-based prediction of protein function and genome annotation, and it can aid the identification of drug targets.
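The input and output described above can be illustrated with a minimal sketch. It assumes a deliberately simple approach (amino acid composition features and a nearest-centroid classifier) that is not taken from any particular published tool; the function names and the tiny training set are hypothetical and exist only to show the shape of the problem.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_composition(sequence):
    """Return the fraction of each of the 20 standard amino acids."""
    sequence = sequence.upper()
    counts = Counter(res for res in sequence if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]

def train_centroids(labelled_sequences):
    """Average the composition vectors of each location's sequences."""
    sums, sizes = {}, Counter()
    for sequence, location in labelled_sequences:
        features = amino_acid_composition(sequence)
        previous = sums.get(location, [0.0] * len(AMINO_ACIDS))
        sums[location] = [a + b for a, b in zip(previous, features)]
        sizes[location] += 1
    return {loc: [v / sizes[loc] for v in vec] for loc, vec in sums.items()}

def predict_location(sequence, centroids):
    """Return the location whose centroid is closest to the sequence."""
    features = amino_acid_composition(sequence)
    return min(centroids, key=lambda loc: math.dist(features, centroids[loc]))

# Hypothetical toy data: invented peptides, not real proteins.
TOY_TRAINING_SET = [
    ("KKRKRPKKKRKV", "nucleus"),
    ("RKRKKSPKRRA", "nucleus"),
    ("LLLVVAILLGAL", "plasma membrane"),
    ("AVLLIGFLLVA", "plasma membrane"),
]

centroids = train_centroids(TOY_TRAINING_SET)
print(predict_location("PKKKRKVKR", centroids))
```

Real predictors use far richer features and much larger curated training sets; the sketch only shows the sequence-in, location-out interface.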
Background
Experimentally determining the subcellular localization of a protein can be laborious and time-consuming. Immunolabeling or tagging (such as with a green fluorescent protein), followed by fluorescence microscopy, is often used to view localization. A high-throughput alternative is to use prediction.
Through the development of new approaches in computer science, coupled with a growing dataset of proteins of known localization, computational tools can now provide fast and accurate localization predictions for many organisms. As a result, subcellular localization prediction has become one of the challenges successfully aided by bioinformatics and machine learning.
Many prediction methods now exceed the accuracy of some high-throughput laboratory methods for the identification of protein subcellular localization.{{cite journal |last1=Kaleel |first1=M |last2=Zheng |first2=Y |last3=Chen |first3=J |last4=Feng |first4=X |last5=Simpson |first5=JC |last6=Pollastri |first6=G |last7=Mooney |first7=C |title=SCLpred-EMS: subcellular localization prediction of endomembrane system and secretory pathway proteins by Deep N-to-1 Convolutional Neural Networks. |journal=Bioinformatics |date=6 March 2020 |volume=36 |issue=11 |pages=3343–3349 |doi=10.1093/bioinformatics/btaa156 |pmid=32142105|hdl=10197/12182 |hdl-access=free }}{{cite journal | vauthors = Rey S, Gardy JL, Brinkman FS | title = Assessing the precision of high-throughput computational and laboratory approaches for the genome-wide identification of protein subcellular localization in bacteria | journal = BMC Genomics | volume = 6 | pages = 162 | year = 2005 | pmid = 16288665 | pmc = 1314894 | doi = 10.1186/1471-2164-6-162 | doi-access = free }}{{cite journal |last1=Kaleel |first1=Manaz |last2=Ellinger |first2=Liam |last3=Lalor |first3=Clodagh |last4=Pollastri |first4=Gianluca |last5=Mooney |first5=Catherine |title=SCLpred-MEM: Subcellular localization prediction of membrane proteins by deep N-to-1 convolutional neural networks |journal=Proteins: Structure, Function, and Bioinformatics |year=2021 |volume=89 |issue=10 |pages=1233–1239 |language=en |doi=10.1002/prot.26144|pmid=33983651 |s2cid=234484678 |doi-access=free |hdl=2346/90320 |hdl-access=free }} In particular, some predictors have been developed{{cite journal | vauthors = Chou KC, Shen HB | title = Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organisms | journal = Nature Protocols | volume = 3 | issue = 2 | pages = 153–62 | year = 2008 | pmid = 18274516 | doi = 10.1038/nprot.2007.494 | s2cid = 226104 }} that can handle proteins that simultaneously exist in, or move between, two or more different subcellular locations. Experimental validation is typically required to confirm the predicted localizations.
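Handling proteins with more than one location is usually framed as a multi-label problem. The following is a minimal sketch, assuming a predictor that has already produced one score per location; the function name, the threshold value, and the scores are hypothetical, and this is not the procedure used by any of the cited tools.

```python
def assign_locations(scores, threshold=0.5):
    """Return every location scoring at or above the threshold.

    If no location passes, fall back to the single best-scoring one,
    so that each protein receives at least one predicted location.
    """
    selected = [loc for loc, score in scores.items() if score >= threshold]
    if not selected:
        selected = [max(scores, key=scores.get)]
    # Report the most strongly supported location first.
    return sorted(selected, key=scores.get, reverse=True)

# Hypothetical scores for a protein that shuttles between two compartments.
example_scores = {"nucleus": 0.8, "cytoplasm": 0.6, "mitochondrion": 0.1}
print(assign_locations(example_scores))
```

A single-label predictor would report only the top-scoring location, which discards information for proteins that genuinely occupy several compartments.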
Tools
{{main article|List of Protein subcellular localization prediction tools}}
In 1999, PSORT was the first published program to predict subcellular localization.{{Cite web|url=https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/subcellular/|title=Protein Subcellular Localization Prediction|website=www.ncbi.nlm.nih.gov|access-date=2016-12-31}} Subsequent tools and websites have been released using techniques such as artificial neural networks, support vector machines and protein motifs. Predictors can be specialized for proteins in different organisms. Some are specialized for eukaryotic proteins,{{cite journal | vauthors = Chou KC, Wu ZC, Xiao X | title = iLoc-Euk: a multi-label classifier for predicting the subcellular localization of singleplex and multiplex eukaryotic proteins | journal = PLOS ONE | volume = 6 | issue = 3 | pages = e18258 | year = 2011 | pmid = 21483473 | pmc = 3068162 | doi = 10.1371/journal.pone.0018258 | bibcode = 2011PLoSO...618258C | doi-access = free }} some for human proteins,{{cite journal | vauthors = Shen HB, Chou KC | title = A top-down approach to enhance the power of predicting human protein subcellular localization: Hum-mPLoc 2.0 | journal = Analytical Biochemistry | volume = 394 | issue = 2 | pages = 269–74 | date = Nov 2009 | pmid = 19651102 | doi = 10.1016/j.ab.2009.07.046 }} and some for plant proteins.{{cite journal | vauthors = Chou KC, Shen HB | title = Plant-mPLoc: a top-down strategy to augment the power for predicting plant protein subcellular localization | journal = PLOS ONE | volume = 5 | issue = 6 | pages = e11335 | year = 2010 | pmid = 20596258 | pmc = 2893129 | doi = 10.1371/journal.pone.0011335 | bibcode = 2010PLoSO...511335C | doi-access = free }} Methods for predicting bacterial protein localization, and their accuracy, have been reviewed.{{cite journal | vauthors = Gardy JL, Brinkman FS | title = Methods for predicting bacterial protein subcellular localization | journal = Nature Reviews. Microbiology | volume = 4 | issue = 10 | pages = 741–51 | date = Oct 2006 | pmid = 16964270 | doi = 10.1038/nrmicro1494 | s2cid = 62781755 }} In 2021, SCLpred-MEM, a membrane protein prediction tool powered by artificial neural networks, was published.{{cite journal |last1=Kaleel |first1=Manaz |last2=Ellinger |first2=Liam |last3=Lalor |first3=Clodagh |last4=Pollastri |first4=Gianluca |last5=Mooney |first5=Catherine |title=SCLpred-MEM: Subcellular localization prediction of membrane proteins by deep N-to-1 convolutional neural networks |journal=Proteins: Structure, Function, and Bioinformatics |year=2021 |volume=89 |issue=10 |pages=1233–1239 |language=en |doi=10.1002/prot.26144|pmid=33983651 |s2cid=234484678 |doi-access=free |hdl=2346/90320 |hdl-access=free }} SCLpred-EMS is another tool powered by artificial neural networks; it classifies proteins into endomembrane system and secretory pathway (EMS) versus all others.{{cite journal |last1=Kaleel |first1=Manaz |last2=Zheng |first2=Yandan |last3=Chen |first3=Jialiang |last4=Feng |first4=Xuanming |last5=Simpson |first5=Jeremy C |last6=Pollastri |first6=Gianluca |last7=Mooney |first7=Catherine |title=SCLpred-EMS: subcellular localization prediction of endomembrane system and secretory pathway proteins by Deep N-to-1 Convolutional Neural Networks |url=https://academic.oup.com/bioinformatics/article/36/11/3343/5788524 |journal=Bioinformatics |pages=3343–3349 |doi=10.1093/bioinformatics/btaa156 |date=1 June 2020|volume=36 |issue=11 |pmid=32142105 |hdl=10197/12182 |hdl-access=free }} Similarly, Light-Attention uses machine learning methods to predict ten different common subcellular locations.{{cite journal |last1=Rost |first1=Stark |last2=Heinzinger |first2=Dallago |title=Light Attention Predicts Protein Location from the Language of Life |url=https://www.biorxiv.org/content/early/2021/04/26/2021.04.25.441334 |website=Biorxiv |doi=10.1101/2021.04.25.441334 |date=26 April 2021|s2cid=233449747 |doi-access=free }}
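Of the techniques named above, protein motifs are the simplest to illustrate. The sketch below scans a sequence for three well-known sorting signals: a C-terminal KDEL or HDEL endoplasmic reticulum retention signal, a C-terminal SKL peroxisomal targeting signal, and a short basic stretch resembling a classical monopartite nuclear localization signal. The rule table and function name are hypothetical, the patterns are deliberately simplified, and the sketch does not reproduce the rules of PSORT or any other cited tool.

```python
import re

# Simplified, illustrative patterns; real signals are more variable.
MOTIF_RULES = [
    ("endoplasmic reticulum", re.compile(r"[KH]DEL$")),  # C-terminal KDEL/HDEL
    ("peroxisome", re.compile(r"SKL$")),                 # C-terminal SKL (PTS1)
    ("nucleus", re.compile(r"K[KR].[KR]")),              # basic NLS-like stretch
]

def motif_hits(sequence):
    """Return the locations whose motif pattern occurs in the sequence."""
    sequence = sequence.upper()
    return [loc for loc, pattern in MOTIF_RULES if pattern.search(sequence)]

# Hypothetical example peptides, invented for illustration.
print(motif_hits("MAAAGGSEEKDEL"))
print(motif_hits("MSTPKKKRKVED"))
print(motif_hits("MGGAVLSKL"))
```

Motif rules of this kind are easy to interpret but produce false positives and miss proteins without a recognizable signal, which is one reason later tools turned to statistical and neural network models.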
The first model to generalize protein subcellular localization to all cell lines does so by leveraging images of subcellular landmark stains (i.e., nuclear, plasma membrane, and endoplasmic reticulum markers) across multiple cell stains. By coupling multimodal landmark-stain data with a pre-trained protein language model, the Prediction of Unseen Proteins' Subcellular Localization (PUPS) model is capable of generative subcellular localization prediction for any protein in any cell line, given the protein's amino acid sequence and reference stains of the cell line.{{cite journal |last1=Zhang |first1=Xinyi |last2=Tseo |first2=Yitong|last3=Bai |first3=Yunhao|last4=Chen |first4=Fei|last5=Uhler |first5=Caroline |title=Prediction of protein subcellular localization in single cells |url=https://pmc.ncbi.nlm.nih.gov/articles/PMC11291118/ |website=Biorxiv |doi=10.1101/2024.07.25.605178 |date=25 July 2024 |pmc=11291118 }}
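The development of protein subcellular location prediction has been summarized in two comprehensive review articles.{{cite journal | vauthors = Nakai K | title = Protein sorting signals and prediction of subcellular localization | journal = Advances in Protein Chemistry | volume = 54 | pages = 277–344 | year = 2000 | pmid = 10829231 | doi = 10.1016/s0065-3233(00)54009-1 | isbn = 0120342545 }}{{cite journal | vauthors = Chou KC, Shen HB | title = Recent progress in protein subcellular location prediction | journal = Analytical Biochemistry | volume = 370 | issue = 1 | pages = 1–16 | date = Nov 2007 | pmid = 17698024 | doi = 10.1016/j.ab.2007.07.006 }} Later tools, together with an experience report, are described in a paper by [http://bioinformatics.ysu.edu/tools/subcell.html Meinken and Min (2012)].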
Application
Knowledge of the subcellular localization of a protein can significantly improve target identification during the drug discovery process. For example, secreted proteins and plasma membrane proteins are easily accessible by drug molecules due to their localization in the extracellular space or on the cell surface.
Bacterial cell surface and secreted proteins are also of interest for their potential as vaccine candidates or as diagnostic targets. Aberrant subcellular localization of proteins has been observed in cells affected by several diseases, such as cancer and Alzheimer's disease. Secreted proteins from some archaea that can survive in unusual environments have industrially important applications.
By using prediction, a large number of proteins can be assessed to find candidates that are trafficked to the desired location.
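Such a screen can be sketched as a simple filter over a batch of predictions. The record layout, protein identifiers, confidence values, and the 0.8 cut-off below are all hypothetical; the sketch assumes a predictor that reports one location and one confidence score per protein.

```python
from typing import NamedTuple

class Prediction(NamedTuple):
    protein_id: str
    location: str
    confidence: float

def shortlist(predictions, wanted_locations, min_confidence=0.8):
    """Keep confident predictions for the wanted locations, best first."""
    wanted = set(wanted_locations)
    hits = [
        p for p in predictions
        if p.location in wanted and p.confidence >= min_confidence
    ]
    return sorted(hits, key=lambda p: p.confidence, reverse=True)

# Hypothetical predictor output for four proteins.
batch = [
    Prediction("protA", "extracellular", 0.95),
    Prediction("protB", "nucleus", 0.99),
    Prediction("protC", "plasma membrane", 0.85),
    Prediction("protD", "extracellular", 0.40),
]

for hit in shortlist(batch, ["extracellular", "plasma membrane"]):
    print(hit.protein_id, hit.location, hit.confidence)
```

The shortlisted candidates would still need experimental confirmation before being pursued as drug, vaccine, or diagnostic targets.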
Databases
The results of subcellular localization prediction can be stored in databases. Examples include the multi-species database [https://compartments.jensenlab.org/ Compartments], FunSecKB2, a fungal database;{{Cite web|url=http://bioinformatics.ysu.edu/secretomes/fungi2/index.php|title=FunSecKB2 (The Fungal Secretome and Subcellular Proteome KnowledgeBase 2.1)|website=bioinformatics.ysu.edu|access-date=2017-09-17|archive-url=https://web.archive.org/web/20160410112728/http://bioinformatics.ysu.edu/secretomes/fungi2/index.php|archive-date=2016-04-10|url-status=dead}} PlantSecKB, a plant database;{{Cite web|url=http://bioinformatics.ysu.edu/secretomes/plant/index.php|title=PlantSecKB (The Plant Secretome and Subcellular Proteome KnowledgeBase)|website=bioinformatics.ysu.edu|access-date=2017-09-17|archive-url=https://web.archive.org/web/20160406104932/http://bioinformatics.ysu.edu/secretomes/plant/index.php|archive-date=2016-04-06|url-status=dead}} MetazSecKB, an animal and human database;{{Cite web|url=http://bioinformatics.ysu.edu/secretomes/animal/index.php|title=MetazSecKB (The Metazoa (Human & Animal) Protein Subcelluar Location, Secretome and Subcellular Proteome Database)|website=bioinformatics.ysu.edu|access-date=2017-09-17|archive-url=https://web.archive.org/web/20160406104921/http://bioinformatics.ysu.edu/secretomes/animal/index.php|archive-date=2016-04-06|url-status=dead}} and ProtSecKB, a protist database.{{Cite web|url=http://proteomics.ysu.edu/secretomes/protist/index.php|title=ProtSecKB (The Protist Secretome and Subcellular Proteome KnowledgeBase)|website=proteomics.ysu.edu|access-date=2017-09-17}}
References
{{reflist|33em}}
Further reading
{{refbegin|33em}}
- {{cite journal | vauthors = Bork P, Dandekar T, Diaz-Lazcoz Y, Eisenhaber F, Huynen M, Yuan Y | title = Predicting function: from genes to genomes and back | journal = Journal of Molecular Biology | volume = 283 | issue = 4 | pages = 707–25 | date = Nov 1998 | pmid = 9790834 | doi = 10.1006/jmbi.1998.2144 }}
- {{cite journal | vauthors = Nakai K | title = Protein sorting signals and prediction of subcellular localization | journal = Advances in Protein Chemistry | volume = 54 | pages = 277–344 | year = 2000 | pmid = 10829231 | doi = 10.1016/s0065-3233(00)54009-1 | isbn = 0120342545 }}
- {{cite journal | vauthors = Emanuelsson O | title = Predicting protein subcellular localisation from amino acid sequence information | journal = Briefings in Bioinformatics | volume = 3 | issue = 4 | pages = 361–76 | date = Dec 2002 | pmid = 12511065 | doi = 10.1093/bib/3.4.361 | doi-access = free }}
- {{cite journal | vauthors = Schneider G, Fechner U | title = Advances in the prediction of protein targeting signals | journal = Proteomics | volume = 4 | issue = 6 | pages = 1571–80 | date = Jun 2004 | pmid = 15174127 | doi = 10.1002/pmic.200300786 | s2cid = 7217647 }}
- {{cite journal | vauthors = Gardy JL, Brinkman FS | title = Methods for predicting bacterial protein subcellular localization | journal = Nature Reviews. Microbiology | volume = 4 | issue = 10 | pages = 741–51 | date = Oct 2006 | pmid = 16964270 | doi = 10.1038/nrmicro1494 | s2cid = 62781755 }}
- {{cite journal | vauthors = Chou KC, Shen HB | title = Recent progress in protein subcellular location prediction | journal = Analytical Biochemistry | volume = 370 | issue = 1 | pages = 1–16 | date = Nov 2007 | pmid = 17698024 | doi = 10.1016/j.ab.2007.07.006 }}
{{refend}}
Category:Biochemistry detection methods
Category:Computational science