protein domain

{{short description|Self-stable region of a protein's chain that folds independently from the rest}}

[[File:Pyruvate kinase protein domains.png|thumb|Pyruvate kinase, a protein with three domains ({{PDB|1PKN}}).]]

In molecular biology, a protein domain is a region of a protein's polypeptide chain that is self-stabilizing and that folds independently from the rest. Each domain forms a compact folded three-dimensional structure. Many proteins consist of several domains, and a domain may appear in a variety of different proteins. Molecular evolution uses domains as building blocks, and these may be recombined in different arrangements to create proteins with different functions. In general, domains vary in length from about 50 to 250 amino acids.{{cite journal | vauthors = Xu D, Nussinov R | title = Favorable domain size in proteins | journal = Folding & Design | volume = 3 | issue = 1 | pages = 11–7 | date = 1998-02-01 | pmid = 9502316 | doi = 10.1016/S1359-0278(98)00004-2 | doi-access = free }} The shortest domains, such as zinc fingers, are stabilized by metal ions or disulfide bridges. Domains often form functional units, such as the calcium-binding EF hand domain of calmodulin. Because they are independently stable, domains can be "swapped" by genetic engineering between one protein and another to make chimeric proteins.

Background

The concept of the domain was first proposed in 1973 by Wetlaufer after X-ray crystallographic studies of hen lysozyme{{cite journal | vauthors = Phillips DC | title = The three-dimensional structure of an enzyme molecule | journal = Scientific American | volume = 215 | issue = 5 | pages = 78–90 | date = November 1966 | pmid = 5978599 | doi = 10.1038/scientificamerican1166-78 | bibcode = 1966SciAm.215e..78P | s2cid = 39959172 }} and papain{{cite journal | vauthors = Drenth J, Jansonius JN, Koekoek R, Swen HM, Wolthers BG | title = Structure of papain | journal = Nature | volume = 218 | issue = 5145 | pages = 929–32 | date = June 1968 | pmid = 5681232 | doi = 10.1038/218929a0 | bibcode = 1968Natur.218..929D | s2cid = 4169127 }} and by limited proteolysis studies of immunoglobulins.{{cite journal | vauthors = Porter RR | title = Structural studies of immunoglobulins | journal = Science | volume = 180 | issue = 4087 | pages = 713–6 | date = May 1973 | pmid = 4122075 | doi = 10.1126/science.180.4087.713 | bibcode = 1973Sci...180..713P }}{{cite journal | vauthors = Edelman GM | title = Antibody structure and molecular immunology | journal = Science | volume = 180 | issue = 4088 | pages = 830–40 | date = May 1973 | pmid = 4540988 | doi = 10.1126/science.180.4088.830 | bibcode = 1973Sci...180..830E }} Wetlaufer defined domains as stable units of protein structure that could fold autonomously. In the past domains have been described as units of:

compact structure{{cite journal | vauthors = Richardson JS | title = The anatomy and taxonomy of protein structure | journal = Advances in Protein Chemistry | volume = 34 | pages = 167–339 | year = 1981 | pmid = 7020376 | doi = 10.1016/S0065-3233(08)60520-3 | url = http://kinemage.biochem.duke.edu/teaching/anatax/index.html | isbn = 9780120342341 | access-date = 3 January 2009 | archive-date = 10 February 2019 | archive-url = https://web.archive.org/web/20190210191643/http://kinemage.biochem.duke.edu/teaching/anatax/index.html | url-status = dead }}
function and evolution{{cite journal | vauthors = Bork P | title = Shuffled domains in extracellular proteins | journal = FEBS Letters | volume = 286 | issue = 1–2 | pages = 47–54 | date = July 1991 | pmid = 1864378 | doi = 10.1016/0014-5793(91)80937-X | s2cid = 22126481 | doi-access = free | bibcode = 1991FEBSL.286...47B }}
folding.{{cite journal | vauthors = Wetlaufer DB | title = Nucleation, rapid folding, and globular intrachain regions in proteins | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 70 | issue = 3 | pages = 697–701 | date = March 1973 | pmid = 4351801 | pmc = 433338 | doi = 10.1073/pnas.70.3.697 | bibcode = 1973PNAS...70..697W | doi-access = free }}

These definitions are all valid and often overlap; for example, a compact structural domain that is found among diverse proteins is likely to fold independently within its structural environment. Nature often brings several domains together to form multidomain and multifunctional proteins, allowing a vast number of possible combinations.{{cite journal | vauthors = Chothia C | title = Proteins. One thousand families for the molecular biologist | journal = Nature | volume = 357 | issue = 6379 | pages = 543–4 | date = June 1992 | pmid = 1608464 | doi = 10.1038/357543a0 | author-link = Cyrus Chothia | bibcode = 1992Natur.357..543C | s2cid = 4355476 | doi-access = free }} In a multidomain protein, each domain may fulfill its own function independently, or in concert with its neighbours. Domains can either serve as modules for building up large assemblies such as virus particles or muscle fibres, or provide specific catalytic or binding sites, as found in enzymes or regulatory proteins.

Example: Pyruvate kinase

A good example is pyruvate kinase (see first figure), a glycolytic enzyme that plays an important role in regulating the flux from fructose-1,6-bisphosphate to pyruvate. It contains an all-β nucleotide-binding domain (in blue), an α/β-substrate binding domain (in grey) and an α/β-regulatory domain (in olive green),{{cite journal | vauthors = Bakszt R, Wernimont A, Allali-Hassani A, Mok MW, Hills T, Hui R, Pizarro JC | title = The crystal structure of Toxoplasma gondii pyruvate kinase 1 | journal = PLOS ONE | volume = 5 | issue = 9 | pages = e12736 | date = September 2010 | pmid = 20856875 | pmc = 2939071 | doi = 10.1371/journal.pone.0012736 | bibcode = 2010PLoSO...512736B | doi-access = free }} connected by several polypeptide linkers.{{cite journal | vauthors = George RA, Heringa J | title = An analysis of protein domain linkers: their classification and role in protein folding | journal = Protein Engineering | volume = 15 | issue = 11 | pages = 871–9 | date = November 2002 | pmid = 12538906 | doi = 10.1093/protein/15.11.871 | doi-access = free }} Each domain in this protein occurs in diverse sets of protein families.{{Cite web|url=https://proteinstructures.com/Structure/Structure/protein-domains.html|title=Protein Domains, Domain Assignment, Identification and Classification According to CATH and SCOP Databases|website=proteinstructures.com|access-date=2018-10-14}}

The central α/β-barrel substrate-binding domain is one of the most common enzyme folds. It is seen in many different enzyme families catalysing completely unrelated reactions.{{cite journal | vauthors = Hegyi H, Gerstein M | title = The relationship between protein structure and function: a comprehensive survey with application to the yeast genome | journal = Journal of Molecular Biology | volume = 288 | issue = 1 | pages = 147–64 | date = April 1999 | pmid = 10329133 | doi = 10.1006/jmbi.1999.2661 | citeseerx = 10.1.1.217.9806 }} The α/β-barrel is commonly called the TIM barrel, after triose phosphate isomerase, which was the first such structure to be solved.{{cite journal | vauthors = Banner DW, Bloomer AC, Petsko GA, Phillips DC, Pogson CI, Wilson IA, Corran PH, Furth AJ, Milman JD, Offord RE, Priddle JD, Waley SG | display-authors = 6 | title = Structure of chicken muscle triose phosphate isomerase determined crystallographically at 2.5 angstrom resolution using amino acid sequence data | journal = Nature | volume = 255 | issue = 5510 | pages = 609–14 | date = June 1975 | pmid = 1134550 | doi = 10.1038/255609a0 | bibcode = 1975Natur.255..609B | s2cid = 4195346 }} It is currently classified into 26 homologous families in the CATH domain database.{{cite journal | vauthors = Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM | title = CATH--a hierarchic classification of protein domain structures | journal = Structure | volume = 5 | issue = 8 | pages = 1093–108 | date = August 1997 | pmid = 9309224 | doi = 10.1016/S0969-2126(97)00260-8 | doi-access = free }} The TIM barrel is formed from a sequence of β-α-β motifs, closed by hydrogen bonding between the first and last strands to form an eight-stranded barrel. There is debate about the evolutionary origin of this domain. One study has suggested that a single ancestral enzyme could have diverged into several families,{{cite journal | vauthors = Copley RR, Bork P | title = Homology among (betaalpha)(8) barrels: implications for the evolution of metabolic pathways | journal = Journal of Molecular Biology | volume = 303 | issue = 4 | pages = 627–41 | date = November 2000 | pmid = 11054297 | doi = 10.1006/jmbi.2000.4152 | doi-access = free }} while another suggests that a stable TIM-barrel structure has evolved through convergent evolution.{{cite journal | vauthors = Lesk AM, Brändén CI, Chothia C | title = Structural principles of alpha/beta barrel proteins: the packing of the interior of the sheet | journal = Proteins | volume = 5 | issue = 2 | pages = 139–48 | year = 1989 | pmid = 2664768 | doi = 10.1002/prot.340050208 | s2cid = 15340449 }}

The TIM-barrel in pyruvate kinase is 'discontinuous', meaning that more than one segment of the polypeptide is required to form the domain. This is likely to be the result of the insertion of one domain into another during the protein's evolution. It has been shown from known structures that about a quarter of structural domains are discontinuous.{{cite journal | vauthors = Jones S, Stewart M, Michie A, Swindells MB, Orengo C, Thornton JM | title = Domain assignment for protein structures using a consensus approach: characterization and analysis | journal = Protein Science | volume = 7 | issue = 2 | pages = 233–42 | date = February 1998 | pmid = 9521098 | pmc = 2143930 | doi = 10.1002/pro.5560070202 }}{{cite journal | vauthors = Holm L, Sander C | title = Parser for protein folding units | journal = Proteins | volume = 19 | issue = 3 | pages = 256–68 | date = July 1994 | pmid = 7937738 | doi = 10.1002/prot.340190309 | s2cid = 525264 }} The inserted β-barrel regulatory domain is 'continuous', made up of a single stretch of polypeptide.{{cn|date=November 2023}}
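The continuous/discontinuous distinction can be made concrete by recording, for each residue, which domain it was assigned to and then reading off the segments that make up each domain. The minimal Python sketch below assumes such a per-residue label string; the labels and the toy layout (a parent domain `A` interrupted by an inserted domain `B`, followed by domain `C`) are hypothetical and are not the real domain boundaries of pyruvate kinase.

```python
from itertools import groupby


def domain_segments(labels: str) -> dict[str, list[tuple[int, int]]]:
    """Map each domain label to its (start, end) residue segments.

    `labels` holds one character per residue giving its domain assignment.
    Positions are 1-based and inclusive.
    """
    segments: dict[str, list[tuple[int, int]]] = {}
    position = 1
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.setdefault(label, []).append((position, position + length - 1))
        position += length
    return segments


def is_continuous(segments: list[tuple[int, int]]) -> bool:
    """A domain is continuous if it is built from a single stretch of chain."""
    return len(segments) == 1


# Hypothetical toy chain: domain B is inserted into domain A; domain C follows.
toy = "AAAABBBBAAAACCCC"
for name, segs in domain_segments(toy).items():
    kind = "continuous" if is_continuous(segs) else "discontinuous"
    print(name, segs, kind)
```

For this toy chain, `A` is reported as discontinuous, with segments (1, 4) and (9, 12), while the inserted `B` and the trailing `C` are continuous.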

Units of protein structure

The primary structure (string of amino acids) of a protein ultimately encodes its uniquely folded three-dimensional (3D) conformation.{{cite journal | vauthors = Anfinsen CB, Haber E, Sela M, White FH | title = The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 47 | issue = 9 | pages = 1309–14 | date = September 1961 | pmid = 13683522 | pmc = 223141 | doi = 10.1073/pnas.47.9.1309 | bibcode = 1961PNAS...47.1309A | doi-access = free }} The most important factor governing the folding of a protein into its 3D structure is the distribution of polar and non-polar side chains.{{cite journal | vauthors = Cordes MH, Davidson AR, Sauer RT | title = Sequence space, folding and protein design | journal = Current Opinion in Structural Biology | volume = 6 | issue = 1 | pages = 3–10 | date = February 1996 | pmid = 8696970 | doi = 10.1016/S0959-440X(96)80088-1 }} Folding is driven by the burial of hydrophobic side chains in the interior of the molecule so as to avoid contact with the aqueous environment. Generally, proteins have a core of hydrophobic residues surrounded by a shell of hydrophilic residues. Since the peptide bonds themselves are polar, they are neutralised by hydrogen bonding with each other when in the hydrophobic environment. This gives rise to regions of the polypeptide that form regular 3D structural patterns called secondary structure. There are two main types of secondary structure: α-helices and β-sheets.{{cn|date=November 2023}}
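One common way to put a number on the distribution of polar and non-polar side chains (a technique not described in this article itself) is a hydropathy scale such as that of Kyte and Doolittle. The sketch below computes the mean hydropathy of a sequence and a sliding-window profile along it. The peptide is hypothetical, and a high local score only hints at a hydrophobic stretch: whether those residues actually end up in the core depends on the 3D fold.

```python
# Kyte-Doolittle hydropathy index (Kyte & Doolittle, 1982).
# Positive values are hydrophobic, negative values hydrophilic.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def mean_hydropathy(seq: str) -> float:
    """Average hydropathy of a sequence (the 'GRAVY' score)."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return sum(values) / len(values)


def hydropathy_profile(seq: str, window: int = 9) -> list[float]:
    """Mean hydropathy in each sliding window along the chain."""
    if not 0 < window <= len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    return [mean_hydropathy(seq[i:i + window]) for i in range(len(seq) - window + 1)]


# Hypothetical peptide: a hydrophobic stretch flanked by charged residues.
peptide = "KDEKLIVVALFIKDEK"
print(round(mean_hydropathy(peptide), 2))
print([round(v, 1) for v in hydropathy_profile(peptide, window=5)])
```

The windowed profile dips at the charged ends and peaks over the `LIVVALFI` stretch, which is the kind of pattern a buried hydrophobic segment would show.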

Some simple combinations of secondary structure elements have been found to occur frequently in protein structures and are referred to as supersecondary structures or motifs. For example, the β-hairpin motif consists of two adjacent antiparallel β-strands joined by a small loop. It is present in most antiparallel β structures, both as an isolated ribbon and as part of more complex β-sheets. Another common supersecondary structure is the β-α-β motif, which is frequently used to connect two parallel β-strands. The central α-helix connects the C-terminus of the first strand to the N-terminus of the second strand, packing its side chains against the β-sheet and therefore shielding the hydrophobic residues of the β-strands from the surface.{{cn|date=November 2023}}

Covalent association of two domains represents a functional and structural advantage, since there is an increase in stability compared with the same structures associated non-covalently.{{cite journal | vauthors = Ghélis C, Yon JM | title = [Conformational coupling between structural units. A decisive step in the functional structure formation] | journal = Comptes Rendus de l'Académie des Sciences, Série D | volume = 289 | issue = 2 | pages = 197–9 | date = July 1979 | pmid = 117925 }} Other advantages are the protection of intermediates within inter-domain enzymatic clefts that may otherwise be unstable in aqueous environments, and a fixed stoichiometric ratio of the enzymatic activities necessary for a sequential set of reactions.{{Cite book | journal=Adv Protein Chem | volume=55 | pages=29–77 | year=2000 | vauthors = Ostermeier M, Benkovic SJ | title=Evolutionary Protein Design | chapter=Evolution of protein function by domain swapping | series=Advances in Protein Chemistry | pmid=11050932 | doi = 10.1016/s0065-3233(01)55002-0 | isbn=9780120342556 }}

Structural alignment is an important tool for determining domains.{{cn|date=November 2023}}

=Tertiary structure=

Several motifs pack together to form compact, local, semi-independent units called domains.

The overall 3D structure of the polypeptide chain is referred to as the protein's tertiary structure. Domains are the fundamental units of tertiary structure, each domain containing an individual hydrophobic core built from secondary structural units connected by loop regions. The packing of the polypeptide is usually much tighter in the interior than on the exterior of the domain, producing a solid-like core and a fluid-like surface.{{cite journal | vauthors = Zhou Y, Vitkup D, Karplus M | title = Native proteins are surface-molten solids: application of the Lindemann criterion for the solid versus liquid state | journal = Journal of Molecular Biology | volume = 285 | issue = 4 | pages = 1371–5 | date = January 1999 | pmid = 9917381 | doi = 10.1006/jmbi.1998.2374 | s2cid = 8702994 }} Core residues are often conserved in a protein family, whereas the residues in loops are less conserved, unless they are involved in the protein's function. Protein tertiary structure can be divided into four main classes based on the secondary structural content of the domain.{{cite journal | vauthors = Levitt M, Chothia C | title = Structural patterns in globular proteins | journal = Nature | volume = 261 | issue = 5561 | pages = 552–8 | date = June 1976 | pmid = 934293 | doi = 10.1038/261552a0 | bibcode = 1976Natur.261..552L | s2cid = 4154884 }}

All-α domains have a domain core built exclusively from α-helices. This class is dominated by small folds, many of which form a simple bundle with helices running up and down.
All-β domains have a core composed of antiparallel β-sheets, usually two sheets packed against each other. Various patterns can be identified in the arrangement of the strands, often giving rise to the identification of recurring motifs, for example the Greek key motif.{{cite journal | vauthors = Hutchinson EG, Thornton JM | title = The Greek key motif: extraction, classification and analysis | journal = Protein Engineering | volume = 6 | issue = 3 | pages = 233–45 | date = April 1993 | pmid = 8506258 | doi = 10.1093/protein/6.3.233 }}
α+β domains are a mixture of all-α and all-β motifs. Classification of proteins into this class is difficult because of overlaps with the other three classes, and it is therefore not used in the CATH domain database.
α/β domains are made from a combination of β-α-β motifs that predominantly form a parallel β-sheet surrounded by amphipathic α-helices. The secondary structures are arranged in layers or barrels.
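The four classes above are defined by secondary-structure content, which suggests a simple decision procedure. The sketch below classifies a domain from counts of helix and strand residues. The 30% and 10% cut-offs, the `DomainContent` fields and the rule separating α/β from α+β by the fraction of parallel strand pairs are illustrative assumptions, not the published criteria of Levitt and Chothia or of CATH.

```python
from dataclasses import dataclass


@dataclass
class DomainContent:
    n_residues: int
    n_helix: int               # residues assigned to alpha-helices
    n_strand: int              # residues assigned to beta-strands
    parallel_fraction: float   # fraction of paired strands that run parallel (0 to 1)


def structural_class(d: DomainContent, major: float = 0.30, minor: float = 0.10) -> str:
    """Assign a domain to one of the four classes (hypothetical thresholds)."""
    helix = d.n_helix / d.n_residues
    strand = d.n_strand / d.n_residues
    if helix >= major and strand < minor:
        return "all-alpha"
    if strand >= major and helix < minor:
        return "all-beta"
    if helix >= minor and strand >= minor:
        # Mixed domains: mostly parallel sheets suggest repeated beta-alpha-beta units.
        return "alpha/beta" if d.parallel_fraction > 0.5 else "alpha+beta"
    return "unclassified"


print(structural_class(DomainContent(250, 100, 60, 0.9)))  # alpha/beta
```

Since CATH does not use the α+β class, a CATH-style variant would merge the last two branches into a single mixed class.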

=Limits on size=

Domains have limits on size.{{cite journal | vauthors = Savageau MA | title = Proteins of Escherichia coli come in sizes that are multiples of 14 kDa: domain concepts and evolutionary implications | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 83 | issue = 5 | pages = 1198–202 | date = March 1986 | pmid = 3513170 | pmc = 323042 | doi = 10.1073/pnas.83.5.1198 | bibcode = 1986PNAS...83.1198S | doi-access = free }} The size of individual structural domains varies from 36 residues in E-selectin to 692 residues in lipoxygenase-1, but the majority, 90%, have fewer than 200 residues with an average of approximately 100 residues.{{cite journal | vauthors = Wheelan SJ, Marchler-Bauer A, Bryant SH | title = Domain size distributions can predict domain boundaries | journal = Bioinformatics | volume = 16 | issue = 7 | pages = 613–8 | date = July 2000 | pmid = 11038331 | doi = 10.1093/bioinformatics/16.7.613 | doi-access = free }} Very short domains, less than 40 residues, are often stabilised by metal ions or disulfide bonds. Larger domains, greater than 300 residues, are likely to consist of multiple hydrophobic cores.{{cite book |last=Garel |first=J. |year=1992 |chapter=Folding of large proteins: Multidomain and multisubunit proteins |editor1-last=Creighton |editor1-first=T. |title=Protein Folding |pages=405–454 |publisher=W.H. Freeman and Company |location=New York |edition= First |isbn=978-0-7167-7027-5 }}
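The size figures quoted above can be turned into a rough screening rule for a set of domain lengths. In this sketch the thresholds (40 and 300 residues) come from the text; the helper names and all lengths other than the quoted extremes of 36 and 692 residues are hypothetical.

```python
from statistics import mean


def size_expectation(n_residues: int) -> str:
    """Rough expectation for a structural domain of a given length (thresholds from the text)."""
    if n_residues < 40:
        return "very short: often stabilised by metal ions or disulfide bonds"
    if n_residues > 300:
        return "large: likely to contain more than one hydrophobic core"
    return "within the usual single-core range"


def summarize_lengths(lengths: list[int]) -> dict[str, float]:
    """Mean length and fraction of domains shorter than 200 residues."""
    return {
        "mean": mean(lengths),
        "fraction_below_200": sum(n < 200 for n in lengths) / len(lengths),
    }


# 36 and 692 are the extremes quoted above; the other lengths are hypothetical.
lengths = [36, 95, 110, 180, 692]
for n in lengths:
    print(n, size_expectation(n))
print(summarize_lengths(lengths))
```

On a real domain database, `fraction_below_200` would be expected to come out near the 90% quoted above; this five-item toy set gives 0.8.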

=Quaternary structure=

Many proteins have a quaternary structure, which consists of several polypeptide chains that associate into an oligomeric molecule. Each polypeptide chain in such a protein is called a subunit. Hemoglobin, for example, consists of two α and two β subunits. Each of the four chains has an all-α globin fold with a heme pocket.{{cn|date=November 2023}}

==Domain swapping==

Domain swapping is a mechanism for forming oligomeric assemblies.{{cite journal | vauthors = Bennett MJ, Schlunegger MP, Eisenberg D | title = 3D domain swapping: a mechanism for oligomer assembly | journal = Protein Science | volume = 4 | issue = 12 | pages = 2455–68 | date = December 1995 | pmid = 8580836 | pmc = 2143041 | doi = 10.1002/pro.5560041202 }} In domain swapping, a secondary or tertiary element of a monomeric protein is replaced by the same element of another protein. Domain swapping can range from secondary structure elements to whole structural domains. It also represents a model of evolution for functional adaptation by oligomerisation, e.g. oligomeric enzymes that have their active site at subunit interfaces.{{cite journal | vauthors = Heringa J, Taylor WR | title = Three-dimensional domain duplication, swapping and stealing | journal = Current Opinion in Structural Biology | volume = 7 | issue = 3 | pages = 416–21 | date = June 1997 | pmid = 9204285 | doi = 10.1016/S0959-440X(97)80060-7 }}

Domains as evolutionary modules

Nature is a tinkerer and not an inventor:{{cite journal | vauthors = Jacob F | title = Evolution and tinkering | journal = Science | volume = 196 | issue = 4295 | pages = 1161–6 | date = June 1977 | pmid = 860134 | doi = 10.1126/science.860134 | url = https://semanticscholar.org/paper/e806647ac8f0ccc546934de20d536f14a2719738 | bibcode = 1977Sci...196.1161J | s2cid = 29756896 }} new sequences are adapted from pre-existing sequences rather than invented. Domains are the common material used by nature to generate new sequences; they can be thought of as genetically mobile units, referred to as 'modules'. Often, the C and N termini of domains are close together in space, allowing them to be easily "slotted into" parent structures during the process of evolution. Many domain families are found in all three forms of life: Archaea, Bacteria and Eukarya.{{cite journal | vauthors = Ren S, Yang G, He Y, Wang Y, Li Y, Chen Z | title = The conservation pattern of short linear motifs is highly correlated with the function of interacting protein domains | journal = BMC Genomics | volume = 9 | pages = 452 | date = October 2008 | pmid = 18828911 | pmc = 2576256 | doi = 10.1186/1471-2164-9-452 | doi-access = free }} Protein modules are a subset of protein domains that are found across a range of different proteins and have a particularly versatile structure.
Examples can be found among extracellular proteins associated with clotting, fibrinolysis, complement, the extracellular matrix, cell surface adhesion molecules and cytokine receptors.{{cite journal | vauthors = Campbell ID, Downing AK | title = Building protein structure and function from modular units | journal = Trends in Biotechnology | volume = 12 | issue = 5 | pages = 168–72 | date = May 1994 | pmid = 7764899 | doi = 10.1016/0167-7799(94)90078-7 }} Four concrete examples of widespread protein modules are the following domains: SH2, immunoglobulin, fibronectin type 3 and the kringle.{{Cite book|title=Molecular biology of the cell|last=Alberts|first=Bruce|isbn=9780815344322|edition= Sixth|location=New York, NY|oclc=887605755|date = 2014-11-18}}

Molecular evolution gives rise to families of related proteins with similar sequence and structure. However, sequence similarities can be extremely low between proteins that share the same structure. Protein structures may be similar because proteins have diverged from a common ancestor. Alternatively, some folds may be more favored than others as they represent stable arrangements of secondary structures, and some proteins may converge towards these folds over the course of evolution. There are currently about 110,000 experimentally determined protein 3D structures deposited within the Protein Data Bank (PDB).{{cite web|url=http://www.pdb.org/|title=wwPDB: Worldwide Protein Data Bank|last=wwPDB.org|website=www.pdb.org|access-date=25 July 2007|archive-url=https://web.archive.org/web/20150407064348/http://www.pdb.org/|archive-date=7 April 2015|url-status=dead}} However, this set contains many identical or very similar structures. All proteins should be classified into structural families to understand their evolutionary relationships. Structural comparisons are best achieved at the domain level. For this reason, many algorithms have been developed to automatically assign domains in proteins with known 3D structure (see {{slink||Domain definition from structural co-ordinates}}).{{cn|date=November 2023}}
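Many automatic assignment methods share one idea: a domain has many more residue contacts within it than with the rest of the chain. The sketch below is a simplified illustration of that idea, not any published algorithm. It considers only a single continuous cut, so it cannot find discontinuous domains such as the TIM barrel of pyruvate kinase. The 8 Å contact cutoff, the sequence-separation filter, the scoring rule and the toy coordinates are all assumptions.

```python
import math

Coord = tuple[float, float, float]


def contact_pairs(coords: list[Coord], cutoff: float = 8.0, min_seq_sep: int = 3) -> list[tuple[int, int]]:
    """Residue pairs (i, j), i < j, closer than `cutoff` angstroms.

    Pairs closer than `min_seq_sep` in sequence are skipped, because chain
    neighbours are always in contact and say little about domain packing.
    """
    pairs = []
    for i in range(len(coords)):
        for j in range(i + min_seq_sep, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                pairs.append((i, j))
    return pairs


def best_single_cut(coords: list[Coord], min_size: int = 10, cutoff: float = 8.0) -> tuple[int, float]:
    """Split the chain into [0, cut) and [cut, N) so that inter-domain contacts are
    as few as possible relative to the smaller part's internal contacts."""
    pairs = contact_pairs(coords, cutoff)
    n = len(coords)
    best_cut, best_score = -1, math.inf
    for cut in range(min_size, n - min_size + 1):
        intra_a = sum(1 for i, j in pairs if j < cut)   # both residues before the cut
        intra_b = sum(1 for i, j in pairs if i >= cut)  # both residues after the cut
        inter = len(pairs) - intra_a - intra_b
        smaller = min(intra_a, intra_b)
        score = inter / smaller if smaller else math.inf
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score


def toy_helix(n: int, x_offset: float) -> list[Coord]:
    """Hypothetical helix-like trace: radius 5 A, 100 degrees and 1.5 A rise per residue."""
    return [
        (x_offset + 5 * math.cos(math.radians(100 * i)),
         5 * math.sin(math.radians(100 * i)),
         1.5 * i)
        for i in range(n)
    ]


# Two compact blobs joined end to end: residues 0-29 and 30-59.
coords = toy_helix(30, 0.0) + toy_helix(30, 40.0)
print(best_single_cut(coords))
```

For the two synthetic blobs it returns a cut at residue 30 with a score of 0.0, because no contacts cross that boundary. The abrupt jump between the blobs is not physically realistic; real methods also handle multiple cuts and discontinuous domains.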

The CATH domain database classifies domains into approximately 800 fold families; ten of these folds are highly populated and are referred to as 'super-folds'. Super-folds are defined as folds for which there are at least three structures without significant sequence similarity.{{cite journal | vauthors = Orengo CA, Jones DT, Thornton JM | title = Protein superfamilies and domain superfolds | journal = Nature | volume = 372 | issue = 6507 | pages = 631–4 | date = December 1994 | pmid = 7990952 | doi = 10.1038/372631a0 | bibcode = 1994Natur.372..631O | s2cid = 4330359 }} The most populated is the α/β-barrel super-fold, as described previously.
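The super-fold criterion amounts to asking whether a fold contains at least three structures that are mutually dissimilar in sequence. The sketch below checks this by brute force over pairwise sequence identities. The 25% identity threshold, the structure names and the identity values are hypothetical stand-ins for "no significant sequence similarity", which the cited work may define differently.

```python
from itertools import combinations


def pair_key(a: str, b: str) -> tuple[str, str]:
    """Order-independent key for a pair of structure identifiers."""
    return (a, b) if a <= b else (b, a)


def is_superfold(structures: list[str], identity: dict[tuple[str, str], float],
                 threshold: float = 0.25, k: int = 3) -> bool:
    """True if at least `k` structures in the fold are pairwise dissimilar in sequence.

    `identity` maps pair_key(a, b) to the fractional sequence identity of a and b.
    Brute force over all k-subsets, which is fine for small sets like this one.
    """
    for group in combinations(structures, k):
        if all(identity[pair_key(a, b)] < threshold for a, b in combinations(group, 2)):
            return True
    return False


# Hypothetical fold with four member structures and their pairwise identities.
members = ["s1", "s2", "s3", "s4"]
ids = {
    pair_key("s1", "s2"): 0.90, pair_key("s1", "s3"): 0.10, pair_key("s1", "s4"): 0.15,
    pair_key("s2", "s3"): 0.12, pair_key("s2", "s4"): 0.20, pair_key("s3", "s4"): 0.08,
}
print(is_superfold(members, ids))  # s1, s3 and s4 are mutually dissimilar
```

Checking all subsets is a small clique search; it grows quickly with fold size, so a real survey would need a smarter search or a clustering step first.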

Multidomain proteins

The majority of proteins, two-thirds in unicellular organisms and more than 80% in metazoa, are multidomain proteins.{{cite journal | vauthors = Apic G, Gough J, Teichmann SA | title = Domain combinations in archaeal, eubacterial and eukaryotic proteomes | journal = Journal of Molecular Biology | volume = 310 | issue = 2 | pages = 311–25 | date = July 2001 | pmid = 11428892 | doi = 10.1006/jmbi.2001.4776 | s2cid = 11894663 }} However, other studies concluded that 40% of prokaryotic proteins consist of multiple domains while eukaryotes have approximately 65% multi-domain proteins.{{cite journal | vauthors = Ekman D, Björklund AK, Frey-Skött J, Elofsson A | title = Multi-domain proteins in the three kingdoms of life: orphan domains and other unassigned regions | journal = Journal of Molecular Biology | volume = 348 | issue = 1 | pages = 231–43 | date = April 2005 | pmid = 15808866 | doi = 10.1016/j.jmb.2005.02.007 }}

Many domains in eukaryotic multidomain proteins can be found as independent proteins in prokaryotes,{{cite journal | vauthors = Davidson JN, Chen KC, Jamison RS, Musmanno LA, Kern CB | title = The evolutionary history of the first three enzymes in pyrimidine biosynthesis | journal = BioEssays | volume = 15 | issue = 3 | pages = 157–64 | date = March 1993 | pmid = 8098212 | doi = 10.1002/bies.950150303 | s2cid = 24897614 }} suggesting that domains in multidomain proteins have once existed as independent proteins. For example, vertebrates have a multi-enzyme polypeptide containing the GAR synthetase, AIR synthetase and GAR transformylase domains (GARs-AIRs-GARt; GAR: glycinamide ribonucleotide synthetase/transferase; AIR: aminoimidazole ribonucleotide synthetase). In insects, the polypeptide appears as GARs-(AIRs)2-GARt, in yeast GARs-AIRs is encoded separately from GARt, and in bacteria each domain is encoded separately.{{cite journal | vauthors = Henikoff S, Greene EA, Pietrokovski S, Bork P, Attwood TK, Hood L | title = Gene families: the taxonomy of protein paralogs and chimeras | journal = Science | volume = 278 | issue = 5338 | pages = 609–14 | date = October 1997 | pmid = 9381171 | doi = 10.1126/science.278.5338.609 | citeseerx = 10.1.1.562.2262 | bibcode = 1997Sci...278..609H }}

{{Panorama

|image = File:ATRNL1 Bitmap.png

|height = 100

|caption = (scrollable image) Attractin-like protein 1 (ATRNL1) is a multi-domain protein found in animals, including humans.{{cite journal | vauthors = Walker WP, Aradhya S, Hu CL, Shen S, Zhang W, Azarani A, Lu X, Barsh GS, Gunn TM | display-authors = 6 | title = Genetic analysis of attractin homologs | journal = Genesis | volume = 45 | issue = 12 | pages = 744–56 | date = December 2007 | pmid = 18064672 | doi = 10.1002/dvg.20351 | s2cid = 20878849 }}{{Cite web|url=http://smart.embl.de|title=SMART: Main page|website=smart.embl.de|access-date=2017-01-01}} Each unit is one domain, e.g. the EGF or Kelch domains.

}}

=Origin=

Multidomain proteins are likely to have emerged from selective pressure during evolution to create new functions. Various proteins have diverged from common ancestors by different combinations and associations of domains. Modular units frequently move within and between biological systems through mechanisms of genetic shuffling:

transposition of mobile elements including horizontal transfers (between species);{{cite journal | vauthors = Bork P, Doolittle RF | title = Proposed acquisition of an animal protein domain by bacteria | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 89 | issue = 19 | pages = 8990–4 | date = October 1992 | pmid = 1409594 | pmc = 50050 | doi = 10.1073/pnas.89.19.8990 | bibcode = 1992PNAS...89.8990B | doi-access = free }}
gross rearrangements such as inversions, translocations, deletions and duplications;
homologous recombination;
slippage of DNA polymerase during replication.

=Types of organization=

File:Domain Homology.png modules (maroon) into two different proteins.]]

The simplest multidomain organization seen in proteins is that of a single domain repeated in tandem.{{cite journal | vauthors = Heringa J | title = Detection of internal repeats: how common are they? | journal = Current Opinion in Structural Biology | volume = 8 | issue = 3 | pages = 338–45 | date = June 1998 | pmid = 9666330 | doi = 10.1016/S0959-440X(98)80068-7 }} The domains may interact with each other (domain-domain interaction) or remain isolated, like beads on a string. The giant 30,000-residue muscle protein titin comprises about 120 fibronectin-III-type and Ig-type domains.{{cite journal | vauthors = Politou AS, Gautel M, Improta S, Vangelista L, Pastore A | title = The elastic I-band region of titin is assembled in a "modular" fashion by weakly interacting Ig-like domains | journal = Journal of Molecular Biology | volume = 255 | issue = 4 | pages = 604–16 | date = February 1996 | pmid = 8568900 | doi = 10.1006/jmbi.1996.0050 }} In the serine proteases, a gene duplication event has led to the formation of an enzyme with two β-barrel domains.{{cite journal | vauthors = McLachlan AD | title = Gene duplications in the structural evolution of chymotrypsin | journal = Journal of Molecular Biology | volume = 128 | issue = 1 | pages = 49–79 | date = February 1979 | pmid = 430571 | doi = 10.1016/0022-2836(79)90308-5 }} The repeats have diverged so widely that there is no obvious sequence similarity between them. The active site is located at a cleft between the two β-barrel domains, in which functionally important residues are contributed from each domain. Genetically engineered mutants of the chymotrypsin serine protease were shown to have some proteinase activity even though their active-site residues were abolished, and it has therefore been postulated that the duplication event enhanced the enzyme's activity.

Modules frequently display different connectivity relationships, as illustrated by the kinesins and ABC transporters. The kinesin motor domain can be at either end of a polypeptide chain that includes a coiled-coil region and a cargo domain.{{cite journal | vauthors = Moore JD, Endow SA | title = Kinesin proteins: a phylum of motors for microtubule-based motility | journal = BioEssays | volume = 18 | issue = 3 | pages = 207–19 | date = March 1996 | pmid = 8867735 | doi = 10.1002/bies.950180308 | s2cid = 46012215 }} ABC transporters are built with up to four domains consisting of two unrelated modules, the ATP-binding cassette and an integral membrane module, arranged in various combinations.

Not only do domains recombine, but there are many examples of a domain having been inserted into another. Sequence or structural similarities to other domains demonstrate that homologues of inserted and parent domains can exist independently. An example is that of the 'fingers' inserted into the 'palm' domain within the polymerases of the Pol I family.{{cite journal | vauthors = Russell RB | title = Domain insertion | journal = Protein Engineering | volume = 7 | issue = 12 | pages = 1407–10 | date = December 1994 | pmid = 7716150 | doi = 10.1093/protein/7.12.1407 }} Since a domain can be inserted into another, there should always be at least one continuous domain in a multidomain protein. This is the main difference between definitions of structural domains and evolutionary/functional domains. An evolutionary domain will be limited to one or two connections between domains, whereas structural domains can have unlimited connections, within a given criterion of the existence of a common core. Several structural domains could be assigned to an evolutionary domain.{{cn|date=November 2023}}

A superdomain consists of two or more conserved domains of nominally independent origin, but subsequently inherited as a single structural/functional unit.{{cite journal | vauthors = Haynie DT, Xue B | title = Superdomains in the protein structure hierarchy: The case of PTP-C2 | journal = Protein Science | volume = 24 | issue = 5 | pages = 874–82 | date = May 2015 | pmid = 25694109 | pmc = 4420535 | doi = 10.1002/pro.2664 }} This combined superdomain can occur in diverse proteins that are not related by gene duplication alone. An example of a superdomain is the protein tyrosine phosphatase–C2 domain pair in PTEN, tensin, auxilin and the membrane protein TPTE2. This superdomain is found in proteins in animals, plants and fungi. A key feature of the PTP-C2 superdomain is amino acid residue conservation in the domain interface.

Domains are autonomous folding units

= Folding =

Protein folding remains an unsolved problem. Since the seminal work of Anfinsen in the early 1960s, a complete understanding of the mechanism by which a polypeptide rapidly folds into its stable native conformation has remained elusive. Many experimental folding studies have contributed much to our understanding, but the principles that govern protein folding are still based on those discovered in the very first studies of folding. Anfinsen showed that the native state of a protein is thermodynamically stable, the conformation being at a global minimum of its free energy.{{cn|date=November 2023}}

Folding is a directed search of conformational space that allows the protein to fold on a biologically feasible time scale. The Levinthal paradox states that if an average-sized protein were to sample all possible conformations before finding the one with the lowest energy, the whole process would take billions of years.{{cite journal | journal = J Chim Phys | volume = 65 | pages = 44–45 | year = 1968 | vauthors = Levinthal C | title = Are there pathways for protein folding? | url = http://www.biochem.wisc.edu/courses/biochem704/Reading/Levinthal1968.pdf |url-status = dead| archive-url = https://web.archive.org/web/20090902211239/http://www.biochem.wisc.edu/courses/biochem704/Reading/Levinthal1968.pdf | archive-date = 2 September 2009 | df = dmy-all | doi = 10.1051/jcp/1968650044 | bibcode = 1968JCP....65...44L }} Proteins typically fold within 0.1 to 1000 seconds. Therefore, the protein folding process must be directed in some way through a specific folding pathway. The forces that direct this search are likely to be a combination of local and global influences whose effects are felt at various stages of the reaction.{{cite journal | vauthors = Dill KA | title = Polymer principles and protein folding | journal = Protein Science | volume = 8 | issue = 6 | pages = 1166–80 | date = June 1999 | pmid = 10386867 | pmc = 2144345 | doi = 10.1110/ps.8.6.1166 }}
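The scale of the paradox can be checked with a back-of-the-envelope calculation. The sketch below is illustrative only: the chain length (100 residues), the number of conformations per residue (3) and the sampling rate (10^13 conformations per second) are assumed values chosen for the example, not figures taken from Levinthal's paper, and the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def exhaustive_search_years(n_residues, states_per_residue, samples_per_second):
    """Years needed to visit every backbone conformation once, assuming
    each residue independently adopts one of `states_per_residue` states."""
    n_conformations = states_per_residue ** n_residues  # exact integer count
    seconds = n_conformations / samples_per_second
    return seconds / SECONDS_PER_YEAR


# Assumed, illustrative parameters (not from the source):
years = exhaustive_search_years(n_residues=100, states_per_residue=3,
                                samples_per_second=1e13)
print(f"{years:.1e} years")  # prints 1.6e+27 years
```

Even under these generous assumptions the exhaustive search is many orders of magnitude longer than "billions of years", while real proteins fold in seconds or less, which is why folding cannot be a random search.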

Advances in experimental and theoretical studies have shown that folding can be viewed in terms of energy landscapes,{{cite journal | vauthors = Leopold PE, Montal M, Onuchic JN | title = Protein folding funnels: a kinetic approach to the sequence-structure relationship | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 89 | issue = 18 | pages = 8721–5 | date = September 1992 | pmid = 1528885 | pmc = 49992 | doi = 10.1073/pnas.89.18.8721 | bibcode = 1992PNAS...89.8721L | doi-access = free }}{{cite journal | vauthors = Dill KA, Chan HS | title = From Levinthal to pathways to funnels | journal = Nature Structural Biology | volume = 4 | issue = 1 | pages = 10–9 | date = January 1997 | pmid = 8989315 | doi = 10.1038/nsb0197-10 | s2cid = 11557990 }} where folding kinetics is considered as a progressive organisation of an ensemble of partially folded structures through which a protein passes on its way to the folded structure. This has been described in terms of a folding funnel, in which an unfolded protein has a large number of conformational states available and there are fewer states available to the folded protein. A funnel implies that for protein folding there is a decrease in energy and loss of entropy with increasing tertiary structure formation. The local roughness of the funnel reflects kinetic traps, corresponding to the accumulation of misfolded intermediates. A folding chain progresses toward lower intra-chain free-energies by increasing its compactness. The chain's conformational options become increasingly narrowed ultimately toward one native structure.

=Advantage of domains in protein folding=

The organisation of large proteins into structural domains represents an advantage for protein folding, with each domain able to fold individually, which accelerates the folding process and reduces the potentially large number of combinations of residue interactions. Furthermore, given the observed random distribution of hydrophobic residues in proteins,{{cite journal | vauthors = White SH, Jacobs RE | title = Statistical distribution of hydrophobic residues along the length of protein chains. Implications for protein folding and evolution | journal = Biophysical Journal | volume = 57 | issue = 4 | pages = 911–21 | date = April 1990 | pmid = 2188687 | pmc = 1280792 | doi = 10.1016/S0006-3495(90)82611-4 | bibcode = 1990BpJ....57..911W }} domain formation appears to be the optimal solution for a large protein to bury its hydrophobic residues while keeping the hydrophilic residues at the surface.{{cite journal | vauthors = George RA, Heringa J | title = SnapDRAGON: a method to delineate protein structural domains from sequence data | journal = Journal of Molecular Biology | volume = 316 | issue = 3 | pages = 839–51 | date = February 2002 | pmid = 11866536 | doi = 10.1006/jmbi.2001.5387 | citeseerx = 10.1.1.329.2921 }}{{cite journal | vauthors = George RA, Lin K, Heringa J | title = Scooby-domain: prediction of globular domains in protein sequence | journal = Nucleic Acids Research | volume = 33 | issue = Web Server issue | pages = W160-3 | date = July 2005 | pmid = 15980446 | pmc = 1160142 | doi = 10.1093/nar/gki381 }}
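One crude way to see the combinatorial saving is to count the residue pairs whose interactions must be sorted out during folding. The sketch assumes, purely for illustration, a hypothetical 300-residue protein made of three 100-residue domains, and it ignores the inter-domain contacts formed once the domains have folded; the helper names are hypothetical.

```python
def residue_pairs(n):
    """Number of distinct residue pairs in a segment of n residues."""
    return n * (n - 1) // 2


def pairs_if_domains_fold_independently(domain_lengths):
    """Pairs considered when each domain folds on its own
    (inter-domain pairs are deliberately ignored)."""
    return sum(residue_pairs(n) for n in domain_lengths)


whole_chain = residue_pairs(300)                              # 44850 pairs
split = pairs_if_domains_fold_independently([100, 100, 100])  # 14850 pairs
print(whole_chain, split)
```

Counting pairs understates the benefit, since the number of possible conformations grows exponentially rather than quadratically with chain length.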

However, the role of inter-domain interactions in protein folding and in the energetics of stabilisation of the native structure probably differs for each protein. In T4 lysozyme, the influence of one domain on the other is so strong that the entire molecule is resistant to proteolytic cleavage. In this case, folding is a sequential process in which the C-terminal domain must fold independently in an early step, and the other domain requires the presence of the folded C-terminal domain for folding and stabilisation.{{cite journal | vauthors = Desmadril M, Yon JM | title = Existence of intermediates in the refolding of T4 lysozyme at pH 7.4 | journal = Biochemical and Biophysical Research Communications | volume = 101 | issue = 2 | pages = 563–9 | date = July 1981 | pmid = 7306096 | doi = 10.1016/0006-291X(81)91296-1 }}

It has been found that an isolated domain can fold at the same rate as, or sometimes faster than, the integrated domain,{{cite journal | vauthors = Teale JM, Benjamin DC | title = Antibody as immunological probe for studying refolding of bovine serum albumin. Refolding within each domain | journal = The Journal of Biological Chemistry | volume = 252 | issue = 13 | pages = 4521–6 | date = July 1977 | doi = 10.1016/S0021-9258(17)40192-X | pmid = 873903 | doi-access = free }} suggesting that unfavourable interactions with the rest of the protein can occur during folding. Several arguments suggest that the slowest step in the folding of large proteins is the pairing of the folded domains. This is either because the domains are not folded entirely correctly or because the small adjustments required for their interaction, such as the removal of water from the domain interface, are energetically unfavourable.Creighton, T. E. (1983). Proteins: Structures and molecular properties. Freeman, New York. Second edition.

Domains and protein flexibility

Protein domain dynamics play a key role in a multitude of molecular recognition and signaling processes.

Protein domains, connected by intrinsically disordered flexible linker domains, induce long-range allostery via protein domain dynamics.

The resulting dynamic modes cannot generally be predicted from static structures of either the entire protein or individual domains. They can, however, be inferred by comparing different structures of a protein (as in the Database of Molecular Motions). They can also be suggested by sampling in extensive molecular dynamics trajectories{{cite journal | vauthors = Potestio R, Pontiggia F, Micheletti C | title = Coarse-grained description of protein internal dynamics: an optimal strategy for decomposing proteins in rigid subunits | journal = Biophysical Journal | volume = 96 | issue = 12 | pages = 4993–5002 | date = June 2009 | pmid = 19527659 | pmc = 2712024 | doi = 10.1016/j.bpj.2009.03.051 | bibcode = 2009BpJ....96.4993P }} and by principal component analysis,{{cite journal | vauthors = Baron R, Vellore NA | title = LSD1/CoREST is an allosteric nanoscale clamp regulated by H3-histone-tail molecular recognition | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 109 | issue = 31 | pages = 12509–14 | date = July 2012 | pmid = 22802671 | pmc = 3411975 | doi = 10.1073/pnas.1207892109 | bibcode = 2012PNAS..10912509B | doi-access = free }} or they can be directly observed using spectra measured by neutron spin echo spectroscopy.{{cite journal | vauthors = Farago B, Li J, Cornilescu G, Callaway DJ, Bu Z | title = Activation of nanoscale allosteric protein domain motion revealed by neutron spin echo spectroscopy | journal = Biophysical Journal | volume = 99 | issue = 10 | pages = 3473–82 | date = November 2010 | pmid = 21081097 | pmc = 2980739 | doi = 10.1016/j.bpj.2010.09.058 | bibcode = 2010BpJ....99.3473F }}{{cite journal | vauthors = Bu Z, Biehl R, Monkenbusch M, Richter D, Callaway DJ | title = Coupled protein domain motion in Taq polymerase revealed by neutron spin-echo spectroscopy | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 102 | issue = 49 | pages = 17646–51 | date = December 2005 | pmid = 16306270 | pmc = 1345721 | doi = 10.1073/pnas.0503388102 | bibcode = 2005PNAS..10217646B | doi-access = free }}
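Principal component analysis of a trajectory can be sketched in a few lines. The example below is a toy: it assumes the frames have already been superimposed on a common reference, uses a synthetic six-atom "trajectory" in which a hypothetical second domain (atoms 3 to 5) moves as a block, and finds only the dominant mode by power iteration instead of a full eigendecomposition. Function names are illustrative.

```python
import math
import random


def covariance_matrix(frames):
    """Covariance of flattened coordinates; frames are assumed pre-aligned."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[i] for f in frames) / n for i in range(dim)]
    centered = [[f[i] - mean[i] for i in range(dim)] for f in frames]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
            for i in range(dim)]


def first_principal_component(cov, iterations=200):
    """Dominant eigenvector of a symmetric matrix by power iteration."""
    dim = len(cov)
    v = [1.0] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def per_atom_weight(v):
    """Share of the (unit-length) mode carried by each atom; sums to 1."""
    return [v[3 * a] ** 2 + v[3 * a + 1] ** 2 + v[3 * a + 2] ** 2
            for a in range(len(v) // 3)]


# Synthetic trajectory: atoms 0-2 stay put, atoms 3-5 slide together along x.
rng = random.Random(0)
reference = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (5, 0, 0), (6, 0, 0), (5, 1, 0)]
frames = []
for t in range(200):
    shift = 2.0 * math.sin(0.1 * t)
    frame = []
    for atom, (x, y, z) in enumerate(reference):
        dx = shift if atom >= 3 else 0.0
        frame += [x + dx + rng.gauss(0, 0.01),
                  y + rng.gauss(0, 0.01),
                  z + rng.gauss(0, 0.01)]
    frames.append(frame)

weights = per_atom_weight(first_principal_component(covariance_matrix(frames)))
print([round(w, 2) for w in weights])
```

The first component should be carried almost entirely by atoms 3 to 5, i.e. it picks out the block that moves as a rigid unit. Real analyses would use many more atoms, careful superposition and a proper eigensolver.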

Domain definition from structural co-ordinates

The importance of domains as structural building blocks and elements of evolution has brought about many automated methods for their identification and classification in proteins of known structure. Automatic procedures for reliable domain assignment are essential for generating domain databases, especially as the number of known protein structures increases. Although the boundaries of a domain can be determined by visual inspection, constructing an automated method is not straightforward. Problems occur with domains that are discontinuous or highly associated.{{cite journal | vauthors = Sowdhamini R, Blundell TL | title = An automatic method involving cluster analysis of secondary structures for the identification of domains in proteins | journal = Protein Science | volume = 4 | issue = 3 | pages = 506–20 | date = March 1995 | pmid = 7795532 | pmc = 2143076 | doi = 10.1002/pro.5560040317 }} The lack of a standard definition of what a domain really is has meant that domain assignments have varied enormously, with each researcher using a unique set of criteria.{{cite journal | vauthors = Swindells MB | title = A procedure for detecting structural domains in proteins | journal = Protein Science | volume = 4 | issue = 1 | pages = 103–12 | date = January 1995 | pmid = 7773168 | pmc = 2142966 | doi = 10.1002/pro.5560040113 }}

A structural domain is a compact, globular sub-structure with more interactions within it than with the rest of the protein.{{cite journal | vauthors = Janin J, Wodak SJ | title = Structural domains in proteins and their role in the dynamics of protein function | journal = Progress in Biophysics and Molecular Biology | volume = 42 | issue = 1 | pages = 21–78 | year = 1983 | pmid = 6353481 | doi = 10.1016/0079-6107(83)90003-2 | doi-access = free }}

Therefore, a structural domain can be determined by two visual characteristics: its compactness and its extent of isolation.{{cite journal | vauthors = Tsai CJ, Nussinov R | title = Hydrophobic folding units derived from dissimilar monomer structures and their interactions | journal = Protein Science | volume = 6 | issue = 1 | pages = 24–42 | date = January 1997 | pmid = 9007974 | pmc = 2143523 | doi = 10.1002/pro.5560060104 }} Measures of local compactness in proteins have been used in many of the early methods of domain assignment{{cite journal | vauthors = Crippen GM | title = The tree structural organization of proteins | journal = Journal of Molecular Biology | volume = 126 | issue = 3 | pages = 315–32 | date = December 1978 | pmid = 745231 | doi = 10.1016/0022-2836(78)90043-8 }}{{cite journal | vauthors = Rossmann MG, Moras D, Olsen KW | title = Chemical and biological evolution of nucleotide-binding protein | journal = Nature | volume = 250 | issue = 463 | pages = 194–9 | date = July 1974 | pmid = 4368490 | doi = 10.1038/250194a0 | bibcode = 1974Natur.250..194R | s2cid = 4273028 }}{{cite journal | vauthors = Rose GD | title = Hierarchic organization of domains in globular proteins | journal = Journal of Molecular Biology | volume = 134 | issue = 3 | pages = 447–70 | date = November 1979 | pmid = 537072 | doi = 10.1016/0022-2836(79)90363-2 }}{{cite journal | vauthors = Go N, Taketomi H | title = Respective roles of short- and long-range interactions in protein folding | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 75 | issue = 2 | pages = 559–63 | date = February 1978 | pmid = 273218 | pmc = 411294 | doi = 10.1073/pnas.75.2.559 | bibcode = 1978PNAS...75..559G | doi-access = free }} and in several of the more recent methods.{{cite journal | vauthors = Islam SA, Luo J, Sternberg MJ | title = Identification and analysis of domains in proteins | journal = Protein Engineering | volume = 8 | issue = 6 | pages = 513–25 | date = June 1995 | pmid = 8532675 | doi = 10.1093/protein/8.6.513 }}{{cite journal | vauthors = Holm L, Sander C | title = Dali/FSSP classification of three-dimensional protein folds | journal = Nucleic Acids Research | volume = 25 | issue = 1 | pages = 231–4 | date = January 1997 | pmid = 9016542 | pmc = 146389 | doi = 10.1093/nar/25.1.231 }}{{cite journal | vauthors = Siddiqui AS, Barton GJ | title = Continuous and discontinuous domains: an algorithm for the automatic generation of reliable protein domain definitions | journal = Protein Science | volume = 4 | issue = 5 | pages = 872–84 | date = May 1995 | pmid = 7663343 | pmc = 2143117 | doi = 10.1002/pro.5560040507 }}{{cite journal | vauthors = Zehfus MH | title = Identification of compact, hydrophobically stabilized domains and modules containing multiple peptide chains | journal = Protein Science | volume = 6 | issue = 6 | pages = 1210–9 | date = June 1997 | pmid = 9194181 | pmc = 2143719 | doi = 10.1002/pro.5560060609 }}{{cite journal | vauthors = Taylor WR | title = Protein structural domain identification | journal = Protein Engineering | volume = 12 | issue = 3 | pages = 203–16 | date = March 1999 | pmid = 10235621 | doi = 10.1093/protein/12.3.203 | doi-access = free }}

=Methods=

One of the first algorithms used a Cα-Cα distance map together with a hierarchical clustering routine that treated a protein as a series of small segments, each 10 residues long. The initial segments were clustered one after another based on inter-segment distances; segments with the shortest distances were clustered and considered as single segments thereafter. The stepwise clustering finally included the full protein. Go also exploited the fact that inter-domain distances are normally larger than intra-domain distances; all possible Cα-Cα distances were represented as diagonal plots in which there were distinct patterns for helices, extended strands and combinations of secondary structures.{{cn|date=November 2023}}
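The segment-clustering idea can be sketched as follows. This is not a reconstruction of the original algorithm: the sketch assumes that the distance between two clusters is the mean Cα-Cα distance over all residue pairs drawn from them (average linkage), it stops merging once a requested number of clusters remains instead of continuing to the full protein, and the coordinates and the function name are hypothetical.

```python
import math
from itertools import combinations


def segment_clusters(ca_coords, segment_length=10, n_domains=2):
    """Agglomerative clustering of fixed-length chain segments.

    ca_coords: list of (x, y, z) C-alpha positions in chain order.
    Returns clusters as sorted lists of segment indices.
    """
    segments = [list(range(i, min(i + segment_length, len(ca_coords))))
                for i in range(0, len(ca_coords), segment_length)]
    clusters = [[s] for s in range(len(segments))]

    def distance(a, b):
        # Average linkage: mean C-alpha distance over all residue pairs across a and b.
        res_a = [r for s in a for r in segments[s]]
        res_b = [r for s in b for r in segments[s]]
        total = sum(math.dist(ca_coords[i], ca_coords[j])
                    for i in res_a for j in res_b)
        return total / (len(res_a) * len(res_b))

    while len(clusters) > n_domains:
        # Merge the closest pair; the merged cluster is one unit from now on.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: distance(clusters[p[0]], clusters[p[1]]))
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(clusters)


# Synthetic 40-residue chain: segments 0-1 pack together, as do segments 2-3.
anchors = [(0, 0, 0), (4, 0, 0), (50, 0, 0), (54, 0, 0)]
coords = [(anchors[i // 10][0], anchors[i // 10][1] + 0.5 * (i % 10), 0.0)
          for i in range(40)]
print(segment_clusters(coords))  # [[0, 1], [2, 3]]
```

Recording the order of merges instead of stopping early would give the full hierarchy described above, from which domains can be read off at a chosen level.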

The method of Sowdhamini and Blundell clusters secondary structures in a protein based on their Cα-Cα distances and identifies domains from the pattern in their dendrograms. As the procedure does not consider the protein as a continuous chain of amino acids, discontinuous domains pose no problem. Specific nodes in these dendrograms are identified as tertiary structural clusters of the protein; these include both super-secondary structures and domains.

The DOMAK algorithm is used to create the 3Dee domain database. It calculates a 'split value' from the number of each type of contact when the protein is divided arbitrarily into two parts. This split value is large when the two parts of the structure are distinct.{{cn|date=November 2023}}
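A simplified version of the split-value idea is sketched below. Several details are assumptions made for illustration: contacts are Cα pairs within a hypothetical 6.5 Å cutoff, only single cuts of the chain are tried, and the split value is taken as (int_A / ext) × (int_B / ext), where int counts contacts inside a part and ext counts contacts between the parts. DOMAK itself also handles discontinuous domains and distinguishes contact types; the function names are hypothetical.

```python
import math


def contact_pairs(coords, cutoff):
    """All residue pairs (i, j), i < j, whose C-alpha atoms lie within cutoff."""
    n = len(coords)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(coords[i], coords[j]) <= cutoff]


def split_value(contacts, k):
    """Split value for A = residues < k and B = residues >= k."""
    int_a = sum(1 for i, j in contacts if j < k)   # both residues in A
    int_b = sum(1 for i, j in contacts if i >= k)  # both residues in B
    ext = len(contacts) - int_a - int_b            # contacts across the cut
    if ext == 0:
        return math.inf  # the two parts do not touch at all
    return (int_a / ext) * (int_b / ext)


def best_single_cut(coords, cutoff=6.5):
    contacts = contact_pairs(coords, cutoff)
    return max(range(1, len(coords)), key=lambda k: split_value(contacts, k))


# Synthetic chain: residues 0-9 and 10-19 are spaced 1 Å apart,
# with a 6 Å gap between the two groups.
coords = [(float(i if i < 10 else i + 5), 0.0, 0.0) for i in range(20)]
print(best_single_cut(coords))  # 10
```

The cut between residues 9 and 10 leaves one contact across the interface and 39 inside each part, so it scores far higher than any neighbouring cut.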

The method of Wodak and Janin{{cite journal | vauthors = Wodak SJ, Janin J | title = Location of structural domains in protein | journal = Biochemistry | volume = 20 | issue = 23 | pages = 6544–52 | date = November 1981 | pmid = 7306523 | doi = 10.1021/bi00526a005 }} was based on the interface areas calculated between two chain segments repeatedly cleaved at various residue positions. Interface areas were calculated by comparing the surface areas of the cleaved segments with that of the native structure. Potential domain boundaries were identified at sites where the interface area was at a minimum. Other methods have used measures of solvent accessibility to calculate compactness.Rashin, 1985{{full citation needed|date=November 2021}}{{cite journal | vauthors = Zehfus MH, Rose GD | title = Compact units in proteins | journal = Biochemistry | volume = 25 | issue = 19 | pages = 5759–65 | date = September 1986 | pmid = 3778881 | doi = 10.1021/bi00367a062 }}
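For a single cut after residue k of an N-residue chain, the comparison of surface areas can be written as below. Here A denotes the solvent-accessible surface area of the indicated segment in its native-structure conformation. Restricting the scan to one cut is a simplification, and halving the buried area to obtain an interface area is a common convention assumed here rather than a detail taken from the source.

```latex
I(k) = \frac{1}{2}\left[ A_{1\ldots k} + A_{k+1\ldots N} - A_{1\ldots N} \right],
\qquad
k^{*} = \arg\min_{k} I(k)
```

A small I(k*) means that the two segments bury little surface against each other, which marks a candidate domain boundary.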

The PUU algorithm incorporates a harmonic model used to approximate inter-domain dynamics. The underlying physical concept is that many rigid interactions will occur within each domain and loose interactions will occur between domains. This algorithm is used to define domains in the FSSP domain database.

Swindells (1995) developed a method, DETECTIVE, for identification of domains in protein structures based on the idea that domains have a hydrophobic interior. Deficiencies were found to occur when hydrophobic cores from different domains continue through the interface region.

[http://rigidfinder.molmovdb.org RigidFinder] is a method for identifying protein rigid blocks (domains and loops) from two different conformations. Rigid blocks are defined as blocks in which all inter-residue distances are conserved across conformations.
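The rigid-block definition translates directly into a test on pairwise distances. The greedy grouping below is a simplified illustration, not RigidFinder's own algorithm; the tolerance of 0.5 (in the coordinate units, e.g. Å), the function name and the coordinates are hypothetical.

```python
import math


def rigid_blocks(conf_a, conf_b, tol=0.5):
    """Greedily group residues whose mutual distances agree between two conformations.

    conf_a, conf_b: lists of (x, y, z) positions for the same residues, same order.
    A residue joins a block only if its distance to every current member
    changes by at most tol.
    """
    def change(i, j):
        return abs(math.dist(conf_a[i], conf_a[j]) - math.dist(conf_b[i], conf_b[j]))

    unassigned = list(range(len(conf_a)))
    blocks = []
    while unassigned:
        block = [unassigned.pop(0)]  # seed a new block with the first free residue
        for r in list(unassigned):   # iterate over a copy while removing
            if all(change(r, m) <= tol for m in block):
                block.append(r)
                unassigned.remove(r)
        blocks.append(block)
    return blocks


# Two conformations: residues 4-7 move 5 units along y, residues 0-3 stay put.
block1 = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
block2 = [(3, 0, 0), (4, 0, 0), (3, 1, 0), (3, 0, 1)]
conf_a = block1 + block2
conf_b = block1 + [(x, y + 5, z) for x, y, z in block2]
print(rigid_blocks(conf_a, conf_b))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Because the test uses only internal distances, no superposition of the two conformations is needed; a greedy pass is order-dependent, though, so a production method would need a more careful search.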

The method [http://ribfind.ismb.lon.ac.uk RIBFIND] developed by Pandurangan and Topf identifies rigid bodies in protein structures by performing spatial clustering of secondary structural elements in proteins.{{cite journal | vauthors = Pandurangan AP, Topf M | title = RIBFIND: a web server for identifying rigid bodies in protein structures and to aid flexible fitting into cryo EM maps | journal = Bioinformatics | volume = 28 | issue = 18 | pages = 2391–3 | date = September 2012 | pmid = 22796953 | doi = 10.1093/bioinformatics/bts446 | url = https://eprints.bbk.ac.uk/4980/1/4980.pdf }} The RIBFIND rigid bodies have been used to flexibly fit protein structures into cryo electron microscopy density maps.{{cite journal | vauthors = Pandurangan AP, Topf M | title = Finding rigid bodies in protein structures: Application to flexible fitting into cryoEM maps | journal = Journal of Structural Biology | volume = 177 | issue = 2 | pages = 520–31 | date = February 2012 | pmid = 22079400 | doi = 10.1016/j.jsb.2011.10.011 }}

A general method to identify dynamical domains, that is, protein regions that behave approximately as rigid units in the course of structural fluctuations, has been introduced by Potestio et al. and, among other applications, was also used to compare the consistency of the dynamics-based domain subdivisions with standard structure-based ones. The method, termed [https://web.archive.org/web/20171204052439/http://pisqrd.escience-lab.org/ PiSQRD], is publicly available in the form of a webserver.{{cite journal | vauthors = Aleksiev T, Potestio R, Pontiggia F, Cozzini S, Micheletti C | title = PiSQRD: a web server for decomposing proteins into quasi-rigid dynamical domains | journal = Bioinformatics | volume = 25 | issue = 20 | pages = 2743–4 | date = October 2009 | pmid = 19696046 | doi = 10.1093/bioinformatics/btp512 | s2cid = 28106759 | doi-access = free }} The server allows users to optimally subdivide single-chain or multimeric proteins into quasi-rigid domains based on the collective modes of fluctuation of the system. By default, these modes are calculated through an elastic network model;Micheletti, C., Carloni, P. and Maritan, A. Accurate and efficient description of protein vibrational dynamics: comparing molecular dynamics and gaussian models, Proteins, 55, 635, 2004. alternatively, pre-calculated essential dynamical spaces can be uploaded by the user.
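The elastic-network idea can be illustrated with a Gaussian network model. The sketch builds the Kirchhoff (contact) matrix from Cα coordinates using an assumed 6.5 Å cutoff, finds its slowest non-trivial mode by power iteration, and splits the chain by the sign of that mode. This two-way sign split is a simplified stand-in chosen for illustration; it is not PiSQRD's optimisation procedure, and the function names and coordinates are hypothetical.

```python
import math


def kirchhoff(coords, cutoff):
    """GNM Kirchhoff matrix: -1 for each contact, node degree on the diagonal."""
    n = len(coords)
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                g[i][j] = g[j][i] = -1.0
                g[i][i] += 1.0
                g[j][j] += 1.0
    return g


def slowest_mode(g, iterations=2000):
    """Eigenvector of the smallest non-zero eigenvalue of g (the Fiedler vector).

    Power iteration on c*I - g, with c above the largest eigenvalue of g; the
    uniform vector (eigenvalue 0, a rigid-body translation) is projected out
    at every step.
    """
    n = len(g)
    c = 2.0 * max(g[i][i] for i in range(n)) + 1.0  # Gershgorin bound for g
    v = [i - (n - 1) / 2 for i in range(n)]  # start vector with zero mean
    for _ in range(iterations):
        w = [c * v[i] - sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
        mean = sum(w) / n
        w = [x - mean for x in w]  # remove the trivial zero mode
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def two_dynamical_domains(coords, cutoff=6.5):
    """Label each residue 0 or 1 by the sign of the slowest GNM mode."""
    mode = slowest_mode(kirchhoff(coords, cutoff))
    return [0 if x < 0 else 1 for x in mode]


# Synthetic chain: two tightly packed groups joined by a single contact.
coords = [(float(i if i < 10 else i + 5), 0.0, 0.0) for i in range(20)]
print(two_dynamical_domains(coords))
```

Residues 0 to 9 and 10 to 19 should receive opposite labels, since the slowest mode describes the two groups moving against each other about the single weak contact that joins them.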

= Example domains =

Armadillo repeats: named after the β-catenin-like Armadillo protein of the fruit fly Drosophila melanogaster.
Basic leucine zipper domain (bZIP domain): found in many DNA-binding eukaryotic proteins. The domain contains a basic region that mediates sequence-specific DNA binding and a leucine zipper that is required for the dimerization of two DNA-binding regions. The DNA-binding region comprises a number of basic amino acids such as arginine and lysine.
Cadherin repeats: Cadherins function as Ca²⁺-dependent cell–cell adhesion proteins. Cadherin domains are extracellular regions which mediate cell-to-cell homophilic binding between cadherins on the surface of adjacent cells.
Death effector domain (DED): allows protein–protein binding by homotypic interactions (DED-DED). Caspase proteases trigger apoptosis via proteolytic cascades. Pro-caspase-8 and pro-caspase-9 bind to specific adaptor molecules via DED domains, which leads to autoactivation of caspases.
EF hand: a helix-turn-helix structural motif found in each structural domain of the signaling protein calmodulin and in the muscle protein troponin-C.
Foldon domain: A small protein domain from fibritin in T4 bacteriophage that can cause proteins to trimerize.
Immunoglobulin-like domains: found in proteins of the immunoglobulin superfamily (IgSF).{{cite journal | vauthors = Barclay AN | title = Membrane proteins with immunoglobulin-like domains--a master superfamily of interaction molecules | journal = Seminars in Immunology | volume = 15 | issue = 4 | pages = 215–23 | date = August 2003 | pmid = 14690046 | doi = 10.1016/S1044-5323(03)00047-2 }} They contain about 70–110 amino acids and are classified into different categories (IgV, IgC1, IgC2 and IgI) according to their size and function. They possess a characteristic fold in which two beta sheets form a "sandwich" that is stabilized by interactions between conserved cysteines and other charged amino acids. They are important for protein–protein interactions in processes of cell adhesion, cell activation, and molecular recognition. These domains are commonly found in molecules with roles in the immune system.
Phosphotyrosine-binding domain (PTB): PTB domains usually bind to phosphorylated tyrosine residues. They are often found in signal transduction proteins. PTB-domain binding specificity is determined by residues to the amino-terminal side of the phosphotyrosine. Examples: the PTB domains of both SHC and IRS-1 bind to a NPXpY sequence. PTB-containing proteins such as SHC and IRS-1 are important for insulin responses of human cells.
Pleckstrin homology domain (PH): PH domains bind phosphoinositides with high affinity. Specificity for PtdIns(3)P, PtdIns(4)P, PtdIns(3,4)P2, PtdIns(4,5)P2, and PtdIns(3,4,5)P3 has been observed. Because phosphoinositides are sequestered in various cell membranes (owing to their long lipophilic tails), PH domains usually cause recruitment of the protein in question to a membrane, where the protein can exert a particular function in cell signalling, cytoskeletal reorganization or membrane trafficking.
Src homology 2 domain (SH2): SH2 domains are often found in signal transduction proteins. SH2 domains confer binding to phosphorylated tyrosine (pTyr). Named after the phosphotyrosine binding domain of the src viral oncogene, which is itself a tyrosine kinase. See also: SH3 domain.
Zinc finger DNA-binding domain (ZnF_GATA): ZnF_GATA domain-containing proteins are typically transcription factors that usually bind to the DNA sequence [AT]GATA[AG] of promoters.
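Because the ZnF_GATA target site above is written in regular-expression-like notation, a scan for candidate sites is easy to sketch. The example searches both strands of a made-up sequence; the function name is hypothetical, and a sequence match only marks a candidate site, not demonstrated binding.

```python
import re

# Lookahead so that overlapping matches are all reported.
GATA_MOTIF = re.compile(r"(?=([AT]GATA[AG]))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")
MOTIF_LEN = 6


def find_gata_sites(seq):
    """Return (start, strand) for every [AT]GATA[AG] match on either strand.

    Starts are 0-based positions on the given (+) strand.
    """
    seq = seq.upper()
    hits = [(m.start(), "+") for m in GATA_MOTIF.finditer(seq)]
    rc = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    # Map a match on the reverse complement back to + strand coordinates.
    hits += [(len(seq) - m.start() - MOTIF_LEN, "-") for m in GATA_MOTIF.finditer(rc)]
    return sorted(hits)


print(find_gata_sites("CCAGATAAGGTTATCTGG"))  # [(2, '+'), (10, '-')]
```

The minus-strand hit at position 10 is the sequence TTATCT, whose reverse complement AGATAA matches the motif.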

Domains of unknown function

A large fraction of domains are of unknown function. A domain of unknown function (DUF) is a protein domain that has no characterized function. These families have been collected together in the Pfam database using the prefix DUF followed by a number, with examples being DUF2992 and DUF1220. As of 2010, there were over 3,000 DUF families within the Pfam database, representing over 20% of known families.{{cite journal | vauthors = Bateman A, Coggill P, Finn RD | title = DUFs: families in search of function | journal = Acta Crystallographica. Section F, Structural Biology and Crystallization Communications | volume = 66 | issue = Pt 10 | pages = 1148–52 | date = October 2010 | pmid = 20944204 | pmc = 2954198 | doi = 10.1107/S1744309110001685 }} Surprisingly, the proportion of DUFs in Pfam increased from 20% (in 2010) to 22% (in 2019), mostly because of the increasing number of new genome sequences. Pfam release 32.0 (2019) contained 3,961 DUFs.{{cite journal | vauthors = El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, Qureshi M, Richardson LJ, Salazar GA, Smart A, Sonnhammer EL, Hirsh L, Paladin L, Piovesan D, Tosatto SC, Finn RD | display-authors = 6 | title = The Pfam protein families database in 2019 | journal = Nucleic Acids Research | volume = 47 | issue = D1 | pages = D427–D432 | date = January 2019 | pmid = 30357350 | pmc = 6324024 | doi = 10.1093/nar/gky995 }}

References

George, R. A. (2002) "Predicting Structural Domains in Proteins" Thesis, University College London (contributed by its author).

Key papers

{{cite journal | vauthors = Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE | display-authors = 6 | title = The Protein Data Bank | journal = Nucleic Acids Research | volume = 28 | issue = 1 | pages = 235–42 | date = January 2000 | pmid = 10592235 | pmc = 102472 | doi = 10.1093/nar/28.1.235 }}
{{cite book | last1 = Tooze | first1 = John | last2 = Brändén | first2 = Carl-Ivar | name-list-style = vanc | title = Introduction to protein structure | publisher = Garland Pub | location = New York | year = 1999 | isbn = 978-0-8153-2305-1 }}
{{cite journal | vauthors = Das S, Smith TF | title = Identifying nature's protein Lego set | journal = Advances in Protein Chemistry | volume = 54 | pages = 159–83 | year = 2000 | pmid = 10829228 | doi = 10.1016/S0065-3233(00)54006-6 | isbn = 978-0-12-034254-9 }}
{{cite journal | vauthors = Dietmann S, Park J, Notredame C, Heger A, Lappe M, Holm L | title = A fully automatic evolutionary classification of protein folds: Dali Domain Dictionary version 3 | journal = Nucleic Acids Research | volume = 29 | issue = 1 | pages = 55–7 | date = January 2001 | pmid = 11125048 | pmc = 29815 | doi = 10.1093/nar/29.1.55 }}
{{cite journal | vauthors = Dyson HJ, Sayre JR, Merutka G, Shin HC, Lerner RA, Wright PE | title = Folding of peptide fragments comprising the complete sequence of proteins. Models for initiation of protein folding. II. Plastocyanin | journal = Journal of Molecular Biology | volume = 226 | issue = 3 | pages = 819–35 | date = August 1992 | pmid = 1507228 | doi = 10.1016/0022-2836(92)90634-V | author-link1 = Jane Dyson }}
{{cite journal | vauthors = Fersht AR | title = Nucleation mechanisms in protein folding | journal = Current Opinion in Structural Biology | volume = 7 | issue = 1 | pages = 3–9 | date = February 1997 | pmid = 9032066 | doi = 10.1016/S0959-440X(97)80002-4 | author-link = Alan Fersht }}
{{cite book | vauthors = George DG, Hunt LT, Barker WC | title = PIR-International Protein Sequence Database | chapter = [3] PIR-International protein sequence database | series = Methods in Enzymology | volume = 266 | pages = 41–59 | year = 1996 | pmid = 8743676 | pmc = 145575 | doi = 10.1016/S0076-6879(96)66005-4 | isbn = 978-0-12-182167-8 }}
{{cite journal | vauthors = Go M | title = Correlation of DNA exonic regions with protein structural units in haemoglobin | journal = Nature | volume = 291 | issue = 5810 | pages = 90–2 | date = May 1981 | pmid = 7231530 | doi = 10.1038/291090a0 | bibcode = 1981Natur.291...90G | s2cid = 4313732 }}
{{cite journal | vauthors = Hadley C, Jones DT | title = A systematic comparison of protein structure classifications: SCOP, CATH and FSSP | journal = Structure | volume = 7 | issue = 9 | pages = 1099–112 | date = September 1999 | pmid = 10508779 | doi = 10.1016/S0969-2126(99)80177-4 | doi-access = free }}
{{cite journal | vauthors = Hayward S | title = Structural principles governing domain motions in proteins | journal = Proteins | volume = 36 | issue = 4 | pages = 425–35 | date = September 1999 | pmid = 10450084 | doi = 10.1002/(SICI)1097-0134(19990901)36:4<425::AID-PROT6>3.0.CO;2-S | s2cid = 29808315 }}
{{cite journal | vauthors = Heringa J, Argos P | title = Side-chain clusters in protein structures and their role in protein folding | journal = Journal of Molecular Biology | volume = 220 | issue = 1 | pages = 151–71 | date = July 1991 | pmid = 2067014 | doi = 10.1016/0022-2836(91)90388-M }}
{{cite journal | vauthors = Honig B | title = Protein folding: from the levinthal paradox to structure prediction | journal = Journal of Molecular Biology | volume = 293 | issue = 2 | pages = 283–93 | date = October 1999 | pmid = 10550209 | doi = 10.1006/jmbi.1999.3006 | citeseerx = 10.1.1.332.955 }}
{{cite journal | vauthors = Kim PS, Baldwin RL | title = Intermediates in the folding reactions of small proteins | journal = Annual Review of Biochemistry | volume = 59 | issue = 1 | pages = 631–60 | year = 1990 | pmid = 2197986 | doi = 10.1146/annurev.bi.59.070190.003215 }}
{{cite journal | vauthors = Murvai J, Vlahovicek K, Barta E, Cataletto B, Pongor S | title = The SBASE protein domain library, release 7.0: a collection of annotated protein sequence segments | journal = Nucleic Acids Research | volume = 28 | issue = 1 | pages = 260–2 | date = January 2000 | pmid = 10592241 | pmc = 102474 | doi = 10.1093/nar/28.1.260 }}
{{cite journal | vauthors = Murzin AG, Brenner SE, Hubbard T, Chothia C | title = SCOP: a structural classification of proteins database for the investigation of sequences and structures | journal = Journal of Molecular Biology | volume = 247 | issue = 4 | pages = 536–40 | date = April 1995 | pmid = 7723011 | doi = 10.1016/S0022-2836(05)80134-2 | url = http://scop.mrc-lmb.cam.ac.uk/scop/ref/1995-jmb-scop.pdf | url-status = dead | archive-url = https://web.archive.org/web/20120426170732/http://scop.mrc-lmb.cam.ac.uk/scop/ref/1995-jmb-scop.pdf | df = dmy-all | archive-date = 26 April 2012 | author-link4 = Cyrus Chothia | author-link2 = Steven E. Brenner | author-link3 = Tim Hubbard }}
{{cite book | vauthors = Janin J, Chothia C | title = Diffraction Methods for Biological Macromolecules Part B | chapter = Domains in proteins: definitions, location, and structural principles | series = Methods in Enzymology | volume = 115 | pages = 420–30 | year = 1985 | pmid = 4079796 | doi = 10.1016/0076-6879(85)15030-5 | chapter-url = https://archive.org/details/diffractionmetho0000unse/page/420 | isbn = 978-0-12-182015-2 }}
{{cite journal | vauthors = Schultz J, Copley RR, Doerks T, Ponting CP, Bork P | title = SMART: a web-based tool for the study of genetically mobile domains | journal = Nucleic Acids Research | volume = 28 | issue = 1 | pages = 231–4 | date = January 2000 | pmid = 10592234 | pmc = 102444 | doi = 10.1093/nar/28.1.231 }}
{{cite journal | vauthors = Siddiqui AS, Dengler U, Barton GJ | title = 3Dee: a database of protein structural domains | journal = Bioinformatics | volume = 17 | issue = 2 | pages = 200–1 | date = February 2001 | pmid = 11238081 | doi = 10.1093/bioinformatics/17.2.200 }}
{{cite journal | vauthors = Srinivasarao GY, Yeh LS, Marzec CR, Orcutt BC, Barker WC, Pfeiffer F | title = Database of protein sequence alignments: PIR-ALN | journal = Nucleic Acids Research | volume = 27 | issue = 1 | pages = 284–5 | date = January 1999 | pmid = 9847202 | pmc = 148157 | doi = 10.1093/nar/27.1.284 }}
{{cite journal | vauthors = Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS, Kiryutin B, Galperin MY, Fedorova ND, Koonin EV | display-authors = 6 | title = The COG database: new developments in phylogenetic classification of proteins from complete genomes | journal = Nucleic Acids Research | volume = 29 | issue = 1 | pages = 22–8 | date = January 2001 | pmid = 11125040 | pmc = 29819 | doi = 10.1093/nar/29.1.22 }}
{{cite journal | vauthors = Taylor WR, Orengo CA | title = Protein structure alignment | journal = Journal of Molecular Biology | volume = 208 | issue = 1 | pages = 1–22 | date = July 1989 | pmid = 2769748 | doi = 10.1016/0022-2836(89)90084-3 }}
{{cite journal | vauthors = Yang AS, Honig B | title = Free energy determinants of secondary structure formation: I. alpha-Helices | journal = Journal of Molecular Biology | volume = 252 | issue = 3 | pages = 351–65 | date = September 1995 | pmid = 7563056 | doi = 10.1006/jmbi.1995.0502 | doi-access = free }}
{{cite journal | vauthors = Yang AS, Honig B | title = Free energy determinants of secondary structure formation: II. Antiparallel beta-sheets | journal = Journal of Molecular Biology | volume = 252 | issue = 3 | pages = 366–76 | date = September 1995 | pmid = 7563057 | doi = 10.1006/jmbi.1995.0503 | doi-access = free }}
{{cite journal | vauthors = Gough J, Chothia C | title = SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments | journal = Nucleic Acids Research | volume = 30 | issue = 1 | pages = 268–72 | date = January 2002 | pmid = 11752312 | pmc = 99153 | doi = 10.1093/nar/30.1.268 | author-link1 = Julian Gough (scientist) | author-link2 = Cyrus Chothia }}

External links

=Structural domain databases=

[https://www.ncbi.nlm.nih.gov/books/NBK21095/#A110 Conserved Domains at the National Center for Biotechnology Information website]
[http://www.compbio.dundee.ac.uk/3Dee/ 3Dee]
[http://www.cathdb.info/ CATH]
[http://ekhidna.biocenter.helsinki.fi/dali_server/ DALI]
{{webarchive |url=https://web.archive.org/web/20060911185355/http://realm.sdsc.edu/pdomains/ |title=Definition and assignment of structural domains in proteins |date=2006-09-11}}
[http://pfam.xfam.org/clan/browse PFAM clan browser]

=Sequence domain databases=

[http://www.ebi.ac.uk/interpro InterPro]
{{webarchive |url=http://webarchive.loc.gov/all/20110506030957/http%3A//pfam.sanger.ac.uk/ |title=Pfam |date=2011-05-06}}
[http://www.expasy.org/prosite/ PROSITE]
[http://prodom.prabi.fr ProDom]{{Dead link|date=May 2020 |bot=InternetArchiveBot |fix-attempted=yes }}
[http://SMART.embl-heidelberg.de SMART]
[https://www.ncbi.nlm.nih.gov/cdd NCBI Conserved Domain Database]
[http://supfam.org/SUPERFAMILY SUPERFAMILY] A library of hidden Markov models (HMMs) representing superfamilies, and a database of superfamily- and family-level annotations for all completely sequenced organisms.

=Functional domain databases=

[http://supfam.org/SUPERFAMILY/dcGO dcGO] A comprehensive database of domain-centric ontologies on functions, phenotypes and diseases.

Category:Protein structure

Category:Protein families

Category:Protein superfamilies