DNA-binding domain

{{Short description|Self-stabilizing region of a protein that binds to specific DNA sequences}}

A DNA-binding domain (DBD) is an independently folded protein domain that contains at least one structural motif that recognizes double- or single-stranded DNA. A DBD can recognize a specific DNA sequence (a recognition sequence) or have a general affinity to DNA.{{cite book | last = Lilley | first = David M. J. | name-list-style = vanc |title= DNA-protein: structural interactions |publisher= IRL Press at Oxford University Press |location= Oxford |year= 1995 |isbn= 0-19-963453-X }} Some DNA-binding domains may also include nucleic acids in their folded structure.

Function

File:LacI Dimer Structure Annotated.png: the DNA-binding domain of the Lac repressor (LacI) dimer is regulated by a C-terminal regulatory domain (labeled). The regulatory domain binds an allosteric effector molecule (green). The allosteric response of the protein is communicated from the regulatory domain to the DNA-binding domain through the linker region.{{cite journal | vauthors = Swint-Kruse L, Matthews KS | title = Allostery in the LacI/GalR family: variations on a theme | journal = Current Opinion in Microbiology | volume = 12 | issue = 2 | pages = 129–37 | date = April 2009 | pmid = 19269243 | pmc = 2688824 | doi = 10.1016/j.mib.2009.01.009 }}

One or more DNA-binding domains are often part of a larger protein that also contains other domains with different functions. These extra domains often regulate the activity of the DNA-binding domain. DNA binding serves either a structural role or a role in transcription regulation, and the two roles sometimes overlap.{{citation needed|date=September 2024}}

DNA-binding domains with functions involving DNA structure have biological roles in DNA replication, repair, storage, and modification, such as methylation.{{citation needed|date=September 2024}}

Many proteins involved in the regulation of gene expression contain DNA-binding domains. For example, proteins that regulate transcription by binding DNA are called transcription factors. The final output of most cellular signaling cascades is gene regulation.{{citation needed|date=September 2024}}

The DBD interacts with the nucleotides of DNA in a sequence-specific or non-sequence-specific manner, but even non-sequence-specific recognition involves some molecular complementarity between protein and DNA. DNA recognition by the DBD can occur at the major or minor groove of DNA, or at the sugar-phosphate backbone (see the structure of DNA). Each type of DNA recognition is tailored to the protein's function. For example, the DNA-cutting enzyme DNase I cuts DNA almost randomly and so must bind DNA in a non-sequence-specific manner. Even so, DNase I recognizes a certain 3-D DNA structure, yielding a somewhat specific cleavage pattern that is useful for studying DNA recognition by a technique called DNA footprinting.{{citation needed|date=September 2024}}
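DNA footprinting relies on a bound protein shielding its site from DNase I, so cleavage at the protected positions is weaker in the sample that contains the protein. The sketch below illustrates that comparison; the band intensities and the 50% cutoff are hypothetical values chosen for illustration, and real footprinting analysis involves normalization steps not shown here.

```python
def find_footprint(control, with_protein, cutoff=0.5):
    """Return 0-based positions whose cleavage falls below `cutoff` x control.

    `control` and `with_protein` are per-position DNase I cleavage
    intensities without and with the DNA-binding protein (hypothetical data).
    """
    if len(control) != len(with_protein):
        raise ValueError("both lanes must cover the same positions")
    protected = []
    for pos, (free, bound) in enumerate(zip(control, with_protein)):
        # Skip positions with no cleavage in the control lane (ratio undefined).
        if free <= 0:
            continue
        if bound / free < cutoff:
            protected.append(pos)
    return protected


def contiguous_regions(positions):
    """Merge sorted positions into inclusive (start, end) runs."""
    regions = []
    for pos in positions:
        if regions and pos == regions[-1][1] + 1:
            regions[-1] = (regions[-1][0], pos)
        else:
            regions.append((pos, pos))
    return regions


# Hypothetical intensities: positions 3-6 are shielded by the bound protein.
control = [10, 12, 9, 11, 10, 12, 9, 10, 11]
with_protein = [9, 12, 8, 2, 1, 3, 2, 10, 10]
print(contiguous_regions(find_footprint(control, with_protein)))  # [(3, 6)]
```

A contiguous run of protected positions, rather than isolated dips, is what marks a plausible binding site.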

Many DNA-binding domains must recognize specific DNA sequences, such as DBDs of transcription factors that activate specific genes, or those of enzymes that modify DNA at specific sites, like restriction enzymes and telomerase. The hydrogen bonding pattern in the DNA major groove is less degenerate than that of the DNA minor groove, providing a more attractive site for sequence-specific DNA recognition.{{citation needed|date=September 2024}}
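The degeneracy argument can be made concrete by listing the chemical groups each base pair exposes in each groove, read in a fixed direction: hydrogen-bond acceptor (A), donor (D), non-polar hydrogen (H) and methyl group (M). The patterns below follow the simplified textbook description; encoding them as strings is only an illustrative choice. In the major groove all four base pairs present different patterns, while in the minor groove A·T resembles T·A and G·C resembles C·G.

```python
# Groups exposed in each groove, read in a fixed direction across the pair.
# A = H-bond acceptor, D = H-bond donor, H = non-polar hydrogen, M = methyl.
MAJOR_GROOVE = {"A:T": "ADAM", "T:A": "MADA", "G:C": "AADH", "C:G": "HDAA"}
MINOR_GROOVE = {"A:T": "AHA", "T:A": "AHA", "G:C": "ADA", "C:G": "ADA"}


def distinguishable(patterns):
    """Number of base pairs a reader of these groups could tell apart."""
    return len(set(patterns.values()))


print(distinguishable(MAJOR_GROOVE))  # 4
print(distinguishable(MINOR_GROOVE))  # 2
```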

The specificity of DNA-binding proteins can be studied using many biochemical and biophysical techniques, such as gel electrophoresis, analytical ultracentrifugation, calorimetry, mutation of the DNA, mutation or modification of the protein, nuclear magnetic resonance, X-ray crystallography, surface plasmon resonance, electron paramagnetic resonance, cross-linking and microscale thermophoresis (MST).

DNA-binding proteins in genomes

{{See also|DNA-binding protein}}

A large fraction of the genes in each genome encode DNA-binding proteins (see Table). However, only a relatively small number of protein families are DNA-binding. For instance, more than 2000 of the ~20,000 human proteins are "DNA-binding", including about 750 zinc-finger proteins.{{Cite web|url=https://www.uniprot.org/uniprot/?query=proteome:UP000005640%20reviewed:yes|title=reviewed:yes AND organism:"Homo sapiens (Human) [9606]" AND proteome:up000005640 in UniProtKB|website=www.uniprot.org|language=en|access-date=2017-10-25}}

{| class="wikitable"
!Species
!DNA-binding proteins{{cite journal | vauthors = Malhotra S, Sowdhamini R | title = Genome-wide survey of DNA-binding proteins in Arabidopsis thaliana: analysis of distribution and functions | journal = Nucleic Acids Research | volume = 41 | issue = 15 | pages = 7212–9 | date = August 2013 | pmid = 23775796 | doi = 10.1093/nar/gkt505 | pmc=3753632}}
!DNA-binding families
|-
|Arabidopsis thaliana (thale cress)
|4471
|300
|-
|Saccharomyces cerevisiae (yeast)
|720
|243
|-
|Caenorhabditis elegans (worm)
|2028
|271
|-
|Drosophila melanogaster (fruit fly)
|2620
|283
|}
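The pattern in the table, many DNA-binding proteins drawn from comparatively few families, can be checked with simple arithmetic: dividing each species' protein count by its family count gives the mean number of DNA-binding proteins per family. The snippet uses only the counts from the table.

```python
# (DNA-binding proteins, DNA-binding families) per species, from the table.
COUNTS = {
    "Arabidopsis thaliana": (4471, 300),
    "Saccharomyces cerevisiae": (720, 243),
    "Caenorhabditis elegans": (2028, 271),
    "Drosophila melanogaster": (2620, 283),
}


def proteins_per_family(counts):
    """Mean number of DNA-binding proteins per DNA-binding family."""
    return {species: proteins / families
            for species, (proteins, families) in counts.items()}


for species, ratio in proteins_per_family(COUNTS).items():
    print(f"{species}: {ratio:.1f}")
# Arabidopsis thaliana: 14.9
# Saccharomyces cerevisiae: 3.0
# Caenorhabditis elegans: 7.5
# Drosophila melanogaster: 9.3
```

The spread (about 3 proteins per family in yeast versus about 15 in Arabidopsis) suggests that family expansion, rather than new families, accounts for much of the difference in counts.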

Types

= Helix-turn-helix =

{{Main|Helix-turn-helix}}

Originally discovered in bacteria, the helix-turn-helix motif is about 20 amino acids long and is commonly found in repressor proteins. In eukaryotes, the motif forms part of the homeodomain, which comprises three helices; the third, known as the recognition helix, binds the DNA. Homeodomains are common in proteins that regulate developmental processes.{{cite web | title=HTH search at PROSITE| website=Expasy | url=https://prosite.expasy.org/cgi-bin/prosite/prosite_search_full.pl?SEARCH=hth | access-date=2024-06-17}}

= Helix-hairpin-helix =

{{Main|Helix-hairpin-helix}}

The helix-hairpin-helix is found in proteins that interact with DNA in a non-sequence-specific manner.{{cite journal |last1=Doherty |first1=AJ |last2=Serpell |first2=LC |last3=Ponting |first3=CP |title=The helix-hairpin-helix DNA-binding motif: a structural basis for non-sequence-specific recognition of DNA. |journal=Nucleic Acids Research |date=1 July 1996 |volume=24 |issue=13 |pages=2488–97 |doi=10.1093/nar/24.13.2488 |pmid=8692686|pmc=145986 }} It consists of two antiparallel alpha-helices connected by a short hairpin loop. The two helices pack at an acute angle of ~25–50°, which dictates the characteristic pattern of hydrophobicity in their sequences. This packing readily distinguishes the motif from other two-helix DNA-binding structures such as the helix-turn-helix, whose helices pack at almost a right angle.{{Cite journal |date=2000 |title=Common fold in helix-hairpin-helix proteins |journal=Nucleic Acids Research |pmid=10908318 |last1=Shao |first1=X. |last2=Grishin |first2=N. V. |volume=28 |issue=14 |pages=2643–2650 |doi=10.1093/nar/28.14.2643 |pmc=102670 }}
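The packing-angle criterion can be sketched as a calculation on the two helix axes. Ignoring the direction of each helix, the crossing angle is θ = arccos(|u·v| / (|u||v|)) for axis vectors u and v. The axis vectors below are hypothetical, and the cut-offs (25–50° for helix-hairpin-helix, an assumed tolerance of 15° around a right angle for helix-turn-helix) are illustrative thresholds based on the ranges quoted above, not a published classifier.

```python
import math


def crossing_angle(u, v):
    """Acute angle in degrees between two helix axes (direction ignored)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos_theta = min(1.0, abs(dot) / norm)  # clamp rounding error above 1
    return math.degrees(math.acos(cos_theta))


def classify_helix_pair(u, v, right_angle_tolerance=15.0):
    """Rough label from the packing angle alone (illustrative thresholds)."""
    angle = crossing_angle(u, v)
    if 25.0 <= angle <= 50.0:
        return "helix-hairpin-helix-like", angle
    if abs(angle - 90.0) <= right_angle_tolerance:
        return "helix-turn-helix-like", angle
    return "unassigned", angle


# Hypothetical axis directions for two antiparallel helices about 35 degrees apart.
u = (1.0, 0.0, 0.0)
v = (-math.cos(math.radians(35)), math.sin(math.radians(35)), 0.0)
label, angle = classify_helix_pair(u, v)
print(label, round(angle, 1))  # helix-hairpin-helix-like 35.0
```

Taking the absolute value of the dot product makes antiparallel helices (as in the hairpin) report the same acute crossing angle as parallel ones.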

= Zinc finger =

{{Main|Zinc finger}}

Image:1r4o.png (top) bound to DNA (bottom). Zinc atoms are represented by grey spheres and the coordinating cysteine sidechains are depicted as sticks.]]

The zinc finger domain is mostly found in eukaryotes, but some examples have been found in bacteria.{{cite journal | vauthors = Malgieri G, Palmieri M, Russo L, Fattorusso R, Pedone PV, Isernia C | title = The prokaryotic zinc-finger: structure, function and comparison with the eukaryotic counterpart | journal = The FEBS Journal | volume = 282 | issue = 23 | pages = 4480–96 | date = December 2015 | pmid = 26365095 | doi = 10.1111/febs.13503 | doi-access = free }} The zinc finger domain is generally between 23 and 28 amino acids long and is stabilized by coordinating zinc ions with regularly spaced zinc-coordinating residues (either histidines or cysteines). The most common class of zinc finger (Cys2His2) coordinates a single zinc ion and consists of a recognition helix and a 2-strand beta-sheet.{{cite journal | vauthors = Pabo CO, Peisach E, Grant RA | title = Design and selection of novel Cys2His2 zinc finger proteins | journal = Annual Review of Biochemistry | volume = 70 | pages = 313–40 | year = 2001 | pmid = 11395410 | doi = 10.1146/annurev.biochem.70.1.313 }} In transcription factors these domains are often found in arrays (usually separated by short linker sequences) and adjacent fingers are spaced at 3 basepair intervals when bound to DNA.
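Because the zinc-coordinating residues are regularly spaced, candidate Cys2His2 fingers can be located with a pattern search. The sketch below uses a regular expression modelled on the commonly used C2H2 consensus C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H (treat the exact pattern as an assumption), then applies the 3-basepair spacing described above to estimate how much DNA an array of fingers would span. The test sequence is invented: one finger-like segment repeated three times.

```python
import re

# Pattern modelled on the C2H2 consensus C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.
C2H2_PATTERN = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")
BP_PER_FINGER = 3  # adjacent fingers are spaced about 3 bp apart on DNA


def find_c2h2_fingers(protein):
    """Return (start, end) spans, end exclusive, of candidate C2H2 fingers."""
    return [m.span() for m in C2H2_PATTERN.finditer(protein)]


def estimated_site_length(n_fingers):
    """Approximate DNA site length (bp) covered by a tandem finger array."""
    return BP_PER_FINGER * n_fingers


# Invented sequence: one finger-like segment plus a short linker, repeated.
segment = "YACPVESCDRRFSRSDELTRHIRIHTGEKP"
protein = segment * 3
fingers = find_c2h2_fingers(protein)
print(fingers)                              # [(2, 25), (32, 55), (62, 85)]
print(estimated_site_length(len(fingers)))  # 9
```

A regular-expression scan only finds candidates; it says nothing about whether a finger folds or which bases it reads.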

= Leucine zipper =

{{Main|Leucine zipper}}

The basic leucine zipper (bZIP) domain is found mainly in eukaryotes and to a limited extent in bacteria. The bZIP domain contains an alpha helix with a leucine at every seventh amino acid. When two such helices come together, their leucines interlock like the teeth of a zipper, allowing the two proteins to dimerize. When the dimer binds DNA, basic amino acid residues bind to the sugar-phosphate backbone while the helices sit in the major groove. bZIP proteins regulate gene expression.
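The heptad spacing lends itself to a simple scan for leucines exactly seven residues apart. The sketch below is an illustrative heuristic only: the minimum of four leucines and the test sequence are assumptions, and it is no substitute for dedicated coiled-coil prediction.

```python
def leucine_heptad_runs(seq, min_leucines=4, period=7):
    """Find maximal runs of leucines spaced exactly `period` residues apart.

    Returns (start_index, n_leucines) for each run with at least
    `min_leucines` leucines. Indices are 0-based.
    """
    seq = seq.upper()
    runs = []
    for start, residue in enumerate(seq):
        if residue != "L":
            continue
        # Skip leucines that continue a run already counted from an earlier L.
        if start >= period and seq[start - period] == "L":
            continue
        count, pos = 0, start
        while pos < len(seq) and seq[pos] == "L":
            count += 1
            pos += period
        if count >= min_leucines:
            runs.append((start, count))
    return runs


# Invented sequence: four heptads starting with L, plus one off-register L.
seq = "MK" + "LAEKVEE" + "LAEKLEE" + "LAEKVEE" + "LAEKVEE" + "G"
print(leucine_heptad_runs(seq))  # [(2, 4)]
```

The off-register leucine at index 13 starts its own run of one and is therefore not reported.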

= Winged helix =

Consisting of about 110 amino acids, the winged helix (WH) domain has four helices and a two-strand beta-sheet.

= Winged helix-turn-helix =

The winged helix-turn-helix (wHTH) domain {{SCOP|46785}} is typically 85–90 amino acids long. It is formed by a 3-helical bundle and a 4-strand beta-sheet (wing).

= Helix-loop-helix =

The basic helix-loop-helix (bHLH) domain is found in some transcription factors and is characterized by two alpha helices connected by a loop. One helix is typically smaller; because the loop is flexible, this helix can fold back and pack against another helix, allowing dimerization. The larger helix typically contains the DNA-binding regions.

= HMG-box =

HMG-box domains are found in high mobility group proteins, which are involved in a variety of DNA-dependent processes such as replication and transcription. They also alter the flexibility of the DNA by inducing bends.{{cite journal | vauthors = Murugesapillai D et al. | year = 2014 | title = DNA bridging and looping by HMO1 provides a mechanism for stabilizing nucleosome-free chromatin | journal = Nucleic Acids Res | volume = 42 | issue = 14| pages = 8996–9004 | doi=10.1093/nar/gku635 | pmid=25063301 | pmc=4132745}}{{cite journal | pmid = 28303166 | doi=10.1007/s12551-016-0236-4 | pmc=5331113 | volume=9 | issue=1 | title=Single-molecule studies of high-mobility group B architectural DNA bending proteins | year=2017 | journal=Biophys Rev | pages=17–40 | vauthors=Murugesapillai D, McCauley MJ, Maher LJ 3rd, Williams MC}} The domain consists of three alpha helices separated by loops.

= Wor3 domain =

Wor3 domains, named after the White–Opaque Regulator 3 (Wor3) in Candida albicans, arose more recently in evolutionary time than most previously described DNA-binding domains and are restricted to a small number of fungi.{{cite journal | vauthors = Lohse MB, Hernday AD, Fordyce PM, Noiman L, Sorrells TR, Hanson-Smith V, Nobile CJ, DeRisi JL, Johnson AD | title = Identification and characterization of a previously undescribed family of sequence-specific DNA-binding domains | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 110 | issue = 19 | pages = 7660–5 | date = May 2013 | pmid = 23610392 | pmc = 3651432 | doi = 10.1073/pnas.1221734110 | bibcode = 2013PNAS..110.7660L | doi-access = free }}

= OB-fold domain =

The OB-fold is a small structural motif originally named for its oligonucleotide/oligosaccharide binding properties. OB-fold domains range between 70 and 150 amino acids in length.{{cite journal | vauthors = Flynn RL, Zou L | title = Oligonucleotide/oligosaccharide-binding fold proteins: a growing family of genome guardians | journal = Critical Reviews in Biochemistry and Molecular Biology | volume = 45 | issue = 4 | pages = 266–75 | date = August 2010 | pmid = 20515430 | pmc = 2906097 | doi = 10.3109/10409238.2010.488216 }} OB-fold domains bind single-stranded DNA, so the proteins that contain them are single-stranded DNA-binding proteins.

OB-fold proteins have been identified as critical for DNA replication, DNA recombination, DNA repair, transcription, translation, cold shock response, and telomere maintenance.{{cite journal | vauthors = Theobald DL, Mitton-Fry RM, Wuttke DS | title = Nucleic acid recognition by OB-fold proteins | journal = Annual Review of Biophysics and Biomolecular Structure | volume = 32 | pages = 115–33 | year = 2003 | pmid = 12598368 | pmc = 1564333 | doi = 10.1146/annurev.biophys.32.110601.142506 }}

Unusual

= Immunoglobulin fold =

The immunoglobulin domain ({{InterPro|IPR013783}}) consists of a beta-sheet structure with large connecting loops, which serve to recognize either DNA major grooves or antigens. Usually found in immunoglobulin proteins, these domains are also present in the STAT proteins of the cytokine pathway. This is likely because the cytokine pathway evolved relatively recently and made use of systems that were already functional, rather than creating its own.

= B3 domain =

The B3 DBD ({{InterPro|IPR003340}}, {{SCOP|117343}}) is found only in transcription factors of higher plants and in the restriction endonucleases EcoRII and BfiI, and typically consists of 100–120 residues. It includes seven beta strands and two alpha helices, which form a DNA-binding pseudobarrel protein fold.

= TAL effector =

TAL effectors are found in bacterial plant pathogens of the genus Xanthomonas and are involved in regulating the genes of the host plant in order to facilitate bacterial virulence, proliferation, and dissemination.{{cite journal | vauthors = Boch J, Bonas U | title = Xanthomonas AvrBs3 family-type III effectors: discovery and function | journal = Annual Review of Phytopathology | volume = 48 | pages = 419–36 | year = 2010 | issue = 1 | pmid = 19400638 | doi = 10.1146/annurev-phyto-080508-081936 | bibcode = 2010AnRvP..48..419B }} They contain a central region of tandem 33–35-residue repeats, and each repeat specifies a single DNA base in the TALE's binding site.{{cite journal | vauthors = Moscou MJ, Bogdanove AJ | title = A simple cipher governs DNA recognition by TAL effectors | journal = Science | volume = 326 | issue = 5959 | pages = 1501 | date = December 2009 | pmid = 19933106 | doi = 10.1126/science.1178817 | bibcode = 2009Sci...326.1501M | s2cid = 6648530 }}{{cite journal | vauthors = Boch J, Scholze H, Schornack S, Landgraf A, Hahn S, Kay S, Lahaye T, Nickstadt A, Bonas U | title = Breaking the code of DNA binding specificity of TAL-type III effectors | journal = Science | volume = 326 | issue = 5959 | pages = 1509–12 | date = December 2009 | pmid = 19933107 | doi = 10.1126/science.1178811 | bibcode = 2009Sci...326.1509B | s2cid = 206522347 }} Within each repeat, residue 13 alone directly contacts the DNA base and so determines sequence specificity, while other positions contact the DNA backbone, stabilising the DNA-binding interaction.{{cite journal | vauthors = Mak AN, Bradley P, Cernadas RA, Bogdanove AJ, Stoddard BL | title = The crystal structure of TAL effector PthXo1 bound to its DNA target | journal = Science | volume = 335 | issue = 6069 | pages = 716–9 | date = February 2012 | pmid = 22223736 | pmc = 3427646 | doi = 10.1126/science.1216211 | bibcode = 2012Sci...335..716M }} Each repeat takes the form of a pair of alpha-helices, and the whole repeat array forms a right-handed superhelix that wraps around the DNA double helix. TAL effector repeat arrays have been shown to contract upon DNA binding, and a two-state search mechanism has been proposed in which the elongated TALE begins to contract around the DNA after a unique repeat unit, N-terminal to the core repeat array, successfully recognizes a thymine.{{cite journal | vauthors = Cuculis L, Abil Z, Zhao H, Schroeder CM | title = Direct observation of TALE protein dynamics reveals a two-state search mechanism | journal = Nature Communications | volume = 6 | pages = 7277 | date = June 2015 | pmid = 26027871 | pmc = 4458887 | doi = 10.1038/ncomms8277 | bibcode = 2015NatCo...6.7277C }}
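Because each repeat specifies one base, a TALE's target site can be predicted by reading its repeat array in order. The sketch below represents each repeat by its residues 12 and 13 and keys on residue 13, which contacts the base. The residue-to-base table (I to A, D to C, G to T, N to G or A) is a simplified assumption based on commonly reported associations, not a complete code, and the repeat array is hypothetical. The leading T stands for the thymine recognized by the repeat unit N-terminal to the core array.

```python
# Simplified residue-13 code (assumption). "R" is the IUPAC symbol for G or A.
RESIDUE13_TO_BASE = {"I": "A", "D": "C", "G": "T", "N": "R"}


def predict_tale_target(rvds, leading_t=True):
    """Predict a TALE binding site from residues 12-13 of each repeat.

    `rvds` lists each repeat's two-letter residue 12/13 pair in N- to
    C-terminal order. Unrecognized residue-13 identities give "N" (any base).
    """
    bases = []
    for rvd in rvds:
        if len(rvd) != 2:
            raise ValueError(f"expected two residues, got {rvd!r}")
        # Residue 13 is the second character of the pair.
        bases.append(RESIDUE13_TO_BASE.get(rvd[1].upper(), "N"))
    site = "".join(bases)
    # Thymine read by the repeat unit N-terminal to the core array.
    return "T" + site if leading_t else site


# Hypothetical repeat array.
print(predict_tale_target(["HD", "NG", "NI", "NN", "HD"]))  # TCTARC
```

The same modular logic, one repeat per base, is what makes TALE-like arrays attractive for designing proteins that bind chosen DNA sequences.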

Related proteins are found in the bacterial plant pathogen Ralstonia solanacearum,{{cite journal | vauthors = de Lange O, Schreiber T, Schandry N, Radeck J, Braun KH, Koszinowski J, Heuer H, Strauß A, Lahaye T | title = Breaking the DNA-binding code of Ralstonia solanacearum TAL effectors provides new possibilities to generate plant resistance genes against bacterial wilt disease | journal = The New Phytologist | volume = 199 | issue = 3 | pages = 773–86 | date = August 2013 | pmid = 23692030 | doi = 10.1111/nph.12324 | doi-access = free | bibcode = 2013NewPh.199..773D }} the fungal endosymbiont Burkholderia rhizoxinica{{cite journal | vauthors = Juillerat A, Bertonati C, Dubois G, Guyot V, Thomas S, Valton J, Beurdeley M, Silva GH, Daboussi F, Duchateau P | title = BurrH: a new modular DNA binding protein for genome engineering | journal = Scientific Reports | volume = 4 | pages = 3831 | date = January 2014 | pmid = 24452192 | doi = 10.1038/srep03831 | pmc=5379180| bibcode = 2014NatSR...4.3831J }} and two as-yet unidentified marine microorganisms.{{cite journal | vauthors = de Lange O, Wolf C, Thiel P, Krüger J, Kleusch C, Kohlbacher O, Lahaye T | title = DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats | journal = Nucleic Acids Research | volume = 43 | issue = 20 | pages = 10065–80 | date = November 2015 | pmid = 26481363 | pmc = 4787788 | doi = 10.1093/nar/gkv1053 }} The DNA-binding code and the structure of the repeat array are conserved between these groups, referred to collectively as the TALE-likes.

See also

  • For a structural classification of DNA-binding domains present in land plant genomes, see {{Cite journal |last1=Blanc-Mathieu |first1=Romain |last2=Dumas |first2=Renaud |last3=Turchi |first3=Laura |last4=Lucas |first4=Jérémy |last5=Parcy |first5=François |date=July 2023 |title=Plant-TFClass: a structural classification for plant transcription factors |url=https://www.sciencedirect.com/science/article/pii/S1360138523002273 |journal=Trends in Plant Science |volume=29 |issue=1 |pages=40–51 |language=en |doi=10.1016/j.tplants.2023.06.023|pmid=37482504 }}
  • Comparison of nucleic acid simulation software

References

{{reflist|35em}}