DNA binding site

{{Short description|Regions of DNA capable of binding to biomolecules}}

[[File:Transcription factors DNA binding sites.svg]]

DNA binding sites are a type of binding site found in DNA where other molecules may bind. DNA binding sites are distinct from other binding sites in that (1) they are part of a DNA sequence (e.g. a genome) and (2) they are bound by DNA-binding proteins. DNA binding sites are often associated with specialized proteins known as transcription factors, and are thus linked to transcriptional regulation. The sum of DNA binding sites of a specific transcription factor is referred to as its cistrome. DNA binding sites also encompass the targets of other proteins, such as restriction enzymes, site-specific recombinases (see site-specific recombination) and methyltransferases.{{cite journal |author=Halford E.S. |author2=Marko J.F. |year=2004 |title=How do site-specific DNA-binding proteins find their targets? |journal=Nucleic Acids Research |volume=32 |issue=10 |pages=3040–3052 | pmid=15178741 |doi=10.1093/nar/gkh624 |pmc=434431}}

DNA binding sites can thus be defined as short DNA sequences (typically 4 to 30 base pairs long, but up to 200 bp for recombination sites) that are specifically bound by one or more DNA-binding proteins or protein complexes. Some binding sites have been reported to have the potential to undergo fast evolutionary change.{{cite journal |author=Borneman, A.R. |author2=Gianoulis, T.A. |author3=Zhang, Z.D. |author4=Yu, H. |author5=Rozowsky, J. |author6=Seringhaus, M.R. |author7=Wang, L.Y. |author8=Gerstein, M. |author9=Snyder, M. |s2cid=21535866 |name-list-style=amp |year=2007 |title=Divergence of transcription factor binding sites across related yeast species. |journal=Science |volume=317 |issue=5839 |pages=815–819|pmid=17690298|bibcode = 2007Sci...317..815B |doi = 10.1126/science.1140748 }}

Types of DNA binding sites

DNA binding sites can be categorized according to their biological function. Thus, we can distinguish between transcription factor-binding sites, restriction sites and recombination sites. Some authors have proposed that binding sites could also be classified according to their most convenient mode of representation.{{cite journal |author=Stormo GD |year=2000 |title=DNA binding sites: representation and discovery |journal=Bioinformatics |volume=16 |issue=1 |pages=16–23 |pmid=10812473 |doi=10.1093/bioinformatics/16.1.16|doi-access=free }} On the one hand, restriction sites can generally be represented by consensus sequences. This is because they target mostly identical sequences, and restriction efficiency decreases abruptly for less similar sequences. On the other hand, the DNA binding sites for a given transcription factor usually all differ, and the transcription factor binds them with varying degrees of affinity. This makes it difficult to accurately represent transcription factor binding sites using consensus sequences, and they are typically represented using position-specific frequency matrices (PSFM), which are often graphically depicted using sequence logos. This argument, however, is partly arbitrary. Restriction enzymes, like transcription factors, yield a gradual, though sharp, range of affinities for different sites{{cite journal |vauthors=Pingoud A, Jeltsch A |year=1997 |title= Recognition and Cleavage of DNA by Type-II Restriction Endonucleases |journal=European Journal of Biochemistry | volume=246 |issue=1 |pages=1–22 |pmid=9210460 |doi=10.1111/j.1432-1033.1997.t01-6-00001.x|doi-access=free }} and are thus also best represented by PSFM. Likewise, site-specific recombinases also show a varied range of affinities for different target sites.{{cite journal |doi= 10.1128/JB.182.10.2787-2792.2000 |vauthors= Gyohda A, Komano T |year=2000 |title= Purification and characterization of the R64 shufflon-specific recombinase.|journal=Journal of Bacteriology | volume=182 |issue=10 |pages=2787–2792 |pmid=10781547 |pmc= 101987}}{{cite book |author= Birge, E.A. | year=2006 |title=Bacterial and Bacteriophage Genetics | edition= 5th | chapter= 15: Site Specific Recombination | pages=463–478 | publisher=Springer | isbn= 978-0-387-23919-4}}
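
The consensus-sequence representation discussed above can be illustrated with a minimal Python sketch. It assumes that degenerate positions are written with IUPAC nucleotide codes and translates the consensus into a regular expression. The two recognition sequences used (GAATTC for EcoRI and GTYRAC for HincII) are well-known restriction sites chosen for illustration and are not taken from this article; the toy sequence and the function names are hypothetical, and only one strand is searched.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_pattern(consensus):
    """Translate an IUPAC consensus string into a regular-expression string."""
    return "".join(IUPAC[code] for code in consensus.upper())


def find_sites(consensus, sequence):
    """Return the 0-based start positions of all matches on the given strand."""
    # Wrapping the pattern in a lookahead also reports overlapping matches.
    lookahead = re.compile("(?=" + consensus_to_pattern(consensus) + ")")
    return [match.start() for match in lookahead.finditer(sequence.upper())]


# Toy sequence invented for illustration (not real genomic data).
toy = "TTGAATTCAAGTCGACTTGTTAACGG"
ecori_hits = find_sites("GAATTC", toy)    # non-degenerate consensus
hincii_hits = find_sites("GTYRAC", toy)   # degenerate consensus (Y = C/T, R = A/G)
print(ecori_hits, hincii_hits)
```

A consensus search of this kind is all-or-nothing: a site either matches or it does not, which is why it cannot express the graded affinities described in the paragraph above.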

History and main experimental techniques

The existence of something akin to DNA binding sites was suspected from the experiments on the biology of the bacteriophage lambda{{cite journal |author= Campbell A |year=1963 |title= Fine Structure Genetics and its Relation to Function |journal=Annual Review of Microbiology | volume=17 |issue= 1 |pages=2787–2792 |pmid=14145311 | doi=10.1146/annurev.mi.17.100163.000405}} and the regulation of the Escherichia coli lac operon.{{cite journal |vauthors= Jacob F, Monod J |year=1961 |title= Genetic regulatory mechanisms in the synthesis of proteins |journal=Journal of Molecular Biology | volume=3 |issue= 3 | pages=318–356 |pmid=13718526 |doi=10.1016/S0022-2836(61)80072-7|s2cid=19804795 }} DNA binding sites were finally confirmed in both systems {{cite journal |vauthors= Gilbert W, Maxam A |year=1973 |title= The nucleotide sequence of the lac operator |journal=Proceedings of the National Academy of Sciences of the United States of America | volume=70 |pages=3581–3584|pmid=4587255 |issue= 12 |pmc= 427284 |doi=10.1073/pnas.70.12.3581|bibcode = 1973PNAS...70.3581G |doi-access=free }}{{cite journal |doi= 10.1038/250394a0 |vauthors= Maniatis T, Ptashne M, Barrell BG, Donelson J |year=1974 |title= Sequence of a repressor-binding site in the DNA of bacteriophage lambda |journal=Nature | volume=250|pages=394–397|pmid=4854243 |issue= 465|bibcode = 1974Natur.250..394M |s2cid= 4204720 }}{{cite journal |doi= 10.1073/pnas.72.3.1072 |author= Nash H. A. |year=1975 |title= Integrative recombination of bacteriophage lambda DNA in vitro |journal=Proceedings of the National Academy of Sciences of the United States of America | volume=72 |pages=1072–1076 | pmid=1055366 |issue= 3 |pmc= 432468|bibcode = 1975PNAS...72.1072N |doi-access= free }} with the advent of DNA sequencing techniques. From then on, DNA binding sites for many transcription factors, restriction enzymes and site-specific recombinases have been discovered using a profusion of experimental methods. 
Historically, the experimental techniques of choice to discover and analyze DNA binding sites have been the DNase footprinting assay and the electrophoretic mobility shift assay (EMSA). However, the development of DNA microarrays and fast sequencing techniques has led to new, massively parallel methods for in vivo identification of binding sites, such as ChIP-chip and ChIP-Seq.{{cite journal |vauthors= Elnitski L, Jin VX, Farnham PJ, Jones SJ |year=2006 |title= Locating mammalian transcription factor binding sites: a survey of computational and experimental techniques |journal=Genome Research | volume=16 |pages=1455–1464 | pmid=17053094 |doi=10.1101/gr.4140006 |issue= 12|doi-access=free }} The biophysical method microscale thermophoresis{{cite journal | author=Wienken CJ | title=Protein-binding assays in biological liquids using microscale thermophoresis. | journal=Nature Communications | year=2010 | pages=100 | issue=7 | volume=1 | doi = 10.1038/ncomms1093 | bibcode=2010NatCo...1..100W | pmid=20981028|display-authors=etal| doi-access=free }} is used to quantify the binding affinity{{cite journal | vauthors = Baaske P, Wienken CJ, Reineck P, Duhr S, Braun D | s2cid = 42489892 | title = Optical Thermophoresis quantifies Buffer dependence of Aptamer Binding | journal = Angew. Chem. Int. Ed.| volume = 49 | issue = 12| pages = 2238–41 |date=Feb 2010 | pmid = 20186894| doi = 10.1002/anie.200903998}}{{cite web |date=February 24, 2010 |title=A hot road to new drugs |website=Phys.org |url=http://www.physorg.com/news186225693.html}} of proteins and other molecules to specific DNA binding sites.

Databases

Due to the diverse nature of the experimental techniques used in determining binding sites and to the patchy coverage of most organisms and transcription factors, there is no central database (akin to GenBank at the National Center for Biotechnology Information) for DNA binding sites. Although NCBI provides for DNA binding site annotation in its reference sequences (RefSeq), most submissions omit this information. Moreover, because bioinformatics has had limited success in producing efficient DNA binding site prediction tools (in silico motif discovery and site search methods often have large false positive rates), there has been no systematic effort to computationally annotate these features in sequenced genomes.

There are, however, several private and public databases devoted to the compilation of experimentally reported, and sometimes computationally predicted, binding sites for different transcription factors in different organisms. Below is a non-exhaustive table of available databases:

{| class="wikitable" border="1"
! Name
! Organisms
! Source
! Access
! URL
|-
| PlantRegMap
| 165 plant species (e.g., Arabidopsis thaliana, Oryza sativa, Zea mays, etc.)
| Expert curation and projection
| Public
| [https://web.archive.org/web/20200513055428/http://plantregmap.cbi.pku.edu.cn/]
|-
| JASPAR
| Vertebrates, Plants, Fungi, Flies, and Worms
| Expert curation with literature support
| Public
| [http://jaspar.genereg.net]
|-
| CIS-BP
| All Eukaryotes
| Experimentally derived motifs and predictions
| Public
| [http://cisbp.ccbr.utoronto.ca/]
|-
| CollecTF
| Prokaryotes
| Literature curation
| Public
| [http://collectf.umbc.edu]
|-
| RegPrecise
| Prokaryotes
| Expert curation
| Public
| [https://web.archive.org/web/20190714083006/http://regprecise.lbl.gov/]
|-
| RegTransBase
| Prokaryotes
| Expert/literature curation
| Public
| [https://web.archive.org/web/20170510191131/http://regtransbase.lbl.gov/cgi-bin/regtransbase?page=main]
|-
| RegulonDB
| Escherichia coli
| Expert curation
| Public
| [http://regulondb.ccg.unam.mx/] {{Webarchive|url=https://web.archive.org/web/20170507214341/http://regulondb.ccg.unam.mx/ |date=2017-05-07 }}
|-
| PRODORIC
| Prokaryotes
| Expert curation
| Public
| [http://prodoric.tu-bs.de/] {{Webarchive|url=https://web.archive.org/web/20070516233630/http://prodoric.tu-bs.de/ |date=2007-05-16 }}
|-
| TRANSFAC
| Mammals
| Expert/literature curation
| Public/Private
| [http://www.biobase-international.com/pages/index.php?id=transfac] {{Webarchive|url=https://web.archive.org/web/20081023235101/http://www.biobase-international.com/pages/index.php?id=transfac |date=2008-10-23 }}
|-
| TRED
| Human, Mouse, Rat
| Computer predictions, manual curation
| Public
| [http://rulai.cshl.edu/cgi-bin/TRED/tred.cgi?process=home]
|-
| DBSD
| Drosophila species
| Literature/Expert curation
| Public
| [http://rulai.cshl.org/dbsd/index.html]
|-
| HOCOMOCO
| Human, Mouse
| Literature/Expert curation
| Public
| [http://hocomoco.autosome.ru/], [https://web.archive.org/web/20170713105827/http://www.cbrc.kaust.edu.sa/hocomoco10]
|-
| MethMotif
| Human, Mouse
| Expert curation
| Public
| [http://bioinfo-csi.nus.edu.sg/methmotif/] {{Webarchive|url=https://web.archive.org/web/20191029215808/http://bioinfo-csi.nus.edu.sg/methmotif/ |date=2019-10-29 }}
|}

Representation of DNA binding sites

A collection of DNA binding sites, typically referred to as a DNA binding motif, can be represented by a consensus sequence. This representation has the advantage of being compact, but at the expense of disregarding a substantial amount of information.{{cite journal |author= Schneider T.D. |year=2002 |title= Consensus sequence Zen |journal=Applied Bioinformatics | volume=1 |pages=111–119 | pmid=15130839 |issue= 3 |pmc= 1852464}} A more accurate way of representing binding sites is through position-specific frequency matrices (PSFM). These matrices give the frequency of each base at each position of the DNA binding motif. PSFM are usually built with the implicit assumption of positional independence (different positions in the DNA binding site contribute independently to the site function), although this assumption has been disputed for some DNA binding sites.{{cite journal |doi= 10.1093/nar/30.5.1255 |author= Bulyk M.L. |author2= Johnson P.L. |author3= Church G.M. |year=2002 |title= Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors |journal=Nucleic Acids Research | volume=30 |pages=1255–1261 | pmid=11861919 |issue= 5 |pmc= 101241}} Frequency information in a PSFM can be formally interpreted under the framework of information theory,{{cite journal |doi= 10.1016/0022-2836(86)90165-8 |vauthors= Schneider TD, Stormo GD, Gold L, Ehrenfeucht A |year=1986 |title= Information content of binding sites on nucleotide sequences |journal=Journal of Molecular Biology | volume=188 |pages=415–431 | pmid=3525846 |issue= 3}} leading to its graphical representation as a sequence logo.

{| class="wikitable" border="1"
! Position
! 1 !! 2 !! 3 !! 4 !! 5 !! 6 !! 7 !! 8 !! 9 !! 10 !! 11 !! 12 !! 13 !! 14 !! 15 !! 16
|-
! A
| 1 || 0 || 1 || 5 || 32 || 5 || 35 || 23 || 34 || 14 || 43 || 13 || 34 || 4 || 52 || 3
|-
! C
| 50 || 1 || 0 || 1 || 5 || 6 || 0 || 4 || 4 || 13 || 3 || 8 || 17 || 51 || 2 || 0
|-
! G
| 0 || 0 || 54 || 15 || 5 || 5 || 12 || 2 || 7 || 1 || 1 || 3 || 1 || 0 || 1 || 52
|-
! T
| 5 || 55 || 1 || 35 || 14 || 40 || 9 || 27 || 11 || 28 || 9 || 32 || 4 || 1 || 1 || 1
|-
! Sum
| 56 || 56 || 56 || 56 || 56 || 56 || 56 || 56 || 56 || 56 || 56 || 56 || 56 || 56 || 56 || 56
|}

PSFM for the transcriptional repressor LexA as derived from 56 LexA-binding sites stored in Prodoric. Relative frequencies are obtained by dividing the count in each cell by the total count for that position (56).
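
As a hedged illustration of how such a matrix is used, the following Python sketch takes the LexA counts from the table above and derives relative frequencies, the majority-rule consensus, and a per-position information content of the kind plotted in a sequence logo. It assumes the simple formula of 2 bits minus the Shannon entropy of each column, with a uniform background and no small-sample correction; the function names are hypothetical.

```python
import math

# Counts copied from the LexA table above (16 positions, 56 sites).
COUNTS = {
    "A": [1, 0, 1, 5, 32, 5, 35, 23, 34, 14, 43, 13, 34, 4, 52, 3],
    "C": [50, 1, 0, 1, 5, 6, 0, 4, 4, 13, 3, 8, 17, 51, 2, 0],
    "G": [0, 0, 54, 15, 5, 5, 12, 2, 7, 1, 1, 3, 1, 0, 1, 52],
    "T": [5, 55, 1, 35, 14, 40, 9, 27, 11, 28, 9, 32, 4, 1, 1, 1],
}
BASES = "ACGT"
LENGTH = len(COUNTS["A"])


def frequencies(i):
    """Relative frequency of each base at 0-based position i."""
    total = sum(COUNTS[base][i] for base in BASES)
    return {base: COUNTS[base][i] / total for base in BASES}


def information(i):
    """Information content in bits at position i: 2 minus the column entropy.

    Assumes a uniform background and applies no small-sample correction.
    """
    entropy = -sum(f * math.log2(f) for f in frequencies(i).values() if f > 0)
    return 2.0 - entropy


# Majority-rule consensus: the most frequent base at each position.
consensus = "".join(
    max(BASES, key=lambda base: COUNTS[base][i]) for i in range(LENGTH)
)

print(consensus)
for i in range(LENGTH):
    print(i + 1, round(information(i), 2))
```

The consensus keeps only the most frequent base per column, whereas the frequencies and information values retain how strongly each position is conserved, which is the information a consensus sequence discards.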

Computational search and discovery of binding sites

In bioinformatics, one can distinguish between two separate problems regarding DNA binding sites: searching for additional members of a known DNA binding motif (the site search problem) and discovering novel DNA binding motifs in collections of functionally related sequences (the sequence motif discovery problem).{{cite journal |author= Erill I |author2= O'Neill MC |year=2009 |title= A reexamination of information theory-based methods for DNA-binding site identification |journal=BMC Bioinformatics | volume=10 |issue=1 |page=57 | pmid=19210776 |doi=10.1186/1471-2105-10-57 |pmc= 2680408 |doi-access= free }} Many different methods have been proposed to search for binding sites. Most of them rely on the principles of information theory and have available web servers (Yellaboina)(Munch), while other authors have resorted to machine learning methods, such as artificial neural networks.{{cite journal |vauthors= Bisant D, Maizel J |year=1995 |title= Identification of ribosome binding sites in Escherichia coli using neural network models |journal=Nucleic Acids Research| volume=23 |pages=1632–1639| pmid=7784221 |doi=10.1093/nar/23.9.1632 |issue= 9 |pmc= 306908}}{{cite journal |author= O'Neill M.C. |year=1991 |title= Training back-propagation neural networks to define and detect DNA-binding sites |journal=Nucleic Acids Research | volume=19 |pages=313–318 | pmid=2014171 |doi=10.1093/nar/19.2.313 |issue= 2 |pmc= 333596}} A plethora of algorithms is also available for sequence motif discovery. These methods rely on the hypothesis that a set of sequences share a binding motif for functional reasons. Binding motif discovery methods can be divided roughly into enumerative, deterministic and stochastic.{{cite book |author= Bailey T.L. |title=Bioinformatics|year=2008|chapter= Discovering Sequence Motifs | volume=452 |pages=231–251| pmid=18566768 |doi=10.1007/978-1-60327-159-2_12|series=Methods in Molecular Biology|isbn=978-1-58829-707-5|url=https://espace.library.uq.edu.au/view/UQ:174081/MIC15UQ174081.pdf}} MEME{{cite journal |author= Bailey T.L. |year=2002|title= Discovering novel sequence motifs with MEME |journal= Current Protocols in Bioinformatics | volume=2 |issue=4 | pmid=18792935 |doi=10.1002/0471250953.bi0204s00 |pages= 2.4.1–2.4.35|s2cid=205157795}} and Consensus{{cite journal |doi= 10.1073/pnas.86.4.1183 |vauthors= Stormo GD, Hartzell GW 3rd |year=1989|title= Identifying protein-binding sites from unaligned DNA fragments|journal= Proceedings of the National Academy of Sciences of the United States of America | volume=86|pages=1183–1187| pmid=2919167 |issue= 4 |pmc= 286650|bibcode = 1989PNAS...86.1183S |doi-access= free }} are classical examples of deterministic optimization, while the Gibbs sampler{{cite journal |vauthors= Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC |s2cid=3040614|year=1993|title= Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment |journal=Science | volume=262|pages=208–214 | pmid=8211139 |doi=10.1126/science.8211139 |issue= 5131|bibcode = 1993Sci...262..208L }} is the conventional implementation of a purely stochastic method for DNA binding motif discovery. Another instance of this class of methods is SeSiMCMC,{{Cite journal|last=Favorov|first=A V|author2=M S Gelfand|author3=A V Gerasimova|author4=D A Ravcheev|author5=A A Mironov|author6=V J Makeev|date=2005-05-15|title=A Gibbs sampler for identification of symmetrically structured, spaced DNA motifs with improved estimation of the signal length|journal=Bioinformatics|volume=21|issue=10|pages=2240–2245|doi=10.1093/bioinformatics/bti336|issn=1367-4803|pmid=15728117|doi-access=free}} which focuses on weak transcription factor binding sites with symmetry. While enumerative methods often resort to regular expression representation of binding sites, PSFM and their formal treatment under information theory are the representation of choice for both deterministic and stochastic methods. Hybrid methods, e.g. ChIPMunk,{{Cite journal|last=Kulakovskiy|first=I V|author2=V A Boeva|author3=A V Favorov|author4=V J Makeev|date=2010-08-24|title=Deep and wide digging for binding motifs in ChIP-Seq data|journal=Bioinformatics|volume=26|issue=20|pages=2622–3|doi=10.1093/bioinformatics/btq488|issn=1367-4811|pmid=20736340|doi-access=free}} which combines greedy optimization with subsampling, also use PSFM. Recent advances in sequencing have led to the introduction of comparative genomics approaches to DNA binding motif discovery, as exemplified by PhyloGibbs.{{cite journal |vauthors= Das MK, Dai HK |year=2007 |title= A survey of DNA motif finding algorithms |journal=BMC Bioinformatics | volume=8|issue=Suppl 7 |page=S21 | pmid=18047721 |doi=10.1186/1471-2105-8-S7-S21 |pmc= 2099490 |doi-access=free }}{{cite journal |vauthors=Siddharthan R, Siggia ED, van Nimwegen E |title=PhyloGibbs: A Gibbs sampling motif finder that incorporates phylogeny |journal=PLOS Comput Biol |volume=1 |issue=7 |pages=e67 |year=2005 |doi=10.1371/journal.pcbi.0010067 | pmid=16477324 |pmc=1309704|bibcode = 2005PLSCB...1...67S |doi-access=free }}
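
The site search problem described above can be sketched in Python as follows. This is a minimal illustration, not the method of any particular tool cited here: it converts the LexA counts from the representation section into a log-odds scoring matrix and slides it along one strand of a sequence. The uniform background of 0.25, the pseudocount of 1, the toy sequence and the function names are all assumptions made for the example.

```python
import math

# LexA counts from the PSFM table in the representation section.
COUNTS = {
    "A": [1, 0, 1, 5, 32, 5, 35, 23, 34, 14, 43, 13, 34, 4, 52, 3],
    "C": [50, 1, 0, 1, 5, 6, 0, 4, 4, 13, 3, 8, 17, 51, 2, 0],
    "G": [0, 0, 54, 15, 5, 5, 12, 2, 7, 1, 1, 3, 1, 0, 1, 52],
    "T": [5, 55, 1, 35, 14, 40, 9, 27, 11, 28, 9, 32, 4, 1, 1, 1],
}
BASES = "ACGT"
BACKGROUND = 0.25   # assumed uniform base composition
PSEUDOCOUNT = 1.0   # assumed; avoids log(0) for bases never observed


def build_scoring_matrix(counts):
    """Return one {base: log2-odds score} dictionary per motif position."""
    matrix = []
    for i in range(len(counts["A"])):
        total = sum(counts[base][i] for base in BASES) + 4 * PSEUDOCOUNT
        matrix.append({
            base: math.log2((counts[base][i] + PSEUDOCOUNT) / total / BACKGROUND)
            for base in BASES
        })
    return matrix


def scan(sequence, matrix):
    """Score every window of the sequence; return (start, score) pairs.

    Only the given strand is scanned, and the sequence must contain
    nothing but A, C, G and T.
    """
    width = len(matrix)
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(matrix, window))
        hits.append((start, score))
    return hits


matrix = build_scoring_matrix(COUNTS)
# Toy sequence invented for illustration, with a consensus-like site embedded.
toy = "GATTACAGGC" + "CTGTATATATATACAG" + "TTGACCGTAA"
hits = scan(toy, matrix)
best_start, best_score = max(hits, key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```

In practice a score threshold has to be chosen to decide which windows are reported as candidate sites, and that choice drives the false positive rate mentioned in the Databases section.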

More complex methods for binding site search and motif discovery rely on base stacking and other interactions between DNA bases, but due to the small sample sizes typically available for binding sites in DNA, their potential has not yet been fully realized. An example of such a tool is [https://archive.today/20130415144422/http://nar.oxfordjournals.org/content/38/12/e135.full ULPB].{{cite journal |vauthors= Salama RA, Stekel DJ |year=2010 |pages= e135 |issue= 12|title= Inclusion of neighboring base interdependencies substantially improves genome-wide prokaryotic transcription factor binding site prediction |volume= 38 |journal=Nucleic Acids Research |doi=10.1093/nar/gkq274 |pmc= 2896541 |pmid= 20439311}}

References

{{Reflist|2}}