Transposon sequencing


Transposon insertion sequencing (Tn-seq) combines transposon insertional mutagenesis with massively parallel sequencing (MPS) of the transposon insertion sites to identify genes contributing to a function of interest in bacteria. The method was originally established by concurrent work in four laboratories under the acronyms HITS,{{cite journal | vauthors = Gawronski JD, Wong SM, Giannoukos G, Ward DV, Akerley BJ | title = Tracking insertion mutants within libraries by deep sequencing and a genome-wide screen for Haemophilus genes required in the lung | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 106 | pages = 16422–7 | date = 2009 | pmc = 2792183 | doi = 10.1073/pnas.0906627106 }} INSeq,{{cite journal | vauthors = Goodman AL, McNulty NP, Zhao Y, Leip D, Mitra RD, Lozupone CA, Knight R, Gordon JI | title = Identifying genetic determinants needed to establish a human gut symbiont in its habitat | journal = Cell Host & Microbe | volume = 6 | issue = 3 | pages = 279–89 | date = September 2009 | pmid = 19748469 | pmc = 2895552 | doi = 10.1016/j.chom.2009.08.003 }} TraDIS{{cite journal | vauthors = Langridge GC, Phan MD, Turner DJ, Perkins TT, Parts L, Haase J | display-authors = etal | title = Simultaneous assay of every Salmonella Typhi gene using one million transposon mutants | journal = Genome Research | volume = 19 | pages = 2308–16 | date = 2009 | doi = 10.1101/gr.097097.109 }} and Tn-seq.{{cite journal | vauthors = van Opijnen T, Bodi KL, Camilli A | title = Tn-seq: high-throughput parallel sequencing for fitness and genetic interaction studies in microorganisms | journal = Nature Methods | volume = 6 | issue = 10 | pages = 767–72 | date = October 2009 | pmid = 19767758 | pmc = 2957483 | doi = 10.1038/nmeth.1377 }} Numerous variations have since been developed and applied to diverse biological systems. Collectively, the methods are often termed Tn-seq, because they all monitor the fitness of transposon insertion mutants by DNA sequencing.

Transposons are highly regulated, discrete DNA segments that can relocate within the genome. They are universal, found in Eubacteria, Archaea and Eukarya, including humans. Transposons have a large influence on gene expression and can be used to determine gene function: when a transposon inserts into a gene, it disrupts that gene's function.{{cite journal | vauthors = Hayes F | title = Transposon-based strategies for microbial functional genomics and proteomics | journal = Annual Review of Genetics | volume = 37 | issue = 1 | pages = 3–29 | date = 2003 | pmid = 14616054 | doi = 10.1146/annurev.genet.37.110801.142807 }} Because of this property, transposons have been adapted for use in insertional mutagenesis.{{cite journal | vauthors = Kleckner N, Chan RK, Tye BK, Botstein D | title = Mutagenesis by insertion of a drug-resistance element carrying an inverted repetition | journal = Journal of Molecular Biology | volume = 97 | issue = 4 | pages = 561–75 | date = October 1975 | pmid = 1102715 | doi = 10.1016/s0022-2836(75)80059-3 }} The development of microbial genome sequencing was a major advance for the use of transposon mutagenesis.{{cite journal | vauthors = Smith V, Chou KN, Lashkari D, Botstein D, Brown PO | title = Functional analysis of the genes of yeast chromosome V by genetic footprinting | journal = Science | volume = 274 | issue = 5295 | pages = 2069–74 | date = December 1996 | pmid = 8953036 | doi = 10.1126/science.274.5295.2069 | bibcode = 1996Sci...274.2069S | doi-access = }}{{cite journal | vauthors = Akerley BJ, Rubin EJ, Camilli A, Lampe DJ, Robertson HM, Mekalanos JJ | title = Systematic identification of essential genes by in vitro mariner mutagenesis | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 95 | issue = 15 | pages = 8927–32 | date = July 1998 | pmid = 9671781 | pmc = 21179 | doi = 10.1073/pnas.95.15.8927 | bibcode = 1998PNAS...95.8927A | doi-access = free }} With a genome sequence available, the function affected by a transposon insertion can be linked to the disrupted gene by sequencing the DNA at the insertion site and locating it in the genome. Massively parallel sequencing allows the insertion sites in a large mixture of different mutants to be sequenced simultaneously. Genome-wide analysis is therefore feasible when a mutant collection has transposon insertions distributed throughout the genome.

Transposon sequencing requires the creation of a transposon insertion library: a collection of mutants that collectively carry transposon insertions in all non-essential genes. The library is grown under an experimental condition of interest. Mutants with insertions in genes required for growth under that condition decline in frequency in the population. To identify which mutants are lost, the genomic sequences adjacent to the transposon ends are amplified by PCR and sequenced by MPS to determine the location and abundance of each insertion mutation. The importance of each gene for growth under the test condition is determined by comparing the abundance of each mutant before and after growth under that condition. Tn-seq is useful both for studying the fitness of individual genes and for studying gene interactions.{{cite journal | vauthors = van Opijnen T, Bodi KL, Camilli A | title = Tn-seq: high-throughput parallel sequencing for fitness and genetic interaction studies in microorganisms | journal = Nature Methods | volume = 6 | issue = 10 | pages = 767–72 | date = October 2009 | pmid = 19767758 | pmc = 2957483 | doi = 10.1038/nmeth.1377 }}
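The before-and-after comparison can be expressed as a single fitness number per mutant. The Python sketch below assumes the fitness definition used by van Opijnen and colleagues, in which a mutant's growth over the experiment is divided by the growth of the rest of the population; the function name and example values are hypothetical, and published pipelines add normalization and statistics on top of this.

```python
import math


def tnseq_fitness(freq_before: float, freq_after: float, expansion: float) -> float:
    """Fitness of one insertion mutant relative to the rest of the pool.

    freq_before, freq_after: the mutant's frequency in the pool (its share of
        sequencing reads) before and after growth under the test condition.
    expansion: fold increase of the whole population during the experiment.
    A value of 1 means the mutant grew as fast as the rest of the pool.
    """
    if not (0 < freq_before < 1 and 0 < freq_after < 1):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    # Log of the mutant's own fold expansion.
    mutant_growth = math.log(freq_after * expansion / freq_before)
    # Log of the fold expansion of everything else in the pool.
    rest_growth = math.log((1 - freq_after) * expansion / (1 - freq_before))
    if rest_growth == 0:
        raise ValueError("rest of the population did not expand; fitness undefined")
    return mutant_growth / rest_growth


# A mutant whose share of reads halves during a 100-fold expansion:
print(round(tnseq_fitness(0.001, 0.0005, 100), 3))
```

A mutant whose frequency does not change scores 1; one whose share of reads halves over a 100-fold expansion scores about 0.85, i.e. it grows more slowly than the pool.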

Signature-tagged mutagenesis (STM) is an older technique that also involves pooling transposon insertion mutants to determine the importance of the disrupted genes under selective growth conditions.{{cite journal | vauthors = Mazurkiewicz P, Tang CM, Boone C, Holden DW | title = Signature-tagged mutagenesis: barcoding mutants for genome-wide screens | journal = Nature Reviews Genetics | volume = 7 | issue = 12 | pages = 929–39 | date = December 2006 | pmid = 17139324 | doi = 10.1038/nrg1984 | s2cid = 27956117 | doi-access = free }} High-throughput versions of STM use genomic microarrays, which are less accurate and have a lower dynamic range than massively parallel sequencing.{{cite journal | vauthors = Barquist L, Boinett CJ, Cain AK | title = Approaches to querying bacterial genomes with transposon-insertion sequencing | journal = RNA Biology | volume = 10 | issue = 7 | pages = 1161–9 | date = July 2013 | pmid = 23635712 | pmc = 3849164 | doi = 10.4161/rna.24765 }} With the advent of next-generation sequencing, genomic data became increasingly available. Despite this increase, knowledge of gene function has remained the limiting factor in understanding the roles genes play.{{cite journal | vauthors = Bork P | title = Powers and pitfalls in sequence analysis: the 70% hurdle | journal = Genome Research | volume = 10 | issue = 4 | pages = 398–400 | date = April 2000 | pmid = 10779480 | doi = 10.1101/gr.10.4.398 | doi-access = free }}{{cite journal | vauthors = Kasif S, Steffen M | title = Biochemical networks: the evolution of gene annotation | journal = Nature Chemical Biology | volume = 6 | issue = 1 | pages = 4–5 | date = January 2010 | pmid = 20016491 | pmc = 2907659 | doi = 10.1038/nchembio.288 }} This created a need for high-throughput approaches to studying genotype–phenotype relationships, such as Tn-seq.

Methodology

Transposon sequencing begins by introducing a transposon into a bacterial population, here by transduction, in which bacteriophages deliver the transposable element into the bacterial cells. Each cell that acquires the transposon carries it at a different position in its genome, so the resulting population, the transposon insertion library, is a pooled collection of different mutants. Tn-seq uses the Himar1 mariner transposon, a commonly used transposon. After the library has been created, its genomic DNA is extracted and cut, and the sequences at the junctions between transposon and genome are amplified by PCR. The cutting is done by MmeI, a type IIS restriction endonuclease: an enzyme that binds a specific short DNA sequence, its recognition site, and cuts the DNA at a fixed distance outside that site. A single nucleotide change in the terminal inverted repeats of Himar1 (the short sequences at each end of the transposon) creates an MmeI recognition site.{{cite journal | vauthors = Goodman AL, McNulty NP, Zhao Y, Leip D, Mitra RD, Lozupone CA, Knight R, Gordon JI | title = Identifying genetic determinants needed to establish a human gut symbiont in its habitat | journal = Cell Host & Microbe | volume = 6 | issue = 3 | pages = 279–89 | date = September 2009 | pmid = 19748469 | pmc = 2895552 | doi = 10.1016/j.chom.2009.08.003 }} This site lies 4 base pairs before the end of each terminal repeat.
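Himar1 inserts only at TA dinucleotides, so the possible insertion sites in a genome can be listed in advance, which helps judge how densely a library covers each gene. A minimal Python sketch; the sequence and the gene coordinates in the example are made up for illustration, and the function names are hypothetical.

```python
def ta_sites(seq: str) -> list[int]:
    """0-based positions of every TA dinucleotide in seq.

    TA is its own reverse complement, so scanning one strand
    finds the sites on both strands.
    """
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "TA"]


def ta_sites_per_gene(seq: str, genes: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Count TA sites lying wholly inside each gene (start inclusive, end exclusive)."""
    sites = ta_sites(seq)
    return {
        name: sum(1 for s in sites if start <= s and s + 2 <= end)
        for name, (start, end) in genes.items()
    }


# Hypothetical 12-base sequence split into two hypothetical genes.
print(ta_sites_per_gene("GTATACGGTAAT", {"geneA": (0, 6), "geneB": (6, 12)}))
```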

MmeI cuts 20 bases beyond its recognition site, on the side facing out of the transposon and into the adjacent genomic DNA. The cut is staggered: the two DNA strands are cut 2 bases apart (20 bases from the site on one strand and 18 on the other), leaving a 2-base single-stranded overhang at the end of the fragment.{{cite journal | vauthors = Morgan RD, Dwinell EA, Bhatia TK, Lang EM, Luyten YA | title = The MmeI family: type II restriction-modification enzymes that employ single-strand modification for host protection | journal = Nucleic Acids Research | volume = 37 | issue = 15 | pages = 5208–21 | date = August 2009 | pmid = 19578066 | pmc = 2731913 | doi = 10.1093/nar/gkp534 }}
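The position of the MmeI cut fixes how much genomic DNA stays attached to the transposon end: if the recognition site ends 4 bases before the end of the transposon and the longer strand is cut 20 bases past the site, then 20 − 4 = 16 bases of genome remain. The Python sketch below models one junction as a plain string read from the transposon into the genome. It assumes that reading of "4 base pairs before the end", and all sequences in the example are placeholders.

```python
def mmei_junction_fragment(junction: str, transposon_len: int, site_gap: int = 4,
                           top_cut: int = 20, bottom_cut: int = 18) -> tuple[str, str]:
    """Model MmeI cleavage of one transposon-genome junction.

    junction: top-strand sequence reading from the transposon end into the genome.
    transposon_len: number of transposon bases at the start of `junction`.
    site_gap: bases between the end of the recognition site and the transposon
        end (assumed to be 4).
    Returns (genomic_tag, overhang): the genome bases kept on the longer strand,
    and the 2 bases that stay single-stranded because the strands are cut apart.
    """
    site_end = transposon_len - site_gap       # index just past the recognition site
    if len(junction) < site_end + top_cut:
        raise ValueError("junction too short to contain the MmeI cut site")
    top = junction[:site_end + top_cut]        # longer strand: cut 20 bases past the site
    genomic_tag = top[transposon_len:]         # genome bases carried on the fragment
    # The other strand stops 18 bases past the site, 2 bases short of the top strand.
    overhang = junction[site_end + bottom_cut:site_end + top_cut]
    return genomic_tag, overhang


transposon_end = "ACGT" * 7 + "AC"             # placeholder 30-base transposon end
genome = "TTGACCGGTAAGCTTCAGGA"                # placeholder flanking genome
tag, overhang = mmei_junction_fragment(transposon_end + genome, len(transposon_end))
print(tag, len(tag), overhang)
```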

When MmeI digests DNA extracted from a library of transposon insertion mutants (a pooled population of bacteria, each carrying a transposon inserted at a different site in its genome), each insertion yields two fragments, one from each end of the transposon. Each fragment consists of the transposon end plus 16 base pairs of adjacent genomic DNA. These 16 base pairs are usually enough to determine the location of the insertion in the bacterial genome. A short synthetic DNA molecule of known sequence, called an adaptor, is then joined to the cut end (ligation); the 2-base single-stranded overhang left by MmeI pairs with a complementary overhang on the adaptor, which makes this joining efficient. The junction is then amplified by PCR using two primers, short DNA sequences that define the ends of the amplified region: one matches the adaptor and the other matches the transposon, so only transposon–genome junctions are amplified. The resulting 120 base pair PCR product is separated from other DNA by size and purified by agarose gel electrophoresis or polyacrylamide gel electrophoresis (PAGE). Massively parallel sequencing is then used to read the 16 base pairs of genomic DNA flanking each insertion, which identify where in the genome the insertion lies.
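Locating each 16-base tag in the genome amounts to an exact-match lookup. The Python sketch below indexes every 16-base window of a reference sequence on both strands, then counts reads per site while discarding tags that match zero or several places. This is a simplification for illustration: real pipelines use dedicated read aligners and tolerate sequencing errors, which this sketch does not, and the function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def build_tag_index(genome: str, k: int = 16) -> dict[str, list[tuple[int, str]]]:
    """Map every k-mer on both strands to (forward-strand start, strand)."""
    index: dict[str, list[tuple[int, str]]] = {}
    n = len(genome)
    for i in range(n - k + 1):
        index.setdefault(genome[i:i + k], []).append((i, "+"))
    rc = reverse_complement(genome)
    for i in range(n - k + 1):
        # rc[i:i+k] is the reverse complement of genome[n-i-k : n-i].
        index.setdefault(rc[i:i + k], []).append((n - i - k, "-"))
    return index


def count_insertions(tags, index) -> Counter:
    """Reads per (window start, strand). Converting a window start into the
    exact insertion coordinate depends on strand and is left out here."""
    counts: Counter = Counter()
    for tag in tags:
        hits = index.get(tag, [])
        if len(hits) == 1:  # skip unmapped and multi-mapping tags
            counts[hits[0]] += 1
    return counts
```

Because every read from a given insertion carries the same tag, the count for a site reflects how abundant that mutant was in the sequenced pool.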

The number of sequencing reads for each insertion site reflects the abundance of that mutant in the pool. Comparing these counts before and after growth under a test condition therefore shows which insertions affect fitness: if mutants with insertions in a particular gene become depleted, that gene is inferred to be needed for growth under that condition.
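A simplified version of this gene-level comparison sums the reads of all insertions inside each gene, scales each sample to reads per million so that samples sequenced to different depths are comparable, and takes a log2 ratio of after to before. This Python sketch is an illustration under those assumptions, not the statistical procedure of any published pipeline; the gene coordinates, pseudocount and function name are hypothetical.

```python
import math


def gene_log2_ratios(before: dict[int, int], after: dict[int, int],
                     genes: dict[str, tuple[int, int]],
                     pseudocount: float = 1.0) -> dict[str, float]:
    """log2(after / before) of normalized insertion reads per gene.

    before, after: read counts keyed by insertion position.
    genes: name -> (start, end), end exclusive.
    Negative values mean the gene's mutants were depleted during growth.
    """
    total_before = sum(before.values())
    total_after = sum(after.values())
    ratios = {}
    for name, (start, end) in genes.items():
        b = sum(c for pos, c in before.items() if start <= pos < end)
        a = sum(c for pos, c in after.items() if start <= pos < end)
        # Reads per million, plus a pseudocount so genes with no reads stay finite.
        b_norm = b / total_before * 1e6 + pseudocount
        a_norm = a / total_after * 1e6 + pseudocount
        ratios[name] = math.log2(a_norm / b_norm)
    return ratios


# Hypothetical pool: mutants in g1 are depleted, mutants in g2 are enriched.
print(gene_log2_ratios({10: 500, 50: 500}, {10: 50, 50: 950},
                       {"g1": (0, 20), "g2": (40, 60)}))
```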

Advantages and disadvantages

Unlike two other transposon sequencing methods, high-throughput insertion tracking by deep sequencing (HITS) and transposon-directed insertion site sequencing (TraDIS), the MmeI-based Tn-seq protocol is specific to the Himar1 mariner transposon, because the MmeI recognition site is engineered into the Himar1 terminal repeats; it cannot be applied to other transposons or insertional elements. However, the Tn-seq protocol is less time-intensive.{{citation needed|date=March 2021}} HITS and TraDIS fragment the DNA by shearing, which produces PCR templates of a range of sizes, and shorter templates can be amplified preferentially over longer ones. Because MmeI always cuts at the same distance from the transposon end, Tn-seq produces a product of uniform size, reducing the possibility of this PCR bias.
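Small differences in amplification efficiency matter because PCR is exponential: a template copied with per-cycle efficiency E is multiplied by (1 + E) each cycle, so a slight advantage for shorter templates compounds over many cycles. A short Python illustration; the efficiencies and cycle number are made-up values for illustration, not measurements.

```python
def relative_yield(eff_short: float, eff_long: float, cycles: int) -> float:
    """Final copy-number ratio of two templates that start at equal amounts.

    Each cycle multiplies a template by (1 + efficiency), where efficiency is
    the fraction of templates copied per cycle (between 0 and 1).
    """
    return ((1 + eff_short) / (1 + eff_long)) ** cycles


# Hypothetical efficiencies: 95% for a short template, 90% for a long one.
print(round(relative_yield(0.95, 0.90, 25), 2))
```

With these assumed values, the shorter template ends up about 1.9 times more abundant after 25 cycles, even though the per-cycle efficiencies differ by only 5 percentage points. Products of a single, uniform size avoid this distortion.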

Tn-seq can be used both to measure the fitness contribution of single genes and to map gene interactions in microorganisms. Existing methods for these types of study depend on preexisting genomic microarrays or gene knockout arrays, whereas Tn-seq does not. Because it relies on massively parallel sequencing, Tn-seq is reproducible, sensitive and robust.

Applications

Tn-seq has been used to identify new gene functions. Its high sensitivity{{citation needed|date=March 2021}} allows it to detect genotype–phenotype relationships that less sensitive methods may have dismissed as insignificant. In Mycobacterium tuberculosis, Tn-seq identified essential genes and the pathways important for cholesterol utilization.{{cite journal | vauthors = Griffin JE, Gawronski JD, Dejesus MA, Ioerger TR, Akerley BJ, Sassetti CM | title = High-resolution phenotypic profiling defines genes essential for mycobacterial growth and cholesterol catabolism | journal = PLOS Pathogens | volume = 7 | issue = 9 | pages = e1002251 | date = September 2011 | pmid = 21980284 | pmc = 3182942 | doi = 10.1371/journal.ppat.1002251 | doi-access = free }}
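One way to see why a gene with no insertions is called essential: if a fraction p of all potential insertion sites (TA dinucleotides for Himar1) carry an insertion, and insertion is random, a non-essential gene with n sites has probability (1 − p)^n of receiving none by chance. The Python sketch below uses this simplified binomial model as an assumption for illustration; published essentiality analyses generally use more elaborate statistics, and the threshold and example genes are hypothetical.

```python
def prob_no_insertions(site_density: float, n_sites: int) -> float:
    """Chance that a gene with n_sites insertion sites receives no insertions,
    if each site is independently hit with probability site_density."""
    if not 0 <= site_density <= 1:
        raise ValueError("site_density must be between 0 and 1")
    return (1 - site_density) ** n_sites


def likely_essential(genes: dict[str, tuple[int, int]], site_density: float,
                     alpha: float = 0.01) -> list[str]:
    """genes: name -> (number of TA sites, number of sites with insertions).

    Flags genes with no insertions where that outcome would be unlikely
    (probability below alpha) for a non-essential gene.
    """
    return [name for name, (n_sites, n_hit) in genes.items()
            if n_hit == 0 and prob_no_insertions(site_density, n_sites) < alpha]


# Hypothetical library in which half of all TA sites carry an insertion.
print(likely_essential({"a": (20, 0), "b": (3, 0), "c": (20, 8)}, 0.5))
```

A short gene with few sites (gene "b" above) can lack insertions by chance, so it is not flagged.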

Tn-seq has also been used to study higher-order genome organization through genetic interactions.{{citation needed|date=March 2021}} Because genes function within highly interconnected networks,{{citation needed|date=March 2021}} a gene's effect on phenotype often cannot be understood without considering its interactions with other genes.{{citation needed|date=March 2021}} These networks can be studied by screening for synthetic lethality (two mutations that are lethal together although neither is lethal alone) and, more generally, for genetic interactions, in which a double mutant has a fitness different from that expected from the two single mutants.{{citation needed|date=March 2021}} In Streptococcus pneumoniae, Tn-seq was used to map genetic interactions between five query genes and the rest of the genome, revealing both aggravating interactions (the double mutant is less fit than expected) and alleviating interactions (the double mutant is fitter than expected).
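Genetic interactions are often scored against a multiplicative expectation: if two mutations act independently, the double mutant's fitness should equal the product of the two single-mutant fitness values. A minimal Python sketch under that assumption; the tolerance and the example fitness values are hypothetical, and real screens assess significance statistically rather than with a fixed cut-off.

```python
def genetic_interaction(w_a: float, w_b: float, w_ab: float,
                        tolerance: float = 0.05) -> tuple[float, str]:
    """Classify a genetic interaction under a multiplicative null model.

    w_a, w_b: fitness of each single mutant; w_ab: fitness of the double mutant.
    Returns (deviation from expectation, label).
    """
    expected = w_a * w_b
    deviation = w_ab - expected
    if deviation < -tolerance:
        label = "aggravating"   # double mutant less fit than expected
    elif deviation > tolerance:
        label = "alleviating"   # double mutant fitter than expected
    else:
        label = "neutral"
    return deviation, label


# Expected double-mutant fitness here is 0.9 * 0.8 = 0.72.
print(genetic_interaction(0.9, 0.8, 0.3))
```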

Combined with RNA-seq, Tn-seq can be used to examine the roles of non-coding DNA regions, such as those encoding small RNAs.{{cite journal | vauthors = Mann B, van Opijnen T, Wang J, Obert C, Wang YD, Carter R, McGoldrick DJ, Ridout G, Camilli A, Tuomanen EI, Rosch JW | title = Control of virulence by small RNAs in Streptococcus pneumoniae | journal = PLOS Pathogens | volume = 8 | issue = 7 | pages = e1002788 | date = 2012 | pmid = 22807675 | pmc = 3395615 | doi = 10.1371/journal.ppat.1002788 | doi-access = free }}

References