genome mining

{{Short description|Research process}}


Genome mining describes the exploitation of genomic information for the discovery of biosynthetic pathways of natural products and their possible interactions.{{cite journal | vauthors = Albarano L, Esposito R, Ruocco N, Costantini M | title = Genome Mining as New Challenge in Natural Products Discovery | journal = Marine Drugs | volume = 18 | issue = 4 | pages = 199 | date = April 2020 | pmid = 32283638 | pmc = 7230286 | doi = 10.3390/md18040199 | doi-access = free }} It depends on computational technology and bioinformatics tools. The mining process relies on a huge amount of data (represented by DNA sequences and annotations) accessible in genomic databases. By applying data mining algorithms, the data can be used to generate new knowledge in several areas of medicinal chemistry,{{cite journal | vauthors = Hannigan GD, Prihoda D, Palicka A, Soukup J, Klempir O, Rampula L, Durcak J, Wurst M, Kotowski J, Chang D, Wang R, Piizzi G, Temesi G, Hazuda DJ, Woelk CH, Bitton DA | display-authors = 6 | title = A deep learning genome-mining strategy for biosynthetic gene cluster prediction | journal = Nucleic Acids Research | volume = 47 | issue = 18 | pages = e110 | date = October 2019 | pmid = 31400112 | pmc = 6765103 | doi = 10.1093/nar/gkz654 }}{{cite journal | vauthors = Lee N, Hwang S, Kim J, Cho S, Palsson B, Cho BK | title = Mini review: Genome mining approaches for the identification of secondary metabolite biosynthetic gene clusters in Streptomyces | journal = Computational and Structural Biotechnology Journal | volume = 18 | pages = 1548–1556 | date = 2020-01-01 | pmid = 32637051 | pmc = 7327026 | doi = 10.1016/j.csbj.2020.06.024 }} such as discovering novel natural products.{{cite journal | vauthors = Challis GL | title = Genome mining for novel natural product discovery | journal = Journal of Medicinal Chemistry | volume = 51 | issue = 9 | pages = 2618–2628 | date = May 2008 | pmid = 18393407 | doi = 10.1021/jm700948z }}

History

In the mid- to late 1980s, researchers increasingly focused on genetic studies as sequencing technologies advanced.{{cite journal | vauthors = Bains W, Smith GC | title = A novel method for nucleic acid sequence determination | journal = Journal of Theoretical Biology | volume = 135 | issue = 3 | pages = 303–307 | date = December 1988 | pmid = 3256722 | doi = 10.1016/S0022-5193(88)80246-7 | bibcode = 1988JThBi.135..303B }} The GenBank database was established in 1982 to collect, manage, store, and distribute DNA sequence data as DNA sequences became increasingly available. With the growing volume of genetic data, biotechnology companies have been able to use human DNA sequences to develop protein and antibody drugs through genome mining since 1992.{{cite journal | vauthors = Cook-Deegan R, Heaney C | title = Patents in genomics and human genetics | journal = Annual Review of Genomics and Human Genetics | volume = 11 | issue = 1 | pages = 383–425 | date = 2010-09-01 | pmid = 20590431 | pmc = 2935940 | doi = 10.1146/annurev-genom-082509-141811 }} In the late 1990s, many companies, such as Amgen, Immunex, and Genentech, developed drugs that progressed to the clinical stage by adopting genome mining.{{cite journal | vauthors = Ziemert N, Alanjary M, Weber T | title = The evolution of genome mining in microbes - a review | journal = Natural Product Reports | volume = 33 | issue = 8 | pages = 988–1005 | date = August 2016 | pmid = 27272205 | doi = 10.1039/C6NP00025H | doi-access = free }} Since the Human Genome Project was completed in the early 2000s, researchers have sequenced the genomes of many microorganisms.{{cite journal | vauthors = Omura S, Ikeda H, Ishikawa J, Hanamoto A, Takahashi C, Shinose M, Takahashi Y, Horikawa H, Nakazawa H, Osonoe T, Kikuchi H, Shiba T, Sakaki Y, Hattori M | display-authors = 6 | title = Genome sequence of an industrial microorganism Streptomyces avermitilis: deducing the ability of producing secondary metabolites | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 98 | issue = 21 | pages = 12215–12220 | date = October 2001 | pmid = 11572948 | pmc = 59794 | doi = 10.1073/pnas.211433198 | bibcode = 2001PNAS...9812215O | doi-access = free }} Subsequently, many of these genomes have been carefully studied to identify new genes and biosynthetic pathways.{{cite journal | vauthors = Tang X, Li J, Millán-Aguiñaga N, Zhang JJ, O'Neill EC, Ugalde JA, Jensen PR, Mantovani SM, Moore BS | display-authors = 6 | title = Identification of Thiotetronic Acid Antibiotic Biosynthetic Pathways by Target-directed Genome Mining | journal = ACS Chemical Biology | volume = 10 | issue = 12 | pages = 2841–2849 | date = December 2015 | pmid = 26458099 | pmc = 4758359 | doi = 10.1021/acschembio.5b00658 }}

Algorithms

As large quantities of genomic sequence data began to accumulate in public databases, computational algorithms became important for deciphering the enormous collection of genomic data. Genetic algorithms, for example, are commonly used to generate high-quality solutions to optimization and search problems by relying on bio-inspired operators such as mutation, crossover and selection.{{cite journal | vauthors = Brandon MC, Wallace DC, Baldi P | title = Data structures and compression algorithms for genomic sequence data | journal = Bioinformatics | volume = 25 | issue = 14 | pages = 1731–1738 | date = July 2009 | pmid = 19447783 | pmc = 2705231 | doi = 10.1093/bioinformatics/btp319 }} The following tools and algorithms are commonly used in genome mining:

  • antiSMASH (Antibiotics and Secondary Metabolite Analysis Shell) is a pipeline for the rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences.{{cite journal | vauthors = Medema MH, Blin K, Cimermancic P, de Jager V, Zakrzewski P, Fischbach MA, Weber T, Takano E, Breitling R | display-authors = 6 | title = antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences | journal = Nucleic Acids Research | volume = 39 | issue = Web Server issue | pages = W339–W346 | date = July 2011 | pmid = 21672958 | pmc = 3125804 | doi = 10.1093/nar/gkr466 }}
  • PRISM (Prediction Informatics for Secondary Metabolites){{cite web | url = http://prism.adapsyn.com | title = PRISM | publisher = Adapsyn Bioscience }} is a combinatorial approach to chemical structure prediction for genetically encoded nonribosomal peptides and type I and II polyketides.{{cite journal | vauthors = Skinnider MA, Johnston CW, Gunabalasingam M, Merwin NJ, Kieliszek AM, MacLellan RJ, Li H, Ranieri MR, Webster AL, Cao MP, Pfeifle A, Spencer N, To QH, Wallace DP, Dejong CA, Magarvey NA | display-authors = 6 | title = Comprehensive prediction of secondary metabolite structure and biological activity from microbial genome sequences | journal = Nature Communications | volume = 11 | issue = 1 | pages = 6058 | date = November 2020 | pmid = 33247171 | pmc = 7699628 | doi = 10.1038/s41467-020-19986-1 | bibcode = 2020NatCo..11.6058S }}
  • Statistically based sequence similarity (SIM) methods, such as FASTA or PSI-BLAST, infer homology between sequences.{{cite journal | vauthors = King RD, Wise PH, Clare A | title = Confirmation of data mining based predictions of protein function | journal = Bioinformatics | volume = 20 | issue = 7 | pages = 1110–1118 | date = May 2004 | pmid = 14764546 | doi = 10.1093/bioinformatics/bth047 | doi-access = free }}
  • BLAST (Basic local alignment search tool) is an approach for rapid sequence comparison.{{cite journal | vauthors = Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ | title = Basic local alignment search tool | journal = Journal of Molecular Biology | volume = 215 | issue = 3 | pages = 403–410 | date = October 1990 | pmid = 2231712 | doi = 10.1016/S0022-2836(05)80360-2 }}
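
To illustrate what local sequence comparison computes, the following is a minimal sketch of Smith-Waterman local alignment scoring. It is not how BLAST is implemented: BLAST uses a faster seed-and-extend heuristic, whereas this dynamic-programming version is exact and slower. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Illustrative scoring scheme (assumed, not from any specific tool):
    +2 for a match, -1 for a mismatch, -2 per gap position (linear gaps).
    """
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Two toy DNA sequences sharing the local region "ACGT".
score = smith_waterman_score("TTACGTTT", "GGACGTGG")
```

Only two rows of the table are kept in memory because the score alone is needed; recovering the aligned region itself would require storing the full table and tracing back from the best cell.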

Applications

Genome mining is applied to the discovery of natural products, where it facilitates the characterization of novel molecules and biosynthetic pathways.{{cite journal | vauthors = Medema MH, de Rond T, Moore BS | title = Mining genomes to illuminate the specialized chemistry of life | journal = Nature Reviews. Genetics | volume = 22 | issue = 9 | pages = 553–571 | date = September 2021 | pmid = 34083778 | pmc = 8364890 | doi = 10.1038/s41576-021-00363-7 }}

= Natural product discovery =

The production of natural products is directed by biosynthetic gene clusters (BGCs) encoded in the genome of the producing microorganism.{{cite journal | vauthors = Rutledge PJ, Challis GL | title = Discovery of microbial natural products by activation of silent biosynthetic gene clusters | journal = Nature Reviews. Microbiology | volume = 13 | issue = 8 | pages = 509–523 | date = August 2015 | pmid = 26119570 | doi = 10.1038/nrmicro3496 | s2cid = 6474118 }} Genome mining makes it possible to predict the BGCs that produce a target natural product.{{cite journal | vauthors = Belknap KC, Park CJ, Barth BM, Andam CP | title = Genome mining of biosynthetic and chemotherapeutic gene clusters in Streptomyces bacteria | journal = Scientific Reports | volume = 10 | issue = 1 | pages = 2003 | date = February 2020 | pmid = 32029878 | pmc = 7005152 | doi = 10.1038/s41598-020-58904-9 | bibcode = 2020NatSR..10.2003B }} Important biosynthetic enzymes include polyketide synthases (PKSs) and non-ribosomal peptide synthetases (NRPSs); other major natural product classes include ribosomally synthesized and post-translationally modified peptides (RiPPs) and terpenoids.{{cite journal | vauthors = Hoffmeister D, Keller NP | title = Natural products of filamentous fungi: enzymes, genes, and their regulation | journal = Natural Product Reports | volume = 24 | issue = 2 | pages = 393–416 | date = April 2007 | pmid = 17390002 | doi = 10.1039/B603084J }} By mining for these enzymes, researchers can determine the compound classes that BGCs encode and compare target gene clusters to known gene clusters.{{cite journal | vauthors = Micallef ML, D'Agostino PM, Sharma D, Viswanathan R, Moffitt MC | title = Genome mining for natural product biosynthetic gene clusters in the Subsection V cyanobacteria | journal = BMC Genomics | volume = 16 | issue = 1 | pages = 669 | date = September 2015 | pmid = 26335778 | pmc = 4558948 | doi = 10.1186/s12864-015-1855-z | doi-access = free }} To verify the relationship between a BGC and its natural product, the target BGC can be cloned and expressed in a suitable host.{{cite journal | vauthors = Gomez-Escribano JP, Bibb MJ | title = Heterologous expression of natural product biosynthetic gene clusters in Streptomyces coelicolor: from genome mining to manipulation of biosynthetic pathways | journal = Journal of Industrial Microbiology & Biotechnology | volume = 41 | issue = 2 | pages = 425–431 | date = February 2014 | pmid = 24096958 | doi = 10.1007/s10295-013-1348-5 | s2cid = 15215660 }}
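
Comparing a target gene cluster to known gene clusters can be sketched, in a much simplified form, as ranking reference clusters by how many annotated enzyme domains they share with the query. The example below uses a plain Jaccard index over domain sets. This is an assumed simplification for illustration rather than the method of any particular tool, and the cluster names, domain lists, and function names are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two collections of domain labels (0.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_known_clusters(query_domains, known_clusters):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scores = [
        (name, jaccard(query_domains, domains))
        for name, domains in known_clusters.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: domain content of two reference clusters.
known = {
    "known_pks": ["KS", "AT", "ACP", "KR"],   # polyketide synthase-like
    "known_nrps": ["C", "A", "PCP", "TE"],    # non-ribosomal peptide synthetase-like
}
query = ["KS", "AT", "ACP", "TE"]

ranking = rank_known_clusters(query, known)
# The query shares 3 of 5 distinct domains with "known_pks" (similarity 0.6)
# and 1 of 7 with "known_nrps", so "known_pks" ranks first.
```

A set-based measure ignores domain order and copy number, both of which matter in real clusters, so it should be read only as a first-pass filter.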

Databases and tools

Genetic data has been accumulated in databases. Researchers are able to utilize algorithms to decipher the data accessible from databases for the discovery of new processes, targets, and products. The following are databases and tools:

  • The GenBank database provides genomic datasets for analysis.{{cite journal | vauthors = Sayers EW, Cavanaugh M, Clark K, Pruitt KD, Schoch CL, Sherry ST, Karsch-Mizrachi I | title = GenBank | journal = Nucleic Acids Research | volume = 49 | issue = D1 | pages = D92–D96 | date = January 2021 | pmid = 33196830 | pmc = 7778897 | doi = 10.1093/nar/gkaa1023 }}
  • UCSC Genome Browser
  • AntiSMASH-DB{{cite web | url = https://antismash-db.secondarymetabolites.org/ | title = AntiSMASH-DB }}{{cite web | url = https://img.jgi.doe.gov/cgi-bin/abc/main.cgi | title = IMG-ABC }} allows comparing the sequences of newly sequenced BGCs against those of previously predicted and experimentally characterized ones.{{cite journal | vauthors = Palaniappan K, Chen IA, Chu K, Ratner A, Seshadri R, Kyrpides NC, Ivanova NN, Mouncey NJ | display-authors = 6 | title = IMG-ABC v.5.0: an update to the IMG/Atlas of Biosynthetic Gene Clusters Knowledgebase | journal = Nucleic Acids Research | volume = 48 | issue = D1 | pages = D422–D430 | date = January 2020 | pmid = 31665416 | pmc = 7145673 | doi = 10.1093/nar/gkz932 }}
  • BIG-FAM {{cite web | url = https://bigfam.bioinformatics.nl/ | title = BIG-FAM | publisher = }} is a biosynthetic gene cluster family database.{{cite journal | vauthors = Kautsar SA, Blin K, Shaw S, Weber T, Medema MH | title = BiG-FAM: the biosynthetic gene cluster families database | journal = Nucleic Acids Research | volume = 49 | issue = D1 | pages = D490–D497 | date = January 2021 | pmid = 33010170 | pmc = 7778980 | doi = 10.1093/nar/gkaa812 }}
  • DoBISCUIT{{cite web | url = https://www.nite.go.jp/en/nbrc/genome/dobiscuit.html | title = DoBISCUIT}} is a database of secondary metabolite biosynthetic gene clusters.{{cite journal | vauthors = Ichikawa N, Sasagawa M, Yamamoto M, Komaki H, Yoshida Y, Yamazaki S, Fujita N | title = DoBISCUIT: a database of secondary metabolite biosynthetic gene clusters | journal = Nucleic Acids Research | volume = 41 | issue = Database issue | pages = D408–D414 | date = January 2013 | pmid = 23185043 | pmc = 3531092 | doi = 10.1093/nar/gks1177 }}
  • MIBiG (Minimum Information about a Biosynthetic Gene cluster specification){{cite web | url = https://mibig.secondarymetabolites.org/ | title = MIBiG }} provides a standard for annotations and metadata on biosynthetic gene clusters and their molecular products.{{cite journal | vauthors = Kautsar SA, Blin K, Shaw S, Navarro-Muñoz JC, Terlouw BR, van der Hooft JJ, van Santen JA, Tracanna V, Suarez Duran HG, Pascal Andreu V, Selem-Mojica N, Alanjary M, Robinson SL, Lund G, Epstein SC, Sisto AC, Charkoudian LK, Collemare J, Linington RG, Weber T, Medema MH | display-authors = 6 | title = MIBiG 2.0: a repository for biosynthetic gene clusters of known function | journal = Nucleic Acids Research | volume = 48 | issue = D1 | pages = D454–D458 | date = January 2020 | pmid = 31612915 | pmc = 7145714 | doi = 10.1093/nar/gkz882 }}
  • Interactive tree of life (iTOL){{cite web | url = https://itol.embl.de/ | title = iTOL | publisher = }} is a web-based tool for the display, manipulation and annotation of phylogenetic trees.{{cite journal | vauthors = Letunic I, Bork P | title = Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees | journal = Nucleic Acids Research | volume = 44 | issue = W1 | pages = W242–W245 | date = July 2016 | pmid = 27095192 | pmc = 4987883 | doi = 10.1093/nar/gkw290 }}

References