Virome analysis

{{Short description|Study of viral material}}

Virome analysis is the study of the virome, the collection of all viral material found in an organism or ecosystem.{{Cite journal |last1=Breitbart |first1=Mya |last2=Salamon |first2=Peter |last3=Andresen |first3=Bjarne |last4=Mahaffy |first4=Joseph M. |last5=Segall |first5=Anca M. |last6=Mead |first6=David |last7=Azam |first7=Farooq |last8=Rohwer |first8=Forest |date=2002-10-16 |title=Genomic analysis of uncultured marine viral communities |journal=Proceedings of the National Academy of Sciences |volume=99 |issue=22 |pages=14250–14255 |doi=10.1073/pnas.202488399 |doi-access=free |pmid=12384570 |pmc=137870 |bibcode=2002PNAS...9914250B |issn=0027-8424}} Viromes are highly diverse and complex,{{Cite journal |last1=Liang |first1=Guanxiang |last2=Bushman |first2=Frederic D. |date=2021-03-30 |title=The human virome: assembly, composition and host interactions |url=https://doi.org/10.1038/s41579-021-00536-5 |journal=Nature Reviews Microbiology |volume=19 |issue=8 |pages=514–527 |doi=10.1038/s41579-021-00536-5 |issn=1740-1526 |pmc=8008777 |pmid=33785903}} and are often poorly characterized.{{Cite journal |last1=Wommack |first1=K. Eric |last2=Bhavsar |first2=Jaysheel |last3=Polson |first3=Shawn W. |last4=Chen |first4=Jing |last5=Dumas |first5=Michael |last6=Srinivasiah |first6=Sharath |last7=Furman |first7=Megan |last8=Jamindar |first8=Sanchita |last9=Nasko |first9=Daniel J. 
|date=2012-07-27 |title=VIROME: a standard operating procedure for analysis of viral metagenome sequences |url=https://doi.org/10.4056/sigs.2945050 |journal=Standards in Genomic Sciences |volume=6 |issue=3 |pages=427–439 |doi=10.4056/sigs.2945050 |pmid=23407591 |pmc=3558967 |bibcode=2012SGenS...6..421W |issn=1944-3277}} Because viruses rely on a host system for persistence and replication,{{Cite journal |last1=Randall |first1=Richard E |last2=Griffin |first2=Diane E |date=April 2017 |title=Within host RNA virus persistence: mechanisms and consequences |journal=Current Opinion in Virology |language=en |volume=23 |pages=35–42 |doi=10.1016/j.coviro.2017.03.001 |pmc=5474179 |pmid=28319790}} they take part in distinctive host-virus and virus-microbiome interactions.{{Cite journal |last=Handley |first=Scott A. |date=2016-04-01 |title=The virome: a missing component of biological interaction networks in health and disease |journal=Genome Medicine |volume=8 |issue=1 |page=32 |doi=10.1186/s13073-016-0287-y |doi-access=free |pmid=27037032 |pmc=4818473 |issn=1756-994X}} Some viruses can also persist in environmental matrices before infecting a host organism.{{Cite journal |last1=Kormuth |first1=Karen A. |last2=Lin |first2=Kaisen |last3=Qian |first3=Zhihong |last4=Myerburg |first4=Michael M. |last5=Marr |first5=Linsey C. |last6=Lakdawala |first6=Seema S. |date=2019-08-21 |title=Environmental Persistence of Influenza Viruses Is Dependent upon Virus Type and Host Origin |journal=mSphere |volume=4 |issue=4 |pages=10.1128/msphere.00552–19 |doi=10.1128/msphere.00552-19 |pmc=6706471 |pmid=31434749}} These interactions influence the health and disease of an individual, either directly through infection of the host or indirectly when bacteriophages modulate microbial communities. 
Environmental virome samples, drawn from matrices such as soil, water, wastewater, and fomites, can provide insights into the abundance, role, and fitness of viruses across different ecological settings.

Virome analysis uses both molecular biology and computational techniques, such as DNA sequencing, metagenomics, machine learning, and bioinformatics.

History

The first virome analysis was performed in 2002 and investigated the viral composition of seawater samples collected off the coast of California. More than 65% of the viral sequences were previously unknown, highlighting the diversity of environmental viromes. Between 2003 and 2006, similar metagenomic studies of human fecal samples found comparable levels of viral diversity in the human virome, including an abundance of viral 'dark matter' (sequences that cannot be matched to known viruses).{{Cite journal |last1=Breitbart |first1=Mya |last2=Hewson |first2=Ian |last3=Felts |first3=Ben |last4=Mahaffy |first4=Joseph M. |last5=Nulton |first5=James |last6=Salamon |first6=Peter |last7=Rohwer |first7=Forest |date=2003-10-15 |title=Metagenomic Analyses of an Uncultured Viral Community from Human Feces |url=https://doi.org/10.1128/jb.185.20.6220-6223.2003 |journal=Journal of Bacteriology |volume=185 |issue=20 |pages=6220–6223 |doi=10.1128/jb.185.20.6220-6223.2003 |pmid=14526037 |pmc=225035 |issn=0021-9193}}{{Cite journal |last1=Zhang |first1=Tao |last2=Breitbart |first2=Mya |last3=Lee |first3=Wah Heng |last4=Run |first4=Jin-Quan |last5=Wei |first5=Chia Lin |last6=Soh |first6=Shirlena Wee Ling |last7=Hibberd |first7=Martin L |last8=Liu |first8=Edison T |last9=Rohwer |first9=Forest |last10=Ruan |first10=Yijun |date=2005-12-20 |title=RNA Viral Community in Human Feces: Prevalence of Plant Pathogenic Viruses |journal=PLOS Biology |volume=4 |issue=1 |pages=e3 |doi=10.1371/journal.pbio.0040003 |doi-access=free |pmid=16336043 |pmc=1310650 |issn=1545-7885}} These early studies relied on Sanger sequencing and were limited in throughput and sequencing depth, but they laid the groundwork for virome analysis. The development of next-generation sequencing (NGS) greatly expanded the capabilities of virome analysis and knowledge of virome diversity.{{Cite journal |last1=Dolja |first1=Valerian V. |last2=Koonin |first2=Eugene V. 
|date=January 2018 |title=Metagenomics reshapes the concepts of RNA virus evolution by revealing extensive horizontal virus transfer |url=https://doi.org/10.1016/j.virusres.2017.10.020 |journal=Virus Research |volume=244 |pages=36–52 |doi=10.1016/j.virusres.2017.10.020 |pmid=29103997 |pmc=5801114 |issn=0168-1702}} Shotgun metagenomic sequencing is often used in virome studies as an unbiased approach for sequencing the entire viral community in a sample. This approach produces short reads (~100–300 bp) but can generate millions of them, greatly improving sequencing depth and coverage. These metagenomic studies enable viral discovery, classification, and exploration of host-virus interactions, but they are greatly limited by the computational analysis they require.{{Cite journal |last1=Tyson |first1=Gene W. |last2=Chapman |first2=Jarrod |last3=Hugenholtz |first3=Philip |last4=Allen |first4=Eric E. |last5=Ram |first5=Rachna J. |last6=Richardson |first6=Paul M. |last7=Solovyev |first7=Victor V. |last8=Rubin |first8=Edward M. |last9=Rokhsar |first9=Daniel S. |last10=Banfield |first10=Jillian F. |date=2004-02-01 |title=Community structure and metabolism through reconstruction of microbial genomes from the environment |url=https://doi.org/10.1038/nature02340 |journal=Nature |volume=428 |issue=6978 |pages=37–43 |doi=10.1038/nature02340 |pmid=14961025 |bibcode=2004Natur.428...37T |issn=0028-0836}}{{Cite journal |last1=Segata |first1=Nicola |last2=Boernigen |first2=Daniela |last3=Tickle |first3=Timothy L |last4=Morgan |first4=Xochitl C |last5=Garrett |first5=Wendy S |last6=Huttenhower |first6=Curtis |date=January 2013 |title=Computational meta'omics for microbial community studies |url=https://doi.org/10.1038/msb.2013.22 |journal=Molecular Systems Biology |volume=9 |issue=1 |page=666 |doi=10.1038/msb.2013.22 |pmid=23670539 |pmc=4039370 |issn=1744-4292}}

File:Virome metagenomic analysis.jpg

Traditional virome analysis

Shotgun sequencing in virome metagenomic studies produces hundreds of thousands to millions of short reads (~100–300 bp). These reads pass through quality-control steps, including read-quality assessment, read trimming, and host depletion, to prepare the viral sequences for assembly and alignment. De novo assembly followed by reference-based classification of the resulting contigs is the most common workflow in virome analysis.{{Cite journal |last1=Quince |first1=Christopher |last2=Walker |first2=Alan W |last3=Simpson |first3=Jared T |last4=Loman |first4=Nicholas J |last5=Segata |first5=Nicola |date=September 2017 |title=Shotgun metagenomics, from sampling to analysis |url=https://doi.org/10.1038/nbt.3935 |journal=Nature Biotechnology |volume=35 |issue=9 |pages=833–844 |doi=10.1038/nbt.3935 |pmid=28898207 |hdl=2164/10167 |issn=1087-0156}} Assemblers such as SPAdes break sequencing reads into overlapping subsequences of a fixed length k (k-mers), link k-mers that overlap by k-1 bases in a de Bruijn graph, and traverse the graph to reconstruct longer contiguous sequences known as contigs.{{Cite journal |last1=Bankevich |first1=Anton |last2=Nurk |first2=Sergey |last3=Antipov |first3=Dmitry |last4=Gurevich |first4=Alexey A. |last5=Dvorkin |first5=Mikhail |last6=Kulikov |first6=Alexander S. |last7=Lesin |first7=Valery M. |last8=Nikolenko |first8=Sergey I. |last9=Pham |first9=Son |last10=Prjibelski |first10=Andrey D. |last11=Pyshkin |first11=Alexey V. |last12=Sirotkin |first12=Alexander V. |last13=Vyahhi |first13=Nikolay |last14=Tesler |first14=Glenn |last15=Alekseyev |first15=Max A. |date=May 2012 |title=SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing |url=https://doi.org/10.1089/cmb.2012.0021 |journal=Journal of Computational Biology |volume=19 |issue=5 |pages=455–477 |doi=10.1089/cmb.2012.0021 |pmid=22506599 |pmc=3342519 |issn=1066-5277}} Contigs are then compared with reference databases by sequence similarity to assign viral taxonomy to the sample. 
This approach, however, depends on prior knowledge of viral taxonomy and is strongly affected by the scarcity of robust reference sequences.{{Cite journal |last1=Ren |first1=Jie |last2=Ahlgren |first2=Nathan A. |last3=Lu |first3=Yang Young |last4=Fuhrman |first4=Jed A. |last5=Sun |first5=Fengzhu |date=2017-07-06 |title=VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data |journal=Microbiome |volume=5 |issue=1 |page=69 |doi=10.1186/s40168-017-0283-5 |doi-access=free |pmid=28683828 |pmc=5501583 |issn=2049-2618}} Current databases tend to be biased towards clinically relevant and cultivable viruses, which notably reduces the power of the analysis. As a result, current virus classification and taxonomy are thought to greatly underestimate the true diversity of the virome.
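The k-mer and de Bruijn graph idea can be illustrated with a toy assembler. This is a minimal sketch under simplifying assumptions (error-free reads, a single strand, no coverage thresholds, isolated cycles ignored); it is not how SPAdes is implemented, and the function names `kmers`, `build_de_bruijn`, and `assemble_contigs` are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each k-mer adds an edge from its prefix to its suffix."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def assemble_contigs(graph):
    """Spell out every maximal non-branching path as a contig."""
    indegree = defaultdict(int)
    for succs in graph.values():
        for s in succs:
            indegree[s] += 1
    nodes = set(graph) | set(indegree)

    def one_in_one_out(node):
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in nodes:
        if one_in_one_out(node):
            continue  # interior node: reached by walking from a path start
        for nxt in graph.get(node, ()):
            contig = node + nxt[-1]
            while one_in_one_out(nxt):
                nxt = next(iter(graph[nxt]))
                contig += nxt[-1]
            contigs.append(contig)
    return sorted(contigs)


# Toy reads that overlap their neighbours by four bases.
reads = ["ATGGCGT", "GCGTGCA", "TGCAAT"]
print(assemble_contigs(build_de_bruijn(reads, 4)))  # ['ATGGCGTGCAAT']
print(assemble_contigs(build_de_bruijn(reads, 6)))  # ['ATGGCGT', 'GCGTGCA', 'TGCAAT']
```

With k = 4 the reads share k-mers and join into one contig; with k = 6 their four-base overlaps are too short to share any k-mer, so the assembly stays fragmented. Choosing k is therefore a trade-off between joining sparse reads and keeping the graph unambiguous.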

File:Reference-guided Virome analysis.jpg

Another limitation is that assembly tools have difficulty assembling low-coverage, low-abundance viruses. If sequencing depth is insufficient, the genomes of low-abundance viruses may be recovered only as fragments. Assemblers can use shorter k-mer lengths to join sparsely covered viral reads, but shorter k-mers are more likely to occur at several places in the data, which introduces ambiguity into the contigs. These limitations leave a considerable proportion of viral sequencing reads uncharacterized, often called 'viral dark matter'. New analysis software that uses machine learning has emerged to address the shortcomings of approaches based on similarity to reference databases.
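Why low-abundance viruses fragment can be estimated with the classic Lander-Waterman model. The sketch below assumes reads land uniformly at random, that a virus contributes reads in proportion to its relative abundance, and that any overlap joins two reads; the function name and the numbers used (10 million 150 bp reads, a 40 kb genome) are hypothetical illustrations, not values from a specific study.

```python
import math


def lander_waterman(total_reads, read_len, rel_abundance, genome_len):
    """Idealized coverage statistics for one virus in a shotgun metagenome."""
    n_reads = total_reads * rel_abundance        # reads expected from this virus
    coverage = n_reads * read_len / genome_len   # mean depth C = N * L / G
    return {
        "reads": n_reads,
        "coverage": coverage,
        # Poisson probability that a base is covered by no read: e^(-C)
        "fraction_uncovered": math.exp(-coverage),
        # Expected number of separate islands (contigs): N * e^(-C);
        # values near zero mean the genome is expected to be covered without gaps
        "expected_contigs": n_reads * math.exp(-coverage),
    }


for abundance in (1e-3, 1e-5):
    stats = lander_waterman(10_000_000, 150, abundance, 40_000)
    print(abundance, round(stats["coverage"], 3), round(stats["fraction_uncovered"], 3))
```

Under these assumptions, a virus making up 0.1% of reads reaches about 37.5x coverage, while one at 0.001% reaches only about 0.375x, leaving roughly 69% of its genome unsampled and split across dozens of fragments. Real data deviate from the uniform-sampling assumption, so these figures are only indicative.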

Deep learning in virome analysis

Deep learning has shown advantages in many genomics applications, often surpassing traditional state-of-the-art computational methods in predictive performance, especially when trained with sufficient data.{{Citation |last1=Yue |first1=Tianwei |title=Deep Learning for Genomics: A Concise Overview |date=2018 |arxiv=1802.00810 |last2=Wang |first2=Yuanxin |last3=Zhang |first3=Longxiang |last4=Gu |first4=Chunming |last5=Xue |first5=Haoru |last6=Wang |first6=Wenping |last7=Lyu |first7=Qi |last8=Dun |first8=Yujie}} Deep learning supports multitask learning, an approach in which a model shares knowledge across a primary task and one or more secondary tasks, making tools more versatile.{{Cite book |last1=Seltzer |first1=Michael L. |title=2013 IEEE International Conference on Acoustics, Speech and Signal Processing |last2=Droppo |first2=Jasha |date=May 2013 |isbn=978-1-4799-0356-6 |pages=6965–6969 |chapter=Multi-task learning in deep neural networks for improved phoneme recognition |doi=10.1109/ICASSP.2013.6639012 |chapter-url=https://ieeexplore.ieee.org/document/6639012}} Moreover, multi-view learning, which integrates multiple data types such as sequence data, DNA methylation, and gene expression, can produce more accurate and robust predictions.
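Multitask learning can be sketched as one shared layer feeding two output heads whose losses are added together; during training, gradients from both losses would update the shared weights, which is how knowledge passes between tasks. This is a forward-pass sketch only (training is omitted); the two tasks (viral versus non-viral, and host class), the weights, the input features, and `secondary_weight` are all hypothetical.

```python
import math


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [b + sum(w * xi for w, xi in zip(row, x)) for row, b in zip(weights, bias)]


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def multitask_forward(x, params):
    # Shared trunk: both heads read the same hidden representation.
    hidden = [max(0.0, v) for v in linear(params["shared_w"], params["shared_b"], x)]
    # Primary head (hypothetical task): probability that the input is viral.
    viral_logit = linear(params["viral_w"], params["viral_b"], hidden)[0]
    p_viral = 1.0 / (1.0 + math.exp(-viral_logit))
    # Secondary head (hypothetical task): distribution over host classes.
    host_probs = softmax(linear(params["host_w"], params["host_b"], hidden))
    return p_viral, host_probs


def multitask_loss(p_viral, is_viral, host_probs, host_label, secondary_weight=0.5):
    """Binary cross-entropy for the primary task plus weighted cross-entropy for the secondary."""
    eps = 1e-12
    primary = -(is_viral * math.log(p_viral + eps) + (1 - is_viral) * math.log(1 - p_viral + eps))
    secondary = -math.log(host_probs[host_label] + eps)
    return primary + secondary_weight * secondary


params = {
    "shared_w": [[0.2, -0.1, 0.4, 0.0], [0.3, 0.1, -0.2, 0.5], [-0.4, 0.2, 0.1, 0.3]],
    "shared_b": [0.0, 0.1, 0.0],
    "viral_w": [[0.5, -0.3, 0.8]],
    "viral_b": [0.1],
    "host_w": [[0.2, 0.4, -0.1], [-0.3, 0.1, 0.6]],
    "host_b": [0.0, 0.0],
}
features = [0.25, 0.25, 0.3, 0.2]  # hypothetical input, e.g. nucleotide frequencies
p_viral, host_probs = multitask_forward(features, params)
print(p_viral, host_probs, multitask_loss(p_viral, 1, host_probs, host_label=0))
```

Setting `secondary_weight` to zero reduces this to single-task training; larger values make the shared layer care more about the secondary task.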

Virome classification and analysis present a unique challenge due to the rapid evolution of viral genomes, which often leads to high sequence divergence within a species.{{Cite journal |last1=Elbasir |first1=Abdurrahman |last2=Ye |first2=Ying |last3=Schäffer |first3=Daniel E. |last4=Hao |first4=Xue |last5=Wickramasinghe |first5=Jayamanna |last6=Tsingas |first6=Konstantinos |last7=Lieberman |first7=Paul M. |last8=Long |first8=Qi |last9=Morris |first9=Quaid |last10=Zhang |first10=Rugang |last11=Schäffer |first11=Alejandro A. |last12=Auslander |first12=Noam |date=2023-02-11 |title=A deep learning approach reveals unexplored landscape of viral expression in cancer |url=https://doi.org/10.1038/s41467-023-36336-z |journal=Nature Communications |volume=14 |issue=1 |page=785 |bibcode=2023NatCo..14..785E |doi=10.1038/s41467-023-36336-z |issn=2041-1723 |pmc=9922274 |pmid=36774364}} Deep learning models attempt to address this challenge and can recognize complex patterns in viral sequence fragments while handling high-dimensional data.{{Cite journal |last1=Sukhorukov |first1=Grigorii |last2=Khalili |first2=Maryam |last3=Gascuel |first3=Olivier |last4=Candresse |first4=Thierry |last5=Marais-Colombel |first5=Armelle |last6=Nikolski |first6=Macha |date=2022-05-13 |title=VirHunter: A Deep Learning-Based Method for Detection of Novel RNA Viruses in Plant Sequencing Data |journal=Frontiers in Bioinformatics |volume=2 |doi=10.3389/fbinf.2022.867111 |issn=2673-7647 |pmc=9580956 |pmid=36304258 |doi-access=free}}

= Viral identification =

Traditional database-based tools such as BLAST rely on reference data and can struggle with highly divergent viruses that have no known homologs among previously identified genomes.{{Cite journal |last1=Tampuu |first1=Ardi |last2=Bzhalava |first2=Zurab |last3=Dillner |first3=Joakim |last4=Vicente |first4=Raul |date=2019-04-08 |title=ViraMiner: Deep Learning on Raw DNA Sequences for Identifying Viral Genomes in Human Samples |url=https://doi.org/10.1101/602656 |journal=PLOS ONE |volume=14 |issue=9 |pages=e0222271 |doi=10.1101/602656 |pmc=6738585 |pmid=31509583 |access-date=2025-02-21}} Such sequences are generally classified as “unknown”, which gives the user little information. Similarly, other reference-based methods, such as Kraken{{Cite journal |last1=Wood |first1=Derrick E. |last2=Lu |first2=Jennifer |last3=Langmead |first3=Ben |date=2019-09-07 |title=Improved metagenomic analysis with Kraken 2 |url=https://doi.org/10.1101/762302 |journal=Genome Biology |volume=20 |issue=1 |page=257 |doi=10.1101/762302 |pmc=6883579 |pmid=31779668 |access-date=2025-02-21}} and Metavir,{{Cite journal |last1=Roux |first1=Simon |last2=Tournayre |first2=Jeremy |last3=Mahul |first3=Antoine |last4=Debroas |first4=Didier |last5=Enault |first5=François |date=2014-03-19 |title=Metavir 2: new tools for viral metagenome comparison and assembled virome analysis |journal=BMC Bioinformatics |volume=15 |issue=1 |page=76 |doi=10.1186/1471-2105-15-76 |issn=1471-2105 |pmc=4002922 |pmid=24646187 |doi-access=free}} face limitations due to biases in databases. Current virus genome databases are heavily skewed towards viruses that infect hosts that can be cultivated in the lab.{{Cite journal |last1=Ren |first1=Jie |last2=Song |first2=Kai |last3=Deng |first3=Chao |last4=Ahlgren |first4=Nathan A. |last5=Fuhrman |first5=Jed A. 
|last6=Li |first6=Yi |last7=Xie |first7=Xiaohui |last8=Poplin |first8=Ryan |last9=Sun |first9=Fengzhu |date=March 2020 |title=Identifying viruses from metagenomic data using deep learning |url=https://doi.org/10.1007/s40484-019-0187-4 |journal=Quantitative Biology |volume=8 |issue=1 |pages=64–77 |doi=10.1007/s40484-019-0187-4 |issn=2095-4689 |pmc=8172088 |pmid=34084563}} This lack of reference data can hinder viral identification. For example, one study estimates that only 15% of viruses in the human gut are similar to known viruses in databases, which limits the number of matches that can be expected.
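Why reference-based classifiers return "unknown" for novel sequences can be seen in a toy exact k-mer lookup in the spirit of Kraken. This is a simplified sketch: Kraken maps k-mers shared by several taxa to their lowest common ancestor in a taxonomy, whereas this version simply ignores shared k-mers and takes a majority vote. The function names and reference sequences are hypothetical.

```python
from collections import Counter


def build_kmer_index(references, k):
    """Map every k-mer in each reference genome to the set of taxa containing it."""
    index = {}
    for taxon, genome in references.items():
        for i in range(len(genome) - k + 1):
            index.setdefault(genome[i:i + k], set()).add(taxon)
    return index


def classify_read(read, index, k, min_hits=1):
    """Assign a read to the taxon with the most unique k-mer hits, else 'unknown'."""
    votes = Counter()
    for i in range(len(read) - k + 1):
        taxa = index.get(read[i:i + k], set())
        if len(taxa) == 1:  # skip k-mers that are absent or shared by several taxa
            votes[next(iter(taxa))] += 1
    if sum(votes.values()) < min_hits:
        return "unknown"
    return votes.most_common(1)[0][0]


references = {"virus_A": "ATGGCGTGCAAT", "virus_B": "TTTACCCGGGAA"}  # toy references
index = build_kmer_index(references, k=5)
print(classify_read("GGCGTGCA", index, k=5))    # virus_A
print(classify_read("CACACACACA", index, k=5))  # unknown: no k-mer matches any reference
```

A read from a virus with no close relative in the index shares no k-mers with it and falls through to "unknown", regardless of how viral it actually is.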

Several tools use traditional machine-learning approaches for viral identification. For example, HMMER3 searches sequences against profile hidden Markov models (pHMMs) built from reference databases of viral protein families to characterize unknown viruses.{{Cite journal |last1=Mistry |first1=Jaina |last2=Finn |first2=Robert D. |last3=Eddy |first3=Sean R. |last4=Bateman |first4=Alex |last5=Punta |first5=Marco |date=2013-04-17 |title=Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions |url=https://doi.org/10.1093/nar/gkt263 |journal=Nucleic Acids Research |volume=41 |issue=12 |pages=e121 |doi=10.1093/nar/gkt263 |issn=1362-4962 |pmc=3695513 |pmid=23598997}} However, this method is still constrained by the scarcity of characterized viral proteins in viral databases and can struggle with highly divergent viral sequences. Deep learning provides a more flexible alternative, as models do not rely solely on predefined reference databases but instead learn to recognize viral genomic signatures from training data.
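The core of profile-based search is a position-specific log-odds score: each column of an alignment of family members gives residue probabilities, and a query scores highly when its residues are more probable under the profile than under a background distribution. The sketch below keeps only that idea as an ungapped position-specific scoring matrix; it omits the insert and delete states and the dynamic-programming search of a real profile HMM and is not HMMER's algorithm. The toy alignment, the pseudocount, and the uniform background are assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned, pseudocount=1.0):
    """Per-column log2-odds scores from equal-length aligned sequences."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    profile = []
    for pos in range(len(aligned[0])):
        column = [seq[pos] for seq in aligned]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, query):
    """Sum of column scores for an ungapped query of the same length."""
    if len(query) != len(profile):
        raise ValueError("query length must match profile length")
    return sum(column[aa] for column, aa in zip(profile, query))


family = ["MKVLA", "MKILA", "MRVLA"]  # toy aligned protein family
profile = build_profile(family)
print(round(score(profile, "MKVLA"), 2), round(score(profile, "WWWWW"), 2))
```

A family member scores positive and an unrelated sequence scores negative. A highly divergent viral protein whose residues resemble neither the profile nor any other family profile would also score poorly, which is the limitation described above.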

Tools such as DeepVirFinder and ViraMiner combine convolutional neural networks (CNNs) with dense (fully connected) layers to learn viral genomic signatures. DeepVirFinder encodes DNA sequences, passes them through convolutional layers, applies max pooling and a fully connected layer, and outputs a score between 0 and 1 for binary (viral versus non-viral) classification. ViraMiner uses a similar architecture but combines two convolutional branches: one uses max pooling to detect patterns, and the other uses average pooling to retain information about how frequently those patterns occur.
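The effect of the pooling choice can be shown with a single convolutional filter. This is a pure-Python sketch with a hand-set, hypothetical filter for the motif "TATA" and hypothetical dense-layer weights; real tools learn many filters from training data and stack further layers.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as a list of 4-element vectors in A, C, G, T order."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def motif_filter(motif):
    """Hypothetical hand-set filter: +1 for the motif base, -1 for any other base."""
    return [[1.0 if b == m else -1.0 for b in BASES] for m in motif]


def conv1d_relu(x, kernel):
    """Slide one filter along the encoded sequence and apply ReLU to each window."""
    width = len(kernel)
    acts = []
    for i in range(len(x) - width + 1):
        s = sum(kernel[j][c] * x[i + j][c] for j in range(width) for c in range(len(BASES)))
        acts.append(max(0.0, s))
    return acts


def viral_score(seq, filters, dense_w, dense_b, pooling="max"):
    """Convolution -> global max or average pooling -> dense layer -> sigmoid."""
    x = one_hot(seq)
    pooled = []
    for kernel in filters:
        acts = conv1d_relu(x, kernel)
        pooled.append(max(acts) if pooling == "max" else sum(acts) / len(acts))
    z = dense_b + sum(w * p for w, p in zip(dense_w, pooled))
    return 1.0 / (1.0 + math.exp(-z))


filters = [motif_filter("TATA")]
once = "GCGCTATAGCGCGCGC"    # motif occurs once
thrice = "TATAGCTATAGCTATA"  # motif occurs three times
for pooling in ("max", "mean"):
    print(pooling, viral_score(once, filters, [1.0], -2.0, pooling),
          viral_score(thrice, filters, [1.0], -2.0, pooling))
```

Max pooling gives both sequences the same score because it only records that the motif is present, while average pooling scores the sequence with three occurrences higher because it retains how often the motif appears.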

Long short-term memory (LSTM) networks, a type of recurrent neural network (RNN) designed to capture long-range dependencies in sequential data, have performed well in sequence classification tasks.{{Cite journal |last1=Ahmed |first1=Hania |last2=Mumtaz |first2=Zilwa |last3=Saqib |first3=Sharmeen |last4=Zubair Yousaf |first4=Muhammad |date=March 2025 |title=ViroNia: LSTM based proteomics model for precise prediction of HCV |url=https://doi.org/10.1016/j.compbiomed.2024.109573 |journal=Computers in Biology and Medicine |volume=186 |pages=109573 |doi=10.1016/j.compbiomed.2024.109573 |issn=0010-4825 |pmid=39733555}} LSTMs have accordingly been applied to virome classification. One example of an LSTM-based tool is ViroNIA, which predicts hepatitis C virus (HCV) sequences. ViroNIA processes one-hot encoded viral sequences that are padded to a fixed length and then analyzed hierarchically by two stacked LSTM layers. Another model, Seeker, uses an LSTM architecture to identify bacteriophages.{{Cite journal |last1=Auslander |first1=Noam |last2=Gussow |first2=Ayal B |last3=Benler |first3=Sean |last4=Wolf |first4=Yuri I |last5=Koonin |first5=Eugene V |date=2020-10-12 |title=Seeker: alignment-free identification of bacteriophage genomes by deep learning |url=https://doi.org/10.1093/nar/gkaa856 |journal=Nucleic Acids Research |volume=48 |issue=21 |pages=e121 |doi=10.1093/nar/gkaa856 |issn=0305-1048 |pmc=7708075 |pmid=33045744}}
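The padding and two-layer stacking described above can be sketched as a pure-Python forward pass. This is an illustration under stated assumptions, not ViroNIA's implementation: the weights are random and untrained, the hidden size of 8 and the DNA alphabet are hypothetical choices (a protein model would use the 20 amino acids), and the final hidden state feeds a single sigmoid output.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def pad_one_hot(seq, alphabet, max_len):
    """One-hot encode, truncating or right-padding with all-zero rows to max_len."""
    seq = seq[:max_len]
    rows = [[1.0 if ch == a else 0.0 for a in alphabet] for ch in seq]
    rows.extend([0.0] * len(alphabet) for _ in range(max_len - len(seq)))
    return rows


def init_lstm(input_size, hidden_size, rng):
    """Small random (W, U, b) for the input, forget, output and candidate gates."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {gate: (matrix(hidden_size, input_size), matrix(hidden_size, hidden_size),
                   [0.0] * hidden_size)
            for gate in ("i", "f", "o", "g")}


def lstm_layer(inputs, params):
    """Run one LSTM layer over a sequence and return the hidden state at every step."""
    hidden_size = len(params["i"][2])
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    outputs = []
    for x in inputs:
        pre = {}
        for gate, (W, U, b) in params.items():
            pre[gate] = [b[j] + sum(w * v for w, v in zip(W[j], x))
                         + sum(u * v for u, v in zip(U[j], h))
                         for j in range(hidden_size)]
        i_gate = [sigmoid(v) for v in pre["i"]]
        f_gate = [sigmoid(v) for v in pre["f"]]
        o_gate = [sigmoid(v) for v in pre["o"]]
        g_cand = [math.tanh(v) for v in pre["g"]]
        c = [f * cj + i * g for f, cj, i, g in zip(f_gate, c, i_gate, g_cand)]
        h = [o * math.tanh(cj) for o, cj in zip(o_gate, c)]
        outputs.append(h)
    return outputs


def stacked_lstm_score(seq, alphabet, max_len, layer1, layer2, out_w, out_b):
    """Two stacked LSTM layers; the last hidden state feeds a sigmoid output."""
    x = pad_one_hot(seq, alphabet, max_len)
    h1 = lstm_layer(x, layer1)   # layer 1 reads the encoded residues
    h2 = lstm_layer(h1, layer2)  # layer 2 reads layer 1's hidden states
    return sigmoid(out_b + sum(w * v for w, v in zip(out_w, h2[-1])))


rng = random.Random(0)
alphabet = "ACGT"  # assumption for this sketch
layer1 = init_lstm(len(alphabet), 8, rng)
layer2 = init_lstm(8, 8, rng)
out_w = [rng.uniform(-0.1, 0.1) for _ in range(8)]
print(stacked_lstm_score("ACGTTGCA", alphabet, 12, layer1, layer2, out_w, 0.0))
```

Padding gives every input the same number of time steps, so sequences of different lengths can be batched; the second layer sees a summary produced by the first rather than the raw encoding, which is what makes the analysis hierarchical.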

Other tools, such as ViraLM,{{Cite journal |last1=Raiaan |first1=Mohaimenul Azam Khan |last2=Mukta |first2=Md. Saddam Hossain |last3=Fatema |first3=Kaniz |last4=Fahad |first4=Nur Mohammad |last5=Sakib |first5=Sadman |last6=Mim |first6=Most Marufatul Jannat |last7=Ahmad |first7=Jubaer |last8=Ali |first8=Mohammed Eunus |last9=Azam |first9=Sami |date=2024 |title=A Review on Large Language Models: Architectures, Applications, Taxonomies, Open Issues and Challenges |url=https://doi.org/10.1109/access.2024.3365742 |journal=IEEE Access |volume=12 |pages=26839–26874 |doi=10.1109/access.2024.3365742 |bibcode=2024IEEEA..1226839R |issn=2169-3536}} use large language model architectures for efficient and accurate viral classification.

= Virome-host interaction analysis =

Another important application of deep learning is virome-host interaction analysis. Currently, no high-throughput experimental methods can definitively assign a host to uncultivated viruses.{{Cite journal |last1=Yan |first1=Binghao |last2=Nam |first2=Yunbi |last3=Li |first3=Lingyao |last4=Deek |first4=Rebecca A. |last5=Li |first5=Hongzhe |last6=Ma |first6=Siyuan |date=2025-01-07 |title=Recent advances in deep learning and language models for studying the microbiome |journal=Frontiers in Genetics |volume=15 |doi=10.3389/fgene.2024.1494474 |issn=1664-8021 |pmc=11747409 |pmid=39840283 |doi-access=free}} Alignment-based approaches struggle because of the scarcity of robust data in reference databases and high viral sequence divergence. Alignment-free methods provide a viable alternative: they use features such as k-mer composition, codon usage, and GC content to measure the similarity of a viral sequence to candidate host sequences or to other viruses with a known host. Because these genomic features are embedded in viral genomes, deep learning models could learn them automatically to drive predictions. For example, evoMIL, which predicts virus-host associations at the species level, takes only the viral sequence as input.{{Cite journal |last1=Liu |first1=Dan |last2=Young |first2=Francesca |last3=Robertson |first3=David L |last4=Yuan |first4=Ke |date=2023-04-08 |title=Prediction of virus-host associations using protein language models and multiple instance learning |url=https://doi.org/10.1101/2023.04.07.536023 |doi=10.1101/2023.04.07.536023 |access-date=2025-02-21 |website=doi.org}}
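An alignment-free, composition-based host comparison can be sketched as follows: build a k-mer frequency profile for the virus and for each candidate host genome, then pick the host whose profile is closest. The toy genomes, the choice of 3-mers, and Euclidean distance are assumptions for illustration; published tools use longer k-mers, other distance measures, and statistical or learned models.

```python
import math
from itertools import product


def gc_content(seq):
    """Fraction of G and C bases."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_profile(seq, k=3):
    """Relative frequencies of all 4**k DNA k-mers, in a fixed order."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing ambiguous bases such as N
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in words]


def predict_host(virus, hosts, k=3):
    """Return the candidate host whose k-mer profile is closest to the virus."""
    v = kmer_profile(virus, k)
    distances = {name: math.dist(v, kmer_profile(genome, k))
                 for name, genome in hosts.items()}
    return min(distances, key=distances.get), distances


hosts = {"host_GC": "GCGGCCGCGC" * 5, "host_AT": "ATTAATATTA" * 5}  # toy genomes
virus = "CGCGGCCGCG" * 3
best, distances = predict_host(virus, hosts)
print(best, round(gc_content(virus), 2))  # host_GC 1.0
```

GC content is the simplest such feature; the k-mer profile generalizes it by capturing which short words a genome favors, a signal that viruses often share with their hosts.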

= Viral resistance and mutation detection =

Deep learning models can also be used to characterize drug resistance in viruses by identifying drug resistance mutations.{{Cite journal |last1=Steiner |first1=Margaret C. |last2=Gibson |first2=Keylie M. |last3=Crandall |first3=Keith A. |date=2020-05-19 |title=Drug Resistance Prediction Using Deep Learning Techniques on HIV-1 Sequence Data |journal=Viruses |volume=12 |issue=5 |pages=560 |doi=10.3390/v12050560 |issn=1999-4915 |pmc=7290575 |pmid=32438586 |doi-access=free}} Here, models can make predictions and identify novel patterns in the input data rather than relying on known drug resistance mutations. Geometric deep learning, which builds geometric priors such as symmetry into neural architectures,{{Citation |last1=Bronstein |first1=Michael M. |title=Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges |date=2021 |arxiv=2104.13478 |last2=Bruna |first2=Joan |last3=Cohen |first3=Taco |last4=Veličković |first4=Petar}} could further improve predictive performance by incorporating the three-dimensional molecular structure of drug-virus interactions.{{Cite journal |last1=Das |first1=Bihter |last2=Kutsal |first2=Mucahit |last3=Das |first3=Resul |date=October 2022 |title=A geometric deep learning model for display and prediction of potential drug-virus interactions against SARS-CoV-2 |url=https://doi.org/10.1016/j.chemolab.2022.104640 |journal=Chemometrics and Intelligent Laboratory Systems |volume=229 |pages=104640 |doi=10.1016/j.chemolab.2022.104640 |issn=0169-7439 |pmc=9400382 |pmid=36042844}}

= Functional virome analysis =

Some work has also applied deep learning to characterize viral community function. For example, VIBRANT, a tool that uses a multilayer perceptron neural network classifier, searches for auxiliary metabolic genes (AMGs) to identify the metabolic pathways present in viral communities. AMGs are host-derived genes that can be actively expressed during infection to improve viral fitness.{{Cite journal |last1=Breitbart |first1=Mya |last2=Rohwer |first2=Forest |date=June 2005 |title=Here a virus, there a virus, everywhere the same virus? |url=https://doi.org/10.1016/j.tim.2005.04.003 |journal=Trends in Microbiology |volume=13 |issue=6 |pages=278–284 |doi=10.1016/j.tim.2005.04.003 |pmid=15936660 |issn=0966-842X}} These AMGs are automatically assigned to KEGG{{Cite journal |last=Kanehisa |first=M. |date=2000-01-01 |title=KEGG: Kyoto Encyclopedia of Genes and Genomes |url=https://doi.org/10.1093/nar/28.1.27 |journal=Nucleic Acids Research |volume=28 |issue=1 |pages=27–30 |doi=10.1093/nar/28.1.27 |pmid=10592173 |pmc=102409 |issn=1362-4962}} metabolic pathways to provide insights into viral community function.

= Limitations =

While deep learning can achieve strong performance metrics, it often offers limited interpretability compared with statistical and traditional machine learning-based methods.{{Cite journal |last1=Shorten |first1=Connor |last2=Khoshgoftaar |first2=Taghi M. |last3=Furht |first3=Borko |date=2021-01-11 |title=Deep Learning applications for COVID-19 |journal=Journal of Big Data |volume=8 |issue=1 |page=18 |doi=10.1186/s40537-020-00392-9 |doi-access=free |pmid=33457181 |pmc=7797891 |issn=2196-1115}} Research into which parts of the input influence predictions, what drives the activation of particular neurons, and what the learned representations encode could help address these interpretability challenges. Deep learning models also generally require large training datasets to produce accurate predictions, so they may be limited by the availability of relevant viromics data.
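One simple way to examine which parts of the input influence a prediction is occlusion (perturbation) analysis: mask a window of positions, rescore the sequence, and record how much the score drops. In this sketch `toy_model` is a hypothetical stand-in for a trained network that just counts a motif; in practice `score_fn` would be the model's own prediction function, and the window size and the "N" filler are assumptions.

```python
def occlusion_importance(seq, score_fn, window=4, filler="N"):
    """Drop in score when each window of positions is replaced by a neutral symbol."""
    baseline = score_fn(seq)
    drops = []
    for i in range(len(seq) - window + 1):
        masked = seq[:i] + filler * window + seq[i + window:]
        drops.append(baseline - score_fn(masked))
    return drops


def toy_model(seq):
    """Hypothetical stand-in for a trained classifier: counts a 'viral' motif."""
    return seq.count("TATA")


print(occlusion_importance("GCGCTATAGCGC", toy_model))  # [0, 1, 1, 1, 1, 1, 1, 1, 0]
```

Only windows that overlap the motif reduce the score, so the importance profile points back to the positions the model actually relies on.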

= Comparison of traditional and deep learning models for viral identification and analysis =

{| class="wikitable"
! Feature !! Traditional virome analysis !! Deep learning virome analysis
|-
| Approach || Mainly reference-based analysis. || De novo viral identification and analysis possible.
|-
| Data dependency || Requires viral reference genomes or databases. || Learns from labeled and unlabeled sequences. Generally requires a large training dataset.
|-
| Handling novel viruses || Limited discovery and analysis of novel or highly divergent viruses. || Can detect novel viruses.
|-
| Computational resource requirements || Often computationally intensive due to sequence alignment. || Computationally expensive during model training but can be efficient once trained.
|-
| Integration with multiple data types || Typically focuses on sequence data. || Could integrate multi-omics data.
|}

Multiomics

Incorporating a multiomics approach into virome analysis could provide a more comprehensive understanding of virome biology. Transcriptomics can help compare gene expression between genetically distinct viral strains, shedding light on their fitness within the virome and on virus-host interactions.{{Cite journal |last1=Mihindukulasuriya |first1=Kathie A. |last2=Mars |first2=Ruben A.T. |last3=Johnson |first3=Abigail J. |last4=Ward |first4=Tonya |last5=Priya |first5=Sambhawa |last6=Lekatz |first6=Heather R. |last7=Kalari |first7=Krishna R. |last8=Droit |first8=Lindsay |last9=Zheng |first9=Tenghao |last10=Blekhman |first10=Ran |last11=D'Amato |first11=Mauro |last12=Farrugia |first12=Gianrico |last13=Knights |first13=Dan |last14=Handley |first14=Scott A. |last15=Kashyap |first15=Purna C. |date=October 2021 |title=Multi-Omics Analyses Show Disease, Diet, and Transcriptome Interactions With the Virome |url=https://doi.org/10.1053/j.gastro.2021.06.077 |journal=Gastroenterology |volume=161 |issue=4 |pages=1194–1207.e8 |doi=10.1053/j.gastro.2021.06.077 |pmid=34245762 |pmc=8463486 |issn=0016-5085}} Analyzing viral transcripts can also help characterize viral infections and distinguish between latent and active infections. 
Proteomics studies can confirm findings from transcriptomic studies and identify biomarkers and diagnostic and therapeutic targets.{{Cite journal |last1=Stupak |first1=Aleksandra |last2=Kwiatek |first2=Maciej |last3=Gęca |first3=Tomasz |last4=Kwaśniewska |first4=Anna |last5=Mlak |first5=Radosław |last6=Nawrot |first6=Robert |last7=Goździcka-Józefiak |first7=Anna |last8=Kwaśniewski |first8=Wojciech |date=2024-10-23 |title=A Virome and Proteomic Analysis of Placental Microbiota in Pregnancies with and without Fetal Growth Restriction |journal=Cells |volume=13 |issue=21 |pages=1753 |doi=10.3390/cells13211753 |doi-access=free |pmid=39513860 |pmc=11544783 |issn=2073-4409}} Metabolomics can provide valuable information on biochemical changes associated with virome composition.{{Cite journal |last1=Xie |first1=Peiwei |last2=Luo |first2=Mei |last3=Fan |first3=Jiahui |last4=Xiong |first4=Lishou |date=2024-06-29 |title=Multiomics Analysis Reveals Gut Virome–Bacteria–Metabolite Interactions and Their Associations with Symptoms in Patients with IBS-D |journal=Viruses |volume=16 |issue=7 |pages=1054 |doi=10.3390/v16071054 |doi-access=free |pmid=39066219 |pmc=11281411 |issn=1999-4915}} Metabolites produced by the host in response to viral infections can serve as biomarkers that help predict virome diversity. Including multiomics in virome analysis could improve personalized medicine through a more comprehensive understanding of the virome's role in a host.

Future

Population-wide virome surveillance could help in understanding viral outbreaks. This can be achieved by using environmental matrices such as wastewater as a proxy to detect emerging viruses or the circulation of highly pathogenic strains.{{Cite journal |last1=McCall |first1=Camille |last2=Wu |first2=Huiyun |last3=Miyani |first3=Brijen |last4=Xagoraraki |first4=Irene |date=October 2020 |title=Identification of multiple potential viral diseases in a large urban center using wastewater surveillance |url=https://doi.org/10.1016/j.watres.2020.116160 |journal=Water Research |volume=184 |pages=116160 |doi=10.1016/j.watres.2020.116160 |pmid=32738707 |pmc=7342010 |bibcode=2020WatRe.18416160M |issn=0043-1354}} Zoonotic spillover events could be predicted or detected by monitoring high-risk host reservoirs such as rodents, livestock, or birds.{{Cite journal |last1=Leifels |first1=Mats |last2=Khalilur Rahman |first2=Omar |last3=Sam |first3=I-Ching |last4=Cheng |first4=Dan |last5=Chua |first5=Feng Jun Desmond |last6=Nainani |first6=Dhiraj |last7=Kim |first7=Se Yeon |last8=Ng |first8=Wei Jie |last9=Kwok |first9=Wee Chiew |last10=Sirikanchana |first10=Kwanrawee |last11=Wuertz |first11=Stefan |last12=Thompson |first12=Janelle |last13=Chan |first13=Yoke Fun |date=2022-10-30 |title=The one health perspective to improve environmental surveillance of zoonotic viruses: lessons from COVID-19 and outlook beyond |url=https://doi.org/10.1038/s43705-022-00191-8 |journal=ISME Communications |volume=2 |issue=1 |page=107 |doi=10.1038/s43705-022-00191-8 |pmid=36338866 |pmc=9618154 |bibcode=2022ISMEC...2..107L |issn=2730-6151}} Virus surveillance is becoming increasingly important for outbreak prevention and investigation.

References