Bioinformatics discovery of non-coding RNAs
Non-coding RNAs have been discovered using both experimental and bioinformatic approaches. Bioinformatic approaches can be divided into three main categories. The first is homology search, which by definition cannot find new classes of ncRNAs. The second comprises algorithms designed to find specific types of ncRNAs whose members share characteristic properties. The third relies on very general properties of RNA and can therefore discover entirely new kinds of ncRNAs.
Discovery by homology search
Homology search refers to the process of searching a sequence database for RNAs that are similar to already known RNA sequences. Any algorithm designed for homology search of nucleic acid sequences can be used, e.g., BLAST.{{cite journal |vauthors=Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ |title=Gapped BLAST and PSI-BLAST: a new generation of protein database search programs |journal=Nucleic Acids Res. |volume=25 |issue=17 |pages=3389–3402 |date=September 1997 |pmid=9254694 |pmc=146917 |doi= 10.1093/nar/25.17.3389}} However, such general-purpose algorithms are typically less sensitive and less accurate than algorithms designed specifically for RNA.
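BLAST is a fast heuristic that approximates local sequence alignment. As a minimal illustration of the kind of scoring involved, the following Python sketch computes an exact Smith-Waterman local alignment score between two nucleotide sequences. The function name and the scoring values (match +2, mismatch -1, linear gap -2) are illustrative assumptions, not BLAST's defaults.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between nucleotide strings a and b.

    Hypothetical helper: exact Smith-Waterman with a linear gap penalty.
    """
    cols = len(b) + 1
    prev = [0] * cols  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Flooring at 0 lets an alignment start anywhere (local alignment).
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

For example, `smith_waterman_score("GGACGUCC", "UUACGUAA")` rewards only the shared `ACGU` core (4 matches at +2 each) and ignores the dissimilar flanks. The score sees only the primary sequence; it cannot reward two sequences for forming the same base pairs, which is what RNA-specific methods add.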
Of particular importance for RNA is the conservation of its secondary structure, which can be modeled to make searches more accurate. For example, covariance models{{cite journal |vauthors=Eddy SR, Durbin R |title=RNA sequence analysis using covariance models |journal=Nucleic Acids Res. |volume=22 |issue=11 |pages=2079–2088 |date=June 1994 |pmid=8029015 |pmc=308124 |doi= 10.1093/nar/22.11.2079}} can be viewed as an extension of profile hidden Markov models that also captures conserved secondary structure. Covariance models are implemented in the Infernal software package.{{cite journal |vauthors=Nawrocki EP, Eddy SR |title=Infernal 1.1: 100-fold faster RNA homology searches |journal=Bioinformatics |volume=29 |issue=22 |pages=2933–2935 |date=November 2013 |pmid=24008419 |pmc=3810854 |doi=10.1093/bioinformatics/btt509 }}
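Covariance models score a candidate against both the sequence and the base pairs of a consensus structure. The sketch below isolates only the structural constraint: it parses a consensus structure written in dot-bracket notation (assumed here for simplicity) and checks whether a candidate sequence can form every base pair, allowing Watson-Crick and G-U pairs. It is not how Infernal scores hits, and the function names are hypothetical.

```python
# Watson-Crick pairs plus the G-U wobble pair.
CANONICAL_PAIRS = frozenset({("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")})


def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return (i, j) index pairs for a dot-bracket string such as '((..))'."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)  # remember the 5' partner until its ')' appears
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def fits_structure(seq: str, structure: str) -> bool:
    """True if seq can form every base pair in structure (Watson-Crick or G-U)."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    s = seq.upper().replace("T", "U")  # accept DNA input
    return all((s[i], s[j]) in CANONICAL_PAIRS for i, j in parse_dot_bracket(structure))
```

A real covariance model goes further: it assigns probabilities to paired and unpaired positions and allows insertions and deletions, so a homolog that breaks one pair can still score well instead of being rejected outright as it would be here.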
Discovery of specific types of ncRNAs
Some types of RNAs have shared properties that algorithms can exploit. For example, tRNAscan-SE{{cite journal |vauthors=Lowe TM, Eddy SR |title=tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence |journal=Nucleic Acids Res. |volume=25 |issue=5 |pages=955–964 |date=March 1997 |pmid=9023104 |pmc=146525 |doi= 10.1093/nar/25.5.955}} is specialized for finding tRNAs. At its core is a tRNA homology search based on covariance models, but faster tRNA-specific search programs are run first to select candidates, which accelerates the search.
The properties of snoRNAs have enabled the development of programs to detect new examples of snoRNAs, including those that might be only distantly related to previously known examples. Computer programs implementing such approaches include snoscan{{cite journal |vauthors=Lowe TM, Eddy SR |s2cid=8084145 |title=A computational screen for methylation guide snoRNAs in yeast |journal=Science |volume=283 |issue=5405 |pages=1168–1171 |date=February 1999 |pmid=10024243 |doi= 10.1126/science.283.5405.1168|bibcode=1999Sci...283.1168L }} and snoReport.{{cite journal |vauthors=Hertel J, Hofacker IL, Stadler PF |title=SnoReport: computational identification of snoRNAs with unknown targets |journal=Bioinformatics |volume=24 |issue=2 |pages=158–164 |date=January 2008 |pmid=17895272 |doi=10.1093/bioinformatics/btm464 |doi-access=free }}
Similarly, several algorithms have been developed to detect microRNAs. Examples include miRNAFold{{cite journal |vauthors=Tempel S, Tahi F |title=A fast ab-initio method for predicting miRNA precursors in genomes. |journal=Nucleic Acids Res. |volume=40 |issue=11 |page=e80 |year=2012 |pmid=22362754 |doi= 10.1093/nar/gks146 |pmc=3367186}} and miRNAminer.{{cite journal |vauthors=Artzi S, Kiezun A, Shomron N |title=miRNAminer: a tool for homologous microRNA gene search. |journal=BMC Bioinformatics |volume=9|page=39 |year=2008 |pmid=18215311 |doi=10.1186/1471-2105-9-39 |pmc=2258288 |issue=1 |doi-access=free }}
Discovery by general properties
Some properties are shared by multiple unrelated classes of ncRNA, and these properties can be targeted to discover new classes. Chief among them is the conservation of an RNA secondary structure. To measure conservation of secondary structure, it is first necessary to find homologous sequences that might share a common structure. Strategies to do this have included using BLAST between two sequences{{cite journal |vauthors=Rivas E, Eddy SR |title=Noncoding RNA gene detection using comparative sequence analysis |journal=BMC Bioinformatics |volume=2 |pages=8 |date=2001 |pmid=11801179 |pmc=64605 |doi= 10.1186/1471-2105-2-8 |doi-access=free }} or among multiple sequences,{{cite journal |vauthors=Tseng HH, Weinberg Z, Gore J, Breaker RR, Ruzzo WL |title=Finding non-coding RNAs through genome-scale clustering |journal=J Bioinform Comput Biol |volume=7 |issue=2 |pages=373–388 |date=April 2009 |pmid=19340921 |pmc=3417115 |doi= 10.1142/s0219720009004126}} exploiting synteny via orthologous genes,{{cite journal |vauthors=Weinberg Z, Barrick JE, Yao Z, Roth A, Kim JN, Gore J, Wang JX, Lee ER, Block KF, Sudarsan N, Neph S, Tompa M, Ruzzo WL, Breaker RR |title=Identification of 22 candidate structured RNAs in bacteria using the CMfinder comparative genomics pipeline |journal=Nucleic Acids Res. |volume=35 |issue=14 |pages=4809–4819 |date=2007 |pmid=17621584 |pmc=1950547 |doi=10.1093/nar/gkm487 }}{{cite journal |vauthors=Hammond MC, Wachter A, Breaker RR |title=A plant 5S ribosomal RNA mimic regulates alternative splicing of transcription factor IIIA pre-mRNAs |journal=Nat. Struct. Mol. Biol. |volume=16 |issue=5 |pages=541–549 |date=May 2009 |pmid=19377483 |pmc=2680232 |doi=10.1038/nsmb.1588 }} and using locality-sensitive hashing in combination with sequence and structural features.{{cite journal |vauthors=Heyne S, Costa F, Rose D, Backofen R |title=GraphClust: alignment-free structural clustering of local RNA secondary structures |journal=Bioinformatics |volume=28 |issue=12 |pages=i224–32 |date=June 2012 |pmid=22689765 |pmc=3371856 |doi=10.1093/bioinformatics/bts224 }}
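Locality-sensitive hashing groups likely homologs without comparing every pair of sequences. The sketch below is a simplified, sequence-only illustration using MinHash over k-mers with banding; GraphClust itself hashes combined sequence and structure features, which this sketch does not attempt. The function names, k-mer size, number of hash functions, and band count are all hypothetical choices.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def kmer_set(seq: str, k: int) -> set[str]:
    """Return the set of overlapping k-mers (shingles) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_signature(seq: str, k: int = 4, num_hashes: int = 64) -> list[int]:
    """For each seeded hash function, keep the minimum hash value over all k-mers."""
    shingles = kmer_set(seq.upper().replace("T", "U"), k)
    if not shingles:
        raise ValueError("sequence shorter than k")
    signature = []
    for seed in range(num_hashes):
        # Salting the input with the seed gives a different hash function per seed.
        signature.append(min(
            int.from_bytes(hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(), "big")
            for s in shingles
        ))
    return signature


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of agreeing positions estimates the Jaccard similarity of the k-mer sets."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidate_pairs(signatures: dict[str, list[int]], bands: int = 16) -> set[tuple[str, str]]:
    """Sequences whose signatures agree on every row of at least one band become candidates."""
    buckets: dict[tuple, set[str]] = defaultdict(set)
    for name, sig in signatures.items():
        rows = len(sig) // bands
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(name)
    pairs: set[tuple[str, str]] = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs
```

Only the candidate pairs returned by `lsh_candidate_pairs` would then need to be aligned and examined for a shared structure, which is what makes the approach scale to large sequence collections.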
Mutations that change the nucleotide sequence but preserve secondary structure are called covariation, and they can provide evidence that the structure is conserved. Other statistics and probabilistic models can also be used to measure such conservation. The first ncRNA discovery method to use structural conservation was QRNA, which compared the probability of a pairwise alignment under an RNA model with its probability under a model in which only the primary sequence is conserved. Subsequent work extended this approach to more than two sequences and incorporated phylogenetic models, e.g., in EvoFold.{{cite journal |vauthors=Pedersen JS, Bejerano G, Siepel A, Rosenbloom K, Lindblad-Toh K, Lander ES, Kent J, Miller W, Haussler D |title=Identification and classification of conserved RNA secondary structures in the human genome |journal=PLOS Comput. Biol. |volume=2 |issue=4 |pages=e33 |date=April 2006 |pmid=16628248 |pmc=1440920 |doi=10.1371/journal.pcbi.0020033 |bibcode=2006PLSCB...2...33P |doi-access=free }} The approach taken in RNAz{{cite journal |vauthors=Washietl S, Hofacker IL, Stadler PF |title=Fast and reliable prediction of noncoding RNAs |journal=Proc. Natl. Acad. Sci. U.S.A. |volume=102 |issue=7 |pages=2454–2459 |date=February 2005 |pmid=15665081 |pmc=548974 |doi=10.1073/pnas.0409169102 |doi-access=free }} computes statistics on an input multiple-sequence alignment. Some of these statistics relate to structural conservation, while others measure general properties of the alignment that could affect the expected ranges of the structural statistics. These statistics are combined using a support vector machine.
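To make covariation concrete, the sketch below compares two aligned sequences against a shared consensus structure (dot-bracket notation, assumed for simplicity) and classifies each base pair as unchanged, compensatory (both positions changed while pairing is preserved, the strongest evidence), a single consistent change such as G-C to G-U, or inconsistent (pairing lost, including gaps). It is a simple tally with hypothetical names, not QRNA's probabilistic model comparison or RNAz's support vector machine.

```python
from collections import Counter

# Watson-Crick pairs plus the G-U wobble pair; anything else (including gaps) breaks pairing.
PAIRS = frozenset({("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")})


def base_pairs(structure: str) -> list[tuple[int, int]]:
    """Return (i, j) pairs from a dot-bracket string; non-bracket characters are unpaired."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unmatched '(' in structure")
    return sorted(pairs)


def classify_pairs(seq_a: str, seq_b: str, structure: str) -> Counter:
    """Tally how each consensus base pair differs between two aligned sequences."""
    if not len(seq_a) == len(seq_b) == len(structure):
        raise ValueError("sequences and structure must be aligned to the same length")
    a = seq_a.upper().replace("T", "U")
    b = seq_b.upper().replace("T", "U")
    counts: Counter = Counter()
    for i, j in base_pairs(structure):
        pa, pb = (a[i], a[j]), (b[i], b[j])
        if pa not in PAIRS or pb not in PAIRS:
            counts["inconsistent"] += 1  # pairing lost in at least one sequence
        elif pa == pb:
            counts["unchanged"] += 1
        elif a[i] != b[i] and a[j] != b[j]:
            counts["compensatory"] += 1  # both sides changed, pair preserved
        else:
            counts["single_consistent"] += 1  # one side changed, pair preserved
    return counts
```

A raw tally like this does not account for how divergent the two sequences are overall, which is one reason methods such as RNAz combine structural statistics with general properties of the alignment.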
Other properties include the presence of a promoter from which the RNA is transcribed. In bacteria, ncRNAs are also often followed by a Rho-independent transcription terminator.
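A Rho-independent (intrinsic) terminator typically consists of a GC-rich hairpin followed by a run of uridines. The scanner below is a deliberately simple heuristic sketch of that pattern: it looks for perfectly paired stems (Watson-Crick or G-U), a short loop, a minimum fraction of G-C pairs, and a U-rich tail. The function name and all thresholds are hypothetical defaults; dedicated terminator predictors use more elaborate scoring.

```python
PAIRS = frozenset({("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")})
GC_PAIRS = frozenset({("G", "C"), ("C", "G")})


def find_terminator_candidates(
    seq: str,
    min_stem: int = 4,
    max_stem: int = 10,
    min_loop: int = 3,
    max_loop: int = 8,
    min_gc_fraction: float = 0.5,
    tail_len: int = 8,
    min_u: int = 5,
) -> list[tuple[int, int, int, int]]:
    """Return (start, end, stem_len, loop_len) for each hairpin followed by a U-rich tail.

    end is exclusive, so the hairpin occupies seq[start:end] and the tail seq[end:end + tail_len].
    """
    s = seq.upper().replace("T", "U")  # accept DNA input
    hits = []
    for start in range(len(s)):
        for stem in range(min_stem, max_stem + 1):
            for loop in range(min_loop, max_loop + 1):
                end = start + 2 * stem + loop
                if end + tail_len > len(s):
                    continue  # not enough sequence left for the tail
                # Position start+k pairs with end-1-k (outermost pair first).
                pairs = [(s[start + k], s[end - 1 - k]) for k in range(stem)]
                if not all(p in PAIRS for p in pairs):
                    continue
                gc_fraction = sum(p in GC_PAIRS for p in pairs) / stem
                tail = s[end:end + tail_len]
                if gc_fraction >= min_gc_fraction and tail.count("U") >= min_u:
                    hits.append((start, end, stem, loop))
    return hits
```

The scanner reports nested and overlapping hairpins separately and examines only one strand; a practical tool would merge overlapping hits and also scan the reverse complement.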
Using a combination of these approaches, multiple studies have enumerated candidate RNAs, e.g.,
Some studies have gone on to analyze the predictions manually to obtain detailed structural and functional predictions.{{cite journal |vauthors=Weinberg Z, Wang JX, Bogue J, Yang J, Corbino K, Moy RH, Breaker RR |title=Comparative genomics reveals 104 candidate structured RNAs from bacteria, archaea, and their metagenomes |journal=Genome Biol. |volume=11 |issue=3 |pages=R31 |date=2010 |pmid=20230605 |pmc=2864571 |doi=10.1186/gb-2010-11-3-r31 |doi-access=free }}{{cite journal |vauthors=Weinberg Z, Lünse CE, Corbino KA, Ames TD, Nelson JW, Roth A, Perkins KR, Sherlock ME, Breaker RR |title=Detection of 224 candidate structured RNAs by comparative analysis of specific subsets of intergenic regions |journal=Nucleic Acids Res. |volume=45 |issue=18 |pages=10811–10823 |date=October 2017 |pmid=28977401 |pmc=5737381 |doi=10.1093/nar/gkx699 }}