Short Oligonucleotide Analysis Package
SOAP (Short Oligonucleotide Analysis Package) is a suite of bioinformatics software tools from the BGI Bioinformatics department for the assembly, alignment, and analysis of next-generation DNA sequencing data. It is particularly suited to short-read sequencing data.
All programs in the SOAP package may be used free of charge and are distributed under the GPL open source software license.
Functionality
The SOAP suite of tools can be used to perform the following sequence analysis tasks:
= Sequence Alignment =
SOAPaligner (SOAP2) is specifically designed for fast alignment of short reads and performs favorably with respect to similar alignment tools such as Bowtie and MAQ.{{cite journal|last1=Li|first1=R.|last2=Yu|first2=C.|last3=Li|first3=Y.|last4=Lam|first4=T.-W.|last5=Yiu|first5=S.-M.|last6=Kristiansen|first6=K.|last7=Wang|first7=J.|title=SOAP2: an improved ultrafast tool for short read alignment|journal=Bioinformatics|volume=25|issue=15|year=2009|pages=1966–1967|issn=1367-4803|doi=10.1093/bioinformatics/btp336|pmid=19497933|doi-access=}}
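SOAP2's internal algorithm is not described here. As an illustration of one widely used technique for fast short-read alignment, the following minimal Python sketch builds a toy FM-index (a suffix array plus a Burrows–Wheeler transform) of a reference sequence and finds exact matches of a read by backward search. It is an assumption-based example, not SOAP2's implementation: production aligners build compressed, sampled index structures far more efficiently and also tolerate mismatches, which this sketch does not.

```python
def suffix_array(text):
    # Naive construction by sorting suffixes; adequate only for toy inputs.
    return sorted(range(len(text)), key=lambda i: text[i:])


class FMIndex:
    """Toy FM-index over a DNA reference, supporting exact-match search."""

    def __init__(self, reference):
        self.text = reference + "$"  # '$' sorts before A, C, G and T
        self.sa = suffix_array(self.text)
        # BWT: the character preceding each suffix in sorted order
        # (text[-1] == '$' covers the suffix starting at position 0).
        self.bwt = "".join(self.text[i - 1] for i in self.sa)
        # C[c]: number of characters in the text smaller than c.
        self.C = {}
        total = 0
        for c in sorted(set(self.text)):
            self.C[c] = total
            total += self.text.count(c)
        # occ[c][i]: number of occurrences of c in bwt[:i].
        self.occ = {c: [0] * (len(self.bwt) + 1) for c in self.C}
        for i, ch in enumerate(self.bwt):
            for c in self.C:
                self.occ[c][i + 1] = self.occ[c][i] + (ch == c)

    def find(self, pattern):
        """Return sorted 0-based reference positions of exact matches."""
        lo, hi = 0, len(self.bwt)  # current suffix-array interval [lo, hi)
        for c in reversed(pattern):  # backward search: last base first
            if c not in self.C:
                return []
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return []
        return sorted(self.sa[lo:hi])


index = FMIndex("ACGTACGTGA")
print(index.find("ACG"))  # [0, 4]
print(index.find("TGA"))  # [7]
```

Each step of the search narrows an interval of sorted suffixes using only two table lookups per base, so the cost of a query depends on the read length rather than on the size of the reference.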
= Genome Assembly =
SOAPdenovo is a short-read de novo assembler based on de Bruijn graph construction. It is optimized for short reads such as those generated by Illumina sequencers and is capable of assembling large genomes such as the human genome.{{cite journal|last1=Li|first1=R.|last2=Zhu|first2=H.|last3=Ruan|first3=J.|last4=Qian|first4=W.|last5=Fang|first5=X.|last6=Shi|first6=Z.|last7=Li|first7=Y.|last8=Li|first8=S.|last9=Shan|first9=G.|last10=Kristiansen|first10=K.|last11=Li|first11=S.|last12=Yang|first12=H.|last13=Wang|first13=J.|last14=Wang|first14=J.|title=De novo assembly of human genomes with massively parallel short read sequencing|journal=Genome Research|volume=20|issue=2|year=2010|pages=265–272|issn=1088-9051|doi=10.1101/gr.097261.109|pmid=20019144|pmc=2813482}} SOAPdenovo was used to assemble the genome of the giant panda.{{cite journal|last1=Li|first1=Ruiqiang|last2=Fan|first2=Wei|last3=Tian|first3=Geng|last4=Zhu|first4=Hongmei|last5=He|first5=Lin|last6=Cai|first6=Jing|last7=Huang|first7=Quanfei|last8=Cai|first8=Qingle|last9=Li|first9=Bo|last10=Bai|first10=Yinqi|last11=Zhang|first11=Zhihe|last12=Zhang|first12=Yaping|last13=Wang|first13=Wen|last14=Li|first14=Jun|last15=Wei|first15=Fuwen|last16=Li|first16=Heng|last17=Jian|first17=Min|last18=Li|first18=Jianwen|last19=Zhang|first19=Zhaolei|last20=Nielsen|first20=Rasmus|last21=Li|first21=Dawei|last22=Gu|first22=Wanjun|last23=Yang|first23=Zhentao|last24=Xuan|first24=Zhaoling|last25=Ryder|first25=Oliver A.|last26=Leung|first26=Frederick Chi-Ching|last27=Zhou|first27=Yan|last28=Cao|first28=Jianjun|last29=Sun|first29=Xiao|last30=Fu|first30=Yonggui|last31=Fang|first31=Xiaodong|last32=Guo|first32=Xiaosen|last33=Wang|first33=Bo|last34=Hou|first34=Rong|last35=Shen|first35=Fujun|last36=Mu|first36=Bo|last37=Ni|first37=Peixiang|last38=Lin|first38=Runmao|last39=Qian|first39=Wubin|last40=Wang|first40=Guodong|last41=Yu|first41=Chang|last42=Nie|first42=Wenhui|last43=Wang|first43=Jinhuan|last44=Wu|first44=Zhigang|last45=Liang|first45=Huiqing|last46=Min|first46=Jiumeng|last47=Wu|first47=Qi|last48=Cheng|first48=Shifeng|last49=Ruan|first49=Jue|last50=Wang|first50=Mingwei|last51=Shi|first51=Zhongbin|last52=Wen|first52=Ming|last53=Liu|first53=Binghang|last54=Ren|first54=Xiaoli|last55=Zheng|first55=Huisong|last56=Dong|first56=Dong|last57=Cook|first57=Kathleen|last58=Shan|first58=Gao|last59=Zhang|first59=Hao|last60=Kosiol|first60=Carolin|last61=Xie|first61=Xueying|last62=Lu|first62=Zuhong|last63=Zheng|first63=Hancheng|last64=Li|first64=Yingrui|last65=Steiner|first65=Cynthia C.|last66=Lam|first66=Tommy Tsan-Yuk|last67=Lin|first67=Siyuan|last68=Zhang|first68=Qinghui|last69=Li|first69=Guoqing|last70=Tian|first70=Jing|last71=Gong|first71=Timing|last72=Liu|first72=Hongde|last73=Zhang|first73=Dejin|last74=Fang|first74=Lin|last75=Ye|first75=Chen|last76=Zhang|first76=Juanbin|last77=Hu|first77=Wenbo|last78=Xu|first78=Anlong|last79=Ren|first79=Yuanyuan|last80=Zhang|first80=Guojie|last81=Bruford|first81=Michael W.|last82=Li|first82=Qibin|last83=Ma|first83=Lijia|last84=Guo|first84=Yiran|last85=An|first85=Na|last86=Hu|first86=Yujie|last87=Zheng|first87=Yang|last88=Shi|first88=Yongyong|last89=Li|first89=Zhiqiang|last90=Liu|first90=Qing|last91=Chen|first91=Yanling|last92=Zhao|first92=Jing|last93=Qu|first93=Ning|last94=Zhao|first94=Shancen|last95=Tian|first95=Feng|last96=Wang|first96=Xiaoling|last97=Wang|first97=Haiyin|last98=Xu|first98=Lizhi|last99=Liu|first99=Xiao|display-authors=29|last100=Vinar|first100=Tomas|last101=Wang|first101=Yajun|last102=Lam|first102=Tak-Wah|last103=Yiu|first103=Siu-Ming|last104=Liu|first104=Shiping|last105=Zhang|first105=Hemin|last106=Li|first106=Desheng|last107=Huang|first107=Yan|last108=Wang|first108=Xia|last109=Yang|first109=Guohua|last110=Jiang|first110=Zhi|last111=Wang|first111=Junyi|last112=Qin|first112=Nan|last113=Li|first113=Li|last114=Li|first114=Jingxiang|last115=Bolund|first115=Lars|last116=Kristiansen|first116=Karsten|last117=Wong|first117=Gane Ka-Shu|last118=Olson|first118=Maynard|last119=Zhang|first119=Xiuqing|last120=Li|first120=Songgang|last121=Yang|first121=Huanming|last122=Wang|first122=Jian|last123=Wang|first123=Jun|title=The sequence and de novo assembly of the giant panda genome|journal=Nature|volume=463|issue=7279|year=2010|pages=311–317|issn=0028-0836|doi=10.1038/nature08696|pmid=20010809|pmc=3951497|bibcode=2010Natur.463..311L }} SOAPdenovo was later succeeded by SOAPdenovo2, which was optimized for large genomes and included the widely used GapCloser module.{{Cite journal|last1=Luo|first1=Ruibang|last2=Liu|first2=Binghang|last3=Xie|first3=Yinlong|last4=Li|first4=Zhenyu|last5=Huang|first5=Weihua|last6=Yuan|first6=Jianying|last7=He|first7=Guangzhu|last8=Chen|first8=Yanxiang|last9=Pan|first9=Qi|last10=Liu|first10=Yunjie|last11=Tang|first11=Jingbo|date=2012-12-01|title=SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler|journal=GigaScience|language=en|volume=1|issue=1|pages=18|doi=10.1186/2047-217X-1-18|pmc=3626529|pmid=23587118 |doi-access=free }}
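The idea behind de Bruijn graph assembly can be illustrated with a minimal Python sketch: every distinct k-mer found in the reads becomes an edge from its (k−1)-mer prefix to its (k−1)-mer suffix, and contigs are spelled by walking paths through the graph. This is a generic toy example, not SOAPdenovo's code; the function names are hypothetical, and it ignores reverse complements, sequencing errors, k-mer coverage, and incoming branches, all of which a real assembler must handle.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: each distinct k-mer is an edge from its
    (k-1)-mer prefix to its (k-1)-mer suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted for deterministic output
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def walk_unbranched(graph, start):
    """Spell a contig by following edges from `start` while the current node
    has exactly one outgoing edge; stops at branches, dead ends and cycles.
    Incoming branches are not checked in this simplified walk."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in seen:
            break
        seen.add(node)
        contig += node[-1]  # each step extends the contig by one base
    return contig


reads = ["ATGGCGT", "GCGTGCA"]  # two overlapping toy reads
graph = build_de_bruijn(reads, k=4)
print(walk_unbranched(graph, "ATG"))  # ATGGCGTGCA
```

Because the shared k-mer GCGT is stored only once, the two reads merge into a single path and the walk reconstructs the underlying sequence without any pairwise read-to-read comparison.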
= Transcriptome Assembly =
SOAPdenovo-Trans is a de novo transcriptome assembler designed specifically for RNA-Seq data; it was created for the 1000 Plant Genomes project.{{Cite journal|last1=Xie|first1=Yinlong|last2=Wu|first2=Gengxiong|last3=Tang|first3=Jingbo|last4=Luo|first4=Ruibang|last5=Patterson|first5=Jordan|last6=Liu|first6=Shanlin|last7=Huang|first7=Weihua|last8=He|first8=Guangzhu|last9=Gu|first9=Shengchang|last10=Li|first10=Shengkang|last11=Zhou|first11=Xin|date=2014-06-15|title=SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads|url=https://academic.oup.com/bioinformatics/article/30/12/1660/380938|journal=Bioinformatics|language=en|volume=30|issue=12|pages=1660–1666|doi=10.1093/bioinformatics/btu077|pmid=24532719|issn=1367-4803|doi-access=free|arxiv=1305.6760}}
= Indel Discovery =
SOAPindel is a tool for finding insertions and deletions (indels) in next-generation paired-end sequencing data; it reports a list of candidate indels with quality scores.{{Cite journal|last1=Li|first1=Shengting|last2=Li|first2=Ruiqiang|last3=Li|first3=Heng|last4=Lu|first4=Jianliang|last5=Li|first5=Yingrui|last6=Bolund|first6=Lars|last7=Schierup|first7=Mikkel H.|last8=Wang|first8=Jun|date=2013-01-01|title=SOAPindel: Efficient identification of indels from short paired reads|journal=Genome Research|language=en|volume=23|issue=1|pages=195–200|doi=10.1101/gr.132480.111|issn=1088-9051|pmid=22972939|pmc=3530679|doi-access=free}}
= SNP Discovery =
SOAPsnp is a consensus sequence builder. It uses the alignments produced by SOAPaligner to build a consensus sequence, from which SNPs can be called for a newly sequenced individual.
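The basic relationship between a consensus sequence and SNP calls can be shown with a deliberately simplified Python sketch: at each reference position, the most common base among the aligned reads becomes the consensus base, and any well-supported consensus base that differs from the reference is reported as a candidate SNP. This majority-vote rule is a swapped-in simplification, not SOAPsnp's method; real SNP callers, SOAPsnp included, weigh base qualities and consider diploid genotypes statistically. The pileup input format, function name, and thresholds are hypothetical.

```python
from collections import Counter


def call_consensus(reference, pileup, min_depth=3, min_fraction=0.8):
    """Majority-vote consensus and candidate SNPs from a pileup.

    pileup maps a 0-based reference position to the list of bases observed
    in aligned reads at that position (hypothetical input format).
    Positions with too little or too ambiguous support are set to 'N'.
    """
    consensus, snps = [], []
    for pos, ref_base in enumerate(reference):
        bases = pileup.get(pos, [])
        if len(bases) < min_depth:
            consensus.append("N")  # insufficient coverage
            continue
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) < min_fraction:
            consensus.append("N")  # reads disagree too much
            continue
        consensus.append(base)
        if base != ref_base:
            snps.append((pos, ref_base, base))  # (position, ref, alt)
    return "".join(consensus), snps


reference = "ACGT"
pileup = {0: list("AAAA"), 1: list("CCT"), 2: list("AAAAG"), 3: list("TT")}
consensus, snps = call_consensus(reference, pileup)
print(consensus, snps)  # ANAN [(2, 'G', 'A')]
```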
= Structural Variation Discovery =
SOAPsv is a tool for finding structural variations by whole-genome de novo assembly.{{Cite journal|last1=Li|first1=Yingrui|last2=Zheng|first2=Hancheng|last3=Luo|first3=Ruibang|last4=Wu|first4=Honglong|last5=Zhu|first5=Hongmei|last6=Li|first6=Ruiqiang|last7=Cao|first7=Hongzhi|last8=Wu|first8=Boxin|last9=Huang|first9=Shujia|last10=Shao|first10=Haojing|last11=Ma|first11=Hanzhou|date=August 2011|title=Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly|journal=Nature Biotechnology|language=en|volume=29|issue=8|pages=723–730|doi=10.1038/nbt.1904|pmid=21785424|issn=1546-1696|doi-access=free}}
= Quality control and preprocessing =
SOAPnuke is a tool for integrated quality control and preprocessing of datasets from genomic, small RNA, Digital Gene Expression, and metagenomic experiments.{{Cite journal|last1=Chen|first1=Yuxin|last2=Chen|first2=Yongsheng|last3=Shi|first3=Chunmei|last4=Huang|first4=Zhibo|last5=Zhang|first5=Yong|last6=Li|first6=Shengkang|last7=Li|first7=Yan|last8=Ye|first8=Jia|last9=Yu|first9=Chang|last10=Li|first10=Zhuo|last11=Zhang|first11=Xiuqing|date=2018-01-01|title=SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data|url= |journal=GigaScience|language=en|volume=7|issue=1|pages=1–6|doi=10.1093/gigascience/gix120|pmc=5788068|pmid=29220494}}
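SOAPnuke's exact filtering criteria and its MapReduce acceleration are not described here. As a generic illustration of read-level quality control, the minimal Python sketch below decodes Phred+33 quality strings (the encoding used in current Illumina FASTQ files) and discards reads that contain too many undetermined (N) bases or too many low-quality bases. The function names and all thresholds are hypothetical choices for the example, not SOAPnuke options or defaults.

```python
def phred33(quality):
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in quality]


def passes_filter(seq, quality, max_n_fraction=0.1, low_q=5,
                  max_low_q_fraction=0.5):
    """Return True if a read should be kept (hypothetical thresholds)."""
    if not seq or len(seq) != len(quality):
        return False  # empty or malformed record
    if seq.upper().count("N") / len(seq) > max_n_fraction:
        return False  # too many undetermined bases
    low = sum(1 for q in phred33(quality) if q <= low_q)
    return low / len(seq) <= max_low_q_fraction


reads = [
    ("r1", "ACGTACGTAC", "IIIIIIIIII"),  # all bases Q40: kept
    ("r2", "ACGTNNNNAC", "IIIIIIIIII"),  # 40% N: dropped
    ("r3", "ACGTACGTAC", "######IIII"),  # 60% of bases at Q2: dropped
]
kept = [name for name, seq, qual in reads if passes_filter(seq, qual)]
print(kept)  # ['r1']
```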
History
= SOAP v1 =
The first release of SOAP consisted only of the sequence alignment tool SOAPaligner.{{cite journal|last1=Li|first1=R.|last2=Li|first2=Y.|last3=Kristiansen|first3=K.|last4=Wang|first4=J.|title=SOAP: short oligonucleotide alignment program|journal=Bioinformatics|volume=24|issue=5|year=2008|pages=713–714|issn=1367-4803|doi=10.1093/bioinformatics/btn025|pmid=18227114|doi-access=free}}
= SOAP v2 =
SOAP v2 substantially improved the performance of the SOAPaligner tool over SOAP v1: alignment time was reduced by a factor of 20–30 and memory usage by a factor of 3. Support for compressed file formats was also added.
The SOAP suite was then expanded to include new tools: SOAPdenovo (versions 1 and 2), SOAPindel, SOAPsnp, and SOAPsv.
= SOAP v3 =
SOAP v3 extended the alignment tool to run on graphics processing units (GPUs), making it the first short-read alignment tool to do so.{{cite journal|last1=Liu|first1=C.-M.|last2=Wong|first2=T.|last3=Wu|first3=E.|last4=Luo|first4=R.|last5=Yiu|first5=S.-M.|last6=Li|first6=Y.|last7=Wang|first7=B.|last8=Yu|first8=C.|last9=Chu|first9=X.|last10=Zhao|first10=K.|last11=Li|first11=R.|last12=Lam|first12=T.-W.|title=SOAP3: ultra-fast GPU-based parallel alignment tool for short reads|journal=Bioinformatics|volume=28|issue=6|year=2012|pages=878–879|issn=1367-4803|doi=10.1093/bioinformatics/bts061|pmid=22285832|doi-access=free}} As a result of these improvements, the SOAP3 aligner was significantly faster than the competing aligners Bowtie and BWA.
External links
- http://soap.genomics.org.cn {{Webarchive|url=https://web.archive.org/web/20181224221234/http://soap.genomics.org.cn/ |date=2018-12-24 }}
- http://soap.genomics.org.cn/soap1
- http://bioinformatics.genomics.org.cn
- http://seqanswers.com/forums/showthread.php?t=43