Single-nucleotide polymorphism

{{short description|Single nucleotide in genomic DNA at which different sequence alternatives exist}}


File:dna-SNP.svg

In genetics and bioinformatics, a single-nucleotide polymorphism (SNP {{IPAc-en|s|n|ɪ|p}}; plural SNPs {{IPAc-en|s|n|ɪ|p|s}}) is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in a sufficiently large fraction of the population (e.g. 1% or more),{{Cite web|title = single-nucleotide polymorphism / SNP {{!}} Learn Science at Scitable|url = http://www.nature.com/scitable/definition/single-nucleotide-polymorphism-snp-295|website = www.nature.com|access-date = 2015-11-13|url-status = live|archive-url = https://web.archive.org/web/20151110112814/http://www.nature.com/scitable/definition/single-nucleotide-polymorphism-snp-295|archive-date = 2015-11-10}} many publications{{cite journal |title=dbSNP—Database for Single Nucleotide Polymorphisms and Other Classes of Minor Genetic Variation |year=1999|pmid=10447503|last1=Sherry|first1=S. T.|last2=Ward|first2=M.|last3=Sirotkin|first3=K.|journal=Genome Research|volume=9|issue=8|pages=677–679|doi=10.1101/gr.9.8.677|s2cid=10775908|doi-access=free}}{{cite journal |pmid=11237011|title = Initial sequencing and analysis of the human genome|year = 2001|last1 = Lander|first1 = E. S.|last2 = Linton|first2 = L. M.|last3 = Birren|first3 = B.|last4 = Nusbaum|first4 = C.|last5 = Zody|first5 = M. C.|last6 = Baldwin|first6 = J.|last7 = Devon|first7 = K.|last8 = Dewar|first8 = K.|last9 = Doyle|first9 = M.|last10 = Fitzhugh|first10 = W.|last11 = Funke|first11 = R.|last12 = Gage|first12 = D.|last13 = Harris|first13 = K.|last14 = Heaford|first14 = A.|last15 = Howland|first15 = J.|last16 = Kann|first16 = L.|last17 = Lehoczky|first17 = J.|last18 = Levine|first18 = R.|last19 = McEwan|first19 = P.|last20 = McKernan|first20 = K.|last21 = Meldrim|first21 = J.|last22 = Mesirov|first22 = J. 
P.|last23 = Miranda|first23 = C.|last24 = Morris|first24 = W.|last25 = Naylor|first25 = J.|last26 = Raymond|first26 = C.|last27 = Rosetti|first27 = M.|last28 = Santos|first28 = R.|last29 = Sheridan|first29 = A.|last30 = Sougnez|first30 = C.|journal = Nature|volume = 409|issue = 6822|pages = 860–921|doi = 10.1038/35057062| bibcode=2001Natur.409..860L |display-authors = 1|doi-access = free|hdl = 2027.42/62798|hdl-access = free}}{{cite journal |pmid=26432245|title = A global reference for human genetic variation|journal = Nature|year = 2015|volume = 526|issue = 7571|pages = 68–74|doi = 10.1038/nature15393|pmc = 4750478|last1 = Auton|first1 = Adam|last2 = Abecasis|first2 = Gonçalo R.|last3 = Altshuler|first3 = David M.|last4 = Durbin|first4 = Richard M.|last5 = Abecasis|first5 = Gonçalo R.|last6 = Bentley|first6 = David R.|last7 = Chakravarti|first7 = Aravinda|last8 = Clark|first8 = Andrew G.|last9 = Donnelly|first9 = Peter|last10 = Eichler|first10 = Evan E.|last11 = Flicek|first11 = Paul|last12 = Gabriel|first12 = Stacey B.|last13 = Gibbs|first13 = Richard A.|last14 = Green|first14 = Eric D.|last15 = Hurles|first15 = Matthew E.|last16 = Knoppers|first16 = Bartha M.|last17 = Korbel|first17 = Jan O.|last18 = Lander|first18 = Eric S.|last19 = Lee|first19 = Charles|last20 = Lehrach|first20 = Hans|last21 = Mardis|first21 = Elaine R.|last22 = Marth|first22 = Gabor T.|last23 = McVean|first23 = Gil A.|last24 = Nickerson|first24 = Deborah A.|last25 = Schmidt|first25 = Jeanette P.|last26 = Sherry|first26 = Stephen T.|last27 = Wang|first27 = Jun|last28 = Wilson|first28 = Richard K.|last29 = Gibbs|first29 = Richard A.|last30 = Boerwinkle|first30 = Eric|bibcode = 2015Natur.526...68T|display-authors = 1}} do not apply such a frequency threshold.

For example, a G nucleotide present at a specific location in a reference genome may be replaced by an A in a minority of individuals. The two possible nucleotide variations of this SNP – G or A – are called alleles.{{cite journal|date=September 2017|title=ASPsiRNA: A Resource of ASP-siRNAs Having Therapeutic Potential for Human Genetic Disorders and Algorithm for Prediction of Their Inhibitory Efficacy|journal=G3|volume=7|issue=9|pages=2931–2943|doi=10.1534/g3.117.044024|pmid=28696921|doi-access=free|last1=Monga|first1=Isha|last2=Qureshi|first2=Abid|last3=Thakur|first3=Nishant|last4=Gupta|first4=Amit Kumar|last5=Kumar|first5=Manoj|pmc=5592921}}

SNPs can help explain differences in susceptibility to a wide range of diseases across a population. For example, a common SNP in the CFH gene is associated with increased risk of age-related macular degeneration.{{Cite journal |last1=Calippe |first1=Bertrand |last2=Guillonneau |first2=Xavier |last3=Sennlaub |first3=Florian |date=March 2014 |title=Complement factor H and related proteins in age-related macular degeneration |url=https://comptes-rendus.academie-sciences.fr/biologies/articles/10.1016/j.crvi.2013.12.003/|journal=Comptes Rendus Biologies |volume=337 |issue=3 |pages=178–184 |doi=10.1016/j.crvi.2013.12.003 |issn=1631-0691 |pmid=24702844}} Differences in the severity of an illness or in the response to treatment may also reflect genetic variation caused by SNPs. For example, two common SNPs in the APOE gene, rs429358 and rs7412, give rise to three major APOE alleles with different associated risks of developing Alzheimer's disease and different ages at onset of the disease.{{Cite journal|last1=Husain|first1=Mohammed Amir|last2=Laurent|first2=Benoit|last3=Plourde|first3=Mélanie|date=2021-02-17|title=APOE and Alzheimer's Disease: From Lipid Transport to Physiopathology and Therapeutics|journal=Frontiers in Neuroscience|volume=15|page=630502|doi=10.3389/fnins.2021.630502|pmid=33679311|pmc=7925634|issn=1662-453X|doi-access=free}}
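To illustrate how two SNPs combine into the three major APOE alleles, the minimal sketch below maps the alleles carried on a single chromosome (a phased haplotype) at rs429358 and rs7412 to ε2, ε3 or ε4. It assumes the commonly reported correspondence ε2 = T/T, ε3 = T/C and ε4 = C/C at (rs429358, rs7412); the helper names `apoe_allele` and `apoe_genotype` are hypothetical, and any other combination is returned as `None`.

```python
from typing import Optional, Tuple

# Assumed (commonly reported) mapping from the alleles carried on ONE chromosome
# at (rs429358, rs7412) to the three major APOE alleles.
APOE_HAPLOTYPES = {
    ("T", "T"): "ε2",
    ("T", "C"): "ε3",
    ("C", "C"): "ε4",
}


def apoe_allele(rs429358: str, rs7412: str) -> Optional[str]:
    """Hypothetical helper: return the major APOE allele for a phased haplotype.

    Returns None for any combination outside the three major alleles.
    """
    return APOE_HAPLOTYPES.get((rs429358.upper(), rs7412.upper()))


def apoe_genotype(hap1: Tuple[str, str], hap2: Tuple[str, str]) -> Optional[str]:
    """Combine two phased haplotypes into a genotype label such as 'ε3/ε4'."""
    alleles = [apoe_allele(*hap1), apoe_allele(*hap2)]
    if None in alleles:
        return None
    return "/".join(sorted(alleles))


print(apoe_genotype(("T", "C"), ("C", "C")))  # ε3/ε4
```

The sketch takes phased haplotypes because unphased genotype calls are not always sufficient: a person heterozygous at both SNPs could carry either ε2/ε4 or the rare ε1/ε3 combination.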

Single nucleotide substitutions with an allele frequency of less than 1% are sometimes called single-nucleotide variants (SNVs).{{Cite web |date=2012-07-20 |title=Definition of single nucleotide variant - NCI Dictionary of Genetics Terms |url=https://www.cancer.gov/publications/dictionaries/genetics-dictionary/def/single-nucleotide-variant |access-date=2023-05-02 |website=www.cancer.gov |language=en}} "Variant" may also be used as a general term for any single nucleotide change in a DNA sequence,{{citation |last=Wright |first=Alan F |title=eLS |chapter=Genetic Variation: Polymorphisms and Mutations |date=September 23, 2005 |publisher=Wiley |doi=10.1038/npg.els.0005005|isbn=9780470016176 |s2cid=82415195 |doi-access=free }} encompassing both common SNPs and rare mutations, whether germline or somatic.{{cite journal |title=SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors |year=2010|pmid=20130035|last1=Goya|first1=R.|last2=Sun|first2=M. G.|last3=Morin|first3=R. D.|last4=Leung|first4=G.|last5=Ha|first5=G.|last6=Wiegand|first6=K. C.|last7=Senz|first7=J.|last8=Crisan|first8=A.|last9=Marra|first9=M. A.|last10=Hirst|first10=M.|last11=Huntsman|first11=D.|last12=Murphy|first12=K. P.|last13=Aparicio|first13=S.|last14=Shah|first14=S. 
P.|journal=Bioinformatics|volume=26|issue=6|pages=730–736|doi=10.1093/bioinformatics/btq040|pmc=2832826}}{{Cite journal|last1=Katsonis|first1=Panagiotis|last2=Koire|first2=Amanda|last3=Wilson|first3=Stephen Joseph|last4=Hsu|first4=Teng-Kuei|last5=Lua|first5=Rhonald C.|last6=Wilkins|first6=Angela Dawn|last7=Lichtarge|first7=Olivier|date=2014-10-20|title=Single nucleotide variations: Biological impact and theoretical interpretation|url=|journal=Protein Science|volume=23|issue=12|pages=1650–1666|doi=10.1002/pro.2552|pmid=25234433|pmc=4253807|issn=0961-8368}} The term SNV has therefore been used to refer to point mutations found in cancer cells.{{Cite journal|last1=Khurana|first1=Ekta|last2=Fu|first2=Yao|last3=Chakravarty|first3=Dimple|last4=Demichelis|first4=Francesca|last5=Rubin|first5=Mark A.|last6=Gerstein|first6=Mark|date=2016-01-19|title=Role of non-coding sequence variants in cancer|url=|journal=Nature Reviews Genetics|volume=17|issue=2|pages=93–108|doi=10.1038/nrg.2015.17|pmid=26781813|s2cid=14433306|issn=1471-0056}} DNA variants must also be taken into account in molecular diagnostics, for example when designing PCR primers to detect viruses, because the viral RNA or DNA in a sample may contain SNVs.{{Citation needed|date=May 2023}} However, this nomenclature relies on arbitrary distinctions (such as an allele frequency of 1%) and is not used consistently across fields; the resulting disagreement has prompted calls for a more consistent framework for naming differences in DNA sequences between two samples.{{cite journal |last1=Karki |first1=Roshan |last2=Pandya |first2=Deep |last3=Elston |first3=Robert C. 
|last4=Ferlini |first4=Cristiano |date=July 15, 2015 |title=Defining "mutation" and "polymorphism" in the era of personal genomics |journal=BMC Medical Genomics |publisher=Springer Science and Business Media LLC |volume=8 |issue=1 |page=37 |doi=10.1186/s12920-015-0115-z |pmid=26173390 |pmc=4502642 |issn=1755-8794 |doi-access=free }}{{cite web |last=Li |first=Heng |date=March 15, 2021 |title=SNP vs SNV |url=http://lh3.github.io/2021/03/15/snp-vs-snv |access-date=May 3, 2023 |website=Heng Li's blog}}

Types

File:Types of SNP new1.png

Single-nucleotide polymorphisms may fall within coding sequences of genes, non-coding regions of genes, or in the intergenic regions (regions between genes). SNPs within a coding sequence do not necessarily change the amino acid sequence of the protein that is produced, due to degeneracy of the genetic code.{{Cite journal |last1=Spencer |first1=Paige S. |last2=Barral |first2=José M. |date=2012 |title=Genetic code redundancy and its influence on the encoded polypeptides |journal=Computational and Structural Biotechnology Journal |volume=1 |pages=e201204006 |doi=10.5936/csbj.201204006 |issn=2001-0370 |pmc=3962081 |pmid=24688635}}

SNPs in the coding region are of two types: synonymous SNPs and nonsynonymous SNPs. Synonymous SNPs do not affect the protein sequence, while nonsynonymous SNPs change the amino acid sequence of the protein.{{Cite journal |last1=Chu |first1=Duan |last2=Wei |first2=Lai |date=2019-04-16 |title=Nonsynonymous, synonymous and nonsense mutations in human cancer-related genes undergo stronger purifying selections than expectation |journal=BMC Cancer |volume=19 |issue=1 |pages=359 |doi=10.1186/s12885-019-5572-x |issn=1471-2407 |pmc=6469204 |pmid=30991970 |doi-access=free }}

  • SNPs in non-coding regions can be associated with a higher risk of cancer,{{cite journal|vauthors=Li G, Pan T, Guo D, Li LC|date=2014|title=Regulatory Variants and Disease: The E-Cadherin -160C/A SNP as an Example|journal=Molecular Biology International|volume=2014|pages=967565|doi=10.1155/2014/967565|pmc=4167656|pmid=25276428|doi-access=free}} and may affect mRNA structure and disease susceptibility.{{cite journal|vauthors=Lu YF, Mauger DM, Goldstein DB, Urban TJ, Weeks KM, Bradrick SS|date=November 2015|title=IFNL3 mRNA structure is remodeled by a functional non-coding polymorphism associated with hepatitis C virus clearance|journal=Scientific Reports|volume=5|pages=16037|bibcode=2015NatSR...516037L|doi=10.1038/srep16037|pmc=4631997|pmid=26531896}} Non-coding SNPs can also alter the expression level of a gene, acting as an eQTL (expression quantitative trait locus).
  • SNPs in coding regions:
    • synonymous substitutions by definition do not change the amino acid in the protein, but can still affect its function in other ways. For example, a seemingly silent mutation in the multidrug resistance gene 1 (MDR1), which codes for a cellular membrane pump that expels drugs from the cell, can slow down translation and allow the peptide chain to fold into an unusual conformation, making the mutant pump less functional. In MDR1, the C1236T polymorphism changes a GGC codon to GGT at amino acid position 412 of the polypeptide (both encode glycine), and the C3435T polymorphism changes ATC to ATT at position 1145 (both encode isoleucine).{{cite journal|vauthors=Kimchi-Sarfaty C, Oh JM, Kim IW, Sauna ZE, Calcagno AM, Ambudkar SV, Gottesman MM|date=January 2007|title=A "silent" polymorphism in the MDR1 gene changes substrate specificity|url=https://zenodo.org/record/1230874|journal=Science|volume=315|issue=5811|pages=525–8|bibcode=2007Sci...315..525K|doi=10.1126/science.1135308|pmid=17185560|s2cid=15146955|doi-access=free}}
    • nonsynonymous substitutions:
      • missense – a single base change results in a different amino acid in the protein, which may cause the protein to malfunction and lead to disease (e.g. the c.1580G>T SNP in the LMNA gene replaces guanine with thymine at nucleotide position 1580, changing the CGT codon to CTT; at the protein level this replaces arginine with leucine at position 527,{{cite journal|vauthors=Al-Haggar M, Madej-Pilarczyk A, Kozlowski L, Bujnicki JM, Yahia S, Abdel-Hadi D, Shams A, Ahmad N, Hamed S, Puzianowska-Kuznicka M|date=November 2012|title=A novel homozygous p.Arg527Leu LMNA mutation in two unrelated Egyptian families causes overlapping mandibuloacral dysplasia and progeria syndrome|journal=European Journal of Human Genetics|volume=20|issue=11|pages=1134–40|doi=10.1038/ejhg.2012.77|pmc=3476705|pmid=22549407}} and at the phenotype level it manifests as overlapping mandibuloacral dysplasia and progeria syndrome)
      • nonsense – a point mutation in a DNA sequence that results in a premature stop codon, or a nonsense codon in the transcribed mRNA, and in a truncated, incomplete, and usually nonfunctional protein product (e.g. cystic fibrosis caused by the G542X mutation in the cystic fibrosis transmembrane conductance regulator gene).{{cite journal|vauthors=Cordovado SK, Hendrix M, Greene CN, Mochal S, Earley MC, Farrell PM, Kharrazi M, Hannon WH, Mueller PW|date=February 2012|title=CFTR mutation analysis and haplotype associations in CF patients|journal=Molecular Genetics and Metabolism|volume=105|issue=2|pages=249–54|doi=10.1016/j.ymgme.2011.10.013|pmc=3551260|pmid=22137130}}
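The classification above can be sketched in code by translating the reference and alternate codons with the standard genetic code and comparing the encoded amino acids. This is a minimal illustration, assuming DNA coding-strand codons and the standard code (built here from the conventional TCAG codon ordering). The function name `classify_codon_change` is hypothetical, and the extra "stop-loss" label (a stop codon mutated into an amino acid codon) is added only so that case is not mislabelled as missense.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order (TTT, TTC, TTA, ...),
# with '*' marking stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-base codon change (hypothetical helper)."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be three bases long")
    if sum(r != a for r, a in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("codons must differ at exactly one base")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return "missense"


# Codon changes taken from the examples above, plus a hypothetical TGG>TGA.
for ref, alt in [("GGC", "GGT"), ("ATC", "ATT"), ("CGT", "CTT"), ("TGG", "TGA")]:
    print(ref, alt, classify_codon_change(ref, alt))
```

Running it labels the MDR1 codon changes as synonymous, the LMNA change as missense and TGG>TGA (tryptophan to stop) as nonsense. Classification from the codon alone cannot capture effects such as the altered translation speed described for MDR1.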

SNPs that are not in protein-coding regions may still affect gene splicing, transcription factor binding, messenger RNA degradation, or the sequence of noncoding RNA. A SNP of this type that affects gene expression is referred to as an eSNP (expression SNP) and may lie upstream or downstream of the gene.

Frequency

More than 600 million SNPs have been identified across the human genome in the world's population.{{Cite web |title=What are single nucleotide polymorphisms (SNPs)?: MedlinePlus Genetics |url=https://medlineplus.gov/genetics/understanding/genomicresearch/snp/ |access-date=2023-03-22 |website=medlineplus.gov |language=en}} A typical genome differs from the reference human genome at 4–5 million sites, most of which (more than 99.9%) consist of SNPs and short indels.{{cite journal | vauthors = Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR | title = A global reference for human genetic variation | journal = Nature | volume = 526 | issue = 7571 | pages = 68–74 | date = October 2015 | pmid = 26432245 | pmc = 4750478 | doi = 10.1038/nature15393 | bibcode = 2015Natur.526...68T }}

= Within a genome =

The genomic distribution of SNPs is not homogeneous; SNPs occur more frequently in non-coding regions than in coding regions or, more generally, than in regions where natural selection is acting and "fixing" the allele of the SNP that constitutes the most favorable genetic adaptation (eliminating other variants).{{cite journal | vauthors = Barreiro LB, Laval G, Quach H, Patin E, Quintana-Murci L | s2cid = 205357396 | title = Natural selection has driven population differentiation in modern humans | journal = Nature Genetics | volume = 40 | issue = 3 | pages = 340–5 | date = March 2008 | pmid = 18246066 | doi = 10.1038/ng.78 }} Other factors, such as genetic recombination and mutation rate, can also determine SNP density.{{cite journal | vauthors = Nachman MW | title = Single nucleotide polymorphisms and recombination rate in humans | journal = Trends in Genetics | volume = 17 | issue = 9 | pages = 481–5 | date = September 2001 | pmid = 11525814 | doi = 10.1016/S0168-9525(01)02409-X }}

SNP density can be predicted by the presence of microsatellites: AT microsatellites in particular are potent predictors of SNP density, with long (AT)n repeat tracts tending to be found in regions of significantly reduced SNP density and low GC content.{{cite journal | vauthors = Varela MA, Amos W | title = Heterogeneous distribution of SNPs in the human genome: microsatellites as predictors of nucleotide diversity and divergence | journal = Genomics | volume = 95 | issue = 3 | pages = 151–9 | date = March 2010 | pmid = 20026267 | doi = 10.1016/j.ygeno.2009.12.003 | doi-access = }}
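The two sequence features named above, GC content and the length of (AT)n tracts, are straightforward to compute for a window of sequence. The sketch below is a simplified illustration only, not the method used in the cited study: the helper names are hypothetical, and the regular expression counts repeats only in the AT phase (a tract written as TATA... is counted from its first A).

```python
import re


def gc_content(seq: str) -> float:
    """Fraction of G or C bases in a DNA sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_at_repeat(seq: str) -> int:
    """Number of AT units in the longest uninterrupted (AT)n tract."""
    tracts = re.findall(r"(?:AT)+", seq.upper())
    return max((len(tract) // 2 for tract in tracts), default=0)


window = "GGATATATATCC"  # hypothetical sequence window
print(longest_at_repeat(window), round(gc_content(window), 3))  # 4 0.333
```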

= Within a population =

Because allele frequencies vary between human populations, a SNP allele that is common in one geographical or ethnic group may be rarer in another. However, variants that are strongly restricted to a single population are uncommon; in a global sample of 67.3 million SNPs, the Human Genome Diversity Project "found no such private variants that are fixed in a given continent or major region. The highest frequencies are reached by a few tens of variants present at >70% (and a few thousands at >50%) in Africa, the Americas, and Oceania. By contrast, the highest frequency variants private to Europe, East Asia, the Middle East, or Central and South Asia reach just 10 to 30%."{{cite journal| author=Bergström A, McCarthy SA, Hui R, Almarri MA, Ayub Q, Danecek P | display-authors=etal| title=Insights into human genetic variation and population history from 929 diverse genomes. | journal=Science | year= 2020 | volume= 367 | issue= 6484 | pages= eaay5012| pmid=32193295 | doi=10.1126/science.aay5012 | pmc=7115999 }}

Within a population, SNPs can be assigned a minor allele frequency (MAF), the frequency of the less common allele at a locus in a particular population.{{cite journal | vauthors = Zhu Z, Yuan D, Luo D, Lu X, Huang S | title = Enrichment of Minor Alleles of Common SNPs and Improved Risk Prediction for Parkinson's Disease | journal = PLOS ONE | volume = 10 | issue = 7 | pages = e0133421 | date = 2015-07-24 | pmid = 26207627 | pmc = 4514478 | doi = 10.1371/journal.pone.0133421 | bibcode = 2015PLoSO..1033421Z | doi-access = free }} For a biallelic single-nucleotide polymorphism, this is simply the lesser of the two allele frequencies.
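As a worked example, the MAF of a biallelic SNP can be computed from diploid genotype counts: each heterozygote carries one copy of the alternate allele and each alternate homozygote carries two. The sketch below assumes diploid genotypes with no missing calls; the function names and the genotype counts are hypothetical. The 1% cut-off in `is_common` is the arbitrary threshold some definitions use to separate SNPs from rarer variants.

```python
def allele_frequencies(hom_ref: int, het: int, hom_alt: int) -> tuple[float, float]:
    """Return (reference, alternate) allele frequencies from diploid genotype counts."""
    n_alleles = 2 * (hom_ref + het + hom_alt)
    if n_alleles == 0:
        raise ValueError("no genotyped individuals")
    alt = (het + 2 * hom_alt) / n_alleles  # each heterozygote carries one alt allele
    return 1 - alt, alt


def minor_allele_frequency(hom_ref: int, het: int, hom_alt: int) -> float:
    """MAF of a biallelic SNP: the lesser of the two allele frequencies."""
    return min(allele_frequencies(hom_ref, het, hom_alt))


def is_common(maf: float, threshold: float = 0.01) -> bool:
    """Apply a frequency cut-off such as the 1% used in some SNP definitions."""
    return maf >= threshold


maf = minor_allele_frequency(hom_ref=90, het=9, hom_alt=1)  # 11 alt alleles out of 200
print(round(maf, 3), is_common(maf))  # 0.055 True
```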

Building on such allele-frequency data, scientists have developed new methods for analyzing population structure in less-studied species.{{Cite journal|last1=Hivert|first1=Valentin|last2=Leblois|first2=Raphaël|last3=Petit|first3=Eric J.|last4=Gautier|first4=Mathieu|last5=Vitalis|first5=Renaud|date=2018-07-30|title=Measuring Genetic Differentiation from Pool-seq Data|journal=Genetics|volume=210|issue=1|pages=315–330|doi=10.1534/genetics.118.300900|pmid=30061425|pmc=6116966|issn=0016-6731|doi-access=free}}{{Cite journal|last1=Ekblom|first1=R|last2=Galindo|first2=J|date=2010-12-08|title=Applications of next generation sequencing in molecular ecology of non-model organisms|journal=Heredity|volume=107|issue=1|pages=1–15|doi=10.1038/hdy.2010.152|pmid=21139633|pmc=3186121|issn=0018-067X|doi-access=free}}{{Cite journal|last=Ellegren|first=Hans|date=January 2014|title=Genome sequencing and population genomics in non-model organisms|url=|journal=Trends in Ecology & Evolution|volume=29|issue=1|pages=51–63|doi=10.1016/j.tree.2013.09.008|pmid=24139972|bibcode=2014TEcoE..29...51E |issn=0169-5347}} Pooling techniques significantly lower the cost of the analysis.{{citation needed|date=October 2020}} These techniques sequence DNA from many individuals of a population in a single pooled sample instead of sequencing each individual separately. New bioinformatics tools make it possible to investigate population structure, gene flow, and migration from the allele frequencies observed across the whole population. These protocols can also combine the advantages of SNPs with those of microsatellite markers.{{Cite journal|last1=Dorant|first1=Yann|last2=Benestan|first2=Laura|last3=Rougemont|first3=Quentin|last4=Normandeau|first4=Eric|last5=Boyle|first5=Brian|last6=Rochette|first6=Rémy|last7=Bernatchez|first7=Louis|date=2019|title=Comparing Pool-seq, Rapture, and GBS genotyping for inferring weak population structure: The American lobster (Homarus americanus) as a case study|url= |journal=Ecology and Evolution|language=en|volume=9|issue=11|pages=6606–6623|doi=10.1002/ece3.5240|issn=2045-7758|pmc=6580275|pmid=31236247|bibcode=2019EcoEv...9.6606D }}{{Cite journal|last1=Vendrami|first1=David L. J.|last2=Telesca|first2=Luca|last3=Weigand|first3=Hannah|last4=Weiss|first4=Martina|last5=Fawcett|first5=Katie|last6=Lehman|first6=Katrin|last7=Clark|first7=M. S.|last8=Leese|first8=Florian|last9=McMinn|first9=Carrie|last10=Moore|first10=Heather|last11=Hoffman|first11=Joseph I.|title=RAD sequencing resolves fine-scale population structure in a benthic invertebrate: implications for understanding phenotypic plasticity|url= |journal=Royal Society Open Science|year=2017|volume=4|issue=2|pages=160548|doi=10.1098/rsos.160548|pmc=5367306|pmid=28386419|bibcode=2017RSOS....460548V}} However, some information is lost in the process, such as linkage disequilibrium and zygosity.
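In pooled sequencing, the allele frequency at a SNP is estimated directly from read counts, because individual genotypes are never observed. The minimal sketch below estimates pooled allele frequencies and compares two pools site by site. It assumes that every individual contributes equal amounts of DNA and ignores sequencing error and read-sampling noise, which dedicated Pool-seq tools model explicitly. The function names and read counts are hypothetical.

```python
def pooled_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Estimate the alternate allele frequency of a pool from read counts at one SNP."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


def frequency_differences(pool_a: dict, pool_b: dict) -> dict:
    """Absolute allele-frequency difference between two pools at shared SNPs."""
    shared = pool_a.keys() & pool_b.keys()
    return {
        snp: abs(pooled_allele_frequency(*pool_a[snp]) - pooled_allele_frequency(*pool_b[snp]))
        for snp in shared
    }


# Hypothetical (reference reads, alternate reads) per SNP for two populations.
pool_a = {"snp1": (70, 30), "snp2": (50, 50)}
pool_b = {"snp1": (40, 60), "snp2": (52, 48)}
for snp, diff in sorted(frequency_differences(pool_a, pool_b).items()):
    print(snp, round(diff, 2))
```

Nothing in these counts reveals which individuals are heterozygous, which is why zygosity and linkage disequilibrium information cannot be recovered from a pool.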

Applications


  • Association studies (such as GWAS, see below) can determine whether a genetic variant is associated with a disease or trait.{{cite journal|vauthors=Zhang K, Qin ZS, Liu JS, Chen T, Waterman MS, Sun F|date=May 2004|title=Haplotype block partitioning and tag SNP selection using genotype data and their applications to association studies|journal=Genome Research|volume=14|issue=5|pages=908–16|doi=10.1101/gr.1837404|pmc=479119|pmid=15078859}}
  • A tag SNP is a representative single-nucleotide polymorphism in a region of the genome with high linkage disequilibrium (the non-random association of alleles at two or more loci). Tag SNPs are useful in whole-genome SNP association studies, in which hundreds of thousands of SNPs across the entire genome are genotyped.
  • Haplotype mapping: sets of alleles or DNA sequences can be clustered so that a single SNP can identify many linked SNPs.
  • Linkage disequilibrium (LD), a term used in population genetics, indicates the non-random association of alleles at two or more loci, not necessarily on the same chromosome. It is most often discussed as the tendency of SNP alleles or DNA sequences that are close together in the genome to be inherited together. LD can be affected by two parameters (among other factors, such as population stratification):
    • The distance between the SNPs (the larger the distance, the lower the LD)
    • Recombination rate (the lower the recombination rate, the higher the LD){{Cite journal|vauthors=Gupta PK, Roy JK, Prasad M|date=25 February 2001|title=Single nucleotide polymorphisms: a new paradigm for molecular marker technology and DNA polymorphism detection with emphasis on their use in plants|url=https://www.researchgate.net/publication/229085137|url-status=live|journal=Current Science|volume=80|issue=4|pages=524–535|archive-url=https://web.archive.org/web/20170213091838/https://www.researchgate.net/publication/229085137_Single_nucleotide_polymorphisms_a_new_paradigm_for_molecular_marker_technology_and_DNA_polymorphism_detection_with_emphasis_on_their_use_in_plantsPK_Gupta_JK_Roy_M_PrasadCurr_Sci_80_4_524-35|archive-date=13 February 2017}}
  • In genetic epidemiology SNPs are used to estimate transmission clusters.{{cite journal|vauthors=Stimson J, Gardy J, Mathema B, Crudu V, Cohen T, Colijn C|title=Beyond the SNP Threshold: Identifying Outbreak Clusters Using Inferred Transmissions|url=|journal=Molecular Biology and Evolution|volume=36|issue=3|pages=587–603|doi=10.1093/molbev/msy242|date=25 January 2019|pmid=30690464|pmc=6389316 }}
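Linkage disequilibrium between two biallelic SNPs is commonly quantified with D = p_AB − p_A·p_B and r² = D² / (p_A(1 − p_A)·p_B(1 − p_B)), where p_A and p_B are allele frequencies and p_AB is the frequency of the haplotype carrying both alleles. A high r² is what lets a tag SNP stand in for its neighbours. The sketch below computes both measures from phased haplotypes coded 0/1. The function name is hypothetical, and returning 0.0 when a SNP is monomorphic (where r² is undefined) is a convention chosen only for this sketch.

```python
def linkage_disequilibrium(haplotypes: list[tuple[int, int]]) -> tuple[float, float]:
    """Return (D, r^2) for two biallelic SNPs from phased haplotypes coded 0/1."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes")
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0  # r^2 undefined if a SNP is monomorphic
    return d, r2


linked = [(1, 1), (1, 1), (0, 0), (0, 0)]       # alleles always travel together
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]  # every combination equally common
print(linkage_disequilibrium(linked))
print(linkage_disequilibrium(independent))
```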

= Importance =

Variations in the DNA sequences of humans can affect how humans develop diseases and respond to pathogens, chemicals, drugs, vaccines, and other agents. SNPs are also critical for personalized medicine.{{cite journal|first1 = Bruce|last1 = Carlson|title = SNPs — A Shortcut to Personalized Medicine|url = http://www.genengnews.com/gen-articles/snps-a-shortcut-to-personalized-medicine/2507/|journal = Genetic Engineering & Biotechnology News|publisher = Mary Ann Liebert, Inc.|date = 15 June 2008|access-date = 2008-07-06|volume = 28|issue = 12|quote = (subtitle) Medical applications are where the market's growth is expected|url-status = live|archive-url = https://web.archive.org/web/20101226164858/http://www.genengnews.com/gen-articles/snps-a-shortcut-to-personalized-medicine/2507|archive-date = 26 December 2010}} Examples include biomedical research, forensics, pharmacogenetics, and disease causation, as outlined below.

= Clinical research =

== Genome-wide association study (GWAS) ==

One of the main contributions of SNPs to clinical research is the genome-wide association study (GWAS).{{Cite journal|last1=Visscher|first1=Peter M.|last2=Wray|first2=Naomi R.|last3=Zhang|first3=Qian|last4=Sklar|first4=Pamela|last5=McCarthy|first5=Mark I.|last6=Brown|first6=Matthew A.|last7=Yang|first7=Jian|date=July 2017|title=10 Years of GWAS Discovery: Biology, Function, and Translation|url=|journal=The American Journal of Human Genetics|volume=101|issue=1|pages=5–22|doi=10.1016/j.ajhg.2017.06.005|pmid=28686856|pmc=5501872|issn=0002-9297}} Genome-wide genetic data can be generated by multiple technologies, including SNP arrays and whole-genome sequencing. GWAS is commonly used to identify SNPs associated with diseases, clinical phenotypes, or traits. Because a GWAS tests a very large number of SNPs across the genome and must correct for all of these tests, a large sample size is required to obtain sufficient statistical power, particularly since some SNPs have relatively small effects on diseases, clinical phenotypes, or traits. To estimate study power, the genetic model of the disease, such as dominant, recessive, or additive effects, needs to be considered. Because allele frequencies differ between populations (population stratification), GWAS analyses must be adjusted for genetic ancestry to avoid spurious associations.
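One simple per-SNP test that could be run at each locus in a case-control GWAS is the allelic chi-square test on a 2×2 table of allele counts (cases vs. controls, alternate vs. reference allele). The minimal sketch below implements it in pure Python, using the fact that a 1-degree-of-freedom chi-square p-value equals erfc(√(χ²/2)). Several choices here are assumptions not taken from the text: the 5×10⁻⁸ threshold is a widely used genome-wide significance convention, the genotype counts are hypothetical, and the test assumes an allelic (additive-like) model without any ancestry adjustment. Real analyses usually use regression models with covariates such as ancestry principal components.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # common convention for genome-wide significance


def allele_counts(hom_ref: int, het: int, hom_alt: int) -> tuple[int, int]:
    """Convert diploid genotype counts into (alt, ref) allele counts."""
    return het + 2 * hom_alt, het + 2 * hom_ref


def allelic_chi_square(case_alt: int, case_ref: int,
                       control_alt: int, control_ref: int) -> tuple[float, float]:
    """Pearson chi-square (1 df) on a 2x2 allele-count table; returns (chi2, p)."""
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


cases = allele_counts(hom_ref=10, het=20, hom_alt=20)     # (60, 40)
controls = allele_counts(hom_ref=20, het=20, hom_alt=10)  # (40, 60)
chi2, p = allelic_chi_square(*cases, *controls)
print(f"chi2={chi2:.1f} p={p:.4f} genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

With 50 cases and 50 controls, the example gives a nominally significant p-value (about 0.005) that is nowhere near the genome-wide threshold. This illustrates why GWAS needs large samples.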

== Candidate gene association study ==

Candidate gene association studies were commonly used in genetic research before the advent of high-throughput genotyping and sequencing technologies.{{Cite journal|last1=Dong|first1=Linda M.|last2=Potter|first2=John D.|last3=White|first3=Emily|last4=Ulrich|first4=Cornelia M.|last5=Cardon|first5=Lon R.|last6=Peters|first6=Ulrike|date=2008-05-28|title=Genetic Susceptibility to Cancer|url=|journal=JAMA|volume=299|issue=20|pages=2423–2436|doi=10.1001/jama.299.20.2423|pmid=18505952|pmc=2772197|issn=0098-7484}} A candidate gene association study investigates a limited number of pre-specified SNPs for association with diseases, clinical phenotypes, or traits, so it is a hypothesis-driven approach. Since only a limited number of SNPs are tested, the correction for multiple testing is less stringent, and a relatively small sample size can be sufficient to detect an association. The candidate gene approach is also commonly used to confirm findings from GWAS in independent samples.
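The sample-size argument can be made concrete with a normal approximation. For a fixed effect size, the required sample size is roughly proportional to (z₁₋α/(2m) + z₁₋β)², where m is the number of tests (Bonferroni correction), α the overall significance level and 1 − β the power. The sketch below compares a hypothetical study of 10 candidate SNPs with a hypothetical genome-wide scan of 1,000,000 SNPs. The approximation, the Bonferroni correction and the round numbers are all assumptions made for illustration.

```python
from statistics import NormalDist


def relative_sample_size(n_tests: int, alpha: float = 0.05, power: float = 0.8) -> float:
    """Quantity proportional to the required sample size for a fixed effect size.

    Uses a two-sided, Bonferroni-corrected threshold alpha / n_tests.
    """
    z_alpha = -NormalDist().inv_cdf(alpha / (2 * n_tests))  # upper critical value
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) ** 2


candidate = relative_sample_size(10)           # hypothetical candidate gene study
genome_wide = relative_sample_size(1_000_000)  # hypothetical genome-wide scan
print(f"genome-wide / candidate sample size: {genome_wide / candidate:.2f}")
```

Under these assumptions the genome-wide scan needs roughly three times as many participants to detect the same effect. In practice the gap is usually larger, because GWAS also aims to detect small effects.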

== Homozygosity mapping in disease ==

Genome-wide SNP data can be used for homozygosity mapping.{{Cite journal|last=Alkuraya|first=Fowzan S.|date=April 2010|title=Homozygosity mapping: One more tool in the clinical geneticist's toolbox|journal=Genetics in Medicine|volume=12|issue=4|pages=236–239|doi=10.1097/gim.0b013e3181ceb95d|pmid=20134328|s2cid=10789932|issn=1098-3600|doi-access=free}} Homozygosity mapping is a method used to identify homozygous autosomal recessive loci, which can be a powerful tool to map genomic regions or genes that are involved in disease pathogenesis.
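At its core, homozygosity mapping scans genome-wide SNP genotypes for long runs of consecutive homozygous calls. The sketch below finds such runs along one chromosome, with genotypes coded as alternate-allele counts (0 or 2 = homozygous, 1 = heterozygous). It is a simplified, hypothetical helper. It measures runs in number of SNPs rather than physical length, and it tolerates no genotyping errors or missing calls, both of which practical tools handle. In a disease study, runs shared by affected individuals would then be intersected to narrow down candidate regions.

```python
def runs_of_homozygosity(genotypes: list[int], min_snps: int = 3) -> list[tuple[int, int]]:
    """Return inclusive (start, end) index pairs of homozygous runs spanning >= min_snps SNPs.

    Genotypes are alt-allele counts: 0 or 2 = homozygous, 1 = heterozygous.
    """
    runs = []
    start = None
    for i, g in enumerate(list(genotypes) + [1]):  # sentinel heterozygote closes a final run
        if g in (0, 2):
            if start is None:
                start = i  # a new homozygous run begins
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    return runs


print(runs_of_homozygosity([0, 1, 2, 2, 0, 2, 1, 0, 0], min_snps=3))  # [(2, 5)]
```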

== Methylation patterns ==

File:Associations between SNPs, methylation patterns and gene expression.png

Recently, preliminary results have suggested that SNPs are important components of the epigenetic program in organisms.{{Cite journal |last1=Vohra |first1=Manik |last2=Sharma |first2=Anu Radha |last3=Prabhu B |first3=Navya |last4=Rai |first4=Padmalatha S. |date=2020 |title=SNPs in Sites for DNA Methylation, Transcription Factor Binding, and miRNA Targets Leading to Allele-Specific Gene Expression and Contributing to Complex Disease Risk: A Systematic Review |journal=Public Health Genomics |volume=23 |issue=5–6 |pages=155–170 |doi=10.1159/000510253 |pmid=32966991 |s2cid=221886624 |issn=1662-4246|doi-access=free }}{{Cite journal |last1=Wang |first1=Jing |last2=Ma |first2=Xiaoqin |last3=Zhang |first3=Qi |last4=Chen |first4=Yinghui |last5=Wu |first5=Dan |last6=Zhao |first6=Pengjun |last7=Yu |first7=Yu |date=2021 |title=The Interaction Analysis of SNP Variants and DNA Methylation Identifies Novel Methylated Pathogenesis Genes in Congenital Heart Diseases |journal=Frontiers in Cell and Developmental Biology |volume=9 |page=665514 |doi=10.3389/fcell.2021.665514 |issn=2296-634X |pmc=8143053 |pmid=34041244|doi-access=free }} Moreover, studies in European and South Asian populations have revealed the influence of SNPs on the methylation of specific CpG sites.{{Cite journal |last1=Hawe |first1=Johann S. |last2=Wilson |first2=Rory |last3=Schmid |first3=Katharina T. |last4=Zhou |first4=Li |last5=Lakshmanan |first5=Lakshmi Narayanan |last6=Lehne |first6=Benjamin C. |last7=Kühnel |first7=Brigitte |last8=Scott |first8=William R. |last9=Wielscher |first9=Matthias |last10=Yew |first10=Yik Weng |last11=Baumbach |first11=Clemens |last12=Lee |first12=Dominic P. |last13=Marouli |first13=Eirini |last14=Bernard |first14=Manon |last15=Pfeiffer |first15=Liliane |date=January 2022 |title=Genetic variation influencing DNA methylation provides insights into molecular mechanisms regulating genomic function |journal=Nature Genetics |language=en |volume=54 |issue=1 |pages=18–29 |doi=10.1038/s41588-021-00969-x |pmid=34980917 |s2cid=256821844 |issn=1546-1718|pmc=7617265 }} In addition, meQTL (methylation quantitative trait locus) enrichment analyses using GWAS databases have shown that these associations are important for predicting biological traits.{{Cite journal |last1=Perzel Mandell |first1=Kira A. |last2=Eagles |first2=Nicholas J. |last3=Wilton |first3=Richard |last4=Price |first4=Amanda J. |last5=Semick |first5=Stephen A. |last6=Collado-Torres |first6=Leonardo |last7=Ulrich |first7=William S. |last8=Tao |first8=Ran |last9=Han |first9=Shizhong |last10=Szalay |first10=Alexander S. |last11=Hyde |first11=Thomas M. |last12=Kleinman |first12=Joel E. |last13=Weinberger |first13=Daniel R. |last14=Jaffe |first14=Andrew E. |date=2021-09-02 |title=Genome-wide sequencing-based identification of methylation quantitative trait loci and their role in schizophrenia risk |journal=Nature Communications |language=en |volume=12 |issue=1 |pages=5251 |doi=10.1038/s41467-021-25517-3 |issn=2041-1723 |pmc=8413445 |pmid=34475392|bibcode=2021NatCo..12.5251P }}{{Cite journal |last1=Hoffmann |first1=Anke |last2=Ziller |first2=Michael |last3=Spengler |first3=Dietmar |date=December 2016 |title=The Future is The Past: Methylation QTLs in Schizophrenia |journal=Genes |language=en |volume=7 |issue=12 |pages=104 |doi=10.3390/genes7120104 |issn=2073-4425 |pmc=5192480 |pmid=27886132|doi-access=free }}
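One simple way to model an meQTL is an additive linear regression of the methylation level at a CpG site on the genotype dosage (0, 1 or 2 copies of the alternate allele) at a nearby SNP. A non-zero slope suggests the SNP influences methylation at that site. The sketch below computes only the least-squares slope. It is a hypothetical helper with made-up data, and it omits the covariates and significance testing that real meQTL mapping requires.

```python
def meqtl_slope(dosages: list[float], methylation: list[float]) -> float:
    """Least-squares slope of methylation level on genotype dosage (additive model)."""
    n = len(dosages)
    if n != len(methylation) or n < 2:
        raise ValueError("need two equal-length lists with at least two samples")
    mean_x = sum(dosages) / n
    mean_y = sum(methylation) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("genotype does not vary in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, methylation))
    return sxy / sxx


# Hypothetical samples: alt-allele dosage and methylation beta value at one CpG site.
dosages = [0, 0, 1, 1, 2, 2]
betas = [0.10, 0.12, 0.30, 0.28, 0.50, 0.52]
print(round(meqtl_slope(dosages, betas), 3))  # 0.2
```

Here each additional copy of the alternate allele is associated with a methylation increase of about 0.2 on the beta-value scale.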

= Forensic sciences =

SNPs have historically been used to match a forensic DNA sample to a suspect, but this use has been made obsolete by advances in STR-based DNA fingerprinting techniques. However, the development of next-generation sequencing (NGS) technology may create more opportunities to use SNPs to infer phenotypic clues such as ethnicity, hair color, and eye color with a good probability of a match. This can additionally be applied to increase the accuracy of facial reconstructions by providing information that may otherwise be unknown, and this information can help identify suspects even without an STR DNA profile match.

One drawback of SNPs compared with STRs is that each SNP yields less information than an STR locus, so more SNPs are needed before a profile of a suspect can be created. SNP analysis also relies heavily on the availability of a database for comparing samples. However, for degraded or low-volume samples, SNP techniques are an excellent alternative to STR methods. Unlike STRs, SNPs offer an abundance of potential markers, can be fully automated, and may reduce the required fragment length to less than 100 bp.
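The information gap between SNPs and STRs can be quantified with the probability of identity: the chance that two unrelated people share the same genotype at a locus. Under Hardy-Weinberg proportions, this is the sum of the squared genotype frequencies, and for independent loci the per-locus values multiply. The sketch below compares an idealized biallelic SNP (both alleles at 50%, the most informative case) with a hypothetical STR that has ten equally frequent alleles. Both allele-frequency sets, and the assumptions of Hardy-Weinberg equilibrium and independent loci, are illustrative only.

```python
import math
from itertools import combinations_with_replacement


def probability_of_identity(allele_freqs: list[float]) -> float:
    """Chance that two unrelated people share a genotype at one locus (HWE assumed)."""
    total = 0.0
    for i, j in combinations_with_replacement(range(len(allele_freqs)), 2):
        p, q = allele_freqs[i], allele_freqs[j]
        genotype_freq = p * p if i == j else 2 * p * q  # Hardy-Weinberg proportions
        total += genotype_freq ** 2
    return total


snp_pi = probability_of_identity([0.5, 0.5])  # idealized biallelic SNP
str_pi = probability_of_identity([0.1] * 10)  # hypothetical 10-allele STR
snps_per_str = math.log(str_pi) / math.log(snp_pi)  # SNPs matching one STR's power
print(round(snp_pi, 3), round(str_pi, 3), round(snps_per_str, 1))  # 0.375 0.019 4.0
```

Under these assumptions, about four maximally informative SNPs are needed to match the discriminating power of one such STR locus. Real SNPs with lower minor allele frequencies need even more.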

= Pharmacogenetics =

Pharmacogenetics focuses on identifying genetic variations, including SNPs, that are associated with differential responses to treatment.{{Cite journal|last=Daly|first=Ann K|date=2017-10-11|title=Pharmacogenetics: a general review on progress to date|journal=British Medical Bulletin|volume=124|issue=1|pages=65–79|doi=10.1093/bmb/ldx035|pmid=29040422|issn=0007-1420|doi-access=free}} Many drug-metabolizing enzymes, drug targets, and target pathways can be influenced by SNPs. SNPs that affect the activity of drug-metabolizing enzymes can change a drug's pharmacokinetics, while SNPs in a drug target or its pathway can change its pharmacodynamics. SNPs are therefore potential genetic markers for predicting drug exposure or the effectiveness of a treatment. Genome-wide pharmacogenetic studies are called pharmacogenomics. Pharmacogenetics and pharmacogenomics are important in the development of precision medicine, especially for life-threatening diseases such as cancer.
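As a rough illustration of the pharmacokinetic route, under linear pharmacokinetics the total drug exposure (area under the concentration-time curve, AUC) equals F × Dose / CL, where F is bioavailability and CL is clearance. The sketch below assumes a hypothetical reduced-function allele in a drug-metabolizing enzyme gene that multiplies clearance by a fixed factor per copy; the dose, bioavailability, reference clearance, and per-allele factor are made-up values, not data for any real drug or gene.

```python
def predicted_auc(dose_mg, bioavailability, ref_clearance_l_per_h,
                  variant_allele_count, clearance_factor_per_allele):
    """Exposure (AUC, mg·h/L) = F * Dose / CL under linear pharmacokinetics.

    Clearance is scaled by a hypothetical factor for each copy (0, 1 or 2)
    of a reduced-function allele."""
    if variant_allele_count not in (0, 1, 2):
        raise ValueError("a diploid genotype carries 0, 1 or 2 copies of an allele")
    clearance = ref_clearance_l_per_h * clearance_factor_per_allele ** variant_allele_count
    return bioavailability * dose_mg / clearance


# Hypothetical drug: 100 mg dose, F = 0.5, reference clearance 10 L/h,
# and each variant allele halving clearance.
for copies in (0, 1, 2):
    auc = predicted_auc(100, 0.5, 10, copies, 0.5)
    print(f"{copies} variant allele(s): AUC = {auc:.1f} mg·h/L")
```

Under these assumptions exposure doubles with each variant allele (5, 10 and 20 mg·h/L), which is the kind of genotype-dependent difference a pharmacokinetic SNP marker is meant to flag.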

= Disease =

Only a small fraction of the SNPs in the human genome are likely to affect human disease. Large-scale GWAS have been conducted for the most important human diseases, including heart diseases, metabolic diseases, autoimmune diseases, and neurodegenerative and psychiatric disorders. Most of the SNPs with relatively large effects on these diseases have been identified. These findings have significantly improved the understanding of disease pathogenesis and molecular pathways, and have facilitated the development of better treatments. Further GWAS with larger sample sizes are expected to reveal SNPs with relatively small effects on disease. For common and complex diseases, such as type 2 diabetes, rheumatoid arthritis, and Alzheimer's disease, multiple genetic factors are involved in disease etiology. In addition, gene-gene and gene-environment interactions also play an important role in disease initiation and progression.{{Cite journal|last1=Musci|first1=Rashelle J.|last2=Augustinavicius|first2=Jura L.|last3=Volk|first3=Heather|date=2019-08-13|title=Gene-Environment Interactions in Psychiatry: Recent Evidence and Clinical Implications|url=|journal=Current Psychiatry Reports|volume=21|issue=9|page=81|doi=10.1007/s11920-019-1065-5|pmid=31410638|pmc=7340157|issn=1523-3812}}
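Why larger samples are needed to detect small effects can be seen in the simplest single-SNP test used in case-control association studies: an allelic chi-square test on a 2×2 table of allele counts. The sketch below uses hypothetical counts in which the risk allele has a frequency of 33% in cases and 30% in controls; a GWAS repeats a test like this (or, more often, a regression model with covariates) for every genotyped SNP. The 5 × 10⁻⁸ threshold used in the check is the conventional genome-wide significance level.

```python
import math


def allelic_chi2_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Allelic association test for one biallelic SNP (2x2 table of allele counts).

    Returns (odds_ratio, chi2, p_value); p is the upper tail of a chi-square
    distribution with 1 degree of freedom, computed as erfc(sqrt(chi2 / 2))."""
    observed = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_totals = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return odds_ratio, chi2, p_value


# Hypothetical allele counts with the same frequencies in a small and a 30x larger study.
small = allelic_chi2_test(330, 670, 300, 700)
large = allelic_chi2_test(9900, 20100, 9000, 21000)
for label, (odds_ratio, chi2, p) in (("small", small), ("30x larger", large)):
    print(f"{label}: OR = {odds_ratio:.2f}, chi2 = {chi2:.1f}, p = {p:.2g}")
```

The odds ratio (about 1.15) is identical in both studies, but the chi-square statistic grows in proportion to sample size, so the same modest effect goes from non-significant (p ≈ 0.15) in the small study to far below the genome-wide threshold in the larger one.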

Examples

  • rs6311 and rs6313 are SNPs in the serotonin 5-HT2A receptor gene on human chromosome 13.{{cite journal | vauthors = Giegling I, Hartmann AM, Möller HJ, Rujescu D | title = Anger- and aggression-related traits are associated with polymorphisms in the 5-HT-2A gene | journal = Journal of Affective Disorders | volume = 96 | issue = 1–2 | pages = 75–81 | date = November 2006 | pmid = 16814396 | doi = 10.1016/j.jad.2006.05.016 }}
  • The SNP −3279C/A (rs3761548), one of several SNPs located in the promoter region of the Foxp3 gene, might be involved in cancer progression.{{cite journal | vauthors = Ezzeddini R, Somi MH, Taghikhani M, Moaddab SY, Masnadi Shirazi K, Shirmohammadi M, Eftekharsadat AT, Sadighi Moghaddam B, Salek Farrokhi A | title = Association of Foxp3 rs3761548 polymorphism with cytokines concentration in gastric adenocarcinoma patients | journal = Cytokine | volume = 138 | issue = | pages = 155351 | date = February 2021 | pmid = 33127257 | doi = 10.1016/j.cyto.2020.155351 | s2cid = 226218796 | url=| issn =1043-4666 }}
  • A SNP in the F5 gene causes Factor V Leiden thrombophilia.{{cite journal | vauthors = Kujovich JL | title = Factor V Leiden thrombophilia | journal = Genetics in Medicine | volume = 13 | issue = 1 | pages = 1–16 | date = January 2011 | pmid = 21116184 | doi = 10.1097/GIM.0b013e3181faa0f2 | doi-access = free }}
  • rs3091244 is an example of a triallelic SNP in the CRP gene on human chromosome 1.{{cite journal | vauthors = Morita A, Nakayama T, Doba N, Hinohara S, Mizutani T, Soma M | title = Genotyping of triallelic SNPs using TaqMan PCR | journal = Molecular and Cellular Probes | volume = 21 | issue = 3 | pages = 171–6 | date = June 2007 | pmid = 17161935 | doi = 10.1016/j.mcp.2006.10.005 }}
  • TAS2R38, which codes for the ability to taste phenylthiocarbamide (PTC), contains 6 annotated SNPs.{{cite journal | vauthors = Prodi DA, Drayna D, Forabosco P, Palmas MA, Maestrale GB, Piras D, Pirastu M, Angius A | title = Bitter taste study in a sardinian genetic isolate supports the association of phenylthiocarbamide sensitivity to the TAS2R38 bitter receptor gene | journal = Chemical Senses | volume = 29 | issue = 8 | pages = 697–702 | date = October 2004 | pmid = 15466815 | doi = 10.1093/chemse/bjh074 | doi-access = free }}
  • rs148649884 and rs138055828 in the FCN1 gene, which encodes M-ficolin, were shown to cripple the ligand-binding ability of recombinant M-ficolin.{{cite journal | vauthors = Ammitzbøll CG, Kjær TR, Steffensen R, Stengaard-Pedersen K, Nielsen HJ, Thiel S, Bøgsted M, Jensenius JC | title = Non-synonymous polymorphisms in the FCN1 gene determine ligand-binding ability and serum levels of M-ficolin | journal = PLOS ONE | volume = 7 | issue = 11 | pages = e50585 | date = 28 November 2012 | pmid = 23209787 | pmc = 3509001 | doi = 10.1371/journal.pone.0050585 | bibcode = 2012PLoSO...750585A | doi-access = free }}
  • rs12821256, located in a cis-regulatory module, changes the level of transcription of the KIT ligand gene. Among northern Europeans, high levels of transcription lead to brown hair, and low levels lead to blond hair. This is an example of an overt but non-pathological phenotypic change caused by a single SNP.{{Cite journal |last1=Guenther |first1=Catherine A. |last2=Tasic |first2=Bosiljka |last3=Luo |first3=Liqun |last4=Bedell |first4=Mary A. |last5=Kingsley |first5=David M. |date=July 2014 |title=A molecular basis for classic blond hair color in Europeans |journal=Nature Genetics |language=en |volume=46 |issue=7 |pages=748–752 |doi=10.1038/ng.2991 |pmid=24880339 |issn=1546-1718|pmc=4704868 }}
  • A nonsynonymous SNP in the DNA mismatch repair gene PMS2 (rs1059060, Ser775Asn) is associated with increased sperm DNA damage and risk of male infertility.{{cite journal | vauthors = Ji G, Long Y, Zhou Y, Huang C, Gu A, Wang X | title = Common variants in mismatch repair genes associated with increased risk of sperm DNA damage and male infertility | journal = BMC Medicine | volume = 10 | pages = 49 | date = May 2012 | pmid = 22594646 | pmc = 3378460 | doi = 10.1186/1741-7015-10-49 | doi-access = free }}

Databases

As with genes, bioinformatics databases exist for SNPs.

  • dbSNP is a SNP database from the National Center for Biotechnology Information (NCBI). {{As of|2015|6|8|df=us}}, dbSNP listed 149,735,377 SNPs in humans.National Center for Biotechnology Information, United States National Library of Medicine. 2014. NCBI dbSNP build 142 for human. {{cite web |url=https://www.ncbi.nlm.nih.gov/mailman/pipermail/dbsnp-announce/2014q4/000147.html |title=[DBSNP-announce] DBSNP Human Build 142 (GRCh38 and GRCh37.p13) |access-date=2017-09-11 |url-status=live |archive-url=https://web.archive.org/web/20170910221732/https://www.ncbi.nlm.nih.gov/mailman/pipermail/dbsnp-announce/2014q4/000147.html |archive-date=2017-09-10 }}National Center for Biotechnology Information, United States National Library of Medicine. 2015. NCBI dbSNP build 144 for human. Summary Page. {{cite web |url=https://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi?view+summary=view+summary&build_id=144 |title=DBSNP Summary |access-date=2017-09-11 |url-status=live |archive-url=https://web.archive.org/web/20170910221718/https://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi?view+summary=view+summary&build_id=144 |archive-date=2017-09-10 }}
  • [http://db.systemsbiology.net/kaviar/ Kaviar]{{cite journal | vauthors = Glusman G, Caballero J, Mauldin DE, Hood L, Roach JC | title = Kaviar: an accessible system for testing SNV novelty | journal = Bioinformatics | volume = 27 | issue = 22 | pages = 3216–7 | date = November 2011 | pmid = 21965822 | pmc = 3208392 | doi = 10.1093/bioinformatics/btr540 }} is a compendium of SNPs from multiple data sources including dbSNP.
  • SNPedia is a wiki-style database supporting personal genome annotation, interpretation and analysis.
  • The OMIM database describes associations between polymorphisms and diseases in text form.
  • dbSAP – single amino-acid polymorphism database for protein variation detection{{cite journal | vauthors = Cao R, Shi Y, Chen S, Ma Y, Chen J, Yang J, Chen G, Shi T | title = dbSAP: single amino-acid polymorphism database for protein variation detection | journal = Nucleic Acids Research | volume = 45 | issue = D1 | pages = D827–D832 | date = January 2017 | pmid = 27903894 | pmc = 5210569 | doi = 10.1093/nar/gkw1096 }}
  • The Human Gene Mutation Database provides gene mutations causing or associated with human inherited diseases and functional SNPs
  • The International HapMap Project, in which researchers identified tag SNPs that can be used to determine the collection of haplotypes present in each subject.
  • GWAS Central allows users to visually interrogate the actual summary-level association data in one or more genome-wide association studies.

The International SNP Map Working Group mapped the sequence flanking each SNP by aligning it to the genomic sequence of large-insert clones in GenBank. These alignments were converted to chromosomal coordinates, which are shown in Table 1.{{cite journal | vauthors = Sachidanandam R, Weissman D, Schmidt SC, Kakol JM, Stein LD, Marth G, Sherry S, Mullikin JC, Mortimore BJ, Willey DL, Hunt SE, Cole CG, Coggill PC, Rice CM, Ning Z, Rogers J, Bentley DR, Kwok PY, Mardis ER, Yeh RT, Schultz B, Cook L, Davenport R, Dante M, Fulton L, Hillier L, Waterston RH, McPherson JD, Gilman B, Schaffner S, Van Etten WJ, Reich D, Higgins J, Daly MJ, Blumenstiel B, Baldwin J, Stange-Thomann N, Zody MC, Linton L, Lander ES, Altshuler D | title = A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms | journal = Nature | volume = 409 | issue = 6822 | pages = 928–33 | date = February 2001 | pmid = 11237013 | doi = 10.1038/35057149 | author-link5 = Lincoln Stein | bibcode = 2001Natur.409..928S | doi-access = free }} The number of known SNPs has grown greatly since then; for instance, the Kaviar database now lists 162 million single nucleotide variants (SNVs).

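{| class="wikitable"
|-
! rowspan="2" | Chromosome !! rowspan="2" | Length (bp) !! colspan="2" | All SNPs !! colspan="2" | TSC SNPs
|-
! Total SNPs !! kb per SNP !! Total SNPs !! kb per SNP
|-
| 1 || 214,066,000 || 129,931 || 1.65 || 75,166 || 2.85
|-
| 2 || 222,889,000 || 103,664 || 2.15 || 76,985 || 2.90
|-
| 3 || 186,938,000 || 93,140 || 2.01 || 63,669 || 2.94
|-
| 4 || 169,035,000 || 84,426 || 2.00 || 65,719 || 2.57
|-
| 5 || 170,954,000 || 117,882 || 1.45 || 63,545 || 2.69
|-
| 6 || 165,022,000 || 96,317 || 1.71 || 53,797 || 3.07
|-
| 7 || 149,414,000 || 71,752 || 2.08 || 42,327 || 3.53
|-
| 8 || 125,148,000 || 57,834 || 2.16 || 42,653 || 2.93
|-
| 9 || 107,440,000 || 62,013 || 1.73 || 43,020 || 2.50
|-
| 10 || 127,894,000 || 61,298 || 2.09 || 42,466 || 3.01
|-
| 11 || 129,193,000 || 84,663 || 1.53 || 47,621 || 2.71
|-
| 12 || 125,198,000 || 59,245 || 2.11 || 38,136 || 3.28
|-
| 13 || 93,711,000 || 53,093 || 1.77 || 35,745 || 2.62
|-
| 14 || 89,344,000 || 44,112 || 2.03 || 29,746 || 3.00
|-
| 15 || 73,467,000 || 37,814 || 1.94 || 26,524 || 2.77
|-
| 16 || 74,037,000 || 38,735 || 1.91 || 23,328 || 3.17
|-
| 17 || 73,367,000 || 34,621 || 2.12 || 19,396 || 3.78
|-
| 18 || 73,078,000 || 45,135 || 1.62 || 27,028 || 2.70
|-
| 19 || 56,044,000 || 25,676 || 2.18 || 11,185 || 5.01
|-
| 20 || 63,317,000 || 29,478 || 2.15 || 17,051 || 3.71
|-
| 21 || 33,824,000 || 20,916 || 1.62 || 9,103 || 3.72
|-
| 22 || 33,786,000 || 28,410 || 1.19 || 11,056 || 3.06
|-
| X || 131,245,000 || 34,842 || 3.77 || 20,400 || 6.43
|-
| Y || 21,753,000 || 4,193 || 5.19 || 1,784 || 12.19
|-
| RefSeq || 15,696,674 || 14,534 || 1.08 || ||
|-
! Totals !! 2,710,164,000 !! 1,419,190 !! 1.91 !! 887,450 !! 3.05
|}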
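The "kb per SNP" columns are the sequence length divided by the number of SNPs, expressed in kilobases, so the table can be checked directly. The rows below are copied from the table; the function name is illustrative only.

```python
def kb_per_snp(length_bp, n_snps):
    """Average spacing between SNPs, in kilobases."""
    return length_bp / n_snps / 1000


# (chromosome, length in bp, all SNPs, TSC SNPs), copied from a few rows of the table
rows = [
    ("1", 214_066_000, 129_931, 75_166),
    ("19", 56_044_000, 25_676, 11_185),
    ("Y", 21_753_000, 4_193, 1_784),
    ("Totals", 2_710_164_000, 1_419_190, 887_450),
]

for name, length, all_snps, tsc_snps in rows:
    print(f"{name}: {kb_per_snp(length, all_snps):.2f} kb/SNP (all), "
          f"{kb_per_snp(length, tsc_snps):.2f} kb/SNP (TSC)")
```

The printed spacings (for example 1.65 and 2.85 for chromosome 1, and 1.91 and 3.05 for the totals) reproduce the corresponding table entries.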

Nomenclature

SNP nomenclature includes several variant formats for an individual SNP, and there is no common consensus.

The rs### standard is that which has been adopted by dbSNP and uses the prefix "rs", for "reference SNP", followed by a unique and arbitrary number.{{cite book | title = SNP FAQ Archive | location = Bethesda (MD) | publisher = U.S. National Center for Biotechnology Information | chapter-url = https://www.ncbi.nlm.nih.gov/books/NBK44417/ | chapter = Clustered RefSNPs (rs) and Other Data Computed in House |date=2005 }} SNPs are frequently referred to by their dbSNP rs number, as in the examples above.
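Because rs identifiers follow a fixed pattern ("rs" followed by digits), they are easy to pull out of free text such as the examples above. The sketch below is a hypothetical helper, not part of any dbSNP tool; it accepts the prefix in any letter case and normalizes it to lowercase.

```python
import re

# "rs" followed by one or more digits, as a whole word; prefix case is ignored.
RS_ID = re.compile(r"\brs(\d+)\b", re.IGNORECASE)


def extract_rs_ids(text):
    """Return the rs identifiers in text, in order of first appearance, without duplicates."""
    found = []
    for match in RS_ID.finditer(text):
        rs_id = "rs" + match.group(1)  # normalize prefix to lowercase
        if rs_id not in found:
            found.append(rs_id)
    return found


print(extract_rs_ids("rs6311 and rs6313 are SNPs; see also RS6311 and (rs3761548)."))
```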

The Human Genome Variation Society (HGVS) uses a standard which conveys more information about the SNP. Examples are:

  • c.76A>T: "c." for coding region, followed by a number for the position of the nucleotide, followed by a one-letter abbreviation for the nucleotide (A, C, G, T, or U), followed by a greater than sign (">") to indicate substitution, followed by the abbreviation of the nucleotide which replaces the former{{Cite web|url=http://www.hgvs.org/mutnomen/recs.html|title=Recommendations for the description of sequence variants|author=J.T. Den Dunnen|date=2008-02-20|publisher=Human Genome Variation Society|url-status=live|archive-url=https://web.archive.org/web/20080914071152/http://www.hgvs.org/mutnomen/recs.html|archive-date=2008-09-14|access-date=2008-09-05}}{{cite journal | vauthors = den Dunnen JT, Antonarakis SE | title = Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion | journal = Human Mutation | volume = 15 | issue = 1 | pages = 7–12 | date = 2000 | pmid = 10612815 | doi = 10.1002/(SICI)1098-1004(200001)15:1<7::AID-HUMU4>3.0.CO;2-N | doi-access = free }}{{cite journal | vauthors = Ogino S, Gulley ML, den Dunnen JT, Wilson RB | title = Standard mutation nomenclature in molecular diagnostics: practical and educational challenges | journal = The Journal of Molecular Diagnostics | volume = 9 | issue = 1 | pages = 1–6 | date = February 2007 | pmid = 17251329 | pmc = 1867422 | doi = 10.2353/jmoldx.2007.060081 | author5 = Association for Molecular Pathology Training and Education Committee }}
  • p.Ser123Arg: "p." for protein, followed by a three-letter abbreviation for the amino acid, followed by a number for the position of the amino acid, followed by the abbreviation of the amino acid which replaces the former.{{Cite web|url=http://varnomen.hgvs.org/recommendations/general/|title=Sequence Variant Nomenclature|website=varnomen.hgvs.org|access-date=2019-12-02}}
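The two substitution patterns above are regular enough to parse with regular expressions. The sketch below handles only these simple forms (a plain numeric coding-DNA position with a single-nucleotide change, and a three-letter amino-acid substitution). The full HGVS recommendations also cover intronic and untranslated-region positions, deletions, insertions and other variant types, which this hypothetical parser simply rejects.

```python
import re

# Only the two simple substitution forms described above.
CODING_SUB = re.compile(r"^c\.(\d+)([ACGTU])>([ACGTU])$")
PROTEIN_SUB = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_simple_hgvs(description):
    """Split a simple c. or p. substitution into its level, position, and alleles."""
    m = CODING_SUB.match(description)
    if m:
        pos, ref, alt = m.groups()
        return {"level": "coding DNA", "position": int(pos), "ref": ref, "alt": alt}
    m = PROTEIN_SUB.match(description)
    if m:
        ref, pos, alt = m.groups()
        return {"level": "protein", "position": int(pos), "ref": ref, "alt": alt}
    raise ValueError(f"not a simple c. or p. substitution: {description!r}")


print(parse_simple_hgvs("c.76A>T"))
print(parse_simple_hgvs("p.Ser123Arg"))
```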

SNP analysis

Because most SNPs have only two alleles, they have only three possible genotypes (homozygous A, homozygous B, and heterozygous AB), which makes them easy to assay and has led to many analysis techniques. These include DNA sequencing; capillary electrophoresis; mass spectrometry; single-strand conformation polymorphism (SSCP); single base extension; electrochemical analysis; denaturing HPLC and gel electrophoresis; restriction fragment length polymorphism; and hybridization analysis.
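With sequencing-based assays, the three genotype classes can be called from the numbers of reads carrying each allele. The sketch below is a deliberately simple caller for a biallelic SNP; the minimum read depth and the allele-fraction cutoffs (0.2 and 0.8) are illustrative assumptions, whereas production genotype callers use probabilistic models that account for sequencing error.

```python
def call_genotype(ref_reads, alt_reads, min_depth=10, het_low=0.2, het_high=0.8):
    """Call 'AA', 'AB' or 'BB' for a biallelic SNP from read counts.

    A is the reference allele and B the alternative allele; the thresholds
    are illustrative, not values from any particular tool."""
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return None  # too few reads to make a call
    alt_fraction = alt_reads / depth
    if alt_fraction < het_low:
        return "AA"  # homozygous reference
    if alt_fraction > het_high:
        return "BB"  # homozygous alternative
    return "AB"      # heterozygous


for counts in [(30, 1), (14, 16), (0, 25), (3, 2)]:
    print(counts, call_genotype(*counts))
```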

Programs for prediction of SNP effects

An important group of SNPs are those that correspond to missense mutations, which cause an amino acid change at the protein level. A point mutation at a particular residue can have effects on protein function ranging from none to complete disruption. Usually, a substitution between amino acids of similar size and physicochemical properties (e.g. leucine to valine) has a mild effect, whereas a substitution between dissimilar amino acids is more likely to be damaging. Similarly, a SNP that disrupts a secondary structure element (e.g. a substitution to proline in an alpha-helix) may affect the structure and function of the whole protein. Using these simple rules, along with many others derived by machine learning, a group of programs for predicting SNP effects has been developed:{{Cite journal|last=Johnson|first=Andrew D.|date=October 2009|title=SNP bioinformatics: a comprehensive review of resources|journal=Circulation: Cardiovascular Genetics|volume=2|issue=5|pages=530–536|doi=10.1161/CIRCGENETICS.109.872010|issn=1942-325X|pmc=2789466|pmid=20031630}}

  • [http://sift-dna.org SIFT] This program provides insight into how a laboratory-induced missense or nonsynonymous mutation will affect protein function based on physical properties of the amino acid and sequence homology.
  • [https://gsponerlab.msl.ubc.ca/software/list/ LIST] (Local Identity and Shared Taxa){{cite journal | vauthors = Malhis N, Jones SJ, Gsponer J | title = Improved measures for evolutionary conservation that exploit taxonomy distances | journal = Nature Communications | volume = 10 | issue = 1 | pages = 1556 | date = April 2019 | pmid = 30952844 | pmc = 6450959 | doi = 10.1038/s41467-019-09583-2 | bibcode = 2019NatCo..10.1556M }}{{cite journal |author1=Nawar Malhis |author2= Matthew Jacobson |author3=Steven J. M. Jones |author4=Jörg Gsponer | year = 2020 | title = LIST-S2: Taxonomy Based Sorting of Deleterious Missense Mutations Across Species | journal = Nucleic Acids Research | volume =48|issue=W1|pages=W154–W161| doi=10.1093/nar/gkaa288| pmc = 7319545| pmid=32352516| doi-access=free}} estimates the potential deleteriousness of mutations resulting from alterations in protein function. It is based on the assumption that, when assessing conservation, variations observed in closely related species are more significant than those observed in distantly related species.
  • [https://rostlab.org/services/snap SNAP2]
  • [http://www.sbg.bio.ic.ac.uk/suspect/index.html SuSPect]
  • [http://genetics.bwh.harvard.edu/pph2/ PolyPhen-2]
  • [http://loschmidt.chemi.muni.cz/predictsnp/ PredictSNP]
  • MutationTaster: [http://www.mutationtaster.org/ official website]
  • [http://www.ensembl.org/info/docs/tools/vep/index.html Variant Effect Predictor] from the Ensembl project
  • [https://genomicscomputbiol.org/ojs3/GCB/article/view/48/182 SNPViz] {{Webarchive|url=https://web.archive.org/web/20200807085537/https://genomicscomputbiol.org/ojs3/GCB/article/view/48/182 |date=2020-08-07 }}:{{Cite journal|url=https://genomicscomputbiol.org/ojs3/GCB/article/view/48/182|title=View of SNPViz - Visualization of SNPs in proteins|website=genomicscomputbiol.org|doi=10.18547/gcb.2018.vol4.iss1.e100048|language=en-US|access-date=2018-10-20|doi-access=free|archive-date=2020-08-07|archive-url=https://web.archive.org/web/20200807085537/https://genomicscomputbiol.org/ojs3/GCB/article/view/48/182|url-status=dead}} This program provides a 3D representation of the protein affected, highlighting the amino acid change so doctors can determine pathogenicity of the mutant protein.
  • [http://provean.jcvi.org/index.php PROVEAN]
  • [http://phyrerisk.bc.ic.ac.uk PhyreRisk] is a database which maps variants to experimental and predicted protein structures.{{cite journal | vauthors = Ofoegbu TC, David A, Kelley LA, Mezulis S, Islam SA, Mersmann SF, Strömich L, Vakser IA, Houlston RS, Sternberg MJ | display-authors = 6 | title = PhyreRisk: A Dynamic Web Application to Bridge Genomics, Proteomics and 3D Structural Data to Guide Interpretation of Human Genetic Variants | journal = Journal of Molecular Biology | volume = 431 | issue = 13 | pages = 2460–2466 | date = June 2019 | pmid = 31075275 | pmc = 6597944 | doi = 10.1016/j.jmb.2019.04.043 }}
  • [http://www.sbg.bio.ic.ac.uk/~missense3d/ Missense3D] is a tool which provides a stereochemical report on the effect of missense variants on protein structure.{{cite journal | vauthors = Ittisoponpisan S, Islam SA, Khanna T, Alhuzimi E, David A, Sternberg MJ | title = Can Predicted Protein 3D Structures Provide Reliable Insights into whether Missense Variants Are Disease Associated? | journal = Journal of Molecular Biology | volume = 431 | issue = 11 | pages = 2197–2212 | date = May 2019 | pmid = 30995449 | pmc = 6544567 | doi = 10.1016/j.jmb.2019.04.009 }}
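The two rules of thumb described before this list can be turned into a toy classifier. The sketch below is not how SIFT, PolyPhen-2 or the other listed programs work: it uses only the Kyte–Doolittle hydropathy index as a proxy for physicochemical similarity (ignoring side-chain size), flags a substitution to proline only when the caller states that the residue lies in an alpha-helix, and uses an arbitrary hydropathy cutoff of 3.0.

```python
# Kyte-Doolittle hydropathy index (one-letter amino acid codes).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def toy_missense_effect(ref, alt, in_helix=False, max_hydropathy_change=3.0):
    """Return ('likely mild' or 'possibly damaging', reasons) for a ref -> alt substitution.

    Toy heuristic only: the cutoff is arbitrary and side-chain size is ignored."""
    if ref not in HYDROPATHY or alt not in HYDROPATHY:
        raise ValueError("expected one-letter codes of standard amino acids")
    reasons = []
    if in_helix and alt == "P" and ref != "P":
        reasons.append("proline introduced into an alpha-helix")
    change = abs(HYDROPATHY[alt] - HYDROPATHY[ref])
    if change > max_hydropathy_change:
        reasons.append(f"large hydropathy change ({change:.1f})")
    return ("possibly damaging" if reasons else "likely mild"), reasons


print(toy_missense_effect("L", "V"))
print(toy_missense_effect("L", "P", in_helix=True))
print(toy_missense_effect("L", "R"))
```

Leucine to valine (a hydropathy change of 0.4) comes out as likely mild, while leucine to proline inside a helix is flagged for both reasons and leucine to arginine (a change of 8.3) is flagged on hydropathy alone.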

See also

References

{{reflist}}

Further reading

{{refbegin}}

  • {{cite web | url = http://www.nature.com/nrg/journal/v5/n2/glossary/nrg1270_glossary.html | work = Nature Reviews | title = Glossary }}
  • [https://web.archive.org/web/20120420021612/http://www.ornl.gov/sci/techresources/Human_Genome/faq/snps.shtml Human Genome Project Information] – SNP Fact Sheet

{{refend}}