Cancer genome sequencing

Cancer genome sequencing is the whole genome sequencing of a single, homogeneous or heterogeneous group of cancer cells. It is a biochemical laboratory method for the characterization and identification of the DNA or RNA sequences of cancer cell(s).

Unlike whole genome (WG) sequencing, which typically uses DNA from blood cells (as in J. Craig Venter's and James D. Watson’s WG sequencing projects), saliva, epithelial cells or bone, cancer genome sequencing involves direct sequencing of primary tumor tissue, adjacent or distal normal tissue, the tumor microenvironment such as fibroblast/stromal cells, or metastatic tumor sites.

As with whole genome sequencing, the information generated by this technique includes the identification of nucleotide bases (DNA or RNA), copy number and sequence variants, mutation status, and structural changes such as chromosomal translocations and fusion genes.

Cancer genome sequencing is not limited to WG sequencing and can also include exome, transcriptome, and micronome sequencing, as well as end-sequence profiling. These methods can be used to quantify gene expression and miRNA expression and to identify alternative splicing events, in addition to providing sequence data.

The first report of cancer genome sequencing appeared in 2006. In this study 13,023 genes were sequenced in 11 breast and 11 colorectal tumors.{{cite journal|last1=Sjoblom|first1=T.|last2=Jones|first2=S.|last3=Wood|first3=L. D.|last4=Parsons|first4=D. W.|last5=Lin|first5=J.|last6=Barber|first6=T. D.|last7=Mandelker|first7=D.|last8=Leary|first8=R. J.|last9=Ptak|first9=J.|last10=Silliman|first10=N.|last11=Szabo|first11=S.|last12=Buckhaults|first12=P.|last13=Farrell|first13=C.|last14=Meeh|first14=P.|last15=Markowitz|first15=S. D.|last16=Willis|first16=J.|last17=Dawson|first17=D.|last18=Willson|first18=J. K. V.|last19=Gazdar|first19=A. F.|last20=Hartigan|first20=J.|last21=Wu|first21=L.|last22=Liu|first22=C.|last23=Parmigiani|first23=G.|last24=Park|first24=B. H.|last25=Bachman|first25=K. E.|last26=Papadopoulos|first26=N.|last27=Vogelstein|first27=B.|last28=Kinzler|first28=K. W.|last29=Velculescu|first29=V. E.|title=The Consensus Coding Sequences of Human Breast and Colorectal Cancers|journal=Science|volume=314|issue=5797|year=2006|pages=268–274|issn=0036-8075|doi=10.1126/science.1133427|pmid=16959974|bibcode=2006Sci...314..268S|s2cid=10805017}} A subsequent follow up was published in 2007 where the same group added just over 5,000 more genes and almost 8,000 transcript species to complete the exomes of 11 breast and colorectal tumors.{{cite journal|last1=Wood|first1=L. D.|last2=Parsons|first2=D. W.|last3=Jones|first3=S.|last4=Lin|first4=J.|last5=Sjoblom|first5=T.|last6=Leary|first6=R. J.|last7=Shen|first7=D.|last8=Boca|first8=S. M.|last9=Barber|first9=T.|last10=Ptak|first10=J.|last11=Silliman|first11=N.|last12=Szabo|first12=S.|last13=Dezso|first13=Z.|last14=Ustyanksky|first14=V.|last15=Nikolskaya|first15=T.|last16=Nikolsky|first16=Y.|last17=Karchin|first17=R.|last18=Wilson|first18=P. A.|last19=Kaminker|first19=J. S.|last20=Zhang|first20=Z.|last21=Croshaw|first21=R.|last22=Willis|first22=J.|last23=Dawson|first23=D.|last24=Shipitsin|first24=M.|last25=Willson|first25=J. K. 
V.|last26=Sukumar|first26=S.|last27=Polyak|first27=K.|last28=Park|first28=B. H.|last29=Pethiyagoda|first29=C. L.|last30=Pant|first30=P. V. K.|last31=Ballinger|first31=D. G.|last32=Sparks|first32=A. B.|last33=Hartigan|first33=J.|last34=Smith|first34=D. R.|last35=Suh|first35=E.|last36=Papadopoulos|first36=N.|last37=Buckhaults|first37=P.|last38=Markowitz|first38=S. D.|last39=Parmigiani|first39=G.|last40=Kinzler|first40=K. W.|last41=Velculescu|first41=V. E.|last42=Vogelstein|first42=B.|title=The Genomic Landscapes of Human Breast and Colorectal Cancers|journal=Science|volume=318|issue=5853|year=2007|pages=1108–1113|issn=0036-8075|doi=10.1126/science.1145720|pmid=17932254|bibcode=2007Sci...318.1108W|citeseerx=10.1.1.218.5477|s2cid=7586573}} The first whole cancer genome to be sequenced was from cytogenetically normal acute myeloid leukaemia by Ley et al. in November 2008.{{cite journal |author=Timothy Ley |title=DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome |journal=Nature |volume=456 |issue=7218 |date=November 2008 |pmid= 18987736

| doi=10.1038/nature07485 |pmc=2603574|display-authors=etal |pages=66–72|bibcode=2008Natur.456...66L }} The first breast cancer tumor was sequenced by Shah et al. in October 2009, the first lung and skin tumors by Pleasance et al. in January 2010, and the first prostate tumors by Berger et al. in February 2011.

History

Historically, cancer genome sequencing efforts have been divided between transcriptome-based sequencing projects and DNA-centered efforts.

The Cancer Genome Anatomy Project (CGAP) was first funded in 1997 with the goal of documenting the sequences of RNA transcripts in tumor cells. As technology improved, the CGAP expanded its goals to include the determination of gene expression profiles of cancerous, precancerous and normal tissues.

The CGAP published the largest publicly available collection of cancer expressed sequence tags in 2003.

The Sanger Institute's Cancer Genome Project, first funded in 2005, focuses on DNA sequencing. It has published a census of genes causally implicated in cancer, and a number of whole-genome resequencing screens for genes implicated in cancer.

The International Cancer Genome Consortium (ICGC) was founded in 2007 with the goal of integrating available genomic, transcriptomic and epigenetic data from many different research groups. As of December 2011, the ICGC included 45 committed projects and had data from 2,961 cancer genomes available.

Societal Impact

= The Complexity and Biology of Cancer =

The process of tumorigenesis that transforms a normal cell into a cancerous cell involves a series of complex genetic and epigenetic changes. Identification and characterization of all these changes can be accomplished through various cancer genome sequencing strategies.

The power of cancer genome sequencing lies in the heterogeneity of cancers and patients. Most cancers have a variety of subtypes, and in addition to these ‘cancer variants’ the same subtype can differ between one individual and another. Cancer genome sequencing allows clinicians and oncologists to identify the specific and unique changes that led to a patient's cancer. Based on these changes, a personalized therapeutic strategy can be undertaken.

= Clinical Relevance =

A major contributor to cancer death and failed cancer treatment is clonal evolution at the cytogenetic level, as seen for example in acute myeloid leukaemia (AML). In a Nature study published in 2012, Ding et al. identified cellular fractions characterized by common mutational changes, illustrating the heterogeneity of a particular tumor before and after treatment compared with normal blood in one individual.

These cellular fractions could only have been identified through cancer genome sequencing, which shows both the information that sequencing can yield and the complexity and heterogeneity of a tumor within one individual.

Comprehensive Cancer Genomic Projects

The two main projects focused on complete cancer characterization in individuals, both relying heavily on sequencing, are the Cancer Genome Project, based at the Wellcome Trust Sanger Institute, and The Cancer Genome Atlas, funded by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). Alongside these efforts, the International Cancer Genome Consortium, a larger voluntary scientific organization, provides a forum for collaboration among the world's leading cancer and genomic researchers.

= Cancer Genome Project (CGP) =

The Cancer Genome Project's goal is to identify sequence variants and mutations critical in the development of human cancers. The project involves the systematic screening of the coding regions and flanking splice junctions of all genes in the human genome for acquired mutations in human cancers. To investigate these events, the discovery sample set includes DNA from primary tumors, normal tissue (from the same individuals) and cancer cell lines. All results from this project are amalgamated and stored within the COSMIC cancer database. COSMIC also includes mutational data published in the scientific literature.

= The Cancer Genome Atlas (TCGA) =

The TCGA is a multi-institutional effort to understand the molecular basis of cancer through genome analysis technologies, including large-scale genome sequencing techniques. Hundreds of samples are being collected, sequenced and analyzed. The cancer tissues currently being collected include central nervous system, breast, gastrointestinal, gynecologic, head and neck, hematologic, thoracic, and urologic.

The components of the TCGA research network include Biospecimen Core Resources, Genome Characterization Centers, Genome Sequencing Centers, Proteome Characterization Centers, a Data Coordinating Center, and Genome Data Analysis Centers. Each cancer type will undergo comprehensive genomic characterization and analysis. The data and information generated are freely available through the project's TCGA data portal.

= International Cancer Genome Consortium (ICGC) =

The ICGC’s goal is “To obtain a comprehensive description of genomic, transcriptomic and epigenomic changes in 50 different tumor types and/or subtypes which are of clinical and societal importance across the globe”.

Technologies and platforms

File:2nd Generation Sequencing.png

File:3rd Generation Sequencing.png

Cancer genome sequencing utilizes the same technology involved in whole genome sequencing. Sequencing originated in 1977 with two independent methods: Frederick Sanger’s enzymatic dideoxy DNA sequencing technique and the Allan Maxam and Walter Gilbert chemical degradation technique. More than 20 years after these landmark papers, ‘Second Generation’ high-throughput next generation sequencing (HT-NGS) was born, followed by ‘Third Generation HT-NGS technology’ in 2010. The figures to the right illustrate the general biological pipeline and companies involved in second and third generation HT-NGS.

Three major second generation platforms are Roche/454 pyrosequencing, ABI/SOLiD sequencing by ligation, and Illumina’s bridge amplification sequencing technology. Three major third generation platforms are Pacific Biosciences' Single Molecule Real Time (SMRT) sequencing, Oxford Nanopore sequencing, and Ion semiconductor sequencing.

Data Analysis

File:Cancer genome sequencing workflow.png

As with any genome sequencing project, the reads must be assembled to form a representation of the chromosomes being sequenced. With cancer genomes, this is usually done by aligning the reads to the human reference genome.

Since even non-cancerous cells accumulate somatic mutations, it is necessary to compare sequence of the tumor to a matched normal tissue in order to discover which mutations are unique to the cancer. In some cancers, such as leukemia, it is not practical to match the cancer sample to a normal tissue, so a different non-cancerous tissue must be used.
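The tumor versus matched-normal comparison described above can be sketched as a set difference over variant calls. This is only an illustration: the `Variant` tuple, the function name and the toy call sets are hypothetical, and real somatic variant callers weigh read-level evidence statistically rather than subtracting call sets.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """A variant call relative to the reference genome (hypothetical layout)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def candidate_somatic(tumor: Set[Variant], normal: Set[Variant]) -> Set[Variant]:
    """Return variants called in the tumor but absent from the matched normal."""
    return tumor - normal


# Toy call sets, invented for illustration only.
tumor_calls = {
    Variant("chr1", 1000, "A", "G"),   # also present in the normal sample
    Variant("chr2", 2500, "C", "T"),
    Variant("chr7", 4200, "G", "A"),
}
normal_calls = {
    Variant("chr1", 1000, "A", "G"),
}

somatic = candidate_somatic(tumor_calls, normal_calls)
for v in sorted(somatic):
    print(v.chrom, v.pos, v.ref, v.alt)
```

Variants shared by both samples are dropped, leaving only the calls that are specific to the tumor.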

It has been estimated that discovery of all somatic mutations in a tumor would require 30-fold sequencing coverage of the tumor genome and a matched normal tissue. By comparison, the original draft of the human genome had approximately 65-fold coverage. To facilitate further improvement in somatic mutation detection in cancer, the Sequencing Quality Control Phase 2 Consortium has established a pair of tumor-normal cell lines as community reference samples and data sets for the benchmarking of cancer mutation detections.{{cite journal |author=Fang, L.T. |title=Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing |journal=Nature Biotechnology |date=September 2021 |volume=39 |issue=9 |pages=1151–1160 |pmid=34504347 |doi=10.1038/s41587-021-00993-6|pmc=8532138 |display-authors=etal }}
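To make the notion of fold coverage concrete, mean coverage can be approximated as the total number of sequenced bases divided by the genome size. The sketch below assumes a genome size of about 3.1 billion bases and a read length of 150 bases; both values are illustrative assumptions, not figures from the studies cited here.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Approximate mean fold coverage: total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Number of reads required to reach a target mean fold coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


GENOME_SIZE = 3_100_000_000  # assumed approximate human genome size in bases
READ_LENGTH = 150            # assumed read length in bases

n = reads_needed(30, READ_LENGTH, GENOME_SIZE)
print("reads for 30-fold coverage:", n)
print("check:", mean_coverage(n, READ_LENGTH, GENOME_SIZE))
```

Because both the tumor and the matched normal tissue must be sequenced, the total sequencing effort under these assumptions is twice the figure computed for one sample.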

A major goal of cancer genome sequencing is to identify driver mutations: genetic changes which increase the mutation rate in the cell, leading to more rapid tumor evolution and metastasis. It is difficult to determine driver mutations from DNA sequence alone, but drivers tend to be the most commonly shared mutations amongst tumors, to cluster around known oncogenes, and to be non-silent. Passenger mutations, which are not important in the progression of the disease, are randomly distributed throughout the genome. It has been estimated that the average tumor carries approximately 80 somatic mutations, fewer than 15 of which are expected to be drivers.
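Two of the heuristics mentioned above, recurrence across tumors and non-silent effect, can be combined in a simple ranking. The following is a minimal sketch with hypothetical gene and tumor identifiers; it is not a published driver-detection method, and real analyses also model background mutation rates and gene length.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Each record is (tumor_id, gene, is_silent); the layout is hypothetical.
Mutation = Tuple[str, str, bool]


def rank_by_recurrence(mutations: Iterable[Mutation]) -> List[Tuple[str, int]]:
    """Rank genes by the number of distinct tumors carrying a non-silent mutation."""
    tumors_per_gene: Dict[str, Set[str]] = defaultdict(set)
    for tumor_id, gene, is_silent in mutations:
        if not is_silent:  # silent mutations are ignored by this heuristic
            tumors_per_gene[gene].add(tumor_id)
    # Sort by descending tumor count, then by gene name for a stable order.
    return sorted(
        ((gene, len(tumors)) for gene, tumors in tumors_per_gene.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Toy data, invented for illustration only.
observed = [
    ("T1", "GENE_A", False),
    ("T2", "GENE_A", False),
    ("T3", "GENE_A", False),
    ("T1", "GENE_B", True),
    ("T2", "GENE_B", False),
    ("T3", "GENE_C", True),
]

ranking = rank_by_recurrence(observed)
for gene, count in ranking:
    print(gene, count)
```

Genes at the top of such a list are candidates for follow-up only; a gene mutated in many tumors can still be a passenger.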

A personal-genomics analysis requires further functional characterization of the detected mutant genes, and the development of a basic model of the origin and progression of the tumor. This analysis can be used to make pharmacological treatment recommendations. As of February 2012, this has only been done for patients in clinical trials designed to assess the personal genomics approach to cancer treatment.

Limitations

A large-scale screen for somatic mutations in breast and colorectal tumors showed that many low-frequency mutations each make a small contribution to cell survival. If cell survival is determined by many mutations of small effect, it is unlikely that genome sequencing will uncover a single "Achilles heel" target for anti-cancer drugs. However, somatic mutations tend to cluster in a limited number of signalling pathways, which are potential treatment targets.

Cancers are heterogeneous populations of cells. When sequence data is derived from a whole tumor, information about the differences in sequence and expression pattern between cells is lost. This difficulty can be ameliorated by single-cell analysis.
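The loss of cell-level information in whole-tumor data can be illustrated with expected variant allele fractions (VAFs). The sketch below assumes every mutation is heterozygous in a diploid region with no copy-number change, so a mutation present in a fraction f of the sampled cells is expected at a VAF of f/2. The clone names, fractions and mutation labels are hypothetical.

```python
from typing import Dict, Set


def expected_bulk_vaf(
    clone_fractions: Dict[str, float],
    clone_mutations: Dict[str, Set[str]],
) -> Dict[str, float]:
    """Expected VAF of each mutation in a bulk sample.

    Assumes heterozygous mutations in diploid regions, so each clone
    contributes half of its cellular fraction to the mutation's VAF.
    """
    vaf: Dict[str, float] = {}
    for clone, fraction in clone_fractions.items():
        for mutation in clone_mutations[clone]:
            vaf[mutation] = vaf.get(mutation, 0.0) + fraction / 2
    return vaf


# Toy sample: two tumor subclones plus contaminating normal cells.
clone_fractions = {"clone_1": 0.5, "clone_2": 0.25, "normal": 0.25}
clone_mutations = {
    "clone_1": {"mut_founder", "mut_x"},
    "clone_2": {"mut_founder", "mut_y"},
    "normal": set(),
}

bulk = expected_bulk_vaf(clone_fractions, clone_mutations)
for mutation in sorted(bulk):
    print(mutation, bulk[mutation])
```

The bulk data report only one frequency per mutation. Which cells carry which combination of mutations has to be inferred from those frequencies, whereas single-cell analysis observes it directly.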

Clinically significant properties of tumors, including drug resistance, are sometimes caused by large-scale rearrangements of the genome, rather than single mutations. In this case, information about single nucleotide variants will be of limited utility.

Cancer genome sequencing can be used to provide clinically relevant information in patients with rare or novel tumor types. Translating sequence information into a clinical treatment plan is highly complicated, requires experts in many different fields, and is not guaranteed to lead to an effective treatment plan.

Incidentalome

The incidentalome is the set of detected genomic variants not related to the cancer under study.{{Cite journal | last1 = Kohane | first1 = I. S. | last2 = Masys | first2 = D. R. | last3 = Altman | first3 = R. B. | doi = 10.1001/jama.296.2.212 | title = The Incidentalome: A Threat to Genomic Medicine | journal = JAMA | volume = 296 | issue = 2 | pages = 212–215 | year = 2006 | pmid = 16835427 }} (The term is a play on the name incidentaloma, which designates tumors and growths detected on whole-body imaging by coincidence.) [http://www.medscape.com/viewarticle/810549?nlid=33233_1561&src=wnl_edit_medn_obgy&uac=149266AJ&spon=16 Cancer Gene Sequencing Raises New Medical Ethics Issues] by Janis C. Kelly, Sep 06, 2013. The detection of such variants may result in additional measures such as further testing or lifestyle management.

References

{{Reflist|refs=

{{cite journal |author=Samuel Levy |title=The Diploid Genome Sequence of an Individual Human |journal=PLOS Biology |volume=5 |issue=10 |date=October 2007 |pmid=17803354 |pmc=1964779 |doi=10.1371/journal.pbio.0050254 |display-authors=etal |pages=e254 |doi-access=free }}

{{cite journal |author=David A. Wheeler |title=The complete genome of an individual by massively parallel DNA sequencing |journal=Nature |volume=452 |issue=7189 |date=April 2008 |pmid=18421352 |doi=10.1038/nature06884 |pages=872–6|bibcode=2008Natur.452..872W |display-authors=etal|doi-access=free }}

{{cite journal |author=Sohrab P. Shah |title=Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution |journal=Nature |volume=461 |issue=7265 |pages=809–13 |date=October 2009 |pmid=19812674 |doi=10.1038/nature08489 |bibcode=2009Natur.461..809S |display-authors=etal|doi-access=free }}

{{cite journal |author=Erin D. Pleasance |title=A small-cell lung cancer genome with complex signatures of tobacco exposure |journal=Nature |volume=463 |issue=7278 |pages=184–90 |date=December 2009 |pmid=20016488 |pmc=2880489 |doi=10.1038/nature08629 |display-authors=etal}}

{{cite journal |author=Erin D. Pleasance |title=A comprehensive catalogue of somatic mutations from a human cancer genome |journal=Nature |volume=463 |issue=7278 |pages=191–6 |date=December 2009 |pmid=20016485 |pmc=3145108 |doi=10.1038/nature08658 |display-authors=etal}}

{{cite journal |author=Michael F. Berger |title=The genomic complexity of primary human prostate cancer |journal=Nature |volume=470 |issue=7333 |pages=214–20 |date=February 2011 |pmid=21307934 |pmc=3075885 |doi=10.1038/nature09744 |bibcode=2011Natur.470..214B |display-authors=etal}}

{{cite journal |author=Kenneth W Kinzler |title=Lessons from Hereditary Colorectal Cancer |journal=Cell |volume=87 |issue=2 |pages=159–70 |date=October 1996 |pmid=8861899 |doi=10.1016/S0092-8674(00)81333-1 |display-authors=etal|doi-access=free }}

{{cite journal |author=Peter A. Jones |title=The Epigenomics of Cancer |journal=Cell |volume=128 |issue=4 |pages=683–92 |date=February 2007 |pmid=17320506 |pmc=3894624 |doi=10.1016/j.cell.2007.01.029 |display-authors=etal}}

{{cite journal |author=Angela H. Ting |title=The cancer epigenome--components and functional correlates |journal=Genes & Development |volume=20 |issue=23 |pages=3215–31 |date=December 2006 |pmid=17158741 |doi=10.1101/gad.1464906 |display-authors=etal|doi-access=free }}

{{cite journal |author=Frederick Sanger |title=DNA sequencing with chain-terminating inhibitors |journal=PNAS |volume=74 |issue=12 |date=December 1977 |pmid=1422003 |pages=104–8|doi=10.1073/pnas.74.12.5463 |bibcode=1977PNAS...74.5463S |display-authors=etal|doi-access=free |pmc=431765 }}

{{cite journal |author1=Allan Maxam |author2=Walter Gilbert |title=A new method for sequencing DNA |journal=PNAS |volume=74 |issue=2 |date=February 1977 |pmid=265521 |pmc=392330 |pages=560–4 |doi=10.1073/pnas.74.2.560|bibcode=1977PNAS...74..560M |doi-access=free }}

{{cite journal |author=Chandra Shekhar Pareek |title=Sequencing technologies and genome sequencing |journal=Journal of Applied Genetics |volume=52 |issue=4 |pages=413–35 |date=November 2011 |pmid=21698376 |pmc=3189340 |doi=10.1007/s13353-011-0057-x |display-authors=etal}}

{{cite journal |author=Joseph R. Testa |title=Evolution of Karyotypes in Acute Nonlymphocytic Leukemia |journal=Cancer Research |volume=39 |issue=9 |date=September 1979 |pmid=476688 |pages=3619–27|display-authors=etal}}

{{cite journal |author=Garson OM |title=Cytogenetic studies of 103 patients with acute myelogenous leukemia in relapse |journal=Cancer Genetics and Cytogenetics |volume=40 |issue=2 |date=July 1989 |pmid=2766243 |pages=187–202|display-authors=etal|doi=10.1016/0165-4608(89)90024-1 }}

{{cite web|url=http://cgap.nci.nih.gov |title=Cancer Genome Anatomy Project (CGAP) | Cancer Genome Characterization Initiative (CGCI) |publisher=Cgap.nci.nih.gov |date= |accessdate=2013-09-14}}

{{cite web|author=www-core (Web team) |url=http://sanger.ac.uk/genetics/CGP/ |title=Cancer genome project (CGP) - Wellcome Trust Sanger Institute |publisher=Sanger.ac.uk |date=2013-01-30 |accessdate=2013-09-14 |url-status=dead |archiveurl=https://web.archive.org/web/20130702205644/http://www.sanger.ac.uk/genetics/CGP/ |archivedate=July 2, 2013 }}

{{cite web|url=http://www.icgc.org |title=International Cancer Genome Consortium |publisher=Icgc.org |date= |accessdate=2013-09-14}}

{{cite journal |author=E Pennisi |title= A catalog of cancer genes at the click of a mouse |journal=Science |volume=276 |issue=5315 |date=May 1997|pmid=9173535 |pages=1023–4 |doi=10.1126/science.276.5315.1023|s2cid= 5832728 }}

[http://licr.org/D_programs/d1c2_HCGP.php] {{webarchive |url=https://web.archive.org/web/20110503232804/http://licr.org/D_programs/d1c2_HCGP.php |date=May 3, 2011 }}

{{cite web|url=http://sanger.ac.uk/genetics/CGP/Census/ |title=COSMIC: Cancer Gene census |publisher=Sanger.ac.uk |accessdate=2013-09-14 |url-status=dead |archiveurl=https://web.archive.org/web/20130702200647/http://www.sanger.ac.uk/genetics/CGP/Census/ |archivedate=July 2, 2013 }}

{{cite journal |author=B Kuska |title=Cancer genome anatomy project set for takeoff |date=December 1996 |journal=Journal of the National Cancer Institute |volume=88 |pmid=8961968|issue=24 |pages=1801–3|doi=10.1093/jnci/88.24.1801 |doi-access=free }}

{{cite journal |author= International Cancer Genome Consortium |title=International network of cancer genome projects |journal=Nature |date=April 2010 |volume=464 |issue=7291 |pmid=20393554 |doi=10.1038/nature08987 |pmc=2902243 |pages=993–8|bibcode=2010Natur.464..993T }}

{{cite journal |author= Wood, L.D. |title= The genomic landscapes of human breast and colorectal cancers |journal=Science |date=November 2007 |volume=318 |issue=5853 |pmid= 17932254 |pages=1108–1113 |doi=10.1126/science.1145720|bibcode= 2007Sci...318.1108W |display-authors=etal|citeseerx= 10.1.1.218.5477 |s2cid= 7586573 }}

{{cite journal |author1=Stratton, M. R. |author2=Campbell, P. J. |author3= Futreal, P.A. |title= The cancer genome |date=April 2009 |journal=Nature |volume=458 |doi=10.1038/nature07943 |pmid=19360079 |issue=7239|pages=719–724 |pmc=2821689 |bibcode=2009Natur.458..719S }}

{{cite journal |author= Jones, S. |title= Core signaling pathways in human pancreatic cancers revealed by global genomic analyses |journal=Science |date=September 2008 |volume=321 |issue=5897 |pages= 1801–6 |pmid=18772397 |doi=10.1126/science.1164368 |pmc=2848990|bibcode= 2008Sci...321.1801J |display-authors=etal}}

{{cite journal |author= Miklos, G. L. |title= The human cancer genome project: one more misstep in the war on cancer |journal=Nature Biotechnology |date=May 2005 |volume=23 |issue=5 |pmid=15877064 |doi=10.1038/nbt0505-535 |pages=535–7|s2cid= 39302093 }}

{{cite journal |author1=Duesberg, P. |author2=Rasnick, D. |title= Aneuploidy approaching a perfect score in predicting and preventing cancer: highlights from a conference held in Oakland, CA in January, 2004. |journal=Cell Cycle |year=2004 |volume=3 |issue=6 |pmid=15197343 |pages=823–8|doi=10.4161/cc.3.6.938 |doi-access=free }}

{{cite journal |author=Jones, S.J. |title=Evolution of an adenocarcinoma in response to selection by targeted kinase inhibitors. |journal=Genome Biology |year=2010 |volume=11 |issue=8 |pmid=20696054 |doi=10.1186/gb-2010-11-8-r82 |pmc=2945784 |pages=R82|display-authors=etal |doi-access=free }}

{{cite journal |author=Roychowdhury, S. |title=Personalized oncology through integrative high-throughput sequencing: a pilot study |journal=Science Translational Medicine |date=November 2011 |volume=3 |issue=111 |pages=111ra121 |pmid=22133722 |pmc=3476478 |doi=10.1126/scitranslmed.3003161|display-authors=etal}}

{{cite journal |author=Lander, E.S. |title=Initial sequencing and analysis of the human genome |journal=Nature |date=February 2001 |volume=409 |issue=6822 |pages=860–921 |pmid=11237011 |doi=10.1038/35057062|display-authors=etal|url=https://deepblue.lib.umich.edu/bitstream/2027.42/62798/1/409860a0.pdf |doi-access=free }}

{{cite journal |author1=Wong, K. M. |author2=Hudson, T. J. |author3= McPherson, J. D. |title=Unraveling the genetics of cancer: genome sequencing and beyond |journal=Annual Review of Genomics and Human Genetics |date=September 2011 |volume=12 |pmid=21639794 |doi=10.1146/annurev-genom-082509-141532 |pages=407–30|doi-access=free }}

{{cite journal |author = Ding, L. |title=Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing |journal=Nature |volume=481 |date=January 2012 |doi= 10.1038/nature10738 |pmid= 22237025 |pmc=3267864 |issue=7382|pages=506–10 |bibcode=2012Natur.481..506D |display-authors=etal}}

}}