Cambridge Structural Database

{{infobox biodatabase

|title = Cambridge Structural Database

|logo = File:Database.png

|description = {{unbulleted list | Molecular structure elucidation | X-ray crystallography | Cheminformatics | Organic compound |Metalorganic | Organometallic | Chemical structure | Drug discovery | Materials science}}

|scope =

|organism =

|center = Cambridge Crystallographic Data Centre

|laboratory =

|author =

|pmid =

|released =

|standard =

|format = .cif

|url = {{Plainlist|

* {{URL|http://www.ccdc.cam.ac.uk/}}

}}

|download =

|webservice = {{URL|http://www.ccdc.cam.ac.uk/structures}}

|sql =

|sparql =

|webapp = WebCSD

|standalone = {{unbulleted list | CSD System | CSD (the database) | ConQuest | Mercury | IsoStar | Mogul | GOLD | CSD-CrossMiner}}

|license =

|versioning =

|frequency =

|curation =

|bookmark =

|version =

}}

The Cambridge Structural Database (CSD) is both a repository and a validated and curated resource for the three-dimensional structural data of molecules that generally contain at least carbon and hydrogen, comprising a wide range of organic, metal-organic and organometallic molecules. Its entries complement those of other crystallographic databases such as the Protein Data Bank (PDB), the Inorganic Crystal Structure Database and the International Centre for Diffraction Data. The data are typically obtained by X-ray crystallography and less frequently by electron or neutron diffraction, and are submitted by crystallographers and chemists from around the world. As deposited by the authors, the data are freely accessible on the Internet through the repository on the website of the CSD's parent organization.{{cite web |publisher=Cambridge Crystallographic Data Centre |url=http://www.ccdc.cam.ac.uk/Community/Requestastructure/Pages/DataRequest.aspx |title=CCDC CIF Depository Request Form |access-date=2014-09-16}} The CSD is overseen by the Cambridge Crystallographic Data Centre (CCDC), a not-for-profit incorporated company.

File:The inside of the CCDC headquarters Cambridge, UK.jpg

The CSD is a widely used repository of small-molecule organic and metal-organic crystal structures. Structures deposited with the Cambridge Crystallographic Data Centre (CCDC) are publicly available for download at the point of publication or with the consent of the depositor. They are also scientifically enriched and included in the database used by the software the centre offers. Targeted subsets of the CSD are freely available to support teaching and other activities.{{cite web |publisher=Cambridge Crystallographic Data Centre |url=http://www.ccdc.cam.ac.uk/ |title=CCDC Homepage |access-date=2014-09-16}}

History

The CCDC grew out of the activities of the crystallography group led by Olga Kennard OBE FRS in the Department of Organic, Inorganic and Theoretical Chemistry of the University of Cambridge. From 1965, the group began to collect published bibliographic, chemical and crystal structure data for all small molecules studied by X-ray or neutron diffraction. With the rapid developments in computing taking place at this time, this collection was encoded in electronic form and became known as the Cambridge Structural Database (CSD).

The CSD was one of the first numerical scientific databases to begin operations anywhere in the world, and received academic grants from the UK Office for Scientific and Technical Information and then from the UK Science and Engineering Research Council. These funds, together with subventions from National Affiliated Centres, enabled the development of the CSD and its associated software during the 1970s and 1980s. The first releases of the CSD System to the United States, Italy and Japan occurred in the early 1970s. By the early 1980s the CSD System was being distributed in more than 30 countries. As of 2014, the CSD System was distributed to academics in 70 countries.

During the 1980s, interest in the CSD System from pharmaceutical and agrochemicals companies increased significantly. This led to the establishment of the Cambridge Crystallographic Data Centre (CCDC) as an independent company in 1987, with the legal status of a non-profit charitable institution, and with its operations overseen by an international board of governors. The CCDC moved into purpose-built premises on the site of the University Department of Chemistry in 1992.

Kennard retired as Director in 1997 and was succeeded by David Hartley (1997–2002) and Frank Allen (2002–2008). Colin Groom served as executive director from 1 October 2008{{cite journal | vauthors = Groom C, Allen F | title = CCDC well groomed: an interview with Colin Groom, Executive Director, Cambridge Crystallographic Data Centre, and Frank Allen, Emeritus Fellow | journal = Journal of Computer-Aided Molecular Design | volume = 23 | issue = 7 | pages = 391–4 | date = July 2009 | pmid = 19421719 | doi = 10.1007/s10822-009-9272-5 | bibcode = 2009JCAMD..23..391W }} to September 2017.{{Cite web|url=https://www.ccdc.cam.ac.uk/News/List/2017-09-11-ccdc-announcement/|title=Announcement from the Chair, on behalf of Trustees | date=September 11, 2017|website=The Cambridge Crystallographic Data Centre|language=en|access-date=2019-05-15}} Jürgen Harter was appointed CEO in June 2018.{{Cite web|url=https://www.ccdc.cam.ac.uk/News/List/2018-06-11-new-appointment/|title=The CCDC welcomes Jürgen Harter as CEO | date=June 11, 2018|website=The Cambridge Crystallographic Data Centre (CCDC)|language=en|access-date=2019-05-15}}

CCDC software products have diversified to apply crystallographic data in the life sciences and in crystallography. Much of this software development and marketing is carried out by CCDC Software Limited (founded in 1998), a wholly owned subsidiary that covenants all of its profits back to the CCDC.

Although the CCDC is a self-administering organization, it retains close links with the University of Cambridge, and is a University Partner Institution that is qualified to train postgraduate students for higher degrees (PhD, MPhil).

The CCDC established applications and support operations in the United States in October 2013,{{Cite web|url=https://www.ccdc.cam.ac.uk/News/List/post-25/|title=CCDC opens US operations | date=October 30, 2013|website=The Cambridge Crystallographic Data Centre (CCDC)|language=en|access-date=2019-05-15}}{{Cite web|url=https://ored.rutgers.edu/content/cambridge-crystallographic-data-centre-establishes-us-operations-new-partnership-rutgers|title=The Cambridge Crystallographic Data Centre Establishes U.S. Operations in New Partnership with Rutgers' Center for Integrative Proteomics Research |website=Rutgers Office of Research and Economic Development|access-date=May 15, 2019}} initially at Rutgers, the State University of New Jersey, where it is co-located with the RCSB Protein Data Bank.

Contents

File:XOPCAJ.jpg

The CSD is updated with about 50,000 new structures each year,{{cite journal | vauthors = Bruno IJ, Groom CR | title = A crystallographic perspective on sharing data and knowledge | journal = Journal of Computer-Aided Molecular Design | volume = 28 | issue = 10 | pages = 1015–22 | date = October 2014 | pmid = 25091065 | pmc = 4196029 | doi = 10.1007/s10822-014-9780-9 | bibcode = 2014JCAMD..28.1015B }} and existing entries are also improved. Entries in the repository are released for public access as soon as the corresponding article has appeared in the peer-reviewed scientific literature. Data can also be deposited and published directly through the CSD, without an accompanying scientific article, as a [https://www.ccdc.cam.ac.uk/Community/csd-communications/ CSD Communication].

Periodically, general statistics about the breadth of CSD holdings are reported, for example the January 2014 report.{{cite web |publisher=Cambridge Crystallographic Data Centre |url=http://www.ccdc.cam.ac.uk/Lists/ResourceFileList/2014_stats_entries.pdf |title=CSD Entries: Summary Statistics |access-date=2014-09-16 |archive-url=https://web.archive.org/web/20140611233139/http://www.ccdc.cam.ac.uk/Lists/ResourceFileList/2014_stats_entries.pdf# |archive-date=2014-06-11 |url-status=dead }} {{As of|2019|01||df=}}, the summary statistics are as follows:{{Cite web|url=https://www.ccdc.cam.ac.uk/support-and-resources/ccdcresources/bda77626c9db4747bc970a250820c3d2.pdf|title=CSD Entries: Summary Statistics | date=January 1, 2019|website=Cambridge Structural Database|access-date=May 15, 2019}}

{| border="0" style="border: 1px solid #999; background-color:#FFFFFF"
|- align="center" bgcolor="#CCCCCC"
!Query
!Count
!% of CSD
|- align="right"
|align="left"|Total no. of structures
|{{Nts|995907}}||100.0
|-bgcolor="#EFEFEF" align="right"
|align="left"|No. of different compounds
|{{Nts|900984}}||–
|- align="right"
|align="left"|No. of literature sources
|{{Nts|2004}}||–
|-bgcolor="#EFEFEF" align="right"
|align="left"|Organic structures
|{{Nts|431037}}||43.5
|- align="right"
|align="left"|Transition metal present
|{{Nts|478138}}||48.2
|-bgcolor="#EFEFEF" align="right"
|align="left"|Alkali or alkaline earth metal present
|{{Nts|48056}}||4.8
|- align="right"
|align="left"|Main group metal present
|{{Nts|101948}}||10.3
|-bgcolor="#EFEFEF" align="right"
|align="left"|3D coordinates present
|{{Nts|937809}}||94.6
|- align="right"
|align="left"|Error-free coordinates
|{{Nts|926422}}||98.81
|- align="right"
|align="left"|Neutron studies
|{{Nts|2142}}||0.2
|- align="right" bgcolor="#EFEFEF"
|align="left"|Powder diffraction studies
|{{Nts|4761}}||0.5
|- align="right"
|align="left"|Low/high temp. studies
|{{Nts|503368}}||50.8
|- align="right" bgcolor="#EFEFEF"
|align="left"|Absolute configuration determined
|{{Nts|28834}}||2.9
|- align="right"
|align="left"|Disorder present in structure
|{{Nts|256019}}||25.8
|- align="right" bgcolor="#EFEFEF"
|align="left"|Polymorphic structures
|{{Nts|29817}}||3.0
|- align="right"
|align="left"|R-factor < 0.100
|{{Nts|935419}}||94.4
|- align="right" bgcolor="#EFEFEF"
|align="left"|R-factor < 0.075
|{{Nts|845708}}||85.3
|- align="right"
|align="left"|R-factor < 0.050
|{{Nts|553042}}||55.8
|- align="right" bgcolor="#EFEFEF"
|align="left"|R-factor < 0.030
|{{Nts|121806}}||12.3
|- align="right"
|align="left"|No. of atoms with 3D coordinates
|{{Nts|85791623}}||–
|}

As of January 2019, the 25 leading sources of structures in the CSD were:{{Cite web|url=https://www.ccdc.cam.ac.uk/support-and-resources/ccdcresources/d22b66666a76428dad703137e9ad8e43.pdf|title=CSD Journal Statistics | date=January 1, 2019|website=Cambridge Structural Database|access-date=May 16, 2019}}

::1. {{Nts|73070}} structures were reported in Inorg. Chem.

::2. {{Nts|62072}} structures were reported in Dalton & J. Chem. Soc., Dalton Trans.

::3. {{Nts|54160}} structures were reported in Organometallics

::4. {{Nts|48967}} structures were reported in J. Am. Chem. Soc.

::5. {{Nts|42422}} structures were reported in Acta Crystallogr. Sect. E

::6. {{Nts|32610}} structures were reported in Chem. Eur. J.

::7. {{Nts|29790}} structures were reported in J. Organomet. Chem.

::8. {{Nts|29640}} structures were reported in Angew. Chem. Int. Ed.

::9. {{Nts|28682}} structures were reported in Inorg. Chim. Acta

::10. {{Nts|28351}} structures were reported in Chem. Commun. & J. Chem. Soc.

::11. {{Nts|27328}} structures were reported in [https://www.ccdc.cam.ac.uk/Community/csd-communications/ CSD Communications]

::12. {{Nts|26774}} structures were reported in Acta Crystallogr. Sect. C

::13. {{Nts|26734}} structures were reported in Polyhedron

::14. {{Nts|24045}} structures were reported in Eur. J. Inorg. Chem.

::15. {{Nts|23483}} structures were reported in J. Org. Chem.

::16. {{Nts|22286}} structures were reported in Cryst. Growth Des.

::17. {{Nts|22011}} structures were reported in CrystEngComm

::18. {{Nts|15985}} structures were reported in Organic Letters

::19. {{Nts|15424}} structures were reported in Z. Anorg. Allg. Chem.

::20. {{Nts|14864}} structures were reported in Acta Crystallogr. Sect. B

::21. {{Nts|13909}} structures were reported in Tetrahedron

::22. {{Nts|12734}} structures were reported in J. Mol. Struct.

::23. {{Nts|11234}} structures were reported in Tetrahedron Lett.

::24. {{Nts|9150}} structures were reported in Eur. J. Org. Chem.

::25. {{Nts|8789}} structures were reported in New Journal of Chemistry

Together, these 25 sources account for 704,541 (70.7%) of the 996,193 structures in the CSD. A further {{Nts|8597}} structures were reported as Private Communications to the CSD.
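As a quick arithmetic check, the 25 counts listed above can be summed in a few lines of Python. This is only a sketch using the figures exactly as printed; with them, the counts add up to 704,514 (70.7% of 996,193), which differs by 27 from the quoted total of 704,541, so either one printed count or the quoted total may contain a transcription error.

```python
# Structure counts for the 25 leading sources listed above (January 2019 figures).
top_sources = {
    "Inorg. Chem.": 73070,
    "Dalton & J. Chem. Soc., Dalton Trans.": 62072,
    "Organometallics": 54160,
    "J. Am. Chem. Soc.": 48967,
    "Acta Crystallogr. Sect. E": 42422,
    "Chem. Eur. J.": 32610,
    "J. Organomet. Chem.": 29790,
    "Angew. Chem. Int. Ed.": 29640,
    "Inorg. Chim. Acta": 28682,
    "Chem. Commun. & J. Chem. Soc.": 28351,
    "CSD Communications": 27328,
    "Acta Crystallogr. Sect. C": 26774,
    "Polyhedron": 26734,
    "Eur. J. Inorg. Chem.": 24045,
    "J. Org. Chem.": 23483,
    "Cryst. Growth Des.": 22286,
    "CrystEngComm": 22011,
    "Organic Letters": 15985,
    "Z. Anorg. Allg. Chem.": 15424,
    "Acta Crystallogr. Sect. B": 14864,
    "Tetrahedron": 13909,
    "J. Mol. Struct.": 12734,
    "Tetrahedron Lett.": 11234,
    "Eur. J. Org. Chem.": 9150,
    "New Journal of Chemistry": 8789,
}

CSD_TOTAL = 996_193  # total quoted alongside the journal statistics

listed_sum = sum(top_sources.values())
share_percent = 100 * listed_sum / CSD_TOTAL
print(f"{len(top_sources)} sources: {listed_sum:,} structures ({share_percent:.1f}% of the CSD)")
# prints "25 sources: 704,514 structures (70.7% of the CSD)"
```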

These data show that most structures are determined by X-ray diffraction; fewer than 1% are determined by neutron diffraction or powder diffraction. The percentage given for error-free coordinates is calculated relative to the number of structures that have 3D coordinates, not relative to the total number of structures.
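The "fewer than 1%" figure can be reproduced from the summary table. In this minimal sketch the two categories are simply added; if any study is counted in both (for example, a neutron powder study, which is an assumption about how the categories are defined), the sum double-counts it, so the result is best read as an upper bound on the combined share.

```python
# Counts from the January 2019 summary statistics table above.
total_structures = 995_907
neutron_studies = 2_142
powder_studies = 4_761

# Adding the categories gives an upper bound on their combined share,
# because any study counted in both would be counted twice here.
upper_bound_percent = 100 * (neutron_studies + powder_studies) / total_structures
print(f"neutron or powder: at most {upper_bound_percent:.2f}% of structures")
# prints "neutron or powder: at most 0.69% of structures"
```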

Structure factor files are significant because, for CSD structures determined by X-ray diffraction that have an associated structure factor file, a crystallographer can verify the interpretation of the observed measurements.

Growth trend

Historically, the number of structures in the CSD has grown at an approximately exponential rate, passing 25,000 structures in 1977, 50,000 in 1983, 125,000 in 1992, 250,000 in 2001, 500,000 in 2009{{cite journal | vauthors = Groom CR, Allen FH | title = The Cambridge Structural Database in retrospect and prospect | journal = Angewandte Chemie | volume = 53 | issue = 3 | pages = 662–71 | date = January 2014 | pmid = 24382699 | doi = 10.1002/anie.201306438 | doi-access = free }}{{cite web |url=http://www.ccdc.cam.ac.uk/Solutions/CSDSystem/Pages/CSD.aspx |publisher=CCDC |title=Growth of the Cambridge Structural Database (CSD) since 1970. |access-date=2014-09-16}}{{Cite web|url=https://www.ccdc.cam.ac.uk/CCDCStats/|title=CSD Statistics |website=The Cambridge Crystallographic Data Centre (CCDC)|language=en|access-date=2019-05-17}} and 1,000,000 on June 8, 2019.{{Cite web|url=https://www.chemistryworld.com/news/the-cambridge-structural-database-hits-one-million-structures/3010524.article|title=The Cambridge Structural Database hits one million structures|last1=Robinson|first1=Philip|last2=Withers|first2=Neil|website=Chemistry World|language=en|access-date=2019-06-07|last3=Pink|first3=Chris|last4=Valsler|first4=Ben}} The one millionth structure added to the CSD was the crystal structure of 1-(7,9-diacetyl-11-methyl-6H-azepino[1,2-a]indol-6-yl)propan-2-one.
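Because the milestones above are close to successive doublings, they can be turned into implied doubling times. The following is a rough sketch, assuming exponential growth between consecutive milestones and using only the calendar year of each milestone, so every interval is uncertain by about a year. With these inputs the doubling time lengthens from about 6 years (1977–1983) to about 10 years (2009–2019), which is why the growth is described as only approximately exponential.

```python
import math

# (year, structures) milestones quoted in the paragraph above; only the
# calendar year is used, so each interval is uncertain by about a year.
milestones = [
    (1977, 25_000),
    (1983, 50_000),
    (1992, 125_000),
    (2001, 250_000),
    (2009, 500_000),
    (2019, 1_000_000),
]

def doubling_time(y0: int, n0: int, y1: int, n1: int) -> float:
    """Doubling time in years, assuming exponential growth from n0 (year y0) to n1 (year y1)."""
    rate = math.log(n1 / n0) / (y1 - y0)  # continuous growth rate per year
    return math.log(2) / rate

intervals = [
    (y0, y1, doubling_time(y0, n0, y1, n1))
    for (y0, n0), (y1, n1) in zip(milestones, milestones[1:])
]

for y0, y1, t in intervals:
    print(f"{y0}-{y1}: doubling time about {t:.1f} years")
```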

File:Growth Trend of Structure in CSD.svg

{| class="wikitable"
|-
|colspan="3" align="center"|Number of published structures per year
|-
!Year
!Published that year
!Cumulative total
|- align="right"
| 2018 || 53429 || {{Nts|974653}}
|- align="right"
| 2017 || 55031 || {{Nts|921224}}
|- align="right"
| 2016 || 54975 || {{Nts|866193}}
|- align="right"
| 2015 || 53610 || {{Nts|811218}}
|- align="right"
| 2014 || 50759 || {{Nts|757608}}
|- align="right"
| 2013 || 48025 || {{Nts|706849}}
|- align="right"
| 2012 || 45199 || {{Nts|661121}}
|- align="right"
| 2011 || 43882 || {{Nts|615922}}
|- align="right"
| 2010 || 41240 || {{Nts|572040}}
|- align="right"
| 2009 || 40627 || {{Nts|530800}}
|- align="right"
| 2008 || 36802 || {{Nts|490173}}
|- align="right"
| 2007 || 36569 || {{Nts|453371}}
|- align="right"
| 2006 || 34713 || {{Nts|416802}}
|- align="right"
| 2005 || 31733 || {{Nts|382089}}
|- align="right"
| 2004 || 27988 || {{Nts|350356}}
|- align="right"
| 2003 || 26287 || {{Nts|322368}}
|- align="right"
| 2002 || 24306 || {{Nts|296081}}
|- align="right"
| 2001 || 21781 || {{Nts|271775}}
|- align="right"
| 2000 || 19998 || {{Nts|249994}}
|- align="right"
| 1999 || 18780 || {{Nts|229996}}
|- align="right"
| 1998 || 17289 || {{Nts|211216}}
|- align="right"
| 1997 || 15896 || {{Nts|193927}}
|- align="right"
| 1996 || 15487 || {{Nts|178031}}
|- align="right"
| 1995 || 13001 || {{Nts|162544}}
|- align="right"
| 1994 || 12290 || {{Nts|149543}}
|- align="right"
| 1993 || 12032 || {{Nts|137253}}
|- align="right"
| 1992 || 10691 || {{Nts|125221}}
|- align="right"
| 1991 || 9941 || {{Nts|114530}}
|- align="right"
| 1990 || 8935 || {{Nts|104589}}
|- align="right"
| 1989 || 7750 || {{Nts|95654}}
|- align="right"
| 1988 || 7644 || {{Nts|87904}}
|- align="right"
| 1987 || 7472 || {{Nts|80260}}
|- align="right"
| 1986 || 6873 || {{Nts|72788}}
|- align="right"
| 1985 || 6911 || {{Nts|65915}}
|- align="right"
| 1984 || 6511 || {{Nts|59004}}
|- align="right"
| 1983 || 5250 || {{Nts|52493}}
|- align="right"
| 1982 || 5233 || {{Nts|47243}}
|- align="right"
| 1981 || 4666 || {{Nts|42010}}
|- align="right"
| 1980 || 4252 || {{Nts|37344}}
|- align="right"
| 1979 || 3876 || {{Nts|33092}}
|- align="right"
| 1978 || 3415 || {{Nts|29216}}
|- align="right"
| 1977 || 3092 || {{Nts|25801}}
|- align="right"
| 1976 || 2735 || {{Nts|22709}}
|- align="right"
| 1975 || 2171 || {{Nts|19974}}
|- align="right"
| 1974 || 2142 || {{Nts|17803}}
|- align="right"
| 1973 || 1991 || {{Nts|15661}}
|- align="right"
| 1972 || 1969 || {{Nts|13670}}
|- align="right"
| 1971 || 1548 || {{Nts|11701}}
|- align="right"
| 1970 || 1261 || {{Nts|10153}}
|- align="right"
| 1969 || 1130 || {{Nts|8892}}
|- align="right"
| 1968 || 975 || {{Nts|7762}}
|- align="right"
| 1967 || 936 || {{Nts|6787}}
|- align="right"
| 1966 || 683 || {{Nts|5851}}
|- align="right"
| 1965 || 656 || {{Nts|5168}}
|- align="right"
| 1923-1964 || 4512 || {{Nts|4512}}
|}

Note: data for 1923-1964 are aggregated in the last row of the table.
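Each cumulative total in the table should equal the previous year's total plus that year's count. The sketch below checks this for the most recent rows only (values copied from the table; the first row is used as the starting point and is not itself checked). With the figures as printed it flags 2013: 706,849 minus 661,121 is 45,728 rather than the listed 48,025, while the 2014 total is consistent with 706,849.

```python
# (year, published that year, cumulative total): recent rows of the table above.
rows = [
    (2012, 45199, 661121),
    (2013, 48025, 706849),
    (2014, 50759, 757608),
    (2015, 53610, 811218),
    (2016, 54975, 866193),
    (2017, 55031, 921224),
    (2018, 53429, 974653),
]

def mismatched_years(rows):
    """Return (year, difference of totals, listed count) where total != previous total + count."""
    mismatches = []
    for (_, _, prev_total), (year, published, total) in zip(rows, rows[1:]):
        if prev_total + published != total:
            mismatches.append((year, total - prev_total, published))
    return mismatches

print(mismatched_years(rows))  # prints [(2013, 45728, 48025)]
```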

File format

File:BENZAC12.jpg. The top model shows a single molecule of benzoic acid. The bottom model shows a hydrogen-bonded dimer.

The primary file format for CSD structure deposition, adopted around 1991, is the Crystallographic Information File (CIF) format.{{cite journal |vauthors=Hall SR, Allen FH, Brown ID |title=The Crystallographic Information File (CIF): a new standard archive file for crystallography |journal=Acta Crystallographica |volume=A47 |pages=655–685 |year=1991 |doi=10.1107/S010876739101067X |issue=6 |doi-access=free }}
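CIF is a plain-text format in which data items are written as `_tag value` pairs inside `data_` blocks, with `loop_` constructs for tabular data and numbers often carrying a standard uncertainty in parentheses, as in `5.5000(3)`. The following is a minimal sketch, assuming only single-line data items; it ignores `loop_` tables and semicolon-delimited text fields, and the helper names (`parse_simple_items`, `cif_number`) and all values in the sample are hypothetical and illustrative, not taken from any CSD entry. Real work should use a dedicated CIF parser.

```python
import re

# Hypothetical CIF fragment: the tags are CIF core data names, but the
# values are illustrative only and do not come from any CSD entry.
SAMPLE_CIF = """\
data_example
_cell_length_a                   5.5000(3)
_cell_length_b                   5.1000(2)
_cell_length_c                   21.9000(9)
_cell_angle_beta                 97.10(1)
_symmetry_space_group_name_H-M   'P 21/c'
"""

def parse_simple_items(text: str) -> dict[str, str]:
    """Collect single-line '_tag value' items; loop_ data and ;-delimited fields are ignored."""
    items = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("_"):
            continue  # skips data_ headers, loop_ keywords, loop rows and blank lines
        parts = line.split(None, 1)
        if len(parts) == 2:  # loop_ header tags have no value on the same line
            items[parts[0]] = parts[1].strip().strip("'\"")
    return items

def cif_number(value: str) -> float:
    """Convert a CIF number such as '5.5000(3)' to float, dropping the standard uncertainty."""
    return float(re.sub(r"\(\d+\)$", "", value))

items = parse_simple_items(SAMPLE_CIF)
cell = {tag: cif_number(items[tag])
        for tag in ("_cell_length_a", "_cell_length_b", "_cell_length_c")}
print(items["_symmetry_space_group_name_H-M"], cell)
```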

The deposited CSD files can be downloaded in the CIF format. The validated and curated CSD files can be exported in a wide range of formats, including CIF, MOL, Mol2, PDB, SHELX and XMol, using tools in the CSD System.

The CCDC uses two different codes to distinguish the deposited dataset from the curated CSD entry. For example, one 'CSD Communication' of an organic molecule was deposited with the CCDC and assigned the deposition number 'CCDC-991327', which gives free public access to the data as deposited. Selected information was then extracted from the deposited data to prepare the validated and curated CSD entry, which was assigned the refcode 'MITGUT'. As part of the curation process, the CCDC also applies an algorithm, DeCIFer, to help editors assign chemistry to structures when those representations (e.g. bond types and charge assignments) are missing from the submitted CIF files. The validated and curated entry is included in the CSD System and WebCSD distributions, whose availability is restricted to those making appropriate contributions.
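The pairing of a deposition number with a refcode can be modelled as a small record type. This is a minimal sketch under stated assumptions: refcodes are taken to be six capital letters with an optional two-digit suffix (the pattern seen in MITGUT and BENZAC12), deposition numbers are written as "CCDC" followed by digits, and the first six letters are treated as the refcode family shared by entries of the same compound. `CsdRecord`, `from_strings` and `refcode_family` are hypothetical names, not part of any CCDC API.

```python
import re
from dataclasses import dataclass

# Assumed identifier patterns (not an official specification).
REFCODE = re.compile(r"[A-Z]{6}(\d{2})?")      # e.g. MITGUT, BENZAC12
DEPOSITION = re.compile(r"CCDC[- ]?(\d+)")     # e.g. CCDC-991327

@dataclass(frozen=True)
class CsdRecord:
    """Links a deposited dataset to its curated CSD entry (hypothetical helper)."""
    deposition_number: int  # identifies the data as deposited
    refcode: str            # identifies the validated and curated entry

    @classmethod
    def from_strings(cls, deposition: str, refcode: str) -> "CsdRecord":
        dep = DEPOSITION.fullmatch(deposition)
        if dep is None or REFCODE.fullmatch(refcode) is None:
            raise ValueError(f"unrecognised identifiers: {deposition!r}, {refcode!r}")
        return cls(int(dep.group(1)), refcode)

    @property
    def refcode_family(self) -> str:
        """First six letters of the refcode (assumed to group entries of one compound)."""
        return self.refcode[:6]

record = CsdRecord.from_strings("CCDC-991327", "MITGUT")
print(record)  # prints CsdRecord(deposition_number=991327, refcode='MITGUT')
```

Keeping the two identifiers in one immutable record mirrors the distinction described above: the deposition number points at the data as deposited, while the refcode points at the curated entry derived from it.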

Viewing the data

File:XURZAN.jpg

Each data set in the CSD can be openly viewed and retrieved using the free [https://www.ccdc.cam.ac.uk/structures Access Structure] service. Through this web-browser-based service, users can view the data set in 2D and 3D, obtain basic information about the structure, and download the deposited data set. More advanced search functions and curated information are available through the subscription-based [https://www.ccdc.cam.ac.uk/solutions/csd-system CSD system].

Besides the [https://www.ccdc.cam.ac.uk/solutions/csd-system/ CSD system], the structure files can be viewed with several open-source programs, such as Jmol. Other free programs include MDL Chime, PyMOL, UCSF Chimera, RasMol and WinGX,{{cite journal|last=Farrugia|first=Louis J. | name-list-style = vanc |title=WinGX suite for small-molecule single-crystal crystallography|journal=Journal of Applied Crystallography|date=1 August 1999|volume=32|issue=4|pages=837–838|doi=10.1107/S0021889899006020}} and the CCDC provides a free version of its visualization program [https://www.ccdc.cam.ac.uk/Community/csd-community/FreeMercury/ Mercury].

Since 2015, the CCDC's Mercury program has also been able to generate 3D-print-ready files from structures in the CSD.{{Cite web|url=https://www.ccdc.cam.ac.uk/Community/blog/post-56/|title=3D Printing: Easy as 1, 2, 3! | date=August 19, 2015|website=The Cambridge Crystallographic Data Centre (CCDC)|language=en|access-date=2019-05-18}}

See also

References

{{reflist|32em}}