CATH database
{{Infobox biodatabase
|title = CATH
|description =Protein Structure Classification
|scope =
|organism =
|center =University College London
|laboratory = Institute of Structural and Molecular Biology
|author =
|released = 1997
|standard =
|format =
|url = {{URL|cathdb.info}}
|download = {{URL|cathdb.info/download}}
|webservice =
|sql =
|sparql =
|webapp =
|standalone =
|license =
|versioning =
|frequency = CATH-B is released daily. Official releases are approximately annual.
|curation =
|bookmark =
|version = 4.3
}}
{{Redirect|CATH|other uses|Cath (disambiguation){{!}}Cath}}
The CATH Protein Structure Classification database is a free, publicly available online resource that provides information on the evolutionary relationships of protein domains. It was created in the mid-1990s by Professor Christine Orengo and colleagues, including Janet Thornton and David Jones, and continues to be developed by the Orengo group at University College London. CATH shares many broad features with the SCOP resource; however, the two classifications differ considerably in many of their details.{{cite web|url=http://www.cathdb.info |title=CATH: Protein Structure Classification Database at UCL |website=Cathdb.info |access-date=2017-03-09}}{{cite web|url=http://www.cathdb.info/wiki/doku/?id=tutorials:index |title=CATH |website=Cathdb.info |access-date=2017-03-09}}{{cite web|url=https://twitter.com/CATHDatabase |title=CATH Database (@CATHDatabase) |publisher=Twitter |access-date=2017-03-09}}{{cite journal | vauthors = Pearl FM, Bennett CF, Bray JE, Harrison AP, Martin N, Shepherd A, Sillitoe I, Thornton J, Orengo CA | display-authors = 6 | title = The CATH database: an extended protein family resource for structural and functional genomics | journal = Nucleic Acids Research | volume = 31 | issue = 1 | pages = 452–455 | date = January 2003 | pmid = 12520050 | pmc = 165509 | doi = 10.1093/nar/gkg062 }}
Hierarchical organization
Experimentally determined three-dimensional protein structures are obtained from the Protein Data Bank and, where applicable, split into their constituent polypeptide chains. Protein domains are identified within these chains using a mixture of automatic methods and manual curation.{{Cite web |title=CATH |url=http://cathdb.info/wiki/doku/?id=faq |access-date=2024-09-14 |website=cathdb.info |language=en}}
The domains are then classified within the CATH structural hierarchy. At the Class (C) level, domains are assigned according to their secondary structure content, i.e. all alpha, all beta, a mixture of alpha and beta, or little secondary structure. At the Architecture (A) level, the arrangement of the secondary structures in three-dimensional space is used for assignment. At the Topology/fold (T) level, information on how the secondary structure elements are connected and arranged is used. Domains are assigned to the Homologous superfamily (H) level if there is good evidence that they are related by evolution, i.e. that they are homologous.
{| class="wikitable"
|+The four main levels of the CATH hierarchy
!#
!Level
!Description
|-
|1
|Class
|The overall secondary-structure content of the domain (equivalent to the SCOP class).
|-
|2
|Architecture
|A large-scale grouping of topologies that share particular structural features.
|-
|3
|Topology/fold
|High structural similarity but no evidence of homology (equivalent to the 'fold' level in SCOP).
|-
|4
|Homologous superfamily
|Indicative of a demonstrable evolutionary relationship (equivalent to the SCOP superfamily).
|}
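The four levels map naturally onto a dotted numeric identifier with one number per level. The following Python sketch illustrates this under stated assumptions: the dotted `C.A.T.H` format and the example identifiers (`3.40.50.300`, `3.40.50.720`, `1.10.8.10`) follow the convention used on the CATH website, while the function names `parse_cath_id` and `deepest_shared_level` are hypothetical helpers written for this example and are not part of any CATH tool.

```python
from typing import Dict, Optional

# The four levels of the hierarchy, ordered from broadest to most specific.
LEVELS = ("Class", "Architecture", "Topology/fold", "Homologous superfamily")


def parse_cath_id(cath_id: str) -> Dict[str, int]:
    """Split a dotted identifier such as '3.40.50.300' into named levels.

    Shorter identifiers (e.g. '3.40') are accepted and describe a node
    higher up in the hierarchy.
    """
    parts = cath_id.strip().split(".")
    if not 1 <= len(parts) <= len(LEVELS):
        raise ValueError(f"expected 1 to 4 dot-separated numbers: {cath_id!r}")
    return {level: int(part) for level, part in zip(LEVELS, parts)}


def deepest_shared_level(id_a: str, id_b: str) -> Optional[str]:
    """Return the most specific level at which two identifiers agree.

    Returns None if the two domains already differ at the Class level.
    """
    a = parse_cath_id(id_a)
    b = parse_cath_id(id_b)
    shared = None
    for level in LEVELS:
        # Stop at the first level that is missing or different.
        if level not in a or level not in b or a[level] != b[level]:
            break
        shared = level
    return shared


if __name__ == "__main__":
    print(parse_cath_id("3.40.50.300"))
    print(deepest_shared_level("3.40.50.300", "3.40.50.720"))
    print(deepest_shared_level("1.10.8.10", "3.40.50.300"))
```

Two domains that agree down to the Topology/fold level but not at the Homologous superfamily level share a fold without demonstrated common ancestry, which is the distinction the table above draws between levels 3 and 4.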
Additional sequence data for domains with no experimentally determined structures are provided by CATH's sister resource, Gene3D, and are used to populate the homologous superfamilies. Protein sequences from UniProtKB and Ensembl are scanned against CATH hidden Markov models (HMMs) to predict domain sequence boundaries and make homologous superfamily assignments.
Releases
The CATH team releases new data both as daily snapshots and as official releases approximately once a year. The latest release of CATH-Gene3D (v4.3) was published in December 2020 and consists of:{{Cite web |title=CATH |url=http://cathdb.info/wiki/doku/?id=release_notes |access-date=2024-09-14 |website=cathdb.info}}
- 500,238 structural protein domain entries
- 151 million non-structural protein domain entries
- 5,481 homologous superfamily entries
- 212,872 functional family entries
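As a rough illustration of scale, the figures above can be turned into a few ratios. This is a minimal Python sketch that uses only the numbers listed for v4.3; the dictionary keys are names chosen for this example, and because the non-structural count is rounded to 151 million, the resulting values are approximate.

```python
# Release figures for CATH-Gene3D v4.3 as listed above.
# The non-structural count is rounded in the source, so ratios are approximate.
release_v4_3 = {
    "structural_domains": 500_238,
    "non_structural_domains": 151_000_000,
    "homologous_superfamilies": 5_481,
    "functional_families": 212_872,
}

# Average number of structural domains per homologous superfamily.
domains_per_superfamily = (
    release_v4_3["structural_domains"] / release_v4_3["homologous_superfamilies"]
)

# Average number of functional families per homologous superfamily.
funfams_per_superfamily = (
    release_v4_3["functional_families"] / release_v4_3["homologous_superfamilies"]
)

# Share of all domain entries that have an experimentally determined structure.
total_domains = (
    release_v4_3["structural_domains"] + release_v4_3["non_structural_domains"]
)
structural_percent = 100 * release_v4_3["structural_domains"] / total_domains

print(f"Structural domains per superfamily: {domains_per_superfamily:.1f}")
print(f"Functional families per superfamily: {funfams_per_superfamily:.1f}")
print(f"Entries with a known structure: {structural_percent:.2f}%")
```

These are averages only; they say nothing about how unevenly domains are distributed across superfamilies.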
Open-source software
CATH is an open-source software project whose developers maintain a number of open-source tools,{{cite web|url=http://www.cathdb.info/wiki/doku/?id=cath_tools|title=Tools|website=cathdb.info|access-date=2016-12-18}} which are publicly available on GitHub.{{Citation |title=UCLOrengoGroup/cath-tools |date=2024-09-09 |url=https://github.com/UCLOrengoGroup/cath-tools |access-date=2024-09-14 |publisher=UCLOrengoGroup}}
References
{{Reflist}}
{{Use dmy dates|date=April 2017}}
Category:Protein structure databases
Category:Protein classification