Protein Data Bank (file format)

{{Short description|File format for protein data}}

{{redirect|.pdb|the Microsoft file format|Program database|the new file format called PDBx/mmCIF|mmCIF}}

{{Infobox file format

| name = PDB

| extension = {{mono|.pdb}}, {{mono|.ent}}, {{mono|.brk}}

| mime = chemical/x-pdb

| owner =

| developer = Protein Data Bank

| creatorcode =

| genre = chemical file format

| container for = Molecule 3D structure, Protein tertiary structure

| contained by =

| extended from =

| extended to = mmCIF

}}The Protein Data Bank (PDB) file format is a text-based file format describing the three-dimensional structures of molecules held in the Protein Data Bank. It provides for the description and annotation of protein and nucleic acid structures, including atomic coordinates, secondary structure assignments and atomic connectivity, and it also stores experimental metadata. The PDB format is now a legacy format: since 2014 the Protein Data Bank has kept its data on biological macromolecules in the newer PDBx/mmCIF file format.{{Cite journal |last=Berman |first=Helen M. |last2=Kleywegt |first2=Gerard J. |last3=Nakamura |first3=Haruki |last4=Markley |first4=John L. |date=2014-10-01 |title=The Protein Data Bank archive as an open data resource |url=https://doi.org/10.1007/s10822-014-9770-y |journal=Journal of Computer-Aided Molecular Design |language=en |volume=28 |issue=10 |pages=1009–1014 |doi=10.1007/s10822-014-9770-y |issn=1573-4951 |pmc=4196035 |pmid=25062767}}

History

The PDB file format was invented in 1972{{cite web |title=wwPDB: File Format |url=https://www.wwpdb.org/documentation/file-format |website=www.wwpdb.org |language=en}}{{cite web |title=PROTEIN DATABASE FILE RECORD FORMATS |url=https://cdn.rcsb.org/wwpdb/docs/documentation/file-format/PDB_format_1972.pdf |access-date=9 June 2024}} as a human-readable file that would allow researchers to exchange the atomic coordinates of a given protein through a database system. Its fixed-column-width format limits lines to 80 or 140{{cite web |title=PROTEIN DATABASE FILE RECORD FORMATS |url=https://cdn.rcsb.org/wwpdb/docs/documentation/file-format/PDB_format_1972.pdf |access-date=9 June 2024}} columns, a limit derived from the width of the computer punch cards that were previously used to exchange coordinates.{{cite journal |author=Berman |first=Helen M |year=2008 |title=The Protein Data Bank: a historical perspective. |url=https://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&tool=sumsearch.org/cite&retmode=ref&cmd=prlinks&id=18156675 |journal=Acta Crystallographica |volume=64 |issue=Pt 1 |pages=88–95 |doi=10.1107/S0108767307035623 |issn=2053-2733 |pmid=18156675 |doi-access=free}} Over the years the file format has undergone many changes and revisions. The final update to the PDB file format was version 3.30, in November 2012.{{Cite web |last= |title=wwPDB: File Formats and the PDB |url=https://www.wwpdb.org/documentation/file-formats-and-the-pdb |access-date=2024-01-15 |website=Protein Data Bank |language=en}}

Example

A typical PDB file describing a protein consists of hundreds to thousands of lines like the following (taken from a file describing the structure of a synthetic [http://www.rcsb.org/pdb/files/1mbs.pdb collagen-like peptide]):

<pre>
HEADER    EXTRACELLULAR MATRIX                    22-JAN-98   1A3I
TITLE     X-RAY CRYSTALLOGRAPHIC DETERMINATION OF A COLLAGEN-LIKE
TITLE    2 PEPTIDE WITH THE REPEATING SEQUENCE (PRO-PRO-GLY)
...
EXPDTA    X-RAY DIFFRACTION
AUTHOR    R.Z.KRAMER,L.VITAGLIANO,J.BELLA,R.BERISIO,L.MAZZARELLA,
AUTHOR   2 B.BRODSKY,A.ZAGARI,H.M.BERMAN
...
REMARK 350 BIOMOLECULE: 1
REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C
REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000
REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000
...
SEQRES   1 A    9  PRO PRO GLY PRO PRO GLY PRO PRO GLY
SEQRES   1 B    6  PRO PRO GLY PRO PRO GLY
SEQRES   1 C    6  PRO PRO GLY PRO PRO GLY
...
ATOM      1  N   PRO A   1       8.316  21.206  21.530  1.00 17.44           N
ATOM      2  CA  PRO A   1       7.608  20.729  20.336  1.00 17.44           C
ATOM      3  C   PRO A   1       8.487  20.707  19.092  1.00 17.44           C
ATOM      4  O   PRO A   1       9.466  21.457  19.005  1.00 17.44           O
ATOM      5  CB  PRO A   1       6.460  21.723  20.211  1.00 22.26           C
...
HETATM  130  C   ACY   401       3.682  22.541  11.236  1.00 21.19           C
HETATM  131  O   ACY   401       2.807  23.097  10.553  1.00 21.19           O
HETATM  132  OXT ACY   401       4.306  23.101  12.291  1.00 21.19           O
...
</pre>
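Because every field of a record occupies fixed columns, records are best read by column position rather than by splitting on whitespace: in the example, a HETATM record with a blank chain identifier yields one whitespace-separated token fewer than an ATOM record. The following Python sketch parses ATOM and HETATM records using the 1-based column ranges of the wwPDB format version 3.3 coordinate section. The `AtomRecord` class and `parse_atom_line` function are hypothetical names introduced here, the column alignment of the two sample lines was reconstructed from that layout, and the alternate-location, insertion-code and charge fields are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    """One ATOM or HETATM record (hypothetical container, not a library type)."""
    record: str        # "ATOM" or "HETATM"
    serial: int        # atom serial number
    name: str          # atom name, e.g. "N", "CA", "OXT"
    res_name: str      # residue name, e.g. "PRO", "ACY"
    chain_id: str      # "" when the chain identifier column is blank
    res_seq: int       # residue sequence number
    x: float           # orthogonal coordinates in ångströms
    y: float
    z: float
    occupancy: float
    temp_factor: float
    element: str       # element symbol


def parse_atom_line(line: str) -> AtomRecord:
    """Parse an ATOM/HETATM line by fixed column positions.

    Column ranges are 1-based and inclusive, following the wwPDB v3.3
    coordinate section. altLoc, iCode and charge are not extracted here.
    """
    padded = line.rstrip("\n").ljust(80)  # tolerate stripped trailing spaces

    def col(start: int, end: int) -> str:
        # Convert 1-based inclusive columns to a 0-based Python slice.
        return padded[start - 1:end].strip()

    return AtomRecord(
        record=col(1, 6),
        serial=int(col(7, 11)),
        name=col(13, 16),
        res_name=col(18, 20),
        chain_id=col(22, 22),
        res_seq=int(col(23, 26)),
        x=float(col(31, 38)),
        y=float(col(39, 46)),
        z=float(col(47, 54)),
        occupancy=float(col(55, 60)),
        temp_factor=float(col(61, 66)),
        element=col(77, 78),
    )


# Two records from the example, with their column alignment reconstructed.
ATOM_LINE = "ATOM      1  N   PRO A   1       8.316  21.206  21.530  1.00 17.44           N"
HETATM_LINE = "HETATM  130  C   ACY   401       3.682  22.541  11.236  1.00 21.19           C"

if __name__ == "__main__":
    for text in (ATOM_LINE, HETATM_LINE):
        atom = parse_atom_line(text)
        print(atom.record, atom.name, atom.res_name, repr(atom.chain_id),
              (atom.x, atom.y, atom.z))
```

Splitting `ATOM_LINE` on whitespace gives 12 fields but `HETATM_LINE` only 11, because its chain identifier is blank; slicing by column avoids that ambiguity.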

;HEADER, TITLE and AUTHOR records : identify the entry (its classification, deposition date, PDB ID code and title) and the researchers who determined the structure; numerous other record types carry other kinds of information.

;REMARK records : can contain free-form annotation, but they also accommodate standardized information. For example, the REMARK 350 BIOMT records describe how to compute the coordinates of the biologically relevant multimer (the biological assembly) from the coordinates of a single repeating unit given explicitly in the file, by applying a rotation and a translation to the listed chains.

;SEQRES records : give the sequences of the three peptide chains (named A, B and C); the number after the chain identifier is the number of residues in the chain. The sequences are very short in this example but usually span multiple lines.

;ATOM records : describe the coordinates of the atoms that are part of the protein. For example, the first ATOM line above describes the backbone nitrogen atom (N) of the first residue of peptide chain A, which is a proline residue; the first three floating-point numbers are its x, y and z coordinates, in ångströms.{{cite web |url=http://www.wwpdb.org/documentation/format33/sect9.html |title=wwPDB Format version 3.3: Coordinate Section |access-date=2012-03-23 |archive-url=https://web.archive.org/web/20120228025220/http://www.wwpdb.org/documentation/format33/sect9.html |archive-date=2012-02-28 |url-status=dead }} The next three columns are the occupancy, the temperature factor and the element symbol, respectively.

;HETATM records : describe the coordinates of hetero-atoms, that is, atoms that are not part of the protein molecule, such as those of ligands, ions or water; in this example they belong to residue ACY (acetic acid).
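Each REMARK 350 operator is given as three BIOMT rows (BIOMT1 to BIOMT3), each holding one row of a 3×3 rotation matrix followed by one translation component, so a coordinate is transformed as x′ = Rx + t and the assembly is the union of the transformed copies. The Python sketch below assumes whitespace-separated fields as in the example and a single BIOMOLECULE block; the function names are hypothetical. Because the example elides the BIOMT3 row, the identity row used for operator 1 is an assumption, and operator 2 is invented purely for illustration.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[float, float, float, float]   # (R_i1, R_i2, R_i3, t_i)
Vec3 = Tuple[float, float, float]


def parse_biomt(lines: List[str]) -> Dict[int, List[Row]]:
    """Group REMARK 350 BIOMT1/2/3 rows into one 3x4 operator per serial number.

    Assumes whitespace-separated fields and a single BIOMOLECULE block
    (in files with several BIOMOLECULE blocks the serial numbers can repeat).
    """
    partial: Dict[int, List[Optional[Row]]] = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 8 or fields[:2] != ["REMARK", "350"] or not fields[2].startswith("BIOMT"):
            continue  # skip BIOMOLECULE, APPLY ... and other remarks
        row_index = int(fields[2][-1]) - 1  # BIOMT1 -> 0, BIOMT2 -> 1, BIOMT3 -> 2
        serial = int(fields[3])
        r1, r2, r3, t = (float(v) for v in fields[4:8])
        partial.setdefault(serial, [None, None, None])[row_index] = (r1, r2, r3, t)

    operators: Dict[int, List[Row]] = {}
    for serial, rows in partial.items():
        if any(row is None for row in rows):
            raise ValueError(f"operator {serial} is missing a BIOMT row")
        operators[serial] = [row for row in rows if row is not None]
    return operators


def apply_operator(op: List[Row], xyz: Vec3) -> Vec3:
    """Return R x + t for one operator."""
    x, y, z = xyz
    (a1, a2, a3, t1), (b1, b2, b3, t2), (c1, c2, c3, t3) = op
    return (a1 * x + a2 * y + a3 * z + t1,
            b1 * x + b2 * y + b3 * z + t2,
            c1 * x + c2 * y + c3 * z + t3)


def build_assembly(operators: Dict[int, List[Row]], coords: List[Vec3]) -> List[Vec3]:
    """Apply every operator to every coordinate; the union is the assembly."""
    return [apply_operator(operators[s], p) for s in sorted(operators) for p in coords]


# Operator 1: BIOMT1 and BIOMT2 are from the example; its BIOMT3 row is elided
# there, so an identity row is ASSUMED. Operator 2 is HYPOTHETICAL
# (a 180-degree rotation about z followed by a 10 Å shift along z).
REMARK_LINES = [
    "REMARK 350 BIOMOLECULE: 1",
    "REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C",
    "REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000",
    "REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000",
    "REMARK 350   BIOMT3   1  0.000000  0.000000  1.000000        0.00000",
    "REMARK 350   BIOMT1   2 -1.000000  0.000000  0.000000        0.00000",
    "REMARK 350   BIOMT2   2  0.000000 -1.000000  0.000000        0.00000",
    "REMARK 350   BIOMT3   2  0.000000  0.000000  1.000000       10.00000",
]

if __name__ == "__main__":
    ops = parse_biomt(REMARK_LINES)
    n_atom = (8.316, 21.206, 21.530)  # first ATOM record of the example
    for point in build_assembly(ops, [n_atom]):
        print(tuple(round(c, 3) for c in point))
```

Indexing rows by the digit in `BIOMTn` rather than by their order in the file keeps the matrix correct even if the rows are listed out of order.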

Molecular visualization software capable of displaying PDB files

{{Column list|1=

}}

3D animation software capable of displaying PDB files

{{Column list|1=

  • [http://www.blender.org Blender (with an appropriate extension installed)]
  • [http://www.sidefx.com Houdini]
  • [https://threejs.org/docs/index.html?q=pd#examples/en/loaders/PDBLoader three.js]

}}

See also

References

{{reflist}}