Group of pictures

In video coding, a group of pictures, or GOP structure, specifies the order in which intra- and inter-frames are arranged. The GOP is a collection of successive pictures within a coded video stream. Each coded video stream consists of successive GOPs, from which the visible frames are generated. When GOPs are self-contained ("closed"), encountering a new GOP in a compressed video stream means that the decoder does not need any previous frames to decode the following ones, which allows fast seeking through the video.

Elements

A GOP can contain the following picture types:

I frame (intra coded picture) – a picture that is coded independently of all other pictures; each I frame can be decoded fully on its own. Each GOP begins (in decoding order) with this type of frame. Some sources describe every I frame as a key frame,{{Cite web|url=https://blog.video.ibm.com/streaming-video-tips/keyframes-interframe-video-compression/#keyframe|title=Keyframes, InterFrame & Video Compression|date=13 April 2021}} but this is not always accurate: decoding cannot always start at an I frame and cleanly decode the frames that follow it, because those frames may reference pictures that precede the I frame.
IDR frame (Instantaneous Decoder Refresh) – an I frame with a marking indicating that no subsequent P or B frames have references reaching further back than this I frame. Using IDR frames forms closed GOPs, which cannot refer to frames outside the GOP.{{Cite web |last=McCarrel |first=Jarrod |date=2022-05-04 |title=What is "Group Of Pictures" and Why is it Important? |url=https://www.veneratech.com/understanding-gop-what-is-group-of-pictures-and-why-is-it-important/ |access-date=2024-06-22 |website=Venera Technologies |language=en-US}} IDR frames, together with clean random access (CRA) pictures and recovery points, are the true key frames.
P frame (predictive coded picture) – contains motion-compensated difference information relative to previously decoded pictures. In older designs such as MPEG-1, H.262/MPEG-2 and H.263, each P frame can only reference one picture, and that picture must precede the P frame in display order as well as in decoding order, and the reference must be an I or P frame. These constraints do not apply in the newer standards H.264/MPEG-4 AVC and HEVC.
B frame (bipredictive coded picture) – contains motion-compensated difference information relative to previously decoded pictures. In older designs such as MPEG-1 and H.262/MPEG-2, each B frame can only reference two frames, the one which precedes the B frame in display order and the one which follows, and all referenced pictures must be I or P frames. These constraints do not apply in the newer standards H.264/MPEG-4 AVC and HEVC. Some codecs use unidirectional B frames: frames that, like P frames, use no data from future frames, but that no other frames depend on. When B frames are not used as references, as in the older designs, they can be dropped without affecting the correct decoding of other frames.
D frame (DC direct coded picture) – serves as a fast-access representation of a frame for loss robustness or fast-forward. D frames are only used in MPEG-1 video.
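Because a B frame in the older designs refers to the anchor frame that follows it in display order, that anchor has to be decoded first, so the decoding order differs from the display order. The following Python sketch illustrates this reordering under the classic MPEG-1/MPEG-2 assumption described above (each B frame refers to the nearest preceding and following I or P frame). The function name and the frame labels (type letter plus display index) are illustrative only, not part of any standard.

```python
def display_to_decode_order(display: str) -> list[str]:
    """Reorder frame types from display order to decoding order.

    Assumes classic MPEG-1/MPEG-2 referencing: each B frame refers to the
    nearest anchor (I or P) before it and the nearest anchor after it in
    display order, so that later anchor must be decoded first.
    Frames are labelled with their display index, e.g. "B1".
    """
    decode: list[str] = []
    pending_b: list[str] = []
    for index, frame_type in enumerate(display):
        label = f"{frame_type}{index}"
        if frame_type == "B":
            # Hold the B frame until its forward reference has been emitted.
            pending_b.append(label)
        else:
            # Emit the anchor (I or P) first, then the B frames that precede
            # it in display order and depend on it.
            decode.append(label)
            decode.extend(pending_b)
            pending_b.clear()
    # Trailing B frames would need the next GOP's first anchor (an open GOP);
    # this sketch simply appends them at the end.
    decode.extend(pending_b)
    return decode


print(display_to_decode_order("IBBPBBP"))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```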

An I frame indicates the beginning of a GOP. Afterwards, several P and B frames follow. In older designs, the allowed ordering and referencing structure is relatively constrained.{{Cite web|url=http://www.cs.cf.ac.uk/Dave/Multimedia/node258.html|title=B-Frames}}

The I frames contain the full image and do not require any additional information to reconstruct them. Typically, encoders use GOP structures that cause each I frame to be a "clean random access point," such that decoding can start cleanly on an I frame and any errors within the GOP structure are corrected after processing a correct I frame.
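A minimal sketch of how a player could choose where to start decoding when seeking, assuming closed GOPs, so that every I frame is a clean random access point and no frame references anything before the most recent I frame in display order. The function `seek_start` and the example stream are hypothetical.

```python
def seek_start(frame_types: str, target: int) -> int:
    """Return the display index where decoding must begin to show `target`.

    Assumes closed GOPs: every I frame is a clean random access point and no
    frame references anything before the most recent I frame.
    """
    if not 0 <= target < len(frame_types):
        raise IndexError("target frame out of range")
    # Walk backwards from the target to the nearest I frame.
    for index in range(target, -1, -1):
        if frame_types[index] == "I":
            return index
    raise ValueError("no I frame precedes the target frame")


stream = "IBBPBBPIBBPBBP"  # two hypothetical closed GOPs of seven frames
print(seek_start(stream, 10))  # 7
print(seek_start(stream, 5))  # 0
```

The further the target lies from the preceding I frame, the more frames must be decoded before it can be shown, which is one reason shorter GOPs are easier to seek in and edit.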

In the newer designs found in H.264/MPEG-4 AVC and HEVC, encoders have much more flexibility in their referencing structures. They can use the same referencing structures as the older designs, or they can use more pictures as references and order the coding sequence relative to the display order more freely. They are also allowed to use B frames as references when coding other (B or P) frames. This extra flexibility can improve compression efficiency, but it can cause errors to propagate if some data is lost or corrupted. One popular structure for use with the newer designs is a hierarchy of B frames. Hierarchical B frames can provide very good compression efficiency and can also limit error propagation, since the hierarchy can ensure that the number of pictures affected by any data corruption is strictly limited.{{cite web |title=Hierarchical B-Frames or B-Pyramid - Video Compression |url=https://www.ramugedia.com/hierarchical-b-frames-or-b-pyramid |website=www.ramugedia.com}}
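A hierarchy of B frames can be sketched as a recursive split of the interval between two anchor frames: the middle picture is coded first from the two anchors, then becomes a reference for the pictures on either side of it. The Python sketch below assumes a dyadic (power-of-two) mini-GOP in which each B frame references only the two pictures bracketing its interval; real encoders may choose different reference lists, and the function name is hypothetical.

```python
def b_pyramid(left: int, right: int, level: int = 1) -> list[tuple[int, int, int, int]]:
    """Coding order of a dyadic hierarchical-B mini-GOP between two anchors.

    Returns (display_index, ref_before, ref_after, level) tuples. Assumes the
    anchors at `left` and `right` are already decoded and each B frame
    references only the two frames bracketing its interval.
    """
    if right - left < 2:
        return []  # no picture lies strictly between the two references
    mid = (left + right) // 2
    order = [(mid, left, right, level)]
    # The midpoint becomes a reference for both halves of the interval.
    order += b_pyramid(left, mid, level + 1)
    order += b_pyramid(mid, right, level + 1)
    return order


print([frame[0] for frame in b_pyramid(0, 8)])  # [4, 2, 1, 3, 6, 5, 7]
```

In this model the deepest-level frames (1, 3, 5 and 7) are never used as references, so corrupting one of them affects only that picture, while an error in frame 4 can reach at most the seven pictures between the two anchors.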

Generally, the more I frames the video stream has, the more editable it is. However, having more I frames substantially increases bit rate needed to code the video.

Structure

The GOP structure is often described by two numbers, for example, {{math|1=M=3, N=12}}. The first number gives the distance between two anchor frames (I or P), also known as the length of a "mini-GOP".{{cite web |last1=Vijayanagar |first1=Krishna Rao |title=Closed GOP and Open GOP - Simplified Explanation - OTTVerse |url=https://ottverse.com/closed-gop-open-gop-idr/ |website=ottverse.com |date=17 December 2020}} The second gives the distance between two full images (I frames): it is the GOP size.{{Cite web|url=https://help.apple.com/compressor/mac/4.0/en/compressor/usermanual/#chapter=18%26section=5|title=Compressor 4 User Manual}} Instead of the M parameter, the maximum number of B frames between two consecutive anchor frames can be used; this is the approach used by FFmpeg.{{cite web |title=FFmpeg Codecs Documentation |url=https://ffmpeg.org/ffmpeg-codecs.html |website=ffmpeg.org |quote=bf integer (encoding,video) Set max number of B frames between non-B-frames.}}

Examples:

For {{math|1=M=3, N=12}}, the GOP structure is {{mono|IBBPBBPBBPBB}}. There are 2 B-frames between two consecutive anchor frames.
For the sequence {{mono|IBBBBPBBBBPBBBB}}, GOP size {{math|1=N=15}}, anchor-distance {{math|1=M=5}}. There are 4 B-frames between two consecutive anchor frames.
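The relationship between M, N and the number of B frames can be written as a short Python sketch that reproduces the examples above. It assumes a fixed structure in which the GOP starts with its I frame in display order and an anchor frame occurs every M pictures. The names `gop_pattern` and `max_b_frames` are illustrative; the second simply expresses that anchors M pictures apart leave M − 1 B frames between them, which is the value FFmpeg's `bf` option would take (as a maximum; an encoder may use fewer).

```python
def gop_pattern(m: int, n: int) -> str:
    """Display-order frame types for one GOP described by M and N.

    M is the distance between anchor frames (I or P) and N is the GOP size,
    i.e. the distance between I frames. Assumes the GOP starts with its
    I frame in display order.
    """
    if m < 1 or n < 1:
        raise ValueError("M and N must be positive")
    frames = []
    for index in range(n):
        if index == 0:
            frames.append("I")
        elif index % m == 0:
            frames.append("P")  # an anchor frame every M pictures
        else:
            frames.append("B")
    return "".join(frames)


def max_b_frames(m: int) -> int:
    """B frames between consecutive anchors for anchor distance M."""
    return m - 1


print(gop_pattern(3, 12))  # IBBPBBPBBPBB
print(gop_pattern(5, 15))  # IBBBBPBBBBPBBBB
print(max_b_frames(3))  # 2
```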

The GOP structure does not need to stay fixed throughout encoding. Varying {{mvar|N}} to insert an I-frame on scene change is a well-known technique.{{cite journal |title=Adaptive Intra-Frame Assignment and Bit-Rate Estimation for Variable GOP Length in H.264 |journal=IEEE Transactions on Circuits and Systems for Video Technology |date=October 2006 |volume=16 |issue=10 |pages=1271–1279 |doi=10.1109/TCSVT.2006.881856 |url=https://www.researchgate.net/publication/3308980 |last1=Jeehong Lee |last2=Ilhong Shin |last3=Hyunwook Park }} Newer techniques also vary {{mvar|M}} based on the amount of motion in the video.{{cite web |title=Docs/Appendix-Adaptive-Prediction-Structure.md · master · Alliance for Open Media / SVT-AV1 · GitLab |url=https://gitlab.com/AOMediaCodec/SVT-AV1/-/blob/master/Docs/Appendix-Adaptive-Prediction-Structure.md |website=GitLab |language=en |date=23 August 2023}}
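A simple way to vary N is to start a new GOP whenever a scene cut is detected and otherwise fall back to a nominal maximum GOP size. The Python sketch below assumes a hypothetical per-frame `scene_scores` measure and an arbitrary threshold of 0.5; it is a generic heuristic, not the method of the cited paper, and it does not vary M.

```python
def place_i_frames(scene_scores: list[float], max_gop: int,
                   threshold: float = 0.5) -> list[int]:
    """Choose I-frame positions with a variable GOP length.

    `scene_scores[i]` is a hypothetical measure of how different frame i is
    from frame i - 1 (0.0 = identical, 1.0 = completely new). An I frame is
    placed at a detected scene cut, or when the current GOP reaches
    `max_gop` frames (the nominal N).
    """
    if not scene_scores:
        return []
    i_frames = [0]  # the first frame is always intra-coded
    for index in range(1, len(scene_scores)):
        gop_length = index - i_frames[-1]
        if scene_scores[index] >= threshold or gop_length >= max_gop:
            i_frames.append(index)
    return i_frames


scores = [0.0, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
print(place_i_frames(scores, max_gop=4))  # [0, 3, 7]
```

Here the scene cut at frame 3 restarts the GOP early, and frame 7 is where the new GOP reaches the maximum length of 4.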

Additional concepts

With H.264 and later designs, which allow highly flexible reference structures, a B frame in one GOP can reference a frame in a different GOP, in particular a frame that precedes the GOP's I frame, which makes that I frame a non-IDR frame (not a key frame).{{Cite web |date=2024-07-01 |title=Broken frames due to H.264 Open-GOP (DVB MPEG-TS) ? |url=https://avidemux.org/smif/index.php/topic,18247.0.html?PHPSESSID=93ada9353fecc73c9d64e84432e462c5 |access-date=2024-07-01 |website=Avidemux Forum |language=en-US}} A GOP that contains any such outward-referencing frame is known as an "open GOP". The opposite, a self-contained GOP, is known as a "closed GOP". In presentation order, a GOP can begin with a B frame, but it cannot end with one. An open GOP starts with a B frame and is slightly more efficient, because starting with an I frame means that an extra P frame must be added at the end (a GOP cannot end with a B frame).{{Cite web |title=MPEG and H.264 compression |url=https://www.andrew.cmu.edu/user/lshea/2.Tech_PDFs/Mpeg_and-h264_compression.pdf |access-date=2024-07-02}}
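The open/closed distinction can be checked mechanically from each frame's reference list. In this Python sketch, frames are labelled by type and display index, the reference lists are hypothetical, and a GOP is treated as open exactly when at least one of its frames references a frame outside it.

```python
def outward_references(gop: set[str], refs: dict[str, set[str]]) -> set[str]:
    """Frames in `gop` that reference at least one frame outside the GOP."""
    return {frame for frame in gop if not refs.get(frame, set()) <= gop}


def is_closed_gop(gop: set[str], refs: dict[str, set[str]]) -> bool:
    """A GOP is closed when none of its frames reference outside it."""
    return not outward_references(gop, refs)


# Hypothetical references; P9 belongs to the previous GOP.
refs = {
    "B10": {"P9", "I12"},
    "B11": {"P9", "I12"},
    "I12": set(),
    "B13": {"I12", "P15"},
    "B14": {"I12", "P15"},
    "P15": {"I12"},
}
open_gop = {"B10", "B11", "I12", "B13", "B14", "P15"}
print(sorted(outward_references(open_gop, refs)))  # ['B10', 'B11']
print(is_closed_gop(open_gop, refs))  # False
print(is_closed_gop(open_gop - {"B10", "B11"}, refs))  # True
```

In this example B10 and B11 precede I12 in display order but reference P9 from the previous GOP, so a decoder that starts at I12 cannot reconstruct them correctly; the four frames from I12 onward form a closed GOP.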

References
