
Today, any video enthusiast can build a home video studio around a personal computer. Working with video files, however, means processing and storing very large amounts of data: one minute of digital video at SIF resolution (comparable to VHS) with true color (millions of colors) takes (352 x 288) pixels x 24 bits x 25 frames/s x 60 s ≈ 435 MB. On the media used in modern PCs, such as a CD (CD-ROM, about 650 MB) or a hard disk (several tens of gigabytes), a full-length film simply cannot be stored in this form. With MPEG compression, the volume of video data can be reduced dramatically without noticeable image degradation. How MPEG works, and what other applications it has, is discussed below.
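The arithmetic above is easy to verify; a quick sketch in Python, using the figures from the text:

```python
# Storage cost of one minute of uncompressed SIF video with 24-bit color.
width, height = 352, 288          # SIF resolution (PAL)
bytes_per_pixel = 3               # 24 bits of true color
fps, seconds = 25, 60

frame_bytes = width * height * bytes_per_pixel
minute_bytes = frame_bytes * fps * seconds
minute_mb = minute_bytes / (1024 * 1024)
print(round(minute_mb))           # roughly 435 MB per minute
```

At that rate a 650 MB CD holds well under two minutes of raw video, which is exactly why compression is unavoidable.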

MPEG

The word MPEG is short for Moving Picture Experts Group, the name of an ISO expert group that develops standards for encoding and compressing video and audio data. The official name of the group is ISO/IEC JTC1 SC29 WG11. The acronym MPEG is often used to refer to the standards developed by this group. The following are currently known:

  • MPEG-1 is designed for recording synchronized video (usually in SIF format, 352 x 288) and audio on CD-ROM, taking into account a maximum reading speed of about 1.5 Mbit/s. The quality of video processed with MPEG-1 is in many ways similar to ordinary VHS video, so the format is used primarily where standard analog video media would be inconvenient or impractical.
  • MPEG-2 is designed for processing video comparable in quality to broadcast television; transmission systems use data rates of 3 to 15 Mbit/s, and professional equipment uses streams of up to 50 Mbit/s. Many television channels are switching to technologies based on MPEG-2; a signal compressed to this standard is broadcast via television satellites and is used to archive large volumes of video material.
  • MPEG-3 was intended for high-definition television (HDTV) systems with data rates of 20-40 Mbit/s, but it later became part of the MPEG-2 standard and is no longer mentioned separately. Incidentally, the MP3 format, which is sometimes confused with MPEG-3, is intended only for audio compression; the full name of MP3 is MPEG Audio Layer 3.
  • MPEG-4 defines the principles for working with digital representations of media data in three areas: interactive multimedia (including products distributed on optical discs and via the Internet), graphics applications (synthetic content) and digital television.

HOW COMPRESSION OCCURS

The basic coding object in the MPEG standard is the television frame. Since in most scenes the background remains fairly stable and the action occurs only in the foreground, compression begins with the creation of an initial frame. Initial (Intra, I) frames are encoded using intra-frame compression only, with algorithms similar to those used in JPEG: the frame is divided into blocks of 8x8 pixels, a discrete cosine transform (DCT) is performed on each block, and the resulting coefficients are quantized. Because the brightness of adjacent pixels is highly correlated spatially, the DCT concentrates the signal in the low-frequency part of the spectrum, which, after quantization, compresses effectively under variable-length coding.

Predicted (P) frames are processed using forward prediction from previous initial or predicted frames. The frame is divided into macroblocks of 16x16 pixels, and each macroblock is matched to the most similar section of the reference frame, shifted by a displacement vector. This procedure is called motion estimation and compensation. The achievable compression for predicted frames is about three times higher than for initial frames.

Depending on the nature of the video, bidirectional (Bi-directional Interpolated, B) frames are encoded in one of four ways: forward prediction; backward prediction with motion compensation, used when new objects appear in the encoded frame; bidirectional prediction with motion compensation; or intra-frame coding, used on a sudden scene change or when image elements move at high speed. Bidirectional frames give the deepest compression of video data, but since a high compression ratio reduces the accuracy of reconstruction, bidirectional frames are not used as reference frames.

If the DCT coefficients were transmitted exactly, the reconstructed image would match the original completely.
However, errors in the reconstruction of DCT coefficients introduced by quantization distort the image. The coarser the quantization, the less space the coefficients occupy and the stronger the compression, but also the more visible the distortion.
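The intra pipeline just described, an 8x8 DCT followed by scalar quantization, can be sketched in a few lines. This is an illustrative toy, not an optimized or bit-exact MPEG implementation:

```python
import math

def dct2(block):
    """Naive 8x8 2-D DCT-II, as applied to intra blocks (illustrative)."""
    N = 8
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
            cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
            out[u][v] = cu * cv * s
    return out

def quantize(coeffs, step):
    """Uniform scalar quantization: a coarser step leaves fewer nonzero coefficients."""
    return [[round(c / step) for c in row] for row in coeffs]

# A flat block: after the DCT all the energy sits in the DC coefficient.
flat = [[100] * 8 for _ in range(8)]
q = quantize(dct2(flat), step=16)
print(q[0][0])                                                # 50 (the DC value)
print(sum(abs(c) for row in q for c in row) - abs(q[0][0]))   # 0: every AC term is zero
```

The flat block shows the concentration effect mentioned above: high spatial correlation pushes everything into the low-frequency corner, and quantization then wipes out the rest.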

Because MPEG is developed by an organization as authoritative as ISO and is a fairly universal compression method (usable in video recording, television broadcasting, home video editing, multimedia software (educational, gaming), teleconferencing, and video for Internet presentations), it has become the dominant standard for digital video compression, eliminating the many incompatible video compression methods that existed before it.

How MPEG video works

The color digital image in the sequence being compressed is converted into the YUV (YCbCr) color space. The Y component represents intensity, while U and V carry chroma. Since the human eye is less sensitive to color than to intensity, the resolution of the color components can be halved vertically, or both vertically and horizontally. For animation and high-quality studio video, resolution reduction is not applied, to preserve quality; for consumer use, where bitrates are lower and equipment is cheaper, this step causes no noticeable loss in visual perception while saving precious bits of data.
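A minimal sketch of the color conversion and 4:2:0 subsampling described above. The constants are the BT.601 full-range coefficients; MPEG itself uses the studio-range variant, so treat them as illustrative:

```python
def rgb_to_ycbcr(r, g, b):
    """BT.601 RGB -> YCbCr conversion (full-range variant, for brevity)."""
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def subsample_420(plane):
    """4:2:0 chroma subsampling: average each 2x2 block into one sample."""
    h, w = len(plane), len(plane[0])
    return [[(plane[i][j] + plane[i][j+1] + plane[i+1][j] + plane[i+1][j+1]) / 4
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

y, cb, cr = rgb_to_ycbcr(255, 255, 255)   # white: full intensity, neutral chroma
chroma = [[100, 102], [98, 100]]
print(subsample_420(chroma))              # [[100.0]] - four samples become one
```

Subsampling both chroma planes this way cuts their data to a quarter, i.e. the whole frame to half, before any actual compression begins.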


The basic idea of the whole scheme is to predict motion from frame to frame and then apply the discrete cosine transform (DCT) to remove spatial redundancy. The DCT is performed on blocks of 8x8 pixels; motion prediction is performed on the intensity channel (Y) on blocks of 16x16 pixels, or, depending on the characteristics of the source sequence (interlacing, content), on blocks of 16x8 pixels. In other words, a given 16x16 block of pixels in the current frame is searched for within a larger region of previous or subsequent frames. The DCT coefficients (of the original data, or of the difference between this block and its matched counterpart) are quantized, that is, divided by a certain number to discard unimportant bits; many coefficients become zero after this operation. The quantization factor can be changed for each macroblock (a macroblock is a 16x16 block of Y samples plus the corresponding chroma blocks: 8x8 for a YUV ratio of 4:2:0, 16x8 for 4:2:2, and 16x16 for 4:4:4). DCT coefficients, quantization parameters, motion vectors, etc. are encoded using fixed tables defined by the standard. The encoded data is assembled into packets that form a stream according to MPEG syntax.
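The block-matching search at the heart of motion estimation can be sketched like this. The sketch uses toy 4x4 blocks and an exhaustive search over a small radius; real encoders use 16x16 blocks and much smarter search strategies:

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def motion_search(cur_block, ref, bx, by, radius=2, size=4):
    """Exhaustive block matching around (bx, by) in the reference frame."""
    best, best_cost = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = bx + dx, by + dy
            if 0 <= y and y + size <= len(ref) and 0 <= x and x + size <= len(ref[0]):
                cand = [row[x:x + size] for row in ref[y:y + size]]
                cost = sad(cur_block, cand)
                if best_cost is None or cost < best_cost:
                    best_cost, best = cost, (dx, dy)
    return best, best_cost

# Reference frame: a bright 4x4 square at (3, 2) on a dark background.
ref = [[0] * 10 for _ in range(10)]
for r in range(2, 6):
    for c in range(3, 7):
        ref[r][c] = 200
# Current block: the same square, seen by the encoder at position (1, 1).
cur = [[200] * 4 for _ in range(4)]
print(motion_search(cur, ref, 1, 1))   # ((2, 1), 0): vector (2, 1), zero residual
```

When the residual is zero, as here, only the vector needs to be transmitted; otherwise the residual block goes through the same DCT-and-quantize path as intra data.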

Frame types and their ratio

There are three types of encoded frames. I-frames are encoded as still images, without reference to previous or subsequent frames; they serve as random-access points. P-frames are predicted from previous I- or P-frames: each macroblock in a P-frame can carry a motion vector and the difference of DCT coefficients relative to the corresponding block of the last decoded I- or P-frame, or can be encoded as in an I-frame if no matching block is found.

Finally, there are B-frames, which are predicted from the two closest I- or P-frames, one previous and one subsequent. Matching blocks are searched for in both frames and the best match is selected: a forward vector is found, then a backward one, and the average of the corresponding macroblocks in the past and future frames is computed. If none of this works, the block can be encoded as in an I-frame.

The sequence of decoded frames usually looks like
I B B P B B P B B P B B I B B P B B P B ...

There are 12 frames from one I-frame to the next. This follows from the random-access requirement that an entry point occur roughly every 0.4 seconds. The ratio of P- to B-frames is based on experience.

For the decoder to work, the first P-frame in the stream must appear before the first B-frame, so the compressed stream looks like this:
0 x x 3 1 2 6 4 5 ...
where the numbers are frame numbers. "x x" may be nothing, if this is the beginning of a sequence, or the B-frames -2 and -1, if this is a fragment from the middle of the stream.
The I-frame must be decoded first, then the P-frame; then, with both in memory, the B-frames are decoded. While the P-frame is being decoded, the I-frame is shown; B-frames are shown as soon as they are decoded, and the decoded P-frame is shown while the next frame is being decoded.
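The reordering rule, that each reference frame must precede the B-frames that depend on it, can be expressed compactly. A sketch, with frame types given in display order:

```python
def display_to_stream_order(frames):
    """Reorder a display-order GOP (e.g. I B B P ...) into stream order:
    each reference (I or P) is emitted before the B-frames that depend on it."""
    stream, pending_b = [], []
    for i, t in enumerate(frames):
        if t == 'B':
            pending_b.append(i)    # B-frames wait for their future reference
        else:                      # I or P: a reference frame
            stream.append(i)
            stream.extend(pending_b)
            pending_b = []
    return stream

gop = list("IBBPBBP")
print(display_to_stream_order(gop))   # [0, 3, 1, 2, 6, 4, 5]
```

The output reproduces the "0 x x 3 1 2 6 4 5" pattern shown above (with no leading B-frames, since this is the start of a sequence).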

MPEG audio compression algorithm

Audio compression uses highly developed psychoacoustic models, derived from experiments with the most discerning listeners, to eliminate sounds that are inaudible to the human ear. This is what is called "masking": for example, a strong component at some frequency prevents weaker components at nearby frequencies from being heard, with the relationship between the energies of the masked frequencies described by an empirical curve. There are similar temporal masking effects, as well as more complex interactions in which a temporal effect can unmask a frequency, or vice versa.

Sound is split into spectral bands using a hybrid scheme that combines transform and subband coding, and the psychoacoustic model is expressed in terms of these bands. Anything that can be removed or reduced is removed or reduced, and the remainder is sent to the output stream. In reality things are a little more complicated, since the bits must be distributed between the bands, and everything that is sent is, of course, coded to reduce redundancy.
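The masking idea can be caricatured in a few lines. This toy zeroes any spectral line much quieter than an immediate neighbour; the real psychoacoustic model works over critical bands with empirically measured curves, so both the ratio and the neighbourhood here are invented for illustration:

```python
def apply_masking(magnitudes, mask_ratio=0.05):
    """Toy frequency masking: zero any spectral line whose magnitude falls
    below mask_ratio times a louder immediate neighbour."""
    out = list(magnitudes)
    for i, m in enumerate(magnitudes):
        neighbours = magnitudes[max(0, i - 1):i] + magnitudes[i + 1:i + 2]
        if neighbours and m < mask_ratio * max(neighbours):
            out[i] = 0.0           # inaudible next to a loud neighbour
    return out

spectrum = [0.1, 100.0, 2.0, 0.5, 90.0, 0.3]
print(apply_masking(spectrum))     # [0.0, 100.0, 0.0, 0.0, 90.0, 0.0]
```

The zeroed lines cost no bits after entropy coding, which is exactly where the compression gain comes from.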

Streams, frequencies and frame sizes

Both MPEG-1 and MPEG-2 can be applied to a wide range of streams, frame rates and frame sizes. MPEG-1, the variant familiar to most people, allows 25 fps at 352x288 resolution in PAL or 30 fps at 352x240 resolution in NTSC at bitrates below 1.86 Mbit/s - a combination known as "Constrained Parameters Bitstreams". These numbers were introduced by the White Book specification for video on CD (VideoCD).

In fact, the syntax allows images up to 4095x4095 to be encoded at up to 100 Mbit/s. These numbers could be unbounded were it not for the limit on the number of bits in the headers.
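A rough validity check against the constrained-parameters limits mentioned above might look like this. It is deliberately simplified: the real definition also bounds macroblocks per second, decoder buffer size and motion vector range:

```python
def is_constrained_parameters(width, height, fps, bitrate_bps):
    """Rough check of the MPEG-1 "Constrained Parameters Bitstream" limits
    (simplified sketch, not the full conformance definition)."""
    return (width <= 768 and height <= 576
            and fps <= 30
            and bitrate_bps <= 1_856_000)

print(is_constrained_parameters(352, 288, 25, 1_150_000))      # True: VideoCD parameters
print(is_constrained_parameters(4095, 4095, 30, 100_000_000))  # False: syntax limit, not CPB
```

Hardware decoders of the era were only guaranteed to handle streams that pass a check like this, which is why VideoCD stayed inside the limits.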

With the advent of the MPEG-2 specification, the most popular combinations were grouped into levels and profiles. The most common ones:

  • Source Input Format (SIF), 352 dots x 240 lines x 30 fps, also known as Low Level (LL), and
  • “CCIR 601” (for example, 720 dots/line x 480 lines x 30 fps), also known as Main Level (ML).

Motion compensation replaces macroblocks with macroblocks from previous pictures
Macroblock predictions are formed from corresponding 16x16 blocks of pixels (16x8 in MPEG-2) in previously reconstructed frames. There are no restrictions on the position of the macroblock in the previous picture other than the picture boundaries.

The reference frames (from which predictions are formed) are displayed regardless of how they were encoded: once a frame is decoded, it is no longer a set of blocks but an ordinary flat digital image made of pixels.

In MPEG, the size and frame rate of the displayed image may differ from those encoded in the stream. For example, before encoding, a subset of frames in the original sequence may be dropped, and each remaining frame filtered and processed; on playback the sequence is interpolated to restore the original size and frame rate. In fact, the three fundamental stages (original, encoded and displayed) may all differ in their parameters. MPEG syntax describes the encoded and displayed rates through headers, while the original frame rate and size are known only to the encoder. That is why MPEG-2 headers include elements describing the screen size for displaying video.
In an I-frame, macroblocks must be coded as intra, without reference to previous or subsequent frames, unless scalable modes are used. Macroblocks in a P-frame, however, can be either intra or refer to previous frames, and macroblocks in a B-frame can be intra or refer to the previous frame, the next one, or both. Each macroblock header has an element that determines its type.

(Illustrations omitted: prediction without motion compensation, prediction with motion compensation, skipped macroblocks in P-frames, and skipped macroblocks in B-frames.)

The sequence of frames can have any arrangement of I-, P- and B-frames. In industrial practice it is common to use a fixed sequence (like IBBPBBPBBPBBPBB); however, more powerful encoders can optimize the choice of frame type depending on context and the global characteristics of the video sequence.
Each frame type has its own advantages depending on the characteristics of the image (motion activity, temporal masking effects, and so on).
For example, if the sequence changes little from frame to frame, it makes sense to encode more B-frames than P-frames: since B-frames are not used as references later in decoding, they can be compressed more heavily without affecting the quality of the video as a whole.
The requirements of a particular application also influence the choice of frame type: key frames, channel switching, program indexing, error recovery, and so on.
When compressing video, the following statistical characteristics are used:
1. Spatial correlation: discrete cosine transform on 8x8-pixel blocks.

2. A feature of human vision - insensitivity to high-frequency components: lossy scalar quantization of the DCT coefficients.

3. Large-scale spatial correlation across the image: prediction of the first, lowest-frequency transform coefficient of each 8x8 block (the block average).

4. Statistics of the occurrence of syntax elements in typical coded streams: optimal coding of motion vectors, DCT coefficients, macroblock types, etc.

5. Sparse matrix of quantized DCT coefficients: run-length coding of zero elements with an end-of-block marker.

6. Spatial masking: degree of quantization of a macroblock.

7. Coding of sections taking into account the content of the scene: the degree of quantization of the macroblock.

8. Adaptation to local image characteristics: block coding, macroblock type, adaptive quantization.

9. Constant step size with adaptive quantization: a new quantization step is signalled only by a special macroblock type and is otherwise not transmitted.

10. Temporal redundancy: forward and backward motion vectors at the macroblock level of 16x16 points.

11. Perception-aware macroblock prediction error coding: adaptive quantization and transform coefficient quantization.

12. Minor prediction error: a macroblock may signal no error at all.

13. Macroblock-level fine-coding of prediction error: Each of the blocks within a macroblock can be coded or skipped.

14. Motion vectors - slow movement of an image fragment with a complex pattern: motion vector prediction.

15. Appearances and disappearances: forward and backward prediction in B-frames.

16. Inter-frame prediction accuracy: bilinearly interpolated (filtered) block differences. In the real world, object motion from frame to frame rarely falls on exact pixel boundaries; interpolation approximates the actual position of an object, often increasing compression efficiency by 1 dB.

17. Limited motion activity in P-frames: skipped macroblocks, used when the motion vector and prediction error are both zero. Skipped macroblocks are very desirable in an encoded stream because they occupy no bits at all, apart from a field in the next macroblock's header.

18. Coplanar motion in B-frames: skipped macroblocks, used when the motion vector is unchanged and the prediction error is zero.
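Item 5 above (run-length coding of the sparse quantized coefficient matrix with an end-of-block marker) can be sketched on a toy 4x4 block; MPEG works on 8x8 blocks with standardized variable-length code tables, so the symbols here are illustrative:

```python
# Zigzag scan order for a 4x4 block (toy version of the standard 8x8 scan).
ZIGZAG_4 = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2),
            (2, 1), (3, 0), (3, 1), (2, 2), (1, 3), (2, 3), (3, 2), (3, 3)]

def run_level_code(block):
    """Emit (zero_run, level) pairs along the zigzag scan, plus an
    end-of-block mark that swallows all trailing zeros at once."""
    codes, run = [], 0
    for r, c in ZIGZAG_4:
        v = block[r][c]
        if v == 0:
            run += 1
        else:
            codes.append((run, v))
            run = 0
    codes.append('EOB')
    return codes

quantized = [[50, 3, 0, 0],
             [-2, 0, 0, 0],
             [ 0, 0, 0, 0],
             [ 0, 0, 0, 0]]
print(run_level_code(quantized))   # [(0, 50), (0, 3), (0, -2), 'EOB']
```

Thirteen zeros collapse into a single 'EOB' symbol, which is why quantizing coefficients to zero is so profitable.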


MPEG Video File Format

The difference between the MPEG and MPG formats is very small. MPEG is the newer designation and is represented by the following subtypes: MPEG-1 through MPEG-4, MPEG-7 and MPEG-21. Lossy compression makes files easier to download and upload and reduces their size while maintaining high quality. These audio and video container files allow both data streams to be synchronized. A large number of online streaming services use MPEG-1 files to broadcast audio, as well as audio/video signals over cable networks and satellites. The MPEG system became the basis for the creation of MP3 files. The Mac and Windows operating systems can handle MPEG-1 and MPEG-2 files using various programs that support these formats.

Technical information about MPEG files

The MPEG-1 format encodes video and associated audio for storage at a bitrate of 1.5 Mbit/s (ISO/IEC 11172), though the format is capable of supporting higher bitrates. This allows files to be encoded at quality close to that of CDs or low-quality DVDs. The MPEG-2 format is used for high-definition television broadcasting. MPEG-3 was intended as a multi-resolution scalable compression standard for HDTV (this format was later merged into MPEG-2 due to the almost complete lack of differences between them). The MPEG-4 format allows a higher compression ratio than MPEG-2 and also achieves higher-quality compression using appropriate techniques; eventually this standard came to be used for computer graphics as well. The MPEG-7 format is described in the ISO/IEC 15938 standard, and the MPEG-21 format in the ISO/IEC 21000 standard. These standards define general principles for the formation of media data and also provide for copyright protection.

Introduction

The progenitor of this format, MPEG-1, which was discussed in the previous chapter, can without hesitation be called truly revolutionary, because nothing like it existed before. The first video discs and satellite TV broadcasts in MPEG-1 format seemed like a miracle - such quality at such a comparatively low bitrate. Compressed digital video matched the quality of a household VCR and had many advantages over analog media. But time passed, progress in digital technology advanced by leaps and bounds, and the aging MPEG-1 needed significant improvements to keep up with the wonders of science and technology. The result was MPEG-2 - not a revolutionary format but an evolutionary one, produced by reworking MPEG-1 to meet customer needs. And the customers were the largest mass-media companies, which were betting on satellite television and non-linear digital video editing.

Today the MPEG-2 format is associated primarily with DVDs, but in 1992, when work on it began, there were no widely available media on which MPEG-2-compressed video could be recorded; more importantly, the computer technology of the time could not provide the required bandwidth of 2 to 9 Mbit per second. Satellite television with the latest equipment of the day, however, could provide such a channel. These high channel requirements did not mean that the compression ratio of MPEG-2 was lower than that of MPEG-1; on the contrary, it was much higher. But the image resolution and frame rate are much higher too, since high quality at a reasonable bitrate was the main goal the customers set for the MPEG committee. It was thanks to MPEG-2 that high-definition television - HDTV - became possible, with a much sharper image than conventional television.

A few years after work began, in October 1995, the first 20-channel TV broadcast using the MPEG-2 standard was carried out via the PanAmSat television satellite. The satellite broadcast then, and still broadcasts now, to Scandinavia, Belgium, the Netherlands, Luxembourg, the Middle East and Africa.
Currently, HDTV is expanding widely in the Far East - in Japan and China.
MPEG-2 compressed video streams with a bitrate of 9 Mbit per second are used for studio recording and high-quality digital video editing.

With the advent of the first DVD players, offering enormous capacity at a relatively affordable price, MPEG-2 was quite naturally chosen as the main video compression format for its high quality and high compression ratio. Films in MPEG-2 are still the main argument in favor of DVD.

So much for the retrospective review of MPEG-2; let us now try to delve into its internals. As already mentioned, MPEG-2 is an evolutionary format, so it is appropriate to consider it by comparison with its famous progenitor MPEG-1, pointing out what was added to the original format.

MPEG-2. What's new?

It must be said that the MPEG-2 developers approached the problem creatively. The brainstorming over how to remove extra bits and bytes from an already compressed image (remember, MPEG-1 already existed; now it had to be compressed further) proceeded from three sides at once. In addition to improving the compression algorithms for video (one side) and audio (another), an alternative, previously unused way of reducing the size of the final file was found.

As the MPEG committee's research showed, over 95% of video data is repeated in different frames in one way or another, more than once. This data is ballast or, in the term proposed by the MPEG committee, redundant. Redundant data can be removed with virtually no damage to the image: during playback, repeated sections are replaced by a single original fragment. To the compression and redundancy-removal algorithms already familiar from MPEG-1, another - apparently the most effective - was added. After splitting the video stream into frames, this algorithm analyzes the contents of each next frame for duplicated, redundant data. A list of original fragments and a table of repeated sections are compiled; the originals are retained, the copies are deleted, and the repeat table is used when decoding the compressed stream. The result is an excellent high-definition image at a low bitrate; such a size/quality ratio was considered unattainable before MPEG-2.
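The list-of-originals / repeat-table scheme described above can be caricatured as follows. This is a toy illustration of the idea as the article states it, not of the actual MPEG-2 bitstream syntax; all names are ours:

```python
def build_repeat_table(frames):
    """Keep one copy of each distinct fragment and a table mapping every
    position back to its original. 'frames' are lists of hashable fragments
    (e.g. block tuples)."""
    originals = {}                  # fragment -> id of its first occurrence
    table = []                      # ((frame, pos), fragment id)
    for f, frame in enumerate(frames):
        for p, frag in enumerate(frame):
            if frag not in originals:
                originals[frag] = len(originals)
            table.append(((f, p), originals[frag]))
    return list(originals), table

frames = [[(10, 10), (20, 20)],     # frame 0
          [(10, 10), (30, 30)]]     # frame 1: its first fragment is a repeat
origs, table = build_repeat_table(frames)
print(len(origs))                   # 3 distinct fragments out of 4 positions
```

Only the distinct fragments need to be stored; every repeated position costs just one table entry, which is where the size reduction comes from.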

But this algorithm has limitations. Repeating fragments must be large enough; otherwise an entry would have to be created in the repeat table for almost every pixel, reducing the table's usefulness to zero, since its size would exceed the size of the frame. Another circumstance also limits its effectiveness: it would be most useful to apply the algorithm not to individual frames but to the entire video as a whole, since the probability of finding repeated sections in a long stretch of video is much higher than within a single frame, and the total size of per-frame tables is much larger than the possible size of one common table. Unfortunately, MPEG-2 is a streaming format, originally intended for transmission over satellite channels and cable networks, so the presence of frames is a strict requirement.

So, we have looked at one approach that significantly reduced the size of the encoded file, but if this trick had been the only one, the developers would never have achieved the impressive results we see in MPEG-2. They also had to work hard on the existing algorithms, polishing them and squeezing out every last byte. The video compression algorithms underwent a very significant modernization.

Changes in video compression algorithms compared to MPEG-1.

The main changes affected the quantization algorithms, that is, the conversion of continuous data into discrete values. MPEG-2 uses a nonlinear quantization process that is much more efficient than its predecessor's. The MPEG-2 format also gives users and programmers considerably more freedom than MPEG-1. It is now possible, during encoding, to set the precision of the frequency coefficients of the quantization matrix, which directly affects the quality (and size) of the compressed image. With MPEG-2 the user can choose quantization precision values of 8, 9, 10 or 11 bits per element, which makes the format significantly more flexible than MPEG-1, which had only one fixed value - 8 bits per element.

It is also possible to load a separate quantization matrix immediately before each frame, which allows very high image quality, though it is quite labor-intensive. How can a quantization matrix improve image quality? It is no secret that fast-moving areas are traditionally a weak spot for the MPEG family, while static parts of the image are encoded very well. The conclusion is that static areas and areas with motion cannot be encoded in the same way. Since image quality depends on the quantization step, which in turn largely depends on the quantization matrix used, changing these matrices for different sections of the video can improve image quality. Many MPEG-2 codecs do this automatically, but some programs also allow quantization matrices to be set manually, for example the AVI2MPG2 transcoder, which can be found on the Internet at: http://members.home.net/beyeler/bbmpeg.html. If the link is dead, use our copy of the file: bbmpg123.zip

Motion prediction algorithms were not left without innovations either. This area gained new modes: 16x8 MC, field MC and Dual Prime. These algorithms significantly improved picture quality and, importantly, made it possible to insert key frames less frequently than in MPEG-1, thus increasing the number of intermediate frames and raising the compression ratio. The basic size of the blocks into which the image is divided can be 8x8 pixels, as in MPEG-1, 16x16, or 16x8, the last being used only in 16x8 MC mode.
Due to certain features of the implementation of motion prediction in MPEG-2, there are restrictions on picture size. The vertical and horizontal resolution must now be a multiple of 16 in frame encoding mode, and the vertical resolution a multiple of 32 in field encoding mode, where each frame consists of two fields. The maximum frame size has increased to 16383x16383.
Two more ratios of the chroma planes to the luminance plane were introduced: 4:4:4 and 4:2:2.
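The picture-size constraints above are easy to check mechanically. A sketch (the function name is ours):

```python
def valid_mpeg2_size(width, height, field_coded=False):
    """Check the size constraints described above: dimensions must be
    multiples of 16 (32 vertically in field encoding mode), up to 16383."""
    vert_multiple = 32 if field_coded else 16
    return (width % 16 == 0 and height % vert_multiple == 0
            and width <= 16383 and height <= 16383)

print(valid_mpeg2_size(720, 480))                    # True: CCIR 601 frame
print(valid_mpeg2_size(720, 480, field_coded=True))  # True: 480 is a multiple of 32
print(valid_mpeg2_size(352, 240, field_coded=True))  # False: 240 is not a multiple of 32
```

The multiple-of-16 rule simply ensures the picture divides exactly into 16x16 macroblocks, with no partial blocks at the edges.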

In addition to the above improvements, several new video compression algorithms that had never been used before were introduced in MPEG-2.
The most important of them are the algorithms called Scalable Modes, Spatial Scalability, Data Partitioning, Signal-to-Noise Ratio (SNR) Scalability and Temporal Scalability. There is no doubt that these algorithms made a very important contribution to the success of MPEG-2, and they deserve closer consideration.

Scalable Modes - a set of algorithms that allows the priority levels of different layers of the video stream to be defined. The video data stream is divided into three layers: base, middle and high. The layer with the highest priority at a given moment (for example, the foreground) is encoded at a higher bitrate.

Spatial Scalability (spatial scaling) - with this algorithm the base layer is encoded at a lower resolution; the information obtained from its coding is then used in the motion prediction algorithms of the higher-priority layers.

Data Partitioning (data fragmentation) - this algorithm splits the blocks of 64 quantized coefficients into two streams. One, of higher priority, consists of the low-frequency components (most critical to quality); the other, of correspondingly lower priority, consists of the high-frequency components. These streams are then processed differently. This is why in MPEG-2 both dynamic and static scenes look quite good, unlike MPEG-1, where dynamic scenes are traditionally terrible.
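The split that Data Partitioning performs can be sketched in one line of logic. The breakpoint of 8 coefficients is illustrative; the standard lets the encoder choose where to cut:

```python
def partition_coefficients(zigzag_coeffs, split=8):
    """Data Partitioning sketch: the first (low-frequency) coefficients of the
    64-element zigzag scan go to the high-priority stream, the rest to the
    low-priority one. 'split' is an illustrative breakpoint."""
    return zigzag_coeffs[:split], zigzag_coeffs[split:]

coeffs = list(range(64, 0, -1))                # 64 coefficients, DC first
high_priority, low_priority = partition_coefficients(coeffs)
print(len(high_priority), len(low_priority))   # 8 56
```

On a noisy channel the high-priority stream can be protected more strongly; losing the low-priority stream merely softens the picture instead of destroying it.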

Signal-to-Noise Ratio (SNR) Scalability (signal-to-noise ratio scaling) - with this algorithm, layers of different priority are encoded at different quality. Low-priority layers are quantized more coarsely and therefore contain less data, while the high-priority layer carries additional information that, when decoded, allows a high-quality image to be restored.

Temporal Scalability (temporal scaling) - this algorithm reduces the number of key blocks of information in the low-priority layer, while the high-priority layer, on the contrary, carries additional information that allows intermediate frames to be restored, using information from the lower-priority layer for prediction.

All these algorithms have much in common: they all work with layers of the video data stream, and their use allows high compression with almost imperceptible image degradation. But they share one more, less pleasant property: using any of them makes the video completely incompatible with the MPEG-1 format. For this reason these algorithms were not available in every MPEG-2 codec.
As a result, many formats appeared, with different resolutions, qualities, degrees of compression and size/quality ratios. To restore order and finally standardize MPEG-2, the MPEG committee introduced the concepts of levels and profiles. It is levels and profiles, and their combinations, that make it possible to describe unambiguously almost any format from the MPEG-2 family.

Levels

Level       Resolution       Max bitrate   Quality equivalent
Low         352x240x30       4 Mbit/s      CIF, consumer video cassette
Main        720x480x30       15 Mbit/s     CCIR 601, studio TV
High 1440   1440x1152x30     60 Mbit/s     4x CCIR 601, home HDTV
High        1920x1080x30     80 Mbit/s     Hi-End video editing equipment

Profiles

Valid combinations of Profiles and Levels

Level       Simple       Main / Main+                    Next
High        No           No                              4:2:2
High 1440   No           Main with Spatial Scalability   4:2:2
Main        90% of all   Main with SNR Scalability       4:2:2
Low         No           Main with SNR Scalability       No

The most popular standards.

Name Permission

Characteristics of formats, comparison, history of origin and development.

Introduction

The progenitor of this format, MPEG-1, can without hesitation be called truly revolutionary, because nothing like it existed before it. The first video discs and satellite TV broadcasts in MPEG-1 format seemed like a miracle - such quality at such a relatively low bitrate. Compressed digital video had a quality comparable to that of a household VCR and had many advantages over analog media. But time passed, progress in the field digital technologies was moving by leaps and bounds, and now the old man MPEG-1 needed significant improvements to keep up with the wonders of science and technology. The result was the MPEG-2 format, which is not a revolutionary format, but rather an evolutionary format, resulting from the reworking of MPEG-1 to meet customer needs. And the customers of this format were the largest mass media companies that relied on satellite television and non-linear digital video editing.

Now the MPEG-2 format is associated primarily with DVDs, but in 1992, when work on creating this format began, there were no widely available media on which MPEG-2 compressed video information could be recorded, but most importantly, computer technology of that time could not provide the required bandwidth - from 2 to 9 Mbit per second. But this channel could provide satellite television with the latest equipment at that time. Such high requirements for the channel did not mean at all that the compression ratio of MPEG-2 was lower than that of MPEG-1; on the contrary, it was much higher! But the image resolution and number of frames per second are much higher, since high quality at a reasonable bitrate was the main goal that the customers set for the MPEG committee. It was thanks to MPEG-2 that the emergence of high-definition television - HDTV - was made possible, in which the image is much clearer than that of conventional television.

A few years after work began, in October 1995, the first 20-channel TV broadcast using the MPEG-2 standard was carried out through the PanAmSat television satellite, which began broadcasting to Scandinavia, Belgium, the Netherlands, Luxembourg, the Middle East and Africa, and continues to do so today.
Currently, there is a wide expansion of HDTV in the Far East - in Japan and China.
MPEG-2 compressed video streams with a bitrate of 9 Mbit per second are used for studio recording and high-quality digital video editing.

With the advent of the first DVD players, offering enormous capacity at a relatively affordable price, MPEG-2 was a natural choice as the main video compression format, thanks to its high quality and high compression ratio. Films encoded in MPEG-2 are still the main argument in favor of DVD.

That concludes our retrospective review of MPEG-2; now let us try to delve into its internals. As already mentioned, MPEG-2 is an evolutionary format, so it is natural to examine it by comparison with its famous progenitor MPEG-1, noting what was newly introduced.

MPEG-2. What's new?

It must be said that the MPEG-2 developers approached the problem creatively. The search for ways to remove extra bits and bytes from an already compressed image (remember, MPEG-1 already existed; now its output had to be squeezed further) was attacked from three directions at once. In addition to improving the compression algorithms for video (one direction) and for audio (another), a third, previously unused way of reducing the size of the final file was found.

Research by the MPEG committee showed that over 95% of video data is repeated, one way or another, across different frames. This data is ballast or, in the committee's term, redundant. Redundant data can be removed with virtually no damage to the image: during playback, repeated sections are replaced by a single original fragment. To the compression and redundancy-removal algorithms already familiar from MPEG-1, another one, apparently the most effective, was added. After breaking the video stream into frames, it analyzes the contents of each subsequent frame for duplicate, redundant data.

A list of original sections and a table of repeating sections are compiled; the originals are retained, the copies are deleted, and the repeat table is used when decoding the compressed video stream. The result of this redundancy removal is an excellent high-definition image at a low bitrate, a size/quality ratio considered unattainable before the advent of MPEG-2.
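The idea of keeping each fragment once and referencing it from a repeat table can be sketched in a few lines of Python. This is an illustration of the principle only, not the actual MPEG-2 bitstream syntax; the hashing scheme, function names and data layout here are invented for the example:

```python
import hashlib

def deduplicate_blocks(frame_blocks):
    """Split a frame's blocks into stored originals plus a repeat table.

    frame_blocks: list of bytes objects, one per fixed-size block.
    Returns (kept, repeat_table): `kept` holds each unique block once,
    `repeat_table` maps every block position to an index into `kept`.
    """
    seen = {}           # digest of a block -> its index in `kept`
    kept = []           # unique blocks that are actually stored
    repeat_table = {}   # block position -> index into `kept`
    for pos, block in enumerate(frame_blocks):
        digest = hashlib.sha256(block).digest()
        if digest not in seen:
            seen[digest] = len(kept)
            kept.append(block)
        repeat_table[pos] = seen[digest]
    return kept, repeat_table

def reconstruct(kept, repeat_table):
    """Rebuild the full block sequence from originals plus the table."""
    return [kept[repeat_table[pos]] for pos in sorted(repeat_table)]
```

Replacing every repeated block with a small table entry is exactly the size win described above: four blocks, two of them duplicates, are stored as two originals plus four indices.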

But this algorithm has limitations. Repeating fragments must be large enough; otherwise an entry in the repeat table would be needed for almost every pixel, and the table, now larger than the frame itself, would be useless. Another circumstance also limits its effectiveness: it would be most useful to apply the algorithm not to individual frames but to the video as a whole, since the probability of finding repeating sections in a long stretch of video is much higher than within a single frame, and the combined size of per-frame tables is much larger than one shared table would be. Unfortunately, MPEG-2 is a streaming format, originally intended for transmission over satellite channels or cable networks, so the division into frames is mandatory.

So we have looked at one approach that significantly reduces the size of the encoded file; but had this trick stood alone, the developers would never have achieved the impressive results we see in MPEG-2. They also had to rework the existing algorithms, polishing them to squeeze out every last byte. The video compression algorithms in particular underwent a very substantial modernization.

Changes in video compression algorithms compared to MPEG-1.

The main changes affected quantization, that is, the conversion of continuous data into discrete values. MPEG-2 quantizes the coefficients of the discrete cosine transform nonlinearly, which is much more efficient than its predecessor's scheme. The MPEG-2 format also gives users and programmers significantly more freedom than MPEG-1. It is now possible, during encoding, to set the precision of the frequency coefficients of the quantization matrix, which directly affects both the quality and the size of the compressed image. MPEG-2 allows quantization precision values of 8, 9, 10 and 11 bits per element, making it significantly more flexible than MPEG-1, which had a single fixed value of 8 bits per element.
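The effect of quantization precision can be illustrated with a toy scalar quantizer. This is a simplified sketch, not the exact MPEG-2 reconstruction formula; the matrix values and function names are invented for the example:

```python
import numpy as np

def quantize(coeffs, qmatrix, bits=8):
    """Divide DCT coefficients by a quantization matrix and clamp the
    result to a signed `bits`-bit range. A higher precision (9 to 11
    bits) keeps more detail at the cost of a larger encoded size."""
    q = np.round(coeffs / qmatrix).astype(int)
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(q, lo, hi)

def dequantize(q, qmatrix):
    """Approximate reconstruction of the original coefficients."""
    return q * qmatrix
```

With an 8-bit range a large low-frequency coefficient survives intact; squeeze the same data into 4 bits and it saturates at the clamp limit, which is the quality/size trade-off the precision setting controls.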

It is also possible to load a separate quantization matrix immediately before each frame, which allows very high image quality, although it is quite labor-intensive. How can a quantization matrix improve image quality? It is no secret that fast-moving areas are traditionally a weak point of the MPEG family, while static areas of the image are encoded very well. It follows that static areas and areas with motion should not be encoded identically. Since image quality depends on the quantization stage, which in turn depends largely on the quantization matrix used, changing matrices for different sections of the video can improve image quality. Many MPEG-2 codecs do this automatically, but some programs also let you set quantization matrices manually, for example the AVI2MPG2 transcoder, available at http://members.home.net/beyeler/bbmpeg.html.
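The per-section matrix switching described above amounts to a simple decision rule. The sketch below uses an invented motion metric and made-up matrices (not the MPEG-2 defaults) purely to show the shape of such a heuristic:

```python
import numpy as np

# Illustrative matrices only (not the MPEG-2 defaults): a flat "fine"
# matrix for static regions, a coarser one for fast-moving regions.
FINE = np.full((8, 8), 16)
COARSE = np.full((8, 8), 32)

def pick_qmatrix(cur_block, prev_block, threshold=10.0):
    """Choose a quantization matrix per block: fast motion tolerates
    coarser quantization, so use COARSE when the block has changed a
    lot since the previous frame."""
    motion = np.mean(np.abs(cur_block.astype(float) - prev_block))
    return COARSE if motion > threshold else FINE
```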

Motion prediction algorithms were not spared either. This area gained new modes: 16x8 MC, field MC and Dual Prime. They noticeably improved picture quality and, importantly, made it possible to insert key frames less frequently than in MPEG-1, increasing the number of intermediate frames and with it the compression ratio. The blocks into which the image is divided can be 8x8 pixels, as in MPEG-1, as well as 16x16 and 16x8, the last of which is used only in 16x8 MC mode.

Certain features of the motion prediction algorithms in MPEG-2 impose restrictions on the picture size. The horizontal and vertical resolution must now be a multiple of 16 in frame coding mode, and the vertical resolution a multiple of 32 in field coding mode, where each frame consists of two fields. The maximum frame size has grown to 16383 x 16383 pixels.
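These constraints reduce to rounding the requested dimensions up to the nearest valid multiple. A small helper (the function name and padding policy are our own illustration, not part of the standard) might look like this:

```python
def pad_dimensions(width, height, field_mode=False):
    """Round picture dimensions up to the multiples MPEG-2 requires:
    16 in both directions for frame coding, and 32 vertically for
    field coding. Also enforces the 16383-pixel ceiling."""
    def round_up(x, m):
        return ((x + m - 1) // m) * m
    w = round_up(width, 16)
    h = round_up(height, 32 if field_mode else 16)
    if w > 16383 or h > 16383:
        raise ValueError("MPEG-2 picture dimensions may not exceed 16383")
    return w, h
```

For example, 700 x 560 rounds to 704 x 560 for frame coding, but the vertical size grows to 576 once the multiple-of-32 rule of field coding applies.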

Two more ratios of the chroma planes to the luma plane were introduced: 4:2:2 and 4:4:4.
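The practical difference between these sampling schemes is how much chroma data each frame carries. A quick size calculation (the helper is our own illustration) makes it concrete:

```python
def frame_bytes(width, height, sampling="4:2:0", bits=8):
    """Uncompressed size of one frame for a given chroma sampling.
    The luma plane is always full resolution; the two chroma planes
    shrink according to the sampling scheme."""
    chroma_fraction = {"4:4:4": 1.0, "4:2:2": 0.5, "4:2:0": 0.25}[sampling]
    luma = width * height
    chroma = 2 * luma * chroma_fraction  # two chroma planes: Cb and Cr
    return int((luma + chroma) * bits / 8)
```

A 720 x 576 frame thus takes 1.5 bytes per pixel in 4:2:0, 2 in 4:2:2 and 3 in 4:4:4, which is why the richer chroma formats are reserved for studio work.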

In addition to the improvements described above, several video compression algorithms never used before were introduced into the MPEG-2 format.

The most important of them are the algorithms called Scalable Modes, Spatial Scalability, Data Partitioning, Signal to Noise Ratio (SNR) Scalability and Temporal Scalability. These algorithms undoubtedly made a very important contribution to the success of MPEG-2 and deserve closer consideration.

Scalable Modes: a set of algorithms that assigns priority levels to different layers of a video stream. The video data stream is divided into three layers: base, middle and high. The layer with the highest current priority (for example, the foreground) is encoded at a higher bitrate.

Spatial Scalability (spatial scaling): with this algorithm the base layer is encoded at a lower resolution, and the information obtained from encoding it is then used in the motion prediction algorithms of the higher-priority layers.

Data Partitioning (data fragmentation): this algorithm splits the 64-element blocks of quantized coefficients into two streams. The higher-priority stream carries the low-frequency components, which matter most for quality; the lower-priority stream carries the high-frequency components. The two streams are then processed differently. This is why both dynamic and static scenes look quite good in MPEG-2, unlike MPEG-1, where dynamic scenes are traditionally terrible.
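The split described here follows the zigzag scan of an 8x8 block, which visits low frequencies first. A sketch of the partitioning (the split point and function names are illustrative, not values fixed by the standard):

```python
import numpy as np

def zigzag_order(n=8):
    """Zigzag scan order over an n x n block: coefficients are visited
    diagonal by diagonal, so low frequencies come first."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def partition(block, split_point=10):
    """Split one quantized 8x8 block into a high-priority stream (the
    first `split_point` zigzag coefficients, i.e. the low frequencies)
    and a low-priority stream (the remaining high frequencies)."""
    coeffs = [block[i, j] for i, j in zigzag_order()]
    return coeffs[:split_point], coeffs[split_point:]
```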

Signal to Noise Ratio (SNR) Scalability (signal-to-noise-ratio scaling): with this algorithm, layers of different priority are encoded at different quality. Low-priority layers are quantized more coarsely and therefore contain less data, while the high-priority layer carries the additional information that, on decoding, allows a high-quality image to be restored.
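The base-plus-enhancement idea can be modeled as coarse quantization followed by a finely quantized residual. This is a conceptual sketch with made-up step sizes, not MPEG-2's actual layer syntax:

```python
import numpy as np

def snr_layers(coeffs, base_step=32, enh_step=4):
    """Encode coefficients as a coarse base layer plus an enhancement
    layer carrying the quantization residual."""
    base = np.round(coeffs / base_step)
    residual = coeffs - base * base_step
    enhancement = np.round(residual / enh_step)
    return base, enhancement

def snr_decode(base, enhancement, base_step=32, enh_step=4):
    """Base layer alone gives a coarse picture; adding the enhancement
    layer restores the detail."""
    return base * base_step + enhancement * enh_step
```

Decoding the base layer alone leaves a large quantization error; adding the enhancement layer shrinks that error dramatically, which is exactly the quality scaling described above.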

Temporal Scalability (temporal scaling): this algorithm reduces the number of key blocks of information in the low-priority layer, while the high-priority layer carries the additional information needed to restore the intermediate frames, using the lower-priority layer's data for prediction.
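In its simplest form, temporal layering means distributing frames between a half-rate base layer and an enhancement layer. The sketch below illustrates that distribution only; real MPEG-2 temporal scalability predicts the enhancement frames from the base layer rather than storing them whole:

```python
def temporal_split(frames):
    """Base layer keeps every other frame; the enhancement layer holds
    the frames in between. A decoder that receives only the base layer
    still plays the video, just at half the frame rate."""
    return frames[::2], frames[1::2]

def temporal_merge(base, enhancement):
    """Interleave both layers back into the full-rate sequence."""
    merged = []
    for i, frame in enumerate(base):
        merged.append(frame)
        if i < len(enhancement):
            merged.append(enhancement[i])
    return merged
```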

These algorithms have much in common: all of them work with layers of the video data stream, and their use achieves high compression with almost imperceptible image degradation. But they share one less pleasant property: using any of them makes the video completely incompatible with the MPEG-1 format. For this reason, not every MPEG-2 codec implemented them.

As a result, many variants appeared, with different resolutions, quality, compression ratios and size/quality trade-offs. To restore order and finally standardize MPEG-2, the MPEG committee introduced the concepts of levels and profiles. It is levels, profiles and their combinations that make it possible to describe unambiguously almost any format in the MPEG-2 family.

Levels

A level fixes the picture resolution and the maximum bitrate. The values below are the standard ones defined for MPEG-2:

Level name    Resolution     Maximum bitrate   Qualitative correspondence
Low           352 x 288      4 Mbit/s          CIF, consumer video cassette
Main          720 x 576      15 Mbit/s         CCIR 601, studio TV
High-1440     1440 x 1152    60 Mbit/s         4x601, home HDTV
High          1920 x 1152    80 Mbit/s         Hi-End video editing equipment

Profiles

A profile fixes the set of compression tools used. In order of increasing complexity, the standard defines the Simple, Main, SNR Scalable (Main with SNR Scalability), Spatially Scalable (Main with Spatial Scalability) and High profiles.

Valid combinations of Profiles and Levels

Not every profile may be combined with every level. The combinations permitted by the standard are:

Simple               Main Level only
Main                 Low, Main, High-1440 and High Levels
SNR Scalable         Low and Main Levels
Spatially Scalable   High-1440 Level only
High                 Main, High-1440 and High Levels

Main Profile at Main Level (MP@ML) accounts for about 90% of all applications.

Most popular standards

Name

Resolution