Introduction to video compression
Video compression algorithms ("codecs") manipulate video signals to dramatically reduce the storage and bandwidth required while maximizing perceived video quality. Understanding the operation of video codecs is essential for developers of embedded systems, processors, and tools targeting video applications. For example, understanding video codecs' processing and memory demands is key to processor selection and software optimization.
In this article, we explore the operation and characteristics of video codecs. We explain basic video compression algorithms, including still-image compression, motion estimation, artifact reduction, and color conversion. We discuss the demands codecs make on processors and the consequences of these demands.
Still-Image Compression
Video clips are made up of sequences of individual images, or "frames." Therefore, video compression algorithms share many concepts and techniques with still-image compression algorithms, such as JPEG. In fact, one way to compress video is to ignore the similarities between consecutive video frames and simply compress each frame independently of the others. For example, some products compress video streams using the JPEG still-image compression standard. This approach, known as "motion JPEG" or MJPEG, is sometimes used in video production applications. Modern video compression algorithms go beyond still-image compression schemes: they use motion estimation and motion compensation to take advantage of the correlation between consecutive video frames. However, these more advanced algorithms also employ techniques used in still-image compression algorithms. Therefore, we begin our exploration of video compression by discussing the inner workings of transform-based still-image compression algorithms such as JPEG.
Basic Building Blocks of Digital Image Compression
Block Transform
The image compression techniques used in JPEG and in most video compression algorithms are "lossy." That is, the original uncompressed image can't be perfectly reconstructed from the compressed data, so some information from the original image is lost. Lossy compression algorithms attempt to ensure that the differences between the original uncompressed image and the reconstructed image are not perceptible to the human eye.
The first step in JPEG and similar image compression algorithms is to divide the image into small blocks and transform each block into a frequency-domain representation. Typically, this step uses a discrete cosine transform (DCT) on blocks that are eight pixels wide by eight pixels high. Thus, the DCT operates on 64 input pixels and yields 64 frequency-domain coefficients. The DCT itself preserves all of the information in the eight-by-eight image block. That is, an inverse DCT (IDCT) can be used to perfectly reconstruct the original 64 pixels from the DCT coefficients. However, the human eye is more sensitive to the information contained in DCT coefficients that represent low frequencies (corresponding to large features in the image) than to the information contained in DCT coefficients that represent high frequencies (corresponding to small features). Therefore, the DCT helps separate the more perceptually significant information from less perceptually significant information. Later steps in the compression algorithm encode the low-frequency DCT coefficients with high precision, but use fewer or no bits to encode the high-frequency coefficients, thus discarding information that is less perceptually significant. In the decoding algorithm, an IDCT transforms the imperfectly coded coefficients back into an 8x8 block of pixels.
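The block transform described above can be sketched in a few lines of Python. This is a minimal, unoptimized illustration that assumes the orthonormal form of the DCT; real codecs use fast fixed-point implementations. The helper keep_low_frequencies is a hypothetical, crude stand-in for the later coding steps that spend fewer or no bits on high-frequency coefficients.

```python
import math

N = 8  # block size used by JPEG-style codecs


def _scale(k):
    # Orthonormal scaling, so that idct2(dct2(block)) returns the block
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


# BASIS[k][n] = scale(k) * cos((2n + 1) * k * pi / (2N))
BASIS = [[_scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N))
          for n in range(N)] for k in range(N)]


def dct2(block):
    """Forward 2-D DCT of an 8x8 block (a list of 8 rows of 8 numbers)."""
    return [[sum(BASIS[u][y] * BASIS[v][x] * block[y][x]
                 for y in range(N) for x in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse 2-D DCT: rebuilds the 8x8 pixel block from its coefficients."""
    return [[sum(BASIS[u][y] * BASIS[v][x] * coeffs[u][v]
                 for u in range(N) for v in range(N))
             for x in range(N)] for y in range(N)]


def keep_low_frequencies(coeffs, limit):
    """Hypothetical helper: zero every coefficient with u + v >= limit."""
    return [[c if u + v < limit else 0.0 for v, c in enumerate(row)]
            for u, row in enumerate(coeffs)]


# A flat block has all of its energy in the single lowest-frequency term.
flat = [[100] * N for _ in range(N)]
print(round(dct2(flat)[0][0]))  # 800
```

Applying idct2 directly to the output of dct2 recovers the original pixels up to floating-point rounding, which illustrates that the transform alone discards nothing. Information is lost only when coefficients are dropped or coded coarsely, as keep_low_frequencies imitates.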
The computations performed in the IDCT are nearly identical to those performed in the DCT, so these two functions have very similar processing requirements. A single two-dimensional eight-by-eight DCT or IDCT requires a few hundred instruction cycles on a typical DSP. However, video compression algorithms must often perform a vast number of DCTs and/or IDCTs per second. For example, an MPEG-4 video decoder operating at CIF (352x288) resolution and a frame rate of 30 fps may need to perform as many as 71,280 IDCTs per second, depending on the video content. Under these conditions, the IDCT function would require over 40 MHz (that is, more than 40 million instruction cycles per second) on a Texas Instruments TMS320C55x DSP processor without the DCT accelerator. IDCT computation can take up as much as 30% of the cycles spent in a video decoder implementation.
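The figure of 71,280 IDCTs per second can be reproduced with a short calculation. The sketch below assumes that the frame is divided into 16-pixel by 16-pixel regions, each carrying six eight-by-eight blocks (four luminance and two chrominance), and that every block is coded. The per-IDCT cycle count is a hypothetical value chosen only to illustrate the arithmetic; it is not a measured number.

```python
MACROBLOCK_SIZE = 16        # luminance pixels along each side of a region
BLOCKS_PER_MACROBLOCK = 6   # four luminance + two chrominance 8x8 blocks


def max_idcts_per_second(width, height, fps):
    """Upper bound on 8x8 IDCTs per second when every block is coded."""
    macroblocks = (width // MACROBLOCK_SIZE) * (height // MACROBLOCK_SIZE)
    return macroblocks * BLOCKS_PER_MACROBLOCK * fps


idcts = max_idcts_per_second(352, 288, 30)
print(idcts)  # 71280

# Hypothetical cost per IDCT, used only to show how the load scales
ASSUMED_CYCLES_PER_IDCT = 600
print(idcts * ASSUMED_CYCLES_PER_IDCT / 1e6, "million cycles per second")
```

At a few hundred cycles per IDCT, the worst-case load lands in the tens of millions of cycles per second, which is consistent with the estimate quoted above.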
Because the DCT and IDCT operate on small image blocks, the memory requirements of these functions are rather small and are typically negligible compared to the size of frame buffers and other data in image and video compression applications. The high computational demand and small memory footprint of the DCT and IDCT functions make them ideal candidates for implementation using dedicated hardware coprocessors.
A Note About Color
Color images are typically represented using several "color planes." For example, an RGB color image contains a red color plane, a green color plane, and a blue color plane. Each plane contains an entire image in a single color (red, green, or blue, respectively). When overlaid and mixed, the three planes make up the full color image. To compress a color image, the still-image compression techniques described here are applied to each color plane in turn.
Video applications often use a color scheme in which the color planes do not correspond to specific colors. Instead, one color plane contains luminance information (the overall brightness of each pixel in the color image) and two more color planes contain color (chrominance) information that when combined with luminance can be used to derive the specific levels of the red, green, and blue components of each image pixel.
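As an illustration, the conversion between RGB and a luminance/chrominance representation might look like the sketch below. The article does not name a specific color space, so the weights used here (the ITU-R BT.601 luminance weights in the full-range form used by JPEG files) are an assumption, as are the function names.

```python
# Luminance weights from ITU-R BT.601 (an assumption; the text names no standard)
KR, KB = 0.299, 0.114
KG = 1.0 - KR - KB


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to luminance (Y) and chrominance (Cb, Cr)."""
    y = KR * r + KG * g + KB * b
    cb = 128 + 0.5 * (b - y) / (1.0 - KB)   # scaled blue-minus-luminance
    cr = 128 + 0.5 * (r - y) / (1.0 - KR)   # scaled red-minus-luminance
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Recover the red, green, and blue levels from Y, Cb, and Cr."""
    r = y + (cr - 128) * (1.0 - KR) / 0.5
    b = y + (cb - 128) * (1.0 - KB) / 0.5
    g = (y - KR * r - KB * b) / KG
    return r, g, b


# A gray pixel carries no color information: both chrominance values sit at 128.
print(rgb_to_ycbcr(90, 90, 90))
```

The inverse function shows how the two chrominance values, combined with luminance, yield the red, green, and blue levels of each pixel.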
Such a color scheme is convenient because the human eye is more sensitive to luminance than to color, so the chrominance planes are often stored and encoded at a lower image resolution than the luminance information. Specifically, video compression algorithms typically encode the chrominance planes with half the horizontal resolution and half the vertical resolution of the luminance plane. Thus, for every 16-pixel by 16-pixel region in the luminance plane, each chrominance plane contains one eight-pixel by eight-pixel block. In typical video compression algorithms, a "macro block" is a 16-pixel by 16-pixel region in the video frame that contains four eight-by-eight luminance blocks and the two corresponding eight-by-eight chrominance blocks. Macro blocks allow motion estimation and compensation, described below, to be used in conjunction with sub-sampling of the chrominance planes as described above.
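The block counts in a macro block follow directly from the sub-sampling. In the sketch below, the chrominance plane is reduced by averaging each two-by-two neighborhood; the averaging filter is an assumption (the text does not specify how the lower-resolution planes are produced), and plane dimensions are assumed to be even multiples of the block size.

```python
def subsample_2x2(plane):
    """Halve a plane's width and height by averaging each 2x2 neighborhood."""
    height, width = len(plane), len(plane[0])
    return [[(plane[y][x] + plane[y][x + 1] +
              plane[y + 1][x] + plane[y + 1][x + 1]) / 4.0
             for x in range(0, width, 2)]
            for y in range(0, height, 2)]


def split_into_blocks(plane, size=8):
    """Cut a plane into size-by-size blocks, listed in raster order."""
    height, width = len(plane), len(plane[0])
    return [[row[x:x + size] for row in plane[y:y + size]]
            for y in range(0, height, size)
            for x in range(0, width, size)]


# One 16x16 region: full-resolution luminance, sub-sampled chrominance.
luminance = [[x + y for x in range(16)] for y in range(16)]
chrominance = subsample_2x2([[128] * 16 for _ in range(16)])

print(len(split_into_blocks(luminance)))    # 4
print(len(split_into_blocks(chrominance)))  # 1
```

With two chrominance planes, this gives the six eight-by-eight blocks per macro block described above.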
Adding Motion To The Mix
Using the techniques described above, still-image compression algorithms such as JPEG can achieve good image quality at a compression ratio of about 10:1. The most advanced still-image coders may achieve good image quality at compression ratios as high as 30:1. Video compression algorithms, however, employ motion estimation and compensation to take advantage of the similarities between consecutive video frames. This allows video compression algorithms to achieve good video quality at compression ratios up to 200:1.
In some video scenes, such as a news program, little motion occurs. In this case, the majority of the eight-pixel by eight-pixel blocks in each video frame are identical or nearly identical to the corresponding blocks in the previous frame. A compression algorithm can take advantage of this fact by computing the difference between the two frames, and using the still-image compression techniques described above to encode this difference. Because the difference is small for most of the image blocks, it can be encoded with many fewer bits than would be required to encode each frame independently. If the camera pans or large objects in the scene move, however, then each block no longer corresponds to the same block in the previous frame. Instead, each block is similar to an eight-pixel by eight-pixel region in the previous frame that is offset from the block's location by a distance that corresponds to the motion in the image. Note that each video frame is typically composed of a luminance plane and two chrominance planes as described above. Obviously, the motion in each of the three planes is the same. To take advantage of this fact despite the different resolutions of the luminance and chrominance planes, motion is analyzed in terms of macro blocks rather than working with individual eight-by-eight blocks in each of the three planes.
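The low-motion case can be illustrated with a small sketch that computes the difference between co-located blocks in two frames and counts how many blocks did not change at all. The function names and the tiny test frames are hypothetical; frames are assumed to be single planes whose dimensions are multiples of the block size.

```python
def block_residual(current, previous, top, left, size=8):
    """Pixel-wise difference between co-located blocks of two frames."""
    return [[current[top + y][left + x] - previous[top + y][left + x]
             for x in range(size)] for y in range(size)]


def count_unchanged_blocks(current, previous, size=8):
    """Count the blocks whose difference from the previous frame is all zero."""
    height, width = len(current), len(current[0])
    unchanged = 0
    for top in range(0, height, size):
        for left in range(0, width, size):
            residual = block_residual(current, previous, top, left, size)
            if all(value == 0 for row in residual for value in row):
                unchanged += 1
    return unchanged


# Two 16x16 frames that differ in a single pixel of the top-left block.
previous = [[50] * 16 for _ in range(16)]
current = [row[:] for row in previous]
current[2][3] = 60

print(count_unchanged_blocks(current, previous))  # 3
```

An all-zero difference block costs almost nothing to encode, which is why static scenes compress so well with this approach.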
Motion Compensation
In the video decoder, motion compensation uses the motion vectors encoded in the video bit stream to predict the pixels in each macro block. If the horizontal and vertical components of the motion vector are both integer values, then the predicted macro block is simply a copy of the 16-pixel by 16-pixel region of the reference frame. If either component of the motion vector has a non-integer value, interpolation is used to estimate the image at non-integer pixel locations. Next, the prediction error is decoded and added to the predicted macro block in order to reconstruct the actual macro block pixels.
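The decoder-side steps described above can be sketched as follows. This illustration assumes motion vectors expressed in half-pixel units and simple averaging of neighboring pixels for interpolation; actual standards define their own vector precision and interpolation filters. The function names are hypothetical.

```python
def predict_block(reference, top, left, mv_x, mv_y, size=16):
    """Predict a size-by-size block from a reference frame.

    (mv_x, mv_y) is in half-pixel units (an assumption of this sketch):
    even values are whole-pixel displacements, odd values need interpolation.
    The displaced region, plus one extra row and column when interpolating,
    must lie inside the reference frame.
    """
    x0, frac_x = left + mv_x // 2, mv_x % 2
    y0, frac_y = top + mv_y // 2, mv_y % 2
    predicted = []
    for y in range(size):
        row = []
        for x in range(size):
            a = reference[y0 + y][x0 + x]
            b = reference[y0 + y][x0 + x + frac_x]
            c = reference[y0 + y + frac_y][x0 + x]
            d = reference[y0 + y + frac_y][x0 + x + frac_x]
            # With integer vectors a == b == c == d, so this is a plain copy.
            row.append((a + b + c + d) / 4.0)
        predicted.append(row)
    return predicted


def reconstruct_block(predicted, residual):
    """Add the decoded prediction error to the predicted block."""
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(predicted, residual)]


reference = [[x + 10 * y for x in range(32)] for y in range(32)]
copied = predict_block(reference, 8, 8, 2, -4)        # integer displacement
interpolated = predict_block(reference, 8, 8, 1, 0)   # half-pixel to the right
print(copied[0][0], interpolated[0][0])  # 69.0 88.5
```

The integer-vector path does no arithmetic that changes pixel values, while the fractional path reads extra neighboring pixels for every output pixel, which is why interpolation-heavy content raises the motion compensation workload.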
Compared to motion estimation, motion compensation is much less computationally demanding. While motion estimation must perform a sum-of-absolute-differences (SAD) or sum-of-squared-differences (SSD) computation on a number of 16-pixel by 16-pixel regions per macro block, motion compensation simply copies or interpolates one such region. Because of this important difference, video decoding is much less computationally demanding than encoding. Nevertheless, motion compensation can still take up as much as 40% of the processor cycles in a video decoder, although this number varies greatly depending on the content of a video sequence, the video compression standard, and the decoder implementation. For example, the motion compensation workload can comprise as little as 5% of the processor cycles spent in the decoder for a frame that makes little use of interpolation.
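For reference, the two block-matching measures mentioned above can be written as follows. This is a minimal sketch with hypothetical function names; the number of candidate regions in the usage example is an arbitrary value chosen only to show how the encoder's work scales relative to the decoder's single copy.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def ssd(block_a, block_b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


print(sad([[1, 2], [3, 4]], [[2, 2], [1, 7]]))  # 6
print(ssd([[1, 2], [3, 4]], [[2, 2], [1, 7]]))  # 14

# Each SAD or SSD over a 16x16 region touches 256 pixel pairs. An encoder that
# tests, say, 100 candidate regions (a hypothetical count) compares 25,600
# pixel pairs per macro block; the decoder copies or interpolates one region.
PIXELS_PER_REGION = 16 * 16
print(100 * PIXELS_PER_REGION)  # 25600
```

A lower SAD or SSD indicates a closer match between the two regions; SSD penalizes large individual differences more heavily than SAD does.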
Like motion estimation, motion compensation requires the video decoder to keep one or two reference frames in memory, often requiring external memory chips for this purpose. However, motion compensation makes fewer accesses to reference frame buffers than does motion estimation. Therefore, memory bandwidth requirements are less stringent for motion compensation compared to motion estimation, although high memory bandwidth is still desirable for best processor performance in motion compensation functions.