Media Compression
Video compression refers to reducing the quantity of data used to represent video images and is a straightforward combination of image compression and motion compensation. This article deals with its applications: compressed video can effectively reduce the bandwidth required to transmit digital video via terrestrial broadcast, via cable, or via satellite services.
Audio compression is a form of data compression designed to reduce the size of audio files. Audio compression algorithms are implemented in computer software as audio codecs. Generic data compression algorithms perform poorly with audio data, seldom reducing file sizes much below 87% of the original, and are not designed for use in real time. Consequently, specific audio "lossless" and "lossy" algorithms have been created. Lossy algorithms provide far greater compression ratios and are used in mainstream consumer audio devices.
Image compression is the application of data compression to digital images. In effect, the objective is to reduce redundancy in the image data so that it can be stored or transmitted in an efficient form.

Practical Needs For Image And Video Compression:
Needless to say, visual information is of vital importance if human beings are to perceive, recognize,
and understand the surrounding world. With the tremendous progress that has been made in
advanced technologies, particularly in very large scale integrated (VLSI) circuits, and increasingly
powerful computers and computations, it is becoming more than ever possible for video to be
widely utilized in our daily lives. Examples include videophony, videoconferencing, high definition
TV (HDTV), and the digital video disk (DVD), to name a few.
Video as a sequence of video frames, however, involves a huge amount of data. Let us take a look at an illustrative example. Assume that a public switched telephone network (PSTN) modem can operate at a maximum bit rate of 56,600 bits per second. Assume each video frame has a resolution of 288 by 352 (288 lines and 352 pixels per line), which is comparable with that of a normal TV picture and is referred to as common intermediate format (CIF). Each of the three primary colors RGB (red, green, blue) is represented for 1 pixel with 8 bits, as usual, and the frame rate in transmission is 30 frames per second to provide continuous-motion video. The required bit rate, then, is 288 x 352 x 8 x 3 x 30 = 72,990,720 bps. Therefore, the ratio between the required bit rate and the largest possible bit rate is about 1290. This implies that we have to compress the video data by a factor of roughly 1290 in order to accomplish the transmission described in this example. Note that the audio signal has not yet been accounted for in this illustration.
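The arithmetic in this example can be checked with a short script. This is only a sketch of the calculation described above; the function and constant names are illustrative and do not come from the source.

```python
# Worked version of the CIF-over-modem example.
LINES_PER_FRAME = 288
PIXELS_PER_LINE = 352
BITS_PER_COMPONENT = 8
COMPONENTS_PER_PIXEL = 3      # R, G, B
FRAMES_PER_SECOND = 30
MODEM_BPS = 56_600            # maximum modem bit rate assumed in the text


def raw_video_bit_rate(lines, pixels_per_line, bits, components, fps):
    """Uncompressed bit rate in bits per second."""
    return lines * pixels_per_line * bits * components * fps


required_bps = raw_video_bit_rate(
    LINES_PER_FRAME,
    PIXELS_PER_LINE,
    BITS_PER_COMPONENT,
    COMPONENTS_PER_PIXEL,
    FRAMES_PER_SECOND,
)
compression_ratio = required_bps / MODEM_BPS

print(required_bps)                  # 72990720
print(round(compression_ratio, 1))   # 1289.6
```

Changing any one parameter (for example the frame size) shows how quickly the raw bit rate outgrows the channel.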
With increasingly complex video services such as 3-D movies and 3-D games, and high video quality such as HDTV, advanced image and video data compression is necessary. It becomes an enabling technology to bridge the gap between the required huge amount of video data and the limited hardware capability.
Removing Redundancy for Essential Media (Image/Video) Compression:
- Statistical Redundancy: Statistical redundancy can be classified into two types: interpixel redundancy and coding redundancy.
By interpixel redundancy we mean that pixels of an image frame and pixels of a group of successive image or video frames are not statistically independent. On the contrary, they are correlated to various degrees. This type of interpixel correlation is referred to as interpixel redundancy. Interpixel redundancy can be divided into two categories, spatial redundancy and temporal redundancy. By coding redundancy we mean the statistical redundancy associated with coding techniques.
- Spatial Redundancy: Spatial redundancy represents the statistical correlation between pixels within an image frame. Hence it is also called intraframe redundancy. Spatial redundancy implies that the intensity value of a pixel can be guessed from that of its neighboring pixels. In other words, it is not necessary to represent each pixel in an image frame independently. Instead, one can predict a pixel from its neighbors.
- Temporal Redundancy: Temporal redundancy is concerned with the statistical correlation between pixels from successive frames in a temporal image or video sequence. Therefore, it is also called interframe redundancy.
- Coding Redundancy: Coding redundancy is different. It has nothing to do with information redundancy but with the representation of information, i.e., the coding itself. The idea is to replace the natural binary code, in which each symbol is encoded with a fixed-length code word, with a variable-length code that exploits the nonuniform probabilities of the symbols (a nonuniform histogram). Two common methods are Huffman coding and LZW coding.
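As a minimal sketch of how a variable-length code exploits a nonuniform histogram, the following Python builds Huffman code lengths for a small, made-up symbol sequence and compares the total with a fixed-length code. The symbol values (imagined here as small prediction residuals) and the function name are assumptions for illustration, not data from the source.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over `symbols`."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {symbol: 1 for symbol in counts}
    # Heap entries are (weight, tie_breaker, {symbol: depth so far}).
    # The unique integer tie_breaker keeps the dicts from ever being compared.
    heap = [(weight, i, {symbol: 0})
            for i, (symbol, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, depths1 = heapq.heappop(heap)
        w2, _, depths2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**depths1, **depths2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Hypothetical symbols with a skewed histogram: 0 is common, -1 and 2 are rare.
symbols = [0] * 8 + [1] * 4 + [-1] * 2 + [2] * 2

lengths = huffman_code_lengths(symbols)
counts = Counter(symbols)
huffman_bits = sum(counts[s] * lengths[s] for s in counts)

# Fixed-length (natural binary) code: the same number of bits for every symbol.
fixed_bits_per_symbol = max(1, (len(counts) - 1).bit_length())
fixed_bits = fixed_bits_per_symbol * len(symbols)

print(lengths)       # {0: 1, 1: 2, -1: 3, 2: 3}
print(huffman_bits)  # 28
print(fixed_bits)    # 32
```

The most frequent symbol receives the shortest code word, so the 16 symbols take 28 bits instead of 32. The saving grows as the histogram becomes more skewed.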
- Psychovisual Redundancy: While interpixel redundancy inherently rests in image and video data, psychovisual redundancy originates from the characteristics of the human visual system (HVS).
It is known that the HVS perceives the outside world in a rather complicated way. Its response to visual stimuli is not a linear function of the strength of some physical attributes of the stimuli, such as intensity and color. HVS perception is different from camera sensing. In the HVS, visual information is not perceived equally; some information may be more important than other information. This implies that if we use less data to represent less important visual information, perception will not be affected. In this sense, we see that some visual information is psychovisually redundant. Eliminating this type of psychovisual redundancy leads to data compression.
- Luminance Masking: Luminance masking concerns the brightness perception of the HVS.
- Texture Masking: Texture masking is sometimes also called detail dependence (Connor et al., 1972), spatial masking (Netravali and Prasada, 1977; Lim, 1990), or activity masking (Mitchell et al., 1997). It states that the discrimination threshold increases with increasing picture detail. That is, the stronger the texture, the larger the discrimination threshold.
- Frequency Masking: While the above two characteristics are picture dependent in nature, frequency masking is picture independent. It states that the discrimination threshold increases as frequency increases. It is also referred to as frequency dependence.
- Temporal Masking: Temporal masking is another picture-independent feature of the HVS. It states that it takes a while for the HVS to adapt itself to the scene when the scene changes abruptly. During this transition the HVS is not sensitive to details. The masking takes place both before and after the abrupt change. It is called forward temporal masking if it happens after the scene change. Otherwise, it is referred to as backward temporal masking (Mitchell et al., 1997).
- Color Masking: A color, as a sensation of visible light, is an energy with an intensity as well as a set of wavelengths associated with the electromagnetic spectrum. Obviously, intensity is an attribute of visible light. The composition of wavelengths is another attribute: chrominance. There are two elements in the chrominance attribute: hue and saturation. The hue of a color is characterized by the dominant wavelength in the composition. Saturation is a measure of the purity of a color. A pure color has a saturation of 100%, whereas white light has a saturation of 0.
RGB model — The red-green-blue (RGB) primary color system is the best known of several color systems. This is due to the following feature of the human perception of color. The color-sensitive area in the HVS consists of three different sets of cones, and each set is sensitive to the light of one of the three primary colors: red, green, and blue.
HSI model — In this model, I stands for the intensity component, H for the hue component, and S for saturation. One merit of this color system is that the intensity component is decoupled from the chromatic components.
YUV model — In this model, Y denotes the luminance component, and U and V are the two chrominance components. The luminance Y can be determined from the RGB model via the following relation:
Y = 0.299R + 0.587G + 0.114B
It is noted that the three weights associated with the three primary colors, R, G, and B, are not the same. Their different magnitudes reflect different responses of the HVS to different primary colors. Instead of being directly related to hue and saturation, the other two chrominance components, U and V, are defined as color differences as follows.
U = 0.492(B - Y)
V = 0.877(R - Y)
In this way, the YUV model lowers computational complexity. It has been used in PAL (Phase Alternating Line) TV systems. Note that PAL is an analog composite color TV standard and is used in most European countries, some Asian countries, and Australia.
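The YUV relations above translate directly into code. The sketch below assumes gamma-corrected R, G, and B values on an 8-bit 0 to 255 scale; that range and the function name are assumptions for illustration, while the weights are those given in the text.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB sample to (Y, U, V) using the weights given in the text."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance
    u = 0.492 * (b - y)                     # scaled blue color difference
    v = 0.877 * (r - y)                     # scaled red color difference
    return y, u, v


gray = rgb_to_yuv(128, 128, 128)   # neutral gray: both color differences vanish
red = rgb_to_yuv(255, 0, 0)        # saturated red: U is negative, V is positive

print(gray)
print(red)
```

A gray pixel carries no chrominance, and a saturated color shows that U and V can take either sign.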
YIQ model — This color space has been utilized in NTSC (National Television System Committee) TV systems for years. Note that NTSC is an analog composite color TV standard and is used in North America and Japan. The Y component is still the luminance. The two chrominance components are a linear transformation of the U and V components defined in the YUV model. Specifically,
I = -0.545U + 0.839V
Q = 0.839U + 0.545V
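A small sketch of the I and Q relations follows. It also checks an observation that is not stated in the text: the two coefficients agree with sin 33° and cos 33° to three decimals, so the transform (nearly) preserves the length of the chrominance vector in the (U, V) plane. The function name and the sample values are hypothetical.

```python
import math


def yuv_to_yiq(y, u, v):
    """Map (Y, U, V) to (Y, I, Q) with the coefficients given in the text."""
    i = -0.545 * u + 0.839 * v
    q = 0.839 * u + 0.545 * v
    return y, i, q


# The coefficients match sin(33 deg) and cos(33 deg) after rounding.
angle = math.radians(33)
print(round(math.sin(angle), 3), round(math.cos(angle), 3))  # 0.545 0.839

# Hypothetical sample: the chrominance magnitude is almost unchanged.
y, u, v = 100.0, 20.0, -30.0
_, i, q = yuv_to_yiq(y, u, v)
print(math.hypot(u, v), math.hypot(i, q))
```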
YDbDr model — The YDbDr model is used in the SECAM (Séquentiel Couleur à Mémoire) TV system. Note that SECAM is used in France, Russia, and some eastern European countries. The Db and Dr components are scaled versions of the U and V components of the YUV model:
Db = 3.059U
Dr = -2.169V
YCbCr model — From the above, we can see that the U and V chrominance components are differences between the gamma-corrected color B and the luminance Y, and the gamma-corrected R and the luminance Y, respectively. The chrominance component pairs I and Q, and Db and Dr are both linear transforms of U and V. Hence they are very closely related to each other. It is noted that U and V may be negative as well. In order to make chrominance components nonnegative, the Y, U, and V are scaled and shifted to produce the YCbCr model, which is used in the international coding standards JPEG and MPEG.
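The text does not give the scale and shift values. As a hedged sketch, the code below uses one common full-range convention (the one associated with JFIF-style JPEG files), in which each color difference is normalized to span about ±127.5 and then offset by 128. The constants and the function name are assumptions, not taken from the source; other standards use different ranges.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range YCbCr from 8-bit RGB (assumed JFIF-style scaling)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    # (B - Y) spans +/- 255 * (1 - 0.114); halve and normalize, then shift by 128.
    cb = 128 + 0.5 * (b - y) / (1 - 0.114)
    # (R - Y) spans +/- 255 * (1 - 0.299); same treatment.
    cr = 128 + 0.5 * (r - y) / (1 - 0.299)
    return y, cb, cr


# Check the extreme chrominance values over the corners of the RGB cube.
corners = [(r, g, b) for r in (0, 255) for g in (0, 255) for b in (0, 255)]
chroma = [rgb_to_ycbcr(*c)[1:] for c in corners]
cmin = min(min(pair) for pair in chroma)
cmax = max(max(pair) for pair in chroma)

print(cmin, cmax)
print(rgb_to_ycbcr(128, 128, 128))   # gray maps to Cb = Cr = 128 (up to rounding)
```

Under this assumed scaling the chrominance values stay nonnegative, ranging from roughly 0.5 to 255.5, so a final round-and-clamp step would be needed before storing them as 8-bit integers.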
