Perceptual coding
Perceptual coding is a method of lossy data compression that exploits the limitations of the human sensory system to reduce data size. It is widely applied in audio, image, and video compression standards, where certain details are removed or simplified because they are unlikely to be noticed under normal listening or viewing conditions.
Perceptual coding is based on models of human hearing and vision, such as those studied in psychoacoustics and psychovisual research. By discarding or reducing components of a signal that fall below perceptual thresholds, it achieves significant reductions in bit rate while maintaining subjectively acceptable quality.
Applications
Perceptual coding is central to many everyday technologies, including:
- Audio compression: Formats such as MP3, AAC, and Opus apply psychoacoustic models to remove inaudible frequencies or sounds masked by louder ones.
- Image compression: Standards such as JPEG rely on psychovisual principles, for example by using chroma subsampling (reducing color resolution relative to brightness).
- Video compression: Standards such as MPEG-2, H.264/AVC, and HEVC use methods similar to those of image compression and add further principles such as temporal masking (reducing detail during rapid motion).
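The chroma subsampling mentioned in the list above can be illustrated with a minimal sketch. It assumes the full-range BT.601 color conversion used by JFIF-style JPEG files and a simple 2×2 averaging filter (often called 4:2:0); real encoders differ in filter choice and sample positioning, and the function names here are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range BT.601 YCbCr (assumed convention)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_2x2(plane):
    """Average each 2x2 block of a plane given as a list of equal-length rows.

    Assumes the width and height are both even.
    """
    return [
        [
            (plane[i][j] + plane[i][j + 1] + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
            for j in range(0, len(plane[0]), 2)
        ]
        for i in range(0, len(plane), 2)
    ]


def encode_420(rgb_rows):
    """Keep brightness (Y) at full resolution; halve both chroma planes in each direction."""
    converted = [[rgb_to_ycbcr(*pixel) for pixel in row] for row in rgb_rows]
    y = [[p[0] for p in row] for row in converted]
    cb = [[p[1] for p in row] for row in converted]
    cr = [[p[2] for p in row] for row in converted]
    return y, subsample_2x2(cb), subsample_2x2(cr)


def count_samples(*planes):
    """Total number of stored samples across the given planes."""
    return sum(len(row) for plane in planes for row in plane)
```

For a 4×4 image, the three full-resolution planes hold 48 samples, while the output of `encode_420` holds 16 + 4 + 4 = 24, so half the samples are dropped before any further compression takes place.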
History
Early analog applications
The principles of perceptual coding were applied in analog communication systems long before the advent of digital media.
- Telephony: Early telephone networks restricted audio transmission to a narrow band of roughly 300 Hz–3.4 kHz. Although much of the audible spectrum was discarded, this range was sufficient for intelligible speech, exploiting the fact that human listeners rely primarily on mid-range frequencies for communication[1].
- FM stereo broadcasting: Introduced in the 1960s, FM stereo used a sum-and-difference transmission system. The (L+R) signal carried the main content at full fidelity, while the (L−R) difference was modulated onto a subcarrier. This reduced bandwidth usage by assuming that much of the information in stereo channels is shared, while still providing an adequate sense of spatial separation[2].
- Color television: Beginning in the 1950s, color TV systems such as NTSC, PAL, and SECAM took advantage of the human eye’s greater sensitivity to brightness than to color. They encoded a high-resolution luminance channel alongside lower-resolution chrominance channels, allowing backward compatibility with black-and-white sets and conserving broadcast bandwidth[3].
- Fax transmission: Facsimile (fax) machines, particularly with the ITU-T Group 3 standard (1980), employed compression methods such as run-length encoding to reduce data. Standard fax resolutions (e.g., 200 × 100 dpi) were chosen to preserve legibility of text while omitting fine details like paper texture or ink gradients, relying on the psychovisual observation that readers perceive documents as intact even when such subtleties are lost[4][5].
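The run-length encoding named in the fax item above can be sketched as follows. This is a simplified illustration on a single row of black-and-white pixels, with hypothetical function names; Group 3 additionally assigns variable-length codes to the run lengths, which this sketch omits.

```python
def rle_encode(row):
    """Encode a row of 0/1 pixels as a list of [value, run_length] pairs."""
    runs = []
    for pixel in row:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([pixel, 1])   # start a new run
    return runs


def rle_decode(runs):
    """Rebuild the original pixel row from [value, run_length] pairs."""
    row = []
    for value, length in runs:
        row.extend([value] * length)
    return row
```

A scanned text line is mostly long stretches of white with short stretches of black, so a row such as `[0, 0, 0, 0, 0, 1, 1, 1, 0, 0]` collapses to three runs. The run-length step itself is lossless; the perceptual saving in fax comes from the choice of resolution and two-level pixels.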
These analog systems demonstrated the effectiveness of tailoring transmission to the characteristics of human perception, laying the groundwork for digital perceptual coding methods.
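The sum-and-difference arrangement described for FM stereo can be written as a small matrixing sketch. It works on lists of sample values and ignores the subcarrier modulation and pilot tone of a real broadcast; the function names are hypothetical.

```python
def matrix_encode(left, right):
    """Form the sum (L+R) and difference (L-R) signals from two channels."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff


def matrix_decode(total, diff):
    """Recover left and right from the sum and difference signals."""
    left = [(s + d) / 2 for s, d in zip(total, diff)]
    right = [(s - d) / 2 for s, d in zip(total, diff)]
    return left, right
```

A receiver that uses only the sum signal still gets a complete mono mix, and where the two channels carry identical content the difference signal is zero.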
Digital development
Research in the 1970s and 1980s on psychoacoustics and psychovisual modeling provided the basis for digital perceptual coding. In audio, this led to the late-1980s development of the MPEG audio formats (such as MP3), which achieved major reductions in bit rate by discarding inaudible sound components. At the same time, the MPEG standards applied similar principles to video, using techniques such as chroma subsampling and motion-adaptive coding.
During the 1990s and 2000s, perceptual coding was embedded in widely used formats including AAC, MPEG-2 video, and H.264/AVC, supporting the rise of digital media distribution on CDs, DVDs, and early internet platforms. More recent codecs, such as HEVC, AV1, and Opus, continue to refine perceptual models to balance compression efficiency with quality on modern networks and devices.
Relation to other fields
Perceptual coding draws directly on psychoacoustics (the study of auditory perception) and psychovisual research (the study of visual perception). These disciplines provide the models that determine which parts of a signal can be safely removed without affecting perceived quality.
References
- Mathialagan, A. (1984-10-01). "Automatic Telephony: Components". IETE Journal of Education. doi:10.1080/09747338.1984.11436022. ISSN 0974-7338.
- Sterling, Christopher H. (1971-06-01). "Decade of Development: FM Radio in the 1960s". Journalism Quarterly. 48 (2): 222–230. doi:10.1177/107769907104800204. ISSN 0022-5533.
- Sterne, Jonathan; Mulvin, Dylan (2014-08-01). "The Low Acuity for Blue: Perceptual Technics and American Color Television". Journal of Visual Culture. 13 (2): 118–138. doi:10.1177/1470412914529110. ISSN 1470-4129.
- McCullough, T. L. (1980-05-19). "CCITT standardization for digital facsimile". Proceedings of the May 19-22, 1980, national computer conference. AFIPS '80. New York, NY, USA: Association for Computing Machinery: 409–413. doi:10.1145/1500518.1500582. ISBN 978-1-4503-7923-6.
- Mitchell, Joan L. (1980-05-19). "Facsimile image coding". Proceedings of the May 19-22, 1980, national computer conference. AFIPS '80. New York, NY, USA: Association for Computing Machinery: 423–426. doi:10.1145/1500518.1500584. ISBN 978-1-4503-7923-6.