CSC361/661 -- Digital Media
Spring 2002
Burg/Wong
Digital Audio
I. Compression
Methods and Standards
There are two issues to consider: compression methods and file types. File types are generally identified to the user by the suffix on the file name, like .au, .midi, or .mp3.
A given file type may support more than one compression method. For example, a .au file can be compressed with μ-law encoding or ADPCM (see below).
The file header indicates the file type and compression method so that the decoder knows how to decode it.
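As an illustration of how a header can announce the compression method, here is a minimal Python sketch that reads the first 24 bytes of a .au file. It assumes the commonly documented .au layout (six big-endian 32-bit fields beginning with the magic string ".snd") and lists only a few encoding codes. The function name read_au_header and the sample header built at the end are hypothetical, made up for this example.

```python
import struct

# A few encoding codes from the commonly documented .au header layout.
# This table is deliberately incomplete.
AU_ENCODINGS = {
    1: "8-bit mu-law",
    3: "16-bit linear PCM",
    23: "4-bit G.721 ADPCM",
    27: "8-bit A-law",
}

def read_au_header(data: bytes) -> dict:
    """Decode the six big-endian 32-bit fields at the start of a .au file."""
    magic, offset, size, encoding, rate, channels = struct.unpack(
        ">4sIIIII", data[:24]
    )
    if magic != b".snd":
        raise ValueError("not a .au file")
    return {
        "data_offset": offset,
        "data_size": size,
        "compression": AU_ENCODINGS.get(encoding, f"unknown code {encoding}"),
        "sample_rate": rate,
        "channels": channels,
    }

# Build a made-up header in memory: mu-law, 8000 samples/sec, mono.
sample_header = struct.pack(">4sIIIII", b".snd", 24, 16000, 1, 8000, 1)
info = read_au_header(sample_header)
print(info["compression"], info["sample_rate"], info["channels"])
```

A decoder would branch on the compression field to pick the right decoding routine before touching the sample data.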
Linear vs. Logarithmic Encoding
In linear encoding, the unit of measurement of the represented sound is constant from one sample to the next. In logarithmic encoding, on the other hand, the unit gets larger as the sample value increases.
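Logarithmic encoding has the advantage of representing a greater range of sound levels with fewer total bits. However, because its steps are coarser at high amplitudes, it introduces more quantization noise for loud samples than linear encoding with enough bits to cover the same range.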
μ-law (mu-law) and A-law encoding
μ-law (mu-law) encoding is used in North America and Japan, and A-law encoding is used in the rest of the world. ITU Recommendation G.711 defines the standard for both. μ-law encoding is widely used in telecommunications (e.g., for voice transmission over telephones). μ-law and A-law encoding can be applied to a number of file types, including Sun and NeXT’s .au files and Microsoft’s .wav files. Typically, μ-law compressed speech is carried in 8-bit samples.
Both μ-law and A-law encoding use a non-linear companding technique, which means they carry more detailed information about low-amplitude signals than about high-amplitude signals. The compression ratio is 2:1.
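To make the companding idea concrete, here is a minimal Python sketch based on the continuous mu-law curve, y = sign(x) ln(1 + mu|x|) / ln(1 + mu), with mu = 255. This is an illustration only: G.711 itself uses a segmented approximation of the curve, and the helper names and the uniform 8-bit quantizer below are assumptions made for this example.

```python
import math

MU = 255  # value of mu associated with 8-bit mu-law

def mu_compress(x: float) -> float:
    """Map a sample in [-1, 1] onto [-1, 1] along the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y: float) -> float:
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def encode_8bit(x: float) -> int:
    """Compress, then quantize uniformly to an 8-bit code (0..255)."""
    return round((mu_compress(x) + 1) / 2 * 255)

def decode_8bit(code: int) -> float:
    """Map an 8-bit code back to an approximate sample value."""
    return mu_expand(code / 255 * 2 - 1)

# Compare the round-trip error for a quiet sample and a loud sample.
quiet, loud = 0.01, 0.9
err_quiet = abs(decode_8bit(encode_8bit(quiet)) - quiet)
err_loud = abs(decode_8bit(encode_8bit(loud)) - loud)
print(f"quiet error: {err_quiet:.6f}   loud error: {err_loud:.6f}")
```

The quiet sample comes back with a much smaller absolute error than the loud one, which is the sense in which the encoding keeps more detail about low-amplitude signals.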
ADPCM
The Adaptive Differential Pulse Code Modulation (ADPCM) compression technique quantizes the difference between the sound signal and a prediction that has been made of the sound signal (as opposed to quantizing the sound signal directly). If the predictions are generally accurate, then the difference between the real and predicted speech samples will have a lower variance than the real speech samples, and the information can be accurately quantized with fewer bits than would be needed to quantize the original speech samples.
The performance of this compression technique is aided by using adaptive prediction and quantization, so that the predictor and difference quantizer adapt to the changing characteristics of the speech being coded. That is, the quantizer uses smaller steps when the differences are small and larger steps when they are large.
ADPCM can be applied to .wav files. Generally, you start with 16-bit samples and ADPCM compresses them to 4 bits, for a 4:1 compression ratio.
ADPCM is standardized as CCITT G.721. (CCITT stands for the International Telegraph and Telephone Consultative Committee, from its French name, Comité Consultatif International Téléphonique et Télégraphique.)
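The idea of quantizing a prediction error with an adaptive step size can be sketched in a few lines of Python. This is a deliberately simplified toy, not G.721 or IMA ADPCM: the predictor is just the previously reconstructed sample, and the step-size rule in adapt (double on large codes, halve on small ones) is an assumption chosen for readability.

```python
import math

def adapt(step: int, code: int) -> int:
    """Toy adaptation rule: grow the step after big codes, shrink after small ones."""
    if abs(code) >= 6:
        return min(step * 2, 4096)
    if abs(code) <= 1:
        return max(step // 2, 1)
    return step

def adpcm_encode(samples, step=16):
    """Turn integer samples into 4-bit signed codes (-8..7)."""
    codes, predicted = [], 0
    for s in samples:
        diff = s - predicted                      # prediction error
        code = max(-8, min(7, round(diff / step)))
        codes.append(code)
        predicted += code * step                  # same value the decoder rebuilds
        step = adapt(step, code)
    return codes

def adpcm_decode(codes, step=16):
    """Rebuild approximate samples by mirroring the encoder's state."""
    out, predicted = [], 0
    for code in codes:
        predicted += code * step
        out.append(predicted)
        step = adapt(step, code)
    return out

# A sine wave with 16-bit-style amplitudes, then a constant signal.
sine = [int(8000 * math.sin(2 * math.pi * i / 50)) for i in range(200)]
sine_codes = adpcm_encode(sine)
sine_decoded = adpcm_decode(sine_codes)
flat_decoded = adpcm_decode(adpcm_encode([1000] * 50))
print(sine_codes[:10], flat_decoded[-1])
```

Each code fits in 4 bits, so storing codes in place of 16-bit samples gives the 4:1 ratio mentioned above. The decoder needs no side information because it updates its step size from the codes alone, exactly as the encoder does.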
MACE – Apple has a proprietary version called ACE/MACE. MACE stands for Macintosh Audio Compression and Expansion. MACE allows for 3:1 or 6:1 compression of .aiff files. It can be used with the sound portion of QuickTime movies. It supports only 8-bit sound data.
IMA/ADPCM – IMA stands for Interactive Multimedia Association. It offers reasonably fast encoding and decoding, and it degrades the sample quality only slightly. The standard IMA compression ratio is 4:1 for 16-bit sound. (8-bit sound is not supported.) Apple Computer has integrated IMA 4:1 audio compression into both QuickTime and the Sound Manager, and Microsoft has integrated it into Video for Windows.
MPEG – MPEG refers to both a file type and a compression method; more precisely, it is a whole class of file types and compression methods. Here’s a description borrowed from http://www-cs-students.stanford.edu/~franke/SoundApp/formats.html#mpeg
MPEG Audio: (.mp, .mp2, .mp3, .m1a, .m2a, .mpg, .mpeg, .swa) MPEG stands for the "Moving Picture Experts Group." MPEG audio files can be either layer I, II or III. Increasing layer numbers add complexity to the format and require more effort to encode and decode. However, they also provide higher playback quality for the same bit rate. To further complicate matters, MPEG files come in two flavors, MPEG-1 and MPEG-2. The encodings for the three layers are mostly the same; however, MPEG-2 streams have lower sampling rates for better fidelity at lower bit rates. Files can have sampling rates of 32000, 44100 and 48000 Hz for MPEG-1 and 16000, 22050 and 24000 Hz for MPEG-2. MPEG data can be in stereo or mono and decompresses to 16-bit resolution. MPEG compression is a lossy algorithm based on perceptual encodings, which can achieve high rates of compression without a noticeable decrease in quality. Typical compression rates are around 10-to-1. Macromedia's Shockwave streaming audio system uses a layer III encoding with a non-standard header. These Shockwave audio files frequently have a ".swa" suffix.
MPEG files have a data rate of:
about 1.5 Mbits/sec for audio and video
about 1.2 Mbits/sec for video alone
about 0.3 Mbits/sec for audio alone
(For comparison, uncompressed CD audio at 44,100 samples/sec, 16 bits per sample, in stereo requires about 1.4 Mbits/sec.)
The compression factor for MPEG ranges from 2.7:1 to 24:1.
MPEG audio supports sampling frequencies of 32, 44.1, and 48 kHz
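The data-rate figures above follow from simple arithmetic, sketched here in Python. The CD parameters (44,100 samples/sec, 16 bits, 2 channels) are standard; the 128,000 bits/sec compressed stream is a hypothetical value chosen only to illustrate a ratio near the "around 10-to-1" figure quoted earlier.

```python
def pcm_bit_rate(sample_rate: int, bits_per_sample: int, channels: int) -> int:
    """Bits per second needed for uncompressed PCM audio."""
    return sample_rate * bits_per_sample * channels

# Uncompressed CD audio: 44,100 samples/sec, 16 bits per sample, stereo.
cd_rate = pcm_bit_rate(44_100, 16, 2)

# Hypothetical compressed stream, assumed for illustration only.
compressed_rate = 128_000

ratio = cd_rate / compressed_rate
bytes_per_minute = cd_rate * 60 // 8

print(f"CD audio: {cd_rate:,} bits/sec ({cd_rate / 1e6:.2f} Mbits/sec)")
print(f"Compression ratio at {compressed_rate:,} bits/sec: {ratio:.1f}:1")
print(f"One minute of CD audio: {bytes_per_minute:,} bytes")
```

The same function gives the uncompressed rate for any other combination of sampling rate, sample size, and channel count.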
| File Type | Acronym For | Originally Created By | Type of Compression | Platforms | Other Information |
| --- | --- | --- | --- | --- | --- |
| .aiff, .aif, .aifc | Audio Interchange File Format | Apple, adopted later by Silicon Graphics | .aiff generally not compressed; .aifc allows MACE 3:1 or 6:1 or ADPCM/IMA 4:1 compression | Mac and SGI and now also on Windows | flexible format, allowing arbitrary sample rates, sample sizes, and numbers of channels |
| .wav | | IBM and Microsoft (Microsoft’s is referred to as RIFF WAVE) | supports different compression formats; supports IMA/ADPCM 4:1 compression for 16-bit sound | cross-platform; widely supported in many browsers; no need for plug-in | good sound quality; arbitrary sampling rate and sample size; up to 16-bit, 44.1 kHz stereo |
| .au and .snd | Also called mu-law or Sun mu-law format | Sun and NeXT | mu-law encoding serves to compress the file at a ratio of 2:1; slow decompression | | good sound quality; arbitrary sampling rates and multi-channel sounds |
| .ra or .rm | Real Audio | | can use Sony’s ATRAC3 codec (Adaptive Transform Acoustic Coding) | | very high degree of compression; files can be streamed; sound quality poorer than .mp3 |
| .mp3 | Moving Picture Experts Group | Moving Picture Experts Group | DCT with Huffman. Lossy. See class notes | cross-platform | MPEG-1 layer III; sound quality can rival CD; small files, but larger than Real Audio |
| .swa | Shockwave | Macromedia | Uses layer III MPEG encoding | cross-platform | can be streamed |
| .asf | Advanced Streaming Format | Windows | | | can be encrypted with DRM (Digital Rights Management) |