CSC361/661 -- Digital Media

Spring 2002

Burg/Wong

Digital Audio

 

 

I.  Compression Methods and Standards

 

There are two issues to consider:  compression methods and file types.  File types are generally identified to the user by the suffix on the file name, like .au, .midi, or .mp3.  A given file type might be compressed with more than one method.  For example, a .au file can be compressed with μ-law encoding or ADPCM (see below).

 

The file header indicates the file type and compression method so that the decoder knows how to decode it.
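As an illustration of how a decoder can use the header, here is a minimal sketch that reads the fixed 24-byte header of a Sun/NeXT .au file.  The function name parse_au_header and the small AU_ENCODINGS table are hypothetical names chosen for this example, and the table lists only a few of the encoding codes the format defines.

```python
import struct

# A few of the encoding codes defined for .au files (not the full list).
AU_ENCODINGS = {
    1: "8-bit mu-law",
    3: "16-bit linear PCM",
    23: "4-bit G.721 ADPCM",
    27: "8-bit A-law",
}


def parse_au_header(data):
    """Return the basic fields of a .au header as a dictionary.

    The header holds six big-endian 32-bit fields: magic number,
    data offset, data size, encoding, sample rate, and channel count.
    """
    if len(data) < 24:
        raise ValueError("too short to be a .au file")
    magic, offset, size, encoding, rate, channels = struct.unpack(
        ">4sIIIII", data[:24]
    )
    if magic != b".snd":
        raise ValueError("missing .snd magic number")
    return {
        "data_offset": offset,
        "data_size": size,
        "encoding": AU_ENCODINGS.get(encoding, f"code {encoding}"),
        "sample_rate": rate,
        "channels": channels,
    }
```

A decoder would read these fields first and then choose the matching decompression routine (for example μ-law expansion when the encoding code is 1).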

 

Linear vs. Logarithmic Encoding

In linear encoding, the quantization step (the difference in sound level between adjacent sample values) is constant across the whole range.  In logarithmic encoding, on the other hand, the step gets larger as the sample value increases.

 

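Logarithmic encoding has the advantage of representing a greater range of sound levels with fewer total bits.  However, because its steps are coarser at high amplitudes, it has more quantization noise for loud sounds than a linear encoding that covers the same range with more bits.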

 

μ-law (mu-law) and A-law encoding

μ-law (mu-law) encoding is used in North America and Japan, and A-law encoding is used in the rest of the world.  ITU Recommendation G.711 defines both.  μ-law encoding is widely used in telecommunications (e.g., for voice transmission over telephones).  μ-law and A-law encoding can be applied to a number of file types, including Sun and NeXT’s .au files and Microsoft’s .wav files.  Typically, μ-law compressed speech is carried in 8 bit samples.  Both encodings use a non-linear companding technique, which means they carry more detailed information about low amplitude signals than about high amplitude signals.  The compression ratio is 2:1 relative to 16 bit linear samples.
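The companding idea can be sketched with the continuous μ-law formula, F(x) = sgn(x) ln(1 + μ|x|) / ln(1 + μ) with μ = 255.  This is only an illustration: G.711 itself uses a segmented, bit-exact approximation of this curve, and the function names below (mu_compress, mu_expand, encode_8bit, decode_8bit) are hypothetical.  Samples are assumed to be normalized to the range -1 to 1.

```python
import math

MU = 255  # value of mu used for 8-bit mu-law


def mu_compress(x):
    """Map x in [-1, 1] onto [-1, 1] along a logarithmic curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_8bit(x):
    """Compand, then quantize uniformly to a signed code in [-127, 127]."""
    return round(mu_compress(x) * 127)


def decode_8bit(code):
    """Undo the uniform quantization, then expand."""
    return mu_expand(code / 127)


# Compare the round-trip error for quiet and loud samples.
for x in (0.001, 0.01, 0.1, 0.9):
    y = decode_8bit(encode_8bit(x))
    print(f"{x:6.3f} -> code {encode_8bit(x):4d} -> {y:.5f} "
          f"(abs error {abs(x - y):.5f})")
```

Running the loop shows a much smaller absolute error for the quiet samples than for the loud ones, which is the sense in which companding keeps more detail about low amplitude signals.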

 

ADPCM

The Adaptive Differential Pulse Code Modulation (ADPCM) compression technique quantizes the difference between the sound signal and a prediction that has been made of the sound signal (as opposed to quantizing the sound signal directly).  If the predictions are generally accurate, then the difference between the real and predicted speech samples will have a lower variance than the real speech samples, and the information can be accurately quantized with fewer bits than would be needed to quantize the original speech samples. 

 

The performance of this compression technique is aided by using adaptive prediction and quantization, so that the predictor and difference quantizer adapt to the changing characteristics of the speech being coded.  That is, the quantizer uses smaller steps when the differences are small and larger steps when they are large, so a small, fixed number of bits per sample is enough.

 

ADPCM can be applied to .wav files.  Generally, you start with 16 bit samples and ADPCM compresses them to 4 bits, for a 4:1 compression ratio.
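To make the idea concrete, here is a toy adaptive differential coder.  It is a simplified sketch, not the G.721 or IMA algorithm: the predictor is just the previously reconstructed sample, the step-size rule is invented for this example, and all names (toy_adpcm_encode, toy_adpcm_decode, _next_step) are hypothetical.  It does show the two key points: only the difference from the prediction is quantized (to a 4 bit code), and the step size adapts to the signal.

```python
def _clamp(value, low, high):
    return max(low, min(high, value))


def _next_step(step, code):
    """Invented adaptation rule: grow the step after large codes,
    shrink it after small ones."""
    if abs(code) >= 6:
        return min(step * 2, 4096)
    if abs(code) <= 1:
        return max(step // 2, 1)
    return step


def toy_adpcm_encode(samples):
    """Encode 16-bit integer samples as 4-bit codes in [-8, 7]."""
    predicted, step, codes = 0, 16, []
    for sample in samples:
        # Quantize the difference between the sample and the prediction.
        code = _clamp(round((sample - predicted) / step), -8, 7)
        codes.append(code)
        # Track what the decoder will reconstruct, so both stay in sync.
        predicted = _clamp(predicted + code * step, -32768, 32767)
        step = _next_step(step, code)
    return codes


def toy_adpcm_decode(codes):
    """Rebuild approximate 16-bit samples from the 4-bit codes."""
    predicted, step, samples = 0, 16, []
    for code in codes:
        predicted = _clamp(predicted + code * step, -32768, 32767)
        samples.append(predicted)
        step = _next_step(step, code)
    return samples
```

Because each 16 bit sample becomes one 4 bit code, the toy coder has the same 4:1 ratio mentioned above.  The encoder updates its prediction from the quantized code rather than from the true sample, which is what keeps the decoder from drifting.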

 

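ADPCM is standardized as CCITT G.721.  (CCITT stands for Comité Consultatif International Téléphonique et Télégraphique, in English the International Telegraph and Telephone Consultative Committee; it is now the ITU-T.)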

 

MACE – Apple has a proprietary version called ACE/MACE.  MACE stands for Macintosh Audio Compression and Expansion.  MACE allows for 3:1 or 6:1 compression of .aiff files.  It can be used with the sound portion of QuickTime movies.  It supports only 8 bit sound data.

 

IMA/ADPCM – IMA stands for Interactive Multimedia Association.  It offers reasonably fast encoding and decoding, and it degrades the sample quality only slightly.  The standard IMA compression ratio is 4:1 for 16 bit sound.  (8 bit is not supported.)  Apple Computer has integrated IMA 4:1 audio compression into both QuickTime and SoundManager, and Microsoft has integrated it into Video for Windows.

 

MPEG – MPEG names both a file type and a compression method.  Actually, it is a whole class of file types and compression methods.  Here’s a description borrowed from http://www-cs-students.stanford.edu/~franke/SoundApp/formats.html#mpeg:

MPEG Audio: (.mp, .mp2, .mp3, .m1a, .m2a, .mpg, .mpeg, .swa) MPEG stands for the "Moving Picture Experts Group."  MPEG audio files can be either layer I, II or III. Increasing layer numbers add complexity to the format and require more effort to encode and decode. However, they also provide higher playback quality for the same bit rate. To further complicate matters, MPEG files come in two flavors, MPEG-1 and MPEG-2. The encodings for the three layers are mostly the same; however, MPEG-2 streams have lower sampling rates for better fidelity at lower bit rates. Files can have sampling rates of 32000, 44100 and 48000 Hz for MPEG-1 and 16000, 22050 and 24000 Hz for MPEG-2. MPEG data can be in stereo or mono and decompresses to 16-bit resolution. MPEG compression is a lossy algorithm based on perceptual encodings, which can achieve high rates of compression without a noticeable decrease in quality. Typical compression rates are around 10-to-1. Macromedia's Shockwave streaming audio system uses a layer III encoding with a non-standard header. These Shockwave audio files frequently have a ".swa" suffix.

 

MPEG files have a data rate of

about 1.5 Mbits/sec for audio and video

about 1.2 Mbits/sec for video alone

about 0.3 Mbits/sec for audio alone

(compared to uncompressed CD audio: 44,100 samples/sec × 16 bits × 2 channels ≈ 1.4 Mbits/sec)

 

The compression factor for MPEG ranges from 2.7:1 to 24:1.
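The arithmetic behind these figures can be checked with a few lines of code.  This sketch assumes CD audio is 16 bit stereo at 44,100 samples/sec, and it simply divides the uncompressed rate by the two compression factors quoted above; the function names are hypothetical.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compressed_bit_rate(uncompressed_rate, factor):
    """Bit rate after compressing by the given factor (e.g. 24 for 24:1)."""
    return uncompressed_rate / factor


cd_rate = pcm_bit_rate(44_100, 16, 2)
print(f"CD audio: {cd_rate:,} bits/sec")

for factor in (2.7, 24):
    rate = compressed_bit_rate(cd_rate, factor)
    print(f"{factor}:1 -> about {rate / 1000:.0f} kbits/sec")
```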

 

MPEG-1 audio supports sampling frequencies of 32, 44.1, and 48 kHz.

 


 

File Type: .aiff, .aif, .aifc
Acronym For: Audio Interchange File Format
Originally Created By: Apple, adopted later by Silicon Graphics
Type of Compression: .aiff generally not compressed; .aifc allows MACE 3:1 or 6:1 or ADPCM/IMA 4:1 compression
Platforms: Mac and SGI, and now also Windows
Other Information: flexible format, allowing arbitrary sample rates, sample sizes, and numbers of channels

File Type: .wav
Originally Created By: IBM and Microsoft (Microsoft’s is referred to as RIFF WAVE)
Type of Compression: supports different compression formats; supports IMA/ADPCM 4:1 compression for 16 bit sound
Platforms: cross-platform; widely supported in many browsers; no need for plug-in
Other Information: good sound quality; arbitrary sampling rate and sample size; up to 16-bit, 44.1 kHz stereo

File Type: .au and .snd
Acronym For: (also called mu-law or Sun mu-law format)
Originally Created By: Sun and NeXT
Type of Compression: mu-law encoding serves to compress the file at a ratio of 2:1; slow decompression
Other Information: good sound quality; arbitrary sampling rates and multi-channel sounds

File Type: .ra or .rm
Acronym For: Real Audio
Type of Compression: can use SONY’s ATRAC3 codec (Adaptive Transform Acoustic Coding)
Other Information: very high degree of compression; files can be streamed; sound quality poorer than .mp3

File Type: .mp3
Acronym For: Moving Picture Experts Group
Originally Created By: Moving Picture Experts Group
Type of Compression: DCT with Huffman.  Lossy.  See class notes
Platforms: cross-platform
Other Information: MPEG-1 layer 3; sound quality can rival CD; small files, but larger than Real Audio

File Type: .swa
Acronym For: Shockwave
Originally Created By: Macromedia
Type of Compression: uses layer III MPEG encoding
Platforms: cross-platform
Other Information: can be streamed

File Type: .asf
Acronym For: advanced streaming format
Originally Created By: Windows
Other Information: can be encrypted with DRM (Digital Rights Management)