Digital Audio Compression - Multimedia Computing - Lecture Slides

Multimedia Computing. The key points covered in these lecture slides are: Digital Audio Compression, Speech Compression, General Audio Compression, Psychoacoustics, Equal-Loudness Relations, Threshold of Hearing, Frequency Masking, Critical Bands, Human Hearing Range, Temporal Masking

Typology: Slides

2012/2013

Uploaded on 04/23/2013

sarasvatir

Digital Audio Compression

Speech Compression

  • Compression of voice data
    • We have previously mentioned several methods that are used to compress voice data
      • mu-law and A-law companding
      • ADPCM and delta modulation
    • These are examples of methods which work in the time domain (as opposed to the frequency domain)
      • Often they are not even considered compression methods
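
As an illustration of the time-domain companding mentioned above, here is a minimal sketch of mu-law compression followed by uniform quantization. The value mu = 255, the 8-bit word length, and all function names are assumptions made for this example; they are not taken from the slides.

```python
import math

MU = 255  # assumed value (common in 8-bit telephony); not stated in the slides


def mu_law_compress(x, mu=MU):
    """Map a sample x in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def quantize(y, bits=8):
    """Uniformly quantize y in [-1, 1] to 2**bits levels, mapped back to [-1, 1]."""
    levels = 2 ** bits - 1
    index = round((y + 1.0) / 2.0 * levels)
    return index / levels * 2.0 - 1.0


def companded_roundtrip(x, bits=8):
    """Compress, quantize, then expand a single sample."""
    return mu_law_expand(quantize(mu_law_compress(x), bits))


if __name__ == "__main__":
    quiet = 0.01  # a quiet sample, where companding helps most
    print("uniform error:  ", abs(quantize(quiet) - quiet))
    print("companded error:", abs(companded_roundtrip(quiet) - quiet))
```

Because the logarithmic curve spends more quantization levels on small amplitudes, the quiet sample is reproduced more accurately with companding than with plain uniform quantization at the same word length.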

General Audio Compression

  • If we want to compress general audio (not just speech), different techniques are needed
  • In particular, music compression is a more general form of audio compression
  • We make use of psychoacoustical modeling
  • This enables perceptual encoding based upon an analysis of how the ear and brain perceive sound
  • Perceptual encoding exploits audio elements that the human ear cannot hear well

Psychoacoustics

  • If you have been listening to very loud music, you may have trouble afterwards hearing soft sounds (that normally you could hear)
    • Temporal masking
  • A loud sound at one frequency (a lead guitar) may drown out a sound at another frequency (the singer)
    • Frequency masking

Equal-Loudness Relations

Threshold of Hearing

  • The following image is a plot of the threshold of human hearing for pure tones; at loudness levels below the curve, we don't hear a tone
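
Since the plot itself is not reproduced here, the shape of the curve can be sketched numerically. The formula below is a widely used analytic approximation of the threshold in quiet (commonly attributed to Terhardt); it is an assumption brought in for illustration and is not necessarily the curve shown on the slide. The function names are hypothetical.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate threshold of hearing in dB SPL for a pure tone.

    Uses an analytic fit with the frequency expressed in kHz
    (an assumption; the slides only show the curve as an image).
    """
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def is_audible(freq_hz, level_db_spl):
    """A tone is audible (in quiet) only if its level lies above the curve."""
    return level_db_spl > threshold_in_quiet_db(freq_hz)


if __name__ == "__main__":
    for f in (100, 1000, 3300, 10000, 16000):
        print(f"{f:>6} Hz: {threshold_in_quiet_db(f):6.1f} dB SPL")
```

Under this approximation the ear is most sensitive in the region of a few kHz, and the threshold rises steeply toward both ends of the audible range.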

Frequency Masking

  • We can determine how a pure tone at a particular frequency affects our ability to hear tones at nearby frequencies
  • Then, if a signal can be decomposed into frequencies, for those frequencies that are only partially masked, only the audible part will be used to set the quantization noise thresholds

Critical Bands

  • Human hearing range divides into critical bands
  • Human auditory system cannot resolve sounds better than within about one critical band when other sounds are present
  • Critical bandwidth represents the ear's resolving power for simultaneous tones
  • At lower frequencies the bands are narrower than at higher frequencies
  • The band is the section of the inner ear which responds to a particular frequency

Critical Bands

  • Generally, the audio frequency range for hearing (20 Hz to 20 kHz) can be partitioned into about 24 critical bands (25 are typically used for coding applications)
  • The previous slide does not show several of the highest frequency critical bands
  • The critical band at the highest audible frequency is over 4000 Hz wide
  • The ear is not very discriminating within a critical band
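
The figures quoted above (about 24 bands, and a top band several kHz wide) can be checked with a short sketch. It assumes the commonly used Zwicker and Terhardt approximations for the Bark scale and for critical bandwidth; these formulas and the function names are not from the slides.

```python
import math


def hz_to_bark(freq_hz):
    """Approximate critical-band number (Bark) for a frequency in Hz."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def critical_bandwidth_hz(freq_hz):
    """Approximate width in Hz of the critical band centred at freq_hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (freq_hz / 1000.0) ** 2) ** 0.69


if __name__ == "__main__":
    for f in (100, 1000, 5000, 20000):
        print(f"{f:>6} Hz: {hz_to_bark(f):5.1f} Bark, "
              f"bandwidth about {critical_bandwidth_hz(f):7.0f} Hz")
```

With these approximations, 20 kHz maps to a little under 25 Bark, and the bandwidth grows from roughly 100 Hz at low frequencies to several kHz at the top of the range, in line with the bullets above.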

Temporal Masking

  • A loud tone causes the hearing receptors in the inner ear to become saturated, and they require time to recover
  • This leads to the temporal masking effect
  • After the loud tone we cannot immediately hear another tone: post-masking
    • The length of the masking depends on the duration of the masking tone
  • A masking tone can also block sounds played just before it: pre-masking (shorter time)

MPEG Audio Compression

  • MPEG (Moving Picture Experts Group) is a family of standards for compression of both audio and video data
  • MPEG-1 (1991): CD-quality audio
  • MPEG-2 (1994): Multi-channel surround sound
  • MPEG-4 (1998): Also includes MIDI, speech, etc.
  • MPEG-7 (2003): Not compression, but searching
  • MPEG-21 (2004): Not compression, but digital rights management

MPEG Audio Compression

  • MPEG-1 defined three downward compatible layers of audio compression
  • Each layer offers more complexity in the psychoacoustic model used and hence better compression
  • Increased complexity leads to increased delay
  • Compatibility achieved by shared file header information
  • Layer 1: used for Digital Audio Tape
  • Layer 2: proposed for digital audio broadcasting
  • Layer 3: music (MPEG-1 Layer 3 == MP3)

MPEG Audio Compression

  • PCM input filtered into 32 bands
  • PCM FFT transformed for PA (psychoacoustic) model
  • Windows of samples (384, 576, 1152) coded at a time
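
The window sizes above relate simply to the 32 sub-bands, which a few lines of arithmetic make concrete. This sketch assumes a 44.1 kHz sampling rate (CD audio) and equal-width sub-bands; the helper names are hypothetical.

```python
SUBBANDS = 32
SAMPLE_RATE_HZ = 44_100  # assumed CD sampling rate; not stated on this slide


def subband_width_hz(sample_rate_hz=SAMPLE_RATE_HZ):
    """Width of each sub-band if the band 0..fs/2 is split into 32 equal parts."""
    return (sample_rate_hz / 2) / SUBBANDS


def samples_per_subband(window_samples):
    """Number of samples each sub-band contributes to one coded window."""
    return window_samples // SUBBANDS


def window_duration_ms(window_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of one coded window in milliseconds."""
    return 1000.0 * window_samples / sample_rate_hz


if __name__ == "__main__":
    print(f"sub-band width: {subband_width_hz():.1f} Hz")
    for n in (384, 576, 1152):
        print(f"{n:>5} samples: {samples_per_subband(n)} per sub-band, "
              f"{window_duration_ms(n):.2f} ms")
```

Each window size is a whole multiple of 32 (12, 18 and 36 samples per sub-band respectively), and longer windows mean more audio is buffered before coding, which fits the earlier remark that increased complexity leads to increased delay.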

MPEG Audio Compression

  • Since the sub-bands overlap, aliasing may occur
  • This is overcome by the use of a quadrature mirror filter bank
    • Attenuation slopes of adjacent bands are mirror images
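
The "mirror image" property can be illustrated with a two-band toy example. This is only a sketch of the quadrature mirror idea, not the 32-band filter bank used by MPEG audio; the 4-tap prototype filter and the function names are hypothetical and chosen purely for illustration.

```python
import cmath
import math


def mirror_highpass(lowpass_taps):
    """Build the mirror-image high-pass filter: h1[n] = (-1)**n * h0[n]."""
    return [tap if n % 2 == 0 else -tap for n, tap in enumerate(lowpass_taps)]


def magnitude(taps, omega):
    """Magnitude of the frequency response at omega (radians per sample)."""
    return abs(sum(t * cmath.exp(-1j * omega * n) for n, t in enumerate(taps)))


# Hypothetical low-pass prototype (unit gain at DC), for illustration only.
H0 = [0.125, 0.375, 0.375, 0.125]
H1 = mirror_highpass(H0)

if __name__ == "__main__":
    for k in range(5):
        omega = k * math.pi / 4
        print(f"omega = {omega:4.2f}: |H0| = {magnitude(H0, omega):.3f}, "
              f"|H1| = {magnitude(H1, omega):.3f}")
```

Alternating the sign of the taps shifts the response by half the sampling rate, so the high-pass magnitude at omega equals the low-pass magnitude at pi - omega: the two attenuation slopes are mirror images about the quarter-sample-rate point.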