MIT EECS: 6.003 Signal Processing lecture notes (Spring 2019), Study notes of World Music


6.003: Signal Processing

Fourier-Based Audio Compression

• Review of Lossy Compression, Discrete Cosine Transform (DCT)

• Brief Introduction to MDCT

• Additional Considerations for Audio Encoding

2 May 2019

Today: Lossy Compression

As opposed to “lossless” compression (LZW, Huffman, zip, gzip, xzip, ...), “lossy” compression achieves a decrease in file size by throwing away information from the original signal.

Goal: convey the “important” parts of the signal using as few bits as possible.

Lossy Compression: High-level View

To Encode:

• Split signal into “frames”
• Transform each frame into Fourier representation
• Throw away (or attenuate) some coefficients
• Additional lossless compression (LZW, RLE, Huffman, etc.)

To Decode:

• Undo lossless compression
• Transform each frame into time/spatial representation

This is pretty standard! Both JPEG and MP3, for example, work roughly this way.
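The encode and decode steps can be sketched in a few lines. This is a minimal illustration rather than the course's implementation: the function names `encode` and `decode`, the 8-sample frame length, and the rule "keep the `keep` largest-magnitude DFT bins per frame" are all assumptions made for the example, and the lossless stage is left out.

```python
import numpy as np

def encode(signal, frame_len=8, keep=3):
    """Split into frames, take a DFT per frame, zero all but the largest bins."""
    signal = np.asarray(signal, dtype=float)
    n_frames = -(-len(signal) // frame_len)            # ceiling division
    padded = np.zeros(n_frames * frame_len)
    padded[:len(signal)] = signal                      # zero-pad the last frame
    frames = padded.reshape(n_frames, frame_len)
    coeffs = np.fft.rfft(frames, axis=1)               # real-input DFT of each frame
    # rank 0 is the largest-magnitude bin in each frame
    ranks = np.argsort(np.argsort(-np.abs(coeffs), axis=1), axis=1)
    coeffs[ranks >= keep] = 0                          # throw away small coefficients
    return coeffs

def decode(coeffs, frame_len=8, length=None):
    """Inverse-transform every frame and concatenate the results."""
    frames = np.fft.irfft(coeffs, n=frame_len, axis=1)
    samples = frames.reshape(-1)
    return samples if length is None else samples[:length]

# A tone that is periodic in the frame length needs only one bin per frame.
n = np.arange(32)
tone = np.cos(2 * np.pi * n / 8)
approx = decode(encode(tone, keep=1), length=len(tone))
```

In a real codec the surviving coefficients would then be quantized and passed to a lossless coder; here they are simply returned as an array.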

Given this, one goal is to get the “important” information in a signal into relatively few coefficients in the frequency domain (FD); this is called “energy compaction”.

Energy Compaction

One goal is to get the “important” information in a signal into relatively few coefficients in FD (“energy compaction”).

It turns out the DFT has some problems in this regard. Consider the following signal, broken into 8-sample-long frames:

[Figure: the original signal and one 8-sample “frame” taken from it, each plotted against n]

Why is the DFT undesirable in this case, given our goal of compression?

DCT: Relationship to DFT

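$$
\begin{aligned}
X_C[k] &= \frac{1}{N}\sum_{n=0}^{N-1} x[n]\cos\left(\frac{\pi}{N}\left(n+\frac{1}{2}\right)k\right) \\
&= \frac{1}{2N}\sum_{n=0}^{N-1} x[n]\left(e^{j\frac{\pi}{N}(n+1/2)k} + e^{-j\frac{\pi}{N}(n+1/2)k}\right) \\
&= \frac{1}{2N}\,e^{-j\frac{\pi}{N}\frac{1}{2}k}\sum_{n=0}^{N-1} x[n]\left(e^{j\frac{\pi}{N}(n+1)k} + e^{-j\frac{\pi}{N}nk}\right) \\
&= \frac{1}{2N}\,e^{-j\frac{\pi}{N}\frac{1}{2}k}\left(\sum_{n=0}^{N-1} x[n]\,e^{-j\frac{2\pi}{2N}(-n-1)k} + \sum_{n=0}^{N-1} x[n]\,e^{-j\frac{2\pi}{2N}nk}\right) \\
&= \frac{1}{2N}\,e^{-j\frac{\pi}{N}\frac{1}{2}k}\sum_{n=-N}^{N-1}\tilde{x}[n]\,e^{-j\frac{2\pi}{2N}nk} \\
&= \left(e^{-j\frac{\pi}{N}\frac{1}{2}k}\right)\tilde{X}[k]
\end{aligned}
$$

where $\tilde{x}[\cdot]$ is given by the following, and the DFT coefficients $\tilde{X}[\cdot]$ are computed with an analysis window of length $2N$:

$$
\tilde{x}[n] = \tilde{x}[n+2N] =
\begin{cases}
x[n] & \text{if } 0 \le n < N \\
x[-n-1] & \text{if } -N \le n < 0
\end{cases}
$$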
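The identity at the end of this derivation can be checked numerically. The sketch below assumes the 6.003 convention that the analysis DFT carries the 1/(2N) factor, and it builds one period of the mirrored signal over n = 0..2N−1 (equivalent to n = −N..N−1 because of the 2N-periodicity). The function names are hypothetical.

```python
import numpy as np

def dct_direct(x):
    """X_C[k] = (1/N) * sum_n x[n] cos(pi/N * (n + 1/2) * k)."""
    N = len(x)
    n = np.arange(N)
    k = np.arange(N)[:, None]
    return (x * np.cos(np.pi / N * (n + 0.5) * k)).sum(axis=1) / N

def dct_via_dft(x):
    """Same coefficients from a length-2N DFT of the mirrored signal."""
    N = len(x)
    # One period of x~: x[0..N-1] followed by its mirror image x[N-1..0]
    x_tilde = np.concatenate([x, x[::-1]])
    X_tilde = np.fft.fft(x_tilde) / (2 * N)        # analysis DFT with 1/(2N)
    k = np.arange(N)
    shifted = np.exp(-1j * np.pi * k / (2 * N)) * X_tilde[:N]
    return shifted.real                            # imaginary part is round-off

x = np.random.default_rng(0).normal(size=8)
same = np.allclose(dct_direct(x), dct_via_dft(x))
```

The half-sample phase factor is what makes the result purely real: the mirrored signal is symmetric about n = −1/2, not about n = 0.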

Discrete Cosine Transform

The DCT is commonly used in compression applications.

We can think about computing the DCT by first putting a mirrored copy of a windowed signal next to itself, and then computing the DFT of that new signal (shifted by 1/2 sample):

[Figure: an 8-sample “frame” and the corresponding 16-sample shifted, mirrored frame, each plotted against n]

Why is the DCT more appropriate, given our goals? How does this approach fix the issue(s) we saw with the DFT?
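One way to explore these questions is to compare the signal each transform implicitly analyzes. The DFT treats the frame as one period of a periodic signal, so the last sample wraps around to the first; the DCT analyzes the mirrored frame instead. The sketch below uses a hypothetical 8-sample ramp as the frame and measures the largest sample-to-sample step in each periodic extension.

```python
import numpy as np

def largest_jump(period):
    """Largest step between neighbors in the periodic extension (wrap included)."""
    wrapped = np.concatenate([period, period[:1]])   # append first sample again
    return float(np.max(np.abs(np.diff(wrapped))))

frame = np.arange(8.0)                               # hypothetical 8-sample frame
dft_period = frame                                   # DFT: frame repeated as-is
dct_period = np.concatenate([frame, frame[::-1]])    # DCT: frame plus its mirror

dft_jump = largest_jump(dft_period)
dct_jump = largest_jump(dct_period)
```

For this frame the repeated version jumps from the last sample back to the first, while the mirrored version never steps by more than the ramp's own slope. A large jump needs many high-frequency coefficients to represent, which works against energy compaction.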

Energy Compaction Example: Ramp

For many authentic signals (photographs, etc.), the DCT has good “energy compaction”: most of the energy in the signal is represented by relatively few coefficients.

Consider DFT vs DCT of a “ramp”:

[Figure: the ramp x[n] plotted against n, with the magnitudes of its DFT coefficients |X[k]| and of its DCT coefficients |X_C[k]| plotted against k]
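The comparison can be made quantitative by counting how many coefficients are needed to capture a given share of the signal's energy. The sketch below assumes a 16-sample ramp and a 99% target, neither of which comes from the slides. It uses the 1/N analysis scaling, under which the DFT energies |X[k]|² sum to the mean-square value of the signal, and the DCT energies do the same once every k ≥ 1 term is counted twice.

```python
import numpy as np

def dft(x):
    """DFT with the 1/N factor in the analysis equation."""
    return np.fft.fft(x) / len(x)

def dct(x):
    """X_C[k] = (1/N) * sum_n x[n] cos(pi/N * (n + 1/2) * k)."""
    N = len(x)
    n = np.arange(N)
    k = np.arange(N)[:, None]
    return (x * np.cos(np.pi / N * (n + 0.5) * k)).sum(axis=1) / N

def count_for_energy(energies, fraction=0.99):
    """Smallest number of coefficients whose energies reach `fraction` of the total."""
    ordered = np.sort(energies)[::-1]                 # largest first
    cumulative = np.cumsum(ordered) / ordered.sum()
    return int(np.searchsorted(cumulative, fraction)) + 1

ramp = np.arange(16.0)                                # assumed 16-sample ramp
dft_energy = np.abs(dft(ramp)) ** 2
XC = dct(ramp)
dct_energy = np.concatenate([XC[:1] ** 2, 2 * XC[1:] ** 2])

dft_count = count_for_energy(dft_energy)
dct_count = count_for_energy(dct_energy)
```

Comparing `dft_count` with `dct_count` shows how much more compactly the DCT represents the ramp under these assumptions.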

Audio Compression

That didn’t sound very good, really... :(

What were the most noticeable artifacts in the reconstructed version? Where did they come from? How did this compare to what we saw with JPEG?

Audio Compression v

Let’s try a different approach:

Rather than zeroing out coefficients below the threshold, let’s quantize them differently (for example, use 8 bits for each sample below the threshold and 16 bits for each value above the threshold).

How does this compare? What artifacts remain? How can we explain them?
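A minimal sketch of this two-tier scheme follows. It assumes real-valued coefficients (such as DCT output) and a uniform quantizer; it also assumes the small tier is scaled to the threshold and the large tier to the peak magnitude. The function names and those scaling choices are illustrative, not taken from the lecture.

```python
import numpy as np

def uniform_quantize(values, bits, full_scale):
    """Round values in [-full_scale, full_scale] onto a signed uniform grid."""
    levels = 2 ** (bits - 1) - 1                      # e.g. 127 for 8 bits
    codes = np.round(values / full_scale * levels)    # integers that would be stored
    return codes / levels * full_scale                # values the decoder recovers

def two_tier_quantize(coeffs, threshold, small_bits=8, large_bits=16):
    """Use fewer bits below the threshold and more bits at or above it."""
    coeffs = np.asarray(coeffs, dtype=float)
    peak = np.max(np.abs(coeffs))
    small = np.abs(coeffs) < threshold
    out = np.empty_like(coeffs)
    out[small] = uniform_quantize(coeffs[small], small_bits, threshold)
    out[~small] = uniform_quantize(coeffs[~small], large_bits, peak)
    return out, small

coeffs = np.random.default_rng(1).normal(size=64)     # stand-in coefficients
quantized, small = two_tier_quantize(coeffs, threshold=0.5)
```

The worst-case rounding error in each tier is half a quantization step, so the small coefficients are perturbed rather than removed. A decoder would also need to know which tier each coefficient belongs to, which this sketch returns as the `small` mask.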

MDCT

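[Figure: x[n] is split into overlapping windows; each window is passed through the MDCT and reconstructed, and the reconstructed windows are summed. Panels: x[n], window, MDCT, reconstructed, sum]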

MDCT

Formally, the MDCT is defined by:

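$$
X_M[k] = \frac{1}{2N}\sum_{n=0}^{2N-1} x[n]\cos\left(\frac{\pi}{N}\left(n+\frac{1}{2}+\frac{N}{2}\right)\left(k+\frac{1}{2}\right)\right)
$$

$$
y[n] = \sum_{k=0}^{N-1} X_M[k]\cos\left(\frac{\pi}{N}\left(n+\frac{1}{2}+\frac{N}{2}\right)\left(k+\frac{1}{2}\right)\right)
$$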

Including a window function on both x [·] and y [·] can avoid discontinuities at the endpoints. Similar to DCT in terms of energy compaction, but avoids issues with discontinuities on frame boundaries.
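The overlap-and-sum reconstruction can be sketched as follows. The sine window is an assumption (the slides do not name a window); it is applied before the forward transform and again after the inverse. Frames of length 2N advance by N samples, and the signal is zero-padded by N on each side so that every sample is covered by two frames. With a 1/(2N) factor in the forward transform and none in the inverse, the overlapped sum comes out as one quarter of the input, so the sketch rescales by 4.

```python
import numpy as np

def mdct_basis(N):
    """cos(pi/N * (n + 1/2 + N/2) * (k + 1/2)) for n = 0..2N-1, k = 0..N-1."""
    n = np.arange(2 * N)[:, None]
    k = np.arange(N)[None, :]
    return np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

def sine_window(N):
    """Assumed window; satisfies w[n]**2 + w[n + N]**2 == 1."""
    return np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))

def mdct(frame, N):
    """2N windowed samples -> N coefficients."""
    return (sine_window(N) * frame) @ mdct_basis(N) / (2 * N)

def imdct(X, N):
    """N coefficients -> 2N windowed samples that still contain aliasing."""
    return sine_window(N) * (mdct_basis(N) @ X)

def roundtrip(x, N):
    """Analyze and resynthesize x (length a multiple of N) by overlap-add."""
    padded = np.concatenate([np.zeros(N), x, np.zeros(N)])
    out = np.zeros_like(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        frame = padded[start:start + 2 * N]
        out[start:start + 2 * N] += imdct(mdct(frame, N), N)
    return 4 * out[N:N + len(x)]          # undo the 1/4 scale of this normalization

N = 8
x = np.random.default_rng(0).normal(size=4 * N)
recovered = roundtrip(x, N)
```

Each frame on its own is not invertible, since 2N samples map to only N coefficients; the aliasing terms introduced by one frame cancel against those of its neighbor when the overlapping halves are added.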

What Else is There?

We have been able to achieve decent compression rates, but nothing close to MP3, for example. MP3 can achieve around a 6:1 compression ratio before expert listeners are able to distinguish between compressed and original audio.

This approach is actually somewhat similar to MP3, but we’re not quite there, so what are we missing?

Psychoacoustic Modeling

Importantly, our goal is ultimately to throw away information that is perceptually unimportant. To this end, MP3 includes a model of human perception of audio, including:

• Critical bands: neighborhood of frequencies that excite the same nerve cells (∼25 distinct bands of varying bandwidth)
• Threshold of hearing: how loud must a signal be in order to hear it?
• Frequency masking: a loud component at a particular frequency “masks” nearby frequencies
• Temporal masking: when two tones are close together in time, one can mask the other.
