














Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Why is the DFT undesireable in this case, given our goal of compression? Page 6. Discrete Cosine Transform. It is much more common to use the ...
Typology: Study notes
1 / 22
This page cannot be seen from the preview
Don't miss anything!















2 May 2019
As opposed to “lossless” compression (LZW, Huffman, zip, gzip, xzip, ...), “lossy” compression achieves a decrease in file size by throwing away information from the original signal.
Goal: convey the “important” parts of the signal using as few bits as possible.
To Encode:
To Decode:
This is pretty standard! Both JPEG and MP3, for example, work roughly this way.
Given this, one goal is to get the “important” information in a signal into relatively few coefficients in FD (“energy compaction”).
One goal is to get the “important” information in a signal into relatively few coefficients in FD (“energy compaction”).
It turns out the DFT has some problems in this regard. Consider the following signal, broken into 8-sample-long frames:
original signal
n 0 8 sample “frame”
n 0 Why is the DFT undesireable in this case, given our goal of compression?
XC [ k ] = N^1
n =
x [ n ] cos
π N
n +^12
k
n =
x [ n ]
ej^ Nπ ( n +1 / 2) k^
e − j^ Nπ^12 k N ∑^ −^1 n =
x [ n ]
ej^ Nπ ( n +1) k^
2 N e
− j (^) Nπ^12 k
n =
x [ n ] e − j^
2 π 2 N (− n −1) k^ +
n =
x [ n ] e − j^
2 π 2 N nk
2 N e
− j (^) Nπ^12 k
n =− N
x ˜[ n ] e − j^ 22 Nπ nk
e − j^ Nπ^12 k^ )^ ˜ X [ k ]
where x ˜[·] is given by the following, and the DFT coefficients X ˜[·] are computed with an analysis window of length 2 N :
x ˜[ n ] = ˜ x [ n + 2 N ] =
{ (^) x [ n ] if 0 ≤ n < N x [− n − 1] if − N < n < 0
The DCT is commonly used in compression applications.
We can think about computing the DCT by first putting a mirrored copy of a windowed signal next to itself, and then computing the DFT of that new signal (shifted by 1/2 sample):
8 sample “frame”
n 0 16-sample shifted, mirrored frame
n 0
Why is the DCT more appropriate, given our goals? How does this approach fix the issue(s) we saw with the DFT?
For many authentic signals (photographs, etc), the DCT has good “energy compaction”: most of the energy in the signal is represented by relatively few coefficients.
Consider DFT vs DCT of a “ramp:”
(^00 2 4 6) n 8 10 12 14 2 46
(^108) 1214
x [ n ]
For many authentic signals (photographs, etc), the DCT has good “energy compaction”: most of the energy in the signal is represented by relatively few coefficients.
Consider DFT vs DCT of a “ramp:”
(^00 2 4 6) n 8 10 12 14 24
68
1012
14
x [ n ]
(^0 2 4 6) k 8 10 12 14 0
1
2
3
4
5
6
7
| X [ k ]|
(^0 2 4 6) k 8 10 12 14 0
1
2
3
4
5
6
7
| XC [ k ]|
That didn’t sound very good, really... :(
What were the most noticeable artifacts in the reconstructed version? Where did they come from? How did this compare to what we saw with JPEG?
Let’s try a different approach:
Rather than zeroing out coefficients below the threshold, let’s quan- tize them differently (for example, use 8 bits for each sample below the threshold and 16 bits for each value above the threshold).
How does this compare? What artifacts remain? How can we explain them?
x [ n ]
window
MDCT
reconstructed
window
MDCT
reconstructed
0 100 200 300 400 500
sum2.
Formally, the MDCT is defined by:
XM [ k ] = 1 2 N
n =
x [ n ] cos
π N
n +^1 2
k +^1 2
y [ n ] =
k =
XM [ k ] cos
π N
n +
k +
Including a window function on both x [·] and y [·] can avoid disconti- nuities at the endpoints. Similar to DCT in terms of energy com- paction, but avoids issues with discontinuities on frame boundaries.
We have been able to achieve decent compression rates, but nothing close to MP3, for example. MP3 can ahieve around a 6:1 compres- sion ratio before expert listeners are able to distinguish between compressed and original audio.
This approach is actually somewhat similar to MP3, but we’re not quite there, so what are we missing?
Importantly, our goal is ultimately to throw away information that is perceptually unimportant. To this end, MP3 includes a model of human perception of audio, including: