






These notes cover orthobasis expansions, specifically the cosine transform and its discrete version, the discrete cosine transform (DCT). The cosine transform is an alternative to Fourier series, with real-valued basis functions and different symmetries. The DCT is used in image and video compression standards such as JPEG and MPEG. The notes give definitions, derivations, and applications of these transforms.







In the next few lectures, we will look at a few examples of orthobasis expansions that are used in modern signal processing.
The cosine-I transform is an alternative to Fourier series; it is an expansion in an orthobasis for functions on $[0, 1]$ (or any interval on the real line) where the basis functions look like sinusoids. Two main differences make it more attractive than Fourier series for certain applications: the basis functions are real-valued, and they have different symmetries.
The discrete version of cosine-I (the “DCT”) is used in both the JPEG image compression standard and the MPEG video compression standard; we will discuss this more later in this section.
Definition. The cosine-I basis functions for $t \in [0, 1]$ are

$$
\psi_k(t) = \begin{cases} 1, & k = 0, \\ \sqrt{2}\,\cos(\pi k t), & k = 1, 2, \ldots \end{cases} \tag{1}
$$
We can derive the cosine-I basis from the Fourier series in the following manner. Let $x(t)$ be a signal on the interval $[0, 1]$, and let $\tilde{x}(t)$ be its “reflection extension” on $[-1, 1]$. That is,

$$
\tilde{x}(t) = \begin{cases} x(-t), & -1 \le t \le 0, \\ x(t), & 0 \le t \le 1. \end{cases}
$$
[Figure: the signal $x(t)$ on $[0, 1]$ and its reflection extension $\tilde{x}(t)$ on $[-1, 1]$.]
We can use Fourier series to synthesize $\tilde{x}(t)$:

$$
\tilde{x}(t) = \sum_{k=-\infty}^{\infty} \alpha_k\, e^{j\pi k t}.
$$
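Since $\tilde{x}(t)$ is real, we will have $\alpha_{-k} = \overline{\alpha_k}$ (the complex conjugate of $\alpha_k$), and so we can rewrite this as

$$
\tilde{x}(t) = a_0 + \sum_{k=1}^{\infty} a_k \cos(\pi k t) + \sum_{k=1}^{\infty} b_k \sin(\pi k t),
$$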
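where $a_0 = \alpha_0$, $a_k = 2\,\mathrm{Re}\{\alpha_k\}$, and $b_k = -2\,\mathrm{Im}\{\alpha_k\}$. Since $\tilde{x}(t)$ is even and $\sin(\pi k t)$ is odd, $\langle \tilde{x}(t), \sin(\pi k t) \rangle = 0$ and so

$$
b_k = 0 \quad \text{for all } k = 1, 2, 3, \ldots,
$$

and so $\tilde{x}(t)$ on $[-1, 1]$ can be written as

$$
\tilde{x}(t) = a_0 + \sum_{k=1}^{\infty} a_k \cos(\pi k t).
$$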
Since we can use this expansion to build up any symmetric function on $[-1, 1]$, and the right-hand half of such a function on $[0, 1]$ can be arbitrary, the same expansion can build up any signal $x(t)$ on $[0, 1]$.
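As a rough numerical illustration of the expansion above, the following sketch approximates the cosine-I coefficients of a signal on $[0, 1]$ and resynthesizes it from a finite number of terms. The helper names (`cosine1_basis`, `cosine1_coeffs`, `cosine1_synthesize`), the midpoint-rule integration, and the choice of the ramp $x(t) = t$ as test signal are assumptions made for this example, not part of the notes.

```python
import numpy as np

def cosine1_basis(k, t):
    """Cosine-I basis function psi_k(t) on [0, 1]."""
    t = np.asarray(t, dtype=float)
    if k == 0:
        return np.ones_like(t)
    return np.sqrt(2.0) * np.cos(np.pi * k * t)

def cosine1_coeffs(x, num_coeffs, num_points=4096):
    """Approximate the inner products <x, psi_k> with a midpoint rule."""
    t = (np.arange(num_points) + 0.5) / num_points   # midpoints of [0, 1]
    samples = x(t)
    return np.array([np.mean(samples * cosine1_basis(k, t))
                     for k in range(num_coeffs)])

def cosine1_synthesize(coeffs, t):
    """Partial sum of the cosine-I expansion evaluated at the points t."""
    return sum(c * cosine1_basis(k, t) for k, c in enumerate(coeffs))

# Hypothetical test signal: a ramp, which is not periodic on [0, 1].
x = lambda t: t
t = np.linspace(0.0, 1.0, 201)
coeffs = cosine1_coeffs(x, 32)
max_err = np.max(np.abs(x(t) - cosine1_synthesize(coeffs, t)))
print("max reconstruction error with 32 terms:", max_err)
```

The ramp's reflection extension is continuous on $[-1, 1]$, which is one way to see why a modest number of cosine terms already gives a small error here.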
The discrete cosine transform (DCT)
Just as there is a version of Fourier series for sampled signals on an interval (i.e. finite-dimensional signals in $\mathbb{C}^N$), namely the discrete Fourier transform (DFT), there is a version of the cosine-I transform for real-valued finite signals as well. This is called the discrete cosine transform, or DCT.
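The DCT basis functions for $\mathbb{R}^N$ are

$$
\psi_k[n] = \begin{cases} \sqrt{\dfrac{1}{N}}, & k = 0, \\[2ex] \sqrt{\dfrac{2}{N}}\,\cos\!\left(\dfrac{\pi k}{N}\left(n + \dfrac{1}{2}\right)\right), & k = 1, \ldots, N-1, \end{cases} \tag{2}
$$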
for sample indices $n = 0, 1, \ldots, N-1$. Showing that

$$
\sum_{n=0}^{N-1} \psi_k[n]\,\psi_\ell[n] = \begin{cases} 1, & k = \ell, \\ 0, & k \neq \ell \end{cases}
$$

is an exercise you can do at home. Notice that the samples of the cosines are on the half-sample points (we see $(n + 1/2)$ in the expression above instead of $n$).
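The orthonormality claim can also be checked numerically before proving it. The sketch below builds the DCT basis vectors as the rows of a matrix and compares their Gram matrix with the identity; the function name `dct_basis` and the choice $N = 8$ are assumptions for illustration.

```python
import numpy as np

def dct_basis(N):
    """N x N matrix whose row k holds the DCT basis vector psi_k[n]."""
    n = np.arange(N)               # sample indices 0..N-1
    k = np.arange(N)[:, None]      # frequency indices as a column
    Psi = np.sqrt(2.0 / N) * np.cos(np.pi * k * (n + 0.5) / N)
    Psi[0, :] = np.sqrt(1.0 / N)   # the k = 0 vector has its own scaling
    return Psi

Psi = dct_basis(8)
gram = Psi @ Psi.T                 # entry (k, l) is the sum over n of psi_k[n] psi_l[n]
print("largest deviation from the identity:", np.max(np.abs(gram - np.eye(8))))
```

A numerical check like this is not a proof, but it is a quick way to catch a wrong scaling factor or a missing half-sample shift.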
Just as the cosine-I transform can be computed from the Fourier series coefficients of a symmetric extension of the signal, the DCT can be computed from the DFT of a symmetric extension. That means we have a fast algorithm for computing the DCT — the cost is essentially the same as for an FFT, O(N log N ).
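One possible way to realize this is sketched below: reflect the signal to length $2N$, take an FFT, and undo the half-sample phase shift. The function names `dct_direct` and `dct_via_fft` are hypothetical, and the length-$2N$ extension is only one of several ways to organize the computation; it is used here because it mirrors the reflection extension from the cosine-I derivation.

```python
import numpy as np

def dct_direct(x):
    """O(N^2) reference: inner products with the DCT basis vectors."""
    x = np.asarray(x, dtype=float)
    N = x.size
    n = np.arange(N)
    k = np.arange(N)[:, None]
    Psi = np.sqrt(2.0 / N) * np.cos(np.pi * k * (n + 0.5) / N)
    Psi[0, :] = np.sqrt(1.0 / N)
    return Psi @ x

def dct_via_fft(x):
    """Same coefficients from the FFT of the symmetric extension [x, reversed x]."""
    x = np.asarray(x, dtype=float)
    N = x.size
    y = np.concatenate([x, x[::-1]])          # length-2N symmetric extension
    Y = np.fft.fft(y)[:N]                     # only the first N bins are needed
    k = np.arange(N)
    # Y[k] = 2 exp(j pi k / (2N)) * sum_n x[n] cos(pi k (n + 1/2) / N)
    cos_sums = 0.5 * np.real(np.exp(-1j * np.pi * k / (2 * N)) * Y)
    scale = np.full(N, np.sqrt(2.0 / N))
    scale[0] = np.sqrt(1.0 / N)
    return scale * cos_sums

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
print("max difference:", np.max(np.abs(dct_direct(x) - dct_via_fft(x))))
```

The FFT of the extension costs $O(N \log N)$, and the remaining steps are $O(N)$, so the overall cost matches the claim above.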
Just as for Fourier series and the discrete Fourier transform, we can leverage the 1D cosine-I basis and the DCT into separable bases for 2D images.
Definition. Let $\{\psi_k(t)\}_{k \ge 0}$ be the cosine-I basis in (1). Set

$$
\psi^{\mathrm{2D}}_{k_1,k_2}(s, t) = \psi_{k_1}(s)\,\psi_{k_2}(t).
$$

Then $\{\psi^{\mathrm{2D}}_{k_1,k_2}(s, t)\}_{k_1,k_2 \ge 0}$ is an orthonormal basis for $L_2([0, 1]^2)$.
This is just a particular instance of a general fact. It is straightforward to argue (you can do so at home) that if $\{\psi_\gamma(t)\}_{\gamma \in \Gamma}$ is an orthonormal basis for $L_2([0, 1])$, then $\{\psi_{\gamma_1}(s)\,\psi_{\gamma_2}(t)\}_{\gamma_1,\gamma_2 \in \Gamma}$ is an orthonormal basis for $L_2([0, 1]^2)$.
The DCT extends to 2D in the same way.
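Definition. Let $\{\psi_k[n]\}_{0 \le k \le N-1}$ be the DCT basis in (2). Set

$$
\psi^{\mathrm{2D}}_{j,k}[m, n] = \psi_j[m]\,\psi_k[n].
$$

Then $\{\psi^{\mathrm{2D}}_{j,k}[m, n]\}_{0 \le j,k \le N-1}$ is an orthonormal basis for $\mathbb{R}^{N \times N}$, the space of $N \times N$ real arrays.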
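Because the 2D basis is separable, the 2D DCT coefficients can be computed by applying the 1D DCT along the rows and then along the columns. The sketch below writes this as two matrix products; the names `dct_basis`, `dct2`, and `idct2` are assumptions for this example.

```python
import numpy as np

def dct_basis(N):
    """N x N matrix whose row k holds the 1D DCT basis vector psi_k[n]."""
    n = np.arange(N)
    k = np.arange(N)[:, None]
    Psi = np.sqrt(2.0 / N) * np.cos(np.pi * k * (n + 0.5) / N)
    Psi[0, :] = np.sqrt(1.0 / N)
    return Psi

def dct2(X):
    """Coefficients c[j, k] = sum over m, n of psi_j[m] psi_k[n] X[m, n]."""
    return dct_basis(X.shape[0]) @ X @ dct_basis(X.shape[1]).T

def idct2(C):
    """Synthesis: X[m, n] = sum over j, k of c[j, k] psi_j[m] psi_k[n]."""
    return dct_basis(C.shape[0]).T @ C @ dct_basis(C.shape[1])

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 8))
C = dct2(X)
print("reconstruction error:", np.max(np.abs(idct2(C) - X)))
print("energy in X and in C:", np.sum(X**2), np.sum(C**2))
```

Since the basis is orthonormal, the transform preserves energy and the synthesis step is just the transpose of the analysis step.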
The DCT is the basis of the popular JPEG image compression standard. The central idea is that while the energy in a picture is distributed more or less evenly across its pixels, in the DCT domain it tends to be concentrated at low frequencies.
JPEG compression works roughly as follows:
There are some details we are leaving out here, probably the most important of which is how the three different color bands are dealt with, but the above outlines the essential ideas.
The basic idea is that while the energy within an 8 × 8 block of pixels tends to be more or less evenly distributed, the DCT concentrates this energy onto a relatively small number of transform coefficients. Moreover, the significant coefficients tend to be at the same place in the transform domain (low spatial frequencies).
[Figure: an 8 × 8 block of pixels, its 2D DCT coefficients, and the ordering of the coefficients.]
To get a rough feel for how closely this model matches reality, let’s look at a simple example. Here we have an original image 2048×2048, and a zoom into a 256 × 256 piece of the image:
[Figure: the 256 × 256 piece of the original image.]
Here is the same piece after keeping 1 of the 64 coefficients per block ($1/64 \approx 1.6\%$), 3 of the 64 ($3/64 \approx 4.7\%$), and 10 of the 64 ($10/64 \approx 15.6\%$):

[Figure: the same piece reconstructed from 1/64, 3/64, and 10/64 of the coefficients per block.]
So the “low frequency” heuristic appears to be a good one.
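The experiment can be imitated on a small scale. The sketch below takes the 2D DCT of each 8 × 8 block, keeps only a few low-frequency coefficients, and measures the relative error of the reconstruction. Several things here are assumptions rather than details from the notes: the synthetic smooth test image, the ordering of coefficients by $j + k$ (a zigzag-like stand-in for the ordering in the figure), and the helper names. It also skips quantization and entropy coding, so it is not JPEG itself.

```python
import numpy as np

def dct_basis(N):
    """N x N matrix whose row k holds the 1D DCT basis vector psi_k[n]."""
    n = np.arange(N)
    k = np.arange(N)[:, None]
    Psi = np.sqrt(2.0 / N) * np.cos(np.pi * k * (n + 0.5) / N)
    Psi[0, :] = np.sqrt(1.0 / N)
    return Psi

def low_freq_mask(B, keep):
    """B x B boolean mask for the `keep` lowest frequencies, ordered by j + k."""
    j, k = np.meshgrid(np.arange(B), np.arange(B), indexing="ij")
    order = np.lexsort((j.ravel(), (j + k).ravel()))  # primary key is j + k
    mask = np.zeros(B * B, dtype=bool)
    mask[order[:keep]] = True
    return mask.reshape(B, B)

def block_dct_approx(img, keep, B=8):
    """Keep `keep` DCT coefficients per B x B block (image sides must be multiples of B)."""
    Psi = dct_basis(B)
    mask = low_freq_mask(B, keep)
    out = np.empty_like(img, dtype=float)
    for r in range(0, img.shape[0], B):
        for c in range(0, img.shape[1], B):
            coeffs = Psi @ img[r:r + B, c:c + B] @ Psi.T   # 2D DCT of the block
            out[r:r + B, c:c + B] = Psi.T @ (coeffs * mask) @ Psi
    return out

# Hypothetical smooth test image, standing in for the photograph in the notes.
u = np.linspace(0.0, 1.0, 64)
img = np.sin(2 * np.pi * u)[:, None] * np.cos(3 * np.pi * u)[None, :] + u[None, :]

for keep in (1, 3, 10):
    approx = block_dct_approx(img, keep)
    rel_err = np.linalg.norm(img - approx) / np.linalg.norm(img)
    print(f"{keep}/64 coefficients per block: relative error {rel_err:.4f}")
```

Because the basis is orthonormal, the squared error in each block equals the energy of the discarded coefficients, so keeping more coefficients can only reduce the error.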
Video compression
The DCT also plays a fundamental role in video compression (e.g. MPEG, H.264, etc.), but in a slightly different way. Video codecs are complicated, but here is essentially what they do:
Here is an example video frame, along with the differences between this frame and the next two frames (in false color):
[Figure: the frame $x(t_0)$ and the differences $x(t_1) - x(t_0)$ and $x(t_2) - x(t_0)$, in false color.]
The only activity is where the car is moving from left to right.
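A toy version of this observation is sketched below. It builds three synthetic frames with a fixed background and a small object that shifts to the right, then counts how many pixels change between frames. The frame size, the object, and the helper `make_frame` are all hypothetical; this only illustrates frame differencing and is not a description of how any particular codec works.

```python
import numpy as np

def make_frame(shift, size=64):
    """Static random background plus a bright 8 x 8 object moved right by `shift`."""
    rng = np.random.default_rng(0)                # same seed, so the background is identical
    frame = rng.uniform(0.0, 1.0, (size, size))
    frame[28:36, 8 + shift:16 + shift] = 2.0      # hypothetical moving object
    return frame

x0, x1, x2 = (make_frame(s) for s in (0, 2, 4))

for name, diff in (("x(t1) - x(t0)", x1 - x0), ("x(t2) - x(t0)", x2 - x0)):
    changed = np.count_nonzero(diff) / diff.size
    print(f"{name}: {100 * changed:.2f}% of pixels are nonzero")
```

The difference frames are zero everywhere except near the edges of the moving object, which is the kind of sparsity the example frames above display.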