Orthobasis Expansions: Cosine Transforms and Discrete Cosine Transform

These notes cover orthobasis expansions, specifically the cosine transform and its discrete version, the discrete cosine transform (DCT). The cosine transform is an alternative to Fourier series, with real-valued basis functions and different symmetries. The DCT is used in image and video compression standards like JPEG and MPEG. The notes give definitions, derivations, and applications of these transforms.

Typology: Study Guides, Projects, Research

2021/2022

Uploaded on 08/01/2022

Georgia Tech ECE 6250 Fall 2019; Notes by J. Romberg and M. Davenport. Last updated 14:07, September 16, 2019

In the next few lectures, we will look at a few examples of orthobasis expansions that are used in modern signal processing.

Cosine transforms

The cosine-I transform is an alternative to Fourier series; it is an expansion in an orthobasis for functions on [0, 1] (or any interval on the real line) where the basis functions look like sinusoids. There are two main differences that make it more attractive than Fourier series for certain applications:

  1. the basis functions and the expansion coefficients are real-valued;
  2. the basis functions have different symmetries.

The discrete version of cosine-I (the “DCT”) is used in both the JPEG image compression standard and the MPEG video compression standard; we will discuss this more later in this section.

Definition. The cosine-I basis functions for $t \in [0, 1]$ are

$$
\psi_k(t) =
\begin{cases}
1, & k = 0 \\
\sqrt{2}\,\cos(\pi k t), & k = 1, 2, \ldots
\end{cases}
\qquad (1)
$$
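As a quick sanity check on this definition, the sketch below approximates the inner products of the first few cosine-I basis functions on [0, 1] with a midpoint rule. The helper names `psi` and `inner`, the number of quadrature points `M`, and the choice of five basis functions are assumptions made for illustration; they are not part of the notes.

```python
import math

def psi(k, t):
    """Cosine-I basis function: 1 for k = 0, sqrt(2) * cos(pi * k * t) otherwise."""
    return 1.0 if k == 0 else math.sqrt(2.0) * math.cos(math.pi * k * t)

def inner(j, k, M=1000):
    """Midpoint-rule approximation of the integral of psi_j(t) * psi_k(t) over [0, 1]."""
    ts = [(m + 0.5) / M for m in range(M)]
    return sum(psi(j, t) * psi(k, t) for t in ts) / M

# Gram matrix of the first few basis functions; orthonormality means
# it should be (numerically) the identity matrix.
K = 5
gram = [[inner(j, k) for k in range(K)] for j in range(K)]
for row in gram:
    print(" ".join(f"{v:6.3f}" for v in row))
```

If the basis is orthonormal, the printed matrix should have ones on the diagonal and entries near zero elsewhere.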

We can derive the cosine-I basis from the Fourier series in the following manner. Let $x(t)$ be a signal on the interval $[0, 1]$. Let $\tilde{x}(t)$ be its “reflection extension” on $[-1, 1]$. That is

$$
\tilde{x}(t) =
\begin{cases}
x(-t), & -1 \le t \le 0 \\
x(t), & 0 \le t \le 1
\end{cases}
$$

[Figure: a signal $x(t)$ on $[0, 1]$ (left) and its reflection extension $\tilde{x}(t)$ on $[-1, 1]$ (right).]

We can use Fourier series to synthesize $\tilde{x}(t)$:

$$
\tilde{x}(t) = \sum_{k=-\infty}^{\infty} \alpha_k\, e^{j \pi k t}.
$$

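Since $\tilde{x}(t)$ is real, we will have $\alpha_{-k} = \overline{\alpha_k}$ (the complex conjugate), and so we can rewrite this as

$$
\tilde{x}(t) = a_0 + \sum_{k=1}^{\infty} a_k \cos(\pi k t) + \sum_{k=1}^{\infty} b_k \sin(\pi k t),
$$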

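where $a_0 = \alpha_0$, $a_k = 2\,\mathrm{Re}\{\alpha_k\}$, and $b_k = -2\,\mathrm{Im}\{\alpha_k\}$. Since $\tilde{x}(t)$ is even and $\sin(\pi k t)$ is odd, $\langle \tilde{x}(t), \sin(\pi k t) \rangle = 0$ and so

$$
b_k = 0, \quad \text{for all } k = 1, 2, 3, \ldots,
$$

and so $\tilde{x}(t)$ on $[-1, 1]$ can be written as

$$
\tilde{x}(t) = a_0 + \sum_{k=1}^{\infty} a_k \cos(\pi k t).
$$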
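The derivation can be checked numerically. The sketch below uses the example signal x(t) = t (an arbitrary choice, not from the notes), builds its reflection extension, approximates the Fourier coefficients α_k = (1/2) ∫ x̃(t) e^{-jπkt} dt over [-1, 1] with a midpoint rule, and confirms that their imaginary parts vanish, so that only cosine terms remain. The function names, the quadrature size `M`, and the truncation level `K` are illustrative assumptions.

```python
import cmath
import math

def x(t):
    return t  # example signal on [0, 1]; an assumption for illustration

def x_ext(t):
    """Reflection extension of x to [-1, 1]."""
    return x(-t) if t < 0 else x(t)

def alpha(k, M=4000):
    """Midpoint-rule estimate of (1/2) * integral of x_ext(t) * exp(-j pi k t) over [-1, 1]."""
    h = 2.0 / M
    total = 0j
    for m in range(M):
        t = -1.0 + (m + 0.5) * h
        total += x_ext(t) * cmath.exp(-1j * math.pi * k * t)
    return 0.5 * h * total

K = 50
alphas = [alpha(k) for k in range(K + 1)]
max_imag = max(abs(a.imag) for a in alphas)          # b_k = -2 Im{alpha_k}, expected ~ 0

# a_0 = alpha_0 and a_k = 2 Re{alpha_k} for k >= 1
a = [alphas[0].real] + [2.0 * ak.real for ak in alphas[1:]]

def cosine_sum(t):
    """Truncated cosine expansion a_0 + sum_{k=1}^{K} a_k cos(pi k t)."""
    return sum(a[k] * math.cos(math.pi * k * t) for k in range(K + 1))

print(f"largest |Im alpha_k|: {max_imag:.2e}")
print(f"x(0.3) = {x(0.3):.4f}, truncated cosine sum = {cosine_sum(0.3):.4f}")
```

The truncated sum only approximates x(t); the gap shrinks as `K` grows.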

Since we can use this expansion to build up any symmetric function on [− 1 , 1], it means that the right hand side of the function on [0, 1]

The discrete cosine transform (DCT)

Just as there is a version of Fourier series for sampled signals on an interval (i.e. finite-dimensional signals in $\mathbb{C}^N$), namely the discrete Fourier transform (DFT), there is a version of the cosine-I transform for real-valued finite signals as well. This is called the discrete cosine transform, or DCT.

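The DCT basis functions for $\mathbb{R}^N$ are

$$
\psi_k[n] =
\begin{cases}
\sqrt{\dfrac{1}{N}}, & k = 0 \\[2ex]
\sqrt{\dfrac{2}{N}}\,\cos\!\left(\dfrac{\pi k}{N}\left(n + \dfrac{1}{2}\right)\right), & k = 1, \ldots, N-1
\end{cases}
\qquad (2)
$$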

for sample indices $n = 0, 1, \ldots, N-1$. Showing that

$$
\sum_{n=0}^{N-1} \psi_k[n]\, \psi_\ell[n] =
\begin{cases}
1, & k = \ell \\
0, & k \neq \ell
\end{cases}
$$

is an exercise you can do at home. Notice that the samples of the cosines are on the half-sample points (we see $(n + 1/2)$ in the expression above instead of $n$).
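Before doing the exercise by hand, it may help to see the claim hold numerically. The sketch below builds the DCT basis vectors for N = 8 (an arbitrary size chosen for illustration) and measures how far their Gram matrix is from the identity. The names `dct_basis`, `Psi`, and `max_dev` are hypothetical.

```python
import math

def dct_basis(N):
    """Return a list of N rows; row k holds psi_k[n] for n = 0, ..., N-1."""
    rows = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        # Note the half-sample offset (n + 1/2) inside the cosine.
        rows.append([scale * math.cos(math.pi * k * (n + 0.5) / N) for n in range(N)])
    return rows

N = 8
Psi = dct_basis(N)

# Gram matrix entries: sum over n of psi_k[n] * psi_l[n]
gram = [[sum(a * b for a, b in zip(Psi[k], Psi[l])) for l in range(N)] for k in range(N)]
max_dev = max(abs(gram[k][l] - (1.0 if k == l else 0.0))
              for k in range(N) for l in range(N))
print(f"largest deviation from the identity: {max_dev:.2e}")
```

A numerical check like this is not a proof, but it confirms the normalization constants.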

Just as the cosine-I transform can be computed from the Fourier series coefficients of a symmetric extension of the signal, the DCT can be computed from the DFT of a symmetric extension. That means we have a fast algorithm for computing the DCT — the cost is essentially the same as for an FFT, O(N log N ).
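One way to make this concrete is sketched below. It assumes a particular symmetric extension, y = [x[0], ..., x[N-1], x[N-1], ..., x[0]] of length 2N, takes its DFT, and applies a half-sample phase correction; the notes do not say which extension they have in mind, so treat this as one possible construction. A small recursive radix-2 FFT stands in for a library FFT, which restricts the sketch to power-of-two lengths. The function names are hypothetical.

```python
import cmath
import math

def fft(v):
    """Recursive radix-2 FFT; len(v) must be a power of two."""
    n = len(v)
    if n == 1:
        return [complex(v[0])]
    even, odd = fft(v[0::2]), fft(v[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def scale(k, N):
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def dct_via_fft(x):
    """DCT coefficients <x, psi_k> computed from the DFT of a symmetric extension."""
    N = len(x)
    Y = fft(list(x) + list(reversed(x)))  # length-2N symmetric extension
    coeffs = []
    for k in range(N):
        # exp(-j pi k / (2N)) * Y[k] equals 2 * sum_n x[n] cos(pi k (n + 1/2) / N)
        s = 0.5 * (cmath.exp(-1j * math.pi * k / (2 * N)) * Y[k]).real
        coeffs.append(scale(k, N) * s)
    return coeffs

def dct_direct(x):
    """Same coefficients from the definition, at O(N^2) cost."""
    N = len(x)
    return [scale(k, N) * sum(x[n] * math.cos(math.pi * k * (n + 0.5) / N) for n in range(N))
            for k in range(N)]

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
max_diff = max(abs(f - d) for f, d in zip(dct_via_fft(x), dct_direct(x)))
print(f"largest difference between FFT-based and direct DCT: {max_diff:.2e}")
```

The only work beyond the FFT is one complex multiplication per coefficient, so the overall cost stays at O(N log N).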

The cosine-I and DCT for 2D images

Just as for Fourier series and the discrete Fourier transform, we can leverage the 1D cosine-I basis and the DCT into separable bases for 2D images.

Definition. Let $\{\psi_k(t)\}_{k \ge 0}$ be the cosine-I basis in (1). Set

$$
\psi^{2D}_{k_1, k_2}(s, t) = \psi_{k_1}(s)\, \psi_{k_2}(t).
$$

Then $\{\psi^{2D}_{k_1, k_2}(s, t)\}_{k_1, k_2 \in \mathbb{N}}$ is an orthonormal basis for $L_2([0, 1]^2)$.

This is just a particular instance of a general fact. It is straightforward to argue (you can do so at home) that if $\{\psi_\gamma(t)\}_{\gamma \in \Gamma}$ is an orthonormal basis for $L_2([0, 1])$, then $\{\psi_{\gamma_1}(s)\, \psi_{\gamma_2}(t)\}_{\gamma_1, \gamma_2 \in \Gamma}$ is an orthonormal basis for $L_2([0, 1]^2)$.

The DCT extends to 2D in the same way.

Definition. Let $\{\psi_k[n]\}_{0 \le k \le N-1}$ be the DCT basis in (2). Set

$$
\psi^{2D}_{j, k}[m, n] = \psi_j[m]\, \psi_k[n].
$$

Then $\{\psi^{2D}_{j, k}[m, n]\}_{0 \le j, k \le N-1}$ is an orthonormal basis for $\mathbb{R}^N \times \mathbb{R}^N$.
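Because the 2D basis is separable, the 2D DCT coefficients of an N × N array X can be written as the matrix product C X Cᵀ, where row k of C holds ψ_k[n]. The sketch below illustrates this with plain Python lists and checks two consequences of orthonormality: energy is preserved and the transform inverts exactly. The test array and the helper names (`dct_matrix`, `dct2`, `idct2`) are assumptions for illustration.

```python
import math

def dct_matrix(N):
    """N x N matrix whose row k is the 1D DCT basis vector psi_k[n]."""
    return [[(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
             * math.cos(math.pi * k * (n + 0.5) / N) for n in range(N)]
            for k in range(N)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dct2(X):
    """2D DCT of a square array: entry [j][k] is the inner product of X with psi_j[m] psi_k[n]."""
    C = dct_matrix(len(X))
    return matmul(matmul(C, X), transpose(C))

def idct2(Y):
    """Inverse 2D DCT; uses the fact that C is an orthogonal matrix."""
    C = dct_matrix(len(Y))
    return matmul(matmul(transpose(C), Y), C)

N = 8
X = [[float((3 * m + 5 * n) % 7) for n in range(N)] for m in range(N)]  # arbitrary test array
Y = dct2(X)
X_rec = idct2(Y)

energy_X = sum(v * v for row in X for v in row)
energy_Y = sum(v * v for row in Y for v in row)
max_err = max(abs(a - b) for ra, rb in zip(X, X_rec) for a, b in zip(ra, rb))
print(f"energy in pixels: {energy_X:.6f}, energy in coefficients: {energy_Y:.6f}")
print(f"largest reconstruction error: {max_err:.2e}")
```

Applying the 1D transform along rows and then along columns gives the same result, which is how separable transforms are usually computed in practice.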

The DCT in image and video compression

The DCT is the basis of the popular JPEG image compression standard. The central idea is that while energy in a picture is distributed more or less evenly throughout the pixel domain, in the DCT transform domain it tends to be concentrated at low frequencies.

JPEG compression works roughly as follows:

  1. Divide the image into 8 × 8 blocks of pixels
  2. Take a DCT within each block
  3. Quantize the coefficients; the rough effect of this is to keep the larger coefficients and remove the smaller ones
  4. Bitstream (losslessly) encode the result.
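The first three steps can be sketched in a few lines. This is a toy illustration under loud assumptions, not the JPEG standard: it uses a single uniform quantization step for every coefficient (JPEG uses a table of per-frequency step sizes), a synthetic smooth 16 × 16 image, a single grayscale channel, and it omits the lossless bitstream encoding of step 4 entirely. All function names are hypothetical.

```python
import math

B = 8  # block size

# Row k of C is the 1D DCT basis vector psi_k[n] for N = 8.
C = [[(math.sqrt(1.0 / B) if k == 0 else math.sqrt(2.0 / B))
      * math.cos(math.pi * k * (n + 0.5) / B) for n in range(B)] for k in range(B)]

def matmul(A, M):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*M)] for row in A]

Ct = [list(col) for col in zip(*C)]

def blocks(img):
    """Step 1: yield (row offset, column offset, block) for each 8 x 8 block."""
    for r in range(0, len(img), B):
        for c in range(0, len(img[0]), B):
            yield r, c, [row[c:c + B] for row in img[r:r + B]]

def encode(img, step):
    """Steps 2 and 3: block DCT, then uniform quantization to integers."""
    return [(r, c, [[round(v / step) for v in row] for row in matmul(matmul(C, blk), Ct)])
            for r, c, blk in blocks(img)]

def decode(coded, step, H, W):
    """Rescale the integers and invert the block DCT."""
    out = [[0.0] * W for _ in range(H)]
    for r, c, q in coded:
        blk = matmul(matmul(Ct, [[step * v for v in row] for row in q]), C)
        for i in range(B):
            out[r + i][c:c + B] = blk[i]
    return out

H = W = 16
img = [[100.0 + 5.0 * m + 3.0 * n + 0.2 * m * n for n in range(W)] for m in range(H)]
step = 10.0  # assumed uniform quantizer step

coded = encode(img, step)
rec = decode(coded, step, H, W)

nonzero = sum(1 for _, _, q in coded for row in q for v in row if v != 0)
rms = math.sqrt(sum((a - b) ** 2 for ra, rb in zip(img, rec) for a, b in zip(ra, rb)) / (H * W))
print(f"nonzero quantized coefficients: {nonzero} of {H * W}")
print(f"RMS reconstruction error: {rms:.3f}")
```

Since the block DCT is orthonormal and each coefficient moves by at most half a step, the RMS pixel error cannot exceed `step / 2`. The long runs of zeros among the quantized coefficients are what a lossless coder would exploit in step 4.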

There are some details we are leaving out here, probably the most important of which is how the three different color bands are dealt with, but the above outlines the essential ideas.

The basic idea is that while the energy within an 8 × 8 block of pixels tends to be more or less evenly distributed, the DCT concentrates this energy onto a relatively small number of transform coefficients. Moreover, the significant coefficients tend to be at the same place in the transform domain (low spatial frequencies).

[Figure: an 8 × 8 block of pixels, its 2D DCT coefficients, and the coefficient ordering.]

To get a rough feel for how closely this model matches reality, let’s look at a simple example. Here we have an original image 2048×2048, and a zoom into a 256 × 256 piece of the image:

[Figure: the original 256 × 256 piece of the image.]

Here is the same piece after using 1 of the 64 coefficients per block (1/64 ≈ 1.6%), 3/64 ≈ 4.7% of the coefficients, and 10/64 ≈ 15.6%:

[Figure: the same piece reconstructed from 1/64 (1.6%), 3/64 (4.7%), and 10/64 (15.6%) of the coefficients per block.]

So the “low frequency” heuristic appears to be a good one.
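A small experiment in the same spirit can be run on a single synthetic block. The sketch below keeps only the first K coefficients of an 8 × 8 block in a low-to-high frequency order and reports the relative error. Two assumptions should be kept in mind: the ordering sorts coefficients by the diagonal index j + k, a simple stand-in that need not match the ordering used in the notes or in JPEG, and the block is a smooth made-up function rather than real image data, so the numbers say nothing about the image above. Because the basis is orthonormal, the squared error equals the energy of the discarded coefficients, so no inverse transform is needed.

```python
import math

N = 8
C = [[(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
      * math.cos(math.pi * k * (n + 0.5) / N) for n in range(N)] for k in range(N)]

def dct2(X):
    """2D DCT of an N x N block: transform along m, then along n."""
    CX = [[sum(C[j][m] * X[m][n] for m in range(N)) for n in range(N)] for j in range(N)]
    return [[sum(CX[j][n] * C[k][n] for n in range(N)) for k in range(N)] for j in range(N)]

# Assumed low-to-high frequency ordering: sort index pairs by diagonal j + k.
order = sorted(((j, k) for j in range(N) for k in range(N)),
               key=lambda jk: (jk[0] + jk[1], jk[0]))

def relative_error(Y, K):
    """Relative l2 error when only the first K coefficients in `order` are kept."""
    total = sum(v * v for row in Y for v in row)
    kept = sum(Y[j][k] ** 2 for j, k in order[:K])
    return math.sqrt(max(total - kept, 0.0) / total)

# Synthetic smooth block (an assumption; not data from the notes).
block = [[100.0 + 20.0 * math.sin(0.3 * m) + 10.0 * math.cos(0.2 * n) for n in range(N)]
         for m in range(N)]
Y = dct2(block)

errors = {K: relative_error(Y, K) for K in (1, 3, 10, 64)}
for K, err in errors.items():
    print(f"K = {K:2d} of 64 coefficients: relative error {err:.4f}")
```

For a smooth block like this one, most of the energy sits in the first few coefficients, which is the behavior the heuristic relies on.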

Video compression

The DCT also plays a fundamental role in video compression (e.g. MPEG, H.264, etc.), but in a slightly different way. Video codecs are complicated, but here is essentially what they do:

  1. Estimate, describe, and quantize the motion in between frames.
  2. Use the motion estimate to “predict” the next frame.
  3. Use the (block-based) DCT to code the residual.
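Steps 1 and 2 can be illustrated with a toy example. The sketch below assumes the simplest possible motion model, a single integer translation of the whole frame with wrap-around at the borders, found by exhaustive search; real codecs estimate motion per block and are far more elaborate. The frames are synthetic (a small bright rectangle on a dark background), and all names are hypothetical. Step 3 is not shown: the residual computed here is what a block-based DCT coder would then encode.

```python
def shift(frame, dy, dx):
    """Translate a frame by (dy, dx) pixels, wrapping around at the borders."""
    H, W = len(frame), len(frame[0])
    return [[frame[(r - dy) % H][(c - dx) % W] for c in range(W)] for r in range(H)]

def energy(A, B):
    """Sum of squared differences between two frames."""
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def estimate_motion(prev, cur, radius=3):
    """Step 1: exhaustive search for the global shift that best maps prev onto cur."""
    candidates = [(dy, dx) for dy in range(-radius, radius + 1)
                  for dx in range(-radius, radius + 1)]
    return min(candidates, key=lambda d: energy(shift(prev, *d), cur))

# Synthetic frames: a 4 x 4 bright object that moves two pixels to the right.
H = W = 16
prev = [[0.0] * W for _ in range(H)]
for r in range(6, 10):
    for c in range(3, 7):
        prev[r][c] = 1.0
cur = shift(prev, 0, 2)

motion = estimate_motion(prev, cur)
predicted = shift(prev, *motion)              # step 2: motion-compensated prediction
residual_energy = energy(predicted, cur)      # what is left for the DCT coder
difference_energy = energy(prev, cur)         # plain frame difference, for comparison

print(f"estimated motion (dy, dx): {motion}")
print(f"energy of frame difference: {difference_energy}")
print(f"energy of residual after motion compensation: {residual_energy}")
```

In this idealized case the prediction is perfect, so the residual carries no energy; with real video the residual is nonzero but typically much smaller than the raw frame difference.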

Here is an example video frame, along with the differences between this frame and the next two frames (in false color):

[Figure: the frame $x(t_0)$ and the differences $x(t_1) - x(t_0)$ and $x(t_2) - x(t_0)$, shown in false color.]

The only activity is where the car is moving from left to right.