Information Theory Homework Solutions: Entropy and Mutual Information Calculations, Assignments of Electrical and Electronics Engineering

Solutions to Homework 1 of the ECE 563 Information Theory course. The first problem calculates the entropy of a random variable X from its probability mass function and relates that entropy to the expected number of yes/no questions needed to determine the value of X. The document also covers the entropy of a function of a random variable, conditional entropy, and the chain rule for mutual information applied with the Markov property.

Typology: Assignments

Pre 2010

Uploaded on 03/10/2009

ECE 563 Information Theory
Homework 1 Solutions
Instructor: R. Srikant TA: Akshay Kashyap
2.1
(a) Say P(outcome of toss is heads) = $p$ for each toss, and let $q = 1 - p$ be the probability of tails. Tosses are independent, therefore $P(X = n) = p q^{n-1}$. So, the entropy of $X$ is
\[
H(X) = -\sum_{n=1}^{\infty} p q^{n-1} \log\left(p q^{n-1}\right)
     = -\left[ \sum_{n=0}^{\infty} p q^{n} \log p + \sum_{n=0}^{\infty} n p q^{n} \log q \right],
\]
which, using the series summations given in the text, simplifies to
\[
-\frac{p \log p}{1 - q} - \frac{p q \log q}{p^{2}}
= \frac{-p \log p - q \log q}{p}
= \frac{H(p)}{p}.
\]
Here, $p = 1/2$, so $H(p) = 1$, and $H(X) = 2$.
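The closed form $H(p)/p$ can be checked numerically. The following is a minimal sketch, not part of the original solution: the function names and the truncation length `n_terms` are illustrative assumptions, and logarithms are taken base 2 so that entropies are in bits.

```python
import math

def binary_entropy(p):
    """H(p) in bits for a coin with heads probability p."""
    if p in (0.0, 1.0):
        return 0.0
    q = 1.0 - p
    return -p * math.log2(p) - q * math.log2(q)

def geometric_entropy_truncated(p, n_terms=2000):
    """Partial sum of -sum_n P(X=n) log2 P(X=n), with P(X=n) = p * q**(n-1)."""
    q = 1.0 - p
    total = 0.0
    for n in range(1, n_terms + 1):
        prob = p * q ** (n - 1)
        if prob > 0.0:  # skip terms that underflow to zero
            total -= prob * math.log2(prob)
    return total

# Compare the truncated series with the closed form H(p)/p.
for p in (0.5, 0.3):
    print(p, geometric_entropy_truncated(p), binary_entropy(p) / p)
```

For each `p` the two printed values should agree up to truncation and rounding error.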
(b) Intuitively, the best questions are the ones that are equally likely to have “yes” or “no” as answers. The following series of questions has this property: “Is $X = 1$?”, if not, then “Is $X = 2$?”, if not, then “Is $X = 3$?”, and so on until we get a “yes” for an answer. Then, the expected number of questions is
\[
\sum_{n=1}^{\infty} n \left(1/2^{n}\right) = 2 = H(X).
\]
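The expected number of questions under the sequential strategy is the mean of $X$, which can be summed directly. This is a small sketch under the same geometric model; `expected_questions` and its `n_terms` cutoff are hypothetical names chosen for illustration.

```python
def expected_questions(p, n_terms=2000):
    """Partial sum of sum_n n * P(X = n), with P(X = n) = p * q**(n - 1)."""
    q = 1.0 - p
    return sum(n * p * q ** (n - 1) for n in range(1, n_terms + 1))

# Fair coin: the sum n / 2**n from the solution.
fair_coin_questions = expected_questions(0.5)
print(fair_coin_questions)
```

For the fair coin the partial sum should be numerically indistinguishable from 2, matching $H(X)$.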
2.2
Let $y = g(x)$. Then, $p(y) = \sum_{x : y = g(x)} p(x)$. So, for any $y$,
\[
\sum_{x : y = g(x)} p(x) \log p(x) \le \sum_{x : y = g(x)} p(x) \log p(y) = p(y) \log p(y),
\]
with equality if and only if there is only one $x$ such that $g(x) = y$. Therefore,
\[
H(X) = -\sum_{x} p(x) \log p(x)
     = -\sum_{y} \sum_{x : y = g(x)} p(x) \log p(x)
     \ge -\sum_{y} p(y) \log p(y) = H(Y),
\]
with equality if and only if $y = g(x)$ is one-to-one. In (a), $g(x) = 2^{x}$ is one-to-one, and so $H(X) = H(Y)$. In (b), $g(x) = \cos(x)$ is not necessarily one-to-one, so in general $H(X) \ge H(Y)$, with equality if cosine is one-to-one on the range of $X$ (for example, if all values of $X$ lie in $[0, \pi]$).
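A small numerical illustration of the two cases follows. The distribution `pmf_x` is an arbitrary example chosen here, not one from the problem, and `entropy` and `pushforward` are hypothetical helper names.

```python
import math
from collections import defaultdict

def entropy(pmf):
    """Entropy in bits of a dict {value: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def pushforward(pmf, g):
    """Distribution of Y = g(X): add up p(x) over all x with g(x) = y."""
    out = defaultdict(float)
    for x, p in pmf.items():
        out[g(x)] += p
    return dict(out)

# Assumed example: X takes values -1, 0, 1, so cos merges -1 and 1.
pmf_x = {-1: 0.25, 0: 0.5, 1: 0.25}
h_x = entropy(pmf_x)
h_pow = entropy(pushforward(pmf_x, lambda x: 2.0 ** x))
# Rounding guards against any floating-point asymmetry in cos(-1) vs cos(1).
h_cos = entropy(pushforward(pmf_x, lambda x: round(math.cos(x), 12)))
print(h_x, h_pow, h_cos)
```

Here `h_pow` should equal `h_x` because $2^{x}$ is one-to-one, while `h_cos` should be strictly smaller because cosine maps $-1$ and $1$ to the same value.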

2.5
Assume that there exists an $x$, say $x_0$, and two different values of $y$, say $y_1$ and $y_2$, such that $p(x_0, y_1)$ and $p(x_0, y_2)$ are both positive. Then, $p(x_0) \ge p(x_0, y_1) + p(x_0, y_2) > 0$, and $p(y_1|x_0)$ and $p(y_2|x_0)$ are not equal to 0 or 1. Thus,
\[
H(Y|X) = -\sum_{x} p(x) \sum_{y} p(y|x) \log p(y|x)
       \ge p(x_0) \left[ -p(y_1|x_0) \log p(y_1|x_0) - p(y_2|x_0) \log p(y_2|x_0) \right]
       > 0,
\]
since $-t \log t \ge 0$ for $t \in [0, 1]$, with strict inequality if $t \notin \{0, 1\}$. Therefore, $H(Y|X) = 0 \Rightarrow Y$ is a function of $X$.
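The argument can be illustrated by computing $H(Y|X)$ from a joint pmf. This is a sketch with two made-up joint distributions: `deterministic`, in which each $x$ has a single possible $y$, and `split`, in which $x = 0$ has two. All names are illustrative assumptions.

```python
import math
from collections import defaultdict

def conditional_entropy(joint):
    """H(Y|X) in bits from a dict {(x, y): p(x, y)}."""
    p_x = defaultdict(float)
    for (x, _y), p in joint.items():
        p_x[x] += p
    h = 0.0
    for (x, _y), p in joint.items():
        if p > 0:
            h -= p * math.log2(p / p_x[x])  # p / p_x[x] is p(y|x)
    return h

# Y is a function of X: every x appears with exactly one y.
deterministic = {(0, "a"): 0.5, (1, "b"): 0.25, (2, "a"): 0.25}
# x = 0 occurs with two different y values, each with positive probability.
split = {(0, "a"): 0.25, (0, "b"): 0.25, (1, "b"): 0.5}

print(conditional_entropy(deterministic), conditional_entropy(split))
```

The first value should be zero and the second strictly positive, in line with the inequality above.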

By the chain rule for mutual information,
\[
I(X_1; X_2, \ldots, X_n) = I(X_1; X_2) + I(X_1; X_3 | X_2) + \cdots + I(X_1; X_n | X_2, \ldots, X_{n-1}).
\]
But since the past and the future are conditionally independent given the present by the Markov property, all terms except the first one on the right-hand side are zero, and so
\[
I(X_1; X_2, \ldots, X_n) = I(X_1; X_2).
\]
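The identity can be checked on a short chain. The sketch below builds the joint pmf of a three-step binary Markov chain; the `initial` distribution and `transition` matrix are arbitrary assumed values, and `mutual_information` is a hypothetical helper.

```python
import math
from collections import defaultdict
from itertools import product

def mutual_information(joint, a_idx, b_idx):
    """I(A;B) in bits, where A and B are the coordinates of each outcome
    tuple selected by a_idx and b_idx."""
    p_a, p_b, p_ab = defaultdict(float), defaultdict(float), defaultdict(float)
    for outcome, p in joint.items():
        a = tuple(outcome[i] for i in a_idx)
        b = tuple(outcome[i] for i in b_idx)
        p_a[a] += p
        p_b[b] += p
        p_ab[(a, b)] += p
    return sum(
        p * math.log2(p / (p_a[a] * p_b[b]))
        for (a, b), p in p_ab.items()
        if p > 0
    )

# Assumed example chain X1 -> X2 -> X3 on states {0, 1}.
initial = [0.3, 0.7]
transition = [[0.9, 0.1], [0.2, 0.8]]

joint = {}
for x1, x2, x3 in product(range(2), repeat=3):
    joint[(x1, x2, x3)] = initial[x1] * transition[x1][x2] * transition[x2][x3]

i_12 = mutual_information(joint, (0,), (1,))       # I(X1; X2)
i_1_23 = mutual_information(joint, (0,), (1, 2))   # I(X1; X2, X3)
print(i_12, i_1_23)
```

The two values should coincide up to rounding, since $I(X_1; X_3 | X_2) = 0$ for a Markov chain.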

Since $R$ is a function of the sequence $X = (X_1, \ldots, X_n)$, clearly $H(X) \ge H(R)$. However, any $X_i$ (and in particular, $X_n$) together with the run length sequence uniquely determines $X$, and so
\[
H(X) = H(X_n, R) = H(R) + H(X_n | R) \le H(R) + H(X_n) \le H(R) + 1.
\]
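Both bounds can be verified by brute force for a short sequence. This sketch assumes i.i.d. bits with $P(X_i = 1) = 0.3$ and length $n = 6$, which are choices made here for illustration only; the bounds themselves do not depend on that assumption.

```python
import math
from collections import defaultdict
from itertools import groupby, product

def run_lengths(bits):
    """Lengths of maximal runs of equal symbols, e.g. (0,0,1,1,1,0) -> (2,3,1)."""
    return tuple(len(list(group)) for _key, group in groupby(bits))

def entropy(pmf):
    """Entropy in bits of a dict {value: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

n = 6          # assumed sequence length
p_one = 0.3    # assumed P(X_i = 1), bits taken i.i.d.

pmf_x = {}
pmf_r = defaultdict(float)
for bits in product((0, 1), repeat=n):
    ones = sum(bits)
    prob = p_one ** ones * (1 - p_one) ** (n - ones)
    pmf_x[bits] = prob
    pmf_r[run_lengths(bits)] += prob  # a sequence and its complement share R

h_x = entropy(pmf_x)
h_r = entropy(pmf_r)
print(h_r, h_x, h_r + 1)
```

The printed values should be in nondecreasing order, reflecting $H(R) \le H(X) \le H(R) + 1$.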

\[
P(p(X) \le d) \log(1/d) = \sum_{x : p(x) \le d} p(x) \log(1/d)
\le \sum_{x : p(x) \le d} p(x) \log(1/p(x))
\le \sum_{x} p(x) \log(1/p(x))
= H(X).
\]
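The chain of inequalities can be spot-checked for a concrete distribution and several thresholds $d$. The pmf and the list of thresholds below are arbitrary assumed examples, and the function names are hypothetical.

```python
import math

def entropy(probs):
    """Entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def low_probability_bound(probs, d):
    """P(p(X) <= d) * log2(1/d): the left-hand side of the inequality."""
    mass = sum(p for p in probs if p <= d)
    return mass * math.log2(1.0 / d)

pmf = [0.5, 0.25, 0.125, 0.0625, 0.0625]   # assumed example distribution
thresholds = [0.01, 0.0625, 0.125, 0.25, 0.5, 1.0]

h = entropy(pmf)
for d in thresholds:
    print(d, low_probability_bound(pmf, d), h)
```

For every threshold the middle value should not exceed the entropy in the last column.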

Let $P_1$ be the distribution $(p_1, p_2, \ldots, p_m)$, and $P_2 = \left( \frac{p_1 + p_2}{2}, \frac{p_1 + p_2}{2}, p_3, \ldots, p_m \right)$ (where without loss of generality, we have taken $i = 1$, $j = 2$).

Then,
\[
H(P_2) - H(P_1) = -2 \left( \frac{p_1 + p_2}{2} \right) \log \frac{p_1 + p_2}{2} + p_1 \log p_1 + p_2 \log p_2
\]
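The expression for $H(P_2) - H(P_1)$ can be compared against a direct computation of the two entropies. This is a sketch with an arbitrary assumed distribution `p_1`; the helper names are hypothetical.

```python
import math

def entropy(probs):
    """Entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def average_first_two(probs):
    """Replace the first two probabilities by their average."""
    mean = (probs[0] + probs[1]) / 2
    return [mean, mean] + list(probs[2:])

def entropy_gain_formula(p1, p2):
    """-2 * m * log2(m) + p1 * log2(p1) + p2 * log2(p2), with m = (p1 + p2) / 2."""
    mean = (p1 + p2) / 2
    return -2 * mean * math.log2(mean) + p1 * math.log2(p1) + p2 * math.log2(p2)

p_1 = [0.6, 0.1, 0.2, 0.1]   # assumed example distribution
p_2 = average_first_two(p_1)

direct = entropy(p_2) - entropy(p_1)
formula = entropy_gain_formula(p_1[0], p_1[1])
print(direct, formula)
```

The two printed values should agree up to rounding, and for this example the difference is positive.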