

Solutions to problem 1 in the ECE 563 Information Theory course, which involves calculating the entropy of a random variable X using its probability mass function and the concept of self-information. The document also discusses the relationship between entropy and the number of questions asked to determine a value of X, as well as the chain rule for mutual information and the Markov property. Additionally, it covers conditional entropy and its relationship to the entropy of a function of a random variable.


ECE 563 Information Theory
Instructor: R. Srikant TA: Akshay Kashyap
(a) Let $p$ be the probability of heads on each toss, and let $q = 1 - p$ be the probability of tails. The tosses are independent, so $P(X = n) = p q^{n-1}$ for $n \ge 1$. The entropy of $X$ is therefore
$$H(X) = -\sum_{n=1}^{\infty} p q^{n-1} \log\left(p q^{n-1}\right) = -\left[\sum_{n=0}^{\infty} p q^{n} \log p + \sum_{n=0}^{\infty} n p q^{n} \log q\right],$$
which, using the series summations given in the text, simplifies to
$$H(X) = -\frac{p \log p}{1 - q} - \frac{p q \log q}{p^2} = \frac{-p \log p - q \log q}{p} = \frac{H(p)}{p}.$$
Here, p = 1/2, so H(p) = 1, and H(X) = 2.
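The closed form $H(X) = H(p)/p$ can be checked numerically. The following is a minimal sketch, not part of the original solution: the function names and the truncation length `n_terms` are illustrative assumptions, and the infinite sum is approximated by a long partial sum.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy H(p) in bits of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def geometric_entropy_truncated(p: float, n_terms: int = 2000) -> float:
    """Partial sum of -sum_n P(X=n) log2 P(X=n), with P(X=n) = p * q**(n-1)."""
    q = 1 - p
    total = 0.0
    for n in range(1, n_terms + 1):
        prob = p * q ** (n - 1)
        if prob == 0.0:  # floating-point underflow: the remaining tail is negligible
            break
        total -= prob * math.log2(prob)
    return total


# Compare the partial sum with the closed form H(p)/p for a few values of p.
for p in (0.5, 0.25, 0.9):
    print(p, geometric_entropy_truncated(p), binary_entropy(p) / p)
```

For each value of `p` the two printed numbers should agree to within floating-point error, and for `p = 0.5` both should be close to 2 bits.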
(b) Intuitively, the best questions are the ones that are equally likely to have “yes” or “no” as answers. The following series of questions has this property: “Is X = 1?”; if not, then “Is X = 2?”; if not, then “Is X = 3?”; and so on till we get a “yes” for an answer. Then, the expected number of questions is $\sum_{n=1}^{\infty} n (1/2^n) = 2 = H(X)$.
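The expected number of questions can also be estimated by simulation. This is a rough sketch under stated assumptions: the helper names, the trial count, and the seed are all illustrative, and the questioning strategy is the sequential one described in part (b), so the number of questions asked equals the value of X.

```python
import random


def questions_needed(rng: random.Random, p_heads: float = 0.5) -> int:
    """Toss until the first head; asking "Is X = 1?", "Is X = 2?", ... takes X questions."""
    n = 1
    while rng.random() >= p_heads:  # this toss came up tails, so keep tossing
        n += 1
    return n


def average_questions(trials: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected number of questions for a fair coin."""
    rng = random.Random(seed)
    return sum(questions_needed(rng) for _ in range(trials)) / trials


print(average_questions())
```

With a fair coin the estimate should land close to 2, matching $H(X)$.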
Let $y = g(x)$. Then $p(y) = \sum_{x:\, y = g(x)} p(x)$. So, for any $y$,
$$\sum_{x:\, y = g(x)} p(x) \log p(x) \le \sum_{x:\, y = g(x)} p(x) \log p(y) = p(y) \log p(y),$$
with equality if and only if there is only one $x$ such that $g(x) = y$. Therefore,
$$H(X) = -\sum_{x} p(x) \log p(x) = -\sum_{y} \sum_{x:\, y = g(x)} p(x) \log p(x) \ge -\sum_{y} p(y) \log p(y) = H(Y),$$
with equality if and only if $y = g(x)$ is one-to-one. In (a), $g(x) = 2^x$ is one-to-one, and so $H(X) = H(Y)$. In (b), $g(x) = \cos(x)$ is not necessarily one-to-one, so in general $H(X) \ge H(Y)$, with equality if cosine is one-to-one on the range of $X$ (for example, if all values of $X$ lie in $[0, \pi]$).
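A small numerical illustration of $H(g(X)) \le H(X)$ follows. It is only a sketch: the pmf `pmf_x` is a made-up example, the helper names are hypothetical, and the cosine values are rounded so that $\cos(-1)$ and $\cos(1)$ are guaranteed to land on the same key.

```python
import math
from collections import defaultdict


def entropy(pmf: dict) -> float:
    """Entropy in bits of a pmf given as {value: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def pushforward(pmf: dict, g) -> dict:
    """pmf of Y = g(X): p(y) is the sum of p(x) over all x with g(x) = y."""
    out = defaultdict(float)
    for x, p in pmf.items():
        out[g(x)] += p
    return dict(out)


# Hypothetical example distribution on {-1, 0, 1}.
pmf_x = {-1: 0.25, 0: 0.5, 1: 0.25}

h_x = entropy(pmf_x)
h_pow = entropy(pushforward(pmf_x, lambda x: 2 ** x))                   # one-to-one map
h_cos = entropy(pushforward(pmf_x, lambda x: round(math.cos(x), 12)))  # merges -1 and 1

print(h_x, h_pow, h_cos)
```

Here the one-to-one map $2^x$ leaves the entropy unchanged, while cosine merges $x = -1$ and $x = 1$ and so lowers it.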
2.5 Assume that there exists an $x$, say $x_0$, and two different values of $y$, say $y_1$ and $y_2$, such that $p(x_0, y_1)$ and $p(x_0, y_2)$ are both positive. Then $p(x_0) \ge p(x_0, y_1) + p(x_0, y_2) > 0$, and $p(y_1 \mid x_0)$ and $p(y_2 \mid x_0)$ are not equal to 0 or 1. Thus,
$$H(Y \mid X) = -\sum_{x} p(x) \sum_{y} p(y \mid x) \log p(y \mid x) \ge p(x_0) \left[ -p(y_1 \mid x_0) \log p(y_1 \mid x_0) - p(y_2 \mid x_0) \log p(y_2 \mid x_0) \right] > 0,$$
since $-t \log t \ge 0$ for $t \in [0, 1]$, with strict inequality if $t \notin \{0, 1\}$. Therefore, $H(Y \mid X) = 0 \Rightarrow Y$ is a function of $X$.
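The argument can be illustrated with two small joint distributions, one where Y is a function of X and one where it is not. This is a sketch only: the joint pmfs and the helper name `conditional_entropy` are hypothetical examples, not taken from the problem.

```python
import math
from collections import defaultdict


def conditional_entropy(joint: dict) -> float:
    """H(Y|X) in bits from a joint pmf given as {(x, y): probability}."""
    p_x = defaultdict(float)
    for (x, _), p in joint.items():
        p_x[x] += p
    h = 0.0
    for (x, _), p in joint.items():
        if p > 0:
            h -= p * math.log2(p / p_x[x])  # -p(x, y) * log2 p(y | x)
    return h


# Y is a function of X: each x carries exactly one y with positive probability.
deterministic = {(0, "a"): 0.5, (1, "b"): 0.25, (2, "a"): 0.25}

# Y is not a function of X: x = 0 is paired with two different values of y.
noisy = {(0, "a"): 0.25, (0, "b"): 0.25, (1, "b"): 0.5}

h_det = conditional_entropy(deterministic)
h_noisy = conditional_entropy(noisy)
print(h_det, h_noisy)
```

The first value should be zero and the second strictly positive, since the second joint pmf has an $x_0$ with two possible values of $y$.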
By the chain rule for mutual information,
$$I(X_1; X_2, \ldots, X_n) = I(X_1; X_2) + I(X_1; X_3 \mid X_2) + \cdots + I(X_1; X_n \mid X_2, \ldots, X_{n-1}).$$
But since the past and the future are conditionally independent given the present by the Markov property, all terms except the first one on the right-hand side are zero, and so
$$I(X_1; X_2, \ldots, X_n) = I(X_1; X_2).$$
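For a concrete check, the identity can be verified on a short binary Markov chain. The sketch below assumes a hypothetical three-step chain with made-up initial distribution `init` and transition matrix `trans`; it compares $I(X_1; X_2)$ with $I(X_1; X_2, X_3)$ computed directly from the joint pmf.

```python
import math
from collections import defaultdict
from itertools import product


def mutual_information(joint: dict) -> float:
    """I(A;B) in bits from a joint pmf given as {(a, b): probability}."""
    p_a, p_b = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        p_a[a] += p
        p_b[b] += p
    return sum(
        p * math.log2(p / (p_a[a] * p_b[b]))
        for (a, b), p in joint.items()
        if p > 0
    )


# Hypothetical binary Markov chain X1 -> X2 -> X3.
init = [0.3, 0.7]                 # p(x1)
trans = [[0.9, 0.1], [0.2, 0.8]]  # trans[a][b] = p(next = b | current = a)

# Markov factorization: p(x1, x2, x3) = p(x1) p(x2 | x1) p(x3 | x2).
joint3 = {
    (x1, x2, x3): init[x1] * trans[x1][x2] * trans[x2][x3]
    for x1, x2, x3 in product(range(2), repeat=3)
}

pair = defaultdict(float)  # joint pmf of (X1, X2)
grouped = {}               # joint pmf of (X1, (X2, X3))
for (x1, x2, x3), p in joint3.items():
    pair[(x1, x2)] += p
    grouped[(x1, (x2, x3))] = p

i_12 = mutual_information(pair)
i_1_23 = mutual_information(grouped)
print(i_12, i_1_23)
```

The two values should coincide up to floating-point error, because $I(X_1; X_3 \mid X_2) = 0$ for a Markov chain.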
Since $R$ is a function of the sequence $X = (X_1, \ldots, X_n)$, clearly $H(X) \ge H(R)$. However, any $X_i$ (and in particular, $X_n$) together with the run length sequence uniquely determines $X$, and so
$$H(X) = H(X_n, R) = H(R) + H(X_n \mid R) \le H(R) + H(X_n) \le H(R) + 1.$$
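The bounds $H(R) \le H(X) \le H(R) + 1$ can be checked by brute force for short sequences. This sketch assumes i.i.d. Bernoulli bits with a made-up parameter `p_one` and length `n`; those choices and the helper names are illustrative and are not specified in the solution.

```python
import math
from collections import defaultdict
from itertools import groupby, product


def entropy(pmf: dict) -> float:
    """Entropy in bits of a pmf given as {value: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def run_lengths(bits: tuple) -> tuple:
    """Lengths of maximal runs of equal symbols, e.g. (0, 0, 1, 0) -> (2, 1, 1)."""
    return tuple(len(list(group)) for _, group in groupby(bits))


# Assumed model: n i.i.d. bits with P(bit = 1) = p_one.
n, p_one = 8, 0.3

pmf_x = {}
pmf_r = defaultdict(float)
for bits in product((0, 1), repeat=n):
    ones = sum(bits)
    prob = p_one ** ones * (1 - p_one) ** (n - ones)
    pmf_x[bits] = prob
    pmf_r[run_lengths(bits)] += prob  # complementary sequences share their run lengths

h_x = entropy(pmf_x)
h_r = entropy(pmf_r)
print(h_r, h_x, h_r + 1)
```

The three printed values should appear in nondecreasing order. The gap $H(X) - H(R)$ equals $H(X_n \mid R)$, which is at most one bit for a binary sequence.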
$$P(p(X) \le d) \log(1/d) = \sum_{x:\, p(x) \le d} p(x) \log(1/d) \le \sum_{x:\, p(x) \le d} p(x) \log(1/p(x)) \le \sum_{x} p(x) \log(1/p(x)) = H(X).$$
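As a sanity check, the inequality can be evaluated for a concrete pmf and several thresholds $d$. The pmf, the thresholds, and the function names below are hypothetical choices made only for illustration.

```python
import math


def entropy(pmf: list) -> float:
    """Entropy in bits of a pmf given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def markov_lhs(pmf: list, d: float) -> float:
    """Left-hand side P(p(X) <= d) * log2(1/d) of the inequality."""
    mass = sum(p for p in pmf if p <= d)  # total probability of outcomes with p(x) <= d
    return mass * math.log2(1 / d)


# Hypothetical example distribution and thresholds.
pmf = [0.5, 0.25, 0.125, 0.0625, 0.0625]
thresholds = (0.05, 0.0625, 0.125, 0.25, 0.5, 0.9)

h = entropy(pmf)
for d in thresholds:
    print(d, markov_lhs(pmf, d), h)
```

For every threshold the middle value should not exceed the entropy printed in the last column.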
Let $P_1$ be the distribution $(p_1, p_2, \ldots, p_m)$, and $P_2 = \left(\frac{p_1 + p_2}{2}, \frac{p_1 + p_2}{2}, p_3, \ldots, p_m\right)$ (where, without loss of generality, we have taken $i = 1$, $j = 2$).
Then,
$$\ldots \left(\frac{p_1 + p_2}{2}\right) \log \frac{p_1 + p_2}{2}$$