Lecture Notes: An Introduction to the Theory of
Statistical Communication
Germain Drolet
Department of Electrical & Computer Engineering,
Royal Military College of Canada,
P.O. Box 17000, Station Forces,
Kingston, Ontario, CANADA
K7K 7B4
Tel: (613) 541-6000, extension: 6192
Fax: (613) 544-8107


Copyright © 2006 by G. Drolet. All rights reserved. Permission is granted to make and distribute verbatim copies of these notes provided the copyright notice and this permission notice are preserved on all copies.

Preface

Chapter 5: presents the foundation of Coding Theory. The material is taken from Dr. Séguin’s lecture notes for a course taught in 1984 at the Université Laval, Ste-Foy, Québec, QC, CANADA. Some of the results are contained in W&J Chapter 5. This chapter is self-contained and no references are made to the textbook by W&J.

Suggestions and comments to improve the notes are welcome.

August 2006

Germain Drolet
Department of Electrical & Computer Engineering,
Royal Military College of Canada,
P.O. Box 17000, Station Forces,
Kingston, Ontario, CANADA
K7K 7B4

Tel: (613) 541-6000, extension: 6192
Fax: (613) 544-8107
Email: [email protected]

Contents

B.3 Effect of sampling frequency
B.4 Continuous time signals

Chapter 2. Probability Theory

2.1 Fundamental Definitions

Refer to Wozencraft & Jacobs pp 13 - 20 for the details. The following definitions are intuitive and serve as a basis for the formal axiomatic definition of a probability system that will follow.

Definition 2.1.1. A random experiment is a procedure producing an outcome that cannot be predicted accurately and with certainty at the same time. We have:

  1. outcome: produced by a random experiment,
  2. set of all outcomes: which may be infinite,
  3. result: set of some (or all or none) of the outcomes,
  4. a sequence of length N is a list of N outcomes obtained by repeating a random experiment N times,
  5. $N(\text{seq}, A)$ is the number of occurrences of result A in sequence seq,
  6. $f_N(\text{seq}, A) = \frac{N(\text{seq}, A)}{N}$ is the relative frequency of occurrence of result A in sequence seq of length N,
  7. relative frequency of occurrence of a result A, denoted by $f_\infty(A)$, is defined by

$$f_\infty(A) \triangleq \lim_{N \to \infty} f_N(\text{seq}, A) = \lim_{N \to \infty} \frac{N(\text{seq}, A)}{N}$$

Remarks.

  1. We need not specify the sequence in f∞(A) since, as N → ∞, all sequences converge to the same ratio; this is called statistical regularity and will be discussed in more detail in section 2.11.1.
  2. Notice that Dom(f∞) = {all results} ≠ {all outcomes} (W&J, page 14).
  3. As shown in W&J, pp 14 - 15, f∞(A ∪ B) = f∞(A) + f∞(B) when A, B are mutually exclusive results.

Example 2.1.1.

random experiment: tossing of a die,

set of all outcomes: { 1 , 2 , 3 , 4 , 5 , 6 },

result: { 1 } and { 3 , 6 } ≡ {multiples of 3} are two examples of results,


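relative frequency of occurrence:

$$f_\infty(\{1\}) = \lim_{N \to \infty} \frac{N(\text{seq}, \{1\})}{N}$$

$$\begin{aligned}
f_\infty(\{1, 3\}) &= \lim_{N \to \infty} \frac{N(\text{seq}, \{1, 3\})}{N} \\
&= \lim_{N \to \infty} \frac{N(\text{seq}, \{1\}) + N(\text{seq}, \{3\})}{N} \\
&= \lim_{N \to \infty} \frac{N(\text{seq}, \{1\})}{N} + \lim_{N \to \infty} \frac{N(\text{seq}, \{3\})}{N} \\
&= f_\infty(\{1\}) + f_\infty(\{3\}) = 1/3
\end{aligned}$$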
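The relative frequencies of Example 2.1.1 can be approximated by simulation. The following is only a sketch: the helper names `toss_die`, `count` and `relative_frequency` are ours, the die is assumed fair, and a finite N only approximates the limit N → ∞.

```python
import random

def toss_die(n, seed=0):
    """Simulate a sequence of n outcomes of a six-sided die (assumed fair)."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(n)]

def count(seq, result):
    """N(seq, A): number of outcomes in seq that belong to the result A."""
    return sum(1 for outcome in seq if outcome in result)

def relative_frequency(seq, result):
    """f_N(seq, A) = N(seq, A) / N."""
    return count(seq, result) / len(seq)

seq = toss_die(100_000)
f1 = relative_frequency(seq, {1})
f3 = relative_frequency(seq, {3})
f13 = relative_frequency(seq, {1, 3})

# {1} and {3} are mutually exclusive results, so the counts add exactly,
# and f13 should be close to 1/3 for a long sequence.
print(f1, f3, f13)
```

The additivity of the counts holds for every sequence, whereas the closeness of `f13` to 1/3 relies on the statistical regularity mentioned in the remarks.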

The above definitions are adequate to describe the physical concept but too loose and imprecise to describe a mathematical concept. We next give the formal axiomatic definition of a probability system. Under certain physical conditions a random experiment can be modeled as a probability system and obeys the same laws. After giving the definition and its immediate consequences, we illustrate the relationship between “probability system” and “random experiment”. This abstract definition should be well understood; this will make it easier to grasp the concepts of random variables and random processes later.

Definition 2.1.2. (Axiomatic definition) A probability system consists of a non-empty collection of objects Ω, a non-empty collection F of subsets of Ω and a function P : F → R satisfying the following (axioms):

  1. $\Omega \in F$.
  2. $A \in F \Rightarrow \bar{A} \in F$.
  3. $A_1, A_2, \ldots, A_n, A_{n+1}, \ldots \in F \Rightarrow A_1 \cup A_2 \cup A_3 \cup \ldots \in F$, i.e. F is closed under countable unions.
  4. $P(\Omega) = 1$.
  5. $P(A) \geq 0, \ \forall A \in F$.
  6. $A_1, A_2, \ldots \in F$ and $A_i \cap A_j = \emptyset, \ \forall j \neq i$ (pairwise disjoint) $\Rightarrow P(A_1 \cup A_2 \cup A_3 \cup \ldots) = P(A_1) + P(A_2) + \ldots$

Ω is called the sample space and its elements are called sample points. The elements of F are called events (F is called the class of events). A probability system (also called a probability space) will be denoted by (Ω, F, P : F → [0, 1] ⊂ R) or simply (Ω, F, P).

Remarks.

  1. The definitions given by Wozencraft & Jacobs are limited to the case $|F| \neq \infty$ (W&J, page 20).


The correspondence between “probability system” and “random experiment” is summarized in the following table:

| Probability System | Random Experiment |
|---|---|
| Sample space | Set of all outcomes |
| Sample point | Outcome |
| Event | Result |
| Probability measure | Relative frequency of occurrence |

Remarks.

  1. The axiomatic definition has the following advantages:

     (a) It is very precise and formal and uses objects/concepts suitable to the development of a strong theory.
     (b) It does not rely on the existence of a physical random experiment.^3 We can construct many examples of probability systems more freely.

  2. The axiomatic definition has the disadvantage of being suitable to the modeling of a random experiment only under very specific controlled conditions. We have to be careful when applying a specific law of probability theory in the context of a real world random experiment.

We will often refer to simple real world random experiments for example/illustration purposes. Conversely, for every random experiment we can construct a corresponding probability system to be used as an idealized mathematical model. This is done as follows:

  1. $\Omega \triangleq$ set of all possible outcomes,
  2. $F \triangleq$ set of all results, the probability of which may have to be calculated, plus any other subset of Ω required to satisfy the axioms of a class of events (if Ω is finite it is most convenient to take all the subsets of Ω),
  3. $P(A) \triangleq f_\infty(A), \ \forall A \in F$.
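The three steps above can be sketched for the die of Example 2.1.1. This is an illustration under our own assumptions: the die is taken to be fair (so each sample point gets probability 1/6 in place of a measured f∞), and the helper names `power_set`, `make_measure` and `satisfies_axioms` are ours. Since Ω is finite, F is taken to be the set of all subsets of Ω, and closure under pairwise unions is enough to check axiom 3.

```python
from fractions import Fraction
from itertools import chain, combinations

def power_set(omega):
    """All subsets of a finite sample space, as frozensets (the class of events F)."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}

def make_measure(point_probs):
    """Return P with P(A) = sum of the probabilities of the sample points in A."""
    def P(event):
        return sum((point_probs[w] for w in event), Fraction(0))
    return P

def satisfies_axioms(omega, F, P):
    """Check the six axioms of Definition 2.1.2 for a finite probability system."""
    closed_complement = all(omega - A in F for A in F)
    closed_union = all(A | B in F for A in F for B in F)
    non_negative = all(P(A) >= 0 for A in F)
    additive = all(
        P(A | B) == P(A) + P(B) for A in F for B in F if not (A & B)
    )
    return (omega in F and closed_complement and closed_union
            and P(omega) == 1 and non_negative and additive)

omega = frozenset(range(1, 7))                        # step 1: all outcomes
F = power_set(omega)                                  # step 2: all subsets
P = make_measure({w: Fraction(1, 6) for w in omega})  # step 3: assumed fair die

print(len(F), P(frozenset({3, 6})), satisfies_axioms(omega, F, P))
```

Exact fractions are used so that the additivity check is an equality test rather than a floating point comparison.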

Relation of the Model to the Real World: Wozencraft & Jacobs pages 24 - 29. This is a difficult paragraph and may be viewed as comprising two interconnected parts:

  1. Of primary importance are the definitions of compound experiment and binomial distribution (this is used in the problems) from the bottom of page W&J 25 after equation (2.15c) up to the paragraph starting with “Our primary interest... ” on page W&J 27.
  2. Of secondary importance (remainder of this paragraph from page W&J 27 to the bottom of page W&J 29) W&J illustrates that in a long sequence of M (independent) outcomes of a random experiment, the fraction of these outcomes that lie in an event A of probability p = P(A) is very likely close to p, or in other words f∞(A) = P(A). This will be explained again with the Weak Law of Large Numbers in §2.11.1 and may be skipped for now.

^3 Refer to the examples in “Real line sample space”, W&J, page 21, for a purely mathematical construction.

Definition 2.1.3. A compound experiment is a random experiment which consists of a sequence of M independent (defined later) trials of a simpler experiment.

If A is an event of the simpler experiment with $P(A) = p \neq 0$, $P(\bar{A}) = 1 - p$, then the following set is a sample space of the compound experiment (M trials):

$$\Omega_M = \{(A, A, \ldots, A, A),\ (A, A, \ldots, A, \bar{A}),\ (A, A, \ldots, \bar{A}, A),\ (A, A, \ldots, \bar{A}, \bar{A}),\ \ldots,\ (\bar{A}, \bar{A}, \ldots, \bar{A})\}$$

Let $F_M$ be the set of all subsets of $\Omega_M$, i.e. $F_M = 2^{\Omega_M}$. We also define the mapping:

$$P_M : F_M \to [0, 1], \qquad P_M : \{x\} \mapsto p^{\#\text{ of } A \text{ in } x} (1 - p)^{\#\text{ of } \bar{A} \text{ in } x}$$

for any sequence $x \in \Omega_M$, and this together with the axioms of a probability measure completely determines $P_M$ for every event $E \in F_M$:

$$P_M(E) = \sum_{x \in E} p^{\#\text{ of } A \text{ in } x} (1 - p)^{\#\text{ of } \bar{A} \text{ in } x}$$
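For a small M the sample space of the compound experiment can be enumerated and the definition of P_M checked directly. The sketch below uses values of our own choosing (M = 4, p = 1/3) and represents an occurrence of A by `True` and of its complement by `False`; the function names are ours.

```python
from fractions import Fraction
from itertools import product

def compound_sample_space(M):
    """All 2**M sequences of length M; True stands for A, False for its complement."""
    return list(product([True, False], repeat=M))

def P_M(event, p):
    """Sum over x in the event of p^(# of A in x) * (1 - p)^(# of complements in x)."""
    return sum(
        (p ** sum(x) * (1 - p) ** (len(x) - sum(x)) for x in event),
        Fraction(0),
    )

M, p = 4, Fraction(1, 3)          # illustrative values, not from the notes
omega_M = compound_sample_space(M)

total = P_M(omega_M, p)           # probability of the whole sample space
two_A = P_M([x for x in omega_M if sum(x) == 2], p)  # exactly two occurrences of A
print(len(omega_M), total, two_A)
```

Here `sum(x)` counts the occurrences of A in the sequence x because Python treats `True` as 1.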

One can easily verify that $(\Omega_M, F_M, P_M)$ is a probability system, i.e. $F_M$ is a valid class of events and $P_M$ is a valid probability measure. The $M + 1$ events defined below are of special interest:

$$A_i = \{x \in \Omega_M : x \text{ has } i \text{ occurrences of } A \text{ and } M - i \text{ occurrences of } \bar{A}\} \subset \Omega_M, \qquad A_i \in F_M,$$

for $i = 0, 1, \ldots, M$. It is then seen (W&J, bottom of page 26) that

$$P_M(A_i) = \binom{M}{i} p^i (1 - p)^{M - i}, \qquad i = 0, 1, \ldots, M,$$

where we recall that $\binom{M}{i} = \frac{M!}{i!\,(M - i)!}$. This is called the binomial distribution (other distributions will be defined later). We see that

$$A_i \cap A_j = \emptyset \ \text{ whenever } i \neq j, \qquad \bigcup_{i=0}^{M} A_i = \Omega_M$$

so we expect [from axioms (4) and (6)] $\sum_{i=0}^{M} P_M(A_i) = P_M(\Omega_M) = 1$. This is verified by the binomial theorem (W&J, top of page 27).

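Example 2.1.2. The probability of obtaining five heads when throwing a coin 15 times is

$$\binom{15}{5} \left(\frac{1}{2}\right)^{5} \left(\frac{1}{2}\right)^{10} = \frac{3003}{32768} \approx 0.0916$$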
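The binomial distribution is easy to evaluate numerically. The following sketch (the function name `binomial_pmf` is ours, and the coin is assumed fair) computes the probability of Example 2.1.2 and checks that the M + 1 probabilities sum to 1.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(i, M, p):
    """P_M(A_i) = C(M, i) * p^i * (1 - p)^(M - i)."""
    return comb(M, i) * p ** i * (1 - p) ** (M - i)

p = Fraction(1, 2)  # assumed fair coin
five_heads = binomial_pmf(5, 15, p)
total = sum(binomial_pmf(i, 15, p) for i in range(16))

print(five_heads, float(five_heads))  # 3003/32768, about 0.0916
print(total)                          # 1
```

Passing a `Fraction` for p keeps the result exact; a `float` works as well if an approximate value is sufficient.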


Theorem 2.

  1. If $A_1, A_2, \ldots \in F$ and $A_i \cap A_j \cap B = \emptyset, \ \forall j, \forall i \neq j$, then:

$$P(A_1 \cup A_2 \cup \ldots \mid B) = \sum_{i=1}^{\infty} P(A_i \mid B).$$

     If moreover $A_1 \cup A_2 \cup \ldots \supset B$ then $\sum_{i=1}^{\infty} P(A_i \mid B) = 1$; this is known as the theorem of total probability.^4

  2. Let $A_1, A_2, \ldots, A_n \in F$ be pairwise disjoint and $P(A_j) \neq 0, \ \forall j$. If $B \in F$ satisfies $P(B) \neq 0$ and $B \subset \cup_{j=1}^{n} A_j$, then

     (a) $P(B) = \sum_{j=1}^{n} P(A_j) P(B \mid A_j)$.
     (b) $P(A_i \mid B) = \dfrac{P(A_i) P(B \mid A_i)}{\sum_{j=1}^{n} P(A_j) P(B \mid A_j)}$, for any $i = 1, 2, \ldots, n$; this is known as Bayes theorem.^5

Proof. We prove the second part of the theorem only; the proof of the first part is left as an exercise.

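  (a) $B \subset \cup_{j=1}^{n} A_j \Rightarrow B = B \cap \left(\cup_{j=1}^{n} A_j\right) = \cup_{j=1}^{n} (B \cap A_j)$ and all the $B \cap A_j$ are pairwise disjoint. It follows that $P(B) = \sum_{j=1}^{n} P(B \cap A_j)$, from which the result follows since $P(B \cap A_j) = P(A_j) P(B \mid A_j)$.

  (b) $P(A_i \mid B) = \dfrac{P(A_i) P(B \mid A_i)}{P(B)}$ from equation (2.21) in W&J. The desired result follows from part 2(a) of the theorem.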

Bayes theorem is useful in situations where P (B|Aj ) is given for every j but P (Aj |B) is not directly known. The following example illustrates this.

Example 2.1.3. (Refer to figure 2.1) We are given three boxes A 1 , A 2 , A 3 containing coloured balls as follows:

A_1: 2 red balls and 3 black balls,
A_2: 3 red balls and 5 black balls,
A_3: 4 red balls and 4 black balls.

The (random) experiment consists in first choosing a box at random among the three boxes (equiprobably, i.e. each has a probability 1/3 of being chosen) and then drawing a ball at random from the chosen box. Let B denote the event “a red ball has been drawn”. Calculate the probability that the ball was drawn from box number 2 given that it is red, i.e. $P(A_2 \mid B)$.

^4 cf: Wozencraft & Jacobs page 31
^5 cf: Wozencraft & Jacobs problem 2.


Solution: From the data given in the problem we have that $P(A_i) = 1/3$, $i = 1, 2, 3$, and clearly $B \subset \cup_{j=1}^{3} A_j$. It follows from the theorem that:

$$P(A_2 \mid B) = \frac{P(A_2) P(B \mid A_2)}{\sum_{j=1}^{3} P(A_j) P(B \mid A_j)} = \frac{(1/3)(3/8)}{(1/3)\,[2/5 + 3/8 + 4/8]} = \frac{5}{17}$$

Figure 2.1:
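The calculation of Example 2.1.3 can be reproduced with a few lines of code. This is a minimal sketch of Bayes theorem for finitely many pairwise disjoint events; the function name `posterior` and the dictionary layout are our own choices.

```python
from fractions import Fraction

def posterior(priors, likelihoods, i):
    """Bayes theorem: P(A_i | B) from the P(A_j) and the P(B | A_j)."""
    # P(B) by part 2(a) of the theorem
    evidence = sum(priors[j] * likelihoods[j] for j in priors)
    return priors[i] * likelihoods[i] / evidence

# Box j is chosen with probability 1/3; P(red | box j) is read off the box contents.
priors = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
red_given_box = {1: Fraction(2, 5), 2: Fraction(3, 8), 3: Fraction(4, 8)}

print(posterior(priors, red_given_box, 2))  # 5/17
```

The three posteriors P(A_1|B), P(A_2|B), P(A_3|B) sum to 1, as expected from the theorem of total probability.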

Example 2.1.4. A binary source transmits at random one of two messages $m_0$ or $m_1$, with probabilities $P(m_0) = 1/3$, $P(m_1) = 2/3$. The message is fed through a random (noisy) channel of input and output respectively denoted as TX and RX, and with the following transition probabilities:

P(RX = 0 | TX = m_0) = 0.99
P(RX = 1 | TX = m_0) = 0.01
P(RX = 0 | TX = m_1) = 0.01
P(RX = 1 | TX = m_1) = 0.99

Calculate

  1. P(error | RX = 0) = P(TX = m_1 | RX = 0),
  2. P(error | RX = 1) = P(TX = m_0 | RX = 1).
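One way to evaluate the two quantities of Example 2.1.4 is to apply Bayes theorem directly to the data of the example. The sketch below is ours (the names `prior`, `transition` and `posterior_tx` do not appear in the notes) and uses exact fractions for 0.99 and 0.01.

```python
from fractions import Fraction

prior = {"m0": Fraction(1, 3), "m1": Fraction(2, 3)}

# transition[(rx, tx)] = P(RX = rx | TX = tx)
transition = {
    (0, "m0"): Fraction(99, 100), (1, "m0"): Fraction(1, 100),
    (0, "m1"): Fraction(1, 100),  (1, "m1"): Fraction(99, 100),
}

def posterior_tx(tx, rx):
    """P(TX = tx | RX = rx) by Bayes theorem."""
    evidence = sum(prior[m] * transition[(rx, m)] for m in prior)  # P(RX = rx)
    return prior[tx] * transition[(rx, tx)] / evidence

err_given_0 = posterior_tx("m1", 0)  # P(error | RX = 0)
err_given_1 = posterior_tx("m0", 1)  # P(error | RX = 1)
print(err_given_0, err_given_1)      # 2/101 1/199
```

The two error probabilities differ even though the channel is symmetric, because the a priori message probabilities are not equal.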


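Solution: we first notice that $A = (A \cap B) \cup (A \cap \bar{B})$, a union of two disjoint events, so that $P(A) = P(A \cap B) + P(A \cap \bar{B})$. Next, using $P(A \cap B) = P(A) P(B)$,

$$\begin{aligned}
P(A \cap \bar{B}) &= P(A) - P(A \cap B) \\
&= P(A) - P(A) P(B) \\
&= P(A)(1 - P(B)) \\
&= P(A) P(\bar{B})
\end{aligned}$$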
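The identity derived above can be checked on a small case. The events below are an illustration of our own (a fair die with A = "even outcome" and B = {1, 2}), chosen so that P(A ∩ B) = P(A)P(B) holds.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # fair die, all outcomes equally likely (assumption)

def P(event):
    """Probability of an event when the six sample points are equiprobable."""
    return Fraction(len(event), len(omega))

A = frozenset({2, 4, 6})
B = frozenset({1, 2})
B_bar = omega - B

lhs = P(A & B_bar)       # P(A ∩ complement of B)
rhs = P(A) * P(B_bar)    # P(A) P(complement of B)
print(P(A & B) == P(A) * P(B), lhs, rhs)
```

Both sides evaluate to 1/3 here; the derivation shows that this agreement is not a coincidence of the chosen events.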

2.2 Communication problem

Refer to Wozencraft & Jacobs, pages 33 to 37. In this example, a probability system is constructed by combining a random experiment (message source), a random transformation (digital communication channel) and a deterministic transformation (decision element).

  1. Source: We define:

     Sample space: $\Omega_{source} = \{m_0, m_1\}$
     Class of events: $2^{\Omega_{source}} = \{\emptyset, \{m_0\}, \{m_1\}, \{m_0, m_1\}\}$
     Probability function:

$$P_S(\emptyset) = 0, \quad P_S(\{m_0\}) = P_{m_0}, \quad P_S(\{m_1\}) = P_{m_1}, \quad P_S(\Omega_{source}) = P_S(\{m_0, m_1\}) = P_{m_0} + P_{m_1} = 1.$$

     $P_{m_0}, P_{m_1}$ are called a priori message probabilities. $(\Omega_{source}, 2^{\Omega_{source}}, P_S)$ satisfies all the axioms of a probability system.

  2. Discrete communication channel: is a transformation with unpredictable (random) characteristics. It is determined by its transition probabilities $P[r_j \mid m_i] \geq 0$ as in figure 2.10 (W&J). To be valid it must satisfy:

$$\sum_{\text{all } j} P[r_j \mid m_i] = 1$$

     for all i. Combining the source and the discrete communication channel we define:

     Sample space: $\Omega_{DCC} = \{(m_0, r_0), (m_0, r_1), (m_0, r_2), (m_1, r_0), (m_1, r_1), (m_1, r_2)\}$

     Class of events: $2^{\Omega_{DCC}} = \{\emptyset, \{(m_0, r_0)\}, \{(m_0, r_1)\}, \ldots, \Omega_{DCC}\}$


     Probability function:

$$P_{DCC}(\{(m_i, r_j)\}) = P_S(\{m_i\}) \, P[r_j \mid m_i],$$

     for every i and every j. One easily verifies that (you should verify this):

$$P_{DCC}(\Omega_{DCC}) = \sum_{\text{all } i} \sum_{\text{all } j} P_{DCC}(\{(m_i, r_j)\}) = 1.$$

     $(\Omega_{DCC}, 2^{\Omega_{DCC}}, P_{DCC})$ forms a probability system.
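The construction of P_DCC from the source and the channel can be sketched as follows. The a priori probabilities and transition probabilities below are hypothetical values chosen only for illustration (they are not taken from W&J or from these notes), and the names `prior`, `channel` and `joint` are ours.

```python
from fractions import Fraction

# Hypothetical a priori message probabilities P_S({m_i})
prior = {"m0": Fraction(1, 3), "m1": Fraction(2, 3)}

# Hypothetical transition probabilities P[r_j | m_i]; each row must sum to 1
channel = {
    "m0": {"r0": Fraction(7, 10), "r1": Fraction(2, 10), "r2": Fraction(1, 10)},
    "m1": {"r0": Fraction(1, 10), "r1": Fraction(3, 10), "r2": Fraction(6, 10)},
}

def joint(prior, channel):
    """P_DCC({(m_i, r_j)}) = P_S({m_i}) * P[r_j | m_i] for every sample point."""
    return {(m, r): prior[m] * channel[m][r] for m in prior for r in channel[m]}

P_DCC = joint(prior, channel)
rows_valid = all(sum(row.values()) == 1 for row in channel.values())
print(rows_valid, len(P_DCC), sum(P_DCC.values()))
```

The sum over all six sample points equals 1 precisely because each row of the channel sums to 1 and the a priori probabilities sum to 1.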

  3. The decision element is a deterministic (non-random) mapping:

$$\hat{m} : \{r_0, r_1, r_2\} \to \{m_0, m_1\}, \qquad \hat{m} : r_0 \mapsto m_0, \quad \hat{m} : r_1 \mapsto m_1, \quad \hat{m} : r_2 \mapsto m_1$$

     This is just one example; there are 8 possible mappings!

     This mapping $\hat{m}$ can be used together with the probability system $(\Omega_{DCC}, 2^{\Omega_{DCC}}, P_{DCC})$ to define the probability system $(\Omega_D, 2^{\Omega_D}, P_D)$ described below:

     Sample space: $\Omega_D = \{(m_0, m_0), (m_0, m_1), (m_1, m_0), (m_1, m_1)\}$ where the first component of each pair denotes the message transmitted and the second component denotes the decision made.
     Events: $2^{\Omega_D} = \{\emptyset, \{(m_0, m_0)\}, \{(m_0, m_1)\}, \ldots, \Omega_D\}$
     Probability function: completely specified by the probability function $P_{DCC}(\,)$, the decision mapping $\hat{m}(\,)$, the axioms and the following:

$$\begin{aligned}
P_D(\{(m_0, m_0)\}) &= P_{DCC}(\{(m_0, r_0)\}) \\
P_D(\{(m_0, m_1)\}) &= P_{DCC}(\{(m_0, r_1), (m_0, r_2)\}) \\
P_D(\{(m_1, m_0)\}) &= P_{DCC}(\{(m_1, r_0)\}) \\
P_D(\{(m_1, m_1)\}) &= P_{DCC}(\{(m_1, r_1), (m_1, r_2)\})
\end{aligned}$$

     (this corresponds to the above example of mapping $\hat{m}(\,)$).

The probability system $(\Omega_D, 2^{\Omega_D}, P_D)$ describes the overall operation of the discrete communication channel. Its performance is measured by its probability of correct decision $P(C)$ (page 35, W&J). We call $C = \{(m_0, m_0), (m_1, m_1)\} \subset \Omega_D$ the correct decision event. For the above example of mapping $\hat{m}(\,)$, the correct decision event $C \subset \Omega_D$ corresponds to $\tilde{C} \subset \Omega_{DCC}$ given by:

$$\tilde{C} = \{(m_0, r_0), (m_1, r_1), (m_1, r_2)\}$$

In general, $\tilde{C}$ is given by:

$$\tilde{C} = \{(m_i, r_j) : \hat{m}(r_j) = m_i,\ i \in \{0, \ldots, M - 1\},\ j \in \{0, \ldots, J - 1\}\} = \{(\hat{m}(r_j), r_j) : j \in \{0, \ldots, J - 1\}\}$$
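The probability of correct decision can be evaluated for each of the 8 possible decision mappings by summing P_DCC over the corresponding set C̃. The sketch below reuses hypothetical a priori and transition probabilities of our own choosing (they are not from W&J), and the names `prob_correct`, `all_mappings` and `example` are ours.

```python
from fractions import Fraction
from itertools import product

messages = ["m0", "m1"]
outputs = ["r0", "r1", "r2"]

# Hypothetical a priori and transition probabilities, for illustration only
prior = {"m0": Fraction(1, 3), "m1": Fraction(2, 3)}
channel = {
    "m0": {"r0": Fraction(7, 10), "r1": Fraction(2, 10), "r2": Fraction(1, 10)},
    "m1": {"r0": Fraction(1, 10), "r1": Fraction(3, 10), "r2": Fraction(6, 10)},
}

def prob_correct(decision):
    """P(C): sum over j of P_DCC({(m_hat(r_j), r_j)})."""
    return sum(prior[decision[r]] * channel[decision[r]][r] for r in outputs)

# Every mapping from {r0, r1, r2} to {m0, m1}: 2**3 = 8 of them
all_mappings = [
    dict(zip(outputs, choice)) for choice in product(messages, repeat=len(outputs))
]

example = {"r0": "m0", "r1": "m1", "r2": "m1"}  # the mapping used in the notes
best = max(all_mappings, key=prob_correct)
print(len(all_mappings), prob_correct(example), best)
```

With these particular numbers the example mapping of the notes happens to maximize P(C); with other transition probabilities a different mapping may be the best one.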