Probability Density Functions - Instrumentation, Measurements, Statistics - Lecture Notes, Study notes of Electronic Measurement and Instrumentation

The complete lecture series for the Instrumentation, Measurements, Statistics course is available at Docsity. It's free to download for everyone. This lecture covers the following keywords: Probability Density Functions, Histogram, PDF Curve, Continuous System, Standard Deviation, Population Standard Deviation

Typology: Study notes

2012/2013

Uploaded on 10/02/2013

sonu-kap
sonu-kap 🇮🇳

Probability Density Functions


  • Probability density function – In simple terms, a probability density function (PDF) is constructed by drawing a smooth curve fit through the vertically normalized histogram, as sketched. You can think of a PDF as the smooth limit of a vertically normalized histogram if there were millions of measurements and a huge number of bins.
    o The main difference between a histogram and a PDF is that a histogram involves discrete data (individual bins or classes), whereas a PDF involves continuous data (a smooth curve).

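[Sketch: vertically normalized histogram with a smooth PDF curve drawn through it; vertical axis f(x), horizontal axis x with values x1, x2, x3, ... marked.]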

    o Mathematically, f(x) is defined as

      $$f(x_i) = \frac{P\left(x_i - \frac{dx}{2} < x \le x_i + \frac{dx}{2}\right)}{dx},$$

      where $P\left(x_i - \frac{dx}{2} < x \le x_i + \frac{dx}{2}\right)$ represents the probability that variable x lies in the given range, and f(x) is the probability density function (PDF). In other words, for the given infinitesimal range of width dx between $x_i - dx/2$ and $x_i + dx/2$, the integral under the PDF curve is the probability that a measurement lies within that range, as sketched.
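The definition can be checked numerically. The sketch below is only an illustration under an assumption that is not part of the notes: it takes the population to be Gaussian with mean 10 and standard deviation 3, so that the exact probability of a range is available in closed form. The helper names (`normal_pdf`, `normal_cdf`, `density_estimate`) are hypothetical. It shows the ratio P/dx approaching f(x_i) as dx shrinks.

```python
import math

# Assumed illustrative population (not taken from the notes' sketch).
MU, SIGMA = 10.0, 3.0


def normal_pdf(x, mu=MU, sigma=SIGMA):
    """Gaussian probability density f(x)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu=MU, sigma=SIGMA):
    """Gaussian cumulative probability P(X <= x)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def density_estimate(x_i, dx):
    """Return P(x_i - dx/2 < x <= x_i + dx/2) / dx for the assumed Gaussian."""
    p = normal_cdf(x_i + dx / 2.0) - normal_cdf(x_i - dx / 2.0)
    return p / dx


if __name__ == "__main__":
    x_i = 5.0
    for dx in (2.0, 0.5, 0.1, 0.01):
        print(f"dx = {dx:5.2f}   P/dx = {density_estimate(x_i, dx):.6f}")
    print(f"f(x_i)        = {normal_pdf(x_i):.6f}")
```

For a wide range the ratio is only an average density over the range; it matches f(x_i) more and more closely as dx becomes small, which is the sense in which the PDF is the limit of a finely binned, vertically normalized histogram.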

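[Sketch: PDF curve f(x) versus x, with the strip of width dx between x_i − dx/2 and x_i + dx/2 (centered on x_i) shaded; the shaded area is P(x_i − dx/2 < x ≤ x_i + dx/2).]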

    o As shown in the sketch, this probability is equal to the area (shaded blue region) under the f(x) curve, i.e., the integral under the PDF over the specified infinitesimal range of width dx.
    o The usefulness of the PDF is as follows: Suppose we choose a range of variable x, say between a and b. The probability that a measurement lies between a and b is simply the integral under the PDF curve between a and b, as sketched, where we define the probability as

      $$P(a < x \le b) = \int_{x=a}^{x=b} f(x)\,dx.$$

[Sketch: PDF curve f(x) versus x, with the area under the curve between a and b shaded and labeled P(a < x ≤ b).]
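As a minimal sketch of this integral, the code below approximates P(a < x ≤ b) with the trapezoidal rule. The Gaussian PDF with mean 10 and standard deviation 3 is an assumption chosen for illustration only, and the function names are hypothetical; any function f(x) that integrates to 1 could be passed in instead.

```python
import math


def normal_pdf(x, mu=10.0, sigma=3.0):
    """Assumed example PDF: Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def probability_between(pdf, a, b, n_steps=10_000):
    """Approximate P(a < x <= b) as the integral of pdf from a to b (trapezoidal rule)."""
    h = (b - a) / n_steps
    total = 0.5 * (pdf(a) + pdf(b))
    for k in range(1, n_steps):
        total += pdf(a + k * h)
    return total * h


if __name__ == "__main__":
    # Probability that a measurement falls in the range 4 < x <= 6.
    print(f"P(4 < x <= 6)  = {probability_between(normal_pdf, 4.0, 6.0):.6f}")
    # A very wide range should capture essentially all of the probability.
    print(f"P(-50 < x <= 70) = {probability_between(normal_pdf, -50.0, 70.0):.6f}")
```

The second call uses finite limits far out in the tails as a stand-in for −∞ and +∞; for this assumed PDF the area outside that range is negligible.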

    o If a → −∞ and b → +∞, the probability must equal 1 (100%), i.e.,

      $$P(-\infty < x < \infty) = \int_{x=-\infty}^{x=\infty} f(x)\,dx = 1.$$

      In other words, the probability that x lies between −∞ and +∞ is 100% (a fact that should be obvious, since there are no other possibilities for real number x).
    o Once we have defined the probability density function f(x), we leave the system of discrete random variables and enter the system of continuous random variables, on which we make some more formal definitions:
      ▪ Expected value is defined in terms of the probability density function as the mean of all possible x values in the continuous system. Namely,

        $$\text{expected value} = \mu = E(x) = \int_{-\infty}^{\infty} x\,f(x)\,dx.$$

        In an ideal situation in which f(x) exactly represents the population, μ is the mean of the entire population of x values, and that is why it is called the “expected” value. It is therefore also called the population mean. In general, $\bar{x} \ne \mu$, but $\bar{x} \to \mu$ when n is large, i.e., the sample mean approaches the expected value when n is large. $\bar{x}$ and μ are often used interchangeably, but this should be done only if n is large.
      ▪ Standard deviation is defined in terms of the PDF as

        $$\text{standard deviation} = \sigma = \sqrt{\int_{-\infty}^{\infty} (x - \mu)^2\,f(x)\,dx}.$$

        In an ideal situation in which f(x) exactly represents the population, σ is the standard deviation of the entire population. It is therefore also called the population standard deviation. If n is large, S → σ. Often, S and σ are used interchangeably, but this should be done only if n is large.
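The two integral definitions above can be evaluated numerically for any given f(x). The following is a rough sketch, assuming a Gaussian PDF with mean 10 and standard deviation 3 purely so the answers are known in advance; the function names and the finite integration limits (ten standard deviations either side of the mean, standing in for −∞ and +∞) are choices made for this illustration.

```python
import math


def normal_pdf(x, mu=10.0, sigma=3.0):
    """Assumed example PDF: Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def integrate(g, lo, hi, n_steps=20_000):
    """Trapezoidal-rule approximation of the integral of g from lo to hi."""
    h = (hi - lo) / n_steps
    total = 0.5 * (g(lo) + g(hi))
    for k in range(1, n_steps):
        total += g(lo + k * h)
    return total * h


def pdf_moments(pdf, lo, hi):
    """Return (area, mu, sigma) computed from the PDF over the range [lo, hi]."""
    area = integrate(pdf, lo, hi)                              # should be close to 1
    mu = integrate(lambda x: x * pdf(x), lo, hi)               # expected value E(x)
    variance = integrate(lambda x: (x - mu) ** 2 * pdf(x), lo, hi)
    return area, mu, math.sqrt(variance)                       # sigma = sqrt(variance)


if __name__ == "__main__":
    area, mu, sigma = pdf_moments(normal_pdf, -20.0, 40.0)
    print(f"area = {area:.6f}, mu = {mu:.6f}, sigma = {sigma:.6f}")
```

The recovered μ and σ describe the population that f(x) represents; a sample mean or sample standard deviation computed from a finite data set would only approach these values as n grows.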

  • Normalized probability density function – a normalized probability density function is constructed by transforming both the abscissa (horizontal axis) and ordinate (vertical axis) of the PDF plot as follows: $z = \dfrac{x - \mu}{\sigma}$ and $f(z) = \sigma f(x)$.
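One way to see why the ordinate is multiplied by σ is the short derivation below. It is a sketch that relies on one standard assumption not stated explicitly in the notes: the probability contained in a small range must be the same whether that range is measured in x or in z.

```latex
\begin{aligned}
f(z)\,dz &= f(x)\,dx
  && \text{(same probability in corresponding ranges)} \\
z = \frac{x-\mu}{\sigma}
  \;&\Rightarrow\; dz = \frac{dx}{\sigma} \\
f(z) &= f(x)\,\frac{dx}{dz} = \sigma\,f(x) \\
\int_{-\infty}^{\infty} f(z)\,dz
  &= \int_{-\infty}^{\infty} \sigma f(x)\,\frac{dx}{\sigma}
   = \int_{-\infty}^{\infty} f(x)\,dx = 1
\end{aligned}
```

So the normalized curve still encloses unit area, which is what allows it to be compared with standard PDFs.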

    o The above transformations accomplish two things:
      ▪ The first transformation normalizes the abscissa such that the PDF is centered around z = 0.
      ▪ The second transformation normalizes the ordinate such that the PDF is spread out in similar fashion regardless of the value of standard deviation.
    o When normalized in this way, the normalized PDF can be directly compared to standard PDFs, which we discuss in a later learning module.
    o To summarize, here are several steps used in Excel to generate a normalized PDF of experimental data:

  1. Generate the histogram with Excel as discussed in the histogram learning module. Excel generates a table called a frequency table. The table contains two columns, bin and frequency. Bin is the maximum value of the range of each bin, and frequency is the number of data points in that bin range. (For example, suppose there are 200 data points total, the mean value of x is 10.0, and the standard deviation of the data set is 3.0. Also suppose that 8 of those data points lie in the bin with x between 4 and 6 (4 < x ≤ 6). Thus, for this bin, Bin = 6 and Frequency = 8.)
  2. Create a new column called probability in which you divide each frequency by the total number of data points. This gives the probability that a data point lies in that bin, i.e. probability = frequency / n. (In the example here, probability = 8/200 = 0.040 or 4.0%.)
  3. Create a new column called x_mid in which you list the mid value of each bin: x_mid = (x_min + x_max)/2. (In the example here, the mid value of the sample bin is (4 + 6)/2 = 5.0.)
  4. Create a new column called f(x) in which you divide each probability by the appropriate bin width, i.e., f(x) = probability/Δx. (In the example here, the bin width of the sample bin is Δx = 6 − 4 = 2, and f(x) = 0.04/2 = 0.02 at x = x_mid = 5.0.) A smoothed plot of f(x) versus x is the PDF.
  5. Create a new column called z in which you normalize the x values into nondimensional z values. This is accomplished by converting each mid value of x into z: z = (x − μ)/σ. (In the example here, z for the sample bin is z = (5.0 − 10.0)/3.0 = −1.667.)
  6. Create a new column called f(z) in which you normalize the PDF into the f(z) values. This is accomplished by converting each f(x) into f(z): f(z) = σ·f(x). (In the example here, f(z) of the sample bin is f(z) = 0.02*3.0 = 0.060 at z = −1.667.)
  7. Finally, a plot of f(z) vs. z can be generated. A smooth curve through these data represents the normalized PDF.
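The spreadsheet steps above can also be sketched in a few lines of Python. This is not the Excel procedure itself, only an equivalent calculation; the function name `normalized_pdf_table` and the choice to pass each bin as (x_min, x_max, frequency) are assumptions made for the illustration (Excel's frequency table stores only the upper edge of each bin). The usage example reproduces the sample bin from the steps: 8 of 200 points in 4 < x ≤ 6, with mean 10.0 and standard deviation 3.0.

```python
def normalized_pdf_table(bins, n_total, mean, std_dev):
    """Build the probability, x_mid, f(x), z, and f(z) columns for each bin.

    bins    : iterable of (x_min, x_max, frequency) tuples
    n_total : total number of data points
    mean    : mean of the data set
    std_dev : standard deviation of the data set
    """
    rows = []
    for x_min, x_max, frequency in bins:
        probability = frequency / n_total      # step 2
        x_mid = (x_min + x_max) / 2.0          # step 3
        f_x = probability / (x_max - x_min)    # step 4: divide by bin width
        z = (x_mid - mean) / std_dev           # step 5
        f_z = std_dev * f_x                    # step 6
        rows.append(
            {"probability": probability, "x_mid": x_mid, "f_x": f_x, "z": z, "f_z": f_z}
        )
    return rows


if __name__ == "__main__":
    # Sample bin from the steps above: 8 of 200 points lie in 4 < x <= 6.
    table = normalized_pdf_table([(4.0, 6.0, 8)], n_total=200, mean=10.0, std_dev=3.0)
    for row in table:
        print(row)
```

Plotting the f_z column against the z column (step 7) would then give the points through which the normalized PDF curve is drawn.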
  • Example:
    Given: The same 1000 temperature measurements used in a previous example for generating a histogram. The data are provided in an Excel spreadsheet (Temperature_data_analysis.xls) on the website.
    To do: Generate a PDF of these data. Normalize the PDF.
    Solution:
    o In a previous example (see the Histogram learning module), we generated a histogram of the temperature data. We begin with the bin and frequency data generated in Excel.