An overview of parametric probability distributions and their use for smoothing, interpolation, and extrapolation. It covers discrete distributions (binomial, geometric, and Poisson) and continuous distributions (normal, gamma, exponential, and beta), explains the concept of expected value and its properties, and introduces the central limit theorem. It also provides resources for further study.

Typology: Study notes

Pre 2010

Parametric (theoretical) probability distributions (Wilks, Ch. 4)

Note:
Parametric: assume a theoretical distribution (e.g., Gaussian).
Non-parametric: no assumption is made about the distribution.

Advantages of assuming a parametric probability distribution:
Compaction: just a few parameters.
Smoothing, interpolation, extrapolation.

Parameter: e.g., $\mu, \sigma$, the population mean and standard deviation.
Statistic: estimate of a parameter from a sample: $\bar{x}, s$, the sample mean and standard deviation.

Discrete distributions (e.g., yes/no; above normal, normal, below normal)

Binomial: $E_1 = 1$ (yes or success); $E_2 = 0$ (no, fail). These are MECE (mutually exclusive and collectively exhaustive). $P(E_1) = p$, $P(E_2) = 1 - p$. Assume $N$ independent trials. How many "yes" outcomes can we obtain in $N$ independent trials? $x = (0, 1, \ldots, N-1, N)$, i.e., $N+1$ possibilities. Note that $x$ is like a dummy variable.

$$P(X = x) = \binom{N}{x} p^x (1-p)^{N-x}, \quad \text{remember that } \binom{N}{x} = \frac{N!}{x!\,(N-x)!}, \quad 0! = 1$$

Bernoulli is the binomial distribution with a single trial, $N = 1$: $x = (0, 1)$, $P(X = 0) = 1 - p$, $P(X = 1) = p$.

Geometric: number of trials until the next success, i.e., $x - 1$ failures followed by a success.

$$P(X = x) = (1-p)^{x-1} p, \quad x = 1, 2, \ldots$$

Poisson: approximation of the binomial for small $p$ and large $N$. Events occur randomly at a constant rate (per $N$ trials) $\mu = Np$. The rate per trial $p$ is low, so that events in the same period ($N$ trials) are approximately independent.

Example: assume the probability of a tornado in a certain county on a given day is $p = 1/100$. Then the average rate per 90-day season is $\mu = 90 \times 1/100 = 0.9$.

$$P(X = x) = \frac{\mu^x e^{-\mu}}{x!}, \quad x = 0, 1, 2, \ldots$$

Question: What is the probability of having 0, 1, or 2 tornadoes in a season? With $\mu = 0.9$ the formula gives $P(X=0) = e^{-0.9} \approx 0.407$, $P(X=1) = 0.9\,e^{-0.9} \approx 0.366$, and $P(X=2) = (0.9^2/2)\,e^{-0.9} \approx 0.165$.

Expected value: "probability weighted mean"

Example: expected mean:

$$\mu = E(X) = \sum_x x\,P(X = x)$$

Example: mean of the binomial distribution:

$$\mu = E(X) = \sum_{x=0}^{N} x \binom{N}{x} p^x (1-p)^{N-x} = Np$$

Properties of expected value:

$$E(f(X)) = \sum_x f(x)\,P(X = x); \qquad E(a\,f(X) + b\,g(X)) = a\,E(f(X)) + b\,E(g(X))$$

Example: variance:

$$\mathrm{Var}(X) = E((X-\mu)^2) = \sum_x (x-\mu)^2 P(X=x) = \sum_x x^2 P(X=x) - 2\mu \sum_x x\,P(X=x) + \mu^2 \sum_x P(X=x) = E(X^2) - \mu^2$$

E.g.:
Binomial: $\mathrm{Var}(X) = Np(1-p)$
Geometric: $\mathrm{Var}(X) = (1-p)/p^2$
Poisson: $\mathrm{Var}(X) = Np = \mu$

Expected value for continuous distributions: probability weighted average

$$E(g(X)) = \int_x g(x) f(x)\,dx$$

This is the same as for discrete distributions: $E = \sum_i g(x_i)\,p(x_i)$. For example, the mean is $\mu = E(X) = \int_x x f(x)\,dx$, and the variance is

$$\mathrm{Var}(X) = E((X-\mu)^2) = \int_x (x-\mu)^2 f(x)\,dx = \int_x (x^2 - 2\mu x + \mu^2) f(x)\,dx = \int_x x^2 f(x)\,dx - 2\mu \int_x x f(x)\,dx + \mu^2 \int_x f(x)\,dx = E(X^2) - \mu^2$$

An excellent resource is the NIST website www.itl.nist.gov/div898/handbook/index.htm and in particular the "gallery of distributions" www.itl.nist.gov/div898/handbook/eda/section3/eda366.htm
The Dataplot software seems very nice, but I have not tried it: www.itl.nist.gov/div898/software/dataplot/

Gaussian or normal probability distribution:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Standard form ($z$ is dimensionless):

$$z = \frac{x-\mu}{\sigma} \approx \frac{x-\bar{x}}{s}; \qquad f(z) = \frac{e^{-z^2/2}}{\sqrt{2\pi}}$$

Central limit theorem: the average of a set of independent observations approaches a Gaussian distribution when the average is taken over a large enough number of observations.
"Well behaved" atmospheric variables (e.g., temperature T): even a one-day average is approximately Gaussian.
Multimodal or skewed variables (e.g., precipitation, pp): require longer averages to look Gaussian.

[Figure: distributions of pp for daily averages and for monthly averages.]

How to use a table of Gaussian probabilities: estimate $\mu$ and $\sigma$ from a sample and convert to a standard variable: $z = \frac{x-\mu}{\sigma} \approx \frac{x-\bar{x}}{s}$. The table gives $F(z) = P(Z \le z)$. The area between two values is $P(z_1 \le Z \le z_2) = F(z_2) - F(z_1)$.

Example: What is the probability of a January average temperature of $T \le 0\,^\circ\mathrm{C}$ if $\mu \approx \bar{T}_{Jan} = 4\,^\circ\mathrm{C}$ and $\sigma \approx s = 2\,^\circ\mathrm{C}$?

$$z = \frac{T - 4}{2} = \frac{-4}{2} = -2 \quad (-2 \text{ standard deviations}), \qquad F(-2) = 0.023$$

Note that $P(|Z| \ge 2) = 2 \times F(-2) = 2 \times 0.023 \approx 0.05 = 5\%$, i.e., about 5% of the values lie more than two standard deviations from the mean.

What are the Gaussian terciles for the temperature distribution? $F(z) = 0.666$ gives $z = 0.43$, so $T = \mu \pm 0.43\sigma = 4\,^\circ\mathrm{C} \pm 0.86\,^\circ\mathrm{C}$.

[Figure: standard Gaussian density showing the area between $z_1$ and $z_2$ and the tail areas of 0.023 beyond $z = -2$ and $z = 2$.]

Other probability distributions:

Gamma: for data > 0, positively skewed, such as precipitation.

$$f(x) = \frac{(x/\beta)^{\alpha-1} e^{-x/\beta}}{\beta\,\Gamma(\alpha)}$$

$\beta$: scale parameter
$\alpha$: shape parameter
$\Gamma(\alpha) \equiv (\alpha-1)\,\Gamma(\alpha-1)$: gamma function (for integers, $\Gamma(n) = (n-1)!$)

$$\mu = E(X) = \int_0^\infty x f(x)\,dx = \alpha\beta, \qquad \sigma^2 = E(X^2) - E(X)^2 = \alpha\beta^2$$

For $\alpha = 1$ the gamma distribution becomes the exponential distribution: $f(x) = \frac{e^{-x/\beta}}{\beta}$ if $x > 0$, 0 otherwise. The cumulative distribution function is $F(x) = 1 - e^{-x/\beta}$ if $x > 0$, 0 otherwise.

Beta: for data between 0 and 1 (e.g., RH, cloud cover, probabilities).

$$f(x) = \left[\frac{\Gamma(p+q)}{\Gamma(p)\,\Gamma(q)}\right] x^{p-1} (1-x)^{q-1}, \quad 0 \le x \le 1, \quad p, q > 0$$

This is a very flexible function, taking on many different shapes depending on its two parameters $p$ and $q$ (Figure 4.11).

[Figure: f(x) for parameter values 0.5, 2, and 4.]

Parameter estimation (fitting a distribution to observations)

1) Moments fitting: compute the first two moments from a sample,

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i; \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2,$$

and then use these values in the Gaussian or other distribution. For example, for the Gaussian distribution, simply use $\hat{\mu} = \bar{x}$; $\hat{\sigma}^2 = s^2$, and for the gamma distribution,

$$\bar{x} = \hat{\alpha}\hat{\beta}; \quad s^2 = \hat{\alpha}\hat{\beta}^2 \;\Rightarrow\; \hat{\beta} = \frac{s^2}{\bar{x}}; \quad \hat{\alpha} = \frac{\bar{x}^2}{s^2}$$

The NIST website gives the relationship between the mean and standard deviation and the parameters of each probability distribution.

2) Maximum likelihood method: maximize the probability of the distribution fitting all the observations $\{x_i\}$. The probability of having obtained the observations is the product of the probabilities for each observation (shown here for a Gaussian distribution), i.e.,

$$I(\mu,\sigma) = \prod_{i=1}^{n} f(x_i) = \frac{1}{\sigma^n (\sqrt{2\pi})^n} \exp\left[-\sum_{i=1}^{n} \frac{(x_i-\mu)^2}{2\sigma^2}\right],$$

or, equivalently, maximize its logarithm:

$$L(\mu,\sigma) = \ln(I) = -n\ln\sigma - n\ln\sqrt{2\pi} - \sum_{i=1}^{n} \frac{(x_i-\mu)^2}{2\sigma^2}.$$
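As a numerical illustration of the Gaussian log-likelihood $L(\mu,\sigma)$ defined above, the following minimal sketch evaluates it for two candidate values of $\sigma$: the sample standard deviation $s$ (dividing by $n-1$) and the alternative that divides by $n$. The sample values and all function and variable names are hypothetical, chosen only for illustration; they do not come from the notes.

```python
import math

def gaussian_log_likelihood(xs, mu, sigma):
    """Log-likelihood L(mu, sigma) of independent Gaussian observations xs."""
    n = len(xs)
    squared_dev = sum((x - mu) ** 2 for x in xs)
    return (-n * math.log(sigma)
            - n * math.log(math.sqrt(2.0 * math.pi))
            - squared_dev / (2.0 * sigma ** 2))

# Hypothetical sample, made up for illustration (not data from the notes).
sample = [2.1, 3.4, 4.0, 4.6, 5.9, 3.8, 4.3]
n = len(sample)
xbar = sum(sample) / n
sum_sq = sum((x - xbar) ** 2 for x in sample)

s = math.sqrt(sum_sq / (n - 1))   # sample standard deviation (divide by n - 1)
sigma_n = math.sqrt(sum_sq / n)   # alternative estimate (divide by n)

L_s = gaussian_log_likelihood(sample, xbar, s)
L_n = gaussian_log_likelihood(sample, xbar, sigma_n)
print(f"L(xbar, s)       = {L_s:.4f}")
print(f"L(xbar, sigma_n) = {L_n:.4f}")
```

For any sample with nonzero spread, the divide-by-$n$ value gives the larger log-likelihood of the two, and moving $\mu$ away from $\bar{x}$ lowers it.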
Setting $\frac{\partial L}{\partial \mu} = 0$ and $\frac{\partial L}{\partial \sigma} = 0$ gives the maximum likelihood parameters $\hat{\mu}, \hat{\sigma}$. Note that for the Gaussian distribution, this gives $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (same as with the moments fitting), but $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$. The most likely value for the standard deviation is therefore not the unbiased estimator $s$.

Note: Likelihood is the probability distribution of the truth given a measurement. It is equal to the probability distribution of the measurement given the truth (Edwards, 1984).

Goodness of fit

Methods to test goodness of fit:

a) Plot a PDF over the histogram and check how well it fits (Fig. 4.14), or

b) check how well scatter plots of quantiles from the histogram vs. quantiles from the PDF fall onto the diagonal line (Fig. 4.15).
1) A q-q plot is a plot of the quantiles of the first data set against the quantiles of the second data set. By a quantile, we mean the fraction (or percent) of points below the given value. That is, the 0.3 (or 30%) quantile is the point at which 30% of the data fall below and 70% fall above that value.
2) Both axes are in units of their respective data sets. That is, the actual quantile level is not plotted. If the data sets have the same size, the q-q plot is essentially a plot of sorted data set 1 against sorted data set 2.

c) Use the chi-square test (see above):

$$X^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i},$$

where $O_i$ and $E_i$ are the observed and expected counts in class $i$. The fit is considered good (at a 5% level of significance) if $X^2 < \chi^2_{(0.05,\,n-1)}$.

Extreme events: Gumbel distribution

Examples: coldest temperature in January, maximum daily precipitation in a summer, maximum river flow in the spring. Note that there are two time scales: a short scale (e.g., a day) and a long scale (a number of years).

Consider now the problem of obtaining the probability distribution of the maximum extreme (e.g., the warmest temperature):

$$\text{CDF:} \quad F(x) = \exp\left\{-\exp\left[-\frac{x-\xi}{\beta}\right]\right\}$$

This can be derived from the exponential distribution (von Storch and Zwiers, p. 49).
The PDF can be obtained from the CDF:

$$\text{PDF:} \quad f(x) = \frac{1}{\beta} \exp\left\{-\exp\left[-\frac{x-\xi}{\beta}\right] - \frac{x-\xi}{\beta}\right\}$$

Parameter estimation for the maximum distribution:

$$\hat{\beta} = \frac{s\sqrt{6}}{\pi}; \qquad \hat{\xi} = \bar{x} - \gamma\hat{\beta}, \qquad \gamma = 0.57721\ldots \text{ (Euler constant)}$$

Note that $\bar{x} = \hat{\xi} + \gamma\hat{\beta}$ indicates that for the maximum Gumbel distribution, the mean is to the right of $\hat{\xi}$, which is the mode (the value for which the PDF is maximum, or most popular value; check the PDF figure). Therefore, for the minimum distribution, since the mean is to the left of the mode (check the PDF figure), the parameters are:

$$\hat{\beta} = \frac{s\sqrt{6}}{\pi}; \qquad \hat{\xi} = \bar{x} + \gamma\hat{\beta} \quad (\text{i.e., } \bar{x} = \hat{\xi} - \gamma\hat{\beta}), \qquad \gamma = 0.57721\ldots$$

[Figure: PDFs of the maximum Gumbel and minimum Gumbel distributions, with the mode $\xi$ marked on each.]

If instead of looking for the maximum extreme event we are looking for the minimum (e.g., the coldest) extreme event, we have to reverse the sign of the normalized variable, from $\frac{x-\xi}{\beta}$ to $-\frac{x-\xi}{\beta}$. The Gumbel minimum distributions become (with the standard version in parentheses):

$$\text{PDF:} \quad f(x) = \frac{1}{\beta} \exp\left\{-\exp\left[\frac{x-\xi}{\beta}\right] + \frac{x-\xi}{\beta}\right\} \quad \left(= e^{-e^{x} + x}\right)$$

The integral of $e^{-e^{x}+x}$ is $-e^{-e^{x}} + \text{const} = 1 - e^{-e^{x}}$, so that

$$\text{CDF:} \quad F(x) = 1 - \exp\left\{-\exp\left[\frac{x-\xi}{\beta}\right]\right\} \quad \left(= 1 - e^{-e^{x}}\right)$$

The PDF and CDF plots follow:

[Figure: PDF and CDF of the Gumbel distribution, with the 10-year return value marked in blue.]

The 10-year return value (marked in blue) is the extreme value that has a cumulative probability of 0.1 (for the minimum) or 0.9 (for the maximum).

Return period for the minimum Gumbel distribution: how many years do we need to wait to see $X \le x$ happen again? For the standard distribution, $P(X \le -2.3) = F(-2.3) \approx 0.1$, so the return period is $1/F = 10$ years.

Example of PDF and CDF for the Gumbel (maximum) distribution: $\beta = 1$, $\xi = 1.79 = \ln 6$.

Note: The return time is computed from the CDF. For the maximum distribution it is $1/(1 - F)$: the CDF probability 0.5 (which means that on average the value is exceeded every other year) corresponds to a return time of 2 years, 0.9 to 10 years, etc. The PDF is only used to compare with a histogram.
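The Gumbel CDFs, the moment-based parameter estimates, and the return times discussed above can be sketched as follows. This is a minimal illustration, not the author's code: the function names are hypothetical, the return time is taken as $1/(1-F)$ for maxima and $1/F$ for minima (an assumption consistent with the values quoted in the notes), and the example parameters $\beta = 1$, $\xi = \ln 6$ follow one reading of the garbled example in the source.

```python
import math

EULER_GAMMA = 0.57721  # Euler constant, as quoted in the notes

def gumbel_max_cdf(x, xi, beta):
    """CDF of the maximum Gumbel distribution with location xi and scale beta."""
    return math.exp(-math.exp(-(x - xi) / beta))

def gumbel_min_cdf(x, xi, beta):
    """CDF of the minimum Gumbel distribution (sign of the normalized variable reversed)."""
    return 1.0 - math.exp(-math.exp((x - xi) / beta))

def gumbel_max_quantile(prob, xi, beta):
    """Invert the maximum CDF: the value x for which F(x) = prob."""
    return xi - beta * math.log(-math.log(prob))

def return_time_max(x, xi, beta):
    """Average number of years between exceedances of x (assumed 1 / (1 - F))."""
    return 1.0 / (1.0 - gumbel_max_cdf(x, xi, beta))

def return_time_min(x, xi, beta):
    """Average number of years between values at or below x (assumed 1 / F)."""
    return 1.0 / gumbel_min_cdf(x, xi, beta)

def fit_gumbel_max(xs):
    """Moment estimates: beta = s * sqrt(6) / pi, xi = mean - gamma * beta."""
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    beta_hat = s * math.sqrt(6.0) / math.pi
    xi_hat = xbar - EULER_GAMMA * beta_hat
    return xi_hat, beta_hat

# Example parameters for the maximum distribution (assumed reading of the notes).
xi, beta = math.log(6.0), 1.0
x10 = gumbel_max_quantile(0.9, xi, beta)  # value with a 10-year return time
print(f"10-year return value (maximum): {x10:.2f}")
print(f"return time at x10: {return_time_max(x10, xi, beta):.1f} years")
print(f"standard minimum, x = -2.3: {return_time_min(-2.3, 0.0, 1.0):.1f} years")
```

For the minimum distribution the same moment fit applies with the sign of the $\gamma\hat{\beta}$ term reversed, as stated above.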
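Multivariate Gaussian distributions

For multivariate problems, Gaussian distributions are in practice used almost exclusively…

For one (scalar) variable, the Gaussian distribution can be written as

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)\,\sigma^{-2}\,(x-\mu)}{2}} \quad \text{or} \quad f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(z)(z)}{2}}$$

For two variables,

$$f(x_1, x_2) = \frac{1}{(\sqrt{2\pi})^2 \sqrt{\begin{vmatrix} \sigma_1^2 & \overline{x_1' x_2'} \\ \overline{x_1' x_2'} & \sigma_2^2 \end{vmatrix}}} \exp\left\{-\frac{1}{2}\left[\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} - \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}\right]^T \Sigma^{-1} \left[\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} - \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}\right]\right\},$$

where $\Sigma$ is the covariance matrix whose determinant appears under the square root.

[Figure: return time (logarithmic axis, 10 to 1000) as a function of x.]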
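The two-variable Gaussian density above can be evaluated directly by writing out the determinant and inverse of the 2x2 covariance matrix. The sketch below is a minimal illustration under that reading of the formula; the function and parameter names (`cov12` for the covariance term $\overline{x_1' x_2'}$, etc.) are hypothetical.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Scalar Gaussian density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def bivariate_gaussian_pdf(x1, x2, mu1, mu2, var1, var2, cov12):
    """Two-variable Gaussian density for covariance [[var1, cov12], [cov12, var2]]."""
    det = var1 * var2 - cov12 ** 2  # determinant of the covariance matrix
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    d1, d2 = x1 - mu1, x2 - mu2
    # Quadratic form d^T Sigma^{-1} d, using the explicit 2x2 inverse.
    quad = (var2 * d1 * d1 - 2.0 * cov12 * d1 * d2 + var1 * d2 * d2) / det
    return math.exp(-0.5 * quad) / ((math.sqrt(2.0 * math.pi)) ** 2 * math.sqrt(det))

# With zero covariance the density factors into two scalar Gaussians.
joint = bivariate_gaussian_pdf(1.0, -2.0, 0.5, 1.0, 4.0, 9.0, 0.0)
product = gaussian_pdf(1.0, 0.5, 2.0) * gaussian_pdf(-2.0, 1.0, 3.0)
print(f"joint = {joint:.6f}, product of marginals = {product:.6f}")
```

The zero-covariance case is a convenient sanity check because the joint density must then equal the product of the two one-variable densities.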