Probability Distributions in Hydrology: Binomial, Poisson, Normal, and Lognormal, Study notes of Research Methodology

An overview of various probability distributions commonly used in hydrology, including the binomial, Poisson, normal, and lognormal distributions. The notes apply these distributions to hydrologic random variables and provide formulas for their probability mass or density functions. They also discuss the distinction between functional and statistical relations in regression analysis and the use of ANOVA for testing hypotheses about treatment effects.

Typology: Study notes

2012/2013

Uploaded on 10/03/2013

abani

Applied Statistics in FE and LWE

1. Probability Functions (Compiled primarily from Yevjevich [1982])

Nearly all distributions of hydrologic random variables are empirical in nature, obtained from limited observed data. They are best described by fitting a suitable probability distribution function. Some distribution functions are already known to fit the empirical distributions of particular hydrologic random variables well. Often, however, two or three functions fit them equally well.

1.1. Binomial Distribution

The binomial discrete probability distribution applies to populations that have only two discrete but complementary events. It applies in hydrology whenever there are two events with the characteristics of occurrence and non-occurrence of a property (rainy and non-rainy day), yes-no (decisions to seed or not to seed clouds in a random seeding experiment), either-or, zero-greater than zero, plus-minus, zero-one, and other similar complementary events. The binomial probability mass function is

$$p(x) = \binom{n}{x} p^{x} q^{n-x} \tag{1}$$

where n is the number of independent observations (Bernoulli trials), x is the number of occurrences of an event (x = 0, 1, ..., n), n − x is the number of occurrences of its complementary event (n − x = n, n − 1, ..., 0), p is the probability of the first event in an infinite number of trials, q = 1 − p is the probability of the complementary event, p(x) is the probability that the event occurs x times in n observations, and the binomial coefficient is given by

$$\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$$

1.2. Poisson Distribution

If n and p of the binomial distribution are mutually dependent, and p is very small while n is very large, the binomial distribution becomes less applicable. Theoretically, if n goes to infinity while p tends to zero, but their product, np, tends to a constant λ, the distribution of x (x = 0, 1, ..., ∞) follows the Poisson discrete distribution.
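The limiting behaviour described above can be checked numerically. The following is a minimal Python sketch, not part of the original notes: it evaluates the binomial mass function with p = λ/n and compares it with the standard Poisson mass function λ^x e^(−λ)/x! as n grows. The function names, the value λ = 2, and the chosen values of n are illustrative assumptions.

```python
import math

def binomial_pmf(x, n, p):
    """Binomial probability of x occurrences in n Bernoulli trials."""
    q = 1.0 - p  # probability of the complementary event
    return math.comb(n, x) * p**x * q**(n - x)

def poisson_pmf(x, lam):
    """Poisson probability of x occurrences with parameter lam = n * p."""
    return lam**x * math.exp(-lam) / math.factorial(x)

def max_abs_difference(n, lam, x_max=10):
    """Largest |binomial - Poisson| over x = 0..x_max when p = lam / n."""
    p = lam / n
    return max(
        abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam))
        for x in range(x_max + 1)
    )

if __name__ == "__main__":
    lam = 2.0  # assumed value of the constant product n * p
    for n in (10, 100, 1000, 10000):
        # The gap should shrink as n grows while n * p stays fixed.
        print(n, max_abs_difference(n, lam))
```

Holding λ fixed while increasing n forces p = λ/n toward zero, which is the situation the text describes; the printed differences give a rough sense of how large n must be before the two mass functions are interchangeable for a given purpose.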
In practice, however, it is not necessary that p → 0 and n → ∞, only that p is relatively small and n relatively large, for the Poisson function to become applicable, provided that λ = np stays approximately constant while n and p change. The constant λ is the only parameter of the distribution, which is

$$p(x) = \frac{\lambda^{x} e^{-\lambda}}{x!} \tag{2}$$

where x is the number of occurrences of an event of the given small probability, p, in a large number of trials, n, and p(x) is the probability of x = 0, 1, 2, ..., ∞.

1.3. Normal Distribution

The normal (Gaussian, Gauss-Laplacian) pdf of a continuous random variable x is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right] \tag{3}$$

with μ the expected mean and σ² the variance of variable x. Eq. 3 is often symbolized N(μ, σ²). The standardized form, with z = (x − μ)/σ, is

$$f(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^{2}}{2}\right) \tag{4}$$

Four conditions are necessary for a variable to have a normal probability distribution: (1) a very large number of causative factors affect the outcome of variable values, (2) each factor taken separately has a relatively small influence on the outcome, (3) the effect of each factor is independent of the effects of all other factors, and (4) the effects of the various factors on the outcome are additive.

The normal distribution is perhaps the probability distribution most commonly used in the natural sciences. In hydrology, it has five major applications: (1) fitting symmetric empirical frequency distributions of hydrologic random variables, (2) serving as a pdf in the analysis of random errors, (3) serving as the benchmark distribution for comparison with other distributions, (4) supporting various types of statistical inference, since many hydrologic parameters may be exactly or approximately normally distributed, and (5) use in Monte Carlo simulation.

1.4. Lognormal Distribution

When the logarithms, ln x, of a variable x are normally distributed, the distribution of x is said to follow the logarithmic-normal probability distribution. The logarithmic-normal pdf of x is given by

$$f(x) = \frac{1}{x\,\sigma_{\ln x}\sqrt{2\pi}} \exp\left[-\frac{(\ln x-\mu_{\ln x})^{2}}{2\sigma_{\ln x}^{2}}\right], \text{ for } x > 0; \quad f(x) = 0, \text{ for } x \le 0 \tag{5}$$

where μ_ln x and σ_ln x denote the mean and standard deviation of ln x.

2. Regression (Compiled primarily from Neter et al. [1989])

Regression analysis is a statistical tool that utilizes the relation between two or more quantitative variables so that one variable can be predicted from the other, or others. In applying regression to BSysE research, one should distinguish between a functional relation (perfect relation) and a statistical relation (non-perfect relation). Regression analysis may be categorized into: simple linear regression, general linear regression (including multiple regression, polynomial regression, and autocorrelation in time series data), correlation analysis, and non-linear regression. Two general methods for finding estimators in regression analysis are the Maximum Likelihood method and the Least Squares method.

3. Analysis of Variance (Compiled primarily from Montgomery [1991])

We give the simplest, single-factor (one-way), fixed-effects model as an example illustrating how experimental data can be analyzed in a statistical manner. Suppose we have a treatments, or different levels of a single factor, that we wish to compare. The observed response from each of the a treatments is a random variable. The data would appear as in Table 1. Any entry in Table 1, e.g., Y_ij, represents the jth observation taken under treatment i. There will be, in general, n observations under the ith treatment. These observations can be described with a linear statistical model

[...]

(1)
Flow  Time  Method
25    1     A
26    1     A
34    1     B
35    1     B
45    2     A
43    2     A
...

(2)
Flow  Time  Method  Replica
25    1     A       1
34    1     B       1
45    2     A       1
26    1     A       2
35    1     B       2
43    2     A       2
...

Depending on how your original data were keyed in, you can choose either of these two data formats when performing the SAS analysis.

Lab 7 Assignment

1. (a) Use data in file “Lab7A13.xls” (Table 1) to perform linear regression analysis. Specifically, distance is the predictor (independent) variable, and all other variables (except Plot Nr.) are dependent variables. (b) Perform a correlation analysis using the data (except the Plot Nr.) in the same file (Table 2). (c) What do the results suggest? Discuss the physical conditions in the field that might have led to these results. (7.5 pts)

2. (a) Use data in file “Lab7A2.xls” to test the normality of the soil hydraulic properties: porosity in cm³/cm³, hydraulic conductivity at saturation (Ksat) in cm/hr, and organic matter (OM) content in percent. (b) The Ksat distribution seems peculiar, right? What transformation do you need to make before a further normality test? Do it after your idea is approved by the instructor or TA. (c) What do you conclude? Any justification for such results and conclusions? (7.5 pts)

3. (a) Use data in file “Lab7A13.xls” (Table 3) to perform a one-way ANOVA. (b) What do the terms Sum of Squares, Mean Square, Error, Root MSE, DF, F, and Pr > F stand for? Give formulas for calculating them if you can. (Hint: find the answer in any book on experimental design and data analysis or in relevant lecture notes that are posted online.) (c) What do you conclude? Speculate on the reasons that might have led to the results and conclusions. (Hint: use your wildest and most creative thinking! Also, you can discuss with others.) (10 pts)

Helpful Information:

• The plant community diversity of the study area was characterized by calculating the Shannon-Wiener index for each plot. The index (H) was calculated based on live shoot biomass according to Smith (1980) and Ram et al. (1989):

$$H = 3.322\left[\log_{10} N - \frac{1}{N}\sum_{i} N_i \log_{10} N_i\right]$$

where N is the total shoot biomass and N_i is the shoot biomass of species i.
• Problem 1 (a)

    DATA A;
      /* Read the plot data from the text file */
      INFILE 'C:\JOAN\TEACHING\BSYSE512\SPRING07\Lab71A.PRN';
      INPUT PLOTNO H COVER LS A DIST;
    PROC PRINT;
    /* Regress each response variable on distance (DIST) */
    PROC REG;
      MODEL H = DIST;
      MODEL COVER = DIST;
      MODEL LS = DIST;
      MODEL A = DIST;
    RUN;

• Problem 1 (b)

    DATA A;
      INFILE 'C:\JOAN\TEACHING\BSYSE512\SPRING07\Lab71B.PRN';
      INPUT PLOTNO H COVER INC INV DEC LS A DIST;
    PROC PRINT;
    /* Pairwise correlations among all variables except the plot number */
    PROC CORR;
      VAR H COVER INC INV DEC LS A DIST;
    RUN;

• Problem 2

    DATA A;
      INFILE 'C:\JOAN\TEACHING\BSYSE512\SPRING07\Lab72.PRN';
      INPUT P KSAT OM;
      KSATLOG = LOG(KSAT);  /* natural-log transform of Ksat */
    PROC PRINT;
    PROC CORR;
    /* The NORMAL option requests tests of normality */
    PROC UNIVARIATE NORMAL;
      VAR P KSAT KSATLOG OM;
    RUN;

• Problem 3

    DATA A;
      INFILE 'C:\JOAN\TEACHING\BSYSE512\SPRING07\Lab73.PRN';
      INPUT PLOTNO BIOMASS CATEGORY $;
    PROC PRINT;
    /* One-way ANOVA of biomass by category */
    PROC GLM;
      CLASS CATEGORY;
      MODEL BIOMASS = CATEGORY;
      MEANS CATEGORY / TUKEY;  /* Tukey's pairwise comparison of category means */
    RUN;
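The Shannon-Wiener index given in the first item of the Helpful Information can also be computed directly. The following is a minimal Python sketch, assuming the formula reads H = 3.322 [log10 N − (1/N) Σ N_i log10 N_i] with N = Σ N_i. The function name, the handling of zero-biomass species, and the sample biomass values are illustrative assumptions, not part of the lab materials. Since 3.322 is approximately 1/log10(2), the result is effectively expressed in base-2 units.

```python
import math

def shannon_wiener(biomass):
    """Shannon-Wiener index H from per-species shoot biomass values N_i.

    Implements H = 3.322 * [log10(N) - (1/N) * sum(N_i * log10(N_i))],
    where N is the total biomass. Species with zero biomass are skipped
    (assumption: they contribute nothing, since x*log(x) -> 0 as x -> 0).
    """
    present = [n_i for n_i in biomass if n_i > 0]
    if not present:
        raise ValueError("at least one species must have positive biomass")
    total = sum(present)  # N, the total shoot biomass
    weighted_logs = sum(n_i * math.log10(n_i) for n_i in present)
    return 3.322 * (math.log10(total) - weighted_logs / total)

if __name__ == "__main__":
    # Hypothetical plot with four species of equal biomass.
    plot_biomass = [12.0, 12.0, 12.0, 12.0]
    print(shannon_wiener(plot_biomass))
```

A plot dominated by a single species gives H = 0, and H increases as biomass is spread more evenly over more species, so four equally abundant species give a value close to 2.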