Chem 114 Statistics Lectures

In all of our experiments, we are going to try to measure some characteristic of some solution, solid, material, etc. Say that we label this value V. The final answer that is presented in our report (and in the abstract of our report) is not just V, but rather

V +/- ∆V (1)

In this case, ∆V is the uncertainty in V. This series of lectures is devoted to building an understanding of how to obtain a reliable ∆V.

Assume we do a lab in which we measure the volume of a box. In this case,

Volume = Vo = Ho X Lo X Wo (2)

The uncertainty in the volume depends on how well we measure the three quantities (height, length, and width) upon which it depends. Since the error associated with each individual measurement propagates through the calculation and affects the error of the final result, we do what is called a “propagation of error” calculation. In each lab, you will do such a calculation, setting up the equation that is particular to your own lab experiment. For the case of the volume of the box, the equation looks something like the following:

V = Vo +/- [∆L (∂V/∂L)WoHo + ∆W (∂V/∂W)LoHo + ∆H (∂V/∂H)LoWo] (3)

where the subscripts indicate the measured values at which each partial derivative is evaluated. Since ∂V/∂L = Wo X Ho (and similarly for the other two derivatives), this can readily be solved to yield

V = Vo +/- [∆L X Wo X Ho + ∆W X Lo X Ho + ∆H X Lo X Wo] (4)

So now we know that ∆V comes from ∆W, ∆L, and ∆H. Our problem, rather than getting simpler, is rapidly getting more complicated! Furthermore, we also need to define three other new constants if we want to solve (4): Wo, Ho, and Lo. Let’s deal with these first.
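Before turning to Wo, Ho, and Lo, equation (4) can be illustrated numerically. The following is only a minimal sketch: the box dimensions and the uncertainties are hypothetical values chosen for illustration (they do not come from these notes), and the function name is likewise made up.

```python
# Propagation of error for the box volume, following equation (4).
# All dimensions and uncertainties below are hypothetical illustration values.

def volume_with_uncertainty(length, width, height, d_length, d_width, d_height):
    """Return (V, dV), with dV computed as the linear sum in equation (4)."""
    volume = length * width * height
    # Partial derivatives of V = L * W * H, evaluated at the measured values.
    dv_dl = width * height
    dv_dw = length * height
    dv_dh = length * width
    d_volume = d_length * dv_dl + d_width * dv_dw + d_height * dv_dh
    return volume, d_volume

# Hypothetical box: 20 cm x 10 cm x 5 cm, each side measured to +/- 0.1 cm.
v, dv = volume_with_uncertainty(20.0, 10.0, 5.0, 0.1, 0.1, 0.1)
print(f"V = {v:.0f} +/- {dv:.0f} cm^3")
```

Note that the shortest side contributes the largest term here, because its uncertainty is multiplied by the product of the two longer sides.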
Suppose that we have just measured the width of the box 11 times, and arrived at the following numbers:

Width_i    Column 1
82.9
83.5       Mean                  83.52
83.7       Standard Error        0.12
83.5       Median                83.5
83.9       Mode                  82.9
83.4       Standard Deviation    0.40
83.2       Variance              0.16
82.9       Kurtosis              -0.788537037
83.6       Skewness              -0.249861798
84         Range                 1.2
84.1       Minimum               82.9
           Maximum               84.1
           Sum                   918.7
           Count                 11

In the table above, the measurements are given in the left hand column. In the middle and right columns are statistical terms and their respective numbers. Let’s discuss what these statistical terms are, one at a time.

The median is defined as the following: the median of the parent population (called µ1/2) is defined as that value for which, in the limit of an infinite number of determinations of Wi, half the observations will be less than the median, and half will be greater.

The mean of N measurements of W is defined as:

Wo = µ = (1/N) Σ Wi (5)

For the given example, N = 11, and µ is the symbol representing the mean. Hopefully, the concepts of the median and the mean are not new to you.

The mode is the most likely value. In the above example, it is listed as 82.9, but it could just as easily have been listed as 83.5, since there are two occurrences of each of these values in the table of observations. This value only takes on significance in the limit of a large number of observations, and obviously such a limit has not been reached here.

Now we come to numbers that describe the spread in the data. The first number is the variance, or σ².

σ² = lim(N→∞) [(1/N) Σ (xi - µ)²] = lim(N→∞) [(1/N) Σ xi²] - µ² (6)

The standard deviation is the square root of the variance, and is thus denoted as σ. It turns out that the sample variance is a calculation which utilizes all of the individual measurements, plus the average.
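The tabulated statistics for the 11 width measurements can be checked with a short script. This is a sketch using only Python's standard `statistics` module; the variable names are illustrative, and the variance is computed in its N - 1 form, which is what the tabulated values (0.16 and 0.40) correspond to.

```python
import math
import statistics

# The 11 width measurements from the table above.
widths = [82.9, 83.5, 83.7, 83.5, 83.9, 83.4, 83.2, 82.9, 83.6, 84.0, 84.1]

n = len(widths)
mean = statistics.mean(widths)
median = statistics.median(widths)
modes = statistics.multimode(widths)      # every value tied for most frequent
variance = statistics.variance(widths)    # sample variance, divides by N - 1
std_dev = statistics.stdev(widths)        # square root of the sample variance
std_err = std_dev / math.sqrt(n)          # standard error of the mean

print(f"mean = {mean:.2f}, median = {median}, modes = {modes}")
print(f"variance = {variance:.2f}, std dev = {std_dev:.2f}, std error = {std_err:.2f}")
```

`multimode` returns both 82.9 and 83.5, which makes the point in the text explicit: with so few observations the mode is not well defined.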
Since the average value, or the mean, is not an independent variable, we need to multiply our calculated variance by the following correction: variance = [N/(N-1)]σ², and the standard deviation is the square root of this number. Thus, both the variance and the standard deviation are actually slightly larger numbers than those given by equation (6). Obviously, in the limit of large numbers of measurements, this definition is the same as that given previously. In the limit of small numbers of measurements, neither the variance nor the standard deviation means very much to begin with.

Above we utilized the ‘generic’ probability distribution P(x), which we didn’t explicitly define. Let’s define it now. If we measure the length of a box several times with a meter stick, then chances are we will end up with some particular distribution of measurements that fits a Gaussian probability distribution. If we can then determine just what that characteristic Gaussian function looks like, then we can operate on it with the mean and variance operators, and determine the average values and their corresponding standard deviations. Thus, let’s consider various probability distribution functions. Recall that above it was mentioned that probability distributions could often be classified as Binomial distribution functions, Gaussian distribution functions, and Poisson distribution functions. Others, such as Lorentzian distributions, Boltzmann distributions, etc., are possible as well. By far the most commonly encountered function will be the Gaussian distribution function, and so let’s consider this important function first.

The Gaussian Probability Distribution

The Gaussian probability function is defined as

$$P_G(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right] \quad (14)$$

A plot of the Gaussian is shown below. There are several ways to quantify the Gaussian curve shown here. First, if a line is drawn such that it is tangent to the steepest part of the curve, it will touch the curve at µ +/- σ.
It will intersect the x-axis at µ +/- 2σ. This width may also be quantified as the exp(-1/2) value of the curve, i.e., when the curve is e^(-1/2) times its value at µ, the x-value will be µ +/- σ. This is the same as the standard deviation, or, in other words:

$$P_G(\mu \pm \sigma;\mu,\sigma) = e^{-1/2}\, P_G(\mu;\mu,\sigma) \quad (15)$$

A second way to specify the width of the curve is to use the full-width-at-half-maximum, or FWHM. This is commonly denoted by the symbol Γ, and may be defined by the following equality:

$$P_G(\mu \pm \tfrac{1}{2}\Gamma;\mu,\sigma) = \tfrac{1}{2}\, P_G(\mu;\mu,\sigma) \quad (16)$$

It turns out that Γ = 2.354σ. These various ways of measuring the peak width are important. For example, how certain will we be that a value will fall between µ +/- σ, or µ +/- Γ? In fact, this is not a difficult question to answer. Recall that the Gaussian probability distribution was normalized. Thus, integrating PG(x;µ,σ) from µ - σ to µ + σ, and dividing by 1, gives the fraction of observations that should fall within a single standard deviation.

[Figure: Gaussian curve with the x-axis marked at µ, +/-σ, +/-2σ, and +/-3σ. The line of steepest descent is tangent to the curve at σ and intersects the x-axis at 2σ. The curve height at µ +/- σ is y = 0.606 = e^(-1/2) of the peak value, and Γ marks the full width at half maximum.]

$$\int_{\mu-\sigma}^{\mu+\sigma} P_G(x;\mu,\sigma)\,dx = \text{fraction of observations expected to fall within } 1\sigma \text{ of } \mu$$

It turns out that 1σ is about the 68% probability limit, and 2σ is about the 95% probability limit. 3σ will be near 99.7%.

The other distributions are given by the following equations.

The Binomial distribution, or what are the chances for observing x successes out of n tries when the probability for success in each try is p:

$$P_B(x;n,p) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x} \quad (17)$$

µ = np
σ² = np(1-p)

The Poisson distribution. This is similar to the binomial distribution, although n is very large and µ is constant; i.e., it is appropriate for describing small samples from large populations.

$$P_P(x;\mu) = \frac{\mu^x}{x!}\, e^{-\mu}, \qquad \sigma^2 = \mu \quad (18)$$

And, finally, the Lorentzian distribution for describing a natural, homogeneously broadened distribution (used for spectroscopic line shapes):

$$P_L(x;\mu,\Gamma) = \frac{1}{\pi}\, \frac{\Gamma/2}{(x-\mu)^2 + (\Gamma/2)^2} \quad (19)$$

An example of a discrete probability distribution would be dice rolls. I.e., if you have two dice, then the chances of getting the following totals are:

number    probability
2         1/36
3         2/36
4         3/36
5         4/36
6         5/36
7         6/36
8         5/36
9         4/36
10        3/36
11        2/36
12        1/36

Now, if we make a set of N observations, then the probability for observing that set of observations is given by the product of the individual probability functions Pi(µ’):

$$P(\mu') = \prod_{i=1}^{N} P_i(\mu') = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{N} \exp\left[-\frac{1}{2}\sum_{i=1}^{N}\left(\frac{x_i-\mu'}{\sigma}\right)^2\right] \quad (25)$$

Now, what we would like is for the product probability function to be a maximum, i.e., that we have chosen a µ’ such that we are most likely to see our set of observations. Thus, we need to maximize (25). It turns out that maximizing (25) is the same as maximizing the argument of the exponential (that is, minimizing the sum of squares), which we will call X:

$$X = -\frac{1}{2}\sum\left(\frac{x_i-\mu'}{\sigma}\right)^2 \quad (26)$$

$$\frac{dX}{d\mu'} = -\frac{1}{2}\frac{d}{d\mu'}\sum\left(\frac{x_i-\mu'}{\sigma}\right)^2 = \sum\frac{x_i-\mu'}{\sigma^2} = 0 \quad \text{(at the maximum)} \quad (27)$$

which, because σ is a constant, gives

$$\mu' = \bar{x} \equiv \frac{1}{N}\sum x_i \quad (28)$$

and, of course, we generate the answer that we should have. This is the method of Least Squares, and it is extremely powerful. We will explore this some more.

END LECTURE TWO

What is the uncertainty σ? Assume all data points are from the same parent distribution, so all have the same uncertainty σ. Recall the propagation of error:

$$\sigma_y^2 = \sigma_a^2\left(\frac{\partial y}{\partial a}\right)^2 + \sigma_b^2\left(\frac{\partial y}{\partial b}\right)^2 + \cdots \quad (29)$$

where the variance in each data point xi is weighted by the square of the effect ∂µ’/∂xi that the data point has on the result.
If all σi = σ, then substituting µ’ for y in (29), with the xi as the variables:

$$\frac{\partial \mu'}{\partial x_i} = \frac{\partial}{\partial x_i}\left(\frac{1}{N}\sum x_i\right) = \frac{1}{N} \quad (30)$$

Substituting (30) into (29), we get

$$\sigma_\mu^2 = \sum_{i=1}^{N} \sigma_i^2\left(\frac{1}{N}\right)^2 = N\sigma^2\,\frac{1}{N^2} = \frac{\sigma^2}{N} \quad (31)$$

Recalling our argument that the mean is not an independent variable, σ itself is estimated from the data as

$$\sigma \simeq \sqrt{\frac{1}{N-1}\sum\left(x_i-\bar{x}\right)^2} \quad (32)$$

Example 4.1: Let’s return to the box. Assume L = µL = 20.000 cm (known value). The student, after about 100 measurements, now has a data set from which to determine L. By considering the box, the ruler, and the student’s own nearsightedness, the student determines that each measurement is good to about +/- 0.5 cm, and has the following set of measurements:

[Figure: histogram of the 100 length measurements (Series1); x-axis from 18.5 to 22 cm, counts from 0 to 18.]

σi = σ = 0.5 cm

Assume, for the moment, that the student calculates from the data µ = 19.942 cm. If σ = 0.522 cm, as calculated from (32), then

$$\sigma_\mu = \frac{\sigma}{\sqrt{N}} = \frac{0.522}{\sqrt{100}} = 0.0522$$

so the student reports 19.94 +/- 0.05 cm, to within a 1σ confidence level.

Now, with many more measurements, say 10^4, the student has generated the following set of data.

[Figure: histogram of the 10^4 length measurements (Series1); x-axis from 18.5 to 22 cm.]

Obviously, the data is much cleaner, and the uncertainty of the new data can be readily calculated to be √(10^4/10^2) = 10 times better than it was previously, so that now σµ = 0.005 cm. At this point, other sources of error are probably important. Absolute and relative calibrations may not be good. For example, the meter stick may have a certain amount of absolute error. Perhaps, during the course of the 48 hour period in which the student took the huge number of measurements, the temperature in the room rose and fell, thus causing the box to slightly expand and contract. All sorts of things can happen during this time period. As a matter of course, if you want to get a better measurement, taking more data points using the same experimental approach is often not the best way to do things.
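The square-root scaling in Example 4.1 can be made concrete with a few lines of code. This is a minimal sketch of σµ = σ/√N; the function names are illustrative and not part of the notes.

```python
import math

def standard_error(sigma, n):
    """Uncertainty in the mean of n measurements that each have uncertainty sigma."""
    return sigma / math.sqrt(n)

def measurements_needed(sigma, target):
    """Smallest n for which sigma / sqrt(n) is no larger than target."""
    return math.ceil((sigma / target) ** 2)

# Example 4.1: sigma = 0.522 cm with 100 and then 10^4 measurements.
for n in (100, 10_000):
    print(f"N = {n:>6}: sigma_mu = {standard_error(0.522, n):.4f} cm")

# Inverse question: how many measurements bring sigma_mu down to 0.01 cm?
print(measurements_needed(0.522, 0.01))
```

A hundredfold increase in the number of measurements buys only a tenfold reduction in the uncertainty, which is why a better experimental approach is often the more efficient route.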
Imagine that we are trying to count photon events, and we take a video camera to record the events. The camera may not be very sensitive, and so we may only be able to see the photon events if several photons arrive simultaneously. We can sit for a long time, and slowly build up statistics. Or, perhaps a better way would be to replace the TV camera with a much more sensitive detector. In this way, we can build up equally good statistics in a much shorter time. The statistical uncertainty in our data is going to depend linearly on how good the detector is, but only on the square root of the amount of time that we sample. This is a subtle, but important point. If you can imagine a way to make your signal-to-noise improve in a way that is faster than time-averaging, then do it. A better experimental approach is often the way to go.

Nonuniform uncertainties

Suppose that the student takes the advice from the above paragraph, and, after measuring the box many times, decides to do the measurement differently. This second set of measurements is likely to be characterized by a different uncertainty than the first set. How can the student account for this? Obviously, the student would like to use all of the data that has been measured. In this case, each data point is weighted by its own uncertainty.

The expected value of hk(xj) is y(xj) = NP(xj). This means that if we measure some value xj a certain fraction of the time (out of 100 measurements), then we would expect that fraction to be equal to the frequency of xj that is predicted by the parent probability distribution. Obviously we will not always get that frequency. As a matter of fact, we are only likely to get hk(xj) = NP(xj) a relatively small fraction of the time, just like we are only likely to measure xj = µ a small fraction of the time. Thus, each individual histogram bar, or measurement of hk(xj), has associated with it a mean and a standard deviation.
Since we have already binned our data into a histogram, and since this means that there are only certain possibilities for xj (i.e., xj is discrete, not continuous), the statistics that describe hk(xj) are going to be Poisson statistics, even though P(xj) may well be a Gaussian distribution function. Thus, we call the mean of the 10 measurements of hk(xj) µj, and the standard deviation is (according to Poisson statistics) given by:

$$\sigma_j(h) = \sqrt{\mu_j} \quad (39)$$

Then, with those definitions, we can calculate χ²:

$$\chi^2 = \sum_{j=1}^{n} \frac{\left[h(x_j) - NP(x_j)\right]^2}{\sigma_j(h)^2} \quad (41)$$

This definition of χ² implies that χ² is a statistic that characterizes the dispersion of the observed frequencies from the expected frequencies. The numerator is a measurement of the spread of the observations, while the denominator is a measurement of the expected spread. Thus, we might expect that in the case of good agreement, the actual spread over the expected spread should be about equal to 1 for each bin, and that the optimum value of χ² would be n, the number of bins in our previous plots. This is almost true. If each measurement were to reproduce the predicted probability distribution exactly, then χ² would equal 0. However, we recognize from our probability discussions that this is not likely to be the case. Instead, the expectation value for χ² is:

$$\langle\chi^2\rangle = \nu = n - n_c \quad (42)$$

In (42), ν is the number of degrees of freedom, and is equal to the number (n) of sample frequencies (in the graph on the previous page, n is equal to 12) minus nc, which is the number of constraints. A constraint is a parameter that has been calculated from the data to describe the probability function, NP(xj). Even if P(xj) is chosen completely independently of the sample distribution, it is still normalized to the total number of events in the distribution, so that the expectation value of χ² must, at best, be ⟨χ²⟩ = n - 1.
Usually χ² is given as the reduced chi-square, which is

$$\chi_\nu^2 \equiv \frac{\chi^2}{\nu}$$

which has an expectation value of ⟨χν²⟩ = 1. Values that are much larger than 1 result from large deviations from the assumed distribution, and possibly indicate an incorrect choice of the probability distribution. Values that are much smaller than 1 also indicate that something is wrong in the nature of the experiment.

Problem: Assume the following data/histogram. The first column of numbers corresponds to length measurements, the second column corresponds to frequency. If the parent distribution is Gaussian with µ = 20.00 and σ = 0.5, then what are µ(sample) and σ(sample)? What is χ²? Plot the histogram with a curve of the parent distribution NP(x).

length    frequency
18.7      1
18.9      3
19.1      4
19.3      7
19.5      13
19.7      14
19.9      11
20.1      12
20.3      16
20.5      11
20.7      4
20.9      1
21.1      1
21.3      2

Least Squares fit of a Straight Line

One of the most important and commonly used statistical tools is linear regression to a straight line. Fortunately, the technique for doing this is quite general, meaning that it is possible to take a set of data and fit it to any particular functional form, not just a straight line. Polynomial fits, exponential fits, fits to Gaussian distributions, etc. are all possible with the technique of least squares. We have already covered this a little bit, so you should be at least a little familiar with the technique. In this section, we use the technique for fitting to a straight line, and we also try to point out where the technique is general for any functional form.

Assume that you have measured a set of data points (xi, yi). Define ao and bo such that yo(x) = bo x + ao. Each yi is drawn from a Gaussian parent distribution, with µ = yo(xi), and σ = σi.
The probability of observing some particular value yi is given by:

$$P_i = \frac{1}{\sigma_i\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{y_i - y_o(x_i)}{\sigma_i}\right)^2\right] \quad (43)$$

[Figure: histogram of the length measurements (Series1); x-axis from 18.7 to 21.1, counts from 0 to 16.]

where

$$\Delta = \begin{vmatrix} \sum \dfrac{1}{\sigma_i^2} & \sum \dfrac{x_i}{\sigma_i^2} \\ \sum \dfrac{x_i}{\sigma_i^2} & \sum \dfrac{x_i^2}{\sigma_i^2} \end{vmatrix} = \sum\frac{1}{\sigma_i^2}\sum\frac{x_i^2}{\sigma_i^2} - \left(\sum\frac{x_i}{\sigma_i^2}\right)^2 \quad (58)$$

If the equation that we were trying to fit the data to was the second order polynomial function y(x) = ax² + bx + c, then we would have ended up with 3 X 3 determinants, one each for a, b, and c. Thus, as one goes to more and more complicated equations, the problem gets a little messier. Fortunately, most statistical analysis programs use a single generic routine that sets up N X N determinants for a problem of N variables, and solves for the answers algebraically.

Problems:

1. Derive a formula for making a linear fit to data with an intercept at the origin so that y = bx. Apply your method to fit a straight line through the origin to the following coordinate pairs. Assume uniform uncertainties σ = 1.5 for the yi’s. Find χ² for the fit, and the uncertainty in b.

xi    2     4      6      8      10     12     14     16     18     20     22     24
yi    5.3   14.4   20.7   30.1   35.0   41.3   52.7   55.7   63.0   72.1   80.5   87.9

2. Find by numerical integration the probability of observing a value from the Gaussian distribution that is:
more than 1 standard deviation from the mean
more than 2 standard deviations from the mean
more than 3 standard deviations from the mean

3. After measuring the speed of sound several times, a student concludes that the standard deviation of his measurements is σ = 12 m/s. Assume that the uncertainties are random, and that the experiment is not limited by systematic effects, and determine how many measurements would be required to give a final uncertainty in the mean of +/- 2.0 m/s.

4. Find the uncertainty σx in x as a function of the uncertainties σu and σv in u and v for the following functions:
a. x = ½ (u+v)
b. x = uv²
c. x = u^(-2)
d. x = uv²
e. x = u² + v²

5. If the diameter of a round table is determined to within 1%, how well is its area known? Would it be better to determine its radius to within 1%?

6. Snell’s law relates the angle of refraction θ2 of a light ray travelling in a medium of index of refraction n2 to the angle of incidence θ1 of a ray travelling in a medium of index n1 through the equation n2 sinθ2 = n1 sinθ1. Find n2 and its uncertainty from the following measurements:
θ1 = (22.03 +/- 0.2)°
θ2 = (14.45 +/- 0.2)°
n1 = 1.000
Assume that there is no uncertainty in n1.

Problem # 1.

Problem: Assume the following data/histogram. The first column of numbers corresponds to length measurements, the second column corresponds to frequency. If the parent distribution is Gaussian with µ = 20.00 and σ = 0.5, then what are µ(sample) and σ(sample)? What is χ²? Plot the histogram with a curve of the parent distribution NP(x).

length    frequency
18.7      1
18.9      3
19.1      4
19.3      7
19.5      13
19.7      14
19.9      11
20.1      12
20.3      16
20.5      11
20.7      4
20.9      1
21.1      1
21.3      2

The first thing to do here is to calculate µ and σ of the experimental data:

µ = 19.94; σ² = 0.279; σ = 0.528

From this, it is now possible to calculate an expected distribution, and compare that with the parent. Taking the values of µ(parent) = 20.00 and σ(parent) = 0.5, we can calculate P(xj) for xj = 18.7, 18.9, 19.1, etc. Simply plug µ(parent) and σ(parent) into the formula for a Gaussian distribution. Sum up the Gaussian probabilities for the individual histogrammed cells, and you will get a sum of about 4.97, i.e.,

$$\sum_{x_j=18.7}^{21.3} P(x_j) = 4.97$$

Then, multiply P(xj) by (100/4.97) to normalize to the 100 measurements that are listed in the histogram, and this yields NP(xj). Now, do the same thing for the experimental probability distribution, i.e., take µ = 19.94 and σ = 0.528, plug those numbers into a Gaussian probability distribution, and calculate experim.(xj).
Now, at each xj, calculate a µj, which is simply experim.(xj; σ = 0.528, µ = 19.94) evaluated at 18.7, 18.9, etc. According to Poisson statistics, σj is then the square root of µj. Now you have a table of values xj, h(xj) - NP(xj), and µj. Calculate

$$\chi^2 = \sum_j \frac{\left[h(x_j) - NP(x_j)\right]^2}{\sigma_j^2} = 12.23$$

From 12.23, we can calculate χν² by dividing χ² by (N-1). Since the histogram has 14 bins, N-1 is 13, and χν² = 0.94.

[Figure: histogram of the length measurements (Series1); x-axis from 18.7 to 21.1, counts from 0 to 16.]

Problem 2.

Derive a formula for making a linear fit to data with an intercept at the origin so that y = bx. Apply your method to fit a straight line through the origin to the following coordinate pairs. Assume uniform uncertainties σ = 1.5 for the yi’s. Find χ² for the fit, and the uncertainty in b.

xi    2     4      6      8      10     12     14     16     18     20     22     24
yi    5.3   14.4   20.7   30.1   35.0   41.3   52.7   55.7   63.0   72.1   80.5   87.9

Here, we just go back to the discussion of the least squares fit of a straight line. Recall equation (45):

$$\chi^2 = \sum\left[\frac{y_i - y(x_i)}{\sigma_i}\right]^2 = \sum\left[\frac{1}{\sigma_i}\left(y_i - a - bx_i\right)\right]^2$$

We use the same equation, except that we set a = 0:

$$\chi^2 = \sum\left[\frac{y_i - y(x_i)}{\sigma_i}\right]^2 = \sum\left[\frac{1}{\sigma_i}\left(y_i - bx_i\right)\right]^2$$

and we minimize with respect to b:

$$\frac{\partial\chi^2}{\partial b} = \frac{\partial}{\partial b}\sum\frac{1}{\sigma_i^2}\left(y_i - bx_i\right)^2 = -2\sum\frac{x_i}{\sigma_i^2}\left(y_i - bx_i\right) = 0$$

Setting all uncertainties to 1.5, the common factor 1/σ² cancels, and we can then solve for b:

$$\sum x_i y_i = b\sum x_i^2$$

When we do this, we solve for b and get

x       y       xiyi      xixi    bxi
2       5.3     10.6      4       7.2
4       14.4    57.6      16      14.4
6       20.7    124.2     36      21.6
8       30.1    240.8     64      28.8
10      35      350       100     36
12      41.3    495.6     144     43.2
14      52.7    737.8     196     50.4
16      55.7    891.2     256     57.6
18      63      1134      324     64.8
20      72.1    1442      400     72
22      80.5    1771      484     79.2
24      87.9    2109.6    576     86.4
sums            9364.4    2600

b = 3.601692

2. Integrate the equations.

3. To take σ from 12 m/s to 2 m/s, one has to do (12/2)², or a factor of 36 times more experiments.

4. Easy partial derivatives.

5. Since A = πd²/4 = πr², the relative uncertainty in the area is twice the relative uncertainty in the diameter (or in the radius). A diameter known to within 1% gives an area known to within about 2%, and a radius known to within 1% gives the same result.
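The through-origin fit of Problem 2 can be checked numerically. The sketch below reproduces the sums in the table and the slope b. The expression used for the uncertainty in b, σb = σ/√(Σxi²), is not given in these notes: it is an assumption obtained by applying the propagation of error formula (29) to b = Σxiyi/Σxi² with uniform σ. Taking ν = N - 1 (one fitted parameter) is likewise an assumption of this sketch.

```python
import math

# Coordinate pairs and uniform uncertainty from the problem statement.
x = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
y = [5.3, 14.4, 20.7, 30.1, 35.0, 41.3, 52.7, 55.7, 63.0, 72.1, 80.5, 87.9]
sigma = 1.5

# Least-squares slope for y = b x: sum(x_i y_i) = b sum(x_i^2).
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_xx = sum(xi * xi for xi in x)
b = sum_xy / sum_xx

# Assumed uncertainty in b, from propagating sigma through b = sum_xy / sum_xx.
sigma_b = sigma / math.sqrt(sum_xx)

# Chi-square of the fit and its reduced value (one fitted parameter).
chi2 = sum(((yi - b * xi) / sigma) ** 2 for xi, yi in zip(x, y))
dof = len(x) - 1
reduced_chi2 = chi2 / dof

print(f"sum xy = {sum_xy:.1f}, sum xx = {sum_xx}")
print(f"b = {b:.6f} +/- {sigma_b:.4f}")
print(f"chi2 = {chi2:.2f}, reduced chi2 = {reduced_chi2:.2f}")
```

A reduced χ² close to 1 would indicate that the scatter of the points about the fitted line is consistent with the quoted uncertainty of 1.5.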