Error Analysis & Linear Regression in Physics: Determining Speed-Time Linear Relationship, Lecture notes of Law

An in-depth analysis of error bars and their usage in measurements. It includes the determination of the linear relationship between speed and time using the method of least squares and linear regression. The document also covers the concept of power laws and their representation on log-log paper.

Typology: Lecture notes

2021/2022

Uploaded on 07/05/2022

allan.dev
allan.dev 🇦🇺



GRAPHICAL ANALYSIS

A purpose of many experiments is to find the relationship between measured variables. A good way to accomplish this task is to plot a graph of the data and then analyze the graph. These guidelines should be followed in plotting your data:

  1. Use a sharp pencil or pen. A broad-tipped pencil or pen will introduce unnecessary inaccuracies.
  2. Draw your graph on a full page of graph paper. A compressed graph will reduce the accuracy of your graphical analysis.
  3. Give the graph a concise title.
  4. The dependent variable should be plotted along the vertical (y) axis and the independent variable should be plotted along the horizontal (x) axis.
  5. Label axes and include units.
  6. Select a scale for each axis and start each axis at zero, if possible.
  7. Use error bars to indicate errors in measurements, for example, a data point drawn with a bar that spans its error range.
  8. Draw a smooth curve through the data points. If the errors are random, then about one-third of the points will not lie within their error range of the best curve.
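The "about one-third" figure in the last guideline can be checked with a short calculation. The sketch below assumes (the text does not say so explicitly) that each error bar spans one standard deviation on either side of the point and that the errors follow a Gauss distribution; the expected fraction of points that miss the best curve is then the two-sided tail probability beyond one standard deviation.

```python
import math

# Assumption: error bars are +/- 1 standard deviation and errors are Gaussian.
# P(|z| > 1) for a standard normal variable, via the complementary error function.
fraction_outside = math.erfc(1.0 / math.sqrt(2.0))

print(f"expected fraction outside the error range: {fraction_outside:.3f}")
```

Under these assumptions the fraction is close to 0.32, which is consistent with the one-third rule of thumb.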

The microcomputer is a powerful tool for data analysis. Commercial software is available that handles data and instructs the microcomputer to carry out graphical analysis. See your instructor about the availability of this software for your laboratory.

As an example, consider the study of the speed of an object (dependent variable) as a function of time (independent variable). The data are as follows:

Speed (m/s)      Time (s)
0.45 ± 0.06      1
0.81 ± 0.06      2
0.91 ± 0.06      3
1.01 ± 0.06      4
1.36 ± 0.06      5
1.56 ± 0.06      6
1.65 ± 0.06      7
1.85 ± 0.06      8
2.17 ± 0.06      9

Using the above guidelines, the data are graphed in Figure 1.7.

FIGURE 1.7 Speed versus time. The graphed data, v versus t, show a linear relation.

The graphed data show that the speed $v$ is a linear function of the time $t$. The general equation for a straight line is

$$ y = mx + b \tag{22} $$

where $m$ is the slope of the line and $b$, the vertical intercept, is the value of $y$ when $x = 0$. Let $v = y$, $x = t$, $a = m$, and $v_0 = b$; then,

$$ v = at + v_0 \quad \text{(m/s)} \tag{23} $$

This is the form of the equation for the line drawn through the data, where $v_0$ is the value of the velocity at $t = 0$ and $a$ is the slope of the line, which is the acceleration of the object. From the graph we see that $v_0 = 0.32$ m/s. To determine the slope, select two points on the line (not data points) that are well separated; then

$$ a = \text{slope} = \frac{\Delta v}{\Delta t} = \frac{(2.35 - 0.40)\ \text{m/s}}{(10.0 - 0.5)\ \text{s}} = \frac{1.95\ \text{m/s}}{9.5\ \text{s}} = 0.20\ \text{m/s}^2 \tag{24} $$

The equation for the line is

$$ v = 0.20\,t + 0.32 \quad \text{(m/s)} \tag{25} $$
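The slope calculation of equation 24 can be repeated in a few lines. This is a minimal sketch that uses the two points read off the drawn line in the text; the function name `speed` is a hypothetical helper introduced only for illustration.

```python
# Two well-separated points read off the drawn line (not data points),
# as used in equation 24: (t, v) = (0.5 s, 0.40 m/s) and (10.0 s, 2.35 m/s).
t1, v1 = 0.5, 0.40
t2, v2 = 10.0, 2.35

slope = (v2 - v1) / (t2 - t1)   # acceleration a, in m/s^2
intercept = 0.32                # v0 read from the graph at t = 0, in m/s

def speed(t):
    """Speed in m/s predicted by the straight line v = a*t + v0."""
    return slope * t + intercept

print(f"a = {slope:.3f} m/s^2")
print(f"v(5 s) = {speed(5.0):.2f} m/s")
```

The computed slope is about 0.205 m/s², which the text quotes to two figures as 0.20.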

The data plotted in Figure 1.7 are analyzed in the section on "Curve Fitting," page 23, as an example of linear regression. As a second example, let us consider the study of the distance traveled by an object as a function of time. The data are as follows:

Distance (m)     Time (s)
0.20 ± 0.05      1
0.43 ± 0.05      2
0.81 ± 0.05      3
1.57 ± 0.10      4
2.43 ± 0.10      5
3.81 ± 0.10      6
4.80 ± 0.20      7
6.39 ± 0.20      8

The data are graphed, using the above guidelines, in Figure 1.8. In this instance a straight line through the data points would not be acceptable. An inspection of the graph suggests that $d$ is proportional to $t^n$, where $n > 1$; for example, $d$ may be a quadratic function of time and, hence, $n = 2$. Suppose that we know the theoretical relation between $d$ and $t$ is

$$ d = \tfrac{1}{2} a t^2 \quad \text{(m)} \tag{26} $$

where $a$ is the object's acceleration. Often it is useful to know if the data agree with the theory. If the data follow the above theoretical relation, then a graph of $d$ versus $t^2$ should result in a straight line.
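The claim that $d$ versus $t^2$ is straighter than $d$ versus $t$ can be tested numerically on the tabulated data. The sketch below uses the Pearson correlation coefficient as a rough measure of straightness; that choice, and the helper name `correlation`, are assumptions made for illustration and are not part of the text's graphical method.

```python
import math

# Distance-time data from the second example (uncertainties omitted here).
t = [1, 2, 3, 4, 5, 6, 7, 8]                            # s
d = [0.20, 0.43, 0.81, 1.57, 2.43, 3.81, 4.80, 6.39]    # m

def correlation(x, y):
    """Pearson correlation coefficient; values near 1 indicate a straight line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

t_squared = [ti ** 2 for ti in t]

r_linear = correlation(t, d)             # d plotted against t
r_quadratic = correlation(t_squared, d)  # d plotted against t^2

print(f"r(d, t)   = {r_linear:.4f}")
print(f"r(d, t^2) = {r_quadratic:.4f}")
```

A coefficient noticeably closer to 1 for the $t^2$ plot supports the quadratic relation, although it does not replace inspecting the graph with its error bars.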

FIGURE 1.8 Distance versus time. The graphed data, d versus t, show a nonlinear relation.

FIGURE 1.12 Light intensity versus sample thickness x (cm). The linear relation obtained on semilog paper shows that the data obey Lambert's law.

Again, the general equation of a straight line is of the form

$$ y = mx + b \tag{30} $$

Now let $y = \log I$, $m = -0.434\mu$, and $b = \log I_0$. Then, if $\log I$ is plotted vertically and $x$ is plotted horizontally, the curve will be a straight line with slope $-0.434\mu$ and vertical intercept $\log I_0$. Using semilog paper, $I$ is plotted on the logarithmic axis; the vertical intercept on this axis is $I_0$. Note that the slope of the line drawn through the data points may be used to calculate $\mu$:

$$ \text{slope} = \frac{\Delta(\log I)}{\Delta x} = \frac{\log 10 - \log 100}{(3.80 - 0.40)\ \text{cm}} = -0.294\ \text{cm}^{-1} \tag{31} $$

From Lambert's law the theoretical slope is

$$ \text{slope} = -0.434\mu $$

By equating theoretical and experimental slopes, we find that

$$ -0.434\mu = -0.294\ \text{cm}^{-1} $$

and

$$ \mu = +0.678\ \text{cm}^{-1} $$
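The arithmetic of equation 31 and the step that follows it can be reproduced directly. This sketch assumes Lambert's law has the form $I = I_0 e^{-\mu x}$ (the page stating it is not part of this copy), so that the slope on a base-10 logarithmic axis is $-\mu \log_{10} e \approx -0.434\mu$. The variable name `mu` stands for the coefficient whose symbol is garbled in the source.

```python
import math

# Two points read off the line in the semilog plot (values from equation 31).
x1, I1 = 0.40, 100.0   # thickness in cm, intensity in arbitrary units
x2, I2 = 3.80, 10.0

slope = (math.log10(I2) - math.log10(I1)) / (x2 - x1)   # cm^-1

# log10(I) = log10(I0) - log10(e) * mu * x, and log10(e) is about 0.434.
mu = -slope / math.log10(math.e)                         # cm^-1

print(f"slope = {slope:.3f} cm^-1")
print(f"mu    = {mu:.3f} cm^-1")
```

Small differences in the last digit relative to the text come from using the unrounded value of $\log_{10} e$.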

EXERCISE 2

Suppose the functional relation between the dependent variable $y$ and the independent variable $x$ is given by

[equation not legible in the source]

where $a$ and $b$ are nonzero constants. Explain why a graph of $y$ versus $x$ on semilog paper would not give a straight line.

PLOTTING DATA ON LOG-LOG PAPER

Log-log paper is used to obtain a straight-line plot when $y$ and $x$ satisfy a power-law relation:

$$ y = c x^n \tag{32} $$

where $c$ and $n$ are constants. For example, the semimajor axis $R$ of the orbit of a planet is related to its period $T$ (the time for one revolution around the sun):

$$ R = K^{1/3} T^{2/3} \tag{33} $$

where $K$ is a constant. $R$ is nonlinearly related to $T$.

FIGURE 1.13 Planets: Semimajor axis R versus period T (years). The linear relation on log-log paper indicates R and T obey a power law of the form of equation 32.

A straight-line plot is obtained in the following way. Take logarithms:

$$ \begin{aligned} \log R &= \log\left(K^{1/3} T^{2/3}\right) \\ &= \log T^{2/3} + \log K^{1/3} \\ &= \tfrac{2}{3} \log T + \log K^{1/3} \end{aligned} \tag{34} $$

Let $y = \log R$, $x = \log T$, $m = \tfrac{2}{3}$, and $b = \log K^{1/3}$. Then a plot of $\log R$ versus $\log T$ would be a straight line. Log-log graph paper automatically takes the logarithm of the plotted data. A log-log graph is shown in Figure 1.13.

The units used are years and astronomical units (AU), where 1 AU is the semimajor axis of earth's orbit. (The errors shown in the graph are fictitious.) The slope of the log-log plot is

$$ \text{slope} = \frac{\Delta(\log R)}{\Delta(\log T)} = \frac{\log 10^2 - \log 10^0}{\log 10^3 - \log 10^0} = \frac{2 - 0}{3 - 0} = \frac{2}{3} \tag{35} $$

Note that the slope of the log-log plot is the exponent of the power-law relation. For example, the power-law relation $y = cx^n$ plotted on log-log paper has a slope equal to $n$. Hence, a log-log plot is a good way to determine the exponent in a power-law relation.
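The statement that the log-log slope equals the exponent can be illustrated with a few lines of code. The sketch below uses synthetic values, not planetary measurements: the periods are arbitrary, and $R$ is generated from $R = T^{2/3}$, which corresponds to $K = 1$ when $T$ is in years and $R$ is in AU.

```python
import math

# Synthetic data (not measurements): periods T in years, with R = T**(2/3) in AU,
# i.e. the power law with K = 1 in these units.
T = [0.1, 1.0, 10.0, 100.0, 1000.0]
R = [Ti ** (2.0 / 3.0) for Ti in T]

# Taking logarithms by hand does what log-log paper does graphically.
log_T = [math.log10(Ti) for Ti in T]
log_R = [math.log10(Ri) for Ri in R]

# Slope between the first and last points of the log-log line.
exponent = (log_R[-1] - log_R[0]) / (log_T[-1] - log_T[0])

print(f"estimated exponent n = {exponent:.4f}")
```

With real data the points scatter about the line, so the slope would be taken from a line drawn through all the points instead of from two of them.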

Another way to obtain a straight-line plot is to plot $y$ versus $x^n$, or $R$ versus $T^{2/3}$, on regular graph paper (see Figure 1.14).

A problem with plotting $R$ versus $T^{2/3}$ is that values of $R$ less than about 1 AU cannot be plotted with much accuracy.

In units of years and astronomical units the constant $K$ is one, and an inspection of the curve in the figure shows a slope of approximately one.

FIGURE 1.14 Planets: R versus $T^{2/3}$ (years$^{2/3}$), showing a linear relation. This graph requires knowing the exponent in the power-law relation.

If we minimize the exponent in equation 40, then $P(x_1, \ldots, x_n)$ will be a maximum. The sum in the exponent is called the least-squares sum, and minimizing it is equivalent to minimizing $\sum (x - x_i)^2$, since $\sigma$ is (assumed) a constant.

Note: We assume the data points follow the Gauss distribution, and the method of least squares is used to find the most probable value.

METHOD OF LEAST SQUARES AND LINEAR REGRESSION

Given $n$ data points $(x_i, y_i)$ (for example, $x_i$ could be the time and $y_i$ the average speed of a falling object), we would like to find the equation for the best straight line. Typical data points $(x_i, y_i)$ and the equation of the line, which we want to determine, are shown in Figure 1.15. We make the following assumptions:

  1. The measured values $(x_i, y_i)$ are distributed according to the Gauss distribution (this is usually so if the errors are random).
  2. The errors in $x_i$, $\delta x_i$, are negligible in comparison to the errors in $y_i$, $\delta y_i$ (then we only consider the distribution of the values $y_i$).
  3. The errors in $y$ are all the same: $\delta y_1 = \delta y_2 = \cdots = \delta y_n$ (then the standard deviation $\sigma_y$ is constant).

We approximate the set of $n$ measurements $(x_i, y_i)$ by a linear relation:

$$ y(x) = a_0 + a_1 x \tag{42} $$

FIGURE 1.15 Minimizing the least-squares sum gives the equation for the best straight line.

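The probability of obtaining the observed value $y_i$ is

$$ P(y_i) \propto \frac{1}{\sigma_y} \exp\left[-\frac{\left(y_i - y(x_i)\right)^2}{2\sigma_y^2}\right] \tag{43} $$

where

$$ y(x_i) = \text{best estimate for } y_i = a_0 + a_1 x_i \tag{44} $$

and $\sigma_y$ is the theoretical standard deviation. The probability $P(y_1, \ldots, y_n)$ of obtaining the set of measurements is

$$ P(y_1, \ldots, y_n) = P(y_1) P(y_2) \cdots P(y_n) \propto \frac{1}{(\sigma_y)^n} \exp\left[-\sum_{i=1}^{n} \frac{(y_i - a_0 - a_1 x_i)^2}{2\sigma_y^2}\right] \tag{45} $$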

We want this probability to be a maximum; hence, the exponent (least-squares sum) must be a minimum. Minimizing the least-squares sum gives the equation for the best straight line.

In Figure 1.15, $d_i$ is the vertical distance from each point $(x_i, y_i)$ to the line $y = a_0 + a_1 x$. We wish to find values of $a_0$ and $a_1$ such that we minimize the function $M(a_0, a_1)$ defined to be

$$ M(a_0, a_1) = \sum_{i=1}^{n} \frac{(y_i - a_0 - a_1 x_i)^2}{2\sigma_y^2} \tag{46} $$

which is the exponent in equation 45. Expanding the squared term and ignoring the (assumed) constant $\sigma_y$, we find that

$$ M = \sum y_i^2 - 2a_1 \sum x_i y_i - 2a_0 \sum y_i + a_1^2 \sum x_i^2 + 2a_0 a_1 \sum x_i + n a_0^2 \tag{47} $$

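where $\sum$ is understood as a sum over the index $i$. Next we set

$$ \frac{\partial M}{\partial a_0} = 0 \quad \text{and} \quad \frac{\partial M}{\partial a_1} = 0 \tag{48} $$

to find $a_0$ and $a_1$ corresponding to the minimum value of $M$. This results in two simultaneous equations:

$$ \begin{aligned} \frac{\partial M}{\partial a_0} &= -2 \sum y_i + 2a_1 \sum x_i + 2n a_0 = 0 \\ \frac{\partial M}{\partial a_1} &= -2 \sum x_i y_i + 2a_1 \sum x_i^2 + 2a_0 \sum x_i = 0 \end{aligned} \tag{49} $$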

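which when solved for $a_0$ (intercept) and $a_1$ (slope) yield

$$ a_0 = \frac{\left(\sum x_i^2\right) \sum y_i - \left(\sum x_i\right)\left(\sum x_i y_i\right)}{n \sum x_i^2 - \left(\sum x_i\right)^2} \tag{50} $$

$$ a_1 = \frac{n \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n \sum x_i^2 - \left(\sum x_i\right)^2} \tag{51} $$

The equation for the best-fitting line is obtained by substituting equations 50 and 51 into equation 42.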
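Equations 50 and 51 translate directly into code. The following is a minimal sketch applied to the speed-time data of the first example; the function name `least_squares_line` is a hypothetical helper, and the measurement uncertainties are ignored because they are assumed equal for all points.

```python
# Speed-time data from the first example.
t = [1, 2, 3, 4, 5, 6, 7, 8, 9]                               # s
v = [0.45, 0.81, 0.91, 1.01, 1.36, 1.56, 1.65, 1.85, 2.17]    # m/s

def least_squares_line(x, y):
    """Return (a0, a1) for the line y = a0 + a1*x using equations 50 and 51."""
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    denom = n * sxx - sx ** 2
    a0 = (sxx * sy - sx * sxy) / denom   # intercept, equation 50
    a1 = (n * sxy - sx * sy) / denom     # slope, equation 51
    return a0, a1

a0, a1 = least_squares_line(t, v)
print(f"intercept a0 = {a0:.3f} m/s, slope a1 = {a1:.3f} m/s^2")
```

For these data the fit gives an intercept near 0.31 m/s and a slope near 0.20 m/s², close to the values 0.32 m/s and 0.20 m/s² read from the graph.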

We ask this question: "What are the uncertainties in $a_0$ and $a_1$?" Each $y_i$ has an uncertainty (assumed the same for all $y_i$) and, hence, $a_0$ and $a_1$ will both have uncertainties. These uncertainties are the standard deviations of the means, $s_{ma_0}$ and $s_{ma_1}$. To calculate $s_{ma_0}$ and $s_{ma_1}$, we need the standard deviation $s_y$.

We ask the question: "What is the statistical uncertainty in the measurements $y_1, y_2, \ldots, y_n$?" In this case the standard deviation $s_y$ is

$$ s_y = \sqrt{\frac{\sum (y_i - a_0 - a_1 x_i)^2}{n - 2}} \tag{52} $$

The standard deviation of the mean $s_{my}$ is

[equation 53 not legible in the source]

For each $y_i$ the result to be reported is

[expression not legible in the source], $i = 1, 2, \ldots, n$ (54)

The reason for the factor of $n - 2$ in the denominator of equation 52 is that the calculation of $a_0$ and $a_1$ reduces the number of independent data points $(x_i, y_i)$ from $n$ to $n - 2$; the denominator in the equation for the standard deviation is the number of independent data points.
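As an illustration of the $n - 2$ denominator, the sketch below computes the scatter of the speed-time data about their least-squares line. It assumes equation 52 is the root of the summed squared residuals divided by $n - 2$, which is the standard form but is not legible in this copy.

```python
import math

# Speed-time data from the first example.
t = [1, 2, 3, 4, 5, 6, 7, 8, 9]                               # s
v = [0.45, 0.81, 0.91, 1.01, 1.36, 1.56, 1.65, 1.85, 2.17]    # m/s

n = len(t)
sx, sy = sum(t), sum(v)
sxx = sum(x * x for x in t)
sxy = sum(x * y for x, y in zip(t, v))
denom = n * sxx - sx ** 2
a0 = (sxx * sy - sx * sxy) / denom   # intercept, equation 50
a1 = (n * sxy - sx * sy) / denom     # slope, equation 51

# Sum of squared vertical distances from the fitted line.
residual_sum = sum((y - a0 - a1 * x) ** 2 for x, y in zip(t, v))

# Assumed form of equation 52: divide by n - 2, not n, because two
# parameters (a0 and a1) were determined from the same data.
s_y = math.sqrt(residual_sum / (n - 2))

print(f"s_y = {s_y:.3f} m/s")
```

The result, roughly 0.075 m/s, is of the same size as the ±0.06 m/s error quoted for each speed measurement.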

Remark: It is important to check whether the estimated errors, $\delta y_i$, recorded during data taking are consistent with the calculated statistical error $s_{my}$. A standard deviation of the mean $s_{my}$ that is much larger than the estimated errors, $\delta y_i$, would indicate errors that are unaccounted for in the estimates. Experimental errors, $\delta y_i$, that are much larger than $s_{my}$ suggest a too conservative error estimate; that is, the $\delta y_i$ should have been estimated as smaller values.

EXERCISE 3

A physicist plans to calibrate her equipment by determining an average value for some parameter $x$. She does this by measuring four values of $x$ and estimates the error $\delta x$. Suppose that the values of $x \pm \delta x$ are 2.741 ± 0.010, 2.832 ± 0.010, 2.678 ± 0.010, and 2.763 ± 0.010. Calculate the mean, $\bar{x}$, and the standard deviation of the mean, $s_m$. Is her estimated error too large, too small, or reasonable? Explain.

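We now consider the errors in $a_0$ and $a_1$, $s_{ma_0}$ and $s_{ma_1}$. Equations 50 and 51 give $a_0$ and $a_1$ as functions of the measured values $(x_i, y_i)$, where the statistical error for each $y_i$ is given in equation 53. Since $a_0$ and $a_1$ are known functions of $y_i$ and the errors in $y_i$ are known, the errors in $a_0$ and $a_1$ may be determined by error propagation. The basic formula for error propagation, equation 12, may be written as

$$ \delta Q = \sqrt{\sum_{j=1}^{n} \left(\frac{\partial Q}{\partial b_j}\right)^2 (\delta b_j)^2} \tag{55} $$

where the measured values are $b_j \pm \delta b_j$, $j = 1, 2, \ldots, n$, and $\delta Q$ is the error in the calculated quantity $Q(b_1, b_2, \ldots, b_n)$. Replacing $\delta Q$ and $\delta b_j$ with standard deviations of the mean $s_{mQ}$ and $s_{mb_j}$ and squaring, we have

$$ s_{mQ}^2 = \sum_{j=1}^{n} \left(\frac{\partial Q}{\partial b_j}\right)^2 s_{mb_j}^2 \tag{56} $$

Applying equation 56, $s_{ma_0}$ is

$$ s_{ma_0}^2 = \sum_{j=1}^{n} \left(\frac{\partial a_0}{\partial y_j}\right)^2 s_{my}^2 \tag{57} $$

where the partial derivative $\partial a_0 / \partial y_j$ is calculated by using equation 50:

$$ \frac{\partial a_0}{\partial y_j} = \frac{\sum x_i^2 - \left(\sum x_i\right) x_j}{n \sum x_i^2 - \left(\sum x_i\right)^2} \tag{58} $$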
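The partial derivative in equation 58 can be checked against a numerical derivative of equation 50. The sketch below does this for the speed-time data; the function names and the step size `h` are hypothetical choices made for illustration.

```python
def intercept(x, y):
    """Intercept a0 of the least-squares line (equation 50)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return (sxx * sy - sx * sxy) / (n * sxx - sx ** 2)

def d_intercept_dy(x, j):
    """Analytic partial derivative of a0 with respect to y_j (equation 58)."""
    n = len(x)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    return (sxx - sx * x[j]) / (n * sxx - sx ** 2)

# Speed-time data from the first example.
t = [1, 2, 3, 4, 5, 6, 7, 8, 9]                               # s
v = [0.45, 0.81, 0.91, 1.01, 1.36, 1.56, 1.65, 1.85, 2.17]    # m/s

h = 1e-6          # small change applied to one y value at a time
max_diff = 0.0
for j in range(len(v)):
    shifted = list(v)
    shifted[j] += h
    numeric = (intercept(t, shifted) - intercept(t, v)) / h
    max_diff = max(max_diff, abs(numeric - d_intercept_dy(t, j)))

print(f"largest difference between numeric and analytic derivative: {max_diff:.2e}")
```

Because $a_0$ is a linear function of each $y_j$, the two derivatives should agree to within floating-point rounding.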

such that we minimize the function $M$ defined to be

[equation 67 not legible in the source]

Taking the partial derivative of $M$ with respect to $a_k$ and setting it equal to zero yields

[equation 68 not legible in the source]

where $k = 0, 1, 2, \ldots, m$. Equation 68 is a set of $m + 1$ equations in the $m + 1$ variables $a_0, a_1, \ldots, a_m$, which determines the best-fitting curve.

CHI-SQUARE TEST OF m

If a measurement is repeated many times, then the distribution of measured values is expected to follow a theoretical distribution precisely in the limit that the number of measurements approaches infinity. The Gauss and Poisson distributions are two of many theoretical distributions used in physics, corresponding to different kinds of experiments. (The Poisson distribution is discussed in Experiment 6.)

Suppose we have repeated a measurement $n$ times. We ask the question, "How do we determine whether the measurements follow the expected theoretical distribution?" The chi-square, or $\chi^2$, test provides the answer to this question. $\chi^2$ is a number, without units, defined by

$$ \chi^2 = \sum_{k=1}^{m} \frac{(O_k - E_k)^2}{E_k} \tag{69} $$

where $m$ is the number of bins, $O_k$ is the number of observed or measured values in the $k$th bin, and $E_k$ is the number of expected values in the $k$th bin.

The $n$ measured values are divided into bins or ranges of values, where the bins must be chosen so that each bin contains several measured values. By assuming that the measurements follow an expected theoretical distribution, such as the Gauss or Poisson distribution, we can calculate the expected number $E_k$ of measurements in each bin $k$:

$$ E_k = n P_k \tag{70} $$

where $P_k$ is the probability that any measurement falls in bin $k$. Figure 1.16 shows a Gauss distribution with 6 bins and probabilities $P_1$ through $P_6$, where $P_1 = P_6 = 0.02$, $P_2 = P_5 = 0.14$, and $P_3 = P_4 = 0.34$ for the Gauss distribution.

FIGURE 1.16 A Gauss distribution with six bins and probabilities $P_1$ through $P_6$.

The interpretation of $\chi^2$, calculated from equation 69, is as follows:

  1. If $\chi^2 = 0$, then the measured values follow the theoretical distribution exactly.
  2. If $\chi^2 \approx m - c$, then the agreement between the distribution of measured values and the theoretical distribution is good, where $m$ is the number of bins and $c$ is the number of parameters that had to be calculated from the data to compute the expected number $E_k$. In statistical calculations $m - c$ is the number of degrees of freedom.
  3. If $\chi^2 \gg m - c$, then the agreement is bad.

A more precise interpretation of $\chi^2$ is obtained from a table of values of $\chi^2$.

Example: A distance is measured 20 times. The measured values of $x$ (in cm) are given in Table 1.1. The mean value, calculated from equation 1, is $\bar{x} = 16.70$ cm. From equation 2 the standard deviation is $s = 0.16$ cm. To simplify the determination of $P_k$, we choose the bin boundaries at $\bar{x} - s$, $\bar{x}$, and $\bar{x} + s$, giving four bins as shown in Table 1.2. The probability $P_k$ is shown in Figure 1.17, and the expected number $E_k$ is calculated from equation 70 with $n = 20$. If a measured value falls on a bin boundary, then the observed number is determined by allotting 0.5 to each bin. $\chi^2$ is calculated from equation 69, where $m = 4$. The result is $\chi^2 = 0.11$. To calculate $E_k$, two parameters, $\bar{x}$ and $s$, had to be determined from the data. In addition,

[equation 71 not legible in the source]

is a constraint. Hence, $c = 3$ and $m - c = 1$. Since $\chi^2 < 1$, the agreement is good.

The probability obtained from a table of $\chi^2$ values is the probability that, on repeating the series of measurements, larger deviations from the expected values would be observed. In this example the probability, obtained from tables (see reference 1), is between 0.90 and 0.95 that a set of measurements with two degrees of freedom will have $\chi^2 > 0.11$. In other words, if the set of measurements was repeated 100 times, then we would expect that 90 to 95 cases would yield values of $\chi^2$ greater than 0.11. In interpreting the value of $P$ obtained from tables, we may say that if

$$ 0.1 < P < 0.9 \tag{72} $$

then the assumed distribution very probably corresponds to the observed one, while if

$$ P < 0.02 \quad \text{or} \quad P > 0.98 \tag{73} $$

then the assumed distribution is very unlikely.

TABLE 1.1 TWENTY MEASUREMENTS OF THE DISTANCE x

16.7   16.9   16.8   16.7   16.8   16.7   16.
17.0   16.7   16.7   16.9   16.5   16.3   16.
16.8   16.7   16.6   16.4   16.7   16.

FIGURE 1.17 A Gauss distribution with four bins and probabilities $P_1$ through $P_4$.

TABLE 1.2 DIVIDING THE 20 MEASURED VALUES OF x INTO FOUR BINS FOR A χ² CALCULATION

Bin number, k                 1            2                    3                    4
Range of x in each bin        x < x̄ − s    x̄ − s < x < x̄        x̄ < x < x̄ + s        x̄ + s < x
  or                          x < 16.54    16.54 < x < 16.70    16.70 < x < 16.86    16.86 < x
Probability P_k               0.16         0.34                 0.34                 0.16
Expected number E_k = nP_k    3.2          6.8                  6.8                  3.2
Observed number O_k           3            6.5                  7.5                  3
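The $\chi^2$ value quoted in the example can be reproduced from the entries of Table 1.2. This sketch takes equation 69 to be the usual sum of $(O_k - E_k)^2 / E_k$ over the bins. The last-bin probability is truncated in the source and is assumed here to be 0.16 by the symmetry of the Gauss distribution. The closed-form tail probability in the final step holds only for two degrees of freedom, the number quoted in the text's table lookup.

```python
import math

n = 20                              # number of measurements
P = [0.16, 0.34, 0.34, 0.16]        # bin probabilities; last value assumed by symmetry
O = [3, 6.5, 7.5, 3]                # observed numbers from Table 1.2

E = [n * p for p in P]              # expected numbers, equation 70

# Chi-square sum over the four bins (assumed form of equation 69).
chi2 = sum((o - e) ** 2 / e for o, e in zip(O, E))

# For two degrees of freedom the probability of exceeding chi2 is exp(-chi2 / 2).
p_two_dof = math.exp(-chi2 / 2.0)

print(f"chi-square = {chi2:.2f}")
print(f"P(chi-square > {chi2:.2f}) for 2 degrees of freedom = {p_two_dof:.3f}")
```

The sum comes out at about 0.11, matching the text, and the two-degree-of-freedom probability falls between 0.90 and 0.95 as quoted from the tables.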