Linear Regression Exercise: One-Dimensional Arrays and Regression Analysis | Exercises Computer Science

Programming Exercise Eight (and last)

Objective

This assignment provides an example in the use of one-dimensional arrays and introduces the concept of regression analysis, which is used to estimate a relationship between two variables.

Mathematical Background

If several measurements are made on pairs of experimental data {(xi,yi), i = 1,...,N}, we can use a technique, known as regression analysis, to determine an approximate equation of a straight line that gives a best fit to the data. The equation of this best-fit line is written as follows.

ŷ = a + b x

In this equation, we use the symbol ŷ instead of y to indicate that the predicted value found from the equation, ŷ = a + b x, is an approximate result. For a given data point, (xi,yi), the value of yi represents the actual data, and we would obtain the predicted value of y at the point x = xi from the equation ŷi = a + b xi. The difference between the measured and predicted values is |yi - ŷi|.
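As a minimal sketch of the idea above, the following Python fragment evaluates ŷi = a + b xi for each xi stored in a one-dimensional array (a Python list here) and then forms the differences |yi - ŷi|. The function names, the sample data, and the coefficient values are hypothetical and chosen only for illustration; they do not come from the exercise.

```python
def predict(a, b, x):
    """Return the predicted values y-hat_i = a + b * x_i for every x_i."""
    return [a + b * xi for xi in x]


def abs_differences(y, y_hat):
    """Return |y_i - y-hat_i| for each measured/predicted pair."""
    return [abs(yi - yhi) for yi, yhi in zip(y, y_hat)]


# Hypothetical measurements and coefficients (not from the exercise).
x = [1.0, 2.0, 3.0]
y = [2.5, 3.5, 6.0]
a, b = 1.0, 1.5

y_hat = predict(a, b, x)            # [2.5, 4.0, 5.5]
diffs = abs_differences(y, y_hat)   # [0.0, 0.5, 0.5]
```

Keeping x, y, and the predicted values in parallel arrays of the same length means that index i always refers to the same data point in all three.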

[Figure: data points (small ellipses) and the fitted line]

In the chart, the data points are indicated by the small ellipses. The coordinates of a typical data point are shown by the dotted lines indicating the coordinates xi and yi. The solid line is the fitted regression line, ŷ = a + b x. The point where the dotted line at x = xi crosses the regression line has the coordinates (xi,ŷi). In this particular example the value of ŷi is less than the value of yi. There is a large scatter of data points about the regression line in this example.

The example plot above might represent calibration data on an instrument. The x values would denote the instrument reading and the y values would indicate the true value of the quantity being measured. Once the calibration tests were completed, it would be useful to have a simple equation to relate the instrument reading (x) to the actual quantity being measured (y).

In addition to finding the values of a and b that give the best-fit line, we would also like to have some measure of how well the line fits the data. Two different goodness-of-fit measures, the standard error and the coefficient of variation, are presented below in the equations section.

Equations used

The equations used to calculate a and b can be found by an analysis which minimizes the sum of the squared distances between the actual data points, yi, and the fitted points, ŷi = a + b xi. The results of this analysis are shown below. The equations to compute the intercept, a, and the slope, b, in terms of the entire set of data, {xi,yi}, use the following definitions of mean values:

 



$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \quad \text{and} \quad \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$$
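The mean values of x and y can be computed directly from the one-dimensional arrays. The sketch below also shows one common way to write the least-squares slope and intercept in terms of those means, b = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² and a = ȳ - b x̄. This is the standard textbook result and is an assumption here: the exercise may state its equations for a and b in a different but equivalent form. The function names and the sample data are hypothetical.

```python
def mean(values):
    """Arithmetic mean: (1/N) * sum of the values."""
    return sum(values) / len(values)


def fit_line(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x.

    Assumes the standard closed-form result written with mean values;
    requires at least two distinct x values so the denominator is nonzero.
    """
    x_bar = mean(x)
    y_bar = mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x, so the fit should recover it.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(x, y)
```

With data that lie exactly on a straight line, the fitted a and b reproduce that line, which makes a convenient first check before trying scattered data.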
