Basic Statistics - Instrumentation, Measurements, Statistics - Lecture Notes, Study notes of Electronic Measurement and Instrumentation

The complete lecture series for the Instrumentation, Measurements, Statistics course is available at Docsity and is free to download. This lecture covers the following keywords: Basic Statistics, Data Analysis Using Statistics, Statistics Definitions Associated with Systematic Error, Average Absolute Deviation, Sample Standard Deviation, Standard Error, Root Mean Square Error, Mean Bias Error.

Typology: Study notes

2012/2013

Uploaded on 10/02/2013

sonu-kap
Basic Statistics
Introduction
The purpose of this learning module is to introduce you to some of the fundamental definitions and
techniques related to analyzing measurements with statistics.
In all the definitions and examples discussed here, we consider a collection (sample) of measurements of a
steady parameter, e.g., repeated measurements of a temperature, distance, or voltage.
Basic Definitions for Data Analysis using Statistics
First some definitions are necessary:
o Population – the entire collection of measurements, not all of which will be analyzed statistically.
o Sample – a subset of the population that is analyzed statistically. A sample consists of n measurements.
o Statistic – a numerical attribute of the sample (e.g., mean, median, standard deviation).
Suppose a population, i.e., a series of measurements (or readings) of some variable x, is available. Variable x can
be anything that is measurable, such as a length, time, voltage, current, or resistance.
Consider a sample of these measurements – some portion of the population that is to be analyzed
statistically. The measurements are $x_1, x_2, x_3, \ldots, x_n$, where n is the number of measurements in the sample
under consideration. The following represent some of the statistics that can be calculated:
Mean – the sample mean is simply the arithmetic average, as is commonly calculated, i.e.,
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,$$
where the index i runs over the n measurements of the sample.
o We sometimes use the notation $x_{\text{avg}}$ instead of $\bar{x}$ to indicate the average of all x values in the sample,
especially when using Excel since overbars are difficult to add.
o The sample mean, although it is the simplest statistic to calculate, is not always as useful as the sample
median, which is discussed later.
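As a quick illustration of the formula for the mean, here is a minimal Python sketch. The five voltage readings and the variable names are made up for this illustration and do not come from the lecture notes.

```python
# Hypothetical sample: five repeated voltage readings in volts (illustrative values only).
x = [5.1, 4.9, 5.0, 5.2, 4.8]

n = len(x)            # number of measurements in the sample
x_avg = sum(x) / n    # sample mean: (1/n) * sum of x_i

print(f"n = {n}, x_avg = {x_avg:.3f} V")
```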
Deviation – the deviation of a measurement is defined as the difference between a particular measurement
and the mean, i.e., for measurement i, $d_i = x_i - \bar{x}$.
o When considering a group or sample of measurements, the deviation of one particular measurement is
the same as the precision error or random error of that measurement.
o Deviation is not the same as accuracy error. Recall that accuracy error (inaccuracy) is defined as the
difference between a particular measurement and the true value of the quantity being measured:
(accuracy error = $x_i - x_{\text{true}}$). Because of bias (systematic) error, $x_{\text{true}}$ is often not even known, and the mean
is not equal to $x_{\text{true}}$ if there are bias errors.
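The difference between deviation and accuracy error can be made concrete with a small Python sketch. The readings and the value of `x_true` below are hypothetical. In practice the true value is usually unknown, so it is assumed here only to show how the two quantities differ.

```python
# Hypothetical readings (volts) and a hypothetical "true" value.
# x_true is normally unknown; it is assumed here purely for illustration.
x = [5.1, 4.9, 5.0, 5.2, 4.8]
x_true = 5.3

x_avg = sum(x) / len(x)

deviations = [xi - x_avg for xi in x]         # d_i = x_i - x_avg (precision error)
accuracy_errors = [xi - x_true for xi in x]   # x_i - x_true (inaccuracy)
offset = x_avg - x_true                       # how far the mean sits from the true value

for xi, d, e in zip(x, deviations, accuracy_errors):
    print(f"x = {xi:.1f}  deviation = {d:+.2f}  accuracy error = {e:+.2f}")
print(f"mean - x_true = {offset:+.2f}")
```

In this made-up case every accuracy error equals the deviation plus the same offset of the mean from the true value, which is why a tight cluster of readings (small deviations) can still be inaccurate.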
Average deviation – to get some feel for how much deviation is represented in the sample, we might first
think of averaging all the deviations to obtain some kind of mean or average deviation. It turns out that the
average of all the deviations is zero! Try it for any set of numbers, and you will convince yourself that this is
true. Why? Because the mean is defined so that the deviations of the measurements below it exactly cancel the
deviations of those above it. The average deviation is therefore a meaningless calculation: it is always zero.
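The claim that the deviations always average to zero follows directly from the definitions of the mean and the deviation given above. A short derivation, using only those two definitions:

```latex
\frac{1}{n}\sum_{i=1}^{n} d_i
  = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)
  = \frac{1}{n}\sum_{i=1}^{n} x_i \;-\; \frac{1}{n}\, n\,\bar{x}
  = \bar{x} - \bar{x}
  = 0 .
```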
Average absolute deviation – a better measure of deviation is the average absolute deviation (also called the
average positive error), defined as the average of the absolute value of each deviation. Mathematically,
$$\overline{|d|} = \frac{1}{n}\sum_{i=1}^{n} |d_i|,$$
where $|d_i|$ is called the absolute deviation or the positive error.
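The following Python sketch contrasts the plain average deviation with the average absolute deviation. The readings are hypothetical values chosen for this illustration only.

```python
# Hypothetical sample of five voltage readings in volts (illustrative values only).
x = [5.1, 4.9, 5.0, 5.2, 4.8]

n = len(x)
x_avg = sum(x) / n

deviations = [xi - x_avg for xi in x]               # d_i, positive and negative
avg_dev = sum(deviations) / n                       # cancels to zero (up to round-off)
avg_abs_dev = sum(abs(d) for d in deviations) / n   # (1/n) * sum of |d_i|

print(f"average deviation          = {avg_dev:.3f}")
print(f"average absolute deviation = {avg_abs_dev:.3f}")
```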
Sample standard deviation – an even better, and more accepted measure of how much deviation or scatter is
in the data is obtained by calculating the sample standard deviation. For n measurements,
$$S = \sqrt{\frac{\sum_{i=1}^{n} d_i^{\,2}}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}{n-1}}.$$
o S is similar to an average of the deviations, but it is constructed by taking the square root of the average
of the squared deviations. The deviations are squared first because $d_i$ can be either positive or negative and
would otherwise cancel.
o Notice that the denominator is n – 1, not simply n. It turns out that for small sample size (small n), n – 1
yields a better estimate of the standard deviation than does n itself. (Details are beyond the scope of this
course.) As n becomes large, the difference between using n or n – 1 in the denominator becomes negligible.
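To see the effect of the n – 1 denominator, the sketch below computes the standard deviation both ways for a small hypothetical sample and cross-checks against Python's standard `statistics` module (`stdev` uses n – 1, `pstdev` uses n). The data values are assumptions made for this illustration.

```python
import math
import statistics

# Hypothetical sample of five voltage readings in volts (illustrative values only).
x = [5.1, 4.9, 5.0, 5.2, 4.8]

n = len(x)
x_avg = sum(x) / n
sum_sq = sum((xi - x_avg) ** 2 for xi in x)   # sum of squared deviations

S = math.sqrt(sum_sq / (n - 1))   # sample standard deviation (n - 1 denominator)
S_n = math.sqrt(sum_sq / n)       # same calculation with n in the denominator

# Cross-check against the standard library.
assert math.isclose(S, statistics.stdev(x))
assert math.isclose(S_n, statistics.pstdev(x))

print(f"S (n - 1) = {S:.4f},  S (n) = {S_n:.4f}")

# The two results differ by the factor sqrt(n / (n - 1)), which approaches 1 as n grows.
for size in (5, 50, 500):
    print(size, round(math.sqrt(size / (size - 1)), 4))
```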

Sample variance – the sample variance is simply the square of the sample standard deviation, namely, sample variance $= S^2$.

Relative standard deviation – the relative standard deviation of the sample is simply the sample standard deviation divided by the mean, namely, $\mathrm{RSD} = S / \bar{x}$.
o RSD is nondimensional.
o RSD is usually written as a percentage (multiply RSD by 100%); it is then sometimes called %RSD.
Standard error – the standard error is the standard deviation divided by the square root of the number of measurements, namely, standard error $= S / \sqrt{n}$.
Median – the median of the sample is defined as the value at which half of the measurements are lower and half are higher.
o A simple way to calculate the median is to order all the measurements from lowest to highest. If n is odd, the number in the middle is the median. If n is even, the median is the average of the middle two values.
o The median is sometimes more useful than the mean, particularly in cases where one or two values are significantly different from the rest of the values.
Mode – the mode of the sample is the most probable value of the n measurements – the one that occurs most frequently.
o Mode is not used as often as mean or median because it can be a misleading quantity, especially if the sample size is small and/or the distribution of measurements is not purely random.
o If none of the measurements are repeated, the mode is undefined.
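The definitions of variance, RSD, standard error, median, and mode can be sketched in Python as follows. The data set is hypothetical and deliberately includes one outlier to show why the median can be more useful than the mean. The helper `modes` is a name introduced here for illustration; it returns an empty list when no value repeats, matching the statement that the mode is then undefined.

```python
import math
import statistics
from collections import Counter

# Hypothetical length readings in metres; the last value is a deliberate outlier.
x = [10.1, 10.2, 10.2, 10.3, 14.2]

n = len(x)
x_avg = statistics.mean(x)
S = statistics.stdev(x)         # sample standard deviation (n - 1 denominator)

variance = S ** 2               # sample variance = S^2
rsd = S / x_avg                 # relative standard deviation (nondimensional)
std_error = S / math.sqrt(n)    # standard error = S / sqrt(n)
median = statistics.median(x)   # middle value of the sorted measurements


def modes(values):
    """Return the most frequent value(s), or [] if no value is repeated."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top] if top > 1 else []


print(f"mean = {x_avg:.2f}, median = {median:.2f}, mode(s) = {modes(x)}")
print(f"variance = {variance:.3f}, RSD = {100 * rsd:.1f}%, standard error = {std_error:.3f}")
```

With these made-up numbers the single outlier pulls the mean well above the bulk of the readings, while the median stays with the cluster.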

Example:
Given: Ten length measurements: 12.1, 12.3, 12.2, 12.2, 12.4, 12.3, 12.2, 12.4, 12.2, and 12.5 m.
To do: Calculate the mean, variance, average absolute deviation, standard deviation, median, and mode.
Solution:
o We use Excel for convenience. A portion of the spreadsheet is shown below:
[Figure: portion of the Excel spreadsheet]
o In Excel, the simplest way to obtain the statistics is to use the built-in statistical analysis macro:
  - Office 2003: Tools-Data Analysis-Descriptive Statistics-OK.
  - Office 2007: Click the Data tab instead of Tools – the rest is the same.
  - Click in the field called Input Range, and highlight all the measurements.
  - Select Output Range as the Output Option, click in the field for Output Range, and then click on a cell in some clean (available) portion of the spreadsheet.
  - Select Summary Statistics, and OK.
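As a cross-check on the spreadsheet approach, the same statistics for the ten length measurements can be computed with Python's standard `statistics` module. This is a sketch alongside the Excel solution, not a replacement for it; the variable names are chosen here for illustration.

```python
import statistics

# The ten length measurements from the example, in metres.
lengths = [12.1, 12.3, 12.2, 12.2, 12.4, 12.3, 12.2, 12.4, 12.2, 12.5]

n = len(lengths)
mean = statistics.mean(lengths)
variance = statistics.variance(lengths)    # sample variance (n - 1 denominator)
std_dev = statistics.stdev(lengths)        # sample standard deviation S
avg_abs_dev = sum(abs(v - mean) for v in lengths) / n
median = statistics.median(lengths)
mode = statistics.mode(lengths)

print(f"mean                       = {mean:.2f} m")          # 12.28 m
print(f"variance                   = {variance:.4f} m^2")    # 0.0151 m^2
print(f"average absolute deviation = {avg_abs_dev:.3f} m")   # 0.100 m
print(f"standard deviation         = {std_dev:.4f} m")       # 0.1229 m
print(f"median                     = {median:.2f} m")        # 12.25 m
print(f"mode                       = {mode} m")              # 12.2 m
```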