Statistical Distribution Fitting - Banking - Lecture Slides, Slides of Banking and Finance

E-banking is closely associated with computer science. In these lecture slides, the lecturer explains the following aspects of banking: Statistical Distribution Fitting, Exact Science, Physical, Logical Process, Generates The Data, Consider Range, Infinite Both Ways, Positive, Bounded, Exponential

Typology: Slides

2012/2013

Uploaded on 07/30/2013

ahmad.ali

Statistical Distribution Fitting

Simulation with Arena — Statistical Distribution Fitting (^) C5/

Some Issues in Fitting Input Distributions

  • Not an exact science: no “right” answer
  • Consider the physical or logical process that generates the data
  • Consider the range of the distribution
    • Infinite both ways (e.g., normal)
    • Positive (e.g., exponential, gamma)
    • Bounded (e.g., beta, uniform)
  • Consider ease of parameter manipulation to affect means and variances (decision variables)
  • Outliers, multimodal data
    • Maybe split the data set (see textbook for details)
    • Consider theoretical vs. empirical


Chi-Squared Test

  • Formalizes this notion of distribution fit
  • O_i is the number of observed data values in the i-th interval.
  • p_i is the probability of a data value falling in the i-th interval under the hypothesized distribution.
  • So, with n observations, we would expect to observe E_i = n p_i values in the i-th interval

[Figure: histogram of frequency against data values, with counts O_i, and the hypothesized pdf against data values, with interval probabilities p_i]
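The expected counts E_i = n p_i can be sketched in a few lines. This is a minimal illustration, not part of the original slides: the function names, the interval edges, and the choice of an exponential distribution with mean 2 are all assumptions made for the example.

```python
import math

def exponential_cdf(mean):
    """Return the cdf F(x) = 1 - exp(-x / mean) of an exponential distribution."""
    return lambda x: 1.0 - math.exp(-x / mean) if x > 0 else 0.0

def expected_counts(edges, n, cdf):
    """Return E_i = n * p_i for the intervals [edges[i], edges[i + 1])."""
    # p_i is the hypothesized probability of landing in the i-th interval.
    probabilities = [cdf(hi) - cdf(lo) for lo, hi in zip(edges, edges[1:])]
    return [n * p for p in probabilities]

# Hypothetical setup: n = 100 observations, four intervals covering [0, infinity).
edges = [0.0, 1.0, 2.0, 4.0, math.inf]
expected = expected_counts(edges, n=100, cdf=exponential_cdf(2.0))
print([round(e, 2) for e in expected])
```

Because the intervals cover the whole range of the distribution, the expected counts add up to n.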


Chi-Squared Test

  • So the chi-squared statistic is

\[ \chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \]

  • By assuming that the O_i - E_i terms are normally distributed, it can be shown that the distribution of the statistic is approximately chi-squared with k - s - 1 degrees of freedom
    • k is the number of intervals
    • s is the number of parameters of the hypothesized distribution
  • Hint: consider the statistic written in terms of E_i = n p_i

\[ \chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - n p_i)^2}{n p_i} \]
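As a rough sketch of the calculation, the statistic is the sum of (O_i - E_i)^2 / E_i over the k intervals, compared against a chi-squared distribution with k - s - 1 degrees of freedom. The function names and the counts below are hypothetical and chosen only for illustration.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all intervals."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def degrees_of_freedom(k, s):
    """k intervals, s parameters of the hypothesized distribution."""
    return k - s - 1

# Hypothetical data: 100 observations in four equal-probability intervals.
observed = [18, 22, 35, 25]
expected = [25.0, 25.0, 25.0, 25.0]

chi2 = chi_squared_statistic(observed, expected)
dof = degrees_of_freedom(k=len(observed), s=1)
print(round(chi2, 2), dof)
```

Here the statistic is about 6.32 with 2 degrees of freedom. The tabulated 5% critical value for 2 degrees of freedom is about 5.99, so with these made-up counts the fit would be rejected at that level.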


Chi-Squared Test

  • If the expected frequencies E_i are too small, then the test statistic will not reflect the departure of the observed from the expected frequencies.
  • The test can reject because of noise
  • In practice a minimum of E_i ≥ 5 is used
  • If E_i is too small for a given interval, then adjacent intervals can be combined
  • For discrete distributions
    • each possible discrete value can be a class interval
    • combine adjacent values if the E_i’s are too small
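One simple way to combine adjacent intervals is to sweep from left to right and accumulate counts until the expected frequency reaches the minimum. The sketch below is only one possible merging rule (the slides do not prescribe one), and the function name and data are hypothetical.

```python
def merge_small_intervals(observed, expected, minimum=5.0):
    """Combine adjacent intervals until every expected count is >= minimum."""
    merged_obs, merged_exp = [], []
    acc_obs, acc_exp = 0, 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= minimum:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs, acc_exp = 0, 0.0
    if acc_exp > 0:
        # A leftover tail that is still too small joins the last merged interval.
        if merged_exp:
            merged_obs[-1] += acc_obs
            merged_exp[-1] += acc_exp
        else:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
    return merged_obs, merged_exp

# Hypothetical counts: the last two intervals have E_i < 5.
obs, exp = merge_small_intervals([12, 9, 4, 3, 2], [11.0, 8.5, 5.2, 3.1, 1.4])
print(obs, [round(e, 1) for e in exp])
```

The number of intervals k used for the degrees of freedom is the number of intervals left after merging.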


Chi-Squared Test

  • For continuous data
    • intervals that give equal probabilities should be used, not equal-length intervals
    • this gives better power for the test
      • the power of a test is the probability of rejecting a false hypothesis
    • it is not known what probability gives the highest power, but we want E_i = n p_i ≥ 5; with k equal-probability intervals, p_i = 1/k, so

\[ E_i = n p_i = \frac{n}{k} \ge 5 \quad\Longrightarrow\quad k \le \frac{n}{5} \]
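The bound k ≤ n/5 and the equal-probability interval edges can be sketched as follows. The example assumes an exponential distribution with a hypothetical mean of 2, and it simply takes the largest k the bound allows, which is a choice made for illustration rather than a recommendation from the slides.

```python
import math

def exponential_inverse_cdf(mean):
    """Return F^{-1}(q) = -mean * ln(1 - q) for an exponential distribution."""
    return lambda q: -mean * math.log(1.0 - q)

def equal_probability_edges(n, inverse_cdf, min_expected=5):
    """Return k and the k - 1 interior edges so that each interval has p_i = 1/k."""
    k = n // min_expected  # largest k with n / k >= min_expected
    if k < 1:
        raise ValueError("not enough observations for a single interval")
    interior_edges = [inverse_cdf(i / k) for i in range(1, k)]
    return k, interior_edges

# Hypothetical setup: n = 50 observations.
k, interior_edges = equal_probability_edges(50, exponential_inverse_cdf(2.0))
print(k, [round(edge, 3) for edge in interior_edges])
```

With n = 50 this gives k = 10 intervals, each with expected count E_i = 5.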


Eyeballing

  • Another method of seeing if a distribution fits sample data is the q-q plot
  • x is the q-quantile of a random variable X with cdf F if F(x) = q, or x = F^{-1}(q)
  • Take a data sample {x_1, …, x_n} and order the values to get y_1 ≤ y_2 ≤ … ≤ y_n
  • y_j is an estimate of the (j - 0.5)/n quantile
  • Plot y_j versus F^{-1}((j - 0.5)/n)
  • If the distribution fits, this should give approximately a straight line

[Figure: q-q plot with axes labeled Order Statistics and Exponential Quantile]
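The points of a q-q plot can be computed directly from the recipe above: sort the sample, then pair each order statistic y_j with F^{-1}((j - 0.5)/n). The sketch below uses a made-up sample and fits the exponential mean with the sample mean, both of which are assumptions for illustration.

```python
import math

def qq_points(sample, inverse_cdf):
    """Pair each order statistic y_j with the theoretical (j - 0.5)/n quantile."""
    ordered = sorted(sample)
    n = len(ordered)
    return [(y, inverse_cdf((j - 0.5) / n)) for j, y in enumerate(ordered, start=1)]

# Hypothetical sample; the exponential mean is estimated by the sample mean.
sample = [0.3, 2.9, 1.1, 0.7, 4.8]
mean = sum(sample) / len(sample)
points = qq_points(sample, lambda q: -mean * math.log(1.0 - q))

for order_statistic, quantile in points:
    print(f"{order_statistic:.2f}  {quantile:.2f}")
```

Plotting the pairs with any charting tool and checking how close they lie to a straight line gives the visual test.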


Eyeballing

  • Note:
    • Will never actually be a straight line
    • Order statistics are not independent
    • One point above line will likely be followed by another
    • The variance at the extremes is larger
    • So for exponential, you will likely see more discrepancy at larger values

[Figure: q-q plot with axes labeled Order Statistics and Exponential Quantile]


Comparing the Two Tests

  • The Chi-Squared Test
    • Not just a maximum deviation, but a sum of squared deviations
    • Uses more of the information in the data
    • So it needs more data to be accurate
    • Is more accurate if it has enough data
  • The Kolmogorov-Smirnov Test
    • Just a maximum deviation
    • Needs less data to be accurate
    • Is less accurate than the chi-squared test when more data is available
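To make "just a maximum deviation" concrete, the sketch below computes the usual Kolmogorov-Smirnov statistic, the largest gap between the empirical cdf of the sample and the hypothesized cdf. The function name and the sample are hypothetical, and the uniform(0, 1) cdf is used only to keep the arithmetic easy to check.

```python
def ks_statistic(sample, cdf):
    """Largest absolute gap between the empirical cdf and the hypothesized cdf."""
    ordered = sorted(sample)
    n = len(ordered)
    largest_gap = 0.0
    for j, y in enumerate(ordered, start=1):
        f = cdf(y)
        # The empirical cdf jumps from (j - 1)/n to j/n at y, so check both sides.
        largest_gap = max(largest_gap, j / n - f, f - (j - 1) / n)
    return largest_gap

# Hypothetical sample tested against a uniform(0, 1) distribution.
uniform_cdf = lambda x: min(max(x, 0.0), 1.0)
d = ks_statistic([0.1, 0.4, 0.7], uniform_cdf)
print(round(d, 3))
```

For this sample the largest gap occurs at the last point, where the empirical cdf reaches 1 while the hypothesized cdf is 0.7.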


Empirical Distribution

  • “Fit” an empirical distribution (continuous or discrete): Fit/Empirical
  • Can interpret results as a Discrete or Continuous distribution
    • Discrete: get pairs (Cumulative Probability, Value)
    • Continuous: Arena will linearly interpolate within the data range according to these pairs (so you can never generate values outside the range, which might be good or bad)
  • An empirical distribution can be used when “theoretical” distributions fit poorly, or intentionally
  • When sampling from the empirical distribution, you are just re-sampling from the data
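The continuous interpretation can be sketched as sampling by linear interpolation between sorted data points. This is an illustration of the idea only, not Arena's implementation: the function name, the equal weighting of each gap between neighboring points, and the data are assumptions.

```python
import random

def make_empirical_sampler(data, rng):
    """Return a function that samples by interpolating between sorted data points."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data points to interpolate")

    def sample():
        position = rng.random() * (n - 1)   # a point along the n - 1 gaps
        i = min(int(position), n - 2)       # index of the lower neighbor
        fraction = position - i
        return ordered[i] + fraction * (ordered[i + 1] - ordered[i])

    return sample

# Hypothetical service-time data.
data = [2.1, 3.4, 3.9, 5.0, 7.8]
sampler = make_empirical_sampler(data, random.Random(42))
draws = [sampler() for _ in range(1000)]
print(min(draws) >= min(data), max(draws) <= max(data))
```

Every draw stays inside the observed range, which is the property the slide notes can be good or bad.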


Multivariate and Correlated Input Data

  • Usually we assume that all generated random observations across a simulation are independent (though possibly from different distributions)
  • Sometimes this isn’t true:
    • If a clerk starts to get long jobs, they may get tired and slow down
    • A “difficult” part requires long processing in both the Prep and Sealer operations
  • Ignoring such relations can invalidate the model


Checking for Auto-Correlation

  • Suppose we have a series of inter-arrival times
    • What is the relationship between the j-th observation and the (j-1)st?
    • What is the relationship between the j-th observation and the (j-2)nd?
  • We are talking about auto-correlation because the series is correlated with itself
  • How many steps back we are looking is called the lag
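A common sample estimate of the lag auto-correlation divides the sum of products of deviations that are `lag` steps apart by the total sum of squared deviations. The sketch below uses that estimator; the function name and the two short series are hypothetical.

```python
def autocorrelation(series, lag):
    """Sample auto-correlation between observations that are `lag` steps apart."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    total = sum((x - mean) ** 2 for x in series)
    if total == 0:
        raise ValueError("series is constant")
    # Pair the j-th observation with the one `lag` steps before it.
    cross = sum((series[j] - mean) * (series[j - lag] - mean) for j in range(lag, n))
    return cross / total

trend = [1, 2, 3, 4, 5]           # steadily increasing values
alternating = [1, -1, 1, -1]      # values that flip sign each step
print(autocorrelation(trend, 1), autocorrelation(alternating, 1))
```

The increasing series gives a positive lag-1 value (0.4) and the alternating series a negative one (-0.75). Values near zero at every lag are consistent with independent observations.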


Time Series Models

  • If the auto-correlation calculations show a correlation, then you may have to use a time-series model
  • Such models include autoregressive models and moving-average models
  • Using the auto-correlation and another concept called the partial auto-correlation, you can fit these models
  • The details are beyond the scope of this course
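For orientation only, the simplest autoregressive model, AR(1), sets each value to the long-run mean plus a fraction phi of the previous deviation plus fresh noise. The sketch below generates such a series; the parameter names and values are hypothetical, and the normal noise means values can go negative, so it is not a ready-made model for times.

```python
import random

def simulate_ar1(n, mean, phi, noise_sd, rng):
    """Generate X_t = mean + phi * (X_{t-1} - mean) + noise, starting at the mean."""
    values = []
    previous = mean
    for _ in range(n):
        current = mean + phi * (previous - mean) + rng.gauss(0.0, noise_sd)
        values.append(current)
        previous = current
    return values

# Hypothetical parameters: positive phi makes neighboring values move together.
series = simulate_ar1(n=200, mean=10.0, phi=0.7, noise_sd=1.0, rng=random.Random(7))
print(len(series), round(sum(series) / len(series), 2))
```

A positive phi produces the pattern described earlier, where one long value tends to be followed by another.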


Multivariate Input Data

  • A “difficult” part requires long processing in both the Prep and Sealer operations
    • The service times at the Prep and Sealer areas would be correlated
  • Some multivariate models are quite easy, for instance the multivariate normal model
  • You can also use the multiplication rule: specify the marginal distribution of one time, and then specify the distribution of the other time conditional on the first

\[ f_{X,Y}(x, y) = f_{X|Y}(x \mid y) \, f_Y(y) \]
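The multiplication rule translates directly into a two-step sampler: draw y from the marginal f_Y, then draw x from the conditional f_{X|Y}. The sketch below does this for a bivariate normal pair, where the conditional distribution is again normal. Treating X as the Prep time and Y as the Sealer time, and all parameter values, are assumptions for illustration; normal times can be negative, so a real model would need a more suitable distribution.

```python
import math
import random

def sample_prep_and_sealer(rng, mean_prep, sd_prep, mean_sealer, sd_sealer, rho):
    """Draw (x, y) by sampling y from f_Y, then x from f_{X|Y}(x | y)."""
    y = rng.gauss(mean_sealer, sd_sealer)                        # marginal f_Y
    conditional_mean = mean_prep + rho * sd_prep / sd_sealer * (y - mean_sealer)
    conditional_sd = sd_prep * math.sqrt(1.0 - rho ** 2)
    x = rng.gauss(conditional_mean, conditional_sd)              # conditional f_{X|Y}
    return x, y

# Hypothetical parameters with correlation rho = 0.8 between the two times.
rng = random.Random(123)
pairs = [sample_prep_and_sealer(rng, 5.0, 1.0, 8.0, 2.0, 0.8) for _ in range(5000)]

mean_x = sum(x for x, _ in pairs) / len(pairs)
mean_y = sum(y for _, y in pairs) / len(pairs)
cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
var_y = sum((y - mean_y) ** 2 for _, y in pairs)
sample_correlation = cov / math.sqrt(var_x * var_y)
print(round(sample_correlation, 2))
```

The sample correlation of the generated pairs should land close to the chosen rho.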