Image Sampling - Vision Systems - Lecture Notes, Study notes of Ophthalmology

Vision Systems lecture handout. Keywords of this handout are: Image Sampling, Photon Noise, Spatial Sampling, Chromatic Sampling, Absorption Spectra, Rods and Cones, Sampling and Color Matching, Intensity and Temporal Coding, Visual Transduction, Receptor Response Nonlinearity

Typology: Study notes

2011/2012

Uploaded on 12/21/2012

semerkant54_pick

VI. Image sampling

For most image-formation systems, the output image^5 (the optical image) can be completely described by a four-dimensional function n(x, y, t, λ) that gives the mean quantum irradiance as a function of space (x, y), time (t) and wavelength (λ). This is usually a complete description because, in most situations, the image noise is that of an inhomogeneous Poisson process (which is completely determined by its mean or "intensity" function). All the image information available for performing a given visual task is carried in such four-dimensional functions.
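To make the Poisson assumption concrete, here is a minimal Python sketch (using NumPy) that discretizes a hypothetical one-dimensional mean image into bins and draws independent Poisson photon counts for each bin. The function name `sample_photon_counts`, the grating parameters, and the bin layout are illustrative assumptions, not part of these notes. Across many frames, both the mean and the variance of the counts in each bin converge to the mean image, which is why the mean function alone specifies the process.

```python
import numpy as np

def sample_photon_counts(mean_counts, n_frames, seed=0):
    """Draw independent Poisson photon counts for every bin of a mean image.

    mean_counts holds the expected number of photons per bin, i.e. the mean
    quantum irradiance integrated over each bin's area and duration (a
    discretization assumed here for illustration).
    """
    rng = np.random.default_rng(seed)
    mean_counts = np.asarray(mean_counts, dtype=float)
    # Each bin is an independent Poisson variable; its mean is the only parameter.
    return rng.poisson(lam=mean_counts, size=(n_frames,) + mean_counts.shape)

# Hypothetical 1-D mean image: a sinusoidal grating on a uniform background.
x = np.linspace(0.0, 1.0, 32, endpoint=False)
mean_image = 100.0 * (1.0 + 0.5 * np.sin(2 * np.pi * 4 * x))

counts = sample_photon_counts(mean_image, n_frames=20000)
est_mean = counts.mean(axis=0)
est_var = counts.var(axis=0)   # for Poisson counts this approaches est_mean
```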

In biological vision systems, and in virtually all artificial (computer) vision systems, the images formed by the optical system must (because of hardware/wetware limitations) eventually be coded into a discrete representation in space, time, and wavelength. This image sampling process is a crucial step in visual processing that can, and often does, result in significant information loss. The loss results because it is often impossible to sample all four dimensions with sufficiently high resolution. Thus, compromises must always be struck.

In biological vision systems, the first stage of sampling is carried out by the photoreceptors; in artificial systems, it is usually carried out by some other two-dimensional array of elements (e.g., a CCD array or an array of photodiodes). The principles of sampling are much the same in biological and artificial systems. A fundamental principle of sampling is captured by the so-called Whittaker-Shannon sampling theorem, which is useful for understanding the information loss due to discrete sampling.

A. The sampling theorem

Below are statements of the Whittaker-Shannon sampling theorem for one and two dimensions. Similar statements hold for higher dimensions.

The sampling theorem (one-dimensional case): If a one-dimensional function, f(x), is limited to frequencies below $w_c$ cycles per unit value of x, then the function can be completely reconstructed by taking $2w_c$ evenly spaced samples per unit value of x.

The sampling theorem (two-dimensional case): If a two-dimensional function, f(x, y), is limited to frequencies below $u_c$ cycles per unit value of x (in the x direction), and to frequencies below $v_c$ cycles per unit value of y (in the y direction), then the function can be completely reconstructed by taking $4u_cv_c$ samples per unit area on the x, y plane.

^5 For example, there would be two images in a binocular optical system.

In other words, the sampling theorem says that any continuous function whose frequency content is limited in this way (a band-limited function) can be represented exactly, with perfect precision, by its values at only a discrete set of points. This may seem rather surprising. It suggests that it is, in principle, possible to discretely sample an image without any loss of information. The minimum sampling rate required for perfect reconstruction ($2w_c$ for one-dimensional functions and $4u_cv_c$ for two-dimensional functions) is known as the Nyquist rate.
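A short numerical check of the one-dimensional theorem may help. The sketch below samples a hypothetical band-limited signal at exactly the Nyquist rate and rebuilds it with the standard Whittaker-Shannon interpolation formula, f(x) = Σ f(i/w) sinc(wx − i), using NumPy's normalized `np.sinc` (sin(πt)/(πt)). The signal, the cutoff value, and the helper name `sinc_reconstruct` are assumptions for illustration. The infinite sum has to be truncated in practice, so the reconstruction error is small but not exactly zero.

```python
import numpy as np

def sinc_reconstruct(samples, sample_positions, w, x):
    """Whittaker-Shannon interpolation: f(x) = sum_i f(i/w) * sinc(w*x - i).

    np.sinc is the normalized sinc, sin(pi t) / (pi t).
    """
    # Rows index evaluation points, columns index samples.
    t = w * x[:, None] - w * sample_positions[None, :]
    return np.sinc(t) @ samples

w_c = 3.0          # assumed cutoff frequency (cycles per unit x)
w = 2.0 * w_c      # sample at exactly the Nyquist rate

def f(x):
    # Hypothetical band-limited signal: both components lie below w_c.
    return np.cos(2 * np.pi * 1.0 * x) + 0.5 * np.sin(2 * np.pi * 2.3 * x + 0.4)

i = np.arange(-2000, 2001)
positions = i / w
samples = f(positions)

x = np.linspace(-1.0, 1.0, 201)
f_hat = sinc_reconstruct(samples, positions, w, x)
# Residual error comes only from truncating the infinite sum to 4001 terms.
max_err = np.max(np.abs(f_hat - f(x)))
```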

It is not too difficult to demonstrate why the sampling theorem is true. Consider a one-dimensional function f(x). To be concrete, think of x as representing time. A discretely sampled function can be described as the product of the original continuous function and a sampling function, which is a collection of impulse (delta) functions. The locations of the impulse functions are the locations of the samples. For regularly spaced sampling, with a spacing of 1 sec, the sampling function is a comb function, which is defined as follows:

$$\mathrm{comb}(x) = \sum_{i=-\infty}^{\infty} \delta(x - i)$$

where δ(x) is the impulse (delta) function (which is an impulse located at 0). For regularly spaced sampling, with a spacing of 1/w sec, the sampling function is

$$\mathrm{comb}(wx) = \sum_{i=-\infty}^{\infty} \delta(wx - i)$$

where w is the sampling rate in samples/sec. Thus, the discretely sampled version of f(x) is given by,

$$\hat{f}(x) = f(x)\,\mathrm{comb}(wx) \qquad (6.3)$$

Now consider the Fourier transform of the sampled function. Using the facts (a) that the Fourier transform of the product of two functions is the convolution of the Fourier transforms of the individual functions, (b) that the Fourier transform of a comb function is another comb function (see Figure 22), and (c) that convolution with the symmetrical comb function is equivalent to cross correlation, we have

$$\hat{F}(u) = F(u) \otimes \frac{1}{w}\,\mathrm{comb}\!\left(\frac{u}{w}\right)$$

Figure 6.1 illustrates the implications of this result. The top line shows the Fourier transform of a hypothetical function, f(x). The assumption that f(x) is limited to frequencies less than $w_c$ is represented by its Fourier transform, F(u), being zero beyond $\pm w_c$ Hz. The second line shows the Fourier transform of the sampling function when the sampling rate, w, equals $2w_c$; in this case, the spacing between impulses in the Fourier domain is $2w_c$. The rest of the lines in the figure illustrate the cross correlation of the functions in the first two lines.

overlap region making recovery impossible. In particular, the energy from the higher frequencies is added back into the energy at lower frequencies. The picturesque way of saying this is that high frequencies can masquerade (or alias) as low frequencies. This phenomenon is called aliasing. Obviously, the further the sampling rate is below $2w_c$, the worse the aliasing effects will be.
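The masquerading can be shown directly. In this sketch (Python with NumPy; the sampling rate and frequencies are hypothetical), a cosine at 0.8 cycles per unit x is sampled at w = 1 sample per unit x, which is below its Nyquist rate of 1.6. Its samples coincide exactly with those of a 0.2-cycle cosine, so after sampling the two are indistinguishable. The helper `alias_frequency` folds any frequency into the band [0, w/2], which is where it appears after sampling.

```python
import numpy as np

def alias_frequency(f, w):
    """Apparent frequency, in [0, w/2], of a sinusoid of frequency f sampled at rate w."""
    r = np.mod(f, w)
    return np.minimum(r, w - r)

w = 1.0                          # samples per unit x (hypothetical)
x = np.arange(50) / w            # sample locations
high = np.cos(2 * np.pi * 0.8 * x)   # 0.8 cycles/unit: above w/2 = 0.5
low = np.cos(2 * np.pi * 0.2 * x)    # 0.2 cycles/unit: below w/2

# The sample values are identical: the high frequency aliases as 0.2 cycles/unit.
same = np.allclose(high, low)
```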

Parallel arguments can be used to demonstrate the sampling theorem in two or more dimensions.

The sampling theorem shows the minimum sampling rate required to reconstruct the original (frequency-limited) signal under the assumptions (a) that any possible signal (within cutoff frequency limits) might occur and (b) that no noise is present. If constraints exist on the possible signals (e.g., some signals are never or essentially never encountered in the environment) then it may be possible to reconstruct all naturally occurring signals with a sampling rate below the Nyquist rate. If there is noise present then it is generally impossible to exactly reconstruct the signal no matter what the sampling rate.

Constraints on possible visual signals and the implications of such constraints for image sampling and representation will be considered later. However, it is appropriate here to consider briefly the effects of photon noise on image sampling, because photon noise is always present in visual images. (Other sources of noise, such as neural noise, will be considered later.)

B. Effects of photon noise on image sampling

Photon noise is always present in images. It is not eliminated, or even reduced, by the blurring/filtering done by the optics of a visual system, because it is reintroduced at the time of photon absorption in photoreceptors or sensors. (Recall that the Poisson noise is introduced at absorption because, at each point in time and space, photons have a very small, independent probability of being absorbed.) Therefore, any reduction in the image contrast due to optics always reduces the strength of the visual signal without reducing the photon noise level. In other words, the optics always produces some irreversible loss of visual information.
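The claim that optical blur lowers the signal without lowering the photon noise can be illustrated with a simple detector model. The sketch below assumes a matched-template observer looking for a sinusoidal modulation of contrast c on a background of N mean photons per sample over n samples spanning whole cycles. The template response then has mean N·c·n/2 and Poisson variance N·n/2, giving an SNR of c·sqrt(N·n/2). The observer model, the parameter values, and the MTF factor of 0.5 are all assumptions for illustration. Because the blur multiplies c but leaves N (and therefore the noise) unchanged, the SNR falls by the same factor.

```python
import numpy as np

def template_snr(mean_count, contrast, n_samples):
    """SNR of a matched cosine template for a modulation in Poisson noise.

    Assumes counts with mean N*(1 + c*cos(theta_i)) sampled over whole cycles,
    so the template response has mean N*c*n/2 and variance N*n/2.
    """
    return contrast * np.sqrt(mean_count * n_samples / 2.0)

def monte_carlo_snr(mean_count, contrast, n_samples, n_trials=4000, seed=1):
    """Simulated SNR of the same template, for checking the formula."""
    rng = np.random.default_rng(seed)
    theta = 2 * np.pi * np.arange(n_samples) / n_samples   # one full cycle
    template = np.cos(theta)
    mean = mean_count * (1.0 + contrast * template)
    counts = rng.poisson(mean, size=(n_trials, n_samples))
    resp = (counts - mean_count) @ template
    return resp.mean() / resp.std()

N, c, n = 200.0, 0.2, 64
mtf = 0.5   # hypothetical optical attenuation of contrast at this frequency
snr_sharp = template_snr(N, c, n)
snr_blurred = template_snr(N, mtf * c, n)
# Blur scales the contrast by 0.5 but leaves the photon noise unchanged.
ratio = snr_blurred / snr_sharp
```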

A precise sampled representation of the optical image, n(x, y, t, λ), that would allow exact reconstruction (as envisioned by the sampling theorem) is impossible because of photon noise. Even if the wavelength and location in space/time of every photon could be precisely recorded (which is not completely possible because of the Heisenberg uncertainty principle), it would still be impossible to precisely reconstruct the optical image. The best one can do is to use a maximum likelihood procedure to estimate the optical image from the recorded samples. Such a procedure involves picking the most probable (likely) optical image using the encoded samples and any prior knowledge or constraints. This prior knowledge would include the optical

Figure 6.

vision; specifically rods are responsible for encoding images under low light conditions and cones under high light conditions.

Photoreceptors can be divided into five basic parts: the outer segment (at the top), the inner segment, the cell body, the axon, and the synaptic terminal (at the bottom). Each outer segment contains millions of light-sensitive photopigment molecules. A photopigment molecule changes its shape if it absorbs a photon in the correct way. This change in shape triggers a series of chemical reactions that ultimately result in modulation of the release of transmitter molecules (glutamate) from the synaptic terminals. The synaptic transmitter molecules, in turn, stimulate the next layer of retinal neurons. (This chain of events will be described in a bit more detail later.)

Figure 6.

Figure 6.4 shows a horizontal cross section of the retina at the receptor layer in the fovea (top), at 5 deg eccentricity (middle), and at 10 deg eccentricity (bottom). The rods are the little black dots that can be seen at 5 and 10 deg eccentricity. The larger circular regions are cones. The rods are 1-2 μm (millionths of a meter) in diameter; the cones are 2-3 μm in diameter in the fovea, but increase in diameter away from the fovea. As can be seen, the photoreceptors are not uniformly distributed across the retina. The cones are very densely packed in the fovea, and their density decreases quickly with eccentricity. The rods are entirely absent in the fovea and increase in density out to approximately 20 deg eccentricity, beyond which their density begins to decline. Figure 6.5 shows quantitative data on the density of rods and cones as a function of eccentricity.

Figure 6.

There are several potential weaknesses in this argument that need to be considered. (a) The receptors have a finite light collection area, the receptor aperture, and hence do not sample the image like a collection of delta functions (as assumed in the derivation of the sampling theorem). However, this does not affect the argument, because the small blurring effect of the cone aperture can be lumped with the point-spread function of the optical system (see later discussion of receptive-field sampling). (b) There are (as we will see below) three different types of cones, each with a different photopigment, sensitive to different ranges of wavelengths. One might think that this would produce aliasing effects for some colors in images. Actually, this is not the case, because the spectral absorption ranges of the photopigments are large and overlapping, and because naturally occurring chromatic spectra tend to be broad and smooth. (c) A more serious weakness, discussed earlier, is that the sampling theorem does not take into account the effects of noise. Because of noise, perfect reconstruction of an image is impossible. However, the general conclusions one can draw from the sampling theorem remain approximately true even when noise is considered.

When the pupil is larger (as in low light conditions) the optics of the eye become worse (see Figure 5.2). Under these conditions there are more receptors than necessary in the fovea. Moving away from the fovea, the cone sampling density drops precipitously (to 15-20 receptors/deg). The optics of the eye do not degrade very fast over this same range. Thus, in the periphery, the cones are often undersampling the retinal image.
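The undersampling argument reduces to comparing two numbers: the Nyquist limit of the receptor array (half its linear sampling density, from the one-dimensional theorem) and the highest frequency the optics pass. The sketch below applies this to the 15-20 receptors/deg quoted above. The peripheral optical cutoff of 20 cycles/deg is a hypothetical placeholder, not a value from these notes, and the function names are illustrative.

```python
def nyquist_limit(samples_per_deg):
    """Highest spatial frequency (cycles/deg) that a 1-D array with the given
    sampling density can represent, from the sampling theorem (w = 2 * w_c)."""
    return samples_per_deg / 2.0

def is_undersampling(samples_per_deg, optical_cutoff_cpd):
    """True if the optics pass frequencies above the array's Nyquist limit."""
    return optical_cutoff_cpd > nyquist_limit(samples_per_deg)

# Peripheral cone densities quoted in the text: 15-20 receptors/deg.
peripheral_limits = [nyquist_limit(d) for d in (15.0, 20.0)]

# Hypothetical optical cutoff for the peripheral retinal image (an assumption
# for illustration, not a measurement from the notes).
ASSUMED_OPTICAL_CUTOFF_CPD = 20.0
flags = [is_undersampling(d, ASSUMED_OPTICAL_CUTOFF_CPD) for d in (15.0, 20.0)]
```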

The human rods are absent in the fovea and sample at nearly the Nyquist rate in large areas of the periphery. However, the rod responses are spatially averaged in the subsequent layers of the retina, so that all the high-resolution image information in the rod responses is lost. The purpose of the rod receptor system is to encode very low intensity images. Apparently, the increased image noise at low light levels requires pooling the rod responses (see later discussion of the effects of noise).

D. Chromatic sampling by the photoreceptors

Both chromatic information (the wavelength distribution) and intensity information (the retinal irradiance) are encoded in the receptors by light-sensitive photopigment molecules contained in the outer segments (see Figure 28). Human cones contain between xx and xx photopigment molecules, human rods approximately $10^8$ photopigment molecules. Each photopigment molecule consists of two sub-molecules: a large protein molecule, an opsin, and a much smaller molecule, retinal. The opsin determines the sensitivity of the photopigment as a function of wavelength, and the retinal plays a crucial role in making the photopigment molecule sensitive to light.

When the photopigment is ready to absorb light, the molecule of retinal has a particular configuration (it is bent). When the photopigment molecule absorbs a photon, the retinal changes configuration (it straightens). The bent configuration is called 11-cis retinal and the straightened configuration all-trans retinal (see Figure 39). This change in configuration, called isomerization, occurs essentially instantaneously. Before exposure to light, the photopigment appears colored (because it absorbs certain wavelengths); a short while after the photopigment absorbs light, it becomes clear, or bleached (because it loses the ability to absorb light).

Figure 6. [Plot: cells per degree as a function of eccentricity (deg), with curves for rods, cones, and ganglion cells.]

Chromatic information is represented by the relative numbers of photopigment isomerizations (the number of effective photon absorptions) that occur in the different classes of receptor. The mean number of effective photon absorptions per second in a receptor is given by the product of the quantum irradiance distribution of the light entering the receptor, the effective light collection area of the receptor, the absorption spectrum of the receptor (the fraction of photons absorbed as a function of wavelength), and the isomerization efficiency (the mean number of isomerizations per photon absorbed). Thus, the number of quanta absorbed per second at wavelength λ is given by

$$N(\lambda) = \kappa\,\xi\,\alpha(\lambda)\,n(\lambda) \qquad (6.5)$$

measurements. Figure 6.7B shows the absorption spectrum for the rod photopigment. As can be seen, the absorption spectra are broad and overlapping. The functions in Figure 6.7 are normalized to a peak of 1.0. The actual fractions of photons absorbed at the peak wavelengths appear to be approximately 0.5 (peak optical density of 0.3) for all four types of human photoreceptors.
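The link between the quoted peak optical density and the fraction absorbed follows from the definition of (decadic) optical density: a layer of density D transmits 10^(−D) of the incident photons, so it absorbs 1 − 10^(−D). This short sketch (the function name is illustrative) confirms that D = 0.3 corresponds to roughly half the photons being absorbed.

```python
def fraction_absorbed(optical_density):
    """Fraction of incident photons absorbed by a pigment layer of the given
    decadic optical density D; the transmitted fraction is 10**(-D)."""
    return 1.0 - 10.0 ** (-optical_density)

peak = fraction_absorbed(0.3)   # about 0.499, i.e. roughly one photon in two
```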

There is one other important point to make before leaving the discussion of the absorption spectra. Namely, all evidence points to the conclusion that the receptor response following absorption of a photon is identical regardless of the wavelength of the photon that was absorbed. This implies that the receptor itself cannot signal the wavelength of light; it can only signal the number of photons that were absorbed. This is called the univariance property. Consequently, all wavelength information is necessarily carried by the differences in the response of the different types of receptors within a region. (This univariance property is also true of photodiodes, CCD arrays, and most other light measuring devices.) The univariance property is required in order to infer the absorption spectrum of the photopigment from the psychophysical color matching data or from the electrical responses of cones (the suction electrode technique).
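Univariance means that a single receptor class cannot tell a change in wavelength from a change in intensity. The sketch below assumes a hypothetical Gaussian absorptance curve peaking at 530 nm (an illustrative stand-in, not measured data). A 600 nm light whose quanta are increased by the ratio α(530)/α(600) produces exactly the same mean absorptions as a 1000-quantum light at 530 nm, so this receptor's output cannot distinguish the two.

```python
import numpy as np

def alpha(lam_nm, peak_nm=530.0, width_nm=45.0):
    """Hypothetical Gaussian absorptance spectrum, normalized to 1 at its peak."""
    return np.exp(-0.5 * ((lam_nm - peak_nm) / width_nm) ** 2)

def absorptions(quanta, lam_nm):
    """Mean absorptions from a monochromatic light (other constants lumped to 1)."""
    return alpha(lam_nm) * quanta

a = absorptions(1000.0, 530.0)                   # light A: 1000 quanta at 530 nm
lam_b = 600.0
quanta_b = 1000.0 * alpha(530.0) / alpha(lam_b)  # light B: intensity raised to compensate
b = absorptions(quanta_b, lam_b)
# a and b are equal, so this single receptor class responds identically to A and B.
```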

Figure 6. (panels A and B)

2. Chromatic sampling and color matching

In the human fovea, the photoreceptors sample the retinal image spatially at nearly the Nyquist rate (14,000 receptors/deg²), and hence are capable of extracting essentially all of the spatial information. However, because there are only three spectrally different types of cone photoreceptors, they are not capable of extracting very much chromatic information from the retinal image. Specifically, only three samples are taken over the entire visible range of wavelengths from 350 nm to 750 nm. The rods take another sample, but the evidence is that they contribute little to vision at the light levels where the cones are active. Hence, rods contribute little, if anything, to chromatic sampling in the visual system.
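The foveal density can be turned into a spatial-frequency limit with the two-dimensional sampling theorem, which requires $4u_cv_c$ samples per unit area. The sketch below assumes a square sampling lattice with equal cutoffs in x and y (u_c = v_c); the real foveal cone mosaic is closer to hexagonal, so the result should be read as approximate. With 14,000 receptors/deg² the limit comes out near 59 cycles/deg.

```python
import math

def isotropic_nyquist(samples_per_sq_deg):
    """Cutoff u_c = v_c (cycles/deg) at which 4 * u_c * v_c equals the sample density.

    Assumes a square sampling lattice with equal cutoffs in x and y; the real
    foveal cone mosaic is closer to hexagonal, so treat the result as approximate.
    """
    return math.sqrt(samples_per_sq_deg) / 2.0

FOVEAL_DENSITY = 14000.0                 # receptors/deg^2, as quoted in the text
u_c = isotropic_nyquist(FOVEAL_DENSITY)  # about 59 cycles/deg
```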

An engineer would certainly be impressed that the visual system samples spatially at the Nyquist rate determined by the diffraction limit of the eye, because that is as good as the physics will allow. On the other hand, an engineer would be very unimpressed by a chromatic sampling rate of less than 1 sample per 100 nm, because in artificial devices (e.g., spectral radiometers), sampling rates of 1 sample per nm, or better, are common and are needed in many applications.
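The spatial comparison in this paragraph can be checked with a short calculation. For a circular pupil of diameter D in incoherent light, the diffraction-limited cutoff frequency is D/λ cycles per radian. The pupil diameter (2 mm) and wavelength (555 nm) below are assumed values for illustration, and the cone limit uses the 14,000 receptors/deg² quoted above with a square-lattice assumption. Under these assumptions the optical cutoff (about 63 cycles/deg) and the cone Nyquist limit (about 59 cycles/deg) come out close to each other.

```python
import math

def diffraction_cutoff_cpd(pupil_diameter_m, wavelength_m):
    """Incoherent diffraction cutoff of a circular pupil in cycles/deg.

    The cutoff is D / lambda cycles per radian; multiplying by pi/180
    converts cycles per radian to cycles per degree.
    """
    return (pupil_diameter_m / wavelength_m) * math.pi / 180.0

# Assumed viewing conditions for illustration: 2 mm pupil, 555 nm light.
cutoff = diffraction_cutoff_cpd(2e-3, 555e-9)   # about 63 cycles/deg

# Cone Nyquist limit from 14,000 cones/deg^2 and the 2-D sampling theorem
# (square lattice, equal cutoffs in x and y assumed).
cone_nyquist = math.sqrt(14000.0) / 2.0         # about 59 cycles/deg
```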

If the receptors do not sample the retinal image very well in the wavelength dimension then there should be substantial losses of information, with important consequences for visual performance. This, in fact, is the case. For example, human ability to discriminate different wavelength distributions is relatively poor compared to what it might be. As we shall see, many wavelength distributions that are physically very different appear identical to humans because there are just three types of cone receptor. For example, a pure red wavelength (e.g., 620 nm) added to the right amount of a pure green wavelength (e.g., 530 nm) is completely indistinguishable from the right amount of a pure yellow wavelength (e.g., 575 nm).

Each of the three cone receptors has a different absorptance spectrum (Figure 6.7). If we let R, G, and B be the total photons absorbed in the individual red-, green- and blue-sensitive receptors, and if we let $\alpha_r(\lambda)$, $\alpha_g(\lambda)$, and $\alpha_b(\lambda)$ be their absorptance spectra, then

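$$\begin{aligned}
R &= \int \kappa\,\alpha_r(\lambda)\,n(\lambda)\,d\lambda \\
G &= \int \kappa\,\alpha_g(\lambda)\,n(\lambda)\,d\lambda \\
B &= \int \kappa\,\alpha_b(\lambda)\,n(\lambda)\,d\lambda
\end{aligned}$$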

These equations imply that an arbitrary spectral distribution, n(λ), is always represented in the receptors by a triple of numbers (R, G, B). In other words, the receptors map the potentially infinite-dimensional space of spectral irradiance distributions into a three-dimensional space: the quantum absorptions in the three classes of cone.
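The integrals can be approximated numerically on a wavelength grid, which makes the reduction to three numbers explicit. In this sketch the absorptance curves are crude Gaussian stand-ins whose peaks and widths are assumptions (not measured cone spectra), κ is set to 1, and the two test spectra are hypothetical. Whatever spectrum is passed in, `cone_triple` returns exactly three values.

```python
import numpy as np

lam = np.arange(380.0, 781.0, 1.0)   # wavelength grid in nm
D_LAMBDA = 1.0                        # grid spacing in nm

def gaussian(peak_nm, width_nm):
    return np.exp(-0.5 * ((lam - peak_nm) / width_nm) ** 2)

# Hypothetical absorptance spectra: crude Gaussian stand-ins for the R, G and B
# cone curves. Peaks and widths are assumptions, not measured values.
ALPHA = np.stack([gaussian(560.0, 50.0),   # R
                  gaussian(530.0, 45.0),   # G
                  gaussian(420.0, 30.0)])  # B
KAPPA = 1.0   # lumped constant kappa, set to 1 for illustration

def cone_triple(n):
    """Approximate (R, G, B) = integral of kappa * alpha_k(lambda) * n(lambda) d lambda."""
    return KAPPA * (ALPHA * n).sum(axis=1) * D_LAMBDA

reddish = 20.0 * gaussian(620.0, 40.0)   # hypothetical long-wavelength spectrum
bluish = 20.0 * gaussian(450.0, 40.0)    # hypothetical short-wavelength spectrum
rgb_reddish = cone_triple(reddish)       # each spectrum collapses to 3 numbers
rgb_bluish = cone_triple(bluish)
```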

Because of the univariance property and the additivity of photon absorptions (under most conditions), the low rate of chromatic sampling has some powerful implications concerning which spectral quantum distributions should be indistinguishable to the visual system (i.e., which distributions should be "aliases" of each other). Let $n_1(\lambda)$ and $n_2(\lambda)$

It is also important to note that even though the constants are sometimes negative, the three constants can be used as a concise, general specification of a wavelength distribution for purposes of viewing by humans. This is the principle underlying the CIE standard for specifying color (see chapter II). The tristimulus functions, $\bar{x}(\lambda)$, $\bar{y}(\lambda)$ and $\bar{z}(\lambda)$, are a linear transformation of the cone absorption spectra (multiplied by the transmittance of the ocular media). Any two stimuli that have identical tristimulus values (X, Y, Z) [computed using $\bar{x}(\lambda)$, $\bar{y}(\lambda)$ and $\bar{z}(\lambda)$] produce the same absorptions in the receptors, and hence are indistinguishable to human observers.
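A color-matching experiment can be mimicked with a small linear solve. This sketch again uses hypothetical Gaussian cone curves, picks three hypothetical monochromatic primaries (620, 530 and 450 nm), and solves for the amounts of each primary whose mixture produces the same (R, G, B) as a 490 nm test light. The mixture is physically very different from the test light yet gives identical absorptions. With these assumed curves the 620 nm weight comes out negative, which in a real matching experiment corresponds to adding that primary to the test field instead. The matrix `M` is an arbitrary invertible stand-in for a CIE-style linear transform, not the actual CIE coefficients; equal cone triples map to equal transformed values under any such matrix.

```python
import numpy as np

lam = np.arange(380.0, 781.0, 1.0)   # wavelength grid in nm, 1 nm steps

def gaussian(peak_nm, width_nm):
    return np.exp(-0.5 * ((lam - peak_nm) / width_nm) ** 2)

# Hypothetical Gaussian stand-ins for the R, G, B cone absorptance spectra.
ALPHA = np.stack([gaussian(560.0, 50.0), gaussian(530.0, 45.0), gaussian(420.0, 30.0)])

def cone_triple(n):
    """(R, G, B) via a 1 nm Riemann sum, with kappa set to 1."""
    return (ALPHA * n).sum(axis=1)

def monochromatic(peak_nm):
    """Unit quanta placed in the single 1 nm bin nearest peak_nm."""
    n = np.zeros_like(lam)
    n[np.argmin(np.abs(lam - peak_nm))] = 1.0
    return n

# Hypothetical primaries; column j of P is the cone triple of primary j.
primaries = [monochromatic(p) for p in (620.0, 530.0, 450.0)]
P = np.column_stack([cone_triple(p) for p in primaries])

test_light = monochromatic(490.0)
weights = np.linalg.solve(P, cone_triple(test_light))  # amount of each primary
mixture = sum(wt * p for wt, p in zip(weights, primaries))

# Arbitrary invertible stand-in for a CIE-style transform (NOT the real CIE matrix).
M = np.array([[1.0, 0.5, 0.1], [0.6, 1.0, 0.0], [0.0, 0.05, 1.2]])
xyz_test = M @ cone_triple(test_light)
xyz_mix = M @ cone_triple(mixture)
```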

Finally, it is worth noting that properties equivalent to (6.11)-(6.13) hold for spatial sampling as well. However, because the spatial sampling rate is much higher than the chromatic sampling rate, the dimension of the sample space is much higher than three, making the properties less useful.

3. Spatial distributions of R, G and B cones

As shown in Figure 6.4, the rods and cones are not distributed evenly across the retina. The rods are concentrated in the periphery, and absent in the fovea; whereas, the cones are concentrated in the fovea, while existing in smaller, yet substantial, numbers everywhere else. In addition, the three classes of cones are not distributed equally across the retina. The B cones are the fewest in number. They are absent in the central 20 min of the fovea (in most, but not all, humans) and constitute only about 10% of the cones. They form a sparse, fairly regular mosaic. The relative numbers and spatial distributions of the R and G cones are not known precisely. The best current estimates are that the R cones are 1.5 to 2 times more numerous than the G cones, and that both cone types appear to have a random spatial distribution. As mentioned earlier, the large overlap of the R and G cone absorption spectra and the broad spectral power distributions of natural lights minimize aliasing artifacts in the receptor coding of natural images. The greatest potential for aliasing effects is in the B cones because they are so sparse; however, even here the effects on visual performance are difficult to observe except under careful laboratory conditions (Williams, ).

E. Intensity and temporal coding by the photoreceptor

The spatial distribution of light in the retinal image is sampled by a discrete array of photoreceptors. The chromatic distribution of light is sampled by three different classes of cone, each containing a different photopigment. Intensity (or grey level) and the temporal distribution of light are not discretely sampled by the photoreceptors, but are encoded as graded potentials (levels of voltage or current flow) between the inside and outside of the receptor. The only really discrete aspect of intensity and temporal coding in the photoreceptor is the quantal nature of the photon absorption.

Once a photopigment molecule is isomerized by the absorption of a photon, it acts (for a brief period of time) as a catalyst for a series of chemical reactions that ultimately result in a change in the potential (and the current flow) between the inside and outside of the receptor. This change in potential modulates the release of synaptic transmitter substances from the receptor terminals onto the dendritic surfaces of the next layer of retinal neurons. This visual transduction process occurs quickly (on the order of milliseconds).

In addition to the visual transduction process, there is a separate, slower chemistry (which takes on the order of minutes) concerned with returning the photopigment molecules back into the 11-cis configuration. This process is known as regeneration. In rods the regeneration process has an exponential time constant of approximately 7 min (i.e., every 7 min, 63% of the remaining bleached photopigment returns to the regenerated state). In cones, the regeneration time constant is approximately 2 min. This slow regeneration is correlated with the slow recovery of light sensitivity that occurs when we transition from a light environment to a dark environment.
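The "63% per time constant" statement is what single-exponential recovery implies: after one time constant, 1 − e^(−1) ≈ 0.632 of the bleached pigment has regenerated. The sketch below assumes first-order kinetics starting from a full bleach and uses the rod (7 min) and cone (2 min) time constants given above; the function name and the sample times are illustrative.

```python
import math

ROD_TAU_MIN = 7.0    # regeneration time constant for rods (min), from the text
CONE_TAU_MIN = 2.0   # regeneration time constant for cones (min), from the text

def fraction_regenerated(t_min, tau_min):
    """Fraction of the initially bleached pigment regenerated after t_min minutes,
    assuming single-exponential (first-order) recovery with time constant tau_min."""
    return 1.0 - math.exp(-t_min / tau_min)

# (time in min, rod fraction, cone fraction) after a full bleach.
table = [(t, fraction_regenerated(t, ROD_TAU_MIN), fraction_regenerated(t, CONE_TAU_MIN))
         for t in (2.0, 7.0, 20.0)]
```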

1. Visual transduction

Figure 6.8 summarizes some of the important biophysical aspects of the photoreceptor response to light. This figure refers to the rod photoreceptor, but essentially the same story holds for the cone photoreceptors.

Embedded within the outer membrane of the receptor there are (very small) gated channels selective for sodium ions (Na+), non-gated (leakage) channels selective for potassium ions (K+), and an active transport mechanism that exchanges internal Na+ for external K+ (i.e., a Na+/K+ pump). In the dark, the sodium channels are held open by the high concentration of an "internal transmitter" molecule called cyclic GMP (cGMP). Under these conditions, the receptor is highly active: sodium (and calcium) rushes in through the open Na+ channels, potassium rushes out through the K+ channels, and the Na+/K+ pump performs the reverse operation. In the dark, the potential between the inside and outside of the receptor is typically around -40 millivolts (mV) and the current flow (the photocurrent) is around -50 picoamperes (pA). (See Figure 6.8 C.)

As described earlier, when a photopigment molecule absorbs a photon it briefly enters an active state from which the photopigment molecule initiates a chain of chemical reactions inside the receptor. The end result of these chemical reactions is a reduction in the concentration of the internal transmitter, cGMP. When the concentration of cGMP is reduced, some of the Na+ channels in the receptor membrane close (see Figure 6.8 B).

The black arrows indicate the steps leading to the reduction in the concentration of cGMP, the open arrows the restoration of cGMP.

Figure 6.

a. Impulse response function of the receptor photocurrent

Photocurrent responses of a rod photoreceptor to brief light flashes of various intensities are shown in Figure 6.10A (Baylor, Nunn, and Schnapf, 1984). The photocurrent responses to the different intensities have been superimposed, and have been normalized to the photocurrent in the dark. The flash intensities varied from about 3 effectively absorbed photons (isomerizations) per flash to about 850 absorbed photons per flash. As can be seen, the responses grow steadily as the flash intensity is increased, until the higher flash levels where "response saturation" (flattening of the peak response) begins to occur. However, even in the saturation range, the duration of the photocurrent response increases with flash intensity.

The typical explanation for this behavior is that the cGMP concentration response to brief flashes is described (approximately) by a linear system, and that the cGMP controls the gating of the Na+ channels. Thus, the flash response of the cGMP does not saturate, but increases roughly in proportion to flash intensity. The photocurrent is assumed to be proportional to the number of open Na+ channels; however, there are only a finite number of channels, and so the photocurrent saturates when all the channels become closed. The longer response to more intense saturating flashes presumably occurs because the larger cGMP response keeps the Na+ channels closed for a longer time.
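This explanation can be reproduced with a toy model: a linear stage whose response grows in proportion to flash strength, followed by a saturating stage for the fraction of dark current suppressed. Both the linear response shape and the exponential saturation 1 − exp(−k·r) are assumptions chosen for illustration, as are `TAU` and `k`; they are not fitted to the Baylor, Nunn, and Schnapf data. The flash strengths span the 3 to 850 absorbed photons mentioned above. The peak response flattens near 1 for the brightest flashes, while the time spent above half-saturation keeps growing, matching the qualitative behavior described in the text.

```python
import numpy as np

TAU = 0.2   # hypothetical time constant (s) of the linear stage

def linear_response(t, flash):
    """Hypothetical linear (non-saturating) cGMP response, proportional to flash strength.

    The shape (t/TAU)**3 * exp(-t/TAU), scaled to a peak of 1 per absorbed
    photon, is an assumption chosen only for illustration.
    """
    h = (t / TAU) ** 3 * np.exp(-t / TAU)
    return flash * h / h.max()

def suppressed_fraction(t, flash, k=0.02):
    """Fraction of the dark photocurrent suppressed, assuming channel closure
    saturates exponentially with the linear response (k is an assumed constant)."""
    return 1.0 - np.exp(-k * linear_response(t, flash))

def time_above(t, resp, level=0.5):
    """Duration for which the response stays at or above `level`."""
    above = t[resp >= level]
    return float(above[-1] - above[0]) if above.size else 0.0

t = np.linspace(0.0, 3.0, 3001)   # seconds
flashes = [3, 30, 300, 850]        # absorbed photons per flash (range quoted in the text)
peaks = [suppressed_fraction(t, f).max() for f in flashes]
durations = [time_above(t, suppressed_fraction(t, f)) for f in flashes]
```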

Figure 6.

For pulses of light that are not too intense, the photocurrent is proportional to the internal transmitter concentration. Under these circumstances the receptor transduction (up to the photocurrent) can be described approximately as a linear system. Let the impulse-response function of the cGMP concentration be $h_0(t)$. In rods, $h_0(t)$ appears to be well approximated by a cascade of several exponential low-pass filters (i.e., a gamma function):

$$h_0(t) = \gamma \left(\frac{t}{\tau}\right)^{\eta - 1} e^{-t/\tau} \qquad (6.17)$$
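Why a cascade of exponential low-pass filters gives a gamma function can be checked numerically. Convolving η identical first-order kernels e^(−t/τ)/τ yields t^(η−1) e^(−t/τ) / (τ^η (η−1)!), which is the form γ(t/τ)^(η−1) e^(−t/τ) with γ = 1/(τ(η−1)!). The sketch below compares a discrete convolution of η stages against that closed form. The values of τ and η are hypothetical, not fitted to rod data, and the unit-area normalization of γ is an assumption (the overall gain in (6.17) is a free scale factor).

```python
import math
import numpy as np

def gamma_impulse_response(t, tau, eta, gain=None):
    """h0(t) = gain * (t/tau)**(eta - 1) * exp(-t/tau), for integer eta >= 1.

    If gain is None it is set to 1 / (tau * (eta - 1)!), the value produced by
    a cascade of eta unit-area exponential stages (so h0 integrates to 1).
    """
    if gain is None:
        gain = 1.0 / (tau * math.factorial(eta - 1))
    return gain * (t / tau) ** (eta - 1) * np.exp(-t / tau)

def cascade_of_exponentials(t, tau, eta):
    """Numerically convolve eta copies of the first-order low-pass kernel exp(-t/tau)/tau.

    t must be a uniform grid starting at 0; the kernels are causal, so keeping
    only the first t.size output samples introduces no truncation error.
    """
    dt = t[1] - t[0]
    stage = np.exp(-t / tau) / tau
    h = stage.copy()
    for _ in range(eta - 1):
        h = np.convolve(h, stage)[: t.size] * dt
    return h

t = np.arange(0.0, 0.5, 1e-4)   # seconds
tau, eta = 0.05, 4               # hypothetical values, not fitted to rod data
analytic = gamma_impulse_response(t, tau, eta)
numeric = cascade_of_exponentials(t, tau, eta)
rel_err = np.max(np.abs(numeric - analytic)) / analytic.max()
```

The peak of the gamma function falls at t = (η − 1)τ, so adding stages both delays and broadens the response.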