## Abstract

Recent developments in computational photography have enabled variation of
the optical focus of a plenoptic camera after image exposure, also
known as *refocusing*. Existing ray models in the field
simplify the camera’s complexity for the purpose of image and depth
map enhancement, but fail to satisfyingly predict the distance to
which a photograph is refocused. By treating a pair of light rays as a
system of linear functions, it will be shown in this paper that its
solution yields an intersection indicating the distance to a refocused
object plane. Experimental work is conducted with different lenses and
focus settings while comparing distance estimates with a stack of
refocused photographs for which a blur metric has been devised.
Quantitative assessments over a 24 m distance range suggest that
predictions deviate by less than 0.35 % in comparison to an optical
design software. The proposed refocusing estimator assists in
predicting object distances as well as in the prototyping stage of
plenoptic cameras and will be an essential feature in applications
demanding high precision in synthetic focus or where depth map
recovery is done by analyzing a stack of refocused photographs.

© 2016 Optical Society of America

## 1. Introduction

With a conventional camera, angular information of light rays is lost at the moment of image acquisition, since the irradiance of all rays striking a sensor element is averaged regardless of the rays’ incident angle. Light rays originating from an object point that is out of focus will be scattered across many sensor elements. This becomes visible as a blurred region and cannot be satisfyingly resolved afterwards. To overcome this problem, an optical imaging system is required to enable detection of the light rays’ direction. Plenoptic cameras achieve this by capturing each spatial point from multiple perspectives.

The first stages in the development of the plenoptic camera can be traced
back to the beginning of the previous century [1, 2]. At
that time, just as today, it was the goal to recover image depth by
attaching a light transmitting sampling array, i.e. made from pinholes or
micro lenses, to an imaging device of an otherwise traditional camera
[3]. One attempt to adequately
describe light rays traveling through these optical hardware components is
the *4-Dimensional* (4-D) light field notation [4] which gained popularity among image
scientists. In principle, a captured 4-D light field is characterized by
rays piercing two planes with respective coordinate space
(*s*, *t*) and (*u*,
*v*) that are placed behind one another. Provided with the
distance between these planes, the four coordinates (*u*,
*v*, *s*, *t*) of a single
ray indicate its angle and, if combined with other rays in
the light field, allow depth information to be inferred. Another
fundamental breakthrough in the field was the discovery of a synthetic
focus variation after image acquisition [5]. This can be thought of as layering and shifting viewpoint
images taken by an array of cameras and merging their pixel intensities.
Subsequently, this conceptual idea was transferred to the plenoptic camera
[6]. It has been pointed out that
the maximal depth resolution is achieved when positioning the
*Micro Lens Array* (MLA) one focal length away from the
sensor [7]. More recently, research
has investigated different MLA focus settings offering a resolution
trade-off in angular and spatial domain [8] and new related image rendering techniques [9]. To distinguish between camera types, the term
*Standard Plenoptic Camera* (SPC) was coined in [10] to describe a setup where an image
sensor is placed at the MLA’s focal plane as presented by [6].

While the SPC has made its way to the consumer photography market, our research group proposed ray models aiming to estimate distances which have been computationally brought to focus [11, 12]. These articles laid the groundwork for estimating the refocusing distance by regarding specific light rays as a system of linear functions. The system’s solution yields an intersection in object space indicating the distance from which rays have been propagated. The experimental results supplied in recent work showed matching estimates for far distances, but incorrect approximations for objects close to the SPC [12]. A benchmark comparison of the previous distance estimator [11] with a real ray simulation software [13] has revealed errors of up to 11 %. This was due to an approach inaccurate at locating micro image center positions.

It is demonstrated in this study that deviations in refocusing distance
predictions remain below 0.35 % for different lens designs and focus
settings. Accuracy improvements rely on the assumption that chief rays
impinging on *Micro Image Centers* (MICs) arise from the
exit pupil center. The proposed solution offers an instant computation and
will prove to be useful in professional photography and motion picture
arts which require precise synthetic focus measures.

This paper is outlined as follows. Section 2 derives an efficient image synthesis to reconstruct photographs with a varying optical focus from an SPC. Based on the developed model, Section 3 aims at representing light rays as functions and shows how the refocusing distance can be located. Following this, Section 4 is concerned with evaluating claims made about the synthetic focusing distance by using real images from our customized SPC and a benchmark assessment with a real ray simulation software [13]. Conclusions are drawn in Section 5 presenting achievements and an outlook for future work.

## 2. Standard plenoptic ray model

As a starting point, we deploy the well known thin lens equation which can be written as

$$\frac{1}{f_s} = \frac{1}{a_s} + \frac{1}{b_s} \tag{1}$$

where $f_s$ denotes the focal length, $b_s$ the image distance and $a_s$ the object distance with respect to a micro lens $s$. Since micro lenses are placed at a stationary distance $f_s$ in front of the image sensor of an SPC, $f_s$ equals the micro lens image distance ($f_s = b_s$). Therefore, $f_s$ may be substituted for $b_s$ in Eq. (1) which yields $a_s \to \infty$ after subtracting the term $1/f_s$. This means that rays converging on a distance $f_s$ behind the lens have emanated from a point at an infinitely far distance $a_s$. Rays coming from infinity travel parallel to each other which is known as the effect of collimation. To support this, it is assumed that image spots focusing at a distance $f_s$ are infinitesimally small. In addition, we regard micro image sampling positions $u$ to be discrete from which light rays are traced back through lens components. Figure 1 shows collimated light rays entering a micro lens and leaving main lens elements.
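The step from Eq. (1) to an infinitely distant object can be written out explicitly. With the sensor at the micro lens focal plane, i.e. $b_s = f_s$, the thin lens equation reduces to

$$\frac{1}{a_s} = \frac{1}{f_s} - \frac{1}{b_s} = \frac{1}{f_s} - \frac{1}{f_s} = 0 \quad\Longrightarrow\quad a_s \to \infty$$

so that every sensor point corresponds to a collimated beam in front of its micro lens.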

At the micro image plane, an MIC operates as a reference point $c = (M-1)/2$ where $M$ denotes the one-dimensional (1-D) micro image resolution which is seen to be consistent. Horizontal micro image samples are then indexed by $c + i$ where $i \in [-c, c]$. Horizontal micro image positions are given as $u_{c+i,j}$ where $j$ denotes the 1-D index of the respective micro lens $s_j$. A plenoptic micro lens illuminates several pixels $u_{c+i,j}$ and requires its lens pitch, denoted as $\Delta s$, to be greater than the pixel pitch $\Delta u$. Each chief ray arriving at any $u_{c+i,j}$ exhibits a specific slope $m_{c+i,j}$. For example, micro lens chief rays which focus at $u_{c-1,j}$ have a slope $m_{c-1,j}$ in common. Hence, all chief rays $m_{c-1,j}$ form a collimated light beam in front of the MLA.

In our previous model [11], it is
assumed that each MIC lies on the optical axis of its corresponding micro
lens. It was mentioned that this hypothesis would only be true where the
main lens is at an infinite distance from the MLA [14]. Because of the finite separation distance
between the main lens and the MLA, the centers of micro images deviate
from their micro lens optical axes. A more realistic attempt to
approximate MIC positions is to trace chief rays through optical centers
of micro and main lenses [15]. An
extension of this assumption is proposed in Fig. 1(b) where the center of the aperture’s exit
pupil *A′* is seen to be the origin of the MIC chief rays. It is
of particular importance to detect MICs correctly since they are taken as
reference origins in the image synthesis process. Contrary to previous
approaches [11, 12], all chief rays impinging on the MIC positions
originate from the exit pupil center which, for simplicity, coincides with
the main lens optical center in Fig.
2. All chief ray positions that are adjacent to MICs can be
ascertained by a corresponding multiple of the pixel pitch
Δ*u*.

It has been stated in [6] that the irradiance $I_{b_U}$ at a film plane $(s, t)$ of a conventional camera is obtained by

$$I_{b_U}(s,t) = \frac{1}{b_U^2} \iint L_{b_U}(s,t,U,V)\, A(U,V) \cos^4\theta \,\mathrm{d}U\,\mathrm{d}V \tag{2}$$

where $A(\cdot)$ denotes the aperture, $(U, V)$ the main lens plane coordinate space and $b_U$ the separation between the main lens and the film plane $(s, t)$. The factor $1/b_U^2$ is often referred to as the inverse-square law [16]. If $\theta$ is the incident ray angle, the roll-off factor $\cos^4\theta$ describes the gradual decline in irradiance from object points at an oblique angle impinging on the film plane, also known as *natural vignetting*. It is implied that coordinates $(s, t)$ represent the spatial domain in horizontal and vertical dimensions while $(U, V)$ denote the angular light field domain. To simplify Eq. (2), a horizontal cross-section of the light field is regarded hereafter so that $L_{b_U}(s,t,U,V)$ becomes $L_{b_U}(s,U)$. Thereby, subsequent declarations build on the assumption that camera parameters are equally specified in horizontal and vertical dimensions allowing propositions to be applied to both dimensions in the same manner. Since the overall measured irradiance $I_{b_U}$ is scalable (e.g. on electronic devices) without affecting the light field information, the inverse-square factor $1/b_U^2$ may be omitted at this stage. On the supposition that the main lens aperture is seen to be completely open, the aperture term becomes $A(\cdot) = 1$. To further simplify, $\cos^4\theta$ will be neglected given that pictures do not exhibit *natural vignetting*. Provided these assumptions, Eq. (2) can be shortened yielding

$$I_{b_U}(s) = \int L_{b_U}(s,U)\,\mathrm{d}U. \tag{3}$$

Suppose that the entire light field $L_{b_U}(s,U)$ is located at plane $U$ in the form of $I_U(s,U)$ since all rays of a potentially captured light field travel through $U$. From this it follows that

$$L_{b_U}(s,U) = I_U(s,U). \tag{4}$$

Consider a sensor plane $I_{f_s}(s,u)$ located one focal length $f_s$ behind $I_{b_U}$ with $u$ as a horizontal and $v$ as a vertical angular sampling domain in the 2-D case. The former spatial image plane $I_{b_U}(s,t)$ is now replaced by an MLA, enabling light to pass through and strike the new sensor plane $I_{f_s}(s,u)$. When applying the method of similar triangles to Fig. 3, it becomes apparent that $I_U(s,U)$ is directly proportional to $I_{f_s}(s,u)$ which gives

$$I_U(s,U) \propto I_{f_s}(s,u) \tag{5}$$

where $\propto$ designates the equality up to scale. When ignoring the scale factor in Eq. (5), which simply lowers the overall irradiance, $I_{f_s}(s,u)$ and $I_U(s,U)$ become equal. From this it follows that Eq. (3) can be written as

$$I_{b_U}(s) = \int I_{f_s}(s,u)\,\mathrm{d}u. \tag{6}$$

In keeping with human visual perception, photosensitive sensors limit the irradiance signal spectrum to the visible wavelength range. For this purpose, bandpass filters are placed in the optical path of present-day cameras, which prevents infrared and ultraviolet radiation from being captured. Therefore, Eq. (6) will be rewritten as

$$E_{b_U}(s) = \int E_{f_s}(s,u)\,\mathrm{d}u \tag{7}$$

in order that photometric illuminances $E_{b_U}$ and $E_{f_s}$ substitute irradiances $I_{b_U}$ and $I_{f_s}$ in accordance with the luminosity function [17]. Besides, it is assumed that $E_{f_s}(s,u)$ is a monochromatic signal being represented as a gray scale image. Recalling index notations of the derived model, a discrete equivalent of Eq. (7) may be given by

$$E_{b_U}[s_j] = \sum_{i=-c}^{c} E_{f_s}[s_j, u_{c+i}] \tag{8}$$

provided that the sample width $\Delta u$ is neglected here as it simply scales the overall illuminance $E_{b_U}[s_j]$ while preserving relative brightness levels. It is further implied that indices in the vertical domain are constant meaning that only a single horizontal row of sampled $s_j$ and $u_{c+i}$ is regarded in the following. Nonetheless, subsequent formulas can be applied in the vertical direction under the assumption that indices are interchangeable and thus of the same size. Equation (8) serves as a basis for refocusing syntheses in spatial domain.

Invoking the *Lambertian* reflectance, an object point scatters light in all directions uniformly, meaning that each ray coming from that point carries the same energy. With this, an object placed at plane $a = 0$ reflects light with a luminous emittance $M_a$. An example which highlights the rays’ path starting from a spatial point $s'$ at object plane $M_0$ is shown in Fig. 4.

Closer inspection of Fig. 4 reveals that the luminous emittance $M_0$ at a discrete point $s'_0$ may be seen as projected onto a micro lens $s_0$ and scattered across micro image pixels $u$. In the absence of reflection and absorption at the lens material, a synthesized image $E'_a[s_j]$ at the MLA plane ($a = 0$) is recovered by integrating all illuminance values $u_{c+i}$ for each $s_j$. Taking $E'_0[s_0]$ as an example with $M = 3$ and $c = 1$, this is mathematically given by

$$E'_0[s_0] = E_{f_s}[s_0, u_0] + E_{f_s}[s_0, u_1] + E_{f_s}[s_0, u_2]. \tag{9}$$

Similarly, an adjacent spatial point $s_1$ in $E'_0$ can be retrieved by

$$E'_0[s_1] = E_{f_s}[s_1, u_0] + E_{f_s}[s_1, u_1] + E_{f_s}[s_1, u_2]. \tag{10}$$

Generalizing this concept,

$$E'_0[s_j] = \sum_{i=-c}^{c} E_{f_s}[s_j, u_{c+i}] \tag{11}$$

reconstructs an image $E'_0[s_j]$ as it appeared on the MLA by summing up all pixels within each micro image to form a respective spatial point of that particular plane. As claimed, refocusing allows more than only one focused image plane to be recovered. Figure 5 depicts rays emitted from an object point located closer to the camera device ($a = 1$).

For comprehensibility, light rays have been extended on the image side in Fig. 5 yielding an intersection at a distance where the corresponding image point would have focused without the MLA and image sensor. The presence of both, however, enables the illuminance of an image point to be retrieved as it would have appeared with a conventional sensor at $E'_1$. Further analysis of light rays in Fig. 5 unveils coordinate pairs $[s_j, u_{c+i}]$ that have to be considered in an integration process synthesizing $E'_1$. Accordingly, the illuminance $E'_1$ at point $s_0$ can be obtained as follows

$$E'_1[s_0] = E_{f_s}[s_2, u_0] + E_{f_s}[s_1, u_1] + E_{f_s}[s_0, u_2]. \tag{12}$$

Likewise, the adjacent image point $s_1$ is formed by calculating

$$E'_1[s_1] = E_{f_s}[s_3, u_0] + E_{f_s}[s_2, u_1] + E_{f_s}[s_1, u_2]. \tag{13}$$

The observation that the index $j$ has simply been incremented by 1 from Eq. (12) to Eq. (13) allows conclusions to be drawn about the final refocusing synthesis equation which reads

$$E'_a[s_j] = \sum_{i=-c}^{c} E_{f_s}\left[s_{j+a(c-i)}, u_{c+i}\right] \tag{14}$$

and allows any plane $a$ to be recovered. In Eq. (14) it is assumed that synthesized intensities $E'_a[s_j]$ ignore *clipping* which occurs when quantized values exceed the maximum amplitude of the given bit depth range. Thus, Eq. (14) only applies to underexposed plenoptic camera images on condition that peaks in $E'_a[s_j]$ do not surpass the quantization limit. To prevent *clipping* during the refocusing process, one can simply average intensities $E_{f_s}$ prior to summing them up as provided by

$$E'_a[s_j] = \sum_{i=-c}^{c} \frac{1}{M}\, E_{f_s}\left[s_{j+a(c-i)}, u_{c+i}\right], \quad a \in \mathbb{Q}. \tag{15}$$

Letting $a \in \mathbb{Q}$ involves an interpolation of micro images which increases the spatial and angular resolution at the same time. In such a scenario, the denominator of the fraction $a$ represents the upsampling factor for the number of micro images.
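The shift-and-sum synthesis of Eq. (15) can be sketched for a single horizontal row as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes an integer slice $a$ (fractional $a$ would require the micro image interpolation mentioned above), an odd micro image resolution $M$, and a light field already arranged as an array of shape `(J, M)`. The function name `refocus_1d` and the array layout are hypothetical choices made for this example.

```python
import numpy as np

def refocus_1d(E_fs, a):
    """Refocus one row of a plenoptic image by shift-and-sum, cf. Eq. (15).

    E_fs : array of shape (J, M); E_fs[j, k] is the illuminance under
           micro lens s_j at micro image pixel u_k with k = c + i.
    a    : integer refocusing slice (assumption of this sketch).
    """
    J, M = E_fs.shape
    c = (M - 1) // 2                      # micro image centre index
    out = np.zeros(J)
    for i in range(-c, c + 1):
        shift = a * (c - i)               # micro lens offset j -> j + a(c - i)
        src = np.arange(J) + shift
        valid = (src >= 0) & (src < J)    # samples beyond the MLA are dropped
        out[valid] += E_fs[src[valid], c + i] / M   # average to avoid clipping
    return out
```

For `a = 0` the function averages each micro image, which corresponds to the image at the MLA plane. Output samples near the row ends receive fewer contributions because the shifted micro lens index leaves the array.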

Note that our implementation of the image synthesis employs an algorithmic
MIC detection with sub-pixel precision as suggested by Cho *et
al.* [18] and resamples the
angular domain
*u*_{c+i,j}
accordingly to suppress image artifacts in reconstructed photographs.

## 3. Focus range estimation

In geometrical optics, light rays are viewed as straight lines with a
certain angle in a given interval. These lines can be represented by
linear functions of *z* possessing a slope
*m*. By regarding the rays’ emission as an intersection of
ray functions, it may be viable to pinpoint their local origin. This
position is seen to indicate the focusing distance of a refocused
photograph. In order for it to function, the proposed concept requires the
geometry and thus the parameters of the camera system to be known. This
section develops a theoretical approach based on the realistic SPC model
to estimate the distance and *Depth of Field* (DoF) that
has been computationally brought into focus. A Matlab implementation of
the proposed distance estimator can be found online (see
Code
1, [19]).

#### 3.1. Refocusing distance

In previous studies [11, 12], the refocusing distance has been found by geometrically tracing light rays through the lenses and finding their intersection in object space. Alternatively, rays can be seen as intersecting behind the sensor which is illustrated in Fig. 5. The convergence of a selected image-side ray pair indicates where the respective image point would have focused in the absence of MLA and sensor. Locating this image point provides a refocused image distance $b_U'$ which may help to get the refocused object side distance $a_U'$ when applying the thin lens equation. It will be demonstrated hereafter that the ray intersection on the image side requires fewer computational steps as the ascertainment of two object-side ray slopes becomes redundant. For conciseness, we trace rays along the central horizontal axis, although subsequent equations can be equally employed in the vertical domain which produces the same distance result. First of all, it is necessary to define the optical center of an SPC image by letting the micro lens index be $j = o$ where

$$o = \frac{J-1}{2} \tag{16}$$

and $J$ is the total number of micro lenses in the horizontal direction. Given the micro lens diameter $\Delta s$, the horizontal position of a micro lens’ optical center is given by

$$s_j = (j - o)\,\Delta s \tag{17}$$

where $j$ is seen to start counting from 0 at the leftmost micro lens with respect to the main lens optical axis. As rays impinging on MICs are seen to connect an optical center of a micro lens $s_j$ and the exit pupil $A'$, their respective slope $m_{c,j}$ may be given by

$$m_{c,j} = -\frac{s_j}{d_{A'}} \tag{18}$$

where $d_{A'}$ denotes the separation between exit pupil plane and the MLA’s front vertex. Provided the MIC chief ray slope $m_{c,j}$, an MIC position $u_{c,j}$ is estimated by extending $m_{c,j}$ until it intersects the sensor plane which is calculated by

$$u_{c,j} = -m_{c,j}\, f_s + s_j. \tag{19}$$

Central positions of adjacent pixels $u_{c+i,j}$ are given by the number of pixels $i$ separating $u_{c+i,j}$ from the center $u_{c,j}$. To calculate $u_{c+i,j}$, we simply compute

$$u_{c+i,j} = u_{c,j} + i\,\Delta u \tag{20}$$

which requires the pixel width $\Delta u$. The slope $m_{c+i,j}$ of a ray that hits a micro image at position $u_{c+i,j}$ is obtained by

$$m_{c+i,j} = \frac{s_j - u_{c+i,j}}{f_s}. \tag{21}$$

With this, each ray on the image side can be expressed as a linear function as given by

$$f_{c+i,j}(z) = m_{c+i,j}\, z + s_j, \quad z \in (-\infty, U]. \tag{22}$$

At this stage, it may be worth discussing the selection of appropriate rays for the intersection. A set of two chief ray functions meets the requirements to locate an object plane $a$ because all adjacent ray intersections lie on the same planar surface parallel to the sensor. It is of key importance, however, to select a ray pair that intersects at a desired plane $a$. In respect of the refocusing synthesis in Eq. (15), a system of linear ray functions is found by letting the index subscript in $f_{c+i,j}(z)$ be $\vec{A} = \{c+i, j\} = \{c-c, e\}$ for the first chief ray where $e$ is an arbitrary, but valid micro lens $s_e$ and $\vec{B} = \{c+i, j\} = \{c+c, e-a(M-1)\}$ for the second ray. Given the synthesis example depicted in Fig. 5, parameters would be $e = 2$, $a = 1$, $M = 3$, $c = 1$ such that corresponding ray functions are $f_{0,2}(z)$ for $E_{f_s}[s_2, u_0]$ and $f_{2,0}(z)$ for $E_{f_s}[s_0, u_2]$. Finally, the intersection of the chosen chief ray pair is found by solving

$$f_{\vec{A}}(z) = f_{\vec{B}}(z) \tag{23}$$

for $z$, which yields the image-side distance $d_{a'}$ from the MLA to the intersection where rays would have focused. Note that $d_{a'}$ is negative if the intersection occurs behind the MLA. Having $d_{a'}$, we get new image distances $b_U'$ of the particular refocused plane by calculating

$$b_U' = b_U - d_{a'}. \tag{24}$$

Related object distances $a_U'$ are retrieved by deploying the thin lens equation in a way that

$$a_U' = \left(\frac{1}{f_U} - \frac{1}{b_U'}\right)^{-1}. \tag{25}$$

With respect to the MLA location, the final refocusing distance $d_a$ can be acquired by summing up all parameters separating the MLA from the principal plane $H_{1U}$ as demonstrated in

$$d_a = b_U + \overline{H_{1U}H_{2U}} + a_U' \tag{26}$$

with $\overline{H_{1U}H_{2U}}$ as the distance which separates principal planes from each other.
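The chain from the selected ray pair to the refocusing distance described in this section can be sketched in Python as below. This is an illustrative sketch rather than the authors' Matlab implementation [19]. It assumes a $z$-axis with its origin at the MLA pointing towards the main lens (so that intersections behind the MLA are negative), all lengths in one consistent unit, and hypothetical parameter names (`d_exit_pupil` for $d_{A'}$, `H1H2` for $\overline{H_{1U}H_{2U}}$).

```python
import math

def refocus_distance(a, M, J, ds, du, f_s, d_exit_pupil, f_U, b_U, H1H2, e=None):
    """Distance d_a from the MLA to the object plane refocused by slice a."""
    c = (M - 1) / 2                      # micro image centre index
    o = (J - 1) / 2                      # index of the central micro lens
    if e is None:
        e = o                            # any valid micro lens index may be used

    def chief_ray(i, j):
        s_j = (j - o) * ds               # micro lens optical centre
        m_c = -s_j / d_exit_pupil        # MIC chief ray from the exit pupil centre
        u_c = -m_c * f_s + s_j           # MIC position on the sensor
        u = u_c + i * du                 # pixel centre i pixels away from the MIC
        m = (s_j - u) / f_s              # slope of the ray through u and s_j
        return m, s_j                    # ray function f(z) = m * z + s_j

    m_A, s_A = chief_ray(-c, e)                    # ray A = {c - c, e}
    m_B, s_B = chief_ray(+c, e - a * (M - 1))      # ray B = {c + c, e - a(M - 1)}
    z_int = (s_B - s_A) / (m_A - m_B)    # image-side intersection d_a'
    b_ref = b_U - z_int                  # refocused image distance
    inv_a = 1 / f_U - 1 / b_ref          # thin lens equation
    if inv_a <= 0:
        return math.inf                  # focused at or beyond infinity
    return b_U + H1H2 + 1 / inv_a
```

Because both rays are shifted together when `e` changes, the result does not depend on the chosen micro lens `e`. The sketch does not guard against parallel rays (`m_A == m_B`), which would need separate handling.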

#### 3.2. Depth of field

A focused image spot of a finite size, by implication, causes the focused depth range in object space to be finite as well. In conventional photography, this range is called *Depth of Field* (DoF). Optical phenomena such as aberrations or diffraction are known to limit the spatial extent of projected image points. However, most kinds of lens aberrations can be nominally eliminated through optical lens design (e.g. aspherical lenses, glasses of different dispersion). In that case, the circle of least confusion solely depends on diffraction and the imaging system is called *diffraction-limited*. Thereby, light waves that encounter a pinhole, aperture or slit of a size comparable to the wavelength $\lambda$ propagate in all directions and interfere at an image plane inducing wave superposition due to the ray’s varying path length and corresponding difference in phase. A diffracted image point is made up of a central disc possessing the major energy surrounded by rings with alternating intensity which is often referred to as *Airy* pattern [16]. According to Hecht [16], the radius $r_A$ of an *Airy* pattern’s central peak disc is approximately given by

$$r_A \approx 1.22\,\frac{f\lambda}{A} \tag{27}$$

where $f$ is the focal length and $A$ the aperture diameter. The radius $r_A$ serves to assess the optical resolution limit by means of the *Rayleigh* criterion. The *Rayleigh* criterion states that two image points of equal irradiance in the form of an *Airy* pattern need to be separated by a minimum distance $(\Delta\ell)_{\min} = r_A$ to be visually distinguishable. Let us suppose a *non-diffraction-limited* camera system in which the pixel pitch $\Delta u$ is larger than or equal to $(\Delta\ell)_{\min}$ at the smallest aperture diameter $A$. In this case, the DoF merely depends on the pixel pitch $\Delta u$. To distinguish between different pixel positions, we define three types of rays that are class-divided into:

- *central rays* at pixel centers $u_{c+i,j}$
- *inner rays* at pixel borders $u_{\{c+i,j\}-}$ towards the MIC
- *outer rays* at pixel borders $u_{\{c+i,j\}+}$ closer to the micro image edge

Similar to the acquisition of *central ray* positions $u_{c+i,j}$ in Section 3.1, pixel border positions $u_{\{c+i,j\}\pm}$ may be obtained as follows

$$u_{\{c+i,j\}\pm} = u_{c,j} + i\,\Delta u \pm \frac{\Delta u}{2} \tag{28}$$

where $u_{c,j}$ is taken from Eq. (19). Given $u_{\{c+i,j\}\pm}$ as spatial points at pixel borders, chief ray slopes $m_{\{c+i,j\}\pm}$ starting from these respective locations are given by

$$m_{\{c+i,j\}\pm} = \frac{s_j - u_{\{c+i,j\}\pm}}{f_s}. \tag{29}$$

Since border points are assumed to be infinitely small and positioned at the distance of one micro lens focal length, light rays ending up at $u_{\{c+i,j\}\pm}$ form collimated beams between $s$ and $U$ propagating with respective slopes $m_{\{c+i,j\}\pm}$ in that particular interval. The range that spans from the furthest to closest intersection of these beams defines the DoF. Closer inspection of Fig. 6 reveals that *inner rays* intersect at the close DoF boundary and pass through external micro lens edges. *Outer rays*, however, yield an intersection providing the furthest DoF boundary and cross internal micro lens edges. Therefore, it is of importance to determine micro lens edges $s_{j\pm}$ which is accomplished by

$$s_{j\pm} = s_j \pm \frac{\Delta s}{2}. \tag{30}$$

*Outer* and *inner rays* converging on the image side are seen to disregard the refraction at micro lenses and continue their path with $m_{\{c+i,j\}\pm}$ from the micro lens edge as depicted in Fig. 6. Hence, a linear function representing a light ray at a pixel border is given by

$$f_{\{c+i,j\}\pm}(z) = m_{\{c+i,j\}\pm}\, z + s_{j\pm}, \quad z \in (-\infty, U]. \tag{31}$$

Similar to Section 3.1, image-side intersections $d_{a'\pm}$ are found by solving

$$f_{\vec{A}\pm}(z) = f_{\vec{B}\mp}(z) \tag{32}$$

where the inner ray pair yields the near border $d_{a'-}$ and the outer ray pair the far border $d_{a'+}$, recalling that $\vec{A} = \{c+i, j\}$ and that $\vec{A}_{\pm}$, $\vec{B}_{\pm}$ select a desired DoF ray pair $\vec{A}_{\pm} = \{c-c, e\}$, $\vec{B}_{\pm} = \{c+c, e-a(M-1)\}$ as discussed in Section 3.1. We get new image distances $b'_{U\pm}$ of the particular refocused DoF boundaries when calculating

$$b'_{U\pm} = b_U - d_{a'\pm}. \tag{33}$$

Related DoF object distances $a'_{U\pm}$ are retrieved by deploying the thin lens equation such that

$$a'_{U\pm} = \left(\frac{1}{f_U} - \frac{1}{b'_{U\pm}}\right)^{-1}. \tag{34}$$

With respect to the MLA location, the DoF boundary distances $d_{a\pm}$ can be acquired by summing up all parameters separating the MLA from the principal plane $H_{1U}$ as demonstrated in

$$d_{a\pm} = b_U + \overline{H_{1U}H_{2U}} + a'_{U\pm}. \tag{35}$$

Finally, the difference of the near limit $d_{a-}$ and far limit $d_{a+}$ yields the $DoF_a$ that reads

$$DoF_a = d_{a+} - d_{a-}. \tag{36}$$

The contrived model implies that the micro image size directly affects the refocusing and DoF performance. A reduction of $M$, for example via cropping each micro image, causes depth aliasing due to downsampling in the angular domain. This consequently lowers the number of refocused image slices and increases their DoF. Upsampling $M$, in turn, raises the number of refocused photographs and shrinks the DoF per slice. An evaluation of these statements is carried out in the following section where results are presented.
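The DoF construction of this section can be sketched in the same way. The code below is a hedged illustration under stated assumptions, not the authors' implementation: it assumes $a > 0$, a $z$-axis with origin at the MLA pointing towards the main lens, and consistent length units. Rays are paired by their role as described in the text (inner rays through external micro lens edges for the near limit, outer rays through internal edges for the far limit). The function and parameter names are hypothetical.

```python
import math

def dof_limits(a, M, J, ds, du, f_s, d_exit_pupil, f_U, b_U, H1H2, e=None):
    """Return (near, far) DoF limits measured from the MLA, assuming a > 0."""
    c = (M - 1) / 2
    o = (J - 1) / 2
    if e is None:
        e = o

    def border_ray(i, j, sign):
        s_j = (j - o) * ds
        u_c = s_j * f_s / d_exit_pupil + s_j     # MIC position (chief ray from exit pupil)
        u = u_c + i * du + sign * du / 2         # pixel border position
        m = (s_j - u) / f_s                      # slope of the ray from that border
        return m, s_j + sign * ds / 2            # ray continues from the micro lens edge

    def object_distance(ray_A, ray_B):
        (m_A, s_A), (m_B, s_B) = ray_A, ray_B
        z_int = (s_B - s_A) / (m_A - m_B)        # image-side intersection
        inv_a = 1 / f_U - 1 / (b_U - z_int)      # thin lens equation
        return math.inf if inv_a <= 0 else b_U + H1H2 + 1 / inv_a

    j_A, j_B = e, e - a * (M - 1)
    # inner rays: borders towards the MIC, external micro lens edges
    near = object_distance(border_ray(-c, j_A, +1), border_ray(+c, j_B, -1))
    # outer rays: borders towards the micro image edge, internal micro lens edges
    far = object_distance(border_ray(-c, j_A, -1), border_ray(+c, j_B, +1))
    return near, far
```

The depth of field of slice `a` then follows as `far - near`. A far limit of `math.inf` indicates that the slice remains in focus up to infinity.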

## 4. Validation

For the experimental work, we conceive a customized camera which
accommodates a *full frame* sensor with 4008 by 2672 active
pixels and Δ*u* = 9 *μ*m pixel pitch. A raw
photograph used in the experiment can be found in the Appendix. The optical design is presented in what
follows.

#### 4.1. Lens specification

Table 1 lists parameters of two micro lens specifications, denoted MLA (I.) and (II.), used in subsequent experimentations. In addition to the input variables needed for the proposed refocusing distance estimation, Table 1 contains relevant parameters such as the thickness $t_s$, refractive index $n$, radii of curvature $R_{s1}$, $R_{s2}$, principal plane distance $\overline{H_{1s}H_{2s}}$ and the spacing $d_s$ between MLA back vertex and sensor plane which are required for micro lens modeling in an optical design software environment [13].

Modern objective lenses are known to change the optical focus by shifting particular lens groups while other elements are static which, in turn, alters cardinal point positions of that lens system. To preserve previously elaborated principal plane locations, a variation of the image distance $b_U$ is achieved by shifting the MLA compound sensor away from the objective lens while its focus ring remains at infinity. The only limitation is, however, that the space inside our customized camera confines the shift range of the sensor system to an overall focus distance of $d_f \approx 4$ m with $d_f$ as the distance from the MLA’s front vertex to the plane the main lens is focused on. Due to this, solely two focus settings ($d_f \to \infty$ and $d_f \approx 4$ m) are subject to examination in the following experiment. With respect to the thin lens equation, $b_U$ is obtained via

$$b_U = \left(\frac{1}{f_U} - \frac{1}{a_U}\right)^{-1} \tag{37}$$

where

$$a_U = d_f - b_U - \overline{H_{1U}H_{2U}}. \tag{38}$$

After substituting for $a_U$, however, it becomes obvious that $b_U$ is an input and output variable at the same time which gives a classical *chicken-and-egg* problem. To solve this, we initially set the input $b_U := f_U$, substitute the output $b_U$ for the input variable and iterate this procedure until both $b_U$ are identical. Objective lenses are denoted as $f_{193}$, $f_{90}$ and $f_{197}$. The specifications for $f_{193}$ and $f_{90}$ are based on [20, 21] whereas $f_{197}$ is measured experimentally using the approach in [22]. Calculated image, exit pupil and principal plane distances for the main lenses are provided in Table 2. Note that parameters are given for a wavelength of $\lambda = 550$ nm. Focal lengths can be found in the image distance column for infinity focus. A Zemax file of a plenoptic camera with $f_{193}$ and MLA (II.) is provided online (see Dataset 1, [23]).
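The iterative procedure for $b_U$ described above amounts to a fixed-point iteration, which can be sketched as follows. The function name, the tolerance and the iteration cap are assumptions of this example. It relies on $d_f$ being the sum of $b_U$, the principal plane separation and the object distance $a_U$, and expects all lengths in one consistent unit.

```python
import math

def image_distance(d_f, f_U, H1H2, tol=1e-9, max_iter=1000):
    """Solve for the main lens image distance b_U at focus distance d_f.

    d_f  : distance from the MLA to the plane the main lens is focused on
    f_U  : main lens focal length
    H1H2 : separation of the main lens principal planes
    """
    b_U = f_U                                    # initial guess: infinity focus
    for _ in range(max_iter):
        a_U = d_f - b_U - H1H2                   # object distance implied by b_U
        b_new = 1.0 / (1.0 / f_U - 1.0 / a_U)    # thin lens equation
        if abs(b_new - b_U) < tol:               # input and output agree
            return b_new
        b_U = b_new
    raise RuntimeError("image distance iteration did not converge")
```

Passing `math.inf` as `d_f` returns the focal length, which matches the statement that focal lengths appear in the image distance column for infinity focus.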

#### 4.2. Experiments

On the basis of raw light field photographs, this section aims to
evaluate the accuracy of predicted refocusing distances as proposed in
Section 3. The challenge here is to verify whether objects placed at a
known distance exhibit best focus at the predicted refocusing
distance. Hence, the evaluation requires an algorithm to sweep for
blurred regions in a stack of photographs with varying focus. One
obvious attempt to measure the blur of an image is to analyze it in the
frequency domain. Mavridaki and Mezaris [24] follow this principle in a recent study to
assess the blur in a single image. To employ their proposition,
modifications are necessary as the distance validation requires the
degree of blur to be detected for particular image portions in a stack
of photographs with varying focus. Here, the conceptual idea is to
select a *Region of Interest* (RoI) surrounding the
same object in each refocused image. Unlike in Section 3 where the vertical index $h$ in $t_h$ is constant for conciseness, refocused images may be regarded in vertical and horizontal direction in this section such that a refocused photograph in 2-D is given as $E''_a[s_j, t_h]$. A RoI is a cropped version of a refocused photograph that can be selected as desired with image borders spanning from the $\xi$-th to $\Xi$-th pixel in horizontal and the $\varpi$-th to $\Pi$-th pixel in vertical direction. Care has been taken to ensure that a RoI’s bounding box precisely crops the object at the same relative position in each image of the focal stack. When *Fourier*-transforming all RoIs of the focal stack, the key indicator for a blurred RoI is a decrease in its high frequency power. To implement the proposed concept, we first perform the 2-D *Discrete Fourier Transformation* and extract the magnitude $\mathcal{X}[\sigma_\omega, \rho_\psi]$ as given by

$$\mathcal{X}[\sigma_\omega, \rho_\psi] = \left| \sum_{j=\xi}^{\Xi} \sum_{h=\varpi}^{\Pi} E''_a[s_j, t_h] \exp\left(-2\pi\kappa\left(\frac{j\omega}{\Omega} + \frac{h\psi}{\Psi}\right)\right) \right| \tag{39}$$

where $\kappa = \sqrt{-1}$, and $\Omega$ and $\Psi$ denote the horizontal and vertical RoI resolution. The total energy $TE$ is computed via

$$TE = \sum_{\omega=0}^{\Omega-1} \sum_{\psi=0}^{\Psi-1} \mathcal{X}[\sigma_\omega, \rho_\psi]^2. \tag{40}$$

To identify the energy of high frequency elements $HE$, we calculate the power of low frequencies and subtract them from $TE$ as seen in

$$HE = TE - \sum_{\omega=0}^{Q_H} \sum_{\psi=0}^{Q_V} \mathcal{X}[\sigma_\omega, \rho_\psi]^2 \tag{41}$$

where $Q_H$ and $Q_V$ are limits in the range of $\{1, \ldots, \Omega-1\}$ and $\{1, \ldots, \Psi-1\}$ separating low from high frequencies. Finally, the sharpness $S$ of a refocused RoI is obtained by

$$S = \frac{HE}{TE} \tag{42}$$

serving as the blur metric. Thus, each RoI focal stack produces a set of $S$ values which is normalized and given as a function of the refocusing variable $a$. The maximum in $S$ thereby indicates best focus for a selected RoI object at the respective $a$.
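The blur metric described above (ratio of high frequency energy to total energy of the RoI spectrum) can be sketched with NumPy as follows. This is an illustrative sketch, not the authors' code: the function name `sharpness` and the `divisor` parameter are hypothetical, and the low frequency block is taken as the indices from 0 up to the frequency limit in the unshifted spectrum, which is one possible reading of the description. Treating negative frequencies symmetrically would be an alternative.

```python
import numpy as np

def sharpness(roi, divisor=100):
    """High-frequency energy ratio S = HE / TE of a 2-D region of interest."""
    roi = np.asarray(roi, dtype=float)
    power = np.abs(np.fft.fft2(roi)) ** 2    # squared DFT magnitude
    n_rows, n_cols = roi.shape               # vertical and horizontal RoI resolution
    q_v = max(1, n_rows // divisor)          # vertical low/high frequency limit
    q_h = max(1, n_cols // divisor)          # horizontal low/high frequency limit
    total = power.sum()                      # total energy TE
    if total == 0:
        return 0.0                           # empty RoI carries no energy
    low = power[: q_v + 1, : q_h + 1].sum()  # energy of low frequencies
    return (total - low) / total             # S = HE / TE

def best_slice(roi_stack):
    """Index of the sharpest RoI in a focal stack (list of 2-D arrays)."""
    scores = np.array([sharpness(r) for r in roi_stack])
    return int(np.argmax(scores))
```

With `divisor=100` the frequency limits correspond to one hundredth of the cropped image resolution. `best_slice` returns the position in the stack whose RoI maximizes $S$, which would then be mapped to the corresponding $a$.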

To benchmark proposed refocusing distance estimates, an experiment is conducted similar to that from a previous publication [12]. As opposed to [12] where $b_U$ was taken as the MIC chief ray origin, here $d_{A'}$ is given as the origin for rays that lead to MIC positions. Besides this, frequency borders $Q_H = \Omega/100$ and $Q_V = \Psi/100$ are relative to the cropped image resolution. To make statements about the model accuracy, real objects are placed at predicted distances $d_a$. Recall that $d_a$ is the distance from MLA to a respective refocused object plane $M_a$. As the MLA is embedded in the camera body and hence inaccessible, the objective lens’ front panel was chosen to be the distance measurement origin for $d_a$. This causes a displacement of 12.7 cm towards object space ($d_a - 12.7$ cm), which has been accounted for in the predictions of $d_a$ presented in Tables 3(a) and 3(b). Moreover, Tables 3(a) and 3(b) list predicted DoF borders $d_{a\pm}$ at different settings $M$ and $b_U$ while highlighting object planes $a$.

Figures 7 and 8 reveal outcomes of the refocusing distance validation by showing refocused images *E″ _{a}*[*s _{j}*, *t _{h}*] and RoIs at different slices *a* as well as related blur metric results *S*. The reason why *S* produces relatively large values around predicted blur metric peaks is that objects may lie within the DoF of adjacent slices *a* and thus can be in focus in several refocused image slices. Taking slice $a = 4/11$ from Table 3(b) as an example, its object distance $d_{4/11} = 186\,\text{cm}$ falls inside the DoF range of slice $a = 5/11$ with $d_{5/11+} = 194\,\text{cm}$ and $d_{5/11-} = 140\,\text{cm}$ because $d_{5/11+} > d_{4/11} > d_{5/11-}$. Section 3.2 shows that increasing the micro image resolution *M* yields a narrower DoF, which suggests using the largest possible *M*, as this minimizes the effect of wide DoFs. The experiments shown in Fig. 8 were carried out with the maximum directional resolution *M* = 11 since *M* = 13 would involve pixels that start to suffer from vignetting and micro image crosstalk. Although objects are covered by the DoFs of surrounding slices, the presented blur metric still detects the predicted sharpness peaks, as seen in Figs. 7 and 8.
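The overlap argument above can be checked numerically. The following minimal Python sketch tests which refocused slices contain a given object distance within their DoF borders. The names `Slice` and `slices_covering` are hypothetical and not part of the paper's code; only the three values quoted in the text from Table 3(b) are used, and the DoF borders of slice 4/11 are left out because they are not quoted here.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Slice:
    """One refocused slice with its predicted DoF borders in cm."""
    a: Fraction      # refocusing parameter
    d_minus: float   # near DoF border d_{a-}
    d_plus: float    # far DoF border d_{a+}


def slices_covering(distance_cm, slices):
    """Return the parameters a of all slices whose DoF contains distance_cm."""
    return [s.a for s in slices if s.d_minus < distance_cm < s.d_plus]


# Values quoted in the text for slice a = 5/11 (Table 3(b)).
slice_5_11 = Slice(a=Fraction(5, 11), d_minus=140.0, d_plus=194.0)

# Object distance predicted for slice a = 4/11.
d_4_11 = 186.0

covering = slices_covering(d_4_11, [slice_5_11])
print(covering)  # [Fraction(5, 11)]
```

An object placed at 186 cm is therefore also rendered sharply in slice 5/11, which is consistent with the broad blur metric peaks discussed above.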

A more insightful overview illustrating the distance estimation performance of the proposed method is given in Figs. 7(r) and 8(r). Therein, each curve peak indicates the least blur for the respective RoI of a certain object. Vertical lines represent the predicted distances *d _{a}* where objects were situated. Hence, the best case scenario is attained when a curve peak and its corresponding vertical line coincide. This would signify that predicted and measured best focus for a particular distance are in line with each other. While results in [12] exhibit errors in predicting the distance of nearby objects, refocused distance estimates in Figs. 7(r) and 8(r) match the least blur peaks *S* for each object, which corresponds to a 0 % error at the available depth resolution. It also suggests that the proposed refocusing distance estimator takes alternative lens focus settings (*b _{U}* > *f _{U}*) into account without causing a deviation, which was not investigated in [12]. The improvement is mainly due to a correct MIC approximation. A more precise error can be obtained by increasing the SPC’s depth resolution. This inevitably requires upsampling the angular domain, meaning more pixels per micro image. As our camera features an optimised micro image resolution (*M* = 11) which is further upsampled by *M*, the provided outcomes are considered to be our accuracy limit. The following section aims at gaining quantitative results by using an optical design software [13].
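The comparison of curve peaks with vertical lines described above amounts to an arg-max over the blur metric of one RoI. The sketch below assumes that *S* is available as one value per slice and that a larger *S* means less blur, as the text implies. The function names and the sample numbers are purely illustrative and are not measured data.

```python
def sharpest_slice(blur_metric):
    """Return the index of the slice with the largest blur metric value S."""
    return max(range(len(blur_metric)), key=lambda i: blur_metric[i])


def peak_matches_prediction(blur_metric, predicted_index):
    """True if the measured sharpness peak lies on the predicted slice."""
    return sharpest_slice(blur_metric) == predicted_index


# Illustrative (made-up) metric values for one RoI over six slices.
S = [0.21, 0.35, 0.62, 0.90, 0.58, 0.30]
predicted = 3  # hypothetical index of the slice at which the object was placed

print(sharpest_slice(S))                      # 3
print(peak_matches_prediction(S, predicted))  # True
```

Because the comparison is made per slice, the smallest detectable deviation is one slice, which is why a finer depth resolution is needed to quantify the error more precisely.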

#### 4.3. Simulation

The validation of distance predictions using an optical design software [13] is achieved by firing off *central rays* from the sensor side into object space. In contrast, *inner* and *outer rays* start from micro lens edges with a slope calculated from the respective pixel borders. The given pixel and micro lens pitch entail a micro image resolution of *M* = 13. Due to the paraxial approximation, rays starting from samples at the micro image border cause the largest possible errors. To test the prediction limits, simulation results are based on *A⃗* = {0, *e*} and *B⃗* = {12, *e* − *a*(*M* − 1)} unless specified otherwise. To align rays, *e* is dimensioned such that *A⃗* and *B⃗* are symmetric with an intersection close to the optical axis *z _{U}* (e.g. *e* = 0, 6, 12, ...). DoF rays *A⃗*± and *B⃗*± are fired from micro lens edges. Ray slopes build on MIC predictions obtained by Eq. (19). Refocusing distances in simulation are measured by intersections of corresponding rays in object space.
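Measuring a refocusing distance as a ray intersection corresponds to solving a system of two linear functions, as outlined in the abstract. The sketch below is a generic two-line intersection in a 2-D meridional plane and is not the paper's Eq. (23). Representing a ray by a slope `m` and an intercept `c` (its height at *z* = 0), as well as all numeric values, are assumptions chosen for illustration.

```python
def intersect_rays(m_a, c_a, m_b, c_b, eps=1e-12):
    """Intersect y = m_a*z + c_a with y = m_b*z + c_b.

    Returns (z, y) of the intersection, or None if the rays are
    (numerically) parallel and therefore never meet.
    """
    denom = m_a - m_b
    if abs(denom) < eps:
        return None
    z = (c_b - c_a) / denom
    y = m_a * z + c_a
    return z, y


# Two hypothetical rays that are symmetric about the optical axis (y = 0).
ray_a = (-0.25, 5.0)   # (slope, intercept)
ray_b = (0.25, -5.0)

z, y = intersect_rays(*ray_a, *ray_b)
print(z, y)  # 20.0 0.0

# Parallel rays have no intersection.
print(intersect_rays(0.0, 1.0, 0.0, -1.0))  # None
```

Returning `None` for parallel rays keeps the case of rays that never intersect explicit instead of raising a division error.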

Exemplary screenshots are shown in Fig. 9. Figures 9(a) to 9(c) show that the DoF shrinks with increasing parameter *a*, which is reminiscent of the focus behavior in traditional cameras. Ray intersections in Figs. 9(d) to 9(f) contain simulation results with a fixed *a*, but varying *M*. As anticipated in Section 3, a DoF becomes larger with fewer directional samples *u* and vice versa.

To benchmark the prediction, relative errors are provided as *ERR*. Tables 4 and 6 show that each error of the refocusing distance prediction remains below 0.35 %. This is a significant improvement over previous results [11], where errors reached up to 11 %. The enhancement is mainly due to the more accurate MIC prediction. While [11] was based on an ideal SPC ray model where MICs are seen to be at the height of *s _{j}*, the refined model takes actual MICs into consideration by connecting chief rays from the exit pupil’s center to micro lens centers.
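The exact definition of *ERR* is not restated in this section. A common choice, assumed here, is the absolute deviation of the prediction from the simulated reference, normalized by the reference and expressed in percent. The function name and the distances below are hypothetical and are not taken from the tables.

```python
def relative_error_percent(predicted, reference):
    """Relative deviation of predicted from reference, in percent."""
    if reference == 0:
        raise ValueError("reference distance must be non-zero")
    return abs(predicted - reference) / abs(reference) * 100.0


# Hypothetical distances in cm, not taken from the paper's tables.
err = relative_error_percent(predicted=1003.0, reference=1000.0)
print(f"{err:.2f} %")  # 0.30 %
```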

Refocusing on closer planes is achieved by successively increasing *a*. Thereby, prediction results move further away from the simulation, which is reflected in slightly increasing errors. This may be explained by the fact that short distances *d _{a}* and *d _{a±}* force light ray slopes to become steeper, which counteracts the paraxial approximation in geometrical optics. As a result, aberrations occur that are not taken into account, which in turn deteriorates the prediction accuracy.
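The loss of accuracy for steep rays can be illustrated with the small-angle approximation tan θ ≈ θ that underlies paraxial optics. The sketch below only evaluates this generic approximation. The angles are arbitrary examples and are not related to the ray slopes of the simulated camera, and the function name is hypothetical.

```python
import math


def paraxial_error_percent(theta_rad):
    """Relative error in percent of replacing tan(theta) by theta."""
    exact = math.tan(theta_rad)
    return abs(theta_rad - exact) / abs(exact) * 100.0


for deg in (1, 5, 10, 20):
    err = paraxial_error_percent(math.radians(deg))
    print(f"{deg:2d} deg: {err:.3f} %")
```

The printed error grows with the angle, which mirrors the trend of slightly increasing deviations for short refocusing distances.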

When the objective lens is set to *d _{f}* → ∞ (*a _{U}* → ∞) and the refocusing value amounts to *a* = 0, *central rays* travel in a parallel manner whereas *outer rays* even diverge and therefore never intersect each other. In this case, only the distance to the nearby DoF border, also known as hyperfocal distance, can be obtained from the *inner rays*. This is given by *d _{a−}* in the first row of Table 4. The 4th row of the measurement data where *a* = 4 and *d _{f}* → ∞ for *f _{193}* contains an empty field in the *d _{a−}* simulation column. This is due to the fact that corresponding *inner rays* lead to an intersection inside the objective lens which turns out to be an invalid refocusing result. Since the new image distance is smaller than the focal length (*b′ _{U}* < *f _{U}*), results of this particular setting prove to be impractical as they exceed the natural focusing limit.

Despite promising results, the first set of analyses merely examined the impact of the focus distance *d _{f}*(*a _{U}*). In order to assess the effect of the MLA focal length parameter *f _{s}*, the simulation process has been repeated using MLA (I.) with results provided in Table 5. Comparing the outcomes with Table 4, distances *d _{a±}* suggest that a reduction in *f _{s}* moves refocused object planes further away from the camera when *d _{f}* → ∞. Interestingly, the opposite occurs when focusing with *d _{f}* = 1.5 m.

According to the data in Tables 4 and 5, we can infer that *d _{a}* ≈ *d _{f}* if *a* = 0, which means that synthetically focusing with *a* = 0 results in a focusing distance as with a conventional camera having a traditional sensor at the position of the MLA.
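The focusing limit mentioned earlier in this section and the conventional-camera analogy for *a* = 0 both follow from the thin lens equation 1/*f _{U}* = 1/*a _{U}* + 1/*b _{U}*. The sketch below assumes an ideal thin lens, whereas the simulated objective lenses are thick multi-element designs, so it is only a first-order illustration. The function name and the numbers are hypothetical.

```python
import math


def object_distance(f_u, b_u):
    """Object distance a_U for focal length f_U and image distance b_U.

    Uses the thin lens equation 1/f_U = 1/a_U + 1/b_U. Returns math.inf
    for b_U == f_U (focus at infinity) and None for b_U < f_U, where no
    real object plane in front of the lens is imaged sharply.
    """
    if b_u < f_u:
        return None
    if b_u == f_u:
        return math.inf
    return 1.0 / (1.0 / f_u - 1.0 / b_u)


f_u = 100.0  # hypothetical focal length in mm
print(object_distance(f_u, 100.0))  # inf
print(object_distance(f_u, 125.0))  # finite object distance, about 500 mm
print(object_distance(f_u, 95.0))   # None
```

An image distance below the focal length yields no valid object distance, which corresponds to the empty table field described above.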

A third experimental validation was undertaken to investigate the impact of the main lens focal length parameter *f _{U}*. As Table 6 shows, using a shorter *f _{U}* implies a rapid decline in *d _{a±}* with ascending *a*. From this observation it follows that the depth sampling rate of refocused image slices is much denser for large *f _{U}*. It can be concluded that the refocusing distance *d _{a±}* drops with decreasing main lens focusing distance *d _{f}*, ascending refocusing parameter *a*, enlarging MLA focal length *f _{s}*, reducing objective lens focal length *f _{U}* and vice versa.

Tracing rays according to our model yields more accurate results in the optical design software [13] than by solving Eq. (23). However, deviations of less than 0.35 % are insignificant. Implementing the model with a high-level programming language (see Code 1, [19]) outperforms the real ray simulation in terms of computation time. Using a timer, the image-side based method presented in Section 3 takes about 0.002 to 0.005 seconds to compute *d _{a}* and *d _{a±}* for each *a* on an Intel Core i7-3770 CPU @ 3.40 GHz system whereas modeling a plenoptic lens design and measuring distances by hand can take up to a business day.
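Timing measurements of this kind can be reproduced with a monotonic high-resolution timer. The referenced implementation [19] is written in Matlab, so the Python sketch below only illustrates the measurement pattern. `estimate_distance` is a hypothetical placeholder workload and not the proposed estimator, and `time_call` is likewise an illustrative helper.

```python
import time


def estimate_distance(a):
    """Placeholder workload standing in for a refocusing distance estimator."""
    return sum(i * a for i in range(1000))


def time_call(func, *args, repeats=100):
    """Return the mean wall-clock time in seconds of func(*args)."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(*args)
    return (time.perf_counter() - start) / repeats


mean_seconds = time_call(estimate_distance, 4)
print(f"{mean_seconds:.6f} s per call")
```

Averaging over repeated calls reduces the influence of timer resolution and scheduling jitter on millisecond-scale measurements.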

## 5. Conclusion

In summary, it is now possible to state that the distance to which an SPC photograph is refocused can be accurately predicted when deploying the proposed ray model and image synthesis. Flexibility and precision in focus and DoF variation after image capture can be useful in professional photography as well as motion picture arts. If combined with the presented blur metric, the conceived refocusing distance estimator allows an SPC to predict an object’s distance. This can be an important feature for robots in space or cars tracking objects in road traffic.

Our model has been experimentally verified using a customized SPC without exhibiting deviations as objects were placed at predicted distances. An extensive benchmark comparison with an optical design software [13] yields errors below 0.35 % over a 24 m range. This indicates a significant accuracy improvement over our previous method. Small tolerances in simulation are due to optical aberrations that are sufficiently suppressed in present-day objective lenses. Simulation results further support the assumption that DoF ranges shrink when refocusing closer, a focus behavior similar to that of conventional cameras.

It is unknown at this stage to what extent the presented method applies to *Fourier Slice Photography* [7], depth from defocus cues [25] or other types of plenoptic cameras [9, 10, 26]. Future studies on light ray trajectories with different MLA positions are therefore recommended, as this exceeds the scope of the present research.

## Appendix

Figure 10 depicts an unprocessed image taken by our customized SPC with *d _{f}* → ∞, which was used to compute the refocused photographs shown in Fig. 7. Footage acquired for this research has been made available online (see Dataset 2, [27]).

## Acknowledgments

This study was supported in part by the University of Bedfordshire and the EU under the ICT program as Project 3D VIVANT under EU-FP7 ICT-2010-248420. The authors would like to thank anonymous reviewers for requesting a raw light field photograph. Special thanks are due to Lascelle Mauricette who proof-read the manuscript.

## References and links

**1. **F. E. Ives, “Parallax stereogram and process of making same,” US Patent 725,567 (April 14, 1903).

**2. **G. Lippmann, “Épreuves réversibles donnant la sensation du relief,” J. Phys. Théor. Appl. **7**, 821–825 (1908).

**3. **E. H. Adelson and J. Y. Wang, “Single lens stereo with a plenoptic camera,” IEEE Trans. Pattern Anal. Mach. Intel. **14**(2), 99–106 (1992).

**4. **M. Levoy and P. Hanrahan, “Lightfield rendering,” in Proceedings of ACM SIGGRAPH (1996), pp. 31–42.

**5. **A. Isaksen, L. McMillan, and S. J. Gortler, “Dynamically reparameterized light fields,” in Proceedings of ACM SIGGRAPH (2000), pp. 297–306.

**6. **R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, and P. Hanrahan, “Lightfield photography with a hand-held plenoptic camera,” in *Stanford Tech. Report*, 1–11 (CTSR, 2005).

**7. **R. Ng, Digital light field photography (Stanford University, 2006).

**8. **T. Georgiev, K. C. Zheng, B. Curless, D. Salesin, S. Nayar, and C. Intwala, “Spatio-angular resolution tradeoff in integral photography,” in Proceedings of Eurographics Symposium on Rendering (2006).

**9. **A. Lumsdaine and T. Georgiev, “The focused plenoptic camera,” in Proceedings of IEEE International Conference on Computational Photography, 1–8 (2009).

**10. **C. Perwass and L. Wietzke, “Single-lens 3D camera with extended depth-of-field,” Proc. SPIE **8291**, 829108 (2012).

**11. **C. Hahne, A. Aggoun, S. Haxha, V. Velisavljevic, and J. C. J. Fernández, “Light field geometry of a standard plenoptic camera,” Opt. Express **22**(22), 26659–26673 (2014).

**12. **C. Hahne, A. Aggoun, and V. Velisavljevic, “The refocusing distance of a standard plenoptic photograph,” in 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON) (2015).

**13. **Radiant ZEMAX LLC, “Optical design program,” version 110711 (2011).

**14. **D. G. Dansereau, “*Plenoptic signal processing for robust vision in field robotics*,” (University of Sydney, 2014).

**15. **D. G. Dansereau, O. Pizarro, and S. B. Williams, “Decoding, calibration and rectification for lenselet-based plenoptic cameras,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) (2013), pp. 1027–1034.

**16. **E. Hecht, *Optics*, 4th ed. (Addison Wesley, 2001).

**17. **E. F. Schubert, *Light-Emitting Diodes* (Cambridge University, 2006).

**18. **D. Cho, M. Lee, S. Kim, and Y.-W. Tai, “Modeling the calibration pipeline of the lytro camera for high quality light-field image reconstruction,” in IEEE International Conference on Computer Vision (ICCV) (2013), pp. 3280–3287.

**19. **C. Hahne, “Matlab implementation of proposed refocusing distance estimator,” figshare (2016) [retrieved 8 September 2016], http://dx.doi.org/10.6084/m9.figshare.3383797.

**20. **B. Caldwell, “Fast wide-range zoom for 35 mm format,” Opt. Photon. News **11**(7), 49–51 (2000).

**21. **M. Yanagisawa, “Optical system having a variable out-of-focus state,” US Patent 4,908,639 (March 13, 1990).

**22. **TRIOPTICS, “MTF measurement and further parameters,” (2015), [retrieved 3 October 2015] http://www.trioptics.com/knowledge-base/mtf-and-image-quality/.

**23. **C. Hahne, “Zemax archive file containing plenoptic camera design,” figshare (2016) [retrieved 8 September 2016], http://dx.doi.org/10.6084/m9.figshare.3381082.

**24. **E. Mavridaki and V. Mezaris, “No-reference blur assessment in natural images using Fourier transform and spatial pyramids,” in IEEE International Conference on Image Processing (ICIP) (2014), pp. 566–570.

**25. **M. W. Tao, S. Hadap, J. Malik, and R. Ramamoorthi, “Depth from combining defocus and correspondence using light-field cameras,” in IEEE International Conference on Computer Vision (ICCV) (2014), pp. 673–680.

**26. **Z. Xu, J. Ke, and E. Y. Lam, “High-resolution lightfield photography using two masks,” Opt. Express **20**(10), 10971–10983 (2012).

**27. **C. Hahne, “Raw image data taken by a standard plenoptic camera,” figshare (2016) [retrieved 8 September 2016], http://dx.doi.org/10.6084/m9.figshare.3362152.