Consistent Depth Estimation in a 4D Light Field

Contributions

We propose a novel paradigm for depth reconstruction from 4D light fields. Our method offers a fast, purely local depth estimation within the light field structure, as well as the option to obtain very accurate estimates using a variational global optimization framework.

The input is a 4D light field parameterized as a Lumigraph [1].
The first step is a local depth labeling on 2D sections of the Lumigraph, so-called epipolar plane images (EPIs) [2], described in Local Depth Labeling in EPI Space. This results in two depth channels $d_s,d_t$ with corresponding reliability estimates $r_s,r_t$.
Selecting, at each pixel, the label with the higher reliability yields a 4D depth field. As an alternative to this simple reliability merge, we also provide a (slower) global integration, which produces a globally optimal solution to the depth map merging problem.
An optional global optimization step additionally offers a Consistent EPI Depth Labeling, which takes the inherent structure of epipolar plane images into account.
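
The simple reliability merge can be illustrated with a short sketch. It handles a single view only and assumes that the two depth channels and their reliabilities are stored as equally sized nested lists. The function name `merge_by_reliability`, the example values, and the tie-breaking rule (prefer $d_s$) are assumptions made for illustration and are not taken from the method description.

```python
def merge_by_reliability(d_s, r_s, d_t, r_t):
    """Pick, per pixel, the depth estimate whose reliability is larger.

    All four arguments are 2D nested lists of identical shape (an assumption
    of this sketch). Ties are resolved in favour of d_s.
    """
    merged = []
    for ds_row, rs_row, dt_row, rt_row in zip(d_s, r_s, d_t, r_t):
        merged.append([
            ds if rs >= rt else dt
            for ds, rs, dt, rt in zip(ds_row, rs_row, dt_row, rt_row)
        ])
    return merged


# Hypothetical 2x2 example values.
d_s = [[1.0, 2.0], [3.0, 4.0]]
r_s = [[0.9, 0.1], [0.5, 0.2]]
d_t = [[1.5, 2.5], [3.5, 4.5]]
r_t = [[0.3, 0.8], [0.5, 0.7]]
merged = merge_by_reliability(d_s, r_s, d_t, r_t)
```

Applying the same selection to every view of the light field would give the 4D depth field mentioned above.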

4D Light Field Parametrization and Epipolar Plane Images (EPIs)


Light field geometry	Pinhole view at $(s^*,t^*)$ and epipolar plane image $S_{y^*,t^*}$

Each camera location $ (s^*,t^*) $ in the image plane $ \Pi $ yields a different pinhole view of the scene. By fixing a horizontal line of constant $ y^* $ in the image plane and a constant camera coordinate $ t^* $, one obtains an epipolar plane image (EPI) in $(x,s)$ coordinates. A scene point $ P $ is projected onto a line in the EPI due to a linear correspondence between its $s$- and projected $x$-coordinate.
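
As a minimal sketch of this construction, the snippet below slices an EPI out of a light field stored as a nested list. The index order `[t][s][y][x]`, the function names `extract_epi` and `projected_x`, and the toy dimensions are assumptions chosen for illustration. The helper `projected_x` rearranges the depth relation $d = -f \, \Delta s / \Delta x$ used in this document into $\Delta x = -(f/d)\,\Delta s$, which is the linear correspondence that makes a scene point trace a line in the EPI.

```python
def extract_epi(light_field, y_star, t_star):
    """Return the EPI S_{y*,t*} as a 2D list indexed [s][x].

    light_field is assumed to be a nested list indexed [t][s][y][x];
    this index order is a convention chosen for the sketch.
    """
    return [list(view[y_star]) for view in light_field[t_star]]


def projected_x(x0, s, f, depth):
    """x-coordinate at camera position s of a point seen at x0 for s = 0.

    Rearranges d = -f * (delta s / delta x) to delta x = -(f / d) * delta s.
    """
    return x0 - (f / depth) * s


# Tiny synthetic light field whose entries record their own indices.
T, S, Y, X = 2, 3, 2, 4
light_field = [[[[(t, s, y, x) for x in range(X)] for y in range(Y)]
                for s in range(S)] for t in range(T)]
epi = extract_epi(light_field, y_star=1, t_star=0)
```

Here `epi` has one row per camera coordinate $s$ and one column per image coordinate $x$.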

Local Depth Labeling in EPI Space

In order to obtain the local depth estimate, we need to estimate the direction of epipolar lines on the EPI. This is done using the structure tensor $J$ of the epipolar plane image $S=S_{y^*,t^*}$,

\[ J = \left[ \begin{matrix} G_\sigma\ast(S_x S_x) & G_\sigma\ast(S_x S_y) \\ G_\sigma\ast(S_x S_y) & G_\sigma\ast(S_y S_y) \end{matrix} \right] = \left[ \begin{matrix} J_{xx} & J_{xy} \\ J_{xy} & J_{yy} \end{matrix} \right]. \]

Here, $G_\sigma$ represents a Gaussian smoothing operator at an outer scale $\sigma$, and $S_x,S_y$ denote the gradient components of $S$ calculated on an inner scale $\tau$. The direction of the local level lines can then be computed via

\[ n_{y^*,t^*} = \begin{bmatrix} J_{yy} - J_{xx}\\ 2 J_{xy} \end{bmatrix} = \begin{bmatrix} \Delta x \\ \Delta s \end{bmatrix}, \]

from which we derive the local depth estimate (from the equation in the figure 'Light field geometry') as

\[ d_{y^*,t^*} = -f \frac{ \Delta s }{\Delta x }. \]

As a reliability measure we use the coherence of the structure tensor,

\[ r_{y^*,t^*} := \frac{\left( J_{yy} - J_{xx}\right)^2+ 4J_{xy}^2}{\left( J_{xx} + J_{yy}\right)^2}. \]
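
The structure tensor and the coherence measure can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the EPI is stored as `epi[s][x]`, the inner scale is realised by Gaussian pre-smoothing followed by central differences, kernels are truncated at three standard deviations, borders are replicated, and the default values of `tau` and `sigma` are arbitrary. All function names are hypothetical.

```python
import math


def _clamp(i, n):
    """Clamp index i to the valid range [0, n - 1] (replicated borders)."""
    return min(max(i, 0), n - 1)


def gaussian_kernel(scale):
    """Normalized 1D Gaussian, truncated at three standard deviations."""
    radius = max(1, math.ceil(3.0 * scale))
    weights = [math.exp(-0.5 * (k / scale) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(img, scale):
    """Separable Gaussian smoothing of a 2D nested list."""
    kernel = gaussian_kernel(scale)
    radius = len(kernel) // 2
    rows, cols = len(img), len(img[0])
    offsets = range(-radius, radius + 1)
    horiz = [[sum(kernel[k + radius] * img[r][_clamp(c + k, cols)] for k in offsets)
              for c in range(cols)] for r in range(rows)]
    return [[sum(kernel[k + radius] * horiz[_clamp(r + k, rows)][c] for k in offsets)
             for c in range(cols)] for r in range(rows)]


def structure_tensor(epi, tau=1.0, sigma=2.0):
    """Return the images (J_xx, J_xy, J_yy) for an EPI stored as epi[s][x]."""
    rows, cols = len(epi), len(epi[0])
    pre = smooth(epi, tau)  # inner scale: smooth before differentiating
    # Central differences: S_x along the columns (x), S_y along the rows.
    s_x = [[0.5 * (pre[r][_clamp(c + 1, cols)] - pre[r][_clamp(c - 1, cols)])
            for c in range(cols)] for r in range(rows)]
    s_y = [[0.5 * (pre[_clamp(r + 1, rows)][c] - pre[_clamp(r - 1, rows)][c])
            for c in range(cols)] for r in range(rows)]

    def product(a, b):
        return [[a[r][c] * b[r][c] for c in range(cols)] for r in range(rows)]

    # Outer scale: smooth each product of gradient components.
    return (smooth(product(s_x, s_x), sigma),
            smooth(product(s_x, s_y), sigma),
            smooth(product(s_y, s_y), sigma))


def coherence(j_xx, j_xy, j_yy):
    """Reliability r at one pixel; 0.0 where the tensor vanishes (an assumption)."""
    trace = j_xx + j_yy
    if trace == 0.0:
        return 0.0
    return ((j_yy - j_xx) ** 2 + 4.0 * j_xy ** 2) / trace ** 2


# Synthetic EPI with vertical stripes (intensity depends on x only).
epi = [[math.sin(0.5 * x) for x in range(21)] for _ in range(9)]
j_xx, j_xy, j_yy = structure_tensor(epi)
reliability = coherence(j_xx[4][10], j_xy[4][10], j_yy[4][10])
```

In terms of the eigenvalues $\lambda_1, \lambda_2$ of $J$, the numerator equals $(\lambda_1 - \lambda_2)^2$ and the denominator $(\lambda_1 + \lambda_2)^2$, so the coherence lies in $[0,1]$ and reaches 1 only where a single orientation dominates, as in the striped example.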


Epipolar plane image	Local depth labeling

Consistent EPI Depth Labeling

Each scene point projects to a line in the epipolar plane image, with a slope inversely proportional to its distance to the image plane. Because of occlusion ordering, a line labelled with depth $\lambda_i$, corresponding to direction $n_i$, cannot be crossed by a line with depth $\lambda_j > \lambda_i$, which is further away from the observer.


Allowed if $\lambda_i \dots$	Forbidden if $\lambda_i \dots$

We enforce these constraints by penalizing transitions from label $\lambda_i$ to $\lambda_j$ in direction $\nu$ with \[ p(\lambda_i, \lambda_j, \nu) := \begin{cases} 0 &\text{ if } i=j,\\ \infty &\text{ if } i \dots \end{cases} \] ... [3], where the authors describe the construction of a regularizer $R$ to enforce the desired ordering constraints. As a data term, we use the absolute distance between the local estimate above and the candidate label, since impulse noise is dominant.


Epipolar plane image	Consistent depth labeling

Depth Integration

After obtaining EPI depth estimates $d_{y^*,t^*}$ and $d_{x^*,s^*}$ from the horizontal and vertical slices, respectively (either locally or from consistent depth labeling), we need to consolidate those estimates into a single depth map, which we obtain as the minimizer $u$ of a global optimization problem.

As a data term, we choose the minimum absolute difference to the respective local estimates weighted with the reliability estimates, \[ \begin{aligned} \rho(u,x,y) := \min( &r_{y^*,t^*}(x,s^*) | u - d_{y^*,t^*}(x,s^*) |, \\ &r_{x^*,s^*}(y,t^*) | u - d_{x^*,s^*}(y,t^*) | ). \end{aligned} \]
As a regularizer, we choose total variation, since this allows us to compute globally optimal solutions to the functional using the technique of functional lifting described in [4].
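
To make the two terms concrete, the following sketch evaluates the data term $\rho$ and a discrete total variation for a candidate depth map on a small grid. It only evaluates the energy and does not perform the functional lifting of [4]. The anisotropic forward-difference discretization, the parameter `weight` balancing the two terms, and all function names are assumptions, since the text specifies neither a discretization nor a weighting.

```python
def rho(u, d_h, r_h, d_v, r_v):
    """Data term at one pixel: the smaller reliability-weighted absolute
    difference to the horizontal (d_h) and vertical (d_v) EPI estimates."""
    return min(r_h * abs(u - d_h), r_v * abs(u - d_v))


def total_variation(u):
    """Anisotropic discrete TV: sum of absolute forward differences."""
    rows, cols = len(u), len(u[0])
    tv = 0.0
    for y in range(rows):
        for x in range(cols):
            if x + 1 < cols:
                tv += abs(u[y][x + 1] - u[y][x])
            if y + 1 < rows:
                tv += abs(u[y + 1][x] - u[y][x])
    return tv


def energy(u, d_h, r_h, d_v, r_v, weight=1.0):
    """Sum of the data term over all pixels plus weighted TV (hypothetical weight)."""
    data = sum(
        rho(u[y][x], d_h[y][x], r_h[y][x], d_v[y][x], r_v[y][x])
        for y in range(len(u)) for x in range(len(u[0]))
    )
    return data + weight * total_variation(u)


# Hypothetical 2x2 example: u agrees with the horizontal estimate everywhere.
u = [[1.0, 1.0], [1.0, 2.0]]
d_h = [[1.0, 1.0], [1.0, 2.0]]
d_v = [[3.0, 3.0], [3.0, 3.0]]
ones = [[1.0, 1.0], [1.0, 1.0]]
value = energy(u, d_h, ones, d_v, ones, weight=0.5)
```

Because of the minimum in $\rho$, a pixel incurs no data cost as soon as it matches either of the two estimates, so in this example only the regularizer contributes to `value`.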

Depth Labeling Results

Results on synthetic light fields:


Central View	Local	Global	Consistent

Results on Stanford light fields:


Central View	Stereo	Global

Results on light fields from a plenoptic camera:


Central View	Stereo	Global
