Target Poses: Constraints, Rank, and Correlation
We are often asked what kinds of poses should be captured for reliable and robust camera calibration. While the answer is quite complex and depends on many factors, such as the lens, the setup, capture constraints, pattern budget, and detector noise, there are some fundamental mathematical principles at play, which we will uncover in this article. Understanding these relationships will help us design better calibration procedures and improve the accuracy of calibration.
Problem setup
We start with the pinhole camera model and a simplified planar target. For a target feature point $\mathbf{P}_i=[X_i,Y_i,0,1]^\top$, we first transform its coordinates into the camera frame and then project onto the image plane:

$$
\begin{bmatrix} x_{c,i} \\ y_{c,i} \\ z_{c,i} \end{bmatrix} = [\,\mathbf{R}\;\;\mathbf{t}\,]\,\mathbf{P}_i, \qquad u_i = f\,\frac{x_{c,i}}{z_{c,i}}, \quad v_i = f\,\frac{y_{c,i}}{z_{c,i}}
$$
For the following formulas, we shall parameterize the rotation using an axis-angle vector $\mathbf{r}=[r_x,r_y,r_z]^\top$.
One view of four points contributes 8 constraints (4 points × 2 pixel coordinates). However, these constraints can only determine 8 parameter values if all parameters are identifiable. The residual vector contains the differences between detected and projected point coordinates:

$$
\mathbf{r}(\boldsymbol{\theta}) = \begin{bmatrix} u_1^{\mathrm{det}} - u_1(\boldsymbol{\theta}) & v_1^{\mathrm{det}} - v_1(\boldsymbol{\theta}) & \cdots & u_4^{\mathrm{det}} - u_4(\boldsymbol{\theta}) & v_4^{\mathrm{det}} - v_4(\boldsymbol{\theta}) \end{bmatrix}^\top
$$

In camera calibration, we usually estimate the parameters $\hat{\boldsymbol{\theta}}$ by nonlinear least squares:

$$
\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} \, \lVert \mathbf{r}(\boldsymbol{\theta}) \rVert^2
$$

At iteration $k$, we linearize the projection function around the current estimate $\boldsymbol{\theta}_k$:

$$
\mathbf{r}(\boldsymbol{\theta}_k + \Delta\boldsymbol{\theta}) \approx \mathbf{r}(\boldsymbol{\theta}_k) + \mathbf{J}_k\,\Delta\boldsymbol{\theta}
$$

Here $\mathbf{J}_k$ is the Jacobian matrix of the residual vector with respect to the parameters. With this approximation, the local optimization problem becomes:

$$
\min_{\Delta\boldsymbol{\theta}} \, \lVert \mathbf{r}(\boldsymbol{\theta}_k) + \mathbf{J}_k\,\Delta\boldsymbol{\theta} \rVert^2
$$

The minimizer can be found by taking the derivative with respect to $\Delta\boldsymbol{\theta}$ and setting it equal to zero. This results in the linear system:

$$
\mathbf{J}_k^\top \mathbf{J}_k \, \Delta\boldsymbol{\theta} = -\mathbf{J}_k^\top \mathbf{r}(\boldsymbol{\theta}_k)
$$
which is solved in every iteration, and parameters are updated as $\boldsymbol{\theta}_{k+1}=\boldsymbol{\theta}_k+\Delta\boldsymbol{\theta}$.
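To make this iteration concrete, here is a minimal Gauss-Newton sketch in Python with numpy. The exponential toy residual and the starting guess are hypothetical choices for illustration only; a real calibration would use the projection residuals described above (and typically analytic Jacobians).

```python
import numpy as np

def gauss_newton(residual, theta0, n_iter=20, eps=1e-6):
    """Minimal Gauss-Newton loop with a forward-difference Jacobian."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        r = residual(theta)
        # Numerical Jacobian: J[i, j] = d r_i / d theta_j
        J = np.empty((r.size, theta.size))
        for j in range(theta.size):
            step = np.zeros_like(theta)
            step[j] = eps
            J[:, j] = (residual(theta + step) - r) / eps
        # Normal equations: (J^T J) delta = -J^T r
        delta = np.linalg.solve(J.T @ J, -J.T @ r)
        theta = theta + delta
    return theta

# Toy residual: fit y = a * exp(b * x) to noise-free samples (a=2, b=-1.5).
x = np.linspace(0.0, 1.0, 8)
y = 2.0 * np.exp(-1.5 * x)
theta_hat = gauss_newton(lambda th: th[0] * np.exp(th[1] * x) - y, [1.5, -1.0])
```

In practice one would add damping (Levenberg-Marquardt) and a convergence check, but the core update is exactly the normal-equations solve shown here.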
Parameter identifiability is governed by the rank and condition number of $\mathbf{J}$. If the columns of $\mathbf{J}$ are linearly dependent, $\mathbf{J}$ is rank-deficient (its rank is lower than the number of columns). If the columns are nearly dependent, the condition number $\mathrm{cond}(\mathbf{J})$ is high, and the update is unstable and sensitive to noise.
The matrix $\mathbf{H}=\mathbf{J}^\top\mathbf{J}$ is called the approximate Hessian or information matrix. It summarizes how much information the image measurements provide about local parameter perturbations. Large diagonal entries mean a parameter changes the image strongly; large off-diagonal entries mean two parameters induce similar image changes and can therefore be confused.
Under a few assumptions (most importantly, independent and identically distributed measurement noise), we can derive from it an unscaled covariance matrix:

$$
\mathbf{C}_0 = (\mathbf{J}^\top \mathbf{J})^{-1} = \mathbf{H}^{-1}
$$

From this covariance, the correlation matrix $\mathbf{R}$ is obtained by normalizing all entries:

$$
R_{ij} = \frac{(C_0)_{ij}}{\sqrt{(C_0)_{ii}\,(C_0)_{jj}}}
$$
This matrix answers: which parameters vary together in the local estimate?
We can also compute a partial correlation matrix, which contains conditional correlations; these can be read off the precision matrix $\mathbf{H}=\mathbf{C}_0^{-1}$ as $\rho_{ij\cdot\mathrm{rest}} = -H_{ij}/\sqrt{H_{ii}\,H_{jj}}$ for $i \neq j$. Interpretation: values near $\pm 1$ mean parameters $i$ and $j$ remain hard to distinguish even after conditioning on the rest; values near $0$ mean their direct local coupling is weak.
The displayed uncertainty values are the corresponding unscaled standard deviations, obtained from the diagonal of $\mathbf{C}_0$:
So, for example, $\sigma_f=\sqrt{(C_0)_{ff}}$, $\sigma_{c_x}=\sqrt{(C_0)_{c_xc_x}}$, and $\sigma_{t_z}=\sqrt{(C_0)_{t_zt_z}}$. Larger values mean weaker local evidence for that parameter.
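These quantities are simple to compute once the Jacobian is available. Below is a minimal sketch; the random matrix merely stands in for a real calibration Jacobian.

```python
import numpy as np

def uncertainty_summary(J):
    """Unscaled covariance C0, correlation R, and partial correlations P
    from a full-column-rank Jacobian J."""
    H = J.T @ J                       # information matrix (approximate Hessian)
    C0 = np.linalg.inv(H)             # unscaled covariance
    sigma = np.sqrt(np.diag(C0))      # unscaled standard deviations
    R = C0 / np.outer(sigma, sigma)   # correlation matrix
    # Partial correlations from the precision matrix H = C0^{-1}:
    d = np.sqrt(np.diag(H))
    P = -H / np.outer(d, d)
    np.fill_diagonal(P, 1.0)
    return sigma, R, P

# Stand-in Jacobian: 10 residuals, 3 parameters.
rng = np.random.default_rng(42)
J = rng.standard_normal((10, 3))
sigma, R, P = uncertainty_summary(J)
```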
Frontoparallel views
We begin with a simple capture situation: a planar target held head-on (frontoparallel) with respect to the camera. This is a natural configuration, and some calibration views can and should be frontoparallel.
However, this is also the situation where an important ambiguity arises. Let the unknown parameters be $\boldsymbol{\theta}=[f,r_x,r_y,r_z,t_x,t_y,t_z]^\top$. If the board is exactly frontoparallel, then $\mathbf{R}=\mathbf{I}$ and every corner has the same camera depth $z_{c,i}=t_z$. For a board point $\mathbf{P}_i=[X_i,Y_i,0]^\top$, the pinhole equations reduce to

$$
u_i = \frac{f\,(X_i + t_x)}{t_z}, \qquad v_i = \frac{f\,(Y_i + t_y)}{t_z}
$$

So the image depends on $f$ and $t_z$ only through the ratio $f/t_z$. If we define $\alpha=f/t_z$, then

$$
u_i = \alpha\,(X_i + t_x), \qquad v_i = \alpha\,(Y_i + t_y)
$$
This is the fundamental scale-depth coupling: multiplying $f$ and $t_z$ by the same factor leaves the projected image unchanged.
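This invariance is easy to verify numerically. In the following sketch (with illustrative values for the focal length and pose), doubling both $f$ and $t_z$ leaves the projections unchanged:

```python
import numpy as np

def project_frontoparallel(f, t, board_xy):
    """Pinhole projection of planar points at R = I (frontoparallel board)."""
    tx, ty, tz = t
    u = f * (board_xy[:, 0] + tx) / tz
    v = f * (board_xy[:, 1] + ty) / tz
    return np.column_stack([u, v])

corners = np.array([[-1.0, -1.0], [1.0, -1.0], [1.0, 1.0], [-1.0, 1.0]])
p_a = project_frontoparallel(500.0, (0.1, -0.05, 2.0), corners)
p_b = project_frontoparallel(1000.0, (0.1, -0.05, 4.0), corners)  # f and t_z doubled
```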
The local Jacobian makes this explicit. Let $\boldsymbol{\pi}_i=[u_i,v_i]^\top$ denote the projection of corner $i$. The two Jacobian rows contributed by corner $i$, evaluated at the frontoparallel pose, are

$$
\frac{\partial \boldsymbol{\pi}_i}{\partial \boldsymbol{\theta}} =
\begin{bmatrix}
\dfrac{X_i+t_x}{t_z} & -\dfrac{f(X_i+t_x)Y_i}{t_z^2} & \dfrac{f(X_i+t_x)X_i}{t_z^2} & -\dfrac{fY_i}{t_z} & \dfrac{f}{t_z} & 0 & -\dfrac{f(X_i+t_x)}{t_z^2} \\[8pt]
\dfrac{Y_i+t_y}{t_z} & -\dfrac{f(Y_i+t_y)Y_i}{t_z^2} & \dfrac{f(Y_i+t_y)X_i}{t_z^2} & \dfrac{fX_i}{t_z} & 0 & \dfrac{f}{t_z} & -\dfrac{f(Y_i+t_y)}{t_z^2}
\end{bmatrix}
$$

with the columns ordered as $[f,r_x,r_y,r_z,t_x,t_y,t_z]$.
Stacking the four corners gives the full matrix $\mathbf{J}\in\mathbb{R}^{8\times 7}$.
We notice a linear relationship between the columns associated with $f$ and $t_z$. For every corner,

$$
\frac{\partial \boldsymbol{\pi}_i}{\partial t_z} = -\frac{f}{t_z}\,\frac{\partial \boldsymbol{\pi}_i}{\partial f}
$$

Therefore, over the entire stacked Jacobian,

$$
\mathbf{J}_{:,\,t_z} = -\frac{f}{t_z}\,\mathbf{J}_{:,\,f}
$$

So the $t_z$ column is an exact scalar multiple of the $f$ column. Even if we appended more rows corresponding to additional points or additional frontoparallel views, the dependence would remain. As a consequence, $\mathbf{J}$ does not have full rank (it is singular), and the linear system cannot be solved uniquely.
When we lock the ratio $f/t_z$, notice how the points always project to exactly the same spots on the image plane. If we blindly calibrated with only frontoparallel views, reprojection errors might be low, but the focal length would not be determined accurately at all, simply because it is not identifiable from the given evidence.
- Frontoparallel views cannot separate focal scale $f$ from depth $t_z$.
- This leads to an exact linear dependence in the Jacobian matrix (singular).
- We need calibration evidence that distinctly identifies all parameters.
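We can confirm the rank deficiency numerically. The sketch below (illustrative parameter values) assembles the stacked $8\times 7$ frontoparallel Jacobian analytically and checks that the $t_z$ column is an exact multiple of the $f$ column:

```python
import numpy as np

def frontoparallel_jacobian(f, t, corners):
    """Analytic Jacobian rows for theta = [f, r_x, r_y, r_z, t_x, t_y, t_z]
    at a frontoparallel pose (R = I), derived from u = f(X+t_x)/t_z and
    v = f(Y+t_y)/t_z with first-order rotation effects."""
    tx, ty, tz = t
    rows = []
    for X, Y in corners:
        x, y = X + tx, Y + ty
        rows.append([x/tz, -f*x*Y/tz**2, f*x*X/tz**2, -f*Y/tz, f/tz, 0.0, -f*x/tz**2])
        rows.append([y/tz, -f*y*Y/tz**2, f*y*X/tz**2,  f*X/tz, 0.0, f/tz, -f*y/tz**2])
    return np.array(rows)

f, t = 500.0, (0.1, -0.05, 2.0)               # illustrative focal length and pose
corners = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
J = frontoparallel_jacobian(f, t, corners)

# The t_z column equals -(f/t_z) times the f column -> rank deficiency.
ratio = -f / t[2]
```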
Tilted views disentangle parameters
The next natural question is what changes when we tilt the target. Intuitively, tilt should help, because the corners now occupy different depths and therefore respond differently to changes in camera parameters.
We will assume $r_x=0$ and $r_z=0$ while varying $r_y$. Once the board is tilted, the four corners no longer share the same depth, so the frontoparallel symmetry is broken: changing focal length, depth, and rotation no longer produces the same image motion at every corner.
We now get a full-rank Jacobian and, technically, a unique solution. Notice how the correlation decreases when the board size is increased, because a larger board results in more depth variation. However, the condition number is high, and more evidence would be needed to improve decoupling and reduce the uncertainty in focal length.
- Tilt introduces depth variation across the corners, which decorrelates $f$ and $t_z$.
- Large calibration boards allow for more depth variation.
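The effect of tilt can be verified with a small numerical experiment. The sketch below (hypothetical poses, central-difference Jacobian) compares the ratio of the smallest to largest singular value of $\mathbf{J}$ for a frontoparallel pose and a pose tilted about the $y$ axis; the ratio jumps from numerically zero to a clearly non-zero value:

```python
import numpy as np

def rodrigues(r):
    """Axis-angle vector -> rotation matrix (Rodrigues' formula)."""
    angle = np.linalg.norm(r)
    if angle < 1e-12:
        return np.eye(3)
    k = r / angle
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def project(theta, board):
    """theta = [f, r_x, r_y, r_z, t_x, t_y, t_z]; board is an (n, 3) array."""
    f, r, t = theta[0], theta[1:4], theta[4:7]
    Pc = board @ rodrigues(r).T + t
    return (f * Pc[:, :2] / Pc[:, 2:3]).ravel()

def jacobian(fun, theta, eps=1e-6):
    """Central-difference Jacobian of fun at theta."""
    n = fun(theta).size
    J = np.empty((n, theta.size))
    for j in range(theta.size):
        d = np.zeros_like(theta)
        d[j] = eps
        J[:, j] = (fun(theta + d) - fun(theta - d)) / (2 * eps)
    return J

board = np.array([[-1, -1, 0], [1, -1, 0], [1, 1, 0], [-1, 1, 0]], float)
flat   = np.array([500.0, 0.0, 0.0, 0.0, 0.1, -0.05, 2.0])  # frontoparallel
tilted = np.array([500.0, 0.0, 0.4, 0.0, 0.1, -0.05, 2.0])  # ~23 deg about y

s_flat   = np.linalg.svd(jacobian(lambda th: project(th, board), flat),   compute_uv=False)
s_tilted = np.linalg.svd(jacobian(lambda th: project(th, board), tilted), compute_uv=False)
```

The tilted Jacobian is full rank, but its condition number remains large, in line with the observation above that tilt alone does not make the focal length well determined.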
Principal point estimation
So far, we have only included focal length as an intrinsic parameter. In practice, however, calibration often becomes harder because we want to estimate more parameters, such as the principal point coordinates. This is similar to increasing model flexibility in a general fitting problem: the model can explain more variation, but it also demands more evidence from the data.
To illustrate that effect, we now add the principal point coordinate $c_x$ to the parameter vector: $\boldsymbol{\theta}=[f,r_x,r_y,r_z,t_x,t_y,t_z,c_x]^\top$. In this scenario, $r_z=0$ while both $r_x$ and $r_y$ can be varied by the sliders.
Once principal point parameters are introduced, the calibration data must contain enough asymmetry and viewpoint variation to distinguish them from pose and scale effects. Observe that $c_x$ estimation specifically needs a non-zero rotation about x. Otherwise, the additional flexibility simply shows up as larger uncertainty and stronger coupling.
- Adding principal point parameters increases the dimension of the parameter space and demands more evidence.
- $c_x$ estimation requires non-zero $r_x$ while $c_y$ estimation requires non-zero $r_y$.
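The frontoparallel part of this argument can be made concrete with a short sketch (rotation columns omitted for brevity; all values illustrative): with the board head-on, the $c_x$ column of the Jacobian is an exact multiple of the $t_x$ column, so the two cannot be separated without tilt.

```python
import numpy as np

# Frontoparallel board with principal point: u = f*(X+t_x)/t_z + c_x,
# v = f*(Y+t_y)/t_z. Jacobian columns for theta = [f, t_x, t_y, t_z, c_x]
# (rotation columns omitted; they do not affect this particular argument).
f, tx, ty, tz = 500.0, 0.1, -0.05, 2.0
corners = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
rows = []
for X, Y in corners:
    x, y = X + tx, Y + ty
    rows.append([x/tz, f/tz, 0.0, -f*x/tz**2, 1.0])  # du/dtheta
    rows.append([y/tz, 0.0, f/tz, -f*y/tz**2, 0.0])  # dv/dtheta
J = np.array(rows)

# The c_x column is an exact multiple of the t_x column: a shift of the
# principal point and a lateral board shift produce the same image change.
```

Which rotation breaks this coupling in the full model is best explored interactively; the sketch only demonstrates the degenerate frontoparallel baseline.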
Adding planes
Normal practice is to collect more than one image. In calibration terms that means adding more views, more pose diversity, or more target coverage.
Below we keep the same camera but observe two tilted planes. The shared focal scale is estimated once, while each plane has its own pose which needs to be estimated: $\boldsymbol{\theta}=[f,r_{1x},r_{1y},r_{1z},t_{1x},t_{1y},t_{1z},r_{2x},r_{2y},r_{2z},t_{2x},t_{2y},t_{2z}]^\top$.
This setup adds parameters, but it also adds substantially more geometric support. The shared camera parameters now have to explain measurements coming from multiple planes at different orientations, which is exactly the kind of diversity that improves conditioning and reduces coupling.
- Adding another plane provides more independent evidence while also adding six parameters to be estimated.
- Multiple planes reduce harmful parameter coupling because the shared parameters must explain more varied observations.
- Better calibration comes from diverse views and poses.
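A small numerical sketch (hypothetical poses, central-difference Jacobian) shows that with two differently tilted planes, all 13 parameters are identifiable:

```python
import numpy as np

def rodrigues(r):
    """Axis-angle vector -> rotation matrix (Rodrigues' formula)."""
    angle = np.linalg.norm(r)
    if angle < 1e-12:
        return np.eye(3)
    k = r / angle
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def project_two_planes(theta, board):
    """theta = [f, r1 (3), t1 (3), r2 (3), t2 (3)]; the same board is
    observed at two different poses with a shared focal length."""
    f = theta[0]
    projections = []
    for k in (1, 7):
        r, t = theta[k:k + 3], theta[k + 3:k + 6]
        Pc = board @ rodrigues(r).T + t
        projections.append(f * Pc[:, :2] / Pc[:, 2:3])
    return np.concatenate(projections).ravel()

def jacobian(fun, theta, eps=1e-6):
    """Central-difference Jacobian of fun at theta."""
    n = fun(theta).size
    J = np.empty((n, theta.size))
    for j in range(theta.size):
        d = np.zeros_like(theta)
        d[j] = eps
        J[:, j] = (fun(theta + d) - fun(theta - d)) / (2 * eps)
    return J

board = np.array([[-1, -1, 0], [1, -1, 0], [1, 1, 0], [-1, 1, 0]], float)
theta = np.array([500.0,
                  0.0, 0.4, 0.0,  0.1, -0.05, 2.0,   # plane 1: tilted about y
                  0.3, -0.2, 0.0, -0.1, 0.0, 2.5])   # plane 2: tilted about x and y
J = jacobian(lambda th: project_two_planes(th, board), theta)
```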
What about lens distortion?
Lens distortion behaves somewhat differently from the intrinsic and pose parameters discussed above. The key point is that distortion is primarily an image-plane effect: it describes how image points deviate from the ideal pinhole projection as a function of where they lie in the image.
For that reason, distortion does not fundamentally require tilted observations. What it really requires is coverage. If the target is only observed near the image center, then the calibration data contains very little information about how the lens behaves near the image boundaries, where distortion is usually strongest.
Tilt is still helpful in a full calibration because it improves overall parameter identifiability and helps separate different effects. But for distortion itself, broad coverage across the image is the more important requirement. In practice this means moving the target so that features appear near the edges and corners of the sensor, not just near the center.
- Lens distortion is mainly an image-plane effect, not a depth-variation effect.
- Distortion parameters need good image coverage, especially toward the borders.
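A quick calculation with the radial part of the Brown distortion model (coefficient values merely illustrative) shows why coverage matters: the modelled displacement grows steeply with distance from the image center, so observations near the center carry almost no information about the distortion coefficients.

```python
import numpy as np

def radial_displacement(xn, yn, k1, k2):
    """Displacement magnitude of the Brown radial model at a normalized
    image point: (x, y) -> (x, y) * (1 + k1*r^2 + k2*r^4)."""
    r2 = xn**2 + yn**2
    scale = k1 * r2 + k2 * r2**2
    return np.hypot(xn * scale, yn * scale)

k1, k2 = -0.25, 0.08  # illustrative coefficients for mild barrel distortion
near_center = radial_displacement(0.05, 0.05, k1, k2)
near_border = radial_displacement(0.60, 0.45, k1, k2)
```

Since the residual sensitivity to $k_1$ and $k_2$ scales with $r^2$ and $r^4$, center-only observations leave those columns of the Jacobian nearly zero, which is exactly the weak-evidence situation described above.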
Calibration takeaway
What can we learn from this analysis about how to design calibration procedures?
First, the geometry of the calibration target and its pose relative to the camera have a strong influence on the local identifiability of parameters.
Frontoparallel views alone cannot distinguish focal scale from depth and do not constrain the principal point coordinates.
However, even with tilt, some parameters can still be strongly correlated if their effects on the image are similar enough.
Therefore, it is important to consider not just the number of constraints but also their diversity and how they interact with the unknowns.
Calib.io's software products, Calibrator and libCalib, provide tools to analyze and visualize these relationships, helping users design better calibration procedures and understand the limitations of their data.