Camera Model

Tag: 3D Vision; Date: 12 January 2020

A typical camera model is composed of the following coordinate systems:

World coordinate: The base frame defined by the user to represent an object’s 3D location. A general point is described in world coordinates as $P_W = (X_W, Y_W, Z_W)$, in physical units (e.g. m, cm, mm).
Camera coordinate: The frame whose origin is the optical center, representing the object’s location from the camera’s perspective. A general point is described in camera coordinates as $P_C = (X_C, Y_C, Z_C)$, in physical units (e.g. m, cm, mm).
Image coordinate: The frame that describes the perspective projection plane inside the camera. A point on the projection plane is described in homogeneous form as $p = (x, y, 1)$, with $x$ and $y$ in physical units (e.g. m, cm, mm).
Pixel coordinate: Shares the same plane as the image coordinate but has a different origin. It describes the pixel location $(u, v)$ in a digital image, in units of pixels.

/assets/images/Screen_Shot_2021-01-12_at_1.33.17_PM.png

From a point to a pixel

When we take a picture with a camera, the locations of objects in 3D Euclidean space are transformed into planar pixel locations.

/assets/images/Screen_Shot_2021-01-12_at_9.17.42_PM.png

World Coordinate → Camera Coordinate

Rigid Transformation

/assets/images/Screen_Shot_2021-01-12_at_9.20.23_PM.png

Let’s represent the position of a 3D point in world coordinates as $P_W=\begin{bmatrix} X_W \\ Y_W \\ Z_W \end{bmatrix}$. The same point can also be expressed in camera coordinates as $P_C = \begin{bmatrix} X_C \\ Y_C \\ Z_C \end{bmatrix}$ by applying a rigid transformation:

\[\begin{bmatrix} X_C \\ Y_C \\ Z_C \\ \end{bmatrix} = R \begin{bmatrix} X_W \\ Y_W \\ Z_W \\ \end{bmatrix} + \vec{T}\]

where $R$ is a 3x3 rotation matrix and $\vec{T}$ is a 3x1 translation vector.

In homogeneous coordinates, the same transformation can be written as a single matrix product:

\[\begin{bmatrix} X_C \\ Y_C \\ Z_C \\ 1 \\ \end{bmatrix} = \begin{bmatrix} R & \vec{T}\\ \vec{0} & 1\\ \end{bmatrix} \begin{bmatrix} X_W \\ Y_W \\ Z_W \\ 1 \\ \end{bmatrix}\]
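As a quick numerical check, the two forms above can be compared in code. This is only a minimal sketch in plain Python: the function names, the 90-degree rotation about the Z axis, and the translation of 5 units are illustrative assumptions, not values from any real camera.

```python
import math

def world_to_camera(R, T, P_w):
    """Apply the rigid transformation P_c = R * P_w + T."""
    return [sum(R[i][j] * P_w[j] for j in range(3)) + T[i] for i in range(3)]

def extrinsic_matrix(R, T):
    """Stack R and T into the 4x4 homogeneous matrix [[R, T], [0, 1]]."""
    return [list(R[i]) + [T[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def world_to_camera_homogeneous(M, P_w):
    """Multiply the 4x4 matrix by (X_W, Y_W, Z_W, 1) and drop the trailing 1."""
    P_h = list(P_w) + [1.0]
    P_c_h = [sum(M[i][j] * P_h[j] for j in range(4)) for i in range(4)]
    return P_c_h[:3]

# Hypothetical extrinsics: rotate 90 degrees about Z, then translate 5 units along Z.
theta = math.pi / 2
R = [[math.cos(theta), -math.sin(theta), 0.0],
     [math.sin(theta),  math.cos(theta), 0.0],
     [0.0,              0.0,             1.0]]
T = [0.0, 0.0, 5.0]

P_w = [1.0, 0.0, 0.0]
P_c = world_to_camera(R, T, P_w)
P_c_h = world_to_camera_homogeneous(extrinsic_matrix(R, T), P_w)

print(P_c)    # roughly (0, 1, 5), up to floating-point rounding
print(P_c_h)  # the homogeneous form gives the same point
```

Both calls should agree up to rounding, which is the point of the homogeneous form: rotation and translation collapse into one matrix that can later be chained with the projection matrices.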

Camera Coordinate → Image Coordinate

Perspective Projection

The point $P_C$ is projected through the lens onto the sensor inside the camera as the point $p$:

/assets/images/Screen_Shot_2021-01-12_at_9.52.28_PM.png

\[\triangle ABO_C \sim \triangle oCO_C\] \[\triangle PBO_C \sim \triangle pCO_C\]

From these similar triangles we get

\[\frac{X_C}{x} = \frac{Y_C}{y}= \frac{Z_C}{f}\] \[x = f\frac{X_C}{Z_C}, y = f\frac{Y_C}{Z_C}\]

where $f$ is the focal length. Now we can write the relation between image coordinates and camera coordinates in matrix form:

\[Z_C\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} \begin{bmatrix} X_C \\ Y_C \\ Z_C \\ 1 \end{bmatrix}\]

The $Z_C$ on the left side can be replaced with a scaling factor $s$, because every point on the projection line through $p$ and $P$ maps to the same image point $p$; only the scale on the left side changes.

\[s\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} \begin{bmatrix} X_C \\ Y_C \\ Z_C \\ 1 \end{bmatrix}\]

Given any point in camera coordinates, we can now find the location of its perspective projection on the image plane.
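The projection can be sketched in a few lines of plain Python, both as the direct formula and as the 3x4 matrix form followed by division by $s$. The function names, the 50 mm focal length, and the sample points are assumptions chosen for illustration, and the check that the point lies in front of the camera is an added safeguard rather than part of the derivation.

```python
def project(P_c, f):
    """Perspective projection: x = f * X_C / Z_C, y = f * Y_C / Z_C."""
    X_c, Y_c, Z_c = P_c
    if Z_c <= 0:
        raise ValueError("point must lie in front of the camera (Z_C > 0)")
    return (f * X_c / Z_c, f * Y_c / Z_c)

def project_matrix(P_c, f):
    """Same projection via the 3x4 matrix, then divide by the scale s."""
    M = [[f,   0.0, 0.0, 0.0],
         [0.0, f,   0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
    P_h = list(P_c) + [1.0]
    sx, sy, s = (sum(M[i][j] * P_h[j] for j in range(4)) for i in range(3))
    if s <= 0:
        raise ValueError("point must lie in front of the camera (Z_C > 0)")
    return (sx / s, sy / s)

f = 50.0                        # hypothetical focal length in mm
P_near = [200.0, 100.0, 1000.0] # camera coordinates in mm
P_far = [400.0, 200.0, 2000.0]  # twice as far along the same projection line

print(project(P_near, f))         # (10.0, 5.0)
print(project_matrix(P_near, f))  # (10.0, 5.0)
print(project(P_far, f))          # (10.0, 5.0)
```

The last call illustrates why $Z_C$ can be replaced by a generic scale $s$: a point twice as far along the same line lands on the same image location.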

Image Coordinate → Pixel Coordinate

Affine Transformation

Physical length → Pixel: After the point is projected onto the sensor inside the camera, the camera generates a digital image of the point, which is composed of pixels.

The relation between image coordinates and pixel coordinates can be described by an affine transformation (scaling + shifting).

/assets/images/Screen_Shot_2021-01-12_at_11.05.39_PM.png

\[\begin{cases} u = \frac{x}{dx} + u_0 \\ v = \frac{y}{dy} + v_0 \end{cases}\] \[\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{dx} & 0 & u_0\\ 0 & \frac{1}{dy} & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}\]

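$dx, dy$: The physical size of one pixel along the $x$ and $y$ axes (e.g. how many mm/cm/m one pixel spans).

$u_0, v_0$: The pixel coordinates of the image-coordinate origin, i.e. the shift between the two origins.

$\frac{x}{dx}$: Converts a physical length into a number of pixels.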
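A small sketch of this unit conversion in plain Python follows. The function names, the 0.01 mm pixel size, and the origin offset $(320, 240)$ are made-up values for illustration; the inverse function is not part of the text above and is included only as a sanity check.

```python
def image_to_pixel(x, y, dx, dy, u0, v0):
    """Affine map from image coordinates (physical units) to pixel coordinates."""
    return (x / dx + u0, y / dy + v0)

def pixel_to_image(u, v, dx, dy, u0, v0):
    """Inverse map, added here only to check the round trip."""
    return ((u - u0) * dx, (v - v0) * dy)

# Hypothetical sensor: 0.01 mm per pixel, image origin at pixel (320, 240).
dx, dy = 0.01, 0.01
u0, v0 = 320.0, 240.0

x, y = 1.0, -0.5  # a projected point on the image plane, in mm
u, v = image_to_pixel(x, y, dx, dy, u0, v0)
print(u, v)       # approximately 420 and 190

x_back, y_back = pixel_to_image(u, v, dx, dy, u0, v0)
print(x_back, y_back)  # approximately 1.0 and -0.5
```

Real pixel indices are integers, so in practice $u$ and $v$ would be rounded or interpolated; that step is left out here.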

Overall: World Coordinate → Pixel Coordinate

Extrinsic parameters: represent the rigid transformation from the 3-D world coordinate system to the 3-D camera coordinate system ($R$ and $\vec{T}$).

Intrinsic parameters: represent the projective transformation from 3-D camera coordinates to 2-D image coordinates ($f$), together with the conversion to pixel coordinates ($dx$, $dy$, $u_0$, $v_0$).
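Chaining the three steps gives one equation from world coordinates to pixel coordinates. The following is a sketch composed from the rigid, perspective, and affine equations above, with $s = Z_C$; the shorthand $f_x = f/dx$ and $f_y = f/dy$ is introduced here for compactness and is not used elsewhere in this post.

\[s\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{dx} & 0 & u_0\\ 0 & \frac{1}{dy} & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} \begin{bmatrix} R & \vec{T}\\ \vec{0} & 1\\ \end{bmatrix} \begin{bmatrix} X_W \\ Y_W \\ Z_W \\ 1 \\ \end{bmatrix}\]

Multiplying the first two matrices collects the intrinsic parameters into a single 3x4 matrix, while the extrinsic parameters stay in the 4x4 rigid transformation:

\[s\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & u_0 & 0 \\ 0 & f_y & v_0 & 0 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} \begin{bmatrix} R & \vec{T}\\ \vec{0} & 1\\ \end{bmatrix} \begin{bmatrix} X_W \\ Y_W \\ Z_W \\ 1 \\ \end{bmatrix}\]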