Projective Geometry & Camera Modeling

(Summary of concepts by Toulouse de Margerie, McGill University)

 
 

To fully understand Computer Vision, and Computer Graphics in general, one needs a good understanding of projective geometry and camera modeling, since these underlie much of the data handled in these fields.  I will try to summarize some of the fundamental concepts of the projective plane P2 and projective space P3 and why these are important when dealing with computer graphics.  Furthermore, I will discuss the basics of camera modeling and its relation to projective geometry.
 

P2 - The Projective Plane

P2 is the set of equivalence classes of nonzero triplets of real numbers, where two triplets are equivalent if they are scalar multiples of each other.  That is, p=(x,y,z) is equivalent to p'=(x',y',z') iff (x,y,z) = (sx',sy',sz') for some nonzero s.  Hence, any point in P2 is defined only up to an arbitrary nonzero scale factor, and the triplet (0,0,0) does not represent any point.

The coordinates of points in the projective plane are called homogeneous coordinates.
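The equivalence above can be checked numerically: two nonzero triplets are scalar multiples of each other exactly when their cross product is zero. The following is a minimal Python sketch built on that observation; the function names and the absolute tolerance `tol` are illustrative choices, not part of the original notes.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def same_projective_point(p, q, tol=1e-9):
    """True if the nonzero triplets p and q represent the same point of P2,
    i.e. p = s*q for some nonzero s (parallel vectors have zero cross product)."""
    if all(abs(c) <= tol for c in p) or all(abs(c) <= tol for c in q):
        raise ValueError("(0,0,0) does not represent a point of P2")
    return all(abs(c) <= tol for c in cross(p, q))


print(same_projective_point((1, 2, 3), (2, 4, 6)))     # True
print(same_projective_point((1, 2, 3), (-3, -6, -9)))  # True (negative s is allowed)
print(same_projective_point((1, 2, 3), (1, 2, 4)))     # False
```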
 

P3 - The Projective Space

P3 is the direct analogue of P2 in one higher dimension.  As expected, points are expressed as nonzero quadruples, again defined only up to an arbitrary nonzero scale factor.
 

Proper/Improper Coordinates

Each point of P2 can be pictured as a line through the origin of R3 (the set of all scalar multiples of its triplet).  If we choose a plane that does not pass through the origin, such as z=1, we can classify points in P2 as either proper or improper: those whose lines intersect the plane are proper, and those whose lines are parallel to it are improper (also called points at infinity).  For the plane z=1, exactly the points of the form (x,y,0) are improper.  It follows that every proper point can be normalized, in this case, to (x/z, y/z, 1).
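Normalization with respect to the plane z=1 takes only a few lines. This is a minimal sketch; the name `normalize` and the tolerance used to decide that z is zero are illustrative assumptions.

```python
def normalize(p, tol=1e-12):
    """Return (x/z, y/z, 1) for a proper point of P2 with respect to the
    plane z = 1, or None if the point is improper (z == 0)."""
    x, y, z = p
    if abs(z) <= tol:
        return None  # improper point: its line through the origin never meets z = 1
    return (x / z, y / z, 1.0)


print(normalize((4, 6, 2)))     # (2.0, 3.0, 1.0)
print(normalize((-4, -6, -2)))  # (2.0, 3.0, 1.0): same point, different scale
print(normalize((1, 5, 0)))     # None
```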
 

Projective Transformations

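Projective transformations are the maps between projective spaces induced by invertible linear transformations of homogeneous coordinates.  A projective transformation of P2 to itself is therefore represented by a nonsingular 3x3 matrix, and one of P3 by a nonsingular 4x4 matrix, each defined only up to an arbitrary nonzero scale factor.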

The standard bases for P2 and P3 are (0,0,1), (0,1,0), (1,0,0), (1,1,1) and (0,0,0,1), (0,0,1,0), (0,1,0,0), (1,0,0,0), (1,1,1,1) respectively.  From this we infer that any projective transformation from P2 to itself is completely determined by its action on 4 points in general position (no three collinear), and one from P3 to itself is determined by its action on 5 points in general position (no four coplanar).  In general, a projective transformation of Pn is determined by its action on n+2 points in general position.
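To see why four points suffice in P2, one can construct the transformation explicitly. A minimal Python sketch, assuming the four target points are in general position: write the images of (1,0,0), (0,1,0), (0,0,1) as unknown multiples a p1, b p2, c p3, then require that (1,1,1) map to p4, which fixes a, b, c through a 3x3 linear system (solved here with Cramer's rule). The function names and the example quadrilateral are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def basis_to_points(p1, p2, p3, p4, tol=1e-12):
    """Return a 3x3 matrix H (list of rows) sending, up to scale,
    (1,0,0) -> p1, (0,1,0) -> p2, (0,0,1) -> p3 and (1,1,1) -> p4."""
    A = [[p1[i], p2[i], p3[i]] for i in range(3)]  # columns are p1, p2, p3
    d = det3(A)
    if abs(d) < tol:
        raise ValueError("p1, p2, p3 are collinear")
    # Solve A (a, b, c)^T = p4 by Cramer's rule; then H = [a p1 | b p2 | c p3].
    scales = []
    for k in range(3):
        Ak = [row[:] for row in A]
        for i in range(3):
            Ak[i][k] = p4[i]
        scales.append(det3(Ak) / d)
    if any(abs(s) < tol for s in scales):
        raise ValueError("p4 is collinear with two of p1, p2, p3")
    return [[scales[j] * A[i][j] for j in range(3)] for i in range(3)]


def transform(H, p):
    """Apply the 3x3 matrix H to the homogeneous point p."""
    return tuple(sum(H[i][j] * p[j] for j in range(3)) for i in range(3))


# Send the standard basis to the corners of a hypothetical quadrilateral.
H = basis_to_points((0, 0, 1), (2, 0, 1), (0, 1, 1), (3, 2, 1))
print(transform(H, (1, 1, 1)))   # (3.0, 2.0, 1.0)
```

Any nonzero multiple of `H` maps these points equally well, and that overall scale is the only freedom left once the four images are fixed, which is the sense in which four points determine the transformation.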
 


As we will see, in many situations it is useful to express coordinates in P2 or P3 rather than in the standard Euclidean spaces R2 or R3.  When working with geometry in projective spaces we retain the concepts of points, lines, and incidence, but not angles and lengths, since projective transformations do not preserve them.
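Incidence can be made concrete with a standard device that these notes do not introduce explicitly: a line in P2 is itself represented by a homogeneous triplet l, a point p lies on it iff the dot product l·p is zero, the line through two points is their cross product, and the intersection of two lines is the cross product of the lines. A minimal Python sketch, using exact integer coordinates so the zero test is exact; the function names are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def incident(point, line):
    """A point lies on a line exactly when their dot product is zero."""
    return sum(p * l for p, l in zip(point, line)) == 0


# Line through the Euclidean points (0, 0) and (1, 1), i.e. y = x.
l1 = cross((0, 0, 1), (1, 1, 1))
print(l1, incident((5, 5, 1), l1))    # (-1, 1, 0) True

# The parallel line y = x + 1 is -x + y - 1 = 0, i.e. the triplet (-1, 1, -1).
l2 = (-1, 1, -1)
print(cross(l1, l2))                  # (-1, -1, 0): an improper point
```

The two parallel lines meet in a point of the form (x, y, 0), one of the improper points described earlier, so parallel lines need no special case in P2.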
 
 

Modeling Camera Parameters

It is fundamental to computer vision to grasp the geometric representation of cameras.  Here I will explain the geometric model commonly used to represent cameras and how it relates 3D points in the world to 2D points in images.  It turns out that working with points in projective spaces makes it much easier to build mathematical models of camera projection.  Hence, we will look at the camera model equations in this context and explain how they relate to Euclidean spaces.

We will discuss the parameters of camera models in two general categories: those which are external to the camera (the extrinsic parameters) and those internal to its physical makeup (the intrinsic parameters).
 

Extrinsic Camera Parameters

The extrinsic camera parameters are those which relate a camera's position and orientation to a world coordinate reference frame.  Hence they simply describe the rigid transformation (a translation and a rotation) between the world frame and the camera frame.  These are often represented as a 3D translation vector T and a 3x3 rotation matrix R.  A rotation matrix is orthogonal ($R^\top R = R R^\top = I$) with determinant +1, and therefore has only 3 degrees of freedom (DOF).  Together with the 3 DOF of T, this makes a total of 6 DOF for the extrinsic parameters.

Given a point with world coordinates $P_w$, the same point has coordinates $P_c$ in the camera's frame of reference according to

$$P_c = R\,(P_w - T)$$
These extrinsic parameters can be thought of physically as specifying the position and orientation of the camera in space.
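A minimal Python sketch of this world-to-camera mapping, assuming (as the equation above does) that T is the position of the camera in world coordinates and R rotates world axes into camera axes. The rotation about the z axis and all numeric values are hypothetical, chosen only to make the result easy to check by hand; the sketch also verifies that R is orthogonal.

```python
import math


def rot_z(theta):
    """Rotation by theta radians about the z axis (a list of rows)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(M):
    return [list(col) for col in zip(*M)]


def world_to_camera(R, T, Pw):
    """P_c = R (P_w - T), where T is the camera position in world coordinates."""
    d = [Pw[i] - T[i] for i in range(3)]
    return [sum(R[i][j] * d[j] for j in range(3)) for i in range(3)]


R = rot_z(math.pi / 2)        # hypothetical orientation: 90 degrees about z
T = [1.0, 0.0, 0.0]           # hypothetical camera position
print(world_to_camera(R, T, [1.0, 1.0, 5.0]))   # approximately [-1.0, 0.0, 5.0]

# Orthogonality check: R^T R should be the identity (up to rounding).
RtR = matmul(transpose(R), R)
print(all(abs(RtR[i][j] - (1.0 if i == j else 0.0)) < 1e-12
          for i in range(3) for j in range(3)))   # True
```

Note that the camera position T itself maps to the origin of the camera frame, as it should.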

The same equation relating world coordinates to camera coordinates can be expressed conveniently in matrix form

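$$
M_{ext} = \begin{pmatrix}
r_{11} & r_{12} & r_{13} & -R_1^\top T \\
r_{21} & r_{22} & r_{23} & -R_2^\top T \\
r_{31} & r_{32} & r_{33} & -R_3^\top T
\end{pmatrix}
$$
where $r_{ij}$ are the elements of the rotation matrix R and $R_i$ is its i-th row written as a column vector, so that $R_i^\top T$ is the dot product of the i-th row of R with T.
Now, if we augment $P_w$ with a 1 as a fourth entry, making it a homogeneous point in P3, we can express the same equation relating world coordinates to camera coordinates as
$$P_c = M_{ext} \begin{pmatrix} P_w \\ 1 \end{pmatrix}$$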
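The matrix form can be checked against R(P_w - T) directly. A minimal Python sketch, building the 3x4 matrix whose last column is -R T and applying it to the homogeneous point (P_w, 1); the integer rotation (90 degrees about z) and the camera position are hypothetical values chosen so the arithmetic is exact.

```python
def extrinsic_matrix(R, T):
    """Build the 3x4 matrix M_ext = [R | -R T]; row i ends with -(row i of R) . T."""
    return [list(R[i]) + [-sum(R[i][j] * T[j] for j in range(3))] for i in range(3)]


def apply_extrinsic(M_ext, Pw):
    """Append 1 to P_w (making it homogeneous) and multiply by M_ext."""
    Ph = list(Pw) + [1]
    return [sum(M_ext[i][j] * Ph[j] for j in range(4)) for i in range(3)]


R = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]   # hypothetical: 90 degrees about z
T = [1, 0, 0]                            # hypothetical camera position
M_ext = extrinsic_matrix(R, T)
print(M_ext)                             # [[0, -1, 0, 0], [1, 0, 0, -1], [0, 0, 1, 0]]
print(apply_extrinsic(M_ext, [1, 1, 5])) # [-1, 0, 5], the same as R (P_w - T)
```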

 
 

Intrinsic Camera Parameters

The set of intrinsic camera parameters are those which govern how 3D points in the camera's frame of reference are mapped to 2D coordinates in the images it produces.  These are the internal parameters of the camera and can be divided into three separate categories: the projective transformation, the camera frame to pixel coordinate transformation, and geometric distortion.
 

The Projective Parameter

The fundamental equations of perspective image formation tell us that 3D coordinates are related to 2D image plane coordinates by
$$x = f\,\frac{X}{Z} \qquad\text{and}\qquad y = f\,\frac{Y}{Z}$$
The focal length, f, is the only parameter which describes the projective transformation part of a camera system.  This is where describing points in a projective space becomes convenient.  Note that we can express the same relationship in matrix form
$$
M_{proj} = \begin{pmatrix}
f & 0 & 0 \\
0 & f & 0 \\
0 & 0 & 1
\end{pmatrix}
$$
Now, a 3D point P = (X, Y, Z) in the camera frame is related to a point p in the camera's image plane by
$$p = M_{proj}\, P$$
but only if we are willing to accept that p is now expressed in the projective plane P2.  This representation of p, namely (fX, fY, Z), is converted to its camera image plane coordinates simply by dividing by the third component.  You can verify that this gives exactly the fundamental equations of perspective projection given above.
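A minimal Python sketch of the matrix route, with a hypothetical focal length and point. It uses the projection matrix diag(f, f, 1), multiplies, and then divides by the third component, which should reproduce x = fX/Z and y = fY/Z.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def projection_matrix(f):
    """M_proj = diag(f, f, 1) for focal length f."""
    return [[f, 0.0, 0.0], [0.0, f, 0.0], [0.0, 0.0, 1.0]]


def project(f, P):
    """Image-plane (x, y) of the camera-frame point P = (X, Y, Z), assuming Z != 0."""
    u, v, w = matvec(projection_matrix(f), P)   # homogeneous point (fX, fY, Z) in P2
    return (u / w, v / w)                        # divide by the third component


print(project(2.0, [1.0, 3.0, 4.0]))   # (0.5, 1.5), i.e. (f X / Z, f Y / Z)
```

Scaling P by any nonzero factor leaves the result unchanged, which is the "defined up to scale" property of P2 at work.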
 

Camera Frame to Pixel Coordinate Transformation

There is also a transformation inherent in converting points in the camera's image plane to the actual array of pixels output by the camera.  It is encapsulated in two pairs of parameters: the image center $(o_x, o_y)$ in pixels and the effective pixel size $(s_x, s_y)$ in the horizontal and vertical directions.  These relate points in the camera image plane $(x, y)$ to the pixel coordinate frame $(x_{im}, y_{im})$ by
$$x = -(x_{im} - o_x)\,s_x \qquad\text{and}\qquad y = -(y_{im} - o_y)\,s_y$$
This relationship too can be expressed concisely in matrix form as
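$$
M_{cam} = \begin{pmatrix}
-1/s_x & 0 & o_x \\
0 & -1/s_y & o_y \\
0 & 0 & 1
\end{pmatrix}
$$
Again using the same technique of expressing camera coordinates and pixel coordinates in P2, we have
$$p_{im} = M_{cam}\, p$$
where $p_{im}$ is the pixel coordinate and p is the camera image plane coordinate, both homogeneous.  The minus signs come from inverting the relation above: $x_{im} = -x/s_x + o_x$ and $y_{im} = -y/s_y + o_y$.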
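A minimal Python sketch of the conversion in both directions. It follows the relation x = -(xim - ox) sx given above, which is why its matrix carries -1/sx and -1/sy. The image center and pixel sizes are hypothetical, chosen as powers of two so the round trip is exact in floating point.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def camera_to_pixel_matrix(ox, oy, sx, sy):
    """M_cam: maps homogeneous image-plane points to homogeneous pixel points."""
    return [[-1.0 / sx, 0.0, ox], [0.0, -1.0 / sy, oy], [0.0, 0.0, 1.0]]


def image_to_pixel(x, y, ox, oy, sx, sy):
    u, v, w = matvec(camera_to_pixel_matrix(ox, oy, sx, sy), [x, y, 1.0])
    return (u / w, v / w)


def pixel_to_image(xim, yim, ox, oy, sx, sy):
    """The relation from the text: x = -(xim - ox) sx, y = -(yim - oy) sy."""
    return (-(xim - ox) * sx, -(yim - oy) * sy)


# Hypothetical calibration: image center (320, 240), pixel size 0.5 x 0.25 units.
params = (320.0, 240.0, 0.5, 0.25)
pix = image_to_pixel(-10.0, 5.0, *params)
print(pix)                               # (340.0, 220.0)
print(pixel_to_image(*pix, *params))     # (-10.0, 5.0)
```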
 

Geometric Distortion

Finally, we also address the radial distortion introduced by the optics of a real physical camera.  It is sometimes noticeable that near the edges of a camera's images the projected scene points become distorted.  In most cases this phenomenon can be modeled accurately by the equations
$$x = x_d\,(1 + k_1 r^2 + k_2 r^4)$$
$$y = y_d\,(1 + k_1 r^2 + k_2 r^4)$$
where $(x_d, y_d)$ is the distorted point, (x, y) is the corrected point, and $r^2 = x_d^2 + y_d^2$.

Here k1 and k2 can be considered further intrinsic camera parameters which model the radial distortion introduced by the optics.  In many cases k2 is small enough to be ignored, and sometimes k1 is as well, in which case the distortion is neglected entirely.
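A minimal sketch of the correction. It assumes that (x_d, y_d) are image-plane coordinates measured from the image center, so that r is the distance from the center, and that r^2 = x_d^2 + y_d^2 is computed on the distorted point. The coefficient values are hypothetical, and leaving k2 at its default of 0 corresponds to the common simplification mentioned above.

```python
def undistort(xd, yd, k1, k2=0.0):
    """Corrected point (x, y) from distorted (xd, yd):
    x = xd (1 + k1 r^2 + k2 r^4), y = yd (1 + k1 r^2 + k2 r^4), r^2 = xd^2 + yd^2."""
    r2 = xd * xd + yd * yd
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return (xd * factor, yd * factor)


print(undistort(0.5, 0.25, k1=0.5))            # (0.578125, 0.2890625), k2 ignored
print(undistort(0.5, 0.25, k1=0.0, k2=0.0))    # (0.5, 0.25): no distortion
```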
 

The Overall Camera Model

In conclusion, ignoring the possible radial distortion, the intrinsic camera parameters can be expressed compactly as
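$$
M_{int} = M_{cam} M_{proj} = \begin{pmatrix}
-f/s_x & 0 & o_x \\
0 & -f/s_y & o_y \\
0 & 0 & 1
\end{pmatrix}
$$
The order of the product matters: $M_{proj}$ is applied first and $M_{cam}$ second.  Furthermore, the extrinsic and intrinsic matrices together compactly express the relation between 3D world coordinates and 2D image coordinates (as homogeneous coordinates in P2) as
$$[x\ y\ z]^\top = M_{int} M_{ext} [X\ Y\ Z\ 1]^\top$$
where the pixel coordinates are recovered as $x_{im} = x/z$ and $y_{im} = y/z$.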
Trying to express these transformations directly in Euclidean space forces us to abandon this simple linear relationship between 3D world points and 2D image points, because perspective projection divides by depth.
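Putting the pieces together, here is a minimal Python sketch with distortion ignored, as in the text. It builds the single 3x4 matrix from the intrinsic matrix with entries -f/sx, -f/sy, ox, oy and the extrinsic matrix [R | -R T], then projects a world point by one matrix product followed by division by the third component. The camera parameters are hypothetical.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def camera_matrix(f, ox, oy, sx, sy, R, T):
    """3x4 matrix M_int M_ext (radial distortion ignored)."""
    M_int = [[-f / sx, 0.0, ox], [0.0, -f / sy, oy], [0.0, 0.0, 1.0]]
    M_ext = [list(R[i]) + [-sum(R[i][j] * T[j] for j in range(3))] for i in range(3)]
    return matmul(M_int, M_ext)


def project_world_point(M, Pw):
    """Pixel coordinates (xim, yim) of the world point Pw = (X, Y, Z)."""
    Ph = list(Pw) + [1.0]
    x, y, z = [sum(M[i][j] * Ph[j] for j in range(4)) for i in range(3)]
    return (x / z, y / z)


# Hypothetical camera: at the world origin, aligned with the world axes.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 0.0]
M = camera_matrix(1.0, 320.0, 240.0, 0.5, 0.5, R, T)
print(project_world_point(M, [1.0, 2.0, 4.0]))   # (319.5, 239.0)
```

The result agrees with applying the steps one at a time (x = fX/Z = 0.25, then xim = -x/sx + ox = 319.5). The single 3x4 matrix replaces the whole chain of Euclidean formulas, and the only nonlinear step left is the final division.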