
Camera

  • This report outlines the fundamental principles of camera operation, both in the physical world and within the context of computer graphics.
  • It covers concepts ranging from light convergence and focusing to the implementation of visual effects like defocus blur and anti-aliasing.

Constructing the Camera’s Basis Axes

  • In 3D graphics, a camera’s local coordinate system is defined by three orthogonal basis vectors.
  • These vectors are constructed to align with the camera’s orientation in the world, allowing for the correct projection of 3D objects onto a 2D viewport.
  • The process relies on three key parameters:
    • the camera’s position (lookfrom)
    • the point it is aimed at (lookat)
    • a world-space up vector

Defining the Camera’s Local Axes

  • The camera’s local axes (typically referred to as the u-v-w basis or right-up-view basis) can be calculated using vector operations:
    1. View Direction (w-axis)
      • The first axis is the camera’s view direction.
        • This is a vector pointing from the lookfrom position to the lookat point.
      • To form a proper basis vector, it must be normalized.
        • $w=\text{unit\_vector}(\text{lookfrom}-\text{lookat})$
      • The vector is defined as lookfrom - lookat so that the w-axis points away from the scene; the camera looks down the negative w-axis, which is the standard convention for right-handed coordinate systems in computer graphics.
    2. Right Vector (u-axis)
      • The second axis, or “right” vector, is perpendicular to both the view direction and the world-space up vector.
        • It is computed using a cross product.
      • The world-space up vector is typically $(0,1,0)$. Because it is generally not orthogonal to $w$, the result of the cross product must be normalized.
        • $u=\text{unit\_vector}(\text{up\_vector}\times w)$
    3. Corrected Up Vector (v-axis)
      • The third axis is the “up” vector in the camera’s local space.
        • It must be perfectly orthogonal to both the view direction and the right vector.
      • While an initial up vector is provided, a new, corrected up vector is generated to ensure this orthogonality.
        • This is also calculated using a cross product.
        • $v=w\times u$
      • The order of the vectors in each cross product determines whether the resulting coordinate system is right-handed or left-handed.
        • The order shown above is consistent with the right-handed basis used in the derivation of $u$; a short code sketch follows below.
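
  • A minimal code sketch of this construction, assuming a small Vec3 helper type (the names Vec3, unitVector, cross, and makeCameraBasis are illustrative, not taken from any particular renderer):

      #include <cmath>

      // Minimal 3D vector with only the operations the construction needs.
      struct Vec3 { double x, y, z; };

      Vec3 operator-(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

      Vec3 cross(const Vec3& a, const Vec3& b) {
          return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
      }

      Vec3 unitVector(const Vec3& v) {
          double len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
          return {v.x / len, v.y / len, v.z / len};
      }

      struct CameraBasis { Vec3 u, v, w; };

      // Builds the u-v-w basis from lookfrom/lookat/up, exactly as derived above.
      CameraBasis makeCameraBasis(const Vec3& lookfrom, const Vec3& lookat, const Vec3& worldUp) {
          Vec3 w = unitVector(lookfrom - lookat);  // view axis, points away from the scene
          Vec3 u = unitVector(cross(worldUp, w));  // "right" axis
          Vec3 v = cross(w, u);                    // corrected "up" axis
          return {u, v, w};
      }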

    (Figure: camera_axes)

The Importance of Normalization

  • While the cross product of two orthogonal unit vectors theoretically results in another unit vector, this isn’t always the case in practice.
    • Due to the limitations of floating-point arithmetic, the computed magnitude may be slightly off from 1.
  • Therefore, it is a standard and essential practice to normalize the resulting vectors after each cross-product operation.
    • While the lookfrom and lookat points define a direction, the magnitude of the resulting vectors from subtraction and cross-product operations can vary.
    • Normalization ensures that each basis vector has a unit length, which is a fundamental requirement for a consistent and stable coordinate system.
  • This process is crucial for mitigating precision errors that can accumulate from repeated vector calculations.

    (Figure: normalization)

How a Camera Forms an Image

Image Point and Image Plane

  • A camera forms an image by converging light rays from an object onto a sensor.
    • While an object reflects light in many directions, a camera’s lens focuses these rays to a single point, called the Image Point, on a 2D surface known as the Image Plane.
    • Collectively, these points form an inverted image on the sensor.
  • In computer graphics, the viewport is the 2D window on the screen where the 3D scene is rendered.
    • It represents the final rendered image and can be considered the digital equivalent of the physical image plane.

    (Figure: viewport)

Circle of Confusion (CoC)

  • When light rays from a point on an object do not perfectly converge on the image plane, they form a small circle instead of a single point.
    • This is known as the Circle of Confusion (CoC).
  • The size of the CoC directly determines an object’s sharpness in the final image.
    • A smaller CoC results in a sharper, more in-focus image.

    (Figure: coc)

  • By adjusting the lens, a photographer can control the size of the CoC for objects at different distances, thus bringing them into focus.

Focal Point and Focal Length

  • The concepts of the Focal Point and Focal Plane are specific to light rays originating from objects at an infinite distance, where the rays are effectively parallel.
    • In this scenario, all parallel rays converge to a single point on the Focal Plane, known as the Focal Point.
  • Focal Length is the distance between a lens’s optical center and its principal focal point.
    • For a real-world camera, if the subject is at an infinite distance, the image plane is at the same location as the focal plane,
      • so the focal length can be considered the distance from the lens’s optical center to the image sensor.
  • Lenses are classified as either prime (fixed focal length) or zoom (variable focal length).
    • A zoom lens changes its focal length by moving its internal elements, which alters the field of view.

    (Figure: focal_length)

Field of View and Zooming

  • The Field of View (FOV) is the extent of the observable world that is seen at any given moment.
    • In photography and computer graphics, it is the angle that determines how much of a scene the camera can capture, essentially controlling the perspective and zoom level.
  • In the real world, a camera’s field of view is intrinsically linked to the focal length.
    • A specific lens with a fixed focal length (e.g., a 50mm lens) will have a specific, non-negotiable FOV.
      • A longer focal length (telephoto lens) results in a narrower field of view, which magnifies the scene and produces a “zoomed-in” effect.
      • Conversely, a shorter focal length (wide-angle lens) provides a wider field of view, capturing more of the scene and creating a “zoomed-out” effect.
  • Unlike real-world cameras, computer graphics engines often allow the field of view to be set independently of the focal length for greater creative control, though this is not physically accurate.
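
  • The link can be made concrete with the standard pinhole relation between sensor width $w_s$ and focal length $f$ (the 36 mm value below assumes a full-frame sensor):

    \[\theta = 2\arctan\left(\frac{w_s}{2f}\right), \qquad \theta_{50\,\text{mm}} = 2\arctan\left(\frac{36}{2 \cdot 50}\right) \approx 39.6^\circ\]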

    (Figure: fov)

Aperture

  • In an optical system, the aperture is the opening that controls the amount of light entering the lens.
  • A larger aperture permits light to enter from a wider range of angles, making it more challenging for the lens to converge these rays to a single point of focus.
    • This results in a sharp image for objects at the exact focal distance, while objects at other distances appear progressively blurrier due to a larger circle of confusion (CoC).
  • The size of the aperture is typically represented by its f-number (N), also known as an f-stop, which is calculated as the ratio of the lens’s focal length (f) to the diameter of the effective aperture (D).

    \[N={f\over D}\]
  • For a fixed-focal-length lens (i.e., no zooming), the f-number is inversely proportional to the aperture’s diameter.

    (Figure: aperture)

  • The aperture, via its f-number, determines the degree of blur for objects outside the plane of focus.
    • When the f-number increases, the aperture diameter decreases.
      • This reduces the angle at which light rays enter the lens, thereby shrinking the CoC.
      • Consequently, objects at distances other than the focal distance appear less blurry.
    • Conversely, when the f-number decreases, the aperture diameter increases.
      • This widens the angle of incoming light, expanding the CoC and causing objects outside the focal distance to appear more blurry.
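
  • As a quick worked example of $N = f/D$: for a 50 mm lens, stopping down from f/2 to f/8 shrinks the effective aperture diameter from 25 mm to 6.25 mm, which is exactly why blur decreases as the f-number grows:

    \[D = \frac{f}{N}, \qquad D_{f/2} = \frac{50}{2} = 25\,\text{mm}, \qquad D_{f/8} = \frac{50}{8} = 6.25\,\text{mm}\]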

Depth of Field

  • The depth of field (DoF) is the range of distances within a scene that appears acceptably sharp in an image.
    • An object is considered “in focus” if the CoC formed by its reflected rays on the image plane is small enough to be perceived as a single point by the viewer.
  • The size of the CoC depends on an object’s distance from the focal plane.
    • A larger depth of field means a greater range of distances will appear in focus.
    • This implies that even objects located significantly far from the focal distance may still appear acceptably clear, provided they remain within the boundaries of the DoF.

    (Figure: dof)

Focusing

  • When a camera captures a scene, every object exists at a specific depth, or distance from the camera.
  • Think of the world as being composed of numerous parallel object planes, each perpendicular to the camera’s view direction.
    • Ideally, light rays originating from points on a single object plane would converge to a single corresponding image plane within the camera.
    • This implies that each object plane has its own distinct image plane where its light rays would be in perfect focus.
  • However, a camera’s sensor can only occupy a single, fixed image plane at any given moment.
    • When this plane is selected for rendering, rays from objects on all other planes will not be in sharp focus.
    • Instead of converging to a single point, these rays will intersect the selected image plane as a blurred disc known as the circle of confusion (CoC).
    • The size of this circle determines the degree of blur for that specific point.

    (Figure: focusing)

How to Choose the Image Plane

  • The camera determines the appropriate image plane to render by adjusting its internal optics, a process explained by the Thin Lens Equation:

    \[{1\over f} = {1\over u} + {1\over v}\]
    • where
      • $f$ is the focal length of the lens.
      • $u$ is the focus distance, the distance from the lens to the object plane that is intended to be in perfect focus.
      • $v$ is the image distance, the distance from the lens’s optical center to the image sensor or image plane.
        • This is the variable that is physically adjusted to achieve focus.
  • The Thin Lens Equation shows that for a given focal length ($f$), the image distance ($v$) is uniquely determined by the focus distance ($u$).
    • Therefore, achieving focus on an object at a specific distance is accomplished by physically moving the lens assembly relative to the image sensor.
    • This physical movement of the optical center ensures that rays from the desired object plane converge precisely onto the camera’s sensor, thus bringing that object into sharp focus.
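
  • As a worked example, a 50 mm lens focused on a subject 2 m away must hold the sensor slightly farther than one focal length behind the lens:

    \[\frac{1}{v} = \frac{1}{f} - \frac{1}{u} = \frac{1}{50\,\text{mm}} - \frac{1}{2000\,\text{mm}} = \frac{39}{2000\,\text{mm}} \quad\Rightarrow\quad v \approx 51.3\,\text{mm}\]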

    (Figure: choosing_image_plane)

Implementing Visual Effects

Defocus Blur (Depth of Field)

  • Defocus blur, commonly known as Depth of Field (DoF), occurs when objects are not within the camera’s focus range, causing their CoC to become perceptibly large.
  • In real-time rendering, simulating this physical process is computationally expensive.
    • Therefore, a common alternative is a post-processing technique using a G-Buffer.
  • A G-Buffer (Geometry Buffer) stores per-pixel data, including depth, position, and surface normals.
    • A post-processing shader then uses this depth information to calculate a blur radius for each pixel based on its distance from the camera’s focal plane.
      • If a pixel’s depth is within the DoF boundaries,
        • its Circle of Confusion (CoC) radius is set to zero (meaning no blur).
      • If a pixel’s depth is outside the boundaries,
        • the shader calculates a blur radius.
      • This radius is scaled based on how far the pixel’s depth is from the in-focus boundaries.
        • The further a pixel is from the plane of focus, the larger the calculated blur radius will be.
  • Pixels within the DoF receive little to no blur, while pixels outside of it are blurred according to their distance from the focal plane, simulating the defocus effect.
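
  • A minimal sketch of this per-pixel logic, written as plain C++ rather than shader code (nearFocus, farFocus, blurScale, and maxBlurRadius are assumed tunable parameters, not names from any real engine):

      #include <algorithm>

      // CoC blur radius for one pixel, derived from its depth and the DoF boundaries.
      double cocRadius(double pixelDepth, double nearFocus, double farFocus,
                       double blurScale, double maxBlurRadius) {
          // Inside the depth of field: treated as perfectly in focus.
          if (pixelDepth >= nearFocus && pixelDepth <= farFocus)
              return 0.0;

          // Distance from the nearest in-focus boundary drives the blur.
          double distance = (pixelDepth < nearFocus) ? nearFocus - pixelDepth
                                                     : pixelDepth - farFocus;

          // Scale with distance and clamp so the blur kernel stays bounded.
          return std::min(distance * blurScale, maxBlurRadius);
      }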

    (Figure: defocus_blur)

Motion Blur

  • Motion blur is an effect that simulates the streaking of objects in motion, as a result of a short exposure time.
    • This is implemented by averaging the color of a pixel across multiple frames or time steps, similar to how defocus blur is achieved by averaging samples across a spatial area.
  • The final color of a pixel is a blend of its colors at different moments in time, creating the illusion of motion.

    (Figure: motion_blur)

  • The ray-generation routine below ties these effects together: a jittered offset within the pixel (anti-aliasing), a random origin on the lens disc (defocus blur), and a random ray time (motion blur).
      Ray getRayToSample(int currentWidth, int currentHeight) const {
          // Jittered offset inside the pixel for anti-aliasing.
          auto offset = getSampleSquare();
          auto pixelSample = pixelCenterTopLeft
              + ((currentWidth + offset.getX()) * pixelDeltaWidth)
              + ((currentHeight + offset.getY()) * pixelDeltaHeight);

          // Defocus blur: when the defocus angle is positive, the ray starts
          // from a random point on the lens disc instead of the pinhole center.
          auto rayOrigin = (defocusAngle <= 0) ? center : getDefocusRandomPoint();
          auto rayDirection = pixelSample - rayOrigin;

          // Motion blur: a random time at which this ray samples the scene.
          auto rayTime = getRandomDouble();

          return Ray(rayOrigin, rayDirection, rayTime);
      }
    

Primitive Intersection: The Cost of Scene Traversal

  • After the camera generates a ray, the most critical operation is determining where and if that ray intersects any geometric primitives in the scene.
    • While simple shapes like spheres and triangles have straightforward intersection tests, complex primitives, such as the quadrilateral, demand multi-step analytical geometry.
  • A quadrilateral (quad) intersection requires a two-step process to ensure robust geometry handling, combining a Ray-Plane test with a subsequent boundary check.

Quad Definition

  • A quad is defined parametrically by a starting point $\mathbf{Q}$ and two direction vectors $\vec{u}$ and $\vec{v}$. Any point $\mathbf{P}(i, j)$ on the infinite plane is:
\[\mathbf{P}(i, j) = \mathbf{Q} + i\vec{u} + j\vec{v}\]
  • The quad itself is the constrained region where $0 \le i \le 1$ and $0 \le j \le 1$.

The Two-Step Intersection Logic

  1. Ray-Plane Intersection (Time $t$): Find the intersection point $\mathbf{P_{hit}}$ on the infinite plane containing the quad by solving the combined ray and plane equations for $t$, where $\mathbf{O}$ is the ray origin, $\mathbf{d}$ is the ray direction, and $\mathbf{n} = \vec{u} \times \vec{v}$ is the plane normal.
\[t = \frac{(\mathbf{Q} - \mathbf{O}) \cdot \mathbf{n}}{\mathbf{d} \cdot \mathbf{n}}\]
  2. Point Containment Check (Parameters $\alpha, \beta$): Ensure that the intersection point $\mathbf{P_{hit}}$ lies within the quad’s defined boundaries by calculating the parameters $\alpha$ and $\beta$ and checking that both lie between 0 and 1.
\[\alpha = \frac{\left((\mathbf{P_{hit}} - \mathbf{Q}) \times \vec{v}\right) \cdot \mathbf{n}}{(\vec{u} \times \vec{v}) \cdot \mathbf{n}} \qquad \beta = \frac{\left(\vec{u} \times (\mathbf{P_{hit}} - \mathbf{Q})\right) \cdot \mathbf{n}}{(\vec{u} \times \vec{v}) \cdot \mathbf{n}}\]
  • Success Condition: The ray hits the quad if $t \ge 0$, $0 \le \alpha \le 1$, and $0 \le \beta \le 1$.
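
  • A sketch of this two-step test in C++, reusing the Vec3, cross, and operator- helpers from the basis sketch earlier (hitQuad and the parallel-ray epsilon are illustrative choices):

      #include <cmath>

      double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

      // Returns true (and fills t) if the ray O + t*d hits the quad (Q, u, v).
      bool hitQuad(const Vec3& O, const Vec3& d,
                   const Vec3& Q, const Vec3& u, const Vec3& v, double& t) {
          Vec3 n = cross(u, v);                       // plane normal (unnormalized)
          double denom = dot(d, n);
          if (std::fabs(denom) < 1e-8) return false;  // ray parallel to the plane

          // Step 1: ray-plane intersection time.
          t = dot(Q - O, n) / denom;
          if (t < 0.0) return false;

          // Step 2: containment check; note that (u x v) . n = n . n.
          Vec3 P = {O.x + t * d.x, O.y + t * d.y, O.z + t * d.z};
          Vec3 p = P - Q;
          double nn = dot(n, n);
          double alpha = dot(cross(p, v), n) / nn;
          double beta  = dot(cross(u, p), n) / nn;
          return 0.0 <= alpha && alpha <= 1.0 && 0.0 <= beta && beta <= 1.0;
      }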

    (Figure: intersection)

Anti-Aliasing

  • Aliasing is a common visual problem in computer graphics where sharp edges and lines appear jagged or “stair-stepped.”
  • This happens because a continuous, smooth 3D scene is projected onto a discrete grid of pixels on a screen.
    • Each pixel can only display a single, solid color. The core reason for this issue is a process called point sampling.

Why Aliasing Occurs

  • In ray tracing, the color of a pixel is traditionally determined by casting a single ray from the camera, through the center of the pixel, into the scene.
    • The color of whatever the ray hits is then used for that entire pixel.
  • The issue is that a single pixel represents a small square area of the scene, not just a single point.
    • If a sharp edge—for instance, where a black object meets a white background—falls within that pixel’s area, the single ray cast to the center might only hit the white background.
    • The black part of the edge is completely missed.
  • This “all-or-nothing” approach leads to the loss of detail, causing the jagged stair-step appearance.
    • A pixel’s color is determined by a single sample, but the area it represents may contain multiple colors in the scene.

    (Figure: aliasing)

How to Fix

  • The solution is to use supersampling, which involves casting multiple rays for each pixel instead of just one.
    • The key idea is to sample different locations within the pixel’s area and then average the resulting colors to get a more accurate representation of the true color.
    • This blending of colors smooths out the transitions between objects, eliminating the jagged edges.
  • Supersampling methods differ in how they choose the multiple sample locations within each pixel:
    • Grid Sampling
      • The pixel is divided into a regular grid (e.g., 2x2, 4x4), and a ray is cast to the center of each sub-grid cell.
        • For a 4x4 grid, 16 rays are cast, and their averaged color becomes the final color of that pixel.
      • This method is straightforward and effective but can sometimes miss fine details or produce its own visual artifacts if the sampling grid aligns with a repeating pattern in the scene.
    • Jittered Sampling
      • This is a more advanced technique.
        • It also divides the pixel into a grid, but instead of casting a ray to the center of each cell, it casts the ray to a randomly chosen point within each cell.
      • This small bit of randomness helps prevent the structured artifacts of grid sampling while still ensuring that samples are distributed evenly across the pixel’s area.
      • Jittered sampling produces a more natural and accurate result with fewer visible artifacts; a short sketch of how the jittered offsets are generated follows below.
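
  • A minimal sketch of jittered offset generation for an n x n sub-pixel grid (the function name and the choice of std::mt19937 are illustrative):

      #include <random>
      #include <utility>
      #include <vector>

      // One uniformly random point inside each cell of an n x n grid,
      // expressed in [0, 1)^2 pixel-local coordinates.
      std::vector<std::pair<double, double>> jitteredOffsets(int n, std::mt19937& rng) {
          std::uniform_real_distribution<double> uni(0.0, 1.0);
          std::vector<std::pair<double, double>> offsets;
          offsets.reserve(n * n);
          double cell = 1.0 / n;
          for (int j = 0; j < n; ++j)
              for (int i = 0; i < n; ++i)
                  offsets.emplace_back((i + uni(rng)) * cell,   // random x inside cell i
                                       (j + uni(rng)) * cell);  // random y inside cell j
          return offsets;
      }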

      (Figure: anti_aliasing)

Post-Processing & Final Display

  • The final stage of the rendering pipeline involves processing the high-dynamic-range (HDR) image data—which contains light intensity values calculated in linear space—and transforming it into a perceptually correct, low-dynamic-range (LDR) image suitable for standard displays.

Tonemapping: Compressing the Dynamic Range

  • Tonemapping is the vital process of mapping the wide range of luminance (brightness) values in a rendered HDR image down to the narrow range that a standard monitor can display.
  • The Problem: Physically based rendering calculates light accurately, often resulting in extremely bright values (e.g., from direct sunlight) that exceed the $0$ to $1$ range of a standard LDR image file (like an 8-bit JPEG).
    • Displaying these values without adjustment would clip the highlights, resulting in loss of detail.
  • The Solution: A tonemapping operator applies a non-linear curve to compress the bright parts of the image while preserving the details in the mid-tones and shadows.
  • Two widely used operators are:
    • Reinhard Operator: A simple, computationally cheap operator that models a photographic exposure process.
      • It compresses highlights smoothly but can sometimes result in a somewhat flat or washed-out look:
      \[L_{out} = \frac{L_{in}}{1 + L_{in}}\]
      • where $L_{in}$ is the linear input luminance and $L_{out}$ is the resulting LDR luminance.
    • ACES (Academy Color Encoding System): A more sophisticated, filmic tonemapping standard known for its vibrant look and excellent preservation of color saturation, particularly in highlights.
      • It uses a complex curve derived from cinematic color science to produce a result that visually resembles a high-quality film stock.
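
  • A sketch of the Reinhard curve applied per channel (production implementations often operate on luminance and rescale RGB, but the per-channel form shows the curve itself):

      // Reinhard tonemapping: maps [0, inf) smoothly into [0, 1).
      double reinhard(double linearIn) {
          return linearIn / (1.0 + linearIn);
      }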

    (Figure: tonemapping)

Color Transformation and Look-Up Tables (LUTs)

  • After tonemapping, the image undergoes a Color Transformation to apply specific artistic styles or correct color imbalances. This is often done using a Look-Up Table (LUT).
    • Purpose: A LUT is a pre-calculated data set used to quickly modify the color of a pixel based on its original RGB values.
    • Implementation: LUTs are often implemented as a 3D texture cube.
      • The original RGB color of a pixel acts as the coordinate $(r, g, b)$ to sample the LUT cube.
      • The value stored at that coordinate in the cube is the new, transformed RGB color.
    • Efficiency: Applying a complex color grading operation (like changing contrast or white balance) through a LUT involves a single texture lookup operation per pixel, making it significantly faster than executing a lengthy mathematical function for every pixel.
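
  • A sketch of the lookup with nearest-neighbor sampling and a flat array standing in for the 3D texture (GPU implementations would use a real 3D texture with trilinear filtering; all names here are illustrative):

      #include <algorithm>
      #include <vector>

      struct RGB { double r, g, b; };

      // Nearest-neighbor sample of a lutSize^3 color cube stored as a flat array.
      RGB sampleLut(const std::vector<RGB>& lut, int lutSize, const RGB& c) {
          auto idx = [&](double x) {
              int i = static_cast<int>(x * (lutSize - 1) + 0.5);  // round to nearest grid point
              return std::min(std::max(i, 0), lutSize - 1);
          };
          // The pixel's own (r, g, b) value is the coordinate into the cube.
          return lut[(idx(c.b) * lutSize + idx(c.g)) * lutSize + idx(c.r)];
      }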

    (Figure: lut)

Gamma Correction (Display Output)

  • As established in Post 1, the final, tonemapped LDR image must be converted from Linear Space back to Gamma Space (sRGB) before being sent to the monitor.
\[\mathbf{C}_{\text{gamma}} = (\mathbf{C}_{\text{linear}})^{1/\gamma}\]
  • This final transformation ensures that the image is perceptually balanced, accounting for the non-linear response of display hardware, which makes the image appear correct to the human eye.
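
  • As a final sketch, the encode step with the common $\gamma = 2.2$ approximation (the exact sRGB transfer function is piecewise, but this shorthand is widely used):

      #include <cmath>

      // Linear-to-gamma encode for one channel, with gamma = 2.2.
      double linearToGamma(double channel) {
          return std::pow(channel, 1.0 / 2.2);
      }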
