CS 184: Computer Graphics and Imaging, Spring 2018

Assignment 3: PathTracer



The purpose of this project is to add realistic lighting to the meshes enabled by the previous project. Implementing this lighting requires some kind of ray-primitive intersection algorithm (part 1). Checking intersection against every primitive in a scene is prohibitively slow, so bounding volumes are also enabled, just like in the rasterizer (part 2). The remaining parts deal with actually tracing the path of light across the scene, simulating bounces (parts 3 and 4) and optimizing the sampling at the very end (part 5). It's important to note that all surfaces in this project are currently diffuse, meaning light bounces off of them in random directions to produce a dull, matte appearance.

Part 1: Ray Generation and Intersection

So begins our raytracing journey!

The raytracing loop generates the specified number of rays per pixel, shooting one ray out from the center of the pixel if there is only 1 sample and choosing sample locations at random otherwise. In generating each ray, the origin is placed on a sensor plane at z-value -1 (with appropriate coordinate conversions) and the ray's t interval is initialized. Later on, the t value will serve as a marker for primitive precedence in the scene. A bug occurred during the implementation of this part where the t value wasn't being used correctly, causing a wall that should have been behind a sphere to appear in front of it.
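The ray-generation step above can be sketched as follows. The types here (`Vec3`, `Ray`) are simplified stand-ins for the project's actual CGL types, and the field-of-view angles are assumed to be in radians:

```cpp
#include <cassert>
#include <cmath>

// Simplified stand-ins for the project's Vector3D and Ray types.
struct Vec3 { double x, y, z; };

struct Ray {
    Vec3 o, d;            // origin and normalized direction
    double min_t, max_t;  // valid parametric interval along the ray
};

// Map normalized image coordinates (s, t) in [0,1]^2 onto a virtual sensor
// plane at z = -1 in camera space. The plane spans tan(hFov/2) x tan(vFov/2)
// on each side of the optical axis.
Ray generate_ray(double s, double t, double hFov, double vFov) {
    double x = (2.0 * s - 1.0) * std::tan(0.5 * hFov);
    double y = (2.0 * t - 1.0) * std::tan(0.5 * vFov);
    Ray r;
    r.o = {0.0, 0.0, 0.0};                 // camera sits at the origin
    double len = std::sqrt(x * x + y * y + 1.0);
    r.d = {x / len, y / len, -1.0 / len};  // normalized direction toward the plane
    r.min_t = 0.0;
    r.max_t = 1e30;                        // "infinity" until a hit shrinks it
    return r;
}
```

A ray through the image center (s = t = 0.5) points straight down the -z axis, which is a handy sanity check when debugging camera-space conversions.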

The ray-triangle intersection uses the Möller-Trumbore algorithm. This algorithm generates 3 numbers: (1) the intersection parameter t, (2) the first barycentric coordinate b1, and (3) the second barycentric coordinate b2. The third coordinate b3 is found by:

b3 = 1 - b1 - b2

Finally, we can test whether the intersection lies inside the triangle by checking that all barycentric coordinates fall between 0 and 1, and test whether another primitive is obstructing the view by comparing t values. The barycentric coordinates are also useful for shading (i.e. interpolating the normal).
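The Möller-Trumbore test described above can be sketched as a standalone function. The vector helpers here are minimal stand-ins for the project's math library:

```cpp
#include <cassert>
#include <cmath>

struct V3 { double x, y, z; };
static V3 sub(V3 a, V3 b)   { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static V3 cross(V3 a, V3 b) { return {a.y * b.z - a.z * b.y,
                                      a.z * b.x - a.x * b.z,
                                      a.x * b.y - a.y * b.x}; }
static double dot(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Möller-Trumbore: solve o + t*d = (1 - b1 - b2)*p0 + b1*p1 + b2*p2.
// Returns true on a hit inside [min_t, max_t] and writes t, b1, b2.
bool moller_trumbore(V3 o, V3 d, V3 p0, V3 p1, V3 p2,
                     double min_t, double max_t,
                     double &t, double &b1, double &b2) {
    V3 e1 = sub(p1, p0), e2 = sub(p2, p0);
    V3 pvec = cross(d, e2);
    double det = dot(e1, pvec);
    if (std::fabs(det) < 1e-12) return false;  // ray parallel to triangle
    double inv = 1.0 / det;
    V3 tvec = sub(o, p0);
    b1 = dot(tvec, pvec) * inv;
    if (b1 < 0.0 || b1 > 1.0) return false;    // outside along the first edge
    V3 qvec = cross(tvec, e1);
    b2 = dot(d, qvec) * inv;
    if (b2 < 0.0 || b1 + b2 > 1.0) return false;
    t = dot(e2, qvec) * inv;
    return t >= min_t && t <= max_t;           // respect the ray's t interval
}
```

Note that the b1 and b1 + b2 range checks implicitly cover b3 = 1 - b1 - b2 being in [0, 1], so no separate test for b3 is needed.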

Here are a few images generated by the raytracing algorithm:

Sphere Normal Render
Gems Normal Render

Cow Normal Render

Note that the cow image took 800+ seconds to render with 8 threads on a VM.

Part 2: Bounding Volume Hierarchy

In this portion we work on accelerating the previous part.

The purpose of the BVH is to focus the intersection checks on small batches to minimize needless checks in blank space. To construct the tree of mesh batches, the algorithm follows the following logic:

  1. Intersect just the bounding box; if the ray misses it, stop.
  2. If the current batch is too big, split the primitives into two child nodes along the longest axis of the bounding box. If the split is degenerate (i.e. one side gets no primitives), the bounding box is halved along the longest axis and a new split is computed. The new split axis might not be the same as the original axis.
  3. If the current batch is under the maximum leaf size, intersect every primitive in it to find the closest hit.
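The split rule in step 2 can be sketched as follows, assuming each primitive is represented by its centroid (the real build also tracks full bounding boxes, which are omitted here for brevity):

```cpp
#include <cassert>
#include <array>
#include <vector>

using P3 = std::array<double, 3>;  // primitive centroid stand-in

// Split centroids at the midpoint of the longest axis of box [lo, hi].
// If every centroid lands on one side, shrink the box by half toward the
// occupied side and recompute -- possibly picking a different axis next time.
void split_primitives(const std::vector<P3> &prims, P3 lo, P3 hi,
                      std::vector<P3> &left, std::vector<P3> &right) {
    while (true) {
        int axis = 0;  // find the longest axis of the current box
        for (int a = 1; a < 3; ++a)
            if (hi[a] - lo[a] > hi[axis] - lo[axis]) axis = a;
        double mid = 0.5 * (lo[axis] + hi[axis]);
        left.clear(); right.clear();
        for (const P3 &p : prims)
            (p[axis] < mid ? left : right).push_back(p);
        if (!left.empty() && !right.empty()) return;  // usable split found
        // Degenerate split: halve the box toward the occupied side and retry.
        if (left.empty()) lo[axis] = mid; else hi[axis] = mid;
    }
}
```

This sketch assumes at least two distinct centroids, so the halving loop eventually separates them; a production build would cap the retries.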

The actual logic for step 1, checking for bounding box intersection, is quite simple. The algorithm checks every axis slab against the ray's [min_t, max_t] interval, updating a local copy of that interval as necessary. The test passes if no check for x, y, or z fails, and the per-axis logic is purposefully unrolled so that failures are detected early. For example, if we know the x-interval falls outside the ray's t-interval, there is no need to check y or z.
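The slab test just described can be sketched like this; for brevity the sketch assumes nonzero direction components (a real implementation handles axis-parallel rays via IEEE infinities or an explicit branch):

```cpp
#include <cassert>
#include <algorithm>

// Clip the ray's parametric interval [t0, t1] against the slab [lo, hi]
// on one axis. Returns false if the interval becomes empty.
bool clip_axis(double o, double d, double lo, double hi,
               double &t0, double &t1) {
    double inv = 1.0 / d;              // assumes d != 0 (handled upstream)
    double tn = (lo - o) * inv, tf = (hi - o) * inv;
    if (tn > tf) std::swap(tn, tf);    // ray may travel in the -axis direction
    t0 = std::max(t0, tn);
    t1 = std::min(t1, tf);
    return t0 <= t1;                   // empty interval means a miss
}

// Box test: check x, then y, then z, bailing out as soon as one axis fails.
bool bbox_intersect(const double o[3], const double d[3],
                    const double lo[3], const double hi[3],
                    double min_t, double max_t) {
    double t0 = min_t, t1 = max_t;
    for (int a = 0; a < 3; ++a)        // early exit mirrors the x/y/z checks
        if (!clip_axis(o[a], d[a], lo[a], hi[a], t0, t1)) return false;
    return true;
}
```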

Now, instead of rendering in 800+ seconds, given the same resources, the cow renders in 1.153 seconds... Almost 700x speedup.

Cow Normal Render (new)

Two additional renders showing off the new computational speed that would have taken hours before:

Dragon Render
Human Face Render

Let's test the relationship between the number of primitives and rendering time, with an additional variable of 1 or 2 camera rays per pixel.

Wow! One observation I made here is that the rendering speed now seems unrelated to the number of primitives, whereas before, increasing the number of primitives had a detrimental effect on performance. Each render, given the same number of samples, takes approximately the same amount of time. The number of primitives did increase the time to construct the BVH, from 0.04s for the TEAPOT to about 2.0s for LUCY. Naturally, supersampling increases rendering time -- that's nothing new.

The performance test demonstrates the power of this acceleration technique. If the BVH could be precomputed and reused, I could see this ray tracer becoming viable for the real-time rendering required by something graphically demanding like a video game.

Part 3: Direct Illumination

The next task is to implement some realistic shading using the previously defined intersection methods. There are two methods of estimating the effect of light on a scene: uniform hemisphere sampling, which samples incoming directions uniformly over the hemisphere at the hit point, and importance sampling, which samples directions toward the light sources directly.

Both methods use some small value epsilon to push the origin of the cast ray slightly above the hit point, so the ray doesn't intersect the surface it started from. The following images use 128 rays per pixel and 64 samples per light.
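The epsilon offset can be sketched as a small shadow-ray constructor. The constant name `EPS_D` and the helper are illustrative, not the project's actual identifiers:

```cpp
#include <cassert>
#include <cmath>

struct Vec { double x, y, z; };

const double EPS_D = 1e-4;  // small offset; the real constant may differ

struct ShadowRay { Vec o, d; double min_t, max_t; };

// Build a shadow ray from hit point p toward a light sample at distance
// dist_to_light along the unit direction wi. The origin is pushed out by
// EPS_D so the ray can't re-hit the surface it starts on, and max_t is
// pulled in by EPS_D so the ray stops just short of the light itself.
ShadowRay make_shadow_ray(Vec p, Vec wi, double dist_to_light) {
    ShadowRay r;
    r.o = {p.x + EPS_D * wi.x, p.y + EPS_D * wi.y, p.z + EPS_D * wi.z};
    r.d = wi;
    r.min_t = 0.0;
    r.max_t = dist_to_light - EPS_D;
    return r;
}
```

If anything intersects this ray within [min_t, max_t], the light sample is occluded and contributes nothing.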

Uniform Hemisphere Sampling
Importance Sampling

Importance sampling seems to be much more effective at quickly converging to the proper lighting and eliminating noise. Unfortunately, I worked very hard over the course of a week to try to eliminate the slight extra darkness in the importance render, to no avail. Just note that for the next two parts, this is the reason the importance renders are marginally darker. Below, we'll also examine the effect of increasing the number of samples per area light (all rendered at 1 ray/pixel):

Area Samples = 1
Area Samples = 4
Area Samples = 16
Area Samples = 64

It's important to note that increasing the number of samples per light increases the accuracy of the ray cast, but the grainy quality still remains because of noise from under-sampling per pixel.

Hemisphere Spheres
Importance Spheres

In conclusion, accurate lighting is apparent in both renders. However, the uniform hemisphere images generally contain more noise than the importance-sampled images at the same sampling rate. At a high level of supersampling, the uniform hemisphere image would probably look indistinguishable from the importance-sampled image, but that's a huge computational tax. The images above should demonstrate that sampling over the light source uses far fewer resources and produces better quality even at low sampling rates.

Part 4: Global Illumination

This time, we enable multiple ray bounces. Overall, this method adds indirect light on top of the previous part to produce more realistic shading. At minimum, the path tracer will do everything from Part 3. If the specified ray is of depth > 1, then at least one additional indirect ray will be cast. The total number of bounces will be between 1 and the maximum ray depth, and for the sake of runtime, there will be a small chance of ray termination at each bounce (Russian roulette). The indirect ray bounces off the hit point in a randomly sampled direction, recursively calling the illumination function to determine the irradiance at the next point, as determined by the sampling method (hemisphere or importance).
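The control flow of that recursion can be sketched structurally. Everything scene-specific (the BSDF, sampling, and one-bounce radiance) is stubbed out with placeholder constants here, so only the depth check and Russian roulette logic is real:

```cpp
#include <cassert>
#include <cstdlib>

// Placeholder coin flip; the project uses its own RNG utilities.
bool coin_flip(double p) {
    return (std::rand() / (double)RAND_MAX) < p;
}

// Structural sketch of the recursive estimator: one bounce of direct
// lighting plus a Russian-roulette-gated recursive call.
double at_least_one_bounce(int depth, int max_depth) {
    double L = 1.0;                   // stand-in for one_bounce_radiance()
    const double p_continue = 0.7;    // hypothetical continuation probability
    if (depth + 1 < max_depth && coin_flip(p_continue)) {
        // Recurse along a sampled direction; dividing by p_continue keeps
        // the estimator unbiased despite the random termination.
        // 0.5 stands in for the BSDF * cosine / pdf weight.
        L += 0.5 * at_least_one_bounce(depth + 1, max_depth) / p_continue;
    }
    return L;
}
```

The key property is that termination is random but compensated: rays stop early on average, yet the expected value of the sum is unchanged.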

Below are hemisphere and importance samples of global illumination. Compare them to the direct illumination above.

Indirect Hemisphere Spheres
Indirect Importance Spheres

Below are the effects of observing only direct lighting vs. only indirect lighting (since our algorithm combines them both for rays of depth greater than one), with max ray depth 6.

No Indirect
No Direct

Below is the rendered view for different ray-depths of the same bunny, but with combined indirect and direct lighting again.

Ray Depth 0
Ray Depth 1
Ray Depth 2
Ray Depth 3
Ray Depth 100

Finally, we'll use different sampling rates for the same light to test all the work so far, using the sphere box, with max ray depth 6 and 4 light samples.

S = 1
S = 2
S = 4
S = 8
S = 16
S = 64
S = 256
S = 1024

Part 5: Adaptive Sampling

Using a bit of statistics can cut down on computational resources when part of a scene looks consistent over many samples. To do that, for every sample, I keep a running total of the pixel's sample illuminance and another running total of the squared illuminance. From these totals, after a certain number of samples have been taken, we can calculate whether the samples are consistent enough to stop doing more computation, with 95% confidence. These checks are made only once per batch of a specified number of samples.
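The convergence check from the running totals can be sketched as follows, where s1 and s2 are the sums of illuminance and squared illuminance over n samples, and tol is the max tolerance:

```cpp
#include <cassert>
#include <cmath>

// Adaptive sampling convergence test at 95% confidence: the pixel has
// converged when I = 1.96 * sigma / sqrt(n) <= tol * mu, with the mean mu
// and sample variance sigma^2 recovered from the running sums s1 and s2.
bool has_converged(double s1, double s2, int n, double tol) {
    if (n < 2) return false;                    // need variance estimate
    double mu  = s1 / n;
    double var = (s2 - s1 * s1 / n) / (n - 1);  // unbiased sample variance
    double I   = 1.96 * std::sqrt(var / n);     // half-width of the 95% CI
    return I <= tol * mu;
}
```

Perfectly consistent samples converge immediately (variance zero), while noisy pixels keep accumulating samples until their confidence interval shrinks below the tolerance.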

The final product of the first part of this project is displayed below, with sampling rate 2048, ray depth 5, adaptive sampling with 8 samples per batch at max tolerance 0.085, and 1 sample per light.

Adaptive sampling in action.
The bunny to rule all bunnies.

The bunny is sampled at a slightly higher rate than it should be because I had to tweak the max tolerance from 0.05 to 0.085, likely due to the dimness glitch from Part 3. I ran out of time to find the exact max tolerance needed, but after many, many attempts, it is interesting to observe how increasing the max tolerance produces a faster render at the cost of a lower sample rate (blue). This adaptive approach errs on the side of too many samples (red), but shifting the max tolerance up slightly can compensate for that.