Parallelizing Ray Tracer in a Weekend14 Feb 2019
I was having a lot of fun implementing Peter Shirley’s ray tracer over the past weekend. In this post I will explain how to make it go faster and visualize the progress.
As I said, over the weekend I went through the book(let) Ray Tracing in a Weekend written by Peter Shirley, and tried to implement it. It contains almost all the code written in C++, but I took the liberty of improving a thing here and there. The result is pretty impressive as you can see on the image below.
The book has a good flow—it leads you from generating an image, over implementing ray tracer components (rays, geometry, materials, etc.) to rendering a complete scene. And all that in a fascinatingly short time!
The key word here is simplicity. The author took a lot of care to use the bare minimum needed to accomplish the final result. There are no
dependencies except the C++ standard library and
drand48(). The image
output format is
PPM which is pretty handy for quick image generation, although it is rather bulky
because of all the text1.
The Good, the Bad and the Ugly
Up until now, everything is great. We can trace rays, make our own simple scenes with spheres and view beautiful renderings. But there is one thing I didn’t tell you. All this works splendidly for small images, but is painfully slow for anything you want to view without using a magnifier. The author used default size of 100x200 pixels (height x width), an example of which you can see below.
If you wonder why he chose such a small image size, the answer is pretty simple—ray tracing takes time—a lot of it. What we are doing here is simulating light as it happily goes through the space. We can’t beat its majesty nature at this task. What we can do is try to use our computing power more efficiently. Right now, only one processing core is used while most computers have a handful of them at our disposal.
Power and speed
As Jeremy Clarkson likes to say—power and speed solve many problems. I mostly try not to believe him, but in this case he is right. Raw power can speed up the ray tracer quite a bit without touching any component, or being too smart about things.
To accomplish our goal of using multiple cores, we will be using features introduced in C++11:
provide a simple way to execute things in parallel, while mutexes2 are convenient at stopping those threads from colliding with each other.
Basic idea is to assign tiles of the image to threads. I chose tiles of uniform size N, for example 16x16 or 32x32. You can imagine it like rendering a lot of smaller images which are then stitched together to form the final result. Let’s call those tiles tasks. That way every thread gets tasks to solve, or process.
To achieve maximum CPU utilization, we will create as many threads as viable, by querying
std::thread::hardware_concurrency() to get the number of supported
concurrent threads. If we create more threads, we probably won’t see much benefit because not all threads could be used at the same time.
Some would be idle and waiting.
Now that we have our threads, we have to orchestrate them. Keep in mind that usually there are a lot more tasks than threads (which is what we want!). Threads that finish their assigned work therefore need to ask for new one. That’s where mutexes come to play.
We will have a function called something like
get_next_task() that every thread calls after they finish their job. Because any worker can
access it at any moment (critical section), we have to allow only one of them at a time.
std::mutex does exactly that. When a function
gets entered, the mutex
gets locked by the current thread. If any other worker tries to get his next task, he will have to wait it out. After the worker currently
owning the mutex gets
his new task, he unlocks the mutex and the next one can enter. Rinse and repeat.
Finally, if there are no more tasks to get assigned, the thread is done. After all threads finish, we will have our rendering. Please keep in mind that this is one way to do it. There are certainly others, but I found this one simple, both to understand and implement.
Visualizing the progress
Now that we understand how to parallelize the ray tracer, we can make it even more attractive. Currently we only see the image after all the rendering is done. That’s no fun. I’d like to see the image as it’s being rendered.
To achieve that, I used a nice library called SFML (Simple and Fast Multimedia Library). I made it open a window and show a textured sprite in it. The sprite takes the whole window. The texture contains the current state of the image being rendered. You can see the result on the GIF below. Nice, isn’t it? I even implemented an inward snake pattern of task assignment!
Now that we introduced a new library, I threw away the PPM format and used PNG which I like a lot more. Rendered image, which is actually a texture, gets converted to a PNG format and saved.
I also encourage you to take a look at this. It uses tiles of size 256, just for fun. Notice how the tiles with less visible spheres finish quicker. That’s also the problem with using big tiles. Threads that finish first don’t have any more tasks to do, while other threads are busy rendering large chunks of the image. Some threads are idle, while other are work. We want all threads crunching most of the time. Smaller tiles allow better balancing of work.
I hope you learned something today. The code is available in my repository. Specifically, take a look at this file. That is where the magic happens. If you use Linux, it should be pretty easy to run it using the available Makefile. Make sure your compiler supports C++11 and that SFML is available.
If you want to learn more about ray tracing and parallelization of said, I encourage you to take a look at Physically Based Rendering book.
If you noticed an error, or just want to talk, send me an email.
Thank you for reading!