Sweep and Prune for the Broadphase

The Problem

A physics engine is a system that simulates physical phenomena on computers. Imagine a scene with n objects. A natural way to process physics interactions among them is to consider every pair of objects: an outer loop picks the first object of the pair and an inner loop picks the other. For example, with 3 objects A, B and C, 2 pairs are considered while we process A (AB and AC) and one pair while we process B (BC). This method is intuitive but results in O(n²) pair tests. Since a collision test is a costly operation, we run into a performance problem as the number of objects rises.

Intuition for SAP

It turns out we can do better than O(n²) with the Sweep and Prune (SAP) algorithm.

Two intuitions make SAP work. First, if we project the vertices of two AABBs onto the same axis, we get two intervals of the form [min, max], where min and max are the extents of each bounding AABB along that axis. If the two intervals do NOT overlap, then the two shapes do not overlap, because we have found an axis along which there is a gap between them. Second, if we sort the objects along that axis in ascending order and check overlaps in that order, then every time we detect a gap between objects O1 and O2, we can immediately stop considering O1: any object farther along the axis than O2 cannot overlap O1 either.
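In code, the first intuition boils down to a one-dimensional interval overlap test. A minimal sketch (the Interval struct is just for illustration, not part of the engine):

```cpp
// A projected AABB extent along one axis: [min, max]
struct Interval { float min, max; };

// Two intervals overlap unless one ends before the other begins.
// If this returns false, the two AABBs cannot intersect at all.
bool Overlaps(const Interval& a, const Interval& b)
{
    return a.min <= b.max && b.min <= a.max;
}
```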

This could be better illustrated by the image below.


For brevity, I will refer to the blue, red, yellow and green bricks as B, R, Y and G from now on. Corresponding to the first point above, we can see there is no collision between Y and G by observing the gap between their intervals on the axis. As for the second point, if we store the four objects in sorted order, in this case from left to right along the axis, we can stop considering collisions for B as soon as we reach Y, because we know no object after Y can collide with B.

The algorithm

Given the intuition, we can present the algorithm:

  • Every frame, store the objects in a list L1 and sort them along a particular axis. In our case, let’s say this axis is the x-axis. The sort key is the min extent of each bounding AABB along x.
  • Prepare another list L2.
  • We extract the first object in L1 and add it to L2.
  • We then inspect the next object in L1 against all objects in L2. In the first iteration, there is only one object in L2, the one we just extracted from L1.
  • We check whether the left extent (min.x) of the inspected object is larger than the right extent (max.x) of the object in L2. (What the extents represent depends on which axis you are using.)
  • If yes, we remove that object from L2.
  • If no, we report a possible collision between the two objects.
  • Add the inspected object to L2, then repeat with the next object in L1 until we reach the end of L1.
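The steps above can be sketched end to end on plain intervals. A minimal, self-contained sketch (Box and the pair list are illustrative stand-ins for the engine's types; the pruning here uses erase/remove_if for brevity, whereas the engine code below uses swap-and-pop):

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct Box { int id; float minX, maxX; };  // stand-in for an entity's x extent

// Returns candidate pairs whose x-extents overlap.
std::vector<std::pair<int, int>> SweepAndPruneX(std::vector<Box> boxes)
{
    // L1: sort by min extent along x
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.minX < b.minX; });

    std::vector<Box> active;                  // L2
    std::vector<std::pair<int, int>> pairs;   // collision candidates
    for (const Box& box : boxes)
    {
        // prune every active box that ends before this one begins
        active.erase(std::remove_if(active.begin(), active.end(),
                         [&](const Box& a) { return a.maxX < box.minX; }),
                     active.end());
        // whatever survived overlaps box along x: report candidates
        for (const Box& a : active)
            pairs.emplace_back(a.id, box.id);
        active.push_back(box);
    }
    return pairs;
}
```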

For SAP, we use insertion sort as the sorting algorithm. Also, notice that the algorithm effectively requires 3 data structures to store information (in my C++ case, 3 vectors): two for L1 and L2, and another for collision pairs, depending on how exactly you encapsulate the collision class in your implementation.

The implementation

Let’s first look at how this algorithm works at a high level. We start by sorting L1 (using the notation above). Then we walk the sorted list to collect collision information, which fills L2. Finally, we use the computed collision information to resolve the actual collisions based on the physics scheme used (corrective, preventative or continuous).

void PhysicsScene::ProcessSAP(eAxis axis)
{
    // sort L1
    SortOnAxis(axis);

    // fill L2 and collect collision pairs
    ProcessCollisionPairs(axis);

    // collision resolution
    ProcessPairsCorrective();
}

Note that everything this algorithm cares about lives in SortOnAxis and ProcessCollisionPairs. ProcessPairsCorrective is the actual collision resolution step, which is another huge topic on its own.

Therefore let’s first focus on SortOnAxis. Remember, it is essentially insertion sort:

// insertion sort
for (int i = 1; i < (int)m_axisList.size(); ++i)
{
	Entity* key = m_axisList[i];
	int j = i - 1;

	// Move elements of the vector from 0 to i - 1 that have a
	// greater position than key does along the specified axis
	// one position ahead
	while (j >= 0)
	{
		bool greaterPos = false;
		switch (axis)
		{
		case AXIS_X:
			greaterPos = DescendBoundAABBMinX(*m_axisList[j], *key);
			break;
		case AXIS_Y:
		case AXIS_Z:
		case AXIS_NUM:
		default:
			break;	// supply your own implementation for other axes
		}
		if (!greaterPos)
			break;

		m_axisList[j + 1] = m_axisList[j];
		j = j - 1;
	}
	m_axisList[j + 1] = key;
}
// m_axisList is now sorted along the wanted axis

Couple of things to note in this implementation:

  • The Entity class is an encapsulation for the physics entities used by the physics engine.
  • The “axisList” (L1) is stored as a member of the physics scene.
  • When processing L1, we start at index 1 because sorting compares each entity with the entities before the current index, and the index before 0 is -1, which does not exist.
  • We use the word “Descend” for the greaterPos boolean because we are looking for objects with a larger extent than the key entity (i.e. the key entity’s extent is smaller than that of m_axisList[j]), which must shift right.
  • If you want to sort on any other axis, you need to provide your own implementation in the correct slot of that switch statement.
  • The while loop, which shifts larger elements right and then drops the key into place, is the core of insertion sort.
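DescendBoundAABBMinX itself can be as small as one comparison. A hypothetical reconstruction, since the original post does not show its body (the Vec3/AABB/Entity shapes mirror the fields used in the code above):

```cpp
struct Vec3 { float x, y, z; };
struct AABB { Vec3 mins, maxs; };
struct Entity { AABB m_boundAABB; };

// Hypothetical reconstruction: true when a's min extent along x is
// greater than key's, i.e. a must shift right past the key while sorting.
bool DescendBoundAABBMinX(const Entity& a, const Entity& key)
{
    return a.m_boundAABB.mins.x > key.m_boundAABB.mins.x;
}
```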

Now that we have a sorted entity list, we can start adding entities into L2:

// make sure active list and pair list are empty
m_activeList.clear();
m_pairs.clear();

// begin on the left of the axis list
for (std::vector<Entity*>::size_type idx = 0; idx < m_axisList.size(); ++idx)
{
	Entity* e = m_axisList[idx];
	m_activeList.push_back(e);

	// the next element must be valid
	if ((idx + 1) >= m_axisList.size())
		break;	// we have come to the end of the axis list, all pairs processed
	Entity* next = m_axisList[idx + 1];

	switch (axis)
	{
	case AXIS_X:
		// go through the active list back to front so that the
		// swap-and-pop removal below does not skip any element
		for (int activeIdx = (int)m_activeList.size() - 1; activeIdx >= 0; --activeIdx)
		{
			Entity* currentEntity = m_activeList[activeIdx];
			float nextLeft = next->m_boundAABB.mins.x;
			float currentRight = currentEntity->m_boundAABB.maxs.x;

			if (nextLeft > currentRight)
			{
				// gap found: remove this entity from the active list
				m_activeList[activeIdx] = m_activeList[m_activeList.size() - 1U];
				m_activeList.pop_back();
			}
			else
			{
				// overlap along the axis: add this collision pair
				m_pairs.push_back(new sCollisionPair(next, currentEntity));

				next->m_passedBroadphase = true;
				currentEntity->m_passedBroadphase = true;
			}
		}
		break;
	case AXIS_Y:
	case AXIS_Z:
	case AXIS_NUM:
	default:
		break;	// supply your own implementation for other axes
	}
}

Things to note about this implementation:

  • Unlike the axisList, the activeList (L2) must be cleared every frame because the potential collision information changes every frame. Alternatively, instead of storing it as a member as I did, you can create a temporary list for L2 within the scope of the function. BUT if you do that, make sure L2 does not go out of scope before you actually resolve collisions.
  • The same goes for m_pairs, the list I use to store the pair collision information produced by this function. You can store it as a member as I did, or let it live in the function scope, but make sure it is not destroyed before you start resolving collisions.
  • If the current index in L1 is idx, then the next index is idx + 1. If that index reaches the size of L1, we have reached the end of L1 and the process can return.
  • Otherwise, we process the given axis. Again, I have only provided the implementation for the x-axis. If you want to work on other axes as well, you need to supply your own implementations.
  • Remember, we remove objects from L2 whenever (nextLeft > currentRight), and store the pair as a collision candidate otherwise. This is what happens inside the for loop.
  • But if you are using C++, there is a risk of invalidating iterators when you call erase() on the standard containers, which causes runtime errors. Since I am using a vector here, I need to handle that too. My workaround is to iterate from the end of the vector, replace the entity I want to remove with the entity at the true end, and then pop the back of the vector.
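That swap-and-pop workaround generalizes to any vector. A minimal sketch:

```cpp
#include <cstddef>
#include <vector>

// Remove the element at idx in O(1) without shifting: overwrite it
// with the last element, then pop the back. Element order is not
// preserved, which is fine for the active list.
template <typename T>
void SwapAndPop(std::vector<T>& v, std::size_t idx)
{
    v[idx] = v.back();
    v.pop_back();
}
```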

Upon return of this function, m_pairs is full of potential collision pairs, which you will use in ProcessPairsCorrective, the collision resolution step.

At this point I should mention that the above implementation is tied to my specific physics engine structure and may not fit your case exactly. Focus on understanding the logic rather than copying the exact implementation. Depending on constraints such as framework design and choice of language, your implementation may look very different. And if you do not understand how the algorithm works, you are almost guaranteed to get bugs out of nowhere that are hard to trace down.

The performance

Remember, our motivation was better performance. Yet insertion sort still takes O(n²) in the average case. So how is this helping us at all?

It turns out physics simulation systems have a characteristic called temporal coherence: if the simulated objects move at reasonable speeds, their positions and orientations change little from time step to time step. As a result, insertion sort almost always runs on an already highly sorted list, which is its ideal case: close to O(n) per frame rather than O(n²). Combined with the pruning, this keeps the broadphase far below the brute-force O(n²) in practice.
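You can see temporal coherence pay off by counting how many element shifts insertion sort performs (a small illustrative experiment, not engine code): an already sorted list needs zero shifts, while a reversed list forces the quadratic worst case.

```cpp
#include <cstddef>
#include <vector>

// Insertion sort that reports how many element shifts it performed.
// On a nearly sorted list this count stays close to 0, so each
// frame's re-sort costs close to O(n).
int InsertionSortCountingShifts(std::vector<float>& v)
{
    int shifts = 0;
    for (std::size_t i = 1; i < v.size(); ++i)
    {
        float key = v[i];
        std::size_t j = i;
        while (j > 0 && v[j - 1] > key)
        {
            v[j] = v[j - 1];  // shift one slot to the right
            --j;
            ++shifts;
        }
        v[j] = key;
    }
    return shifts;
}
```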


So, up to this point, we have come up with an algorithm that helps us improve the performance of the collision system in an efficient and reliable way.

But the methodology is not limited to this approach, of course. Remember, we are implementing a broad phase here, and many other algorithms do the same job. Quick examples are bounding volumes/areas (bounding circles, AABBs, etc.) or spatial partitioning methods like quadtrees. These methods involve different core algorithms and data structures that I highly encourage you to explore. Nonetheless, they all try to minimize the number of candidate collision pairs.

Image Reference

[1] http://www.idevgames.com/articles/collision-detection-and-spatialindexes


Render to texture


As I tried out various techniques, I found the operation of rendering to a texture fundamentally useful in many cases. In this post I will introduce my interface for generating an FBO, binding a color buffer and an RBO, and drawing with the new FBO’s texture in OpenGL.


A Frame Buffer Object (FBO) bundles drawing data in one place. This usually means a color buffer holding colors and a Render Buffer Object (RBO) holding depth and stencil info. Since all of these are buffers, the FBO encapsulation looks like:

class FBO
{
    GLuint m_fbo_handle;
    GLuint m_color_handle;
    GLuint m_depth_stencil_handle;
};

With the color handle we keep a reference to the color texture we draw to, while the RBO storing depth and stencil data lets us keep performing depth and stencil tests in the normal pipeline.

With the type defined, the buffers need to be configured before use, which naturally means configuring the color buffer and the RBO:

void ConfigureFBO()
{
  glGenFramebuffers(1, &m_fbo_handle);
  glBindFramebuffer(GL_FRAMEBUFFER, m_fbo_handle);
}

void ConfigureColor()
{
  glGenTextures(1, &m_color_handle);
  glBindTexture(GL_TEXTURE_2D, m_color_handle);
  glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0, GL_RGB, GL_UNSIGNED_BYTE, NULL);
  // without filters the texture is incomplete (the default min filter expects mipmaps)
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
  glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, m_color_handle, 0);
}

void ConfigureDepthStencil()
{
  glGenRenderbuffers(1, &m_depth_stencil_handle);
  glBindRenderbuffer(GL_RENDERBUFFER, m_depth_stencil_handle);
  glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH24_STENCIL8, width, height);
  glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_STENCIL_ATTACHMENT, GL_RENDERBUFFER, m_depth_stencil_handle);
}

Once all three have run, it is worth verifying the setup with glCheckFramebufferStatus(GL_FRAMEBUFFER), which should return GL_FRAMEBUFFER_COMPLETE.

Draw to it!

With the FBO configured, we can now draw to it as usual:

// 1. bind the fbo instance using its handle, done by your renderer

// 2. bind the shader you would normally use to draw the real-time
// scene, which decides what ends up in the fbo; again, use the
// renderer in your infrastructure

// 3. issue the draw calls of your renderer, this time drawing to the fbo

This allows you to draw to texture.


This trick alone seems very straightforward, and it is. The point is that it can be used in so many places across modern rendering techniques. Here are some examples.


Post-processing gives you fancy screen effects such as kernel (convolution) effects. The only thing you need to do is apply a post-processing shader to the texture you generated above:

// 1. define shader used

// 2. bind texture to the one you just drew

// 3. draw the texture across screen

Since you bound another FBO before, you may want to bind back to the default one with glBindFramebuffer(GL_FRAMEBUFFER, 0) before you do the above.
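Concretely, the three steps can look like this (a sketch under assumptions: postShader, quadVAO and m_color_handle stand for your post-process program, a full-screen quad VAO, and the color texture from the first pass):

```cpp
// back to the default framebuffer, as noted above
glBindFramebuffer(GL_FRAMEBUFFER, 0);
glClear(GL_COLOR_BUFFER_BIT);
glDisable(GL_DEPTH_TEST);                      // the screen quad needs no depth test

glUseProgram(postShader);                      // 1. define the shader used
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, m_color_handle);  // 2. bind the texture just drawn

glBindVertexArray(quadVAO);                    // 3. draw the quad across the screen
glDrawArrays(GL_TRIANGLES, 0, 6);
```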


Bloom is a special kind of post-processing that needs two targets: the original target drawing the scene normally, and another holding the “bright”, blurred fragments, which you blend with the original.

This second target with a special requirement on its fragments is where rendering to texture becomes useful here. Effectively, I set up a second color attachment on the FBO:

GLuint colorBuffers[2];
glGenTextures(2, colorBuffers);
for (unsigned int i = 0; i < 2; ++i)
{
    // bind color buffer i and attach it to the FBO
    glBindTexture(GL_TEXTURE_2D, colorBuffers[i]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0, GL_RGB, GL_UNSIGNED_BYTE, NULL);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0 + i, GL_TEXTURE_2D, colorBuffers[i], 0);
}
// in this case the color buffer member of FBO is modified to hold two/multiple buffers

Remember to tell OpenGL to render into both attachments with glDrawBuffers before drawing.

And then we just need to draw with the shader that will also pick the “bright” fragments into the second target:


// draw the pbr scene as usual...
// pick fragments along the way

Finally, we blend the original target with the blurred target:

// 1. use blend shader

// 2. bind original texture at index 0
renderer->BindTexture(fbo_instance, 0);

// 3. bind blurred texture

// 4. draw the texture across the screen

This gives an effect like the following:

Deferred Shading

In deferred shading, we render to multiple buffers of the FBO (the G-Buffer in this case) instead of one or two. We then use the data in those buffers to compute the lit scene pixel by pixel (or only for the pixels inside light volumes, if those are used). This is by nature rendering to a screen quad again, similar to the situations above.

Now hopefully you can see how rendering to texture is a useful operation across many techniques.