NickNagel.com - Understanding the Basis of the Separating Axis Theorem (SAT)

Understanding the Basis of the Separating Axis Theorem (SAT)

Rationale

This article is part of a series I've been working on centered around the scene-graph conceptualization of Scalable Vector Graphics (SVG). The motivation for the series stems from work I've been undertaking to add virtual camera funtionality to my SVG Creator's framework. Specifically, I needed to simulate camera features (like zoom, panning and tilt) to facilitate the creation of interactive SVG artworks.

In the previous article in the series I shared some insights that occured to me relating the concept of mathematical bases and basis vectors to a generalized process for understanding and using SVG transformations within the scene-graph. The present piece dovetails off the previous by looking at how projection and basis reasoning lead to robust geometric constraints definitions (e.g., collisions, containment, and panning limits).

The Problem

The problem at hand concerned the development of an SVG virtual camera system. One of the things I enjoy about working with SVG is the ability to create artworks with varying levels of detail and set up systems that allow viewers to navigate the work and appreciate features from different perspectives.

For example, below you see some SVG which illustrates the concept of the "World Tree" from Norse mythology. In it, I've set up a camera system to enable viewers to zoom in an pan around to explore some of the details hidden away in the more distant perspective. The problem for this piece (and it's a general problem creating artwork for any number of applications like games, sims, or advertisements for example) is that the virtual camera panning has to be constrained or the viewer will get lost. Think about it for a minute. How would you express the constraints?

Live Interactive Illustration

Figure: The world Tree by 𝒩 (shown with test scaffolding).

Think about it: if the user pans too far, they'll fall off the edge of the world, right? We can't have that! So we need to constrain the pannable area whenever the user zooms in to view details.

If you look closely at the World Tree SVG, you'll see I've added some test scaffolding. I've drawn the world coordinate system axes (shown in blue) and also the local coordinate system axes for the camera (shown in red). The blue axes represent the global SVG coordinate system -- the world's basis. The red axes represent the camera's local coordinate system -- the basis attached to the user's view. I want you to think of these coordinate systems as bases in the geometric sense: each basis defines how vectors, distances, and shapes are interpreted in its own frame of reference.

I've also drawn rectangles around each basis.

The blue rectangle corresponds to my target viewBox dimensions in world coordinates.
The red rectangle corresponds to the transformed viewBox under the camera's affine transform.

Initially, when the artwork loads, the global basis and the camera basis coincide. But as soon as the user zooms in, the camera basis is scaled up, and its rectangle becomes larger than the SVG viewport in world coordinates.

To express the panning constraints in this larger context, I added a green rectangle. This perimeter defines the set of legal panning positions -- boundaries within which the camera is allowed to move without exposing empty space. At the initial zoom level, the green box sits neatly inside the blue world box. But once the user zooms in, the camera's viewBox becomes larger in world units. The green region becomes the area to which the camera view must be constrained in order to keep the world fully visible.

Whew! Now that we've established these two bases -- the world basis and the scaled camera basis we're in the perfect position to enforce panning constraints using the Separation of Axes Theorem.

The Separating Axis Theorem <--| rename section

The Separating Axis Theorem (SAT) sets up a computer graphics algorithm that is often used in gaming and simulation to test for the intersection of convex polygons (think collision detection). Here, I'll extend it to test for containment.

Play with the wold tree for a bit. Double click (or spread) to "zoom in". Since I've adjusted the SVG viewBox for this illustration you won't perceive the zoom-effect so much as you'll see the camera basis scale up. Now if you pan about (touch and drag the camera view around with your finger or device) you'll see how the camera constraint box starts out fully contained by but can come to overlap with the SVG view box. Notice how the system is only "happy" if the green box is fully contained by the viewbox.

How do we test for containment. The most obvious way (to me) is set up a conditional to check the verts on the boxes. Something like:

IF 
  CAM_CONSTRAINTS_upper_left outside  WORLDVIEW_upper_left AND
  CAM_CONSTRAINTS_lower_left outside  WORLDVIEW_lower_left AND
  etc...
THEN 
  it's all good ... 
ELSE
  not so much ...

Sure that'll work. For this case. Using simple rectangles. But it feels kinda kludgey, doesn't it? Wouldn't it be great if we could formulate a test that could be used for any given set of concave polygons of any arbitrary degree of complexity? Yes! Now you're thinking in algorithms! The good news is you're not the only one who's thought about this. Standing on the shoulders of Hermann Minkowski we can formulate a nice, neat formal statement:

The Separating Axis Theorem:

Two convex polygons $A$ and $B$ are disjoint (i.e., they do not overlap or touch) if, and only if, there exists a line (let that be called a separating axis) such that the projections of the two polygons onto that line are also disjoint.

Simple, right? No, really it's a mouthful and densely packed at that. As a visual thinker I like use illustration when reasoning about algortithms. So in the following section I'll try to unpack this visually with a solid example. Think of it as a bit of linear algebra for artists.

Bases in Action: SAT Illustrated

Let's take for example the very simplest case possible ...

TODO: Move me out to styles ...

Figure 1: Two triangles A and B. Are they overlapping? Easy enough to see, right? But can we prove it to the machine ... ?

If you study Figure 1 for a minute you'll see I've drawn two triangles.

$$ \begin{align*} \text{Triangle } A: & \quad A_1=(0, 0), \quad A_2=(2, 0), \quad A_3=(1, 2) \\ \text{Triangle } B: & \quad B_1=(3, 1), \quad B_2=(5, 1), \quad B_3=(4, 3) \end{align*} $$

On visual inspection it's pretty easy to see they're separated, right? So how do we prove that with separating axes?

Step 1. Pick an axis -- any axis ...

What exactly do we mean by axis here? Think of it as a line we can draw between the two figures. If any line can be drawn that cuts between the two, clearly they must be separated. Theoretically we could draw an infinite number of such lines to test the theorem. But in practice we'll use the edges of the polygons. For our triangles, let's arbitrarily inspect $\vec{E_{B_1B_2}}$ to get us going. We can get the edge as a vector quantity by subtraction:

$$ \vec{E}_{B_{1}B_{2}} = B_2 - B_1 = (5-3, 1-1) = (2, 0) $$

2. Use the normal as a basis for projection

And this brings us to the heart of the discussion. Once we have an edge, we can obtain its normal (a line indicating a direction perpendicular to the edge) and use it as a basis for the projection. As I discussed at length in the preceding article in this series , a basis in linear algebra gives us a coordinate system. In this case we get a way of mapping from our 2D svg world coordinates to a one-dimensional basis on which we can project our triangles and look for separation.

Insight!

For me, the key insight to understanding the SAT is to understand that the separating axes are geometric bases for projection.

Here's the normal for edge $\vec{E_{B_1B_2}}$ .

$$ \vec{N}_1 = (-E_y, E_x) = (0, 2) $$

If you touch or click triangle $B$ in Figure 1 you'll see how we get its normal.

3. Project the triangles on the normal basis

Touch or click again and you'll see how we project the triangles. Conceptually, it's easy. We project the polygon by mapping its vertices, $V_i$, onto the normal basis using the dot-product [FOOT_NOTE]:

$$ Proj( V_i ) = V_i \cdot \vec n $$

Projecting a single vertex gives us a scalar value. Projecting all the vertices in a polygon gives us an interval along the basis line (think of it as a "shadow" cast by the polygon along the basis). In this case, the shadows of the two polygons are useful measures; we can use them to look for a gap indicating separation! And to satisfy the Separating Axis Theorem all we need is one (gap that is)! So for Triangle $B$ we project:

$$ \begin{align*} \text{Proj}(B_1) &= (3, 1) \cdot (0, 2) = 2 \\ \text{Proj}(B_2) &= (5, 1) \cdot (0, 2) = 2 \\ \text{Proj}(B_3) &= (4, 3) \cdot (0, 2) = 6 \end{align*} $$

And the projections give us interval:

$$ I_B = [2, 6] $$

Now for Triangle $A$ :

\begin{align*} \text{Proj}(A_1) &= (0, 0) \cdot (0, 2) = 0 \\ \text{Proj}(A_2) &= (2, 0) \cdot (0, 2) = 0 \\ \text{Proj}(A_3) &= (1, 2) \cdot (0, 2) = 4 \end{align*}

$$ I_A = [0, 4] $$

The Story So Far ...

In Figure 1 I've sketched the shadows of the two triangles plotted along our normal basis. By visual inspection we can readily see that the shadows overlap. We haven't found our gap yet.

And you can sort of see this intuitively. If we were to project our triangles onto the Y axis of the original 2D basis for our SVG world (which is exactly what this normal does), you can imagine how their vertical shadows would overlap. In any case, I want to point out that with respect to our normal basis, all that it defines is a direction for a line and a zero point of reference. As a basis The normal vector simply gives us a key to measure the intervals of our polygons.

Finding a Gap

So we're not done yet! We still need to find a gap to prove the triangles are not overlapping. Look again below at our figure.

FIGURE 2: SAME TRIANGLES BUT INITIALIZED TO TEST ALONG EDGE B1 - 3

Can you spot an axis along which we might find a gap? How 'bout edge $B_{1 \rightarrow 3}$ ? If you look closely you can almost see intuitively that the triangles might separate along this axis. And, indeed, if you click or touch Triangle $B$ you'll see that this time we'll find the gap. It's worth looking at the computations I've provided and comparing them against the figure to "connect the dots".

Step 1. Compute the edge vector and obtain the normal basis:

$$ \vec{E}_{B_{1}B_{3}} = B_3 - B_1 = (4-3, 3-1) = (1, 2) $$

$$ \vec{N}_2 = (-E_y, E_x) = (-2, 1) $$

Step 2. Project the triangles and get the intervals.

Triangle $B$ :

$$ \begin{align*} \text{Proj}(B_1) &= (3, 1) \cdot (-2, 1) = -6 + 1 = -5 \\ \text{Proj}(B_2) &= (5, 1) \cdot (-2, 1) = -10 + 1 = -9 \\ \text{Proj}(B_3) &= (4, 3) \cdot (-2, 1) = -8 + 3 = -5 \end{align*} $$

$$ \text{Interval } I_B = [-9, -5] $$

Triangle $A$ :

$$ \begin{align*} \text{Proj}(A_1) &= (0, 0) \cdot (-2, 1) = 0 \\ \text{Proj}(A_2) &= (2, 0) \cdot (-2, 1) = -4 \\ \text{Proj}(A_3) &= (1, 2) \cdot (-2, 1) = -2 + 2 = 0 \end{align*} $$

$$ \text{Interval } I_A = [-4, 0] $$

Step 3. Look for separation. In the following figure I've sketched the basis line we obtained above and plotted the intervals for Triangles $A$ and $B$. And based on visual inspection it's pretty clear that we've found our gap! We've satisfied the SAT!

Application: Thinking with Bases

As a visual thinker I'm always trying to imagine and sketch out abstract concepts. For anything from driving directions to computing bounding box intersection to understanding gradient descent; good visualizations are often a key to understanding. And understanding is important and valuable!

TODO: BRING THE DISCUSSION BACK TO THE CAMERA ...

Discussion and Application

Thinking about containment in this way might seem like overkill at first. For a simple problem like rectangular bounds checking why not just set up your conditionals on the verts and be done with it. But the art of the algorithm we've just walked through takes a specific case and generalizes it to handle an infinitude. Understand the logic of the basis projections used in the SAT and you'll have a much more fine grained-approach to collision detection that fits closer object interactions for detailed polygonal shapes. Apply the axis separation theorem in this way and you'll never have to worry about collision detection again!

I originally applied this algo to create flight traaining simulator for the DoD back in the day ... TODO bring in the Nellis AFB experience...

Summary

Resources

Hermann Minkowski
Grant Sander's beautiful artful animation on basis vectors. I highly recommend give this a watch. It is soo relaxing and gorgeously presented. And there's no test :) .
SVG 2 Specification, §8: Coordinate Systems, Transformations, and Units (W3C).