It is common practice to use multiple layers of speakers when dealing with larger spaces. There will often be a main speaker array, and then several fill (or “delay”) speakers. When using these delay speakers, it is standard to add a time delay to the audio feed for those speakers to account for the time sound takes to propagate through the space, and to minimize any perceptible echo and phase interference.
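As a rough illustration of that standard practice: the delay for a fill is the extra path length divided by the speed of sound. The sketch below assumes a speed of sound of about 1,125 ft/s (roughly 343 m/s at room temperature); the helper name `fill_delay_ms` is hypothetical.

```python
# Hypothetical helper: converts the extra path length between mains and fills into a delay.
SPEED_OF_SOUND_FT_PER_S = 1125.0  # assumption: ~343 m/s at room temperature


def fill_delay_ms(distance_ft: float, speed_ft_per_s: float = SPEED_OF_SOUND_FT_PER_S) -> float:
    """Return the delay (in milliseconds) that covers distance_ft of sound travel."""
    if distance_ft < 0:
        raise ValueError("distance must be non-negative")
    return distance_ft / speed_ft_per_s * 1000.0


if __name__ == "__main__":
    # A fill hung 40 ft into the room from the mains.
    print(f"{fill_delay_ms(40.0):.1f} ms")
```

In a real room the speed of sound drifts with temperature, so treat the constant as a starting point rather than a measured value.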

What happens though when your speakers are far to the outsides of your audience?

## Introduction

Time aligning delays isn’t always straightforward. Usually the trouble with time alignment comes from the speed of sound not being constant (it shifts with temperature, for example). However, when listeners are offset laterally from the speakers, there’s also a geometric problem.

I’m going to tell a short story to illustrate a case where I saw this in action, and then walk through some high-school math. Bear with me, because I promise that there is some practical advice and food for thought at the end.

## Storytime: My Parish’s New Sound System

My parish recently had its sound system redone by a local AV integrator. The choice of integrator was made before I began attending, and, despite the integrator’s designer being a rather big name in AVIXA circles, the result was rather disappointing: it was marked by a lack of attention to detail and some very interesting choices of speakers and placement. There are four speakers, two on each side (a main and a fill), which is a fairly standard arrangement.

The first thing that jumped out at me was just how wide the speaker placement was. The speakers are up against the left and right walls of the sanctuary with something like 10° of inward cant. This is not a particularly narrow space, and congregants would be sitting as much as ~25′ inward from the speakers.

The AV integrators made a big deal out of how they delayed the fill speakers to time align them with the mains, and the parish AV volunteers were duly impressed by that. Normally I wouldn’t have given it a second thought, but the wideness of the array nagged at me.

There’s a problem with time aligning an array that wide.

## Because Pythagoras

Think back to high school trigonometry. The answer to why time alignment gets messy in situations like this is found in the Pythagorean theorem. Yes, the formula for finding the length of the hypotenuse (the longest side) of a right triangle. The usual expression of the formula is:

$$A^2 + B^2 = C^2$$

where A and B are the two sides that meet at the right angle, and C is the hypotenuse.

So, what happens if we want to graph the length of C as one side changes? We can graph it on x and y coordinates using the function notation we learned in algebra. One side (A) will be our input variable x, the hypotenuse (C) will be our output variable y, and the third side (B) will be a constant. Solving the formula above for C, we can express this as:

$$y = \sqrt{x^2 + B^2}$$

B has to be held constant for any one graph, so let’s try the function with a few different values of B. Popping this into Desmos with B = 0, we get the following graph:

So that’s underwhelming. It’s a line because the square root of the square of a non-negative number is just that number. But what happens if we give B some other value? Let’s take a look at the same formula for B = 5:

So we get something really different. What does this mean?

Keep in mind that the equation gives us the distance from the speaker (on the y/vertical axis). The x/horizontal axis is how far back the listener is standing, and B is how far to the side they are (distances in whatever unit pleases you: feet, meters, nose-to-tail corgis). You can interact with the graph here and see how changing the lateral shift (B) affects the output.

So when we are offset to the side of the speaker, our distance from the speaker changes slowly at first and then more rapidly as we move farther back (notice how the curve bends upward), approaching a change of one unit of distance for every unit we step back. By the time we are 10 units back on that graph, the curve is almost (but not quite) a straight line.
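To put numbers on that curve, the sketch below evaluates y = √(x² + B²) with B = 5 and prints how much the distance grows for each one-unit step back. The function names are hypothetical; the point is only that the step sizes start near 0 and creep toward 1 without ever reaching it, while with B = 0 every step is exactly 1.

```python
import math


def distance_to_speaker(x: float, b: float) -> float:
    """Distance from a speaker to a listener x units back and b units to the side."""
    return math.hypot(x, b)


def step_sizes(b: float, max_x: int) -> list[float]:
    """Change in distance for each one-unit step back, from x = 0 up to x = max_x."""
    return [
        distance_to_speaker(x + 1, b) - distance_to_speaker(x, b)
        for x in range(max_x)
    ]


if __name__ == "__main__":
    for x, step in enumerate(step_sizes(b=5.0, max_x=12)):
        print(f"x = {x:2d} -> {x + 1:2d}: distance grows by {step:.3f}")
```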

The non-linearity of this is important!

## Of Non-Linear Distances: A Tale of Two Speakers

Let’s bring a second speaker into the equation.

So our initial equation represents the relationship between the main speaker and the listener. What about a second speaker?

Well, the equation we have is good for a speaker at the front of the room; we just need to add another constant to represent the distance (in nose-to-tail corgis, of course) that the second speaker sits into the room from the first. Let’s use Δ (the Greek letter delta) to represent that distance, because using Greek letters is more math-y, and repurposing them from their usual meanings to annoy math people is fun.

First, a visual hastily drawn in my terribleCAD:

The “stage” is towards the bottom of the image, we see the main speaker square and the fill speaker square and the distance between them, Δ. The listener circle is back from the main by a distance of x, and is some distance to the side b.

So, our new equation, representing the fill speaker with its offset, is going to look like this:

$$y = \sqrt{(x - \Delta)^2 + B^2}$$

For the sake of keeping them separate, we will use f(x) to refer to the main and g(x) to refer to the fill.

Let’s go ahead and graph our two speakers with B = 5 and Δ = 5:

What is the most important observation we can make about the relationship between those two curves?

The lines are not parallel.
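A quick numerical check of that observation: the sketch below evaluates f(x) = √(x² + B²) and g(x) = √((x − Δ)² + B²) for B = 5 and Δ = 5 at a few listener positions behind the fill and prints the gap between them. The listener positions are arbitrary picks for illustration. If the curves were parallel, the gap would be the same everywhere; it isn't.

```python
import math


def f(x: float, b: float) -> float:
    """Listener distance to the main speaker (listener x back, b to the side)."""
    return math.hypot(x, b)


def g(x: float, b: float, delta: float) -> float:
    """Listener distance to the fill speaker, which sits delta farther into the room."""
    return math.hypot(x - delta, b)


if __name__ == "__main__":
    b, delta = 5.0, 5.0
    for x in (10.0, 20.0, 40.0, 80.0):
        gap = f(x, b) - g(x, b, delta)
        print(f"x = {x:5.1f}: f = {f(x, b):7.3f}, g = {g(x, b, delta):7.3f}, gap = {gap:.3f}")
```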

Furthermore, if you follow the link above and adjust the value of B, you can see that the gap between the two curves changes as B changes. Compare the above for a B of 5 versus the below where B = 10:

When B = 5, there are 4 units difference where the second curve starts. When B = 10, notice that the difference is only 2 units.

As the lateral shift increases, the difference in distance between the speakers decreases.

To put that another way: as you move to the side, your distances to the mains and to the fills get closer to being the same. In fact, if your lateral shift is very large relative to the distance between the two speakers, the distances can be functionally the same (try B = 25 and Δ = 1).
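The B = 25, Δ = 1 case can be checked the same way. The sketch below (a hypothetical `distance_difference` helper that redefines the main-minus-fill distance so it runs on its own) compares an on-axis listener with a listener 25 units to the side, at a few arbitrarily chosen distances back.

```python
import math


def distance_difference(x: float, b: float, delta: float) -> float:
    """Main distance minus fill distance for a listener x back and b to the side."""
    return math.hypot(x, b) - math.hypot(x - delta, b)


if __name__ == "__main__":
    delta = 1.0
    for x in (1.0, 2.0, 5.0, 10.0):
        on_axis = distance_difference(x, 0.0, delta)
        offset = distance_difference(x, 25.0, delta)
        print(f"x = {x:4.1f}: on-axis {on_axis:.3f}, offset by 25: {offset:.3f}")
```

On axis the difference is always the full Δ; well off to the side it is a small fraction of it.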

Now, we keep talking about the difference in distance between the two speakers, and this is the important part of time aligning them, because that difference is what we are trying to correct for. The assumption is that if we correct for the **linear** distance between the speakers, the correction will also be correct(-ish) when the sound reaches the listener.

Note that I bolded “linear” distance. This assumption only holds if the listener is on the same line as the speakers (B = 0), and we can see that by setting B = 0:

However, as we’ve seen, if the listener is not on the same line as the speakers, then the relationship of the listener to the speakers is **non-linear**, and the assumption falls apart.

## The Meat of the Matter: Error Amounts

The important question on your mind now is likely: but does it fall apart enough to matter?

That’s a good question, and the answer is–as always–“it depends.”

Let’s graph something a bit more abstract: the difference between the listener’s distances to the two speakers.

The difference between the two speakers can be graphed by defining y as the difference of their two functions:

$$y = f(x) - g(x) = \sqrt{x^2 + B^2} - \sqrt{(x - \Delta)^2 + B^2}$$

And a graph looks something like this:

So now the y-axis is showing us the difference in the distance between the two speakers and the listener (the difference in lengths of the red lines in the terribleCAD drawing).
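If you would rather tabulate than graph, the sketch below evaluates that same y = f(x) − g(x) for Δ = 5 and a few values of B, at listener positions behind the fill (x ≥ Δ). The helper name and the sample positions are assumptions for illustration.

```python
import math


def difference_curve(x: float, b: float, delta: float) -> float:
    """y = f(x) - g(x): main distance minus fill distance for one listener position."""
    return math.hypot(x, b) - math.hypot(x - delta, b)


if __name__ == "__main__":
    delta = 5.0
    xs = (10.0, 20.0, 40.0, 80.0)  # listeners behind the fill (x >= delta)
    for b in (0.0, 5.0, 10.0):
        ys = ", ".join(f"{difference_curve(x, b, delta):.3f}" for x in xs)
        print(f"B = {b:4.1f}: {ys}")
```

The B = 0 row stays at Δ; the other rows climb toward it from below.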

What are we looking for?

Simple: if the difference in distance between the two speakers is consistent (linear), then this graph will be flat. B = 0 gives us a flat line (for listeners behind the fills, where x ≥ Δ):

You may also notice that, at some point along the x-axis, any curve will begin to become functionally flat (a fun romp for your math chops: will it ever actually become flat?). Observe B = 5 if we go some ways down the x-axis:

So there is a certain distance beyond which the lateral offset doesn’t matter, and it depends on the relationship between B and Δ. Until that point, however, there is some cause for concern.
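One way to put a number on that distance is to pick a tolerance for the error (Δ minus the actual distance difference) and step back from the fill until the error drops under it. This is a brute-force sketch; the 1 ms tolerance, the 0.5-unit step, the ~1,125 ft/s speed of sound, and the function names are all assumptions. It relies on the error shrinking steadily as you move back, which holds here because t/√(t² + B²) increases with t. It also hints at the answer to the romp: for any B > 0 the error keeps shrinking toward zero but never actually reaches it.

```python
import math

SPEED_OF_SOUND_FT_PER_S = 1125.0  # assumption: ~343 m/s at room temperature


def alignment_error(x: float, b: float, delta: float) -> float:
    """On-axis spacing minus the actual main-vs-fill distance difference."""
    return delta - (math.hypot(x, b) - math.hypot(x - delta, b))


def flattening_distance(b: float, delta: float, tolerance_ms: float = 1.0, step: float = 0.5) -> float:
    """Smallest x (on a grid of `step`, starting at the fill) where the error is within tolerance."""
    tolerance_ft = tolerance_ms / 1000.0 * SPEED_OF_SOUND_FT_PER_S
    x = delta
    while alignment_error(x, b, delta) > tolerance_ft:
        x += step
    return x


if __name__ == "__main__":
    for b, delta in ((5.0, 5.0), (10.0, 5.0), (25.0, 75.0)):
        x = flattening_distance(b, delta)
        print(f"B = {b}, delta = {delta}: error under 1 ms from x = {x} ft back")
```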

## Practical Considerations

What is our takeaway from this?

We need to keep in mind the acoustic and psychoacoustic effects that are related to time, primarily the reverberant field, the echo threshold, and loudness masking.

I’ll specifically address the latter two, because they may not be as obvious:

First is loudness masking. Just as with the 3:1 mic placement rule, there is a threshold at which one sound source is enough louder than another that any destructive interference between them is imperceptible. This threshold is typically somewhere around 10 to 12 dB. If either your mains or your fills are at least 10 dB louder than the other at a given listening position, then any phase-based interference between them is probably negligible.
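For a first guess at whether masking covers a given seat, you can estimate the level difference from distance alone. The sketch below assumes both sources produce the same level at the same reference distance and fall off at the free-field inverse-square rate (about 6 dB per doubling of distance); room reflections and speaker directivity will move the real numbers. The function names and the example seat are hypothetical, and the 10 dB default comes from the rule of thumb above.

```python
import math


def level_difference_db(distance_near: float, distance_far: float) -> float:
    """Inverse-square level advantage of the nearer source, in dB (free-field assumption)."""
    return 20.0 * math.log10(distance_far / distance_near)


def interference_probably_masked(distance_a: float, distance_b: float, threshold_db: float = 10.0) -> bool:
    """True if, by distance alone, one source is at least threshold_db louder than the other."""
    near, far = sorted((distance_a, distance_b))
    return level_difference_db(near, far) >= threshold_db


if __name__ == "__main__":
    # Hypothetical seat: 10 ft from a fill and 40 ft from the mains, equal-output assumption.
    print(f"{level_difference_db(10.0, 40.0):.1f} dB")
    print(interference_probably_masked(10.0, 40.0))
```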

Second is the echo threshold. The echo threshold depends heavily on the frequency content of the sound and usually falls somewhere in the range of 5 ms to 50 ms. Because a lateral offset shrinks the difference between the listener’s distances to the mains and to the fills, a fill delay set for the on-axis spacing can be too much delay for a listener at the front of the curve when B is large. That can push the difference in arrival time between the mains and the fills past the echo threshold (which can only happen if the spacing between the mains and the fills, converted to travel time, is itself longer than the echo threshold).

If we push B = 25 with Δ = 75, there’s a 21-unit difference (relative to the flat B = 0 line) where the curve begins. If we assume the units are feet, that comes out to roughly 18.6 ms of error on the delay line for that listening position, at about 1,125 ft/s.
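The arithmetic behind that figure, as a small sketch: it assumes the listener is level with the fills (x = Δ, where the difference curve begins) and a speed of sound of about 1,125 ft/s. The error is Δ minus the actual difference; a positive value means the delayed fill arrives that much after the mains.

```python
import math

SPEED_OF_SOUND_FT_PER_S = 1125.0  # assumption: ~343 m/s at room temperature

b, delta = 25.0, 75.0  # feet: lateral offset and main-to-fill spacing
x = delta  # listener level with the fills, where the difference curve begins

actual_difference = math.hypot(x, b) - math.hypot(x - delta, b)
error_ft = delta - actual_difference  # positive: the fills arrive late
error_ms = error_ft / SPEED_OF_SOUND_FT_PER_S * 1000.0

print(f"actual difference: {actual_difference:.2f} ft")  # 54.06 ft
print(f"error: {error_ft:.2f} ft, about {error_ms:.1f} ms of timing error")  # 20.94 ft, 18.6 ms
```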

Is this a realistic scenario? Given the number of older/traditionally-constructed church buildings I have been in where the speakers are placed far to the right and left, I’m sure it does happen.

What do we do in these situations?

We can look at the expected values of B in the areas past the fill speakers and tune the delay lines to account for the offset. In my parish, the “flattening of the curve” doesn’t happen until you are standing at the narthex wall. There’s no point in running a delay line that’s only valid for the next room; it is better to tune based on where the audience or congregation will actually be seated.
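A minimal sketch of that tuning, assuming you pick one representative seat (x back from the mains, B to the side) and align the fills to it rather than to the on-axis spacing. `delay_for_seat` is a hypothetical helper, the ~1,125 ft/s speed of sound is an assumption, and the numbers in the usage example are made up for illustration, not measurements of my parish.

```python
import math

SPEED_OF_SOUND_FT_PER_S = 1125.0  # assumption: ~343 m/s at room temperature


def delay_for_seat(x: float, b: float, delta: float) -> float:
    """Fill delay in ms that aligns mains and fills at one seat (x back, b to the side)."""
    difference_ft = math.hypot(x, b) - math.hypot(x - delta, b)
    # A negative difference means the mains are already nearer than the fills;
    # no amount of fill delay fixes that, so clamp at zero.
    return max(difference_ft, 0.0) / SPEED_OF_SOUND_FT_PER_S * 1000.0


if __name__ == "__main__":
    delta = 30.0  # hypothetical main-to-fill spacing, feet
    print(f"on-axis seat:           {delay_for_seat(45.0, 0.0, delta):.1f} ms")
    print(f"seat 20 ft to the side: {delay_for_seat(45.0, 20.0, delta):.1f} ms")
```

Weighting several representative seats instead of one is an obvious extension; which seats matter most is a judgment call about where people actually sit.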

More broadly, it is a reminder that every listening position within an audience is different, and if we are setting up audio systems that deviate from ideal parameters, we need to look at how that affects what seems like a “typical” audience area.