Unleashing Massive Flocks with Unity
1. Introduction
In 1986, Craig Reynolds introduced a surprisingly simple yet powerful computational model to simulate the collective behavior of bird flocks. He called it Boids, a stylized abbreviation of Bird-oid objects.
The idea was as elegant as it was revolutionary: complex group behavior can emerge from a few simple local rules. You don’t need centralized intelligence to coordinate a flock of birds, a school of fish, or a crowd of people, just a handful of rules, applied by each agent (which we’ll also call boids or entities) in response to its immediate neighbors.
Reynolds defined three core behaviors:
- Alignment — steer toward the nearby agents’ average heading
- Cohesion — move toward nearby agents’ center of mass
- Separation — avoid crowding neighbors by maintaining a minimum distance.
By applying these rules at every frame for every agent, the simulation quickly generates realistic dynamics: fluid waves of motion, natural obstacle avoidance, self-organization, none of which are explicitly programmed but rather emerge from local interactions.
2. How Boids work
At the heart of the Boids algorithm are three fundamental rules. Each act on an individual level, but when applied together across a population of agents, they produce surprisingly lifelike group behavior.
- Alignment
“Match the average heading of your neighbors”.
Each entity looks at the direction of its nearest neighbors (within a certain range) and slightly adjusts its heading to align with the average.
Think of it like walking in a group and unconsciously syncing your pace with the others.
This helps create the illusion of a unified organism with a shared direction. - Cohesion
“Move toward the center of your neighbors”.
Each entity tries not to fall too far behind or stray from the group. It calculates the center of mass (centroid) of its neighbors and moves slightly toward it.
This keeps the group from scattering too far without making it clump unnaturally. - Separation
“Don’t get too close”.
If an agent gets too close to another, it steers away to avoid collisions. This is perhaps the most intuitive: nobody wants to bump into others.
This rule maintains safe spacing between agents, preventing the flock from collapsing into a single blob.
On their own, these rules don’t accomplish much. But together, they produce compelling emergent behavior:
- The flock changes direction like a fluid wave.
- Agents split and regroup naturally.
- They can dodge obstacles or respond to predators, if implemented.
All this happens without a centralized flock manager. The behavior emerges from the bottom up.
3. Extra features: predators, explosions, and real-time interaction
What makes this kind of simulation truly captivating is its ability to react to the environment, generate chaos, and then return to balance. Here are some interactive dynamics I added to the simulation:
- Predators: to make the behavior more believable (and fun), I added predators. These special agents move independently or chase nearby entities, trigger an escape response within a certain radius, are perceived by agents as a repulsive force (an “anti-cohesion”).
When a predator enters the flock, boids fan out, holes appear in the swarm, or subgroups form, all without scripting the choreography. It happens organically.
- Click-triggered explosions: you can create a repulsive force by double-clicking the left mouse button.
All boids within a certain radius quickly flee the explosion point, then gradually return to normal, re-forming the swarm. - Click-and-hold attraction: this works opposite to the explosion.
When holding the right mouse button, an attractive force pulls boids within a defined radius toward the cursor.
As you move the mouse, you can drag clusters of boids around the scene.
All of this makes the simulation interactive and even playful, just disrupting the flock and watching how it reacts is satisfying.
At this point, the swarm is no longer just a coherent blob, it’s a dynamic ecosystem, where:
- the group forms, breaks, and recomposes
- it reacts to threats and reorganizes after shocks
- you can observe animal-like behavior, even though none of it was explicitly programmed.
4. The Scalability Challenge
The Boids rules, taken individually, are simple and cheap to compute. But when theory meets practice, a key issue arises: scalability.
The neighbor search problem
Each agent needs to find out who its neighbors are, meaning that every frame, each agent must compare itself with all others to determine proximity. This happens for every agent, on every update.
From a computational perspective, this means O(n²) complexity:
- 100 boids → 10,000 comparisons.
- 1,000 boids → 1 million.
- 10,000 boids → 100 million!
Even with modern hardware, this quickly becomes unsustainable. The result? A laggy, unmanageable simulation.
To scale the simulation to thousands of agents, we need optimizations:
· Reduce the number of comparisons by organizing spatial data more efficiently, this is where spatial structures like the QuadTree come in.
· Use modern multi-core CPUs to parallelize the rule calculations.
5. Optimizing spatial partitioning (QuadTree)
A QuadTree is a hierarchical data structure that recursively subdivides a 2D space into four quadrants.
Imagine having a map and dividing it into four smaller rectangles. If one contains too much “stuff” (in our case, too many boids), you subdivide it again, and so on.
This creates a tree-like spatial structure, where each node represents a region, and its children represent sub-regions.
I implemented a QuadTree in Unity, visualizing the defined regions. I set a limit of 3 entities per region. Once that limit is exceeded, the region is subdivided further.
How does this help?
Instead of comparing every boid to every other:
- We build the QuadTree, placing each agent in the appropriate region.
- To find an agent’s neighbors, we query only the region it’s in and adjacent regions, within a defined radius.
Let’s look at the diagram below:
The agent in region A wants to find neighbors within the green circle.
Since the circle overlaps regions A and D, we only search those two, skipping all others.
This drastically reduces the number of comparisons.
In well-distributed scenarios, this can bring the complexity down to O(n log n).
Implementation in Unity
I created a QuadTree class and a NodeTree class.
- QuadTree handles tree construction, using NodeTree objects as its data structure.
- NodeTree manages spatial partitioning and entity insertion.
There are two main variants of a QuadTree:
- Point (or Loose) QuadTree — when a node exceeds its capacity, all points are redistributed into its children.
- Bucket (or Cell) QuadTree — when a node exceeds capacity, it is subdivided, but the points are not redistributed up the tree; each level handles only its own inserts.
I implemented both for comparison. In our simulation, performance is nearly identical for either version, so I’m sticking with the Bucket one.
public class QuadTree : MonoBehaviour
{
[SerializeField] private int _nodeCapacity;
[SerializeField] private Rect _bounds;
[SerializeField] private float _nodeMinSize;
public int NodeCapacity => _nodeCapacity;
public Rect Bounds => _bounds;
public float NodeMinSize => _nodeMinSize;
private NodeTree _root;
public void Build(NativeArray<float2> boidPositions)
{
int boidCount = boidPositions.Length;
_root = new NodeTree(_bounds, _nodeCapacity, _nodeMinSize, 0);
for (int i = 0; i < boidCount; i++)
_root.Insert(i, boidPositions[i]);
}
public void QueryRange(Rect range, NativeArray<float2> boidPositions, NativeList<int> found)
{
_root.QueryRange(range, boidPositions, found);
}
}
public class NodeTree
{
public Rect Bounds => _bounds;
public NodeTree[] Children => _children;
public bool IsDivided => _isDivided;
private Rect _bounds;
private readonly List<int> _points;
private NodeTree[] _children;
private readonly int _nodeCapacity;
private readonly float _nodeMinSize;
private bool _isDivided;
private readonly int _nodeDepth;
public NodeTree(Rect bounds, int nodeCapacity, float nodeMinSize, int depth)
{
_bounds = bounds;
_nodeCapacity = nodeCapacity;
_nodeMinSize = nodeMinSize;
_nodeDepth = depth;
_isDivided = false;
_points = new List<int>();
}
public void Subdivide()
{
Vector2 halfBound = new Vector2(_bounds.width, _bounds.height) / 2;
Rect nw = new Rect(new Vector2(_bounds.x, _bounds.y + halfBound.y), halfBound);
Rect ne = new Rect(new Vector2(_bounds.x + halfBound.x, _bounds.y + halfBound.y), halfBound);
Rect sw = new Rect(new Vector2(_bounds.x, _bounds.y), halfBound);
Rect se = new Rect(new Vector2(_bounds.x + halfBound.x, _bounds.y), halfBound);
_children = new NodeTree[4];
_children[0] = new NodeTree(nw, _nodeCapacity, _nodeMinSize, _nodeDepth + 1);
_children[1] = new NodeTree(ne, _nodeCapacity, _nodeMinSize, _nodeDepth + 1);
_children[2] = new NodeTree(sw, _nodeCapacity, _nodeMinSize, _nodeDepth + 1);
_children[3] = new NodeTree(se, _nodeCapacity, _nodeMinSize, _nodeDepth + 1);
_isDivided = true;
}
/*
* Bucket Quadtree (also known as Cell Quadtree)
*/
public bool Insert(int index, float2 point)
{
if (!_bounds.Contains(new Vector2(point.x, point.y)))
return false;
if (_points.Count < _nodeCapacity || !CanDivide())
{
_points.Add(index);
return true;
}
else
{
if (!_isDivided)
Subdivide();
foreach (var child in _children)
{
if (child.Insert(index, point))
return true;
}
}
return false;
}
private bool CanDivide()
{
return (_bounds.width >= _nodeMinSize && _bounds.height >= _nodeMinSize);
}
public void QueryRange(Rect range, NativeArray<float2> boidPositions, NativeList<int> found)
{
if (!_bounds.Overlaps(range, true))
return;
foreach (var p in _points)
{
if (range.Contains(boidPositions[p]))
{
found.Add(p);
}
}
if (_isDivided)
{
foreach (var child in _children)
{
child.QueryRange(range, boidPositions, found);
}
}
}
}
Trade-offs and Limits
- High density reduces efficiency: if many boids are crammed into a small area, comparisons can still approach O(n²).
- Tree must be rebuilt every frame: since boids can move fast, region assignments change constantly. In theory, we could update instead of rebuild, but it would be complex and computationally expensive, so I prefer to rebuild the QuadTree every frame.
- Build cost exists, but is outweighed by the performance gains in neighbor searches.
6. Optimizing with Unity Jobs & Burst Compiler
After optimizing the neighbor search with QuadTree, another bottleneck remains: the per-frame computation each entity must perform.
Even with fast lookups, each agent still has to:
- compute forces based on neighbors,
- update its position and velocity,
- avoid obstacles, predators, explosions, etc.
By default, Unity runs all this on the main thread, sequentially. This wastes the potential of multi-core CPUs.
The Job System is Unity’s way to let developers write safe, multithreaded code.
It allows us to break tasks into “jobs” that run in parallel across CPU cores, perfect for operations repeated across many data points.
But parallel programming comes with trade-offs:
- Limited Unity API access: jobs can’t access Transform, GameObject, or other non-thread-safe components.
- Only blittable types: jobs must use blittable data types (bool, int, float, native array). Blittable data refers to data types that have the same binary representation in managed and unmanaged memory. This means they can be directly copied from one memory space to another without transformation or marshaling.
- Manual memory management: with NativeArray and similar structures, developers must allocate and deallocate memory manually.
- Setup overhead: preparing and synchronizing jobs adds complexity, so it’s best suited for heavy, repetitive workloads.
In combination with the Job System, Unity’s Burst Compiler can compile your code into highly optimized machine instructions, squeezing out even more performance.
How I used jobs in my simulation
First, I identified which logic could be parallelized:
Parallelizable:
- The behaviors of agents: Alignment, Separation, Cohesion, Attraction, Repulsion, Flee, Wander
Not parallelizable:
- QuadTree construction and search (due to data structure complexity).
Second, for each entity, I determined the maximum neighbor distance of interest, then I built three arrays:
- neighborsArray: a flat map of neighbors.
- neighborsCounts: how many neighbors each agent has.
- neighborsStartIndices: where each agent’s neighbors begin in the flat array.
This setup let me pass read-only data to jobs, which is ideal for safe parallelism.
Note: below is a stripped-down excerpt of the Update()
method, focused solely on the part where neighbor arrays (_neighborsArray
, _neighborsCounts
, _neighborsStartIndices
) are built.
The actual job scheduling and execution logic that follows has been omitted here for clarity and focus.
void Update()
{
_quadTree.Build(_boids.Positions);
int neighborIdx = 0;
for (int i = 0; i < _boids.Count; i++)
{
_neighborsStartIndices[i] = neighborIdx;
_neighborsCounts[i] = 0;
NativeList<int> foundNeighbors = new NativeList<int>(Allocator.Temp);
Vector2 halfPos = new Vector2(_boids.Positions[i].x - _boids.MaxPerceptionRadius/2,
_boids.Positions[i].y - _boids.MaxPerceptionRadius/2);
Rect perceptionRect = new Rect(halfPos, new Vector2(_boids.MaxPerceptionRadius, _boids.MaxPerceptionRadius));
_quadTree.QueryRange(perceptionRect, _boids.Positions, foundNeighbors);
for (int j = 0; j < foundNeighbors.Length; j++)
{
int neighborIndex = foundNeighbors[j];
if (neighborIndex != i) // Avoid to add itself.
{
_neighborsArray[neighborIdx] = neighborIndex;
neighborIdx++;
_neighborsCounts[i]++;
}
}
foundNeighbors.Dispose();
}
// The jobs will be scheduled in the next step using these arrays...
}
Once the data was ready, I launched the jobs in parallel. Since each job works on its own slice of data, there’s no memory contention, so everything runs safely.
Finally, after the jobs are finished, I update each entity’s Transform.
Note: below is an excerpt that focuses specifically on how the alignment behavior is computed using Unity’s Job System. The code shows how the job is defined, scheduled, and how the result is retrieved. Other behaviors like cohesion, separation, attraction, and so on follow the same pattern, for the sake of clarity, only alignment is shown here.
public class Flocking : MonoBehaviour
{
private NativeArray<JobHandle> _boidsJobHandles;
private JobHandle _alignmentJob;
void Update()
{
// Ensure the previous job is completed before accessing its data again
_alignmentJob.Complete();
// Rebuild QuadTree and prepare neighbor arrays (see previous section)...
// Start the alignment job with all required data and parameters
_alignmentJob = AlignmentJob.Begin(_boids.Positions,
_boids.Velocities,
_neighborsArray,
_neighborsCounts,
_neighborsStartIndices,
_boids.Alignment.Steering,
_boids.Behavior.MovementAccuracy,
_boids.Alignment.PerceptionRadius,
_boids.Alignment.PerceptionAngle,
_boids.Alignment.MinDistanceWeight,
_boids.Alignment.MaxDistanceWeight,
_boids.Alignment.DesiredMagnitude,
_boids.Alignment.MaxSteeringForce,
_boids.Alignment.ScaleForce);
}
void LateUpdate()
{
// Wait for the job to finish before applying its results
// Using LateUpdate gives the job as much time as possible to execute,
// since it's the last method called in the game loop
_boidsJobHandles = new NativeArray<JobHandle>(1, Allocator.TempJob);
_boidsJobHandles[0] = _alignmentJob;
JobHandle boidsJobDependencies = JobHandle.CombineDependencies(_boidsJobHandles);
// Once the jobs are done, apply results to the Transform of each boid
// (not shown here for brevity)
}
}
[BurstCompile(CompileSynchronously = true)]
public struct AlignmentJob : IJobParallelFor
{
private NativeArray<float2> _alignmentSteering;
[ReadOnly] private NativeArray<float2> _positions;
[ReadOnly] private NativeArray<float2> _velocities;
[ReadOnly] private NativeArray<int> _neighborsArray;
[ReadOnly] private NativeArray<int> _neighborsCount;
[ReadOnly] private NativeArray<int> _neighborsStartIndices;
[ReadOnly] private int _movementAccuracy;
[ReadOnly] private float _perceptionRadius;
[ReadOnly] private float _perceptionAngle;
[ReadOnly] private float _minDistanceWeight;
[ReadOnly] private float _maxDistanceWeight;
[ReadOnly] private float _desiredMagnitude;
[ReadOnly] private float _maxSteeringForce;
[ReadOnly] private float _scaleForce;
public static JobHandle Begin(NativeArray<float2> positions,
NativeArray<float2> velocities,
NativeArray<int> neighborsArray,
NativeArray<int> neighborsCount,
NativeArray<int> neighborsStartIndices,
NativeArray<float2> steering,
int movementAccuracy,
float perceptionRadius,
float perceptionAngle,
float minDistanceWeight,
float maxDistanceWeight,
float desiredMagnitude,
float maxSteeringForce,
float scaleForce)
{
var job = new AlignmentJob()
{
_positions = positions,
_velocities = velocities,
_neighborsArray = neighborsArray,
_neighborsCount = neighborsCount,
_neighborsStartIndices = neighborsStartIndices,
_alignmentSteering = steering,
_movementAccuracy = movementAccuracy,
_perceptionRadius = perceptionRadius,
_perceptionAngle = perceptionAngle,
_minDistanceWeight = minDistanceWeight,
_maxDistanceWeight = maxDistanceWeight,
_desiredMagnitude = desiredMagnitude,
_maxSteeringForce = maxSteeringForce,
_scaleForce = scaleForce
};
return IJobParallelForExtensions.Schedule(job, positions.Length, 32);
}
public void Execute(int index)
{
float2 currentPos = _positions[index];
float2 currentVel = _velocities[index];
int count = _neighborsCount[index];
int startIdx = _neighborsStartIndices[index];
if (count <= 0)
{
_alignmentSteering[index] = float2.zero;
return;
}
bool ran = false;
float2 desired = float2.zero;
for (int i = 0; i < count && i < _movementAccuracy; i++)
{
int neighborIndex = _neighborsArray[startIdx + i];
float2 neighborPosition = _positions[neighborIndex];
float2 distToNeighbor = currentPos - neighborPosition;
float distance = math.length(distToNeighbor);
if (distance < _perceptionRadius)
{
float2 dirToNeighbor = (_positions[neighborIndex] - currentPos);
dirToNeighbor = math.normalize(dirToNeighbor);
float2 forward = math.normalize(currentVel);
float dot = math.dot(forward, dirToNeighbor);
float angleToNeighbor = math.acos(dot);
float angleDegToNeighbor = math.degrees(angleToNeighbor);
if (angleDegToNeighbor < (_perceptionAngle / 2))
{
float weightedDistance = math.remap(0f, _perceptionRadius, _minDistanceWeight, _maxDistanceWeight, distance);
desired += math.normalize(_velocities[neighborIndex]) * weightedDistance;
ran = true;
}
}
}
if (ran)
{
desired = math.normalize(desired) * _desiredMagnitude;
float2 steering = desired - currentVel;
float steeringMagnitude = math.length(steering);
if (steeringMagnitude > _maxSteeringForce)
steering = math.normalize(steering) * _maxSteeringForce;
steering *= _scaleForce;
_alignmentSteering[index] = steering;
}
else
{
_alignmentSteering[index] = float2.zero;
}
}
}
How the job is defined and scheduled
In Unity’s Job System, defining a job means creating a struct that implements a specific interface — in this case, IJobParallelFor
. This interface tells Unity that the job will be executed in parallel, once for each index in the range you provide. In our simulation, that means one execution per boid.
The struct AlignmentJob
represents this definition. It contains all the input data (like positions, velocities, and neighbors) and one output array where each boid’s steering result will be stored.
To run the job, we use the Schedule()
method. This not only starts the job but also allows us to specify the total number of iterations (i.e., the number of boids) and a batchCount
.
The batchCount
defines how many iterations each worker thread should process at a time. Smaller values can increase parallelism, while larger values reduce scheduling overhead. In practice, a value like 32
is a good balance between performance and responsiveness.
Before scheduling a new job in Update()
, we call .Complete()
on the previous frame’s alignment job. This ensures that the job has fully finished before we try to read from or write to any of its data (such as _alignmentSteering
). Unity’s jobs run asynchronously, so accessing memory that's still in use by a running job could lead to race conditions or crashes,Complete()
prevents that by blocking until the job has safely finished.
Later in the frame, we wait for the current job to complete in LateUpdate()
. This is intentional: since LateUpdate()
is the last step in Unity's frame loop, it gives the job as much time as possible to run in the background before we need the results.
This approach helps maximize parallel execution time while minimizing the risk of stalls or delays on the main thread.
Once scheduled, Unity runs the job concurrently across CPU threads, using the data we’ve passed in. The Execute()
method inside the job is where the actual computation for each boid takes place. Although we won’t go into its internal logic here, it’s important to understand that this is the function that Unity will call repeatedly, once for each index, in parallel, to perform the alignment calculation.
This setup ensures high performance and scalability while keeping the logic thread-safe and deterministic.
Job System best practices
When working with Unity’s Job System, following a few best practices can make the difference between a simulation that just runs, and one that runs efficiently and safely.
- Profile early and often
Before optimizing blindly, it’s essential to measure. Use Unity’s Profiler to identify real bottlenecks in your simulation, whether they’re related to physics, rendering, or CPU-bound logic like boid behaviors.
The Profiler helps you see exactly where time is spent in each frame, so you can determine which parts benefit most from parallelization. Don’t assume, measure first. - Avoid overkill
While Unity’s Job System is powerful, not every task is worth offloading. Very small or infrequent computations may not benefit from parallelization, and in some cases, the setup cost (allocating data, scheduling, synchronizing) can outweigh the gains.
Jobs are best reserved for workloads that are repeated, data-heavy, and scale with the number of agents or elements, like our per-boid behavior computations. - Manage memory responsibly
NativeArray
and otherNative*
containers provide high performance but require manual memory management.
Always deallocate them when you're done using them, otherwise, you'll create memory leaks that can go unnoticed during development.
Where possible, try to allocate once and reuse across frames, instead of reallocating every update. This reduces pressure on the garbage collector and improves frame consistency.
7. Reflections and next steps
From a model born in 1986, we’ve built a modern, interactive, scalable simulation that can handle tens of thousands of agents with believable emergent behavior.
Thanks to a combination of:
- simple yet powerful rules (alignment, cohesion, separation, etc.),
- spatial optimizations like QuadTree,
- parallel computation via Unity Jobs + Burst Compiler,
- and real-time interactivity (predators, explosions, attraction),
we’ve created a fluid, engaging simulation that’s ready for all sorts of uses.
Future ideas
This simulation can be a launchpad for more advanced projects. Some ideas:
- Learning-based flocking: use machine learning to evolve behaviors toward goals.
- Gameplay integration: use boids for dynamic NPCs, enemy swarms, or ambient crowd systems.
- Biological or generative art: simulate natural phenomena or create interactive visual experiences.
Source code
Want to try it out, break it down, or just have fun scaring boids with a mouse click?
You can find the full project here:
https://github.com/fgrenoville/2D-Flocking-Simulation
Thanks for reading!
If you enjoyed the article, consider leaving a like, a comment, or sharing it.
Got questions about the code, optimizations, or want to build on it? I’d love to chat!