-
What Is Async Compute?
- Running compute work in parallel with graphics work on the GPU
- Modern GPUs have separate compute queues that can run alongside the graphics queue
- Enables better GPU utilization by filling idle shader units
-
GPU Queue Types
- Graphics queue: supports all operations (graphics, compute, transfer)
- Compute queue: compute + transfer only (no rasterization)
- Transfer queue: DMA transfers only
- Multiple queues can run simultaneously on different hardware units
-
Why It Matters for Path Tracing
- BLAS builds are compute-heavy and can overlap with rendering
- Denoising passes can overlap with next frame’s ray tracing
- TLAS rebuild for the next frame can overlap with the current frame's shadow ray tracing (rays must still trace against a completed TLAS)
- Typical frame timeline without async:
[BLAS build] → [TLAS build] → [Ray trace] → [Denoise] → [Present]
- With async compute (↕ marks passes that overlap):
[BLAS build (async)] ↕ [TLAS build] → [Ray trace] → [Denoise (async)] ↕ [Present]
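The gain suggested by the two timelines can be estimated with a simple two-queue model. The sketch below is a toy calculation, not a measurement: `FramePasses`, `serialFrameMs`, and `asyncFrameMs` are hypothetical names, and it assumes a steady state in which BLAS build and denoise run on the compute queue while TLAS build and ray tracing run on the graphics queue, with ideal overlap.

```cpp
#include <algorithm>

// Per-pass GPU times in milliseconds. Values fed in are placeholders,
// not measurements from any real renderer.
struct FramePasses {
    double blasBuildMs;
    double tlasBuildMs;
    double rayTraceMs;
    double denoiseMs;
};

// Single queue: every pass waits for the previous one to finish.
double serialFrameMs(const FramePasses& p) {
    return p.blasBuildMs + p.tlasBuildMs + p.rayTraceMs + p.denoiseMs;
}

// Two queues in steady state (assumed split): with ideal overlap the
// frame time is bounded by whichever queue carries more work.
double asyncFrameMs(const FramePasses& p) {
    double graphicsQueueMs = p.tlasBuildMs + p.rayTraceMs;
    double computeQueueMs = p.blasBuildMs + p.denoiseMs;
    return std::max(graphicsQueueMs, computeQueueMs);
}
```

With placeholder times of 2.0, 0.5, 8.0, and 3.0 ms, the serial estimate is 13.5 ms and the ideal async estimate is 8.5 ms. Real gains are smaller, because both queues compete for the same shader units and memory bandwidth.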
-
Vulkan Async Compute Setup
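- Find a compute-only queue family (VK_QUEUE_COMPUTE_BIT set, VK_QUEUE_GRAPHICS_BIT clear)
- Request a queue from that family at device creation and create a separate command pool for it
- Submit compute work to the compute queue and graphics work to the graphics queue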
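The first step, picking a compute-only family, can be sketched without the Vulkan headers. In the sketch below, `QueueFamily` is a stand-in for the two fields of `VkQueueFamilyProperties` that matter here, the bit constants mirror `VkQueueFlagBits`, and `findAsyncComputeFamily` is a hypothetical helper name. In real code the list would come from `vkGetPhysicalDeviceQueueFamilyProperties`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit values mirror VkQueueFlagBits in the Vulkan headers.
constexpr uint32_t kGraphicsBit = 0x1;  // VK_QUEUE_GRAPHICS_BIT
constexpr uint32_t kComputeBit  = 0x2;  // VK_QUEUE_COMPUTE_BIT
constexpr uint32_t kTransferBit = 0x4;  // VK_QUEUE_TRANSFER_BIT

// Stand-in for the VkQueueFamilyProperties fields used by the search.
struct QueueFamily {
    uint32_t queueFlags;
    uint32_t queueCount;
};

// Returns the index of the first family that supports compute but not
// graphics, or -1 if the device exposes no such family.
int findAsyncComputeFamily(const std::vector<QueueFamily>& families) {
    for (std::size_t i = 0; i < families.size(); ++i) {
        const QueueFamily& f = families[i];
        bool hasCompute = (f.queueFlags & kComputeBit) != 0;
        bool hasGraphics = (f.queueFlags & kGraphicsBit) != 0;
        if (f.queueCount > 0 && hasCompute && !hasGraphics) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

When the helper returns -1, a reasonable fallback is to record the compute work on the graphics queue and skip the cross-queue synchronization entirely.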
-
Synchronization
- Async compute requires careful synchronization
- Timeline semaphores (core since Vulkan 1.2): the preferred cross-queue primitive
- Signal from compute queue, wait on graphics queue
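- Pipeline barriers for dependencies within a single queue
- Queue ownership transfers (release on one queue, acquire on the other) for shared resources created with VK_SHARING_MODE_EXCLUSIVE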
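The signal-on-compute, wait-on-graphics pattern can be illustrated with a small CPU-side model. This is a toy simulation, not Vulkan code: `TimelineSemaphore`, `Submission`, and `runQueues` are hypothetical names, and a value of 0 is used here to mean "no wait" or "no signal". In a real application the wait and signal values are passed through `VkTimelineSemaphoreSubmitInfo` chained to the queue submission.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy timeline semaphore: a 64-bit counter that never decreases.
// (Real Vulkan requires each signal value to be strictly greater than
// the current one; this model just ignores lower values.)
struct TimelineSemaphore {
    uint64_t value = 0;
    void signal(uint64_t v) { if (v > value) value = v; }
    bool reached(uint64_t v) const { return value >= v; }
};

// One queue submission: waits until the semaphore reaches waitValue,
// runs, then signals signalValue.
struct Submission {
    std::string name;
    uint64_t waitValue;
    uint64_t signalValue;
};

// Runs the front submission of each queue whenever its wait is satisfied
// and returns the names in execution order. Stops when neither queue can
// make progress.
std::vector<std::string> runQueues(const std::vector<Submission>& graphics,
                                   const std::vector<Submission>& compute) {
    TimelineSemaphore sem;
    std::vector<std::string> order;
    std::size_t g = 0, c = 0;
    bool progressed = true;
    while (progressed) {
        progressed = false;
        if (g < graphics.size() && sem.reached(graphics[g].waitValue)) {
            order.push_back(graphics[g].name);
            sem.signal(graphics[g].signalValue);
            ++g;
            progressed = true;
        }
        if (c < compute.size() && sem.reached(compute[c].waitValue)) {
            order.push_back(compute[c].name);
            sem.signal(compute[c].signalValue);
            ++c;
            progressed = true;
        }
    }
    return order;
}
```

If the compute queue holds a "BLAS build" submission that signals 1 and the graphics queue holds a "ray trace" submission that waits on 1, the model runs the BLAS build first even though the graphics queue is polled first. That ordering is the guarantee the semaphore provides.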
-
Practical Considerations
- Not all GPUs benefit equally
- Integrated GPUs: often expose a single queue, so little or no benefit
- Discrete GPUs: more compute units and dedicated compute queues, so the benefit can be significant
- Overhead: synchronization adds complexity and some latency
- Profile first: measure actual GPU utilization before optimizing
- NVIDIA Nsight Graphics and AMD Radeon GPU Profiler (RGP): tools for visualizing queue utilization
-
In Godot Context
- Godot’s RenderingDevice exposes compute queues
- BLAS builds for skinned meshes are good candidates for async
- Denoising (OIDN compute) can run async with next frame’s RT