• Basic Traversal Algorithm

    • Recursive version (conceptual)
    • Iterative version (GPU-friendly — no recursion)
      • Use an explicit stack (array of node indices)

  • Ordered Traversal (Optimization)

    • Visit the closer child first
    • If closer child has a hit, the farther child may be skipped entirely
    • How to determine closer child
      • Compare AABB intersection distances: t_left vs t_right
      • Push farther child first (so closer child is on top of stack)
    • Significant speedup for primary rays (coherent rays)
    • Less benefit for secondary rays (incoherent)

  • Shadow Ray Optimization

    • Shadow rays only need to know IF there’s a hit, not WHICH hit
    • Use gl_RayFlagsTerminateOnFirstHitEXT in Vulkan
    • In software BVH: return immediately on first primitive hit
    • Avoids traversing the entire BVH for occlusion queries

  • Stack Overflow

    • BVH depth can exceed stack size for degenerate trees
    • Typical max depth: 30-40 for well-built BVH
    • Stack size of 64 is usually sufficient
    • Degenerate case: all primitives in a line → depth = N (use SAH to avoid)

  • GPU Traversal Considerations

    • Warp divergence
      • Different threads in a warp may take different BVH paths
      • Reduces GPU utilization (some threads idle while others traverse)
      • Hardware RT (Vulkan) handles this internally — major advantage
    • Memory access patterns
      • BVH nodes should be in a contiguous buffer (good cache behavior)
      • Leaf primitives should be sorted to match BVH order
    • Persistent threads
      • Keep threads alive across multiple rays (avoid kernel launch overhead)
      • Used in production GPU path tracers