When you create a pipeline in Vulkan, the graphics driver compiles the SPIR-V code from your application into machine code that the GPU can execute. Because applications can create tens of thousands of pipelines, compilation speed matters. When replaying a trace with GFXReconstruct, all of these pipelines are compiled at the beginning of the trace, which can significantly delay the start of replay.
The Challenge with GPU-AV
GPU-AV (GPU-Assisted Validation), our GPU validation tool, works by instrumenting the application's SPIR-V with additional validation and error-reporting instructions. Initially, this instrumentation hurt compile times badly: each pipeline took multiple seconds to compile, and a trace could take hours to start with GPU-AV enabled. Subsequent runs were faster because the driver cached the compiled shaders, but we still needed to improve the first-run startup time.
Our Optimization Strategies
Here are the four main strategies we employed to accelerate the process:
- Leveraging the “DontInline” Flag:
- SPIR-V lets a function be marked with a “DontInline” function-control flag. We found that most drivers ignored this flag, which is understandable, since inlining generally improves runtime performance. With NVIDIA’s 553.31 Windows driver, however, setting the flag gave us a large speedup, reducing compilation time for a small trace from 205 seconds to just 32 seconds (more than 6x). The speedup occurred because GPU-AV calls the same instrumentation function many times within a shader, so inlining every call was counterproductive.
- Segmenting Instrumentation:
- We split the large, monolithic instrumentation into smaller, targeted checks. This let us skip instrumentation that wasn’t needed, for example when an application doesn’t use features such as descriptor indexing. The approach significantly reduced compile times for applications that make limited use of advanced GPU features.
- Utilizing Vulkan Robustness Features:
- To prevent application crashes, GPU-AV used to wrap every potentially invalid SPIR-V instruction in an if/else construct, which added a lot of extra control flow and was expensive. By adopting Vulkan’s robustness features (such as robust buffer access), we could delegate crash prevention to the hardware, which simplified our instrumentation and made it faster.
- Optimizing for Graphics Pipeline Libraries:
- Previously, we instrumented every pipeline library created with the Graphics Pipeline Library extension (VK_EXT_graphics_pipeline_library) individually. Since many libraries may never be used during a trace, we moved instrumentation to the point where libraries are linked into a complete pipeline (typically at draw time). This spread the compilation load over time and avoided instrumenting libraries that are never linked.
- Bonus: Parallel Pipeline Compilation:
- GFXReconstruct recently introduced a command line option for parallel compilation of pipelines, which, while still in beta, has dramatically increased compilation speeds. (Contact us for more details.)
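To make the “DontInline” strategy concrete, here is a minimal C sketch of what setting the flag means at the binary level. It assumes a SPIR-V module already loaded as 32-bit words in host byte order; the helper `spv_set_dont_inline` is hypothetical and is not how GPU-AV itself is written. The opcode (`OpFunction` = 54) and the function-control bits (`Inline` = 0x1, `DontInline` = 0x2) come from the SPIR-V specification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Constants taken from the SPIR-V specification. */
#define SPV_MAGIC 0x07230203u
#define SPV_HEADER_WORDS 5u
#define SPV_OP_FUNCTION 54u                   /* OpFunction */
#define SPV_FUNCTION_CONTROL_INLINE 0x1u      /* Inline */
#define SPV_FUNCTION_CONTROL_DONT_INLINE 0x2u /* DontInline */

/* Hypothetical helper (not GPU-AV code): set DontInline on the OpFunction
 * whose result <id> equals func_id. Returns true if the function was found. */
static bool spv_set_dont_inline(uint32_t *words, size_t word_count, uint32_t func_id) {
    if (word_count < SPV_HEADER_WORDS || words[0] != SPV_MAGIC) {
        return false; /* not a SPIR-V module in host byte order */
    }
    size_t i = SPV_HEADER_WORDS;
    while (i < word_count) {
        uint32_t inst_words = words[i] >> 16; /* high 16 bits: instruction word count */
        uint32_t opcode = words[i] & 0xFFFFu; /* low 16 bits: opcode */
        if (inst_words == 0 || inst_words > word_count - i) {
            return false; /* malformed instruction stream */
        }
        /* OpFunction layout: opcode word, result type, result <id>,
         * function control mask, function type. */
        if (opcode == SPV_OP_FUNCTION && inst_words >= 5 && words[i + 2] == func_id) {
            /* Inline and DontInline are contradictory hints, so clear Inline. */
            words[i + 3] &= ~SPV_FUNCTION_CONTROL_INLINE;
            words[i + 3] |= SPV_FUNCTION_CONTROL_DONT_INLINE;
            return true;
        }
        i += inst_words;
    }
    return false;
}
```

In a real instrumentation pass, one would presumably apply this only to the helper functions the instrumentation adds, leaving the application’s own functions to the driver’s usual inlining heuristics.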
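The segmentation strategy boils down to deciding, per shader, which checks to emit at all. The C sketch below uses hypothetical check categories and a hypothetical `ShaderFeatures` summary (for example, filled in by scanning the module’s SPIR-V capabilities and instructions); GPU-AV’s real set of checks is larger and organized differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check categories; each set bit would correspond to one
 * separate, targeted instrumentation pass. */
enum {
    CHECK_DESCRIPTOR_INDEXING = 1u << 0,   /* runtime descriptor array indices */
    CHECK_BUFFER_DEVICE_ADDRESS = 1u << 1, /* buffer device address accesses */
};

/* Hypothetical summary of what a shader module uses. */
typedef struct {
    bool uses_runtime_descriptor_arrays;
    bool uses_buffer_device_address;
} ShaderFeatures;

/* Enable only the checks whose feature the shader actually uses. */
static uint32_t select_checks(const ShaderFeatures *features) {
    uint32_t checks = 0;
    if (features->uses_runtime_descriptor_arrays) {
        checks |= CHECK_DESCRIPTOR_INDEXING;
    }
    if (features->uses_buffer_device_address) {
        checks |= CHECK_BUFFER_DEVICE_ADDRESS;
    }
    return checks;
}
```

A shader that uses neither feature gets no checks from these two categories, and therefore none of the extra code the driver would otherwise have to compile.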
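The difference robustness makes to the instrumented code can be modeled on the CPU. In this C sketch, `load_guarded` mirrors the old if/else wrapping, while `load_checked` only records the error and leaves the access itself to `robust_load`, a hypothetical stand-in for hardware robustness that returns zero for out-of-bounds reads (a simplification of what `robustBufferAccess2` from VK_EXT_robustness2 provides). All function names are illustrative.

```c
#include <stdint.h>

/* Before: every potentially invalid access is wrapped in if/else so the
 * invalid access never executes. */
static uint32_t load_guarded(const uint32_t *buf, uint32_t len, uint32_t idx, uint32_t *errors) {
    if (idx < len) {
        return buf[idx];
    } else {
        (*errors)++; /* report the error */
        return 0;    /* skip the access to avoid a crash */
    }
}

/* Stand-in for hardware robustness: an out-of-bounds read returns zero
 * instead of faulting. On a GPU this is handled by the hardware, not by
 * instrumented shader code. */
static uint32_t robust_load(const uint32_t *buf, uint32_t len, uint32_t idx) {
    return idx < len ? buf[idx] : 0u;
}

/* After: the instrumentation only records the error; the original access
 * stays in place with no branch around it, and robustness keeps it safe. */
static uint32_t load_checked(const uint32_t *buf, uint32_t len, uint32_t idx, uint32_t *errors) {
    *errors += (idx >= len) ? 1u : 0u;
    return robust_load(buf, len, idx);
}
```

Both versions report the same errors and return the same values; the gain is that the instrumented shader no longer needs a branch around every checked instruction.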
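The Graphics Pipeline Library change amounts to instrumenting lazily and remembering the result. This C sketch models it with a hypothetical `PipelineLibrary` record: `link_pipeline` stands in for the link step (creating a complete pipeline from libraries, often at draw time) and instruments each library the first time it is linked. The names and the counter are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pipeline library created with
 * VK_EXT_graphics_pipeline_library. */
typedef struct {
    const char *name;
    bool instrumented;
} PipelineLibrary;

static int g_instrumentation_runs = 0; /* counts simulated instrumentation work */

static void instrument_library(PipelineLibrary *lib) {
    if (!lib->instrumented) {
        /* Real code would rewrite the library's SPIR-V here. */
        lib->instrumented = true;
        g_instrumentation_runs++;
    }
}

/* Called when libraries are linked into a complete pipeline. Only libraries
 * that are actually linked get instrumented, and each at most once. */
static void link_pipeline(PipelineLibrary **libs, size_t count) {
    for (size_t i = 0; i < count; i++) {
        instrument_library(libs[i]);
    }
}
```

Libraries that are created but never linked during a trace are never instrumented. In this sketch a library shared by several pipelines is also instrumented only once; whether GPU-AV caches in the same way is an assumption.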
Conclusion
These optimizations have significantly reduced the overhead of using GPU-AV for Vulkan pipeline validation, making it more practical for developers to use in their workflows without substantial delays. We continue to refine and enhance these processes to ensure both performance and reliability in GPU validation tasks.