Performance

RTL simulation has been a CPU workload for decades. skalp runs it on the GPU, starting with Metal on Apple Silicon, where the CPU and GPU share unified memory, so simulation state never has to be copied between host and device by DMA. This section covers how SharedCodegen emits both Metal shaders and compiled C++ from the same core, what a simulation step looks like on a GPU, and why fault simulation, which runs at 10M faults/sec, is embarrassingly parallel.
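A tiny single-stuck-at fault simulator makes the "embarrassingly parallel" claim concrete. This is a minimal Python sketch under stated assumptions, not skalp's implementation. The netlist format, the two-gate circuit, and the names `evaluate` and `is_detected` are all hypothetical, and every fault is checked against an exhaustive set of input vectors. What matters is the shape of the work. Each fault's simulation reads only shared, immutable data (netlist, vectors, fault-free outputs) and produces one independent boolean, so faults could be handed out one per GPU thread with no communication between them.

```python
from itertools import product

# Hypothetical toy netlist in topological order: (output_net, gate, input_nets).
# y = a OR (a AND b) is deliberately redundant, so some faults are undetectable.
NETLIST = [
    ("n1", "AND", ("a", "b")),
    ("y", "OR", ("a", "n1")),
]
INPUTS = ("a", "b")
OUTPUTS = ("y",)
GATES = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
}


def evaluate(vector, fault=None):
    """Evaluate the netlist for one input vector.

    fault is (net, stuck_value), or None for the fault-free (golden) circuit.
    """
    values = {}

    def assign(net, value):
        # A stuck-at fault overrides whatever value normally drives the net.
        values[net] = fault[1] if fault and fault[0] == net else value

    for net, bit in zip(INPUTS, vector):
        assign(net, bit)
    for out, gate, ins in NETLIST:
        assign(out, GATES[gate](*(values[n] for n in ins)))
    return tuple(values[o] for o in OUTPUTS)


# Shared, read-only data: every input vector and the golden response to each.
VECTORS = list(product((0, 1), repeat=len(INPUTS)))
GOLDEN = [evaluate(v) for v in VECTORS]


def is_detected(fault):
    """True if some input vector makes the faulty outputs differ from golden."""
    return any(evaluate(v, fault) != g for v, g in zip(VECTORS, GOLDEN))


# Fault list: every net stuck at 0 and stuck at 1.
ALL_NETS = list(INPUTS) + [out for out, _, _ in NETLIST]
FAULTS = [(net, sv) for net in ALL_NETS for sv in (0, 1)]

# No call depends on another, so this map could be split across any number of
# workers (one GPU thread per fault, say) with no synchronization besides
# gathering the results.
results = dict(zip(FAULTS, map(is_detected, FAULTS)))

undetected = [f for f, hit in results.items() if not hit]
print(undetected)  # [('b', 0), ('b', 1), ('n1', 0)]
```

The circuit computes `y = a OR (a AND b)`, which simplifies to `a`. Neither fault on `b`, nor `n1` stuck at 0, can ever change the output, so the sketch reports 5 of 8 faults detected. A real flow would classify the remaining three as redundant rather than missed.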