Why Your nn.Linear Is Already Fused (And When torch.compile Actually Helps)
The Hugging Face team digs into PyTorch profiling traces to reveal a surprising truth: eager-mode nn.Linear already fuses bias addition into its GEMM kernel. Here's what that means for performance.
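One way to check the claim yourself is to profile a single forward pass and look at which operators PyTorch records. The sketch below assumes a recent PyTorch 2.x install. It runs on CPU so it needs no GPU, and it uses a 2-D input with a bias. The function name `profiled_op_names` and the layer sizes are hypothetical choices for illustration. In recent PyTorch versions, `nn.Linear` should show up as `aten::linear` calling `aten::addmm`, with the bias passed as the first argument of that one matrix-multiply op. A separate bias step would instead appear as an extra `aten::add` event.

```python
import torch
from torch import nn
from torch.profiler import profile, ProfilerActivity


def profiled_op_names(in_features: int = 64, out_features: int = 32, batch: int = 8) -> set:
    """Run one nn.Linear forward pass under the profiler and return the recorded op names.

    Hypothetical helper for illustration; sizes are arbitrary.
    """
    layer = nn.Linear(in_features, out_features, bias=True)
    x = torch.randn(batch, in_features)  # 2-D input: (batch, in_features)

    # no_grad keeps the trace to the forward computation only.
    with torch.no_grad(), profile(activities=[ProfilerActivity.CPU]) as prof:
        layer(x)

    # key_averages() groups recorded events by operator name, e.g. "aten::linear".
    return {evt.key for evt in prof.key_averages()}


if __name__ == "__main__":
    ops = profiled_op_names()
    # Print only ATen operators so the bias handling is easy to spot.
    for name in sorted(op for op in ops if op.startswith("aten::")):
        print(name)
```

On a CUDA device, `aten::addmm` is also the call that passes the work to cuBLAS, which is why a GPU trace of eager-mode `nn.Linear` shows one GEMM kernel rather than a GEMM followed by a separate bias kernel.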