FlashPack — a new, high-throughput file format and loading mechanism for PyTorch that makes model checkpoint I/O blazingly fast, even on systems without access to GPU Direct Storage (GDS).
With FlashPack, loading any model can be 3–6× faster than with the current state-of-the-art methods like accelerate or the standard load_state_dict() and to() flow — all wrapped in a lightweight, pure-Python package that works anywhere.
I am on the model layer and focus on atomic tasks, so I don't get involved in product discussions. But this provocative article provoked the community quite a bit. The case in point is Claude Code, which happens to be my biggest productivity revolution since ChatGPT.
RAG predated TUI and agents. So to be fair it's quite an achievement to survive the AI evolution. But I feel it is overshadowed by context engineering in the agent era. How does everyone feel about this?