In a Training Loop 🔄

John Smith PRO

John6666

AI & ML interests

None yet

Recent Activity

reacted to AbstractPhil's post with 🚀 7 minutes ago
The Long: this is a proof of concept. The ensemble-compilation vmap prototype is functional and can be used to increase throughput for wider batches on FFN, MLP, LLM, or other models, not just ensembles. The system traces your model and creates stages of functional activation. Depending on the stage, it combines or removes combinations of stages so that your layers are assigned to batched functional calls. These put pressure on your GPU with fewer loops, with directly curated CUDA graph compliance where applicable. Identical weights yield identical results at the cost of hardware and VRAM.

TLDR: This is an ensemble optimization adapted to standard models. It will yield high-capacity speed improvements through increased throughput for inference and training alike using carefully traced, staged vmap structures.

https://github.com/AbstractEyes/pytorch-parallel-compiler

The early list of layers isn't fully represented yet, so this is a preliminary look at the potential of this structure when fully fleshed out.

MLP (N=100, batch=32, CUDA):
```
Eager: 2-3x speedup
Compiled: 35-40x speedup
```

ResBlock (N=20, batch=8, CUDA):
```
Eager: ~5x speedup
Compiled: ~10x speedup
```

This is early testing, and so far the yields indicate that WIDENING your model with adjacent shared batched vmaps for uniformly staged models will yield considerably higher output for inference at the cost of additional hardware utilization. This is akin to lining up all your systems and uniformly passing the necessary implications through a shared frozen representation gate.

Training for this is not tested nor supported yet; use at your own risk.
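For readers unfamiliar with the underlying idea, the sketch below shows plain ensemble batching with PyTorch's public `torch.func` API (`stack_module_state`, `functional_call`, `vmap`). It is not the API of the linked repository, and it does no tracing, staging, or CUDA graph handling. The model shape, the sizes, and the names `make_mlp`, `skeleton`, and `call_single` are assumptions chosen for illustration. It runs N structurally identical MLPs in one batched call and checks that the result matches a Python loop over the same models.

```python
import copy

import torch
from torch import nn
from torch.func import functional_call, stack_module_state, vmap

torch.manual_seed(0)

# Illustrative sizes only; these are not the benchmark settings from the post.
NUM_MODELS, BATCH, D_IN, D_HIDDEN, D_OUT = 8, 32, 16, 64, 4


def make_mlp() -> nn.Module:
    """Hypothetical small MLP; every ensemble member shares this structure."""
    return nn.Sequential(
        nn.Linear(D_IN, D_HIDDEN),
        nn.ReLU(),
        nn.Linear(D_HIDDEN, D_OUT),
    )


models = [make_mlp().eval() for _ in range(NUM_MODELS)]
x = torch.randn(BATCH, D_IN)

# Stack each parameter across models, so every tensor gains a leading
# NUM_MODELS axis (e.g. the first weight becomes [NUM_MODELS, D_HIDDEN, D_IN]).
params, buffers = stack_module_state(models)

# A copy on the "meta" device supplies only the module structure, no weights.
skeleton = copy.deepcopy(models[0]).to("meta")


def call_single(p, b, inp):
    """Run the skeleton with one model's parameters and buffers."""
    return functional_call(skeleton, (p, b), (inp,))


with torch.no_grad():
    # Map over the model axis of params/buffers; share the same input x.
    batched_out = vmap(call_single, in_dims=(0, 0, None))(params, buffers, x)
    # Reference: the ordinary Python loop that the batched call replaces.
    looped_out = torch.stack([m(x) for m in models])

print(batched_out.shape)
print(torch.allclose(batched_out, looped_out, atol=1e-5))
```

The sketch runs on CPU so it works without a GPU; any speedup depends on hardware, model width, and whether the batched function is also compiled, so it should not be read as reproducing the numbers above.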

Organizations

Glide, open/ acc, Solving Real World Problems, FashionStash Group meeting, No More Copyright, XORTRON - Criminal Computing