Inference speed

by tintwotin - opened 6 days ago

Using Find on a 2-minute 1920x832 video takes 459.15 s on an RTX 4090. Can anything be done to speed it up, such as downscaling the video beforehand? Or is a turbo version planned?

rethinkNow

Nemo Station org 6 days ago

459s for a 2-min 1920×832 clip is on the slow end but expected at that resolution. Two things you can try:

1. Pre-downscale the video. 1920×832 is roughly 8× over the model's per-frame pixel budget (we cap at ~200K pixels via smart_resize internally). The internal resize handles it, but only after the full-resolution frames have been decoded, so you still pay the decode cost. Downscaling to ~640×270 before sending to the model cuts the visual-encoder time substantially without hurting accuracy for grounding-style queries.
2. Quantize the weights. On a 4090, AWQ-quantized weights + bf16 KV-cache typically give 3-4× throughput vs vanilla bf16. We haven't shipped a quantized checkpoint ourselves yet, but you can do this in about half an hour with llm-compressor or AutoAWQ. If you do, we'd be curious what mIoU you get on TimeLens-Bench to compare against our bf16 numbers.
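
For the pre-downscale step, here is a minimal sketch of picking a target size under the ~200K-pixel budget and building an ffmpeg command for it. The helper names (`target_size`, `ffmpeg_downscale_cmd`) are hypothetical, and rounding down to even dimensions is an assumption made for codec compatibility; it is not necessarily what smart_resize does internally. It also assumes ffmpeg is installed and on your PATH.

```python
import math

PIXEL_BUDGET = 200_000  # approximate per-frame cap quoted above


def target_size(width: int, height: int, budget: int = PIXEL_BUDGET) -> tuple[int, int]:
    """Return (w, h) with w * h <= budget, keeping the aspect ratio.

    Dimensions are rounded down to even numbers (an assumption: many
    codecs require even frame sizes). Frames already under the budget
    are never upscaled.
    """
    scale = min(1.0, math.sqrt(budget / (width * height)))
    w = max(2, int(width * scale) // 2 * 2)
    h = max(2, int(height * scale) // 2 * 2)
    return w, h


def ffmpeg_downscale_cmd(src: str, dst: str, width: int, height: int) -> list[str]:
    """Build (but do not run) an ffmpeg command that rescales the video."""
    w, h = target_size(width, height)
    return ["ffmpeg", "-i", src, "-vf", f"scale={w}:{h}", "-c:a", "copy", dst]


if __name__ == "__main__":
    print(target_size(1920, 832))
    print(" ".join(ffmpeg_downscale_cmd("input.mp4", "small.mp4", 1920, 832)))
```

You can pass the resulting list to `subprocess.run(cmd, check=True)` and then send `small.mp4` to the model instead of the original. Anything at or below the budget, such as the ~640×270 suggested above, should avoid the internal resize doing extra work.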

No "turbo" variant planned — the model is already 2B params, so the realistic speedup path is inference-side, not architectural.

tintwotin

5 days ago

I tried downscaling and it didn't help. I would like to add it to my Pallaidium AI add-on for Blender, but currently it is simply too slow for me. Will check in later to see if something has improved. Thank you.
