Does this model take multiple sampled frames from a video, or a raw video file?

#7
by CT-Ati - opened

If it works on multiple sampled frames, could we have a preprocessing script or the optimal sampling rate for applying this to a streaming input source?

BTW, I am just assuming this based on the following info in the README:

Video preprocessing
The custom modeling code sets these env vars internally (matches the training-time setup). If you want to override them, set them in your shell before importing transformers:

| Env var | Default | What it does |
|---|---|---|
| FORCE_QWENVL_VIDEO_READER | torchcodec | Video decoder backend |
| VIDEO_MAX_PIXELS | 200704 | Max pixels per frame (~448×448) |
| FPS | 2.0 | Frame sampling rate |
| FPS_MAX_FRAMES | 240 | Cap on total frames (covers ~2 min videos) |
| FPS_MIN_FRAMES | 4 | Floor for very short videos |
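For anyone who prefers to override these from Python rather than the shell, here is a minimal sketch. It assumes the custom modeling code reads the variables from `os.environ` and respects values that are already set, which the README implies but which I have not verified. The helper name `apply_video_env` is hypothetical and not part of the model repo.

```python
import os

# Defaults copied from the README table above.
README_DEFAULTS = {
    "FORCE_QWENVL_VIDEO_READER": "torchcodec",
    "VIDEO_MAX_PIXELS": "200704",
    "FPS": "2.0",
    "FPS_MAX_FRAMES": "240",
    "FPS_MIN_FRAMES": "4",
}


def apply_video_env(overrides=None):
    """Set the video preprocessing env vars (hypothetical helper).

    README defaults are applied only when a variable is not already set,
    so values exported in the shell are kept. Explicit overrides always win.
    Returns the resulting values as a dict.
    """
    overrides = overrides or {}
    unknown = set(overrides) - set(README_DEFAULTS)
    if unknown:
        raise KeyError(f"not a documented env var: {sorted(unknown)}")
    for name, value in README_DEFAULTS.items():
        os.environ.setdefault(name, value)
    for name, value in overrides.items():
        os.environ[name] = str(value)
    return {name: os.environ[name] for name in README_DEFAULTS}


# Example: sample at 1 fps instead of the default 2 fps.
settings = apply_video_env({"FPS": 1.0})

# Import transformers only after this point, as the README instructs.
```

Environment variables are always strings, which is why the override is passed through `str()` before it is stored.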
Nemo Station org

The env vars specified by default are the optimal values; the model was predominantly trained at 2 fps.
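For a streaming source, one way to match those defaults is to buffer a chunk of the stream and keep roughly 2 frames per second of it. The sketch below is only an illustration: the function name `sample_indices` is hypothetical, and the way it combines FPS, FPS_MIN_FRAMES and FPS_MAX_FRAMES (clamp the target count, then space the picks evenly) is an assumption, not necessarily what the model's preprocessing does internally.

```python
def sample_indices(total_frames, src_fps, target_fps=2.0,
                   min_frames=4, max_frames=240):
    """Pick evenly spaced frame indices from a buffered chunk.

    total_frames: number of frames in the chunk
    src_fps:      frame rate of the incoming stream
    The target count is duration * target_fps, clamped to
    [min_frames, max_frames] and never more than total_frames.
    """
    if total_frames <= 0 or src_fps <= 0 or target_fps <= 0:
        raise ValueError("total_frames, src_fps and target_fps must be positive")
    duration_s = total_frames / src_fps
    wanted = int(duration_s * target_fps)
    count = max(min_frames, min(wanted, max_frames))
    count = min(count, total_frames)  # cannot pick more frames than exist
    if count == 1:
        return [0]
    # Spread the picks from the first to the last frame of the chunk.
    step = (total_frames - 1) / (count - 1)
    return [int(i * step + 0.5) for i in range(count)]


# A 10 s chunk from a 30 fps camera yields 20 frames at 2 fps.
indices = sample_indices(total_frames=300, src_fps=30)
print(len(indices), indices[0], indices[-1])
```

With the defaults, a chunk longer than about 2 minutes hits the 240-frame cap, so the effective rate drops below 2 fps; keeping each chunk under 2 minutes avoids that.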
