Does this model take multiple sampled frames from a video, or a raw video file?

by CT-Ati - opened 7 days ago

If it works on multiple sampled frames, could we have a preprocessing script or the optimal sampling rate, in case we want to apply this to a streaming input source?

BTW, I'm just assuming this based on the following info in the README:

Video preprocessing
The custom modeling code sets these env vars internally (matches the training-time setup). If you want to override them, set them in your shell before importing transformers:

Env var	Default	What it does
FORCE_QWENVL_VIDEO_READER	torchcodec	Video decoder backend
VIDEO_MAX_PIXELS	200704	Max pixels per frame (~448×448)
FPS	2.0	Frame sampling rate
FPS_MAX_FRAMES	240	Cap on total frames (covers ~2 min videos)
FPS_MIN_FRAMES	4	Floor for very short videos
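
For anyone wanting to try overrides, here is a minimal sketch of setting these variables from Python (before `import transformers`, as the README says) and estimating how many frames a clip of a given length would yield. The helper names `set_video_env`, `effective_setting`, and `estimated_frame_count` are hypothetical and not part of the model repo, and the clamp is only a simplified reading of the table above; the real sampling code may round differently.

```python
import os

# Defaults copied from the README table.
README_DEFAULTS = {
    "FORCE_QWENVL_VIDEO_READER": "torchcodec",
    "VIDEO_MAX_PIXELS": "200704",  # 448 * 448
    "FPS": "2.0",
    "FPS_MAX_FRAMES": "240",
    "FPS_MIN_FRAMES": "4",
}


def set_video_env(**overrides):
    """Write overrides into os.environ. Call this before importing transformers."""
    for name, value in overrides.items():
        if name not in README_DEFAULTS:
            raise KeyError(f"unknown video env var: {name}")
        os.environ[name] = str(value)


def effective_setting(name):
    """Return the value from the environment, else the README default."""
    return os.environ.get(name, README_DEFAULTS[name])


def estimated_frame_count(duration_s):
    """Assumed rule: duration * FPS, clamped to [FPS_MIN_FRAMES, FPS_MAX_FRAMES]."""
    fps = float(effective_setting("FPS"))
    lo = int(effective_setting("FPS_MIN_FRAMES"))
    hi = int(effective_setting("FPS_MAX_FRAMES"))
    return max(lo, min(hi, int(duration_s * fps)))
```

With the defaults, a 60 s clip gives 120 frames, and anything longer than about 2 minutes hits the 240-frame cap.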

rethinkNow

Nemo Station org 7 days ago

The env vars specified by default are the optimal values; the model was predominantly trained at 2 fps.
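
For the streaming case, one possible approach is to downsample the incoming frames to about 2 fps and keep a sliding window no larger than the frame cap. This is only a rough sketch under those assumptions: `StreamFrameSampler` is a hypothetical helper, not something shipped with the model, and how the buffered frames are then passed to the model is not covered here.

```python
from collections import deque


class StreamFrameSampler:
    """Keep roughly `target_fps` frames per second in a bounded sliding window."""

    def __init__(self, target_fps=2.0, max_frames=240):
        self.interval = 1.0 / target_fps
        # deque with maxlen silently drops the oldest frame once full
        self.window = deque(maxlen=max_frames)
        self.next_time = None  # timestamp at which the next frame is due

    def push(self, timestamp_s, frame):
        """Offer one decoded frame; return True if it was kept."""
        if self.next_time is None:
            self.next_time = timestamp_s  # the first frame is always kept
        if timestamp_s + 1e-9 < self.next_time:
            return False  # too early, skip this frame
        self.window.append((timestamp_s, frame))
        # Advance the schedule past the current timestamp (handles gaps in the stream).
        while self.next_time <= timestamp_s + 1e-9:
            self.next_time += self.interval
        return True


# Usage: a 30 fps source for 10 seconds, with placeholder frame objects.
sampler = StreamFrameSampler(target_fps=2.0, max_frames=240)
kept = sum(sampler.push(i / 30, f"frame-{i}") for i in range(300))
```

The window size of 240 mirrors `FPS_MAX_FRAMES`, so the buffer never holds more than about 2 minutes of video at 2 fps.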
