Cattle Vision Framework — Model Weights

Pretrained model weights for the Cattle Vision Framework, an MS thesis project on multi-behavior recognition in dairy cattle from surveillance video.

Pipeline: RF-DETR detection → SAM2 segmentation → OC-SORT tracking → VideoMAE behavior classification

Datasets: CBVD-5 (indoor barn, 12 GB) and CVB (outdoor pasture, 15 GB)

Available Weights

Filename	Size	Description	Val Metric
`rf-detr-medium.pth`	387 MB	RF-DETR-Medium backbone (COCO pretrained, not fine-tuned)	—
`rf-detr-seg-medium.pt`	137 MB	RF-DETR-Seg-Medium fine-tuned on SAM2 pseudo-labels (Phase 3b)	—
`rfdetr_combined_v1_best.pth`	128 MB	RF-DETR cattle detector, trained on CBVD-5 + CVB combined	70.4% mAP@50
`videomae_combined_v1.pt`	330 MB	VideoMAE Config 5: trained on CBVD-5+CVB, eval on both	macro-F1=0.7537
`videomae_cvb_v1.pt`	330 MB	VideoMAE Config 2: trained on CVB, eval on CVB	macro-F1=0.7607
`videomae_cbvd5_v1.pt`	330 MB	VideoMAE Config 1: trained on CBVD-5, eval on CBVD-5	macro-F1=0.3149
`videomae_cbvd5_to_cvb_v1.pt`	330 MB	VideoMAE Config 3: trained on CBVD-5, eval on CVB (cross-domain)	macro-F1=0.1690
`videomae_cvb_to_cbvd5_v1.pt`	330 MB	VideoMAE Config 4: trained on CVB, eval on CBVD-5 (cross-domain)	macro-F1=0.1789

7-Class Behavior Taxonomy

ID	Behavior	Datasets
0	Standing	CBVD-5, CVB
1	Lying	CBVD-5, CVB
2	Foraging/Grazing	CBVD-5, CVB
3	Drinking	CBVD-5, CVB
4	Ruminating	CBVD-5, CVB
5	Grooming	CVB only
6	Other	CVB only

Cross-dataset evaluation (Configs 3 and 4) uses IDs 0–4 only, because Grooming and Other are labeled only in CVB.
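
One way to apply the shared-class restriction at evaluation time is sketched here. The model card does not say how the restriction was implemented, so this is an assumption: samples labeled Grooming or Other are dropped, and the prediction is the argmax over the five shared logits. The names `SHARED_CLASS_IDS`, `predict_shared`, and `filter_shared` are hypothetical.

```python
from typing import List, Sequence, Tuple

# IDs present in both CBVD-5 and CVB, taken from the taxonomy table.
SHARED_CLASS_IDS = (0, 1, 2, 3, 4)


def predict_shared(logits: Sequence[float]) -> int:
    """Return the shared class ID with the highest logit, ignoring IDs 5 and 6."""
    return max(SHARED_CLASS_IDS, key=lambda class_id: logits[class_id])


def filter_shared(
    samples: List[Tuple[Sequence[float], int]]
) -> List[Tuple[int, int]]:
    """Drop samples whose label is CVB-only and return (prediction, label) pairs."""
    return [
        (predict_shared(logits), label)
        for logits, label in samples
        if label in SHARED_CLASS_IDS
    ]
```

With a 7-way logit vector whose largest entries sit at IDs 5 and 6, `predict_shared` still returns one of the shared IDs, so a CVB-trained model cannot predict a class that CBVD-5 never labels.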

Behavior Classification Results (VideoMAE)

Config	Train	Eval	macro-F1	F1-Standing	F1-Lying	F1-Foraging	F1-Drinking	F1-Ruminating
Config 1	CBVD-5	CBVD-5	0.3149	0.906	0.303	0.896	0.000	0.100
Config 2	CVB	CVB	0.7607	0.860	0.845	0.979	0.881	0.801
Config 3	CBVD-5	CVB	0.1690	0.215	0.281	0.006	0.188	0.493
Config 4	CVB	CBVD-5	0.1789	0.675	0.145	0.432	0.000	0.000
Config 5	CBVD-5+CVB	Both	0.7537	0.870	0.823	0.980	0.876	0.772
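
The per-class columns cover IDs 0–4 only, so the macro-F1 of Configs 2 and 5 cannot be recomputed from this table. For Configs 1, 3, and 4 a quick arithmetic check is possible. The sketch below suggests that the reported macro-F1 matches the five listed scores divided by 7 rather than by 5, as if Grooming and Other each contributed an F1 of zero. This is an inference from the numbers in the table, not something the model card states.

```python
# Per-class F1 for IDs 0-4, copied from the results table.
PER_CLASS_F1 = {
    "Config 1": [0.906, 0.303, 0.896, 0.000, 0.100],
    "Config 3": [0.215, 0.281, 0.006, 0.188, 0.493],
    "Config 4": [0.675, 0.145, 0.432, 0.000, 0.000],
}

# macro-F1 as reported in the same table.
REPORTED_MACRO_F1 = {"Config 1": 0.3149, "Config 3": 0.1690, "Config 4": 0.1789}


def mean_over(scores, num_classes):
    """Average the listed scores over a fixed number of class slots."""
    return sum(scores) / num_classes


for name, scores in PER_CLASS_F1.items():
    over_five = mean_over(scores, 5)
    over_seven = mean_over(scores, 7)
    print(
        f"{name}: reported={REPORTED_MACRO_F1[name]:.4f} "
        f"mean/5={over_five:.4f} mean/7={over_seven:.4f}"
    )
```

If this reading is right, the macro-F1 of Configs 1, 3, and 4 understates performance on the five classes that are actually present in the evaluation data.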

Training Configs

Behavior Classification (`videomae_combined_v1.pt`)

experiment_name: videomae_combined_v1
model_name: MCG-NJU/videomae-base-finetuned-kinetics
num_classes: 7

train:
  dataset_filter: null   # combined CBVD-5 + CVB
  split_filter: train

batch_size: 8
grad_accum_steps: 4      # effective batch = 32
num_epochs: 30
lr: 5.0e-5
lr_head: 1.0e-3
weight_decay: 0.05
warmup_epochs: 3
early_stopping_patience: 8
use_class_weights: true
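
The config sets a separate `lr` and `lr_head`, which suggests two optimizer parameter groups. The sketch below shows one way to build them, assuming the classification head's parameters are the ones whose names start with `classifier.` (the head attribute of `VideoMAEForVideoClassification`) and that weight decay applies to both groups. The training script is not part of this card, so `build_param_groups` and `effective_batch_size` are hypothetical helpers.

```python
from typing import Any, Dict, Iterable, List, Tuple

BACKBONE_LR = 5.0e-5   # `lr` in the config
HEAD_LR = 1.0e-3       # `lr_head` in the config
WEIGHT_DECAY = 0.05    # `weight_decay` in the config


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    head_prefix: str = "classifier.",
) -> List[Dict[str, Any]]:
    """Split parameters into backbone and head groups with separate learning rates."""
    backbone, head = [], []
    for name, param in named_params:
        (head if name.startswith(head_prefix) else backbone).append(param)
    return [
        {"params": backbone, "lr": BACKBONE_LR, "weight_decay": WEIGHT_DECAY},
        {"params": head, "lr": HEAD_LR, "weight_decay": WEIGHT_DECAY},
    ]


def effective_batch_size(batch_size: int, grad_accum_steps: int) -> int:
    """Number of samples contributing to each optimizer step."""
    return batch_size * grad_accum_steps
```

The returned list can be passed directly to a PyTorch optimizer, for example `torch.optim.AdamW(build_param_groups(model.named_parameters()))`. The choice of AdamW here is also an assumption.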

Detection (`rfdetr_combined_v1_best.pth`)

experiment_name: rfdetr_combined_v1
model:
  type: RFDETRMedium
  pretrained: true
dataset:
  name: combined
  num_classes: 1          # single class: cattle
training:
  epochs: 100
  batch_size: 2
  grad_accum_steps: 8     # effective batch = 16
  resolution: 576         # must be divisible by 64
  lr: 1.0e-4
  lr_encoder: 1.5e-4
  gradient_checkpointing: true
  use_ema: true

Usage

Download a weight file

# Install CLI
pip install huggingface-hub

# Download the combined-dataset behavior model (Config 5)
huggingface-cli download sakifkhan98/cattle-vision-framework videomae_combined_v1.pt \
  --local-dir weights/

# Download backbone
huggingface-cli download sakifkhan98/cattle-vision-framework rf-detr-medium.pth \
  --local-dir weights/
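
The same download can be done from Python with `hf_hub_download` from the `huggingface_hub` package. The sketch below wraps it in a hypothetical `download_weight` helper that rejects filenames not listed in the Available Weights table. The import sits inside the function so the filename check works even where the package is not installed.

```python
REPO_ID = "sakifkhan98/cattle-vision-framework"

# Filenames copied from the Available Weights table.
WEIGHT_FILES = (
    "rf-detr-medium.pth",
    "rf-detr-seg-medium.pt",
    "rfdetr_combined_v1_best.pth",
    "videomae_combined_v1.pt",
    "videomae_cvb_v1.pt",
    "videomae_cbvd5_v1.pt",
    "videomae_cbvd5_to_cvb_v1.pt",
    "videomae_cvb_to_cbvd5_v1.pt",
)


def download_weight(filename: str, local_dir: str = "weights") -> str:
    """Download one weight file and return its local path."""
    if filename not in WEIGHT_FILES:
        raise ValueError(f"unknown weight file: {filename!r}")
    # Imported lazily so validation does not require the package.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir=local_dir)


# Example (needs network access):
# path = download_weight("videomae_combined_v1.pt")
```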

Load behavior classifier

import torch
from transformers import VideoMAEForVideoClassification

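# Load the fine-tuned checkpoint on CPU; move the model to a GPU afterwards if needed
ckpt = torch.load("weights/videomae_combined_v1.pt", map_location="cpu")
# Start from the Kinetics-400 model; ignore_mismatched_sizes replaces its
# 400-way head with a freshly initialized 7-way head
model = VideoMAEForVideoClassification.from_pretrained(
    "MCG-NJU/videomae-base-finetuned-kinetics",
    num_labels=7,
    ignore_mismatched_sizes=True,
)
# Overwrite backbone and head with the fine-tuned weights
model.load_state_dict(ckpt["model_state_dict"])
model.eval()  # disable dropout for inference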

LABEL_NAMES = {
    0: "Standing", 1: "Lying", 2: "Foraging",
    3: "Drinking", 4: "Ruminating", 5: "Grooming", 6: "Other"
}
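
The base VideoMAE model takes 16 frames per clip, so a longer track has to be subsampled before classification. The sketch below picks evenly spaced frame indices. Uniform spacing is an assumption, since the card does not describe the sampling strategy used in training, and `sample_frame_indices` is a hypothetical helper.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_frames: int = 16) -> List[int]:
    """Return `num_frames` evenly spaced indices covering the whole clip.

    Clips shorter than `num_frames` yield repeated indices, which pads the
    clip by duplicating frames.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    if num_frames == 1:
        return [0]
    step = (total_frames - 1) / (num_frames - 1)
    return [round(i * step) for i in range(num_frames)]
```

The selected frames can then be passed as a list to `VideoMAEImageProcessor` with `return_tensors="pt"`, and the predicted ID is the argmax of `model(**inputs).logits`, which `LABEL_NAMES` maps back to a behavior name.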

Load RF-DETR detector

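from rfdetr import RFDETRMedium

# In the rfdetr package, RFDETRMedium wraps the underlying torch module rather
# than subclassing nn.Module, so the fine-tuned checkpoint is passed at
# construction time via `pretrain_weights` instead of load_state_dict().
model = RFDETRMedium(pretrain_weights="weights/rfdetr_combined_v1_best.pth")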

Citation

@mastersthesis{khan2026cattle,
  author  = {Sakif Khan},
  title   = {Multi-Behavior Recognition in Dairy Cattle from Surveillance Video},
  school  = {Texas State University},
  year    = {2026},
}

License

MIT License. See LICENSE.
