Cattle Vision Framework: Model Weights

Pretrained model weights for the Cattle Vision Framework, an MS thesis project on multi-behavior recognition in dairy cattle from surveillance video.

Pipeline: RF-DETR detection → SAM2 segmentation → OC-SORT tracking → VideoMAE behavior classification
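The hand-off between tracking and behavior classification can be sketched as follows. This is only an illustration of the data flow, assuming the tracker emits one record per animal per frame; `TrackedBox` and `build_tracklets` are hypothetical names, and how the thesis actually turns tracks into VideoMAE clips is not described here.

```python
from collections import defaultdict
from typing import NamedTuple


class TrackedBox(NamedTuple):
    """One tracker output record (hypothetical structure)."""
    frame_idx: int
    track_id: int
    box_xyxy: tuple  # (x1, y1, x2, y2) in pixels


def build_tracklets(tracked_boxes):
    """Group per-frame tracker output into one time-ordered list per track ID."""
    tracklets = defaultdict(list)
    for record in sorted(tracked_boxes, key=lambda r: r.frame_idx):
        tracklets[record.track_id].append(record)
    return dict(tracklets)


# Two animals seen over two frames, given out of order.
records = [
    TrackedBox(1, 7, (10, 10, 50, 60)),
    TrackedBox(0, 7, (8, 10, 48, 60)),
    TrackedBox(0, 3, (100, 20, 160, 90)),
]
tracklets = build_tracklets(records)
```

Each tracklet would then be cropped and sampled into a fixed-length clip before classification.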

Datasets: CBVD-5 (indoor barn, 12 GB) and CVB (outdoor pasture, 15 GB)


Available Weights

| Filename | Size | Description | Val metric |
|---|---|---|---|
| rf-detr-medium.pth | 387 MB | RF-DETR-Medium backbone (COCO pretrained, not fine-tuned) | - |
| rf-detr-seg-medium.pt | 137 MB | RF-DETR-Seg-Medium fine-tuned on SAM2 pseudo-labels (Phase 3b) | - |
| rfdetr_combined_v1_best.pth | 128 MB | RF-DETR cattle detector, trained on CBVD-5 + CVB combined | 70.4% mAP@50 |
| videomae_combined_v1.pt | 330 MB | VideoMAE Config 5: trained on CBVD-5 + CVB, eval on both | macro-F1 = 0.7537 |
| videomae_cvb_v1.pt | 330 MB | VideoMAE Config 2: trained on CVB, eval on CVB | macro-F1 = 0.7607 |
| videomae_cbvd5_v1.pt | 330 MB | VideoMAE Config 1: trained on CBVD-5, eval on CBVD-5 | macro-F1 = 0.3149 |
| videomae_cbvd5_to_cvb_v1.pt | 330 MB | VideoMAE Config 3: trained on CBVD-5, eval on CVB (cross-domain) | macro-F1 = 0.1690 |
| videomae_cvb_to_cbvd5_v1.pt | 330 MB | VideoMAE Config 4: trained on CVB, eval on CBVD-5 (cross-domain) | macro-F1 = 0.1789 |
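For readers unfamiliar with the detector metric: mAP@50 counts a predicted box as a match when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. The snippet below is a minimal sketch of that matching criterion only, using the standard IoU definition; it is not the evaluation code used for the thesis, and `box_iou` and `is_match_at_50` are illustrative names.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred_box, gt_box):
    """True when the prediction would count as a hit at the 0.5 IoU threshold."""
    return box_iou(pred_box, gt_box) >= 0.5
```

The full metric additionally ranks predictions by confidence and averages precision over recall levels, which is omitted here.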

7-Class Behavior Taxonomy

| ID | Behavior | Datasets |
|---|---|---|
| 0 | Standing | CBVD-5, CVB |
| 1 | Lying | CBVD-5, CVB |
| 2 | Foraging/Grazing | CBVD-5, CVB |
| 3 | Drinking | CBVD-5, CVB |
| 4 | Ruminating | CBVD-5, CVB |
| 5 | Grooming | CVB only |
| 6 | Other | CVB only |

Cross-dataset evaluation uses IDs 0–4 only.
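One way to restrict an evaluation to the shared classes is to drop samples whose ground-truth label falls outside IDs 0-4. The sketch below assumes that approach; how the thesis treats predictions of IDs 5 or 6 on the remaining samples is not stated, so this version leaves them in place and they simply count as errors. `filter_to_shared` is a hypothetical helper.

```python
SHARED_IDS = frozenset(range(5))  # IDs 0-4 occur in both CBVD-5 and CVB


def filter_to_shared(labels, predictions):
    """Keep only samples whose ground-truth label is one of the shared IDs."""
    kept = [(y, p) for y, p in zip(labels, predictions) if y in SHARED_IDS]
    return [y for y, _ in kept], [p for _, p in kept]


labels = [0, 5, 4, 6]
predictions = [0, 5, 6, 6]
shared_labels, shared_predictions = filter_to_shared(labels, predictions)
```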


Behavior Classification Results (VideoMAE)

| Config | Train | Eval | macro-F1 | F1-Standing | F1-Lying | F1-Foraging | F1-Drinking | F1-Ruminating |
|---|---|---|---|---|---|---|---|---|
| Config 1 | CBVD-5 | CBVD-5 | 0.3149 | 0.906 | 0.303 | 0.896 | 0.000 | 0.100 |
| Config 2 | CVB | CVB | 0.7607 | 0.860 | 0.845 | 0.979 | 0.881 | 0.801 |
| Config 3 | CBVD-5 | CVB | 0.1690 | 0.215 | 0.281 | 0.006 | 0.188 | 0.493 |
| Config 4 | CVB | CBVD-5 | 0.1789 | 0.675 | 0.145 | 0.432 | 0.000 | 0.000 |
| Config 5 | CBVD-5+CVB | Both | 0.7537 | 0.870 | 0.823 | 0.980 | 0.876 | 0.772 |
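The macro-F1 column is not the mean of the five per-class columns shown. For Configs 1, 3 and 4 the reported values are numerically consistent with dividing the per-class sum by 7, that is, averaging over all seven classes with Grooming and Other contributing zero. This is an inference from the numbers in the table, not something the table states; Configs 2 and 5 cannot be checked the same way because their Grooming and Other scores are not listed.

```python
# Per-class F1 for IDs 0-4, copied from the results table above.
PER_CLASS_F1 = {
    "Config 1": [0.906, 0.303, 0.896, 0.000, 0.100],
    "Config 3": [0.215, 0.281, 0.006, 0.188, 0.493],
    "Config 4": [0.675, 0.145, 0.432, 0.000, 0.000],
}
REPORTED_MACRO_F1 = {"Config 1": 0.3149, "Config 3": 0.1690, "Config 4": 0.1789}


def macro_f1(per_class_f1, num_classes):
    """Unweighted mean over num_classes; classes not listed count as 0."""
    return sum(per_class_f1) / num_classes


for name, scores in PER_CLASS_F1.items():
    print(
        f"{name}: mean over 5 = {macro_f1(scores, 5):.4f}, "
        f"mean over 7 = {macro_f1(scores, 7):.4f}, "
        f"reported = {REPORTED_MACRO_F1[name]:.4f}"
    )
```

Under this reading, the low macro-F1 of Config 1 partly reflects two classes that never occur in CBVD-5 rather than poor performance on the classes that do.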

Training Configs

Behavior Classification (videomae_combined_v1.pt)

experiment_name: videomae_combined_v1
model_name: MCG-NJU/videomae-base-finetuned-kinetics
num_classes: 7

train:
  dataset_filter: null   # combined CBVD-5 + CVB
  split_filter: train

batch_size: 8
grad_accum_steps: 4      # effective batch = 32
num_epochs: 30
lr: 5.0e-5
lr_head: 1.0e-3
weight_decay: 0.05
warmup_epochs: 3
early_stopping_patience: 8
use_class_weights: true
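The separate `lr` and `lr_head` values suggest two optimizer parameter groups: a low rate for the pretrained backbone and a higher one for the new classification head. The sketch below shows one way to build such groups. It assumes the head is everything under the `classifier.` and `fc_norm.` prefixes of `VideoMAEForVideoClassification`; the actual split and optimizer used in training are not given in the config, and `build_param_groups` is a hypothetical helper.

```python
LR_BACKBONE = 5.0e-5  # `lr` in the config above
LR_HEAD = 1.0e-3      # `lr_head` in the config above
WEIGHT_DECAY = 0.05

# Assumption: parameters under these name prefixes are treated as the head.
HEAD_PREFIXES = ("classifier.", "fc_norm.")


def build_param_groups(named_parameters):
    """Split (name, parameter) pairs into backbone and head groups."""
    backbone, head = [], []
    for name, param in named_parameters:
        if name.startswith(HEAD_PREFIXES):
            head.append(param)
        else:
            backbone.append(param)
    return [
        {"params": backbone, "lr": LR_BACKBONE, "weight_decay": WEIGHT_DECAY},
        {"params": head, "lr": LR_HEAD, "weight_decay": WEIGHT_DECAY},
    ]
```

With PyTorch, the result could be passed directly to an optimizer such as `torch.optim.AdamW(build_param_groups(model.named_parameters()))`; AdamW is an assumption here, chosen because it is a common default for transformer fine-tuning.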

Detection (rfdetr_combined_v1_best.pth)

experiment_name: rfdetr_combined_v1
model:
  type: RFDETRMedium
  pretrained: true
dataset:
  name: combined
  num_classes: 1          # single class: cattle
training:
  epochs: 100
  batch_size: 2
  grad_accum_steps: 8     # effective batch = 16
  resolution: 576         # must be divisible by 64
  lr: 1.0e-4
  lr_encoder: 1.5e-4
  gradient_checkpointing: true
  use_ema: true

Usage

Download a weight file

# Install CLI
pip install huggingface-hub

# Download best behavior model
huggingface-cli download sakifkhan98/cattle-vision-framework videomae_combined_v1.pt \
  --local-dir weights/

# Download backbone
huggingface-cli download sakifkhan98/cattle-vision-framework rf-detr-medium.pth \
  --local-dir weights/
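The same download can be done from Python with `hf_hub_download` from the `huggingface_hub` package installed above. The wrapper below is a minimal sketch; `download_weight` is a hypothetical helper name, and the import is placed inside the function only so the module can be loaded without network access or the package present.

```python
REPO_ID = "sakifkhan98/cattle-vision-framework"


def download_weight(filename, local_dir="weights"):
    """Fetch one file from the model repo and return its local path."""
    # Imported lazily so that defining this helper has no side effects.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir=local_dir)
```

Calling `download_weight("videomae_combined_v1.pt")` should place the file under `weights/`, matching the CLI commands above.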

Load behavior classifier

import torch
from transformers import VideoMAEForVideoClassification

# Load checkpoint
ckpt = torch.load("weights/videomae_combined_v1.pt", map_location="cpu")
model = VideoMAEForVideoClassification.from_pretrained(
    "MCG-NJU/videomae-base-finetuned-kinetics",
    num_labels=7,
    ignore_mismatched_sizes=True,
)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

LABEL_NAMES = {
    0: "Standing", 1: "Lying", 2: "Foraging",
    3: "Drinking", 4: "Ruminating", 5: "Grooming", 6: "Other"
}
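To run the classifier on a clip, two small pieces are needed around the model call: choosing which frames to feed it and turning the output logits into a label. The base VideoMAE checkpoint expects 16 frames per clip. The helpers below are a sketch in plain Python, assuming evenly spaced frame sampling; the sampling strategy used during training is not stated, and both function names are illustrative.

```python
import math

LABEL_NAMES = {
    0: "Standing", 1: "Lying", 2: "Foraging",
    3: "Drinking", 4: "Ruminating", 5: "Grooming", 6: "Other",
}


def sample_frame_indices(total_frames, num_samples=16):
    """Pick num_samples evenly spaced frame indices from a clip."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    # Take the centre of each of num_samples equal segments; short clips repeat frames.
    return [int((i + 0.5) * total_frames / num_samples) for i in range(num_samples)]


def decode_logits(logits):
    """Return (label name, softmax probability) for the highest-scoring class."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]  # subtract max for stability
    total = sum(exps)
    best = max(range(len(exps)), key=exps.__getitem__)
    return LABEL_NAMES[best], exps[best] / total
```

With the model loaded as above, `decode_logits(model(pixel_values=clip).logits[0].tolist())` would give the predicted behavior for a single preprocessed clip.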

Load RF-DETR detector

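from rfdetr import RFDETRMedium

# In current rfdetr releases, RFDETRMedium is a wrapper object rather than a
# torch.nn.Module, so it has no load_state_dict() or eval(). Fine-tuned
# checkpoints are loaded through the pretrain_weights constructor argument.
model = RFDETRMedium(pretrain_weights="weights/rfdetr_combined_v1_best.pth")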

Citation

@mastersthesis{khan2026cattle,
  author  = {Sakif Khan},
  title   = {Multi-Behavior Recognition in Dairy Cattle from Surveillance Video},
  school  = {Texas State University},
  year    = {2026},
}

License

MIT License. See LICENSE.
