Pretrained model weights for the Cattle Vision Framework, an MS thesis project on multi-behavior recognition in dairy cattle from surveillance video.
Pipeline: RF-DETR detection → SAM2 segmentation → OC-SORT tracking → VideoMAE behavior classification
Datasets: CBVD-5 (indoor barn, 12 GB) and CVB (outdoor pasture, 15 GB)
| Filename | Size | Description | Val Metric |
|---|---|---|---|
| `rf-detr-medium.pth` | 387 MB | RF-DETR-Medium backbone (COCO pretrained, not fine-tuned) | - |
| `rf-detr-seg-medium.pt` | 137 MB | RF-DETR-Seg-Medium fine-tuned on SAM2 pseudo-labels (Phase 3b) | - |
| `rfdetr_combined_v1_best.pth` | 128 MB | RF-DETR cattle detector, trained on CBVD-5 + CVB combined | 70.4% mAP@50 |
| `videomae_combined_v1.pt` | 330 MB | VideoMAE Config 5: trained on CBVD-5+CVB, eval on both | macro-F1=0.7537 |
| `videomae_cvb_v1.pt` | 330 MB | VideoMAE Config 2: trained on CVB, eval on CVB | macro-F1=0.7607 |
| `videomae_cbvd5_v1.pt` | 330 MB | VideoMAE Config 1: trained on CBVD-5, eval on CBVD-5 | macro-F1=0.3149 |
| `videomae_cbvd5_to_cvb_v1.pt` | 330 MB | VideoMAE Config 3: trained on CBVD-5, eval on CVB (cross-domain) | macro-F1=0.1690 |
| `videomae_cvb_to_cbvd5_v1.pt` | 330 MB | VideoMAE Config 4: trained on CVB, eval on CBVD-5 (cross-domain) | macro-F1=0.1789 |
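
The `mAP@50` figure for the detector is measured at an intersection-over-union (IoU) threshold of 0.5: a predicted box can only count as a hit if it overlaps a ground-truth box by at least that much. A minimal sketch of the overlap test, assuming boxes in `(x1, y1, x2, y2)` pixel format; `iou` and `overlaps_at_50` are illustrative helpers, not part of the framework:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def overlaps_at_50(pred_box, gt_box):
    """True when the prediction clears the 0.5 IoU threshold used by mAP@50."""
    return iou(pred_box, gt_box) >= 0.5
```

A full mAP computation also ranks predictions by confidence and matches each ground-truth box at most once; this sketch covers only the overlap criterion.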
| ID | Behavior | Datasets |
|---|---|---|
| 0 | Standing | CBVD-5, CVB |
| 1 | Lying | CBVD-5, CVB |
| 2 | Foraging/Grazing | CBVD-5, CVB |
| 3 | Drinking | CBVD-5, CVB |
| 4 | Ruminating | CBVD-5, CVB |
| 5 | Grooming | CVB only |
| 6 | Other | CVB only |
Cross-dataset evaluation uses IDs 0-4 only.
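
One way to restrict an evaluation to the shared classes is to drop every sample whose ground-truth label falls outside the shared IDs before scoring. The sketch below assumes labels and predictions are plain integer lists; `SHARED_IDS` and `filter_to_shared` are illustrative names, and the framework's own filtering may differ:

```python
SHARED_IDS = frozenset(range(5))  # Standing, Lying, Foraging/Grazing, Drinking, Ruminating


def filter_to_shared(y_true, y_pred, shared_ids=SHARED_IDS):
    """Keep only samples whose ground-truth label is one of the shared classes."""
    kept = [(t, p) for t, p in zip(y_true, y_pred) if t in shared_ids]
    true_kept = [t for t, _ in kept]
    pred_kept = [p for _, p in kept]
    return true_kept, pred_kept


# Hypothetical labels: samples with ground truth 5 (Grooming) and 6 (Other) are dropped.
y_true = [0, 5, 2, 6, 4]
y_pred = [0, 0, 2, 1, 5]
print(filter_to_shared(y_true, y_pred))
```

Predictions can still contain IDs 5 or 6 after filtering, since a CVB-trained model may emit them on CBVD-5 clips.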
| Config | Train | Eval | macro-F1 | F1-Standing | F1-Lying | F1-Foraging | F1-Drinking | F1-Ruminating |
|---|---|---|---|---|---|---|---|---|
| Config 1 | CBVD-5 | CBVD-5 | 0.3149 | 0.906 | 0.303 | 0.896 | 0.000 | 0.100 |
| Config 2 | CVB | CVB | 0.7607 | 0.860 | 0.845 | 0.979 | 0.881 | 0.801 |
| Config 3 | CBVD-5 | CVB | 0.1690 | 0.215 | 0.281 | 0.006 | 0.188 | 0.493 |
| Config 4 | CVB | CBVD-5 | 0.1789 | 0.675 | 0.145 | 0.432 | 0.000 | 0.000 |
| Config 5 | CBVD-5+CVB | Both | 0.7537 | 0.870 | 0.823 | 0.980 | 0.876 | 0.772 |
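
A quick arithmetic check on the table: for Configs 1, 3 and 4, the reported macro-F1 matches the five listed per-class scores summed and divided by 7, not 5. That is consistent with Grooming and Other each contributing an F1 of 0 to the average in those runs. This reading is inferred from the numbers and is not stated in the source.

```python
# Per-class F1 for the five shared classes, copied from the table above.
PER_CLASS_F1 = {
    "Config 1": [0.906, 0.303, 0.896, 0.000, 0.100],
    "Config 3": [0.215, 0.281, 0.006, 0.188, 0.493],
    "Config 4": [0.675, 0.145, 0.432, 0.000, 0.000],
}
REPORTED_MACRO_F1 = {"Config 1": 0.3149, "Config 3": 0.1690, "Config 4": 0.1789}
NUM_CLASSES = 7  # assumption: the average runs over all 7 behavior IDs


def macro_f1_over_all_classes(shared_f1, num_classes=NUM_CLASSES):
    """Average the listed F1 scores, treating unlisted classes as 0."""
    return sum(shared_f1) / num_classes


for name, scores in PER_CLASS_F1.items():
    estimate = macro_f1_over_all_classes(scores)
    print(f"{name}: estimated {estimate:.4f}, reported {REPORTED_MACRO_F1[name]:.4f}")
```

The estimates agree with the reported values to within the rounding of the per-class scores. Configs 2 and 5 cannot be checked this way because their Grooming and Other scores are not listed.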
Training configuration for `videomae_combined_v1.pt`:

```yaml
experiment_name: videomae_combined_v1
model_name: MCG-NJU/videomae-base-finetuned-kinetics
num_classes: 7
train:
  dataset_filter: null        # combined CBVD-5 + CVB
  split_filter: train
  batch_size: 8
  grad_accum_steps: 4         # effective batch = 32
  num_epochs: 30
  lr: 5.0e-5
  lr_head: 1.0e-3
  weight_decay: 0.05
  warmup_epochs: 3
  early_stopping_patience: 8
  use_class_weights: true
```
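
The `effective batch = 32` comment comes from multiplying the per-step batch size by the number of gradient-accumulation steps. A small sketch of that arithmetic, plus the resulting number of optimizer updates per epoch. The function names and the 1,000-clip dataset size are hypothetical, and the sketch assumes a trailing partial accumulation group still triggers an update:

```python
import math


def effective_batch_size(batch_size: int, grad_accum_steps: int) -> int:
    """Samples contributing to each optimizer update."""
    return batch_size * grad_accum_steps


def optimizer_steps_per_epoch(num_samples: int, batch_size: int, grad_accum_steps: int) -> int:
    """Optimizer updates per epoch, counting a final partial accumulation group."""
    micro_batches = math.ceil(num_samples / batch_size)
    return math.ceil(micro_batches / grad_accum_steps)


print(effective_batch_size(8, 4))             # values from the config above
print(optimizer_steps_per_epoch(1000, 8, 4))  # 1000 clips is a made-up dataset size
```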
Training configuration for `rfdetr_combined_v1_best.pth`:

```yaml
experiment_name: rfdetr_combined_v1
model:
  type: RFDETRMedium
  pretrained: true
dataset:
  name: combined
  num_classes: 1              # single class: cattle
training:
  epochs: 100
  batch_size: 2
  grad_accum_steps: 8         # effective batch = 16
  resolution: 576             # must be divisible by 64
  lr: 1.0e-4
  lr_encoder: 1.5e-4
  gradient_checkpointing: true
  use_ema: true
```
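
The divisibility constraint on `resolution` is cheap to verify before starting a 100-epoch run. A minimal sketch; `check_resolution` is a hypothetical helper and not part of `rfdetr`:

```python
def check_resolution(resolution: int, multiple: int = 64) -> int:
    """Return `resolution` if it is a multiple of `multiple`, else raise with suggestions."""
    if resolution % multiple == 0:
        return resolution
    lower = (resolution // multiple) * multiple
    upper = lower + multiple
    raise ValueError(
        f"resolution {resolution} is not divisible by {multiple}; try {lower} or {upper}"
    )


check_resolution(576)  # passes: 576 = 9 * 64
```

Calling `check_resolution(600)` raises a `ValueError` that suggests 576 or 640.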
```bash
# Install CLI
pip install huggingface-hub

# Download best behavior model
huggingface-cli download sakifkhan98/cattle-vision-framework videomae_combined_v1.pt \
  --local-dir weights/

# Download backbone
huggingface-cli download sakifkhan98/cattle-vision-framework rf-detr-medium.pth \
  --local-dir weights/
```
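
The same files can also be fetched from Python with `hf_hub_download` from the `huggingface_hub` package. A sketch using the repository ID and filenames from the commands above; the wrapper `download_weights` is an illustrative name:

```python
from pathlib import Path

REPO_ID = "sakifkhan98/cattle-vision-framework"
WEIGHT_FILES = ("videomae_combined_v1.pt", "rf-detr-medium.pth")


def download_weights(filenames=WEIGHT_FILES, local_dir="weights"):
    """Download each file into `local_dir` and return the local paths."""
    # Imported here so the constants above are usable without the package installed.
    from huggingface_hub import hf_hub_download

    Path(local_dir).mkdir(parents=True, exist_ok=True)
    return [
        Path(hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=local_dir))
        for name in filenames
    ]
```

Calling `download_weights()` with no arguments fetches both files into `weights/`, matching the paths used in the loading examples.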
```python
import torch
from transformers import VideoMAEForVideoClassification

# Load checkpoint
ckpt = torch.load("weights/videomae_combined_v1.pt", map_location="cpu")

# Build the Kinetics-pretrained architecture with a 7-class head
model = VideoMAEForVideoClassification.from_pretrained(
    "MCG-NJU/videomae-base-finetuned-kinetics",
    num_labels=7,
    ignore_mismatched_sizes=True,
)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

LABEL_NAMES = {
    0: "Standing", 1: "Lying", 2: "Foraging",
    3: "Drinking", 4: "Ruminating", 5: "Grooming", 6: "Other",
}
```
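
After a forward pass the classifier returns one logit per class, which still has to be mapped back to a behavior name. A minimal sketch of that last step in plain Python, assuming a list of 7 logits for a single clip; `decode_prediction` is an illustrative helper and the example logits are made up:

```python
import math

LABEL_NAMES = {
    0: "Standing", 1: "Lying", 2: "Foraging",
    3: "Drinking", 4: "Ruminating", 5: "Grooming", 6: "Other",
}


def decode_prediction(logits):
    """Return (behavior name, softmax probability) for the highest-scoring class."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABEL_NAMES[best], probs[best]


label, prob = decode_prediction([0.1, 2.3, -1.0, 0.0, 0.4, -0.5, -2.0])
print(label, round(prob, 3))
```

With the loaded model, the logits would come from `model(pixel_values=clip).logits[0].tolist()`, where `clip` is a float tensor of shape `(batch, frames, channels, height, width)`.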
```python
import torch
from rfdetr import RFDETRMedium

model = RFDETRMedium()
ckpt = torch.load("weights/rfdetr_combined_v1_best.pth", map_location="cpu")
model.load_state_dict(ckpt["model"])
model.eval()
```
```bibtex
@mastersthesis{khan2026cattle,
  author = {Sakif Khan},
  title  = {Multi-Behavior Recognition in Dairy Cattle from Surveillance Video},
  school = {Texas State University},
  year   = {2026},
}
```
MIT License. See LICENSE.