Paper: Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (arXiv:2301.08243)
This repository provides the target encoder weights for an I-JEPA Vision Transformer Huge model.
Files:
- model_weights.pth: PyTorch checkpoint containing the target encoder weights
- config.json: architecture configuration used for this checkpoint

The checkpoint contains weights only. Instantiate a matching model architecture, then load the state dict:
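import json

import torch

# Read the architecture configuration shipped with the checkpoint
with open("config.json", "r") as f:
    cfg = json.load(f)

# Build your model class with cfg (must match the training architecture).
# build_target_encoder is a placeholder for your own constructor.
# model = build_target_encoder(**cfg)

# Load the weights onto the CPU; move the model to a GPU afterwards if needed
state = torch.load("model_weights.pth", map_location="cpu")

# model.load_state_dict(state, strict=True)
# model.eval()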
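Loading with strict=True fails if the checkpoint keys and the model keys differ in any way, so it can help to compare the two key sets first. The following is a minimal sketch, not part of this repository: the helpers strip_prefix and diff_keys are hypothetical, the "module." prefix is only an assumption about how a checkpoint saved from a wrapped model might be named, and the toy dictionaries stand in for the real state dicts.

```python
from typing import Dict, Iterable, List, Mapping, Tuple, TypeVar

T = TypeVar("T")


def strip_prefix(state: Mapping[str, T], prefix: str = "module.") -> Dict[str, T]:
    """Return a copy of `state` with `prefix` removed from keys that start with it."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state.items()
    }


def diff_keys(
    expected: Iterable[str], state: Mapping[str, T]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) key names, each sorted alphabetically."""
    expected_set = set(expected)
    found = set(state)
    return sorted(expected_set - found), sorted(found - expected_set)


# Toy stand-ins: in practice, `checkpoint` would be the loaded state dict and
# `model_keys` would come from model.state_dict().keys().
checkpoint = {"module.patch_embed.proj.weight": 0, "module.norm.bias": 1}
model_keys = ["patch_embed.proj.weight", "norm.bias", "norm.weight"]

cleaned = strip_prefix(checkpoint)
missing, unexpected = diff_keys(model_keys, cleaned)
print("missing:", missing)
print("unexpected:", unexpected)
```

If both lists come back empty for the real model and checkpoint, the key names match and a strict load will not fail on them; tensor shapes are still checked by load_state_dict itself. A non-empty list usually points to a mismatch between config.json and the model class you instantiated.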
If you use I-JEPA in your work, please cite:
@article{assran2023self,
  title={Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture},
  author={Assran, Mahmoud and Duval, Quentin and Misra, Ishan and Bojanowski, Piotr and Vincent, Pascal and Rabbat, Michael and LeCun, Yann and Ballas, Nicolas},
  journal={arXiv preprint arXiv:2301.08243},
  year={2023}
}