PINN-JEPA — Physics-Informed Encoder for 3D Human Motion (Step 1)

A physics-informed neural network (PINN) encoder for 3D skeletal human motion. The encoder is pretrained with self-supervision to produce motion representations whose kinematic structure (position → velocity → acceleration → jerk) and bone geometry stay physically consistent. This repository releases the Step 1 PINN pretraining stage (encoder body + pretraining head) together with an older checkpoint, not the latest one.

Status: research release. The published weights are not the latest internal checkpoint; they are intended for reproducibility and experimentation, not production.

Model overview


Input	`(B, T, J, 12)`: per-joint state `[p(3), v(3), a(3), j(3)]`
Skeleton	H36M 17-joint topology (`J = 17`)
Output	token features `(B, T, J, D)` + reconstructed state `s_hat (B, T, J, 12)`
Backbone	State embedding → (GraphMix spatial + TemporalBlock) × depth → LayerNorm
Default size	`d_model = 256`, `depth = 6`, `d_state = 64`
Framework	PyTorch (custom modules, no `transformers` dependency)

The encoder predicts a residual on position only; velocity, acceleration and jerk are derived analytically via central differences, which is what keeps the representation kinematically consistent rather than letting each channel drift independently.
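
As a rough illustration of the derivation described above, the sketch below builds the 12-channel state from positions using central differences. It uses NumPy rather than PyTorch to stay dependency-light. `central_diff_sketch` and `build_state` are hypothetical names, and the one-sided handling of the first and last frame is an assumption; the repository's own `central_diff` in Utils.py may treat boundaries differently.

```python
import numpy as np

def central_diff_sketch(x, fps):
    """Time derivative along axis 1 of a (B, T, ...) array.

    Interior frames use (x[t+1] - x[t-1]) * fps / 2. The two boundary
    frames fall back to one-sided differences (an assumption of this sketch).
    """
    dx = np.empty_like(x)
    dx[:, 1:-1] = (x[:, 2:] - x[:, :-2]) * (fps / 2.0)
    dx[:, 0] = (x[:, 1] - x[:, 0]) * fps
    dx[:, -1] = (x[:, -1] - x[:, -2]) * fps
    return dx

def build_state(p, fps):
    """Stack [p, v, a, j] into (B, T, J, 12) from positions of shape (B, T, J, 3)."""
    v = central_diff_sketch(p, fps)
    a = central_diff_sketch(v, fps)
    j = central_diff_sketch(a, fps)
    return np.concatenate([p, v, a, j], axis=-1)

# Toy motion: all 17 joints move at a constant 2 units/s along the x axis.
fps = 50.0
t = np.arange(16) / fps
p = np.zeros((1, 16, 17, 3))
p[..., 0] = 2.0 * t[None, :, None]
state = build_state(p, fps)  # shape (1, 16, 17, 12)
```

Because v, a and j are all computed from the same position track, they cannot disagree with it. In the model, the same idea would apply to the predicted position (input position plus the learned residual).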

Training objective (Step 1)

Self-supervised state reconstruction combined with physics-aware regularizers:

State reconstruction on p / v / a / j (weighted)
Bone-length consistency over skeleton edges
Kinematic consistency (finite-difference agreement between channels)
Jerk regularization for motion smoothness

See PINN_Lossfunction.py for exact terms and default weights.
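
To make the regularizers listed above concrete, here is a minimal NumPy sketch of three of them. It is not the code in PINN_Lossfunction.py: the function names, the mean-squared form of each term, and the 3-joint toy skeleton are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical 3-joint chain for illustration only; the real H36M edge
# list is defined in Utils.py.
EDGES = [(0, 1), (1, 2)]

def bone_length_loss(p_hat, p_ref, edges):
    """Mean squared difference between predicted and reference bone lengths."""
    parents = [i for i, _ in edges]
    children = [k for _, k in edges]
    len_hat = np.linalg.norm(p_hat[:, :, children] - p_hat[:, :, parents], axis=-1)
    len_ref = np.linalg.norm(p_ref[:, :, children] - p_ref[:, :, parents], axis=-1)
    return float(np.mean((len_hat - len_ref) ** 2))

def kinematic_consistency_loss(p, v, fps):
    """Penalize disagreement between v and the central difference of p (interior frames)."""
    v_fd = (p[:, 2:] - p[:, :-2]) * (fps / 2.0)
    return float(np.mean((v[:, 1:-1] - v_fd) ** 2))

def jerk_loss(j):
    """Smoothness penalty: mean squared jerk magnitude."""
    return float(np.mean(np.sum(j ** 2, axis=-1)))

# Toy data: a rigid chain translating at constant velocity along x.
rng = np.random.default_rng(0)
fps, T = 50.0, 8
rest = rng.normal(size=(1, 1, 3, 3))           # (B, 1, J, 3) rest pose
vel = np.array([1.0, 0.0, 0.0])
t = np.arange(T) / fps
p = rest + t[None, :, None, None] * vel        # (B, T, J, 3)
v = np.broadcast_to(vel, p.shape)

# Stretch the last bone to show the bone-length term reacting.
p_stretched = p.copy()
p_stretched[:, :, 2, 0] += 0.1

loss_rigid = bone_length_loss(p, p, EDGES)
loss_stretched = bone_length_loss(p_stretched, p, EDGES)
loss_kin = kinematic_consistency_loss(p, v, fps)
loss_jerk = jerk_loss(np.zeros_like(p))
```

In a training objective, terms like these would typically be combined with the reconstruction loss as a weighted sum; the actual weights are the ones defined in PINN_Lossfunction.py.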

Repository contents

PINN_EncoderBody.py                 # backbone (StateEmbedding, GraphMix, TemporalBlock, EncoderBody)
PINN_PretrainModel.py               # Step 1 model: encoder + residual-p head -> s_hat
PINN_Lossfunction.py                # physics-aware pretraining losses
PINN_Training.py                    # train step, checkpoint save/load
PINN_ModelEvaluation_downstream.py  # representation-quality eval (clustering)
PINN_ModelEvaluation_itself4.py     # model self-evaluation
PINN_visualization_for_model3.py    # 3D skeleton render / input-vs-output compare
Utils.py                            # skeleton edges, central_diff, masked_mean, etc.
config.json                         # architecture hyperparameters (edit to match the checkpoint)
export_weights.py                   # slim a training checkpoint -> release weights
inference_example.py                # minimal load + forward example

Note on imports. Modules use flat imports (`from Utils import ...`). Keep all `.py` files at the repository root, or add the repo root to `PYTHONPATH`, before importing.
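
If setting the environment variable is inconvenient, one possible alternative is to extend `sys.path` at the top of a script. The helper name `add_repo_to_path` is hypothetical, and the `"."` argument assumes the script is launched from the repository root.

```python
import sys
from pathlib import Path

def add_repo_to_path(repo_root):
    """Prepend the repository root to sys.path so flat imports resolve."""
    root = str(Path(repo_root).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    return root

# "." assumes the current working directory is the repository root.
repo_root = add_repo_to_path(".")

# With the .py files located at repo_root, a flat import such as
#   from Utils import central_diff
# should then resolve.
```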

Usage

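import json
import torch
from PINN_EncoderBody import EncoderBody
from PINN_PretrainModel import PINNPretrainModel

# Build the model from the architecture settings in config.json
with open("config.json") as f:
    cfg = json.load(f)
encoder = EncoderBody(**cfg["encoder"])
model = PINNPretrainModel(encoder=encoder, fps=cfg["fps"])

# Load the released weights on CPU and switch to inference mode
state = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(state)
model.eval()

# x: (B, T, J=17, 12) = [p, v, a, j] per joint
# Random placeholder (B=1, T=64 chosen arbitrarily); replace with real motion data
x = torch.randn(1, 64, 17, 12)
with torch.no_grad():
    out = model(x)
features = out["token_feat"]   # (B, T, J, D) representation
s_hat    = out["s_hat"]        # (B, T, J, 12) reconstructed state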

config.json ships with the architecture defaults. If the released checkpoint was trained with different settings, edit config.json so the shapes match before loading.
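
A quick way to spot such a mismatch before calling `load_state_dict` is to compare parameter names and shapes. The sketch below is one possible helper, not part of the repository: `find_shape_mismatches` is a hypothetical name, and the parameter name `embed.weight` in the demo is made up. It accepts any name-to-tensor mapping whose values expose `.shape`, which should include the result of `model.state_dict()` and the dict returned by `torch.load`.

```python
from types import SimpleNamespace

def find_shape_mismatches(model_state, checkpoint_state):
    """Compare two name -> tensor mappings.

    Returns (missing, unexpected, mismatched):
      missing    - names the model expects but the checkpoint lacks
      unexpected - names in the checkpoint the model does not have
      mismatched - name -> (model shape, checkpoint shape) where shapes differ
    """
    missing = sorted(set(model_state) - set(checkpoint_state))
    unexpected = sorted(set(checkpoint_state) - set(model_state))
    mismatched = {
        name: (tuple(model_state[name].shape), tuple(checkpoint_state[name].shape))
        for name in model_state
        if name in checkpoint_state
        and tuple(model_state[name].shape) != tuple(checkpoint_state[name].shape)
    }
    return missing, unexpected, mismatched

# Stand-ins for tensors; the parameter name is hypothetical.
model_state = {"embed.weight": SimpleNamespace(shape=(256, 12))}
checkpoint_state = {"embed.weight": SimpleNamespace(shape=(128, 12))}

missing, unexpected, mismatched = find_shape_mismatches(model_state, checkpoint_state)
```

A non-empty `mismatched` entry like the one in this demo would suggest that a width setting in config.json differs from the one used to train the checkpoint.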

Intended use

Self-supervised motion representation learning research
Feature extraction for downstream pose/motion tasks
Studying physics-informed regularization for skeletal motion

Out of scope

Not a clinical, diagnostic, biometric, or safety-critical tool
Not trained or validated for person identification or surveillance
Tuned for the H36M 17-joint topology; other skeletons need adaptation/retraining

Limitations

Released weights are an older checkpoint and may underperform the latest internal version.
Assumes a fixed 17-joint topology and a consistent (p, v, a, j) input layout.
fps at inference should match the value used to build the (v, a, j) channels.
Evaluation utilities depend on scikit-learn; UMAP is optional.
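
To see why the fps value matters, the small pure-Python sketch below derives velocity from the same positions with the correct and with a wrong frame rate. The function name and the numbers are illustrative assumptions, not values from the repository.

```python
def central_velocity(positions, fps):
    """Central-difference velocity for the interior frames of a 1D trajectory."""
    return [
        (positions[t + 1] - positions[t - 1]) * fps / 2.0
        for t in range(1, len(positions) - 1)
    ]

# A point moving at 2 units/s, sampled at 50 frames per second.
true_fps = 50.0
speed = 2.0
positions = [speed * t / true_fps for t in range(10)]

v_right = central_velocity(positions, fps=true_fps)  # recovers about 2.0
v_wrong = central_velocity(positions, fps=25.0)      # halved: about 1.0
```

Velocity scales linearly with the assumed fps, so acceleration and jerk derived the same way scale with its square and cube. A mismatched fps therefore distorts the higher-order channels the most.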

License

This release is distributed under the Academic Free License v3.0 (AFL-3.0).

Source code (*.py): AFL-3.0 — see LICENSE.
Model weights (released checkpoint): AFL-3.0, with the disclaimer below.

The weights are provided "as is", for research and reproducibility, without warranty of any kind. They are not the latest internal checkpoint and carry no fitness guarantee for any particular use. See NOTICE for the scope split between code and weights.

Citation

@misc{pinn_jepa_pose,
  title  = {PINN-JEPA: Physics-Informed Encoder for 3D Human Motion},
  author = {<Authors>},
  year   = {2026},
  note   = {Research code and weights, AFL-3.0},
  howpublished = {\url{https://huggingface.co/<org>/<repo>}}
}
