# MorphStream Models

Models and TensorRT engine cache for real-time face processing used by MorphStream GPU Worker.

Private repository — requires an access token for downloads.

## Structure

```
/
├── inswapper_128.onnx           # Standard face swap (529MB)
├── inswapper_128_fp16.onnx      # FP16 optimized - default (265MB)
├── hyperswap_1a_256.onnx        # HyperSwap variant A (384MB)
├── hyperswap_1b_256.onnx        # HyperSwap variant B (384MB)
├── hyperswap_1c_256.onnx        # HyperSwap variant C (384MB)
├── yolov8n.onnx                 # Person detection (12MB)
├── dfl_xseg.onnx                # XSeg v1 face segmentation — legacy (67MB)
├── xseg_1.onnx                  # XSeg occlusion model 1 (67MB)
├── xseg_2.onnx                  # XSeg occlusion model 2 (67MB)
├── xseg_3.onnx                  # XSeg occlusion model 3 (67MB)
├── 2dfan4.onnx                  # 68-point face landmarks (93MB)
├── bisenet_resnet_34.onnx       # BiSeNet face parsing ResNet-34 (89MB)
├── bisenet_resnet_18.onnx       # BiSeNet face parsing ResNet-18 (51MB)
├── buffalo_l/                   # Direct ONNX face analysis models
│   ├── det_10g.onnx             # SCRFD face detection FP32 (16MB)
│   ├── det_10g_fp16.onnx        # SCRFD face detection FP16 (8.1MB)
│   ├── w600k_r50.onnx           # ArcFace recognition embeddings (166MB)
│   ├── 1k3d68.onnx              # 3D landmarks, 68 points (137MB)
│   ├── 2d106det.onnx            # 2D landmarks, 106 points (4.8MB)
│   └── genderage.onnx           # Gender/age estimation (1.3MB)
├── gfpgan/                      # Face enhancement (not used in real-time)
│   ├── GFPGANv1.4.pth
│   └── weights/
│       ├── detection_Resnet50_Final.pth
│       └── parsing_parsenet.pth
├── trt_cache/                   # Pre-compiled TensorRT engines
│   ├── sm89/trt10.9_ort1.24/    # RTX 4090
│   ├── sm86/trt10.9_ort1.24/    # RTX 3090
│   └── ...                      # Other GPU arch + version combos
└── scripts/
    └── convert_scrfd_fp16.py    # FP32 → FP16 conversion utility
```

## Face Swap Models

| Model | Description | Size | Input | Format |
|---|---|---|---|---|
| inswapper_128.onnx | Standard quality | 529 MB | 128px | ONNX FP32 |
| inswapper_128_fp16.onnx | FP16 optimized (default) | 265 MB | 128px | ONNX FP16 |
| hyperswap_1a_256.onnx | High quality — variant A | 384 MB | 256px | ONNX FP32 |
| hyperswap_1b_256.onnx | High quality — variant B | 384 MB | 256px | ONNX FP32 |
| hyperswap_1c_256.onnx | High quality — variant C | 384 MB | 256px | ONNX FP32 |

## Face Analysis (buffalo_l)

Models originally from the InsightFace buffalo_l pack. The GPU Worker loads them directly via ONNX Runtime (DirectSCRFD, DirectArcFace, DirectLandmark106) rather than through the InsightFace Python library.

| Model | GPU Worker Class | Description | Size |
|---|---|---|---|
| det_10g.onnx | DirectSCRFD | SCRFD face detection (FP32) | 16 MB |
| det_10g_fp16.onnx | DirectSCRFD | SCRFD face detection (FP16, ~2x faster on Tensor Cores) | 8.1 MB |
| w600k_r50.onnx | DirectArcFace | ArcFace R50 face recognition embeddings | 166 MB |
| 2d106det.onnx | DirectLandmark106 | 2D face landmarks (106 points), CLAHE + face-angle rotation. Used in the face detection pipeline; 106-pt landmarks serve as a masking fallback when 68-pt landmarks are unavailable | 4.8 MB |
| 1k3d68.onnx | — | 3D face landmarks (68 points), not used at runtime | 137 MB |
| genderage.onnx | — | Gender and age estimation, not used at runtime | 1.3 MB |

## Face Landmarks

| Model | Description | Size | Input |
|---|---|---|---|
| 2dfan4.onnx | 2DFAN4 — 68-point face landmarks | 93 MB | 256px |

FaceFusion-style 5/68 refinement: SCRFD detects the face plus 5 coarse keypoints, then 2DFAN4 produces 68 precise landmarks, which are converted back to 5 alignment points (eye centers averaged from 6 points each, exact nose tip, exact mouth corners). This improves face-alignment quality for the swap models.
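The 5-from-68 conversion can be sketched with the standard 68-point indexing (a sketch only; the GPU Worker's actual helper may differ):

```python
import numpy as np

def landmarks_68_to_5(lm68: np.ndarray) -> np.ndarray:
    """Convert 68-point landmarks, shape (68, 2), to 5 alignment points.

    Standard 68-point indexing: each eye is the mean of its 6 contour
    points; nose tip and mouth corners are taken directly.
    """
    left_eye = lm68[36:42].mean(axis=0)   # 6 points around the left eye
    right_eye = lm68[42:48].mean(axis=0)  # 6 points around the right eye
    nose_tip = lm68[30]
    mouth_left = lm68[48]
    mouth_right = lm68[54]
    return np.stack([left_eye, right_eye, nose_tip, mouth_left, mouth_right])
```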

2DFAN4 is also the primary landmark source for face masking: its 68-pt landmarks feed custom_paste_back compositing (hull, cutouts, mouth blend), with the 106-pt landmarks from 2d106det.onnx as fallback. Dual-landmark support works via has_valid_68 (preferred) and has_valid_106 (fallback), with a use_68 flag propagated through all mask functions. Landmarks are temporally smoothed by a One Euro Filter in LandmarkSmoother (attribute face.landmark_2d_68).
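The smoothing step uses the well-known One Euro Filter; here is a minimal sketch with parameter names from the original paper (Casiez et al.), not necessarily LandmarkSmoother's actual interface or tuning:

```python
import math
import numpy as np

class OneEuroFilter:
    """Minimal One Euro Filter for temporal landmark smoothing (sketch)."""

    def __init__(self, freq: float = 30.0, min_cutoff: float = 1.0,
                 beta: float = 0.007, d_cutoff: float = 1.0):
        self.freq = freq              # sampling rate (frame rate), Hz
        self.min_cutoff = min_cutoff  # baseline smoothing cutoff
        self.beta = beta              # higher beta -> less lag on fast motion
        self.d_cutoff = d_cutoff      # cutoff for the derivative filter
        self.x_prev = None
        self.dx_prev = None

    def _alpha(self, cutoff):
        # Exponential smoothing factor for a given cutoff frequency.
        return 1.0 / (1.0 + self.freq / (2.0 * math.pi * cutoff))

    def __call__(self, x):
        x = np.asarray(x, dtype=np.float64)
        if self.x_prev is None:
            self.x_prev = x
            self.dx_prev = np.zeros_like(x)
            return x
        # Smooth the velocity, then adapt the cutoff to it: slow motion
        # gets heavy smoothing (less jitter), fast motion less (less lag).
        dx = (x - self.x_prev) * self.freq
        a_d = self._alpha(self.d_cutoff)
        dx_hat = a_d * dx + (1 - a_d) * self.dx_prev
        cutoff = self.min_cutoff + self.beta * np.abs(dx_hat)
        a = self._alpha(cutoff)
        x_hat = a * x + (1 - a) * self.x_prev
        self.x_prev, self.dx_prev = x_hat, dx_hat
        return x_hat
```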

Source: FaceFusion assets.

## Person Detection

| Model | Description | Size | Input |
|---|---|---|---|
| yolov8n.onnx | YOLOv8n — person detection (COCO class 0) | 12 MB | 640px |

Used to distinguish "person left frame" vs "face occluded" during face swap.
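The disambiguation boils down to a small decision rule; a sketch with illustrative names, not the GPU Worker's actual API:

```python
def classify_face_loss(face_detected: bool, person_detected: bool) -> str:
    """When SCRFD stops reporting a face, YOLOv8n disambiguates the cause:
    a person still in frame means the face is occluded; no person means
    they left the frame."""
    if face_detected:
        return "tracking"
    return "occluded" if person_detected else "person_left_frame"
```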

## Face Mask Models (FaceFusion 4-Mask System)

Occlusion detection (XSeg) and semantic face parsing (BiSeNet) models for composable mask pipeline. Used in GPU Worker's face_masker.py for box/occlusion/area/region masks.

Source: FaceFusion 3.x assets (Apache-2.0), mirrored here for reliability.

### XSeg — Occlusion Detection

| Model | Description | Size | Input | Output |
|---|---|---|---|---|
| dfl_xseg.onnx | XSeg v1 — legacy binary face mask (not used) | 67 MB | 256px | binary (face/bg) |
| xseg_1.onnx | XSeg model 1 — occlusion detection | 67 MB | 256px | binary (face/bg) |
| xseg_2.onnx | XSeg model 2 — occlusion detection | 67 MB | 256px | binary (face/bg) |
| xseg_3.onnx | XSeg model 3 — occlusion detection | 67 MB | 256px | binary (face/bg) |

Runtime model selection via IPC: `many` (all three intersected), `xseg_1`, `xseg_2`, `xseg_3`. Input: NHWC float32 in [0, 1]. Output: the intersection of all selected models' masks (most conservative).
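The `many` intersection can be sketched as a per-pixel minimum over the selected models' outputs (a sketch, not the face_masker.py implementation):

```python
import numpy as np

def combine_xseg_masks(masks: list) -> np.ndarray:
    """Intersect soft masks (float32 in [0, 1], e.g. shape (256, 256)).

    Taking the per-pixel minimum keeps only pixels that every selected
    XSeg model agrees belong to the face: the most conservative estimate.
    """
    out = masks[0]
    for m in masks[1:]:
        out = np.minimum(out, m)
    return out
```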

### BiSeNet — Region Segmentation

| Model | Description | Size | Input | Classes |
|---|---|---|---|---|
| bisenet_resnet_34.onnx | BiSeNet ResNet-34 (default) | 89 MB | 512px | 19 regions |
| bisenet_resnet_18.onnx | BiSeNet ResNet-18 (lighter) | 51 MB | 512px | 19 regions |

Runtime model selection via IPC. Input: NCHW float32, ImageNet-normalized. Ten configurable face regions: skin, left-eyebrow, right-eyebrow, left-eye, right-eye, glasses, upper-lip, nose, lower-lip, mouth.
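A sketch of the preprocessing implied above (assumes the 512px crop is already done; not the actual face_masker.py code):

```python
import numpy as np

# Standard ImageNet normalization constants.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess_bisenet(rgb_512: np.ndarray) -> np.ndarray:
    """uint8 HWC image (512, 512, 3) -> float32 NCHW batch (1, 3, 512, 512)."""
    x = rgb_512.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    return x.transpose(2, 0, 1)[None]  # HWC -> CHW -> add batch dim
```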

## TensorRT Engine Cache

Pre-compiled TensorRT engines stored in trt_cache/ subfolder, keyed by GPU architecture and software versions. Eliminates cold-start TRT compilation (~180-300s) on new GPU instances.

### Layout

```
trt_cache/
├── sm89/trt10.9_ort1.24/           # RTX 4090 (Ada Lovelace)
│   ├── manifest.json               # Metadata: cache_key, engine list, timestamps
│   ├── TensorrtExecutionProvider_*.engine   # Compiled TRT engines
│   ├── TensorrtExecutionProvider_*.profile  # Profiling data
│   └── timing.cache                # cuDNN/TRT timing optimization cache
├── sm86/trt10.9_ort1.24/           # RTX 3090 (Ampere)
│   └── ...
└── sm80/trt10.9_ort1.24/           # A100 (Ampere)
    └── ...
```

### Cache Key

Format: `{gpu_arch}/trt{trt_version}_ort{ort_version}`

| Component | Example | Source |
|---|---|---|
| gpu_arch | sm89 | `nvidia-smi --query-gpu=compute_cap` → 8.9 → sm89 |
| trt_version | 10.9 | `tensorrt.__version__` major.minor |
| ort_version | 1.24 | `onnxruntime.__version__` major.minor |
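Assembling the key from those components can be sketched as a pure function (a hypothetical helper, not the trt_cache.py API; `compute_cap` is the string reported by nvidia-smi):

```python
def make_cache_key(compute_cap: str, trt_version: str, ort_version: str) -> str:
    """Build the cache key from the components in the table above."""
    gpu_arch = "sm" + compute_cap.strip().replace(".", "")  # "8.9" -> "sm89"
    trt = ".".join(trt_version.split(".")[:2])   # keep major.minor only
    ort = ".".join(ort_version.split(".")[:2])
    return f"{gpu_arch}/trt{trt}_ort{ort}"
```

For example, `make_cache_key("8.9", "10.9.0.34", "1.24.1")` yields `"sm89/trt10.9_ort1.24"`.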

### Lifecycle

1. **Download** — at container boot, the GPU Worker checks HF for a matching cache key. If found, it downloads all engines (~10-30 s vs. ~180-300 s compile).
2. **Compile** — if no cache exists on HF, ONNX Runtime compiles TRT engines from scratch on first model load.
3. **Self-seed upload** — after compilation, engines are uploaded to HF so future instances skip compilation.
4. **Incremental upload** — if engines were downloaded from HF but new models were compiled locally afterwards (e.g., YOLOv8n during warmup), only the new engines are uploaded.
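Incremental-upload detection in step 4 amounts to diffing local engine files against the manifest; a sketch with illustrative names, not the actual trt_cache.py code:

```python
import json
from pathlib import Path

def engines_to_upload(cache_dir: Path) -> list:
    """Return engine files compiled locally since the cache was downloaded.

    No manifest means the cache was never downloaded, so everything is
    self-seeded; otherwise only files absent from engine_files are new.
    """
    manifest_path = cache_dir / "manifest.json"
    if not manifest_path.exists():
        return sorted(p.name for p in cache_dir.glob("*.engine"))
    known = set(json.loads(manifest_path.read_text())["engine_files"])
    local = {p.name for p in cache_dir.glob("*.engine")}
    return sorted(local - known)
```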

### manifest.json

```json
{
  "cache_key": "sm89/trt10.9_ort1.24",
  "gpu_arch": "sm89",
  "trt_version": "10.9",
  "ort_version": "1.24",
  "created_at": "2025-03-07T12:00:00Z",
  "machine_id": "C.12345",
  "engine_files": [
    "TensorrtExecutionProvider_model_hash.engine",
    "TensorrtExecutionProvider_model_hash.profile",
    "timing.cache"
  ]
}
```

The manifest serves as both metadata and an upload gate — its presence signals that the cache was downloaded, and the engine_files list enables incremental-upload detection.

## GFPGAN (optional, not used in real-time)

Face restoration and enhancement. Too slow for real-time streaming (~50-150ms per frame).

| Model | Description | Size |
|---|---|---|
| gfpgan/GFPGANv1.4.pth | GFPGAN v1.4 restoration | 332 MB |
| gfpgan/weights/detection_Resnet50_Final.pth | RetinaFace detector | 104 MB |
| gfpgan/weights/parsing_parsenet.pth | ParseNet segmentation | 81 MB |

## Usage

### GPU Worker (production)

Models are baked into the Docker image at build time (buffalo_l + default swap + landmark + mask models). Alternative swap models (HyperSwap) are downloaded on-demand by ModelDownloadService.

TRT engine cache is downloaded asynchronously at boot via trt_cache.py (non-blocking β€” /health responds immediately).

```bash
# Manual download (local development)
HF_TOKEN=hf_xxx ./scripts/download_models.sh /models
```

### Docker build

```bash
docker build --build-arg HF_TOKEN=hf_xxx -t morphstream-gpu-worker .
```

### Python (huggingface_hub)

```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="latark/MorphStream",
    filename="inswapper_128_fp16.onnx",
    token="hf_xxx",
)
```

## Scripts

### convert_scrfd_fp16.py

Converts SCRFD det_10g.onnx from FP32 to FP16:

```bash
pip install onnx onnxconverter-common
python scripts/convert_scrfd_fp16.py \
    --input buffalo_l/det_10g.onnx \
    --output buffalo_l/det_10g_fp16.onnx
```

Key: `op_block_list=['BatchNormalization']` prevents epsilon underflow (1e-5 → 0 in FP16 → NaN).

## License

MIT License
