# MorphStream Models

Models and TensorRT engine cache for real-time face processing, used by the MorphStream GPU Worker.

Private repository: requires an access token for downloads.
## Structure
```
/
├── inswapper_128.onnx          # Standard face swap (529 MB)
├── inswapper_128_fp16.onnx     # FP16 optimized - default (265 MB)
├── hyperswap_1a_256.onnx       # HyperSwap variant A (384 MB)
├── hyperswap_1b_256.onnx       # HyperSwap variant B (384 MB)
├── hyperswap_1c_256.onnx       # HyperSwap variant C (384 MB)
├── yolov8n.onnx                # Person detection (12 MB)
├── dfl_xseg.onnx               # XSeg v1 face segmentation - legacy (67 MB)
├── xseg_1.onnx                 # XSeg occlusion model 1 (67 MB)
├── xseg_2.onnx                 # XSeg occlusion model 2 (67 MB)
├── xseg_3.onnx                 # XSeg occlusion model 3 (67 MB)
├── 2dfan4.onnx                 # 68-point face landmarks (93 MB)
├── bisenet_resnet_34.onnx      # BiSeNet face parsing ResNet-34 (89 MB)
├── bisenet_resnet_18.onnx      # BiSeNet face parsing ResNet-18 (51 MB)
├── buffalo_l/                  # Direct ONNX face analysis models
│   ├── det_10g.onnx            # SCRFD face detection FP32 (16 MB)
│   ├── det_10g_fp16.onnx       # SCRFD face detection FP16 (8.1 MB)
│   ├── w600k_r50.onnx          # ArcFace recognition embeddings (166 MB)
│   ├── 1k3d68.onnx             # 3D landmarks, 68 points (137 MB)
│   ├── 2d106det.onnx           # 2D landmarks, 106 points (4.8 MB)
│   └── genderage.onnx          # Gender/age estimation (1.3 MB)
├── gfpgan/                     # Face enhancement (not used in real-time)
│   ├── GFPGANv1.4.pth
│   └── weights/
│       ├── detection_Resnet50_Final.pth
│       └── parsing_parsenet.pth
├── trt_cache/                  # Pre-compiled TensorRT engines
│   ├── sm89/trt10.9_ort1.24/   # RTX 4090
│   ├── sm86/trt10.9_ort1.24/   # RTX 3090
│   └── ...                     # Other GPU arch + version combos
└── scripts/
    └── convert_scrfd_fp16.py   # FP32 → FP16 conversion utility
```
## Face Swap Models
| Model | Description | Size | Input | Format |
|---|---|---|---|---|
| `inswapper_128.onnx` | Standard quality | 529 MB | 128px | ONNX FP32 |
| `inswapper_128_fp16.onnx` | FP16 optimized (default) | 265 MB | 128px | ONNX FP16 |
| `hyperswap_1a_256.onnx` | High quality, variant A | 384 MB | 256px | ONNX FP32 |
| `hyperswap_1b_256.onnx` | High quality, variant B | 384 MB | 256px | ONNX FP32 |
| `hyperswap_1c_256.onnx` | High quality, variant C | 384 MB | 256px | ONNX FP32 |
## Face Analysis (buffalo_l)
Models originally from the InsightFace `buffalo_l` pack. The GPU Worker loads them directly via ONNX Runtime (`DirectSCRFD`, `DirectArcFace`, `DirectLandmark106`), without the InsightFace Python library.
| Model | GPU Worker Class | Description | Size |
|---|---|---|---|
| `det_10g.onnx` | `DirectSCRFD` | SCRFD face detection (FP32) | 16 MB |
| `det_10g_fp16.onnx` | `DirectSCRFD` | SCRFD face detection (FP16, ~2x faster on Tensor Cores) | 8.1 MB |
| `w600k_r50.onnx` | `DirectArcFace` | ArcFace R50 face recognition embeddings | 166 MB |
| `2d106det.onnx` | `DirectLandmark106` | 2D face landmarks (106 points), CLAHE + face angle rotation. Used in the face detection pipeline; 106-pt landmarks serve as a fallback for masking when 68-pt landmarks are unavailable | 4.8 MB |
| `1k3d68.onnx` | – | 3D face landmarks (68 points); not used at runtime | 137 MB |
| `genderage.onnx` | – | Gender and age estimation; not used at runtime | 1.3 MB |
## Face Landmarks
| Model | Description | Size | Input |
|---|---|---|---|
| `2dfan4.onnx` | 2DFAN4, 68-point face landmarks | 93 MB | 256px |
FaceFusion-style 5/68 refinement: SCRFD detects the face and 5 coarse keypoints, then 2DFAN4 produces 68 precise landmarks, which are converted back to 5 alignment points (eye centers averaged from 6 points each, exact nose tip, exact mouth corners). This improves face alignment quality for the swap models.

Primary landmark model for face masking: the 68-pt landmarks from 2DFAN4 are the preferred source for `custom_paste_back` compositing (hull, cutouts, mouth blend), with the 106-pt landmarks from `2d106det.onnx` as fallback. Dual-landmark support: `has_valid_68` is preferred, `has_valid_106` is the fallback, and a `use_68` flag is propagated through all mask functions. Landmarks are temporally smoothed with a One Euro Filter in `LandmarkSmoother` (attribute `face.landmark_2d_68`).
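The 68-to-5 reduction described above can be sketched as follows, assuming the standard iBUG 68-point index layout (the GPU Worker's actual implementation lives outside this repo and may differ):

```python
import numpy as np

def landmarks_68_to_5(lm68: np.ndarray) -> np.ndarray:
    """Reduce 68 iBUG-style landmarks to the 5 alignment points the
    swap models expect: eye centers, nose tip, mouth corners."""
    assert lm68.shape == (68, 2)
    left_eye = lm68[36:42].mean(axis=0)   # center of the 6 left-eye points
    right_eye = lm68[42:48].mean(axis=0)  # center of the 6 right-eye points
    nose = lm68[30]                       # exact nose tip
    mouth_left = lm68[48]                 # exact left mouth corner
    mouth_right = lm68[54]                # exact right mouth corner
    return np.stack([left_eye, right_eye, nose, mouth_left, mouth_right])
```

Averaging the 6 eye-contour points gives a more stable eye center than any single coarse detector keypoint, which is the point of the refinement.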
Source: FaceFusion assets.
## Person Detection
| Model | Description | Size | Input |
|---|---|---|---|
| `yolov8n.onnx` | YOLOv8n person detection (COCO class 0) | 12 MB | 640px |
Used to distinguish "person left frame" vs "face occluded" during face swap.
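A minimal sketch of that disambiguation (names here are illustrative, not the GPU Worker's actual API):

```python
from enum import Enum

class FaceLossReason(Enum):
    FACE_VISIBLE = "face_visible"
    FACE_OCCLUDED = "face_occluded"    # person present, face not detected
    PERSON_LEFT = "person_left_frame"  # no person detected at all

def classify_face_loss(face_detected: bool, person_detected: bool) -> FaceLossReason:
    """Disambiguate why the face tracker lost its target for a frame."""
    if face_detected:
        return FaceLossReason.FACE_VISIBLE
    if person_detected:
        # YOLOv8n found a person (COCO class 0) but the face detector did not
        # fire, so the face is most likely covered by a hand or object.
        return FaceLossReason.FACE_OCCLUDED
    return FaceLossReason.PERSON_LEFT
```

The two failure modes typically warrant different handling, e.g. holding the last mask briefly during occlusion vs. resetting tracking when the person leaves.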
## Face Mask Models (FaceFusion 4-Mask System)
Occlusion detection (XSeg) and semantic face parsing (BiSeNet) models for composable mask pipeline.
Used in the GPU Worker's `face_masker.py` for box/occlusion/area/region masks.
Source: FaceFusion 3.x assets (Apache-2.0), mirrored here for reliability.
### XSeg – Occlusion Detection
| Model | Description | Size | Input | Output |
|---|---|---|---|---|
| `dfl_xseg.onnx` | XSeg v1, legacy binary face mask (not used) | 67 MB | 256px | binary (face/bg) |
| `xseg_1.onnx` | XSeg model 1, occlusion detection | 67 MB | 256px | binary (face/bg) |
| `xseg_2.onnx` | XSeg model 2, occlusion detection | 67 MB | 256px | binary (face/bg) |
| `xseg_3.onnx` | XSeg model 3, occlusion detection | 67 MB | 256px | binary (face/bg) |
Runtime model selection via IPC: `many` (all three intersected), `xseg_1`, `xseg_2`, or `xseg_3`.

Input: NHWC float32 in [0, 1]. Output: the intersection of all selected model masks (the most conservative result).
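For soft masks in [0, 1], "intersection" amounts to an elementwise minimum: a pixel survives only if every selected model keeps it. A minimal sketch (illustrative, not the GPU Worker's actual code):

```python
import numpy as np

def intersect_masks(masks: list[np.ndarray]) -> np.ndarray:
    """Combine several XSeg occlusion masks conservatively: each pixel's
    value is the minimum across models, so any model can veto a pixel."""
    assert masks, "need at least one mask"
    combined = masks[0]
    for m in masks[1:]:
        combined = np.minimum(combined, m)  # elementwise intersection
    return combined
```

This is what the `many` mode above corresponds to; the single-model modes skip the reduction.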
### BiSeNet – Region Segmentation
| Model | Description | Size | Input | Classes |
|---|---|---|---|---|
| `bisenet_resnet_34.onnx` | BiSeNet ResNet-34 (default) | 89 MB | 512px | 19 regions |
| `bisenet_resnet_18.onnx` | BiSeNet ResNet-18 (lighter) | 51 MB | 512px | 19 regions |
Runtime model selection via IPC. Input: NCHW float32, ImageNet-normalized. 10 configurable face regions: skin, left-eyebrow, right-eyebrow, left-eye, right-eye, glasses, upper-lip, nose, lower-lip, mouth.
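Building a region mask from the 19-class parsing map is a simple class-membership test. Sketch below; the class indices follow the common CelebAMask-HQ face-parsing convention and are an assumption, so verify them against the actual model's label map:

```python
import numpy as np

# Assumed CelebAMask-HQ-style indices; confirm against the model before use.
REGION_CLASSES = {
    "skin": 1, "left-eyebrow": 2, "right-eyebrow": 3,
    "left-eye": 4, "right-eye": 5, "glasses": 6,
    "nose": 10, "mouth": 11, "upper-lip": 12, "lower-lip": 13,
}

def region_mask(parsing: np.ndarray, regions: list[str]) -> np.ndarray:
    """Turn a (H, W) int parsing map into a float mask that is 1.0
    wherever the pixel's class belongs to one of the selected regions."""
    wanted = np.array([REGION_CLASSES[r] for r in regions])
    return np.isin(parsing, wanted).astype(np.float32)
```

Selecting e.g. `["skin", "nose"]` yields the union of those semantic regions, which the mask pipeline can then combine with the XSeg and box masks.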
## TensorRT Engine Cache
Pre-compiled TensorRT engines are stored in the `trt_cache/` subfolder, keyed by GPU architecture and software versions. This eliminates cold-start TRT compilation (~180-300 s) on new GPU instances.
### Layout
```
trt_cache/
├── sm89/trt10.9_ort1.24/                    # RTX 4090 (Ada Lovelace)
│   ├── manifest.json                        # Metadata: cache_key, engine list, timestamps
│   ├── TensorrtExecutionProvider_*.engine   # Compiled TRT engines
│   ├── TensorrtExecutionProvider_*.profile  # Profiling data
│   └── timing.cache                         # cuDNN/TRT timing optimization cache
├── sm86/trt10.9_ort1.24/                    # RTX 3090 (Ampere)
│   └── ...
└── sm80/trt10.9_ort1.24/                    # A100 (Ampere)
    └── ...
```
### Cache Key
Format: `{gpu_arch}/trt{trt_version}_ort{ort_version}`
| Component | Example | Source |
|---|---|---|
| `gpu_arch` | `sm89` | `nvidia-smi --query-gpu=compute_cap` → `8.9` → `sm89` |
| `trt_version` | `10.9` | `tensorrt.__version__` major.minor |
| `ort_version` | `1.24` | `onnxruntime.__version__` major.minor |
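Putting the table together, deriving the key is a few string operations (sketch; the helper name is illustrative):

```python
def build_cache_key(compute_cap: str, trt_version: str, ort_version: str) -> str:
    """Derive the trt_cache key from the GPU compute capability and the
    TensorRT / ONNX Runtime versions, keeping only major.minor."""
    gpu_arch = "sm" + compute_cap.replace(".", "")        # "8.9" -> "sm89"
    trt_mm = ".".join(trt_version.split(".")[:2])         # "10.9.0.34" -> "10.9"
    ort_mm = ".".join(ort_version.split(".")[:2])         # "1.24.1" -> "1.24"
    return f"{gpu_arch}/trt{trt_mm}_ort{ort_mm}"
```

Truncating to major.minor keeps patch releases from fragmenting the cache while still invalidating it when an ABI-relevant version bump lands.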
### Lifecycle
- **Download**: at container boot, the GPU Worker checks HF for a matching cache key. If found, it downloads all engines (~10-30 s vs. ~180-300 s compile).
- **Compile**: if no cache exists on HF, ONNX Runtime compiles the TRT engines from scratch on first model load.
- **Self-seed upload**: after compilation, the engines are uploaded to HF so future instances skip compilation.
- **Incremental upload**: if engines were downloaded from HF but new models were compiled locally afterwards (e.g., YOLOv8n during warmup), only the new engines are uploaded.
### manifest.json

```json
{
  "cache_key": "sm89/trt10.9_ort1.24",
  "gpu_arch": "sm89",
  "trt_version": "10.9",
  "ort_version": "1.24",
  "created_at": "2025-03-07T12:00:00Z",
  "machine_id": "C.12345",
  "engine_files": [
    "TensorrtExecutionProvider_model_hash.engine",
    "TensorrtExecutionProvider_model_hash.profile",
    "timing.cache"
  ]
}
```
The manifest serves as both metadata and an upload gate: its presence signals that the cache was downloaded, and the `engine_files` list enables incremental-upload detection.
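That detection reduces to a set difference between the on-disk engine files and the `engine_files` recorded in the manifest. Sketch (illustrative; the real logic in `trt_cache.py` may differ):

```python
import json
from pathlib import Path

def new_engines_to_upload(cache_dir: Path) -> list[Path]:
    """Return engine/profile files present on disk but absent from
    manifest.json, i.e. those compiled locally after the cache download."""
    manifest = json.loads((cache_dir / "manifest.json").read_text())
    known = set(manifest["engine_files"])
    on_disk = {p.name for p in cache_dir.glob("TensorrtExecutionProvider_*")}
    return [cache_dir / name for name in sorted(on_disk - known)]
```

If `manifest.json` is missing, the cache was never downloaded, so the self-seed path (upload everything) applies instead.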
## GFPGAN (optional, not used in real-time)
Face restoration and enhancement. Too slow for real-time streaming (~50-150ms per frame).
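"Too slow" follows from simple frame-budget arithmetic: at streaming frame rates, one stage taking 50-150 ms exceeds the entire per-frame budget.

```python
def fits_realtime(per_frame_ms: float, fps: float = 30.0) -> bool:
    """A pipeline stage fits the real-time budget only if it finishes
    within one frame interval (1000 / fps milliseconds)."""
    budget_ms = 1000.0 / fps  # ~33.3 ms at 30 fps
    return per_frame_ms <= budget_ms
```

Even GFPGAN's best case (~50 ms) overshoots the ~33 ms budget at 30 fps before any other stage has run, which is why it is excluded from the real-time path.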
| Model | Description | Size |
|---|---|---|
| `gfpgan/GFPGANv1.4.pth` | GFPGAN v1.4 restoration | 332 MB |
| `gfpgan/weights/detection_Resnet50_Final.pth` | RetinaFace detector | 104 MB |
| `gfpgan/weights/parsing_parsenet.pth` | ParseNet segmentation | 81 MB |
## Usage

### GPU Worker (production)
Models are baked into the Docker image at build time (buffalo_l + default swap + landmark + mask models). Alternative swap models (HyperSwap) are downloaded on demand by `ModelDownloadService`.

The TRT engine cache is downloaded asynchronously at boot via `trt_cache.py` (non-blocking: `/health` responds immediately).
```bash
# Manual download (local development)
HF_TOKEN=hf_xxx ./scripts/download_models.sh /models
```
### Docker build

```bash
docker build --build-arg HF_TOKEN=hf_xxx -t morphstream-gpu-worker .
```
### Python (huggingface_hub)

```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="latark/MorphStream",
    filename="inswapper_128_fp16.onnx",
    token="hf_xxx",
)
```
## Scripts
### convert_scrfd_fp16.py

Converts SCRFD `det_10g.onnx` from FP32 to FP16:

```bash
pip install onnx onnxconverter-common
python scripts/convert_scrfd_fp16.py \
    --input buffalo_l/det_10g.onnx \
    --output buffalo_l/det_10g_fp16.onnx
```
Key detail: `op_block_list=['BatchNormalization']` keeps BatchNormalization nodes in FP32, preventing epsilon underflow (1e-5 → 0 in FP16 → NaN).
## License
MIT License