YOLO11 Segmentation — EdgeFirst Model Zoo

YOLO11 Segmentation models trained on COCO 2017 (80 classes) and validated on real edge hardware through the EdgeFirst Profiler + Validator pipeline. Each row in the tables below cites the EdgeFirst Studio validation session (v-XXXX) that produced the measurement.

Part of the EdgeFirst Model Zoo.

Training experiment: View on EdgeFirst Studio — dataset, training configuration, metrics, and exported artifacts.

YOLO11 architecture: C3k2 blocks with C2PSA attention.

Reference accuracy — ONNX FP32

Accuracy ceiling for each size, measured against COCO val2017 (5,000 images) with pycocotools. Quantized and compiled artifacts (TFLite INT8, HEF, etc.) are graded against this reference per the EdgeFirst publication rule.

Size	Params	GFLOPs	Box mAP@0.5	Box mAP@0.5-0.95	Mask mAP@0.5	Mask mAP@0.5-0.95	Source
Nano	2.6M	6.5	53.27%	38.14%	50.06%	30.75%	v-e97
Small	9.4M	21.5	61.72%	45.35%	58.20%	35.72%	v-e98
Medium	20.1M	68.0	67.11%	50.47%	63.55%	39.43%	v-e9a
Large	25.3M	87.6	—	—	—	—	—
XLarge	56.9M	195.0	—	—	—	—	—

Sizes. The EdgeFirst Model Zoo currently validates Nano, Small, and Medium. The Large and XLarge variants are not evaluated at this time — their parameter and GFLOP counts are listed above for reference, with accuracy shown as —.

Accuracy methodology & relation to Ultralytics

Every model in this zoo uses the official Ultralytics pretrained weights, byte-for-byte — there is no re-training. These are the same models Ultralytics ships, measured on the deployment-realistic path: a fixed-input ONNX graph (square letterbox, rect=False), stock pycocotools AP@[maxDets=100], and COCO crowd regions scored as normal detections. Ultralytics' headline COCO numbers use their internal validator (rectangular inference, crowd-ignored, maxDets=300), so a small, fully-explained offset on identical weights is expected — not an accuracy deficit.

The same relationship holds through the shared detection head — see any EdgeFirst detection card for the full three-way reconciliation table. The Mask mAP above is stock pycocotools segmentation AP on identical Ultralytics weights and tracks Ultralytics' official mask figures within a comparable ~1–2 pp deployment-methodology offset.

On-target validation results

Each row is one EdgeFirst Studio validation session. Click the Source link to inspect the full session — model artifact, dataset version, parameters, per-stage Perfetto trace, and the host hardware description (hostname, kernel version, SoC, NPU, profiler version).

Row conventions in the table below:

Rows whose Δ cell reads ref are the float reference runs each quantized/compiled measurement is graded against.
Rows without a number under the metric columns are validation sessions currently in progress, or a session not yet linked to its ONNX FP32 reference. The Studio Source link tracks the current status.
Rows whose Δ vs FP32 cell carries a ⚠ are below our accuracy expectations for that platform (more than 10 percentage points under the float reference). The numbers are real measurements on real hardware, reproducible from the linked Studio session, and we publish them as-is; we are investigating the results to make improvements, and the next snapshot of this card will reflect any recovered accuracy.
Rows whose metric cells read In progress indicate platforms where this model family already runs on target but accuracy work is still in progress with the silicon vendor, so we withhold the numbers until that work lands. The Studio source link tracks the session; the next snapshot of this card will publish the measured results once resolved.
Precision varies by target: the ONNX reference rows are FP32; macOS CoreML and NVIDIA Jetson TensorRT run FP16; the NXP i.MX 8M Plus, NXP i.MX 95 Neutron, and Hailo NPUs run INT8. The NXP Ara240 DNPU runs a mixed INT8/INT16 scheme — most of the model is INT8, with the box-regression path (and the ops feeding it) promoted to INT16 to improve localization accuracy.
Decoder variants. EdgeFirst ships three INT8 split-decoders. The table headlines the accuracy-recovering ones: smart (per-tensor rescaling — best accuracy, extra CPU ops add some latency) and, where a smart run is absent, logical (the latency-optimized default — no CPU overhead, slightly lower accuracy). The combined decoder is the standard-quantization baseline (equivalent to typical single-scale INT8, and how the reference numbers are produced); it loses the most accuracy — especially on segmentation, where box/mask dynamic range collapses under one scale — so it is published only as a downloadable reference artifact and in the metrics export, never headlined here. Smart and logical exist precisely to recover that loss. Full converter documentation: EdgeFirst model conversion — these are the converters used by this Model Zoo and the EdgeFirst Performance Index report.
Platform-label suffixes. (FRDM) / (Phytec) / (Verdin) name the development board a session ran on. — latency / — throughput mark the two pipeline configurations that targets such as the NXP i.MX 95 Neutron and NXP Ara240 run: the latency pipeline runs inference serially for the lowest per-frame latency; the throughput pipeline runs multiple inference workers for the highest FPS, which raises per-call inference time in exchange. Rows with neither suffix run a single pipeline.
End-to-end (ms) is the sequential per-image latency of the compute pipeline — preprocess → inference → postprocess. Image acquisition (camera or file load + JPEG decode) overlaps these stages and is excluded from this figure.
Realized FPS vs Core-throughput ceiling (FPS). Realized FPS is the measured steady-state throughput: the rate at which final results are actually delivered over the full validation pipeline. It normally exceeds 1000 / end-to-end because the runtime overlaps stages across frames, and it is the figure to prioritize. Core-throughput ceiling (FPS), shown with a ~, is 1000 / device-compute-time, the rate the NPU/DNPU could sustain if it were the only bottleneck, so it is an upper bound that may be achievable, not a measured claim. It is read from the isolated device-compute stage, which is stable and load-independent, unlike the host capture/preprocess stages, whose measured time inflates when the pipeline is backpressured. Whether a deployment approaches the ceiling depends on the surrounding pipeline, and two levers dominate: (1) host bottlenecks: these validation runs decode a JPEG per image, whereas a live camera pipeline skips that decode and can run closer to the ceiling; and (2) confidence threshold: validation runs at 0.001 to capture every detection for mAP, which makes NMS/decode heavy, while a deployment threshold of 0.25–0.75 produces far fewer candidate boxes and lighter postprocessing, raising realized FPS toward the ceiling.
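The relationship between the latency and throughput columns can be sketched with a little arithmetic. The helper names here are illustrative and not part of any EdgeFirst tool. The 43.75 ms end-to-end and 40.4 realized FPS values come from the Nano Raspberry Pi 5 + Hailo-8L row; the 24.0 ms device-compute time is a made-up value, because the card does not publish that stage as its own column.

```python
def serial_fps(end_to_end_ms: float) -> float:
    """FPS if preprocess -> inference -> postprocess never overlapped across frames."""
    return 1000.0 / end_to_end_ms


def core_throughput_ceiling(device_compute_ms: float) -> float:
    """FPS the accelerator could sustain if it were the only bottleneck."""
    return 1000.0 / device_compute_ms


end_to_end_ms = 43.75      # from the results table
realized_fps = 40.4        # from the results table
device_compute_ms = 24.0   # hypothetical, for illustration only

serial = serial_fps(end_to_end_ms)
ceiling = core_throughput_ceiling(device_compute_ms)
print(f"serial bound {serial:.1f} FPS < realized {realized_fps} FPS < ceiling ~{ceiling:.0f} FPS")
```

Realized FPS sits between the two bounds: stage overlap lifts it above the serial figure, and host-side work keeps it below the accelerator ceiling.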

Size	Platform	Box mAP@0.5	Mask mAP@0.5-0.95	Δ mask vs FP32 (pp)	Inference (ms)	End-to-end (ms)	Realized FPS	Core-throughput ceiling (FPS)	Source
Nano	ONNX FP32 (AWS Graviton · 4-core)	53.30%	30.74%	-0.01	338.67	374.17	11.4	~3	v-ed7
Nano	ONNX FP32 (AWS Graviton4 · 48-core)	53.30%	30.74%	-0.01	98.47	124.43	121.1	~10	v-eed
Nano	ONNX FP32 (AWS Graviton4 · 8-core)	53.30%	30.74%	-0.01	284.66	306.90	26.9	~4	v-ee4
Nano	ONNX FP32 (Intel Core i9-13900F · 32-core)	53.28%	30.74%	-0.01	51.98	77.65	51.2	~51	v-a4a
Nano	ONNX FP32 (Intel Xeon Platinum 8488C · 24-core)	53.28%	30.74%	-0.01	94.20	134.37	105.0	~11	v-eaf
Nano	ONNX FP32 (Intel Xeon Platinum 8488C · 4-core)	53.28%	30.74%	-0.01	216.36	260.38	33.2	~5	v-ea6
Nano	ONNX FP32 (CUDA)	53.27%	30.75%	ref	12.39	40.66	129.6	~146	v-e97
Nano	ONNX FP32 (CUDA)	53.28%	30.75%	+0.00	10.76	21.15	244.6	~244	v-a8c
Nano	ONNX FP16 (CUDA)	53.29%	30.74%	-0.01	8.42	19.60	305.4	~305	v-aa1
Nano	Apple M2 Max — CoreML Neural Engine (FP16)	52.89%	30.50%	-0.25	2.74	12.68	349.9	~351	v-9f0
Nano	Apple M2 Max — CoreML Metal GPU (FP16)	51.85%	29.94%	-0.81	6.86	18.07	290.7	~291	v-739
Nano	Apple M2 Max — CoreML CPU (FP16)	52.85%	30.51%	-0.24	20.86	29.03	90.6	~91	v-9f1
Nano	Apple iPhone 17 Pro — CoreML Neural Engine (FP16)	53.28%	30.71%	-0.04	2.90	25.48	176.1	~409	v-c07
Nano	Apple iPhone 17 Pro — CoreML Metal GPU (FP16)	53.30%	30.73%	-0.02	11.84	31.67	132.7	~149	v-c05
Nano	Apple iPhone 17 Pro — CoreML CPU (FP16)	53.25%	30.74%	-0.01	27.12	38.98	70.7	~73	v-c00
Nano	Apple iPhone 15 Pro — CoreML Neural Engine (FP16)	53.27%	30.72%	-0.03	2.98	32.96	132.9	~386	v-c0d
Nano	Apple iPhone 15 Pro — CoreML Metal GPU (FP16)	53.29%	30.74%	-0.01	19.61	56.22	80.0	~89	v-c0b
Nano	Apple iPhone 15 Pro — CoreML CPU (FP16)	53.32%	30.75%	+0.00	35.62	52.55	53.6	~56	v-c09
Nano	NXP i.MX 8M Plus + VeriSilicon NPU (FRDM)	48.41%	27.17%	-3.58	104.78	200.77	8.6	~9	v-8d1
Nano	NXP i.MX 8M Plus + VeriSilicon NPU (Verdin) — latency	37.52%	21.48%	-9.27 ⚠	110.06	162.41	8.2	~9	v-c86
Nano	NXP i.MX 8M Plus + VeriSilicon NPU (Verdin) — throughput	48.41%	27.17%	-3.58	105.33	214.20	8.5	~9	v-c85
Nano	NXP i.MX 95 + eIQ Neutron NPU (FRDM) — latency	In progress	—	—	—	—	—	—	v-8b9
Nano	NXP i.MX 95 + eIQ Neutron NPU (FRDM) — throughput	In progress	—	—	—	—	—	—	v-8be
Nano	NXP i.MX 95 + eIQ Neutron NPU (Phytec) — throughput	In progress	—	—	—	—	—	—	v-89c
Nano	NXP i.MX 95 + eIQ Neutron NPU (Verdin) — latency	In progress	—	—	—	—	—	—	v-bd1
Nano	NXP i.MX 95 + eIQ Neutron NPU (Verdin) — throughput	In progress	—	—	—	—	—	—	v-e0e
Nano	NXP Ara240 (FRDM) — latency	49.18%	28.03%	-2.72	9.56	40.86	57.4	~63	v-a1a
Nano	NXP Ara240 (FRDM) — throughput	49.18%	28.03%	-2.72	9.80	61.08	58.1	~229	v-a1b
Nano	Raspberry Pi 5 + Hailo-8L NPU	51.15%	29.32%	-1.43	24.26	43.75	40.4	~40	v-a44
Nano	NVIDIA Jetson Orin Nano (TensorRT FP16)	53.27%	30.77%	+0.02	6.84	58.61	79.2	~66	v-91b
Small	ONNX FP32 (AWS Graviton · 4-core)	61.74%	35.71%	-0.01	953.20	979.26	4.2	~1	v-ed8
Small	ONNX FP32 (AWS Graviton4 · 8-core)	61.74%	35.71%	-0.01	789.81	803.95	10.0	~1	v-ee8
Small	ONNX FP32 (AWS Graviton4 · 48-core)	61.74%	35.71%	-0.01	261.88	278.31	48.7	~4	v-ee1
Small	ONNX FP32 (Intel Core i9-13900F · 32-core)	61.72%	35.73%	+0.01	128.91	151.38	21.5	~21	v-a51
Small	ONNX FP32 (Intel Xeon Platinum 8488C · 24-core)	61.72%	35.72%	+0.00	232.11	268.33	53.1	~4	v-eb2
Small	ONNX FP32 (Intel Xeon Platinum 8488C · 4-core)	61.72%	35.72%	+0.00	458.07	484.61	17.0	~2	v-eae
Small	ONNX FP32 (CUDA)	61.72%	35.72%	ref	14.47	39.23	143.8	~151	v-e98
Small	ONNX FP32 (CUDA)	61.72%	35.72%	+0.00	19.67	30.87	153.4	~153	v-a93
Small	ONNX FP16 (CUDA)	61.69%	35.73%	+0.01	13.04	25.38	224.4	~224	v-aa8
Small	Apple M2 Max — CoreML Neural Engine (FP16)	60.16%	34.89%	-0.83	6.81	13.26	253.7	~254	v-730
Small	Apple M2 Max — CoreML Metal GPU (FP16)	61.03%	35.29%	-0.43	21.32	29.98	134.1	~134	v-9f2
Small	Apple M2 Max — CoreML CPU (FP16)	61.03%	35.29%	-0.43	40.34	47.99	48.1	~48	v-9f3
Small	Apple iPhone 17 Pro — CoreML Neural Engine (FP16)	60.98%	35.34%	-0.38	7.78	21.29	189.4	~197	v-c62
Small	Apple iPhone 17 Pro — CoreML Metal GPU (FP16)	61.06%	35.32%	-0.40	21.66	36.28	84.3	~85	v-c70
Small	Apple iPhone 17 Pro — CoreML CPU (FP16)	61.03%	35.30%	-0.42	58.74	74.70	33.2	~33	v-c69
Small	Apple iPhone 15 Pro — CoreML Neural Engine (FP16)	60.98%	35.35%	-0.37	6.44	28.97	152.4	~188	v-c2e
Small	Apple iPhone 15 Pro — CoreML Metal GPU (FP16)	61.07%	35.33%	-0.39	63.39	76.38	30.7	~31	v-c3d
Small	Apple iPhone 15 Pro — CoreML CPU (FP16)	61.02%	35.30%	-0.42	70.42	91.59	27.6	~28	v-c36
Small	NXP i.MX 8M Plus + VeriSilicon NPU (FRDM)	42.80%	22.19%	-13.53 ⚠	198.08	285.44	4.8	~5	v-955
Small	NXP i.MX 8M Plus + VeriSilicon NPU (Verdin) — latency	32.07%	16.65%	-19.07 ⚠	203.33	254.84	4.6	~5	v-c98
Small	NXP i.MX 8M Plus + VeriSilicon NPU (Verdin) — throughput	42.80%	22.19%	-13.53 ⚠	198.54	298.59	4.8	~5	v-c96
Small	NXP Ara240 (FRDM) — latency	57.97%	33.61%	-2.11	15.63	42.29	53.6	~54	v-a28
Small	NXP Ara240 (FRDM) — throughput	57.97%	33.61%	-2.11	16.15	66.74	63.7	~94	v-a29
Small	Raspberry Pi 5 + Hailo-8L NPU	59.50%	34.43%	-1.29	38.47	57.35	23.8	~24	v-8ed
Small	NVIDIA Jetson Orin Nano (TensorRT FP16)	61.68%	35.73%	+0.01	15.14	64.68	88.1	~68	v-923
Medium	ONNX FP32 (AWS Graviton · 4-core)	67.13%	39.45%	+0.02	3010.99	3034.53	1.3	~0	v-ed9
Medium	ONNX FP32 (AWS Graviton4 · 8-core)	67.13%	39.45%	+0.02	2447.34	2458.34	3.3	~0	v-eea
Medium	ONNX FP32 (AWS Graviton4 · 48-core)	67.13%	39.45%	+0.02	833.72	852.30	15.3	~1	v-ee3
Medium	ONNX FP32 (Intel Core i9-13900F · 32-core)	67.10%	39.43%	+0.00	364.67	385.66	7.9	~8	v-a58
Medium	ONNX FP32 (Intel Xeon Platinum 8488C · 24-core)	67.10%	39.43%	+0.00	675.69	710.38	18.6	~1	v-eb5
Medium	ONNX FP32 (Intel Xeon Platinum 8488C · 4-core)	67.10%	39.43%	+0.00	1727.02	1753.18	4.6	~1	v-eb0
Medium	ONNX FP32 (CUDA)	67.11%	39.43%	ref	33.42	50.53	105.1	~105	v-e9a
Medium	ONNX FP32 (CUDA)	67.11%	39.42%	-0.01	53.53	63.75	64.2	~64	v-a9a
Medium	ONNX FP16 (CUDA)	67.10%	39.44%	+0.01	28.97	42.38	114.2	~114	v-aaf
Medium	Apple M2 Max — CoreML Neural Engine (FP16)	65.03%	38.29%	-1.14	24.35	31.55	79.3	~79	v-7ff
Medium	Apple M2 Max — CoreML Metal GPU (FP16)	65.99%	38.70%	-0.73	62.30	70.45	47.2	~47	v-9e1
Medium	Apple M2 Max — CoreML CPU (FP16)	65.96%	38.67%	-0.76	90.94	98.48	21.7	~22	v-9e0
Medium	Apple iPhone 17 Pro — CoreML Neural Engine (FP16)	65.98%	38.80%	-0.63	28.12	35.90	69.0	~69	v-c61
Medium	Apple iPhone 17 Pro — CoreML Metal GPU (FP16)	66.00%	38.72%	-0.71	84.74	96.05	22.6	~23	v-c6f
Medium	Apple iPhone 17 Pro — CoreML CPU (FP16)	65.98%	38.66%	-0.77	158.35	173.90	12.5	~13	v-c68
Medium	Apple iPhone 15 Pro — CoreML Neural Engine (FP16)	65.98%	38.80%	-0.63	32.92	44.19	58.5	~59	v-c2c
Medium	Apple iPhone 15 Pro — CoreML Metal GPU (FP16)	66.00%	38.71%	-0.72	199.21	216.50	9.8	~10	v-c3c
Medium	Apple iPhone 15 Pro — CoreML CPU (FP16)	65.98%	38.68%	-0.75	203.18	226.79	9.7	~10	v-c35
Medium	NXP i.MX 8M Plus + VeriSilicon NPU (FRDM)	54.71%	30.31%	-9.12 ⚠	392.18	468.93	2.5	~3	v-9ab
Medium	NXP i.MX 8M Plus + VeriSilicon NPU (Verdin) — latency	40.48%	22.86%	-16.57 ⚠	401.31	452.24	2.4	~2	v-ca9
Medium	NXP i.MX 8M Plus + VeriSilicon NPU (Verdin) — throughput	54.71%	30.31%	-9.12 ⚠	396.28	488.91	2.5	~3	v-ca8
Medium	NXP Ara240 (FRDM) — latency	62.87%	37.36%	-2.07	39.06	61.74	24.8	~25	v-a36
Medium	NXP Ara240 (FRDM) — throughput	62.85%	37.35%	-2.08	39.18	63.09	29.1	~29	v-a37
Medium	Raspberry Pi 5 + Hailo-8L NPU	64.44%	37.62%	-1.81	79.87	99.64	11.8	~12	v-934
Medium	NVIDIA Jetson Orin Nano (TensorRT FP16)	67.07%	39.43%	+0.00	72.33	97.67	54.8	~55	v-92c
Medium	Apple iPhone 17 Pro — CoreML Neural Engine (FP16)	65.98%	38.80%	-0.63	33.57	42.62	58.0	~30	v-f77
Medium	Apple iPhone 17 Pro — CoreML Metal GPU (FP16)	66.00%	38.72%	-0.71	85.00	95.86	22.6	~12	v-f78
Medium	Apple iPhone 17 Pro — CoreML CPU (FP16)	65.98%	38.66%	-0.77	177.14	191.83	11.2	~6	v-f7a
Medium	Apple iPhone 15 Pro — CoreML Neural Engine (FP16)	65.98%	38.80%	-0.63	39.88	53.11	48.5	~25	v-f4c
Medium	Apple iPhone 15 Pro — CoreML Metal GPU (FP16)	66.00%	38.71%	-0.72	212.26	228.95	9.2	~5	v-f61
Medium	Apple iPhone 15 Pro — CoreML CPU (FP16)	65.98%	38.68%	-0.75	211.81	234.39	9.3	~5	v-f06

⚠ Below expectations — under investigation. The rows marked ⚠ above measure more than 10 percentage points below the same training session's float reference: the model accuracy on that platform is below our expectations. We publish the measured numbers rather than hiding them, and we are investigating the results to make improvements — the next snapshot of this card will reflect any recovered accuracy.
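As a minimal sketch of how a Δ cell and its ⚠ marker could be derived, the snippet below subtracts the float reference from the measured value in percentage points and flags anything more than 10 pp below it. The function names are hypothetical, and the card does not state which metric drives the flag, so applying the threshold to the mask delta is an assumption here.

```python
WARN_THRESHOLD_PP = 10.0  # "more than 10 percentage points" below the float reference


def delta_pp(measured_pct: float, reference_pct: float) -> float:
    """Difference in percentage points, rounded like the table cells."""
    return round(measured_pct - reference_pct, 2)


def delta_cell(measured_pct: float, reference_pct: float) -> str:
    """Format a delta cell, appending the warning marker when it applies."""
    d = delta_pp(measured_pct, reference_pct)
    flag = " ⚠" if d < -WARN_THRESHOLD_PP else ""
    return f"{d:+.2f}{flag}"


# Mask mAP@0.5-0.95 values taken from the tables above
print(delta_cell(22.19, 35.72))  # Small, i.MX 8M Plus (FRDM) vs Small reference
print(delta_cell(29.32, 30.75))  # Nano, Hailo-8L vs Nano reference
```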

Validation pipeline

These results are produced by the EdgeFirst on-target validation pipeline:

EdgeFirst Profiler runs on the target hardware, executes the full inference pipeline (image load → decode → preprocess → inference → postprocess), and emits per-image predictions in EdgeFirst Arrow/Parquet plus a Perfetto trace.
EdgeFirst Validator consumes the predictions and trace, computes pycocotools accuracy metrics and per-stage timing summaries, and publishes the results to the Studio validation session.
EdgeFirst HAL (open source) provides the hardware-accelerated preprocessing and post-decoding primitives used at both validation and deployment time, so the timings measured here reflect the same accelerated paths a production runtime would take.

Inference latency is reported as the on-accelerator inference time. End-to-end latency is the sequential per-image latency across the compute pipeline — preprocessing, inference, and postprocessing; image acquisition (file or camera load and JPEG decode) overlaps these stages and is excluded from this figure.

Two throughput figures are reported. Realized FPS is the steady-state rate at which final results are emitted. It is computed directly from the profiler's per-frame result-emission timestamps over the steady-state stream and does not depend on the trace; the Perfetto trace's own FPS is used only as a fallback on sessions where that scalar isn't available. It is the figure to prioritize, and it generally exceeds 1000 / end-to-end because the runtime overlaps stages across frames. Core-throughput ceiling (FPS) is 1000 / device-compute-time, i.e. the throughput if the accelerator were the only bottleneck. It is taken from the isolated device-compute stage (on transfer-split runtimes the trace separates host↔device transfers from device compute), which is load-independent. Host capture/preprocess service times, by contrast, inflate under pipeline backpressure (the same 5000 JPEGs cost ~7.8 ms/frame serialized but far more under throughput backpressure), so a ceiling derived from the slowest stage would understate a fast accelerator. The ceiling is an upper bound that may be achievable, not a measured result: reaching it depends on the deployment pipeline. A validation run decodes a JPEG per image and evaluates at a 0.001 confidence threshold (to capture every detection for mAP), both of which load the host and postprocess stages; a production camera pipeline (no JPEG decode) at a deployment threshold of 0.25–0.75 (far fewer candidate boxes through NMS) moves realized throughput toward the core-throughput ceiling.
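One plausible way to turn per-frame result-emission timestamps into a steady-state rate is to count the intervals between emissions after a warm-up period. This is only a sketch: the function name, the warm-up parameter, and the sample timestamps are assumptions, and the profiler's actual computation is not documented in this card.

```python
def realized_fps(emit_times_ms, warmup_frames=0):
    """Steady-state FPS from result-emission timestamps (milliseconds, ascending)."""
    steady = emit_times_ms[warmup_frames:]
    if len(steady) < 2:
        raise ValueError("need at least two steady-state timestamps")
    span_ms = steady[-1] - steady[0]
    if span_ms <= 0:
        raise ValueError("timestamps must be strictly increasing overall")
    # N timestamps bound N - 1 frame intervals
    return (len(steady) - 1) * 1000.0 / span_ms


# Hypothetical stream: two slow warm-up frames, then one result every 25 ms
timestamps = [0, 120, 200, 225, 250, 275, 300]
fps = realized_fps(timestamps, warmup_frames=2)
print(f"{fps:.1f} FPS")
```

Because only emission times are used, the result reflects overlapped pipeline stages without needing per-stage trace data.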

See EdgeFirst Studio for the full validation pipeline.

Downloads

Artifacts are organized by deployment target. Each model file embeds the EdgeFirst edgefirst.json metadata (training session, dataset version, calibration artifact, converter chain) so a single file is sufficient for deployment — no sidecar configuration required.

Browse and download every artifact from the repository file tree. Files are organized into per-target folders and follow the naming convention yolo11{size}-seg-{precision}[-smart]{extension}:

Target	Folder	Format
ONNX FP32	`onnx/`	`.onnx`
TFLite INT8	`tflite/`	`.tflite`
NXP i.MX 95 (eIQ Neutron)	`imx95/`	`.imx95.tflite`
NXP Ara240	`ara240/`	`.dvm`
RPi5 + Hailo-8L (13 TOPS)	`hailo/`	`.hailo8l.hef`
NVIDIA Jetson (TensorRT)	`jetson/`	`.engine`
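The naming convention and folder table can be combined into a small path builder. This is a sketch under assumptions: the dictionary keys are invented short names for the targets, the size letter follows the yolo11n-seg-int8.tflite name used in the inference example, and int8 is the only precision token confirmed by this card.

```python
# (folder, extension) per target, copied from the table above; keys are hypothetical
TARGETS = {
    "onnx": ("onnx/", ".onnx"),
    "tflite": ("tflite/", ".tflite"),
    "imx95": ("imx95/", ".imx95.tflite"),
    "ara240": ("ara240/", ".dvm"),
    "hailo": ("hailo/", ".hailo8l.hef"),
    "jetson": ("jetson/", ".engine"),
}


def artifact_path(target: str, size: str, precision: str, smart: bool = False) -> str:
    """Build {folder}yolo11{size}-seg-{precision}[-smart]{extension}."""
    folder, extension = TARGETS[target]
    suffix = "-smart" if smart else ""
    return f"{folder}yolo11{size}-seg-{precision}{suffix}{extension}"


print(artifact_path("tflite", "n", "int8"))
print(artifact_path("tflite", "n", "int8", smart=True))
```

Check the repository file tree for the exact precision tokens used by each target before relying on a generated path.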

Inference example (Python)

from edgefirst.hal import Model, TensorImage

# Load the model — embedded edgefirst.json carries labels and decoder config
model = Model("yolo11n-seg-int8.tflite")

# Run inference on an image
image = TensorImage.from_file("image.jpg")
results = model.predict(image)

# Iterate detections
for det in results.detections:
    print(f"{det.label}: {det.confidence:.2f} at {det.bbox}")
# Segmentation models additionally return one per-instance binary mask per
# detection (a H×W array thresholded at 128), decoded from the prototype masks
# and mask coefficients. See the EdgeFirst HAL mask materialization / overlay
# helpers for accessing and drawing them.
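To make the mask decode step concrete, the following sketch shows the usual YOLO-style assembly: each pixel's logit is the dot product of the detection's mask coefficients with the prototype values at that pixel, passed through a sigmoid and thresholded. It does not call the EdgeFirst HAL. Scaling the sigmoid output to 0..255 and using >= 128 as the cut-off are assumptions, and box cropping and upsampling from the prototype resolution are omitted.

```python
import math


def assemble_mask(protos, coefs, threshold=128):
    """protos: H x W x K nested lists, coefs: K values. Returns an H x W list of 0/1."""
    mask = []
    for row in protos:
        out_row = []
        for pixel in row:
            logit = sum(c * p for c, p in zip(coefs, pixel))
            prob = 1.0 / (1.0 + math.exp(-logit))
            # Scale to 0..255 and binarize (assumed convention)
            out_row.append(1 if round(prob * 255) >= threshold else 0)
        mask.append(out_row)
    return mask


# Tiny hypothetical example: 2 x 2 prototypes with K = 2 channels
protos = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [-1.0, -1.0]],
]
coefs = [2.0, -3.0]
mask = assemble_mask(protos, coefs)
print(mask)
```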

EdgeFirst HAL

Traceability

Every measurement in the tables above is reachable through the EdgeFirst Studio validation framework. The v-XXXX Source link on each row resolves to a public Studio URL of the form:

https://edgefirst.studio/public/validation/v-XXXX/details?mode=charts

The link lands on the Charts view — live system traces (CPU, memory, temperature, power) and per-stage timing recorded during the validation run. The Info and Metrics tabs on the same page carry the configuration and full COCO metric breakdown.

From there, the full provenance chain is one click deeper: training session ID, dataset version, calibration artifact, converter chain (e.g. TFLite quantizer + Neutron compile), validation parameters, and the host hardware description (hostname, kernel version, SoC, NPU, profiler version). The same model file you download from this repository embeds the same chain in its edgefirst.json metadata.
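For scripted lookups, a session ID from the Source column can be expanded into the public URL form shown above. The helper name and the ID pattern check are assumptions; only the URL template itself comes from this card.

```python
import re

BASE_URL = "https://edgefirst.studio/public/validation"


def session_url(session_id: str) -> str:
    """Expand an ID such as 'v-e97' into the public Charts-view URL."""
    # Assumed ID shape: 'v-' followed by lowercase letters or digits
    if not re.fullmatch(r"v-[0-9a-z]+", session_id):
        raise ValueError(f"unexpected session id: {session_id!r}")
    return f"{BASE_URL}/{session_id}/details?mode=charts"


print(session_url("v-e97"))
```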

See also

Model	Task	Link
YOLOv5 Detection	Detection	EdgeFirst/yolov5-det
YOLOv8 Detection	Detection	EdgeFirst/yolov8-det
YOLOv8 Segmentation	Segmentation	EdgeFirst/yolov8-seg
YOLO11 Detection	Detection	EdgeFirst/yolo11-det
YOLO26 Detection	Detection	EdgeFirst/yolo26-det
YOLO26 Segmentation	Segmentation	EdgeFirst/yolo26-seg

Train your own with EdgeFirst Studio

Train on your own dataset with EdgeFirst Studio:

Free tier includes YOLO training with automatic INT8 quantization and edge deployment.
Upload datasets via EdgeFirst Recorder or COCO/YOLO format.
AI-assisted annotation with auto-labeling.
CameraAdaptor integration for native sensor format training.
Deploy trained models to edge devices via EdgeFirst Client.

Technical notes

Quantization pipeline

All TFLite INT8 models are produced by EdgeFirst's quantization pipeline (details):

ONNX export — standard Ultralytics export with simplify=True
TF-wrapped ONNX — box coordinates normalized to [0, 1] inside DFL decode
Split decoder — boxes, scores, and mask coefficients split into separate output tensors so each receives an independent INT8 quantization scale
Smart calibration — calibration samples selected via greedy coverage maximization; the artifact is content-addressed by parameter hash and cached in Studio for deterministic reuse
Full integer INT8 — uint8 input, int8 output, MLIR quantizer
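The card does not define what "coverage" means for smart calibration, so the following is only a generic sketch of greedy coverage maximization: at each step it picks the sample that adds the most items not yet covered. Treating coverage as the set of class labels present in an image is an assumption made for illustration, as are the sample names.

```python
def greedy_coverage(samples, budget):
    """samples: dict of name -> set of covered items. Returns (chosen names, covered set)."""
    covered = set()
    chosen = []
    remaining = dict(samples)
    while remaining and len(chosen) < budget:
        # Sorted iteration makes tie-breaking deterministic
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # no remaining sample adds anything new
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


# Hypothetical candidate images and the classes each one contains
candidates = {
    "img_a": {"person", "car"},
    "img_b": {"person", "dog", "bicycle"},
    "img_c": {"car"},
    "img_d": {"cat"},
}
chosen, covered = greedy_coverage(candidates, budget=2)
print(chosen, sorted(covered))
```

Deterministic tie-breaking matters if the selection is to be cached and reused, since the same inputs must always yield the same calibration set.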

Split decoder output format

Segmentation (e.g. yolo11n-seg):

boxes — (1, 4, 8400) normalized [0, 1] coordinates
scores — (1, 80, 8400) per-class probabilities
mask_coefs — (1, 32, 8400) per-anchor mask coefficients
protos — (1, 160, 160, 32) prototype masks

Each tensor has its own quantization scale and zero point. The EdgeFirst HAL handles dequantization and reassembly automatically; no application code change is required across NPU targets.
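The benefit of independent scales can be illustrated with standard affine quantization, where real = scale * (q - zero_point). The sketch below quantizes one box coordinate twice: once with a scale fitted to the [0, 1] box range, and once with a single scale stretched over a wider range shared with other outputs. The shared range of [-2, 6] and the sample coordinate are hypothetical, and the helper names are not HAL APIs.

```python
def affine_params(lo, hi, qmin=-128, qmax=127):
    """Scale and zero point mapping the real range [lo, hi] onto int8."""
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    return max(qmin, min(qmax, round(x / scale) + zero_point))


def dequantize(q, scale, zero_point):
    return scale * (q - zero_point)


x = 0.515  # hypothetical normalized box coordinate

split_scale, split_zp = affine_params(0.0, 1.0)      # boxes only
shared_scale, shared_zp = affine_params(-2.0, 6.0)   # hypothetical combined range

err_split = abs(x - dequantize(quantize(x, split_scale, split_zp), split_scale, split_zp))
err_shared = abs(x - dequantize(quantize(x, shared_scale, shared_zp), shared_scale, shared_zp))
print(f"split error {err_split * 640:.2f} px, shared error {err_shared * 640:.2f} px at 640 px")
```

With the wider shared range each integer step covers eight times more of the coordinate axis, which is the kind of localization loss the split decoder is meant to avoid.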

Embedded metadata

TFLite: edgefirst.json and labels.txt embedded in the ZIP-format model file
ONNX: edgefirst.json embedded in model.metadata_props

No sidecar files required; the model artifact is self-contained.
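Since the TFLite artifacts are described as ZIP-format files, the embedded metadata can likely be read with Python's standard zipfile module, which locates an archive from the directory at the end of the file. The sketch assumes edgefirst.json sits at the archive root under exactly that name; the JSON key in the demo is a placeholder, not a documented field, and the in-memory archive stands in for a real model file.

```python
import io
import json
import zipfile


def read_edgefirst_metadata(model_file):
    """model_file: path or binary file object of a ZIP-format model artifact."""
    with zipfile.ZipFile(model_file) as archive:
        with archive.open("edgefirst.json") as fh:
            return json.load(fh)


# Stand-in artifact built in memory so the sketch runs without a real model
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("edgefirst.json", json.dumps({"example_key": "example_value"}))
    archive.writestr("labels.txt", "person\nbicycle\n")
buffer.seek(0)

metadata = read_edgefirst_metadata(buffer)
print(metadata)
```

For a downloaded model, pass its path instead of the buffer. Reading the ONNX metadata_props entry would require the onnx package and is not shown.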

Limitations

COCO bias — models trained on COCO (80 classes) inherit the dataset's biases (Western-centric scenes, particular object distributions, limited weather/lighting diversity).
Quantization loss — integer quantization introduces accuracy loss relative to FP32: INT8 on the NXP i.MX 8M Plus / i.MX 95 Neutron and Hailo NPUs, and a mixed INT8/INT16 scheme on the NXP Ara240 (the box-regression path is promoted to INT16 for localization accuracy). The magnitude per platform is shown in the Δ vs FP32 column above.
Configurations under active investigation — a subset of INT8 results measure below expectations and are marked ⚠ above; these are tracked for resolution, not accepted as final. The main cases are YOLO11 / YOLO26 on the NXP i.MX 8M Plus VeriSilicon NPU (the most constrained accelerator, where the newer architectures quantize poorly) and some NXP Ara240 segmentation runs. YOLO11 / YOLO26 on the NXP i.MX 95 eIQ Neutron NPU are not yet supported (a delegate limitation) and render without numbers. Each next card snapshot reflects any recovered accuracy.
Input resolution — all models expect 640×640 input; other resolutions require letterboxing.
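For inputs that are not already 640×640, the letterbox geometry can be computed as below. This is a sketch of the common convention (uniform scale, centered padding); whether the EdgeFirst HAL centers the padding the same way is an assumption, and the function names are illustrative.

```python
def letterbox_geometry(src_w, src_h, dst=640):
    """Return (scale, (new_w, new_h), (left, top, right, bottom) padding)."""
    scale = min(dst / src_w, dst / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x, pad_y = dst - new_w, dst - new_h
    left, top = pad_x // 2, pad_y // 2
    return scale, (new_w, new_h), (left, top, pad_x - left, pad_y - top)


def unletterbox_point(x, y, scale, left, top):
    """Map a point from the 640x640 model input back to source pixels."""
    return (x - left) / scale, (y - top) / scale


scale, size, padding = letterbox_geometry(1280, 720)
print(scale, size, padding)
print(unletterbox_point(320, 320, scale, padding[0], padding[1]))
```

Predicted boxes and masks need the inverse mapping applied before they are drawn on the original frame.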

License

Model weights in this repository are derived from Ultralytics YOLO and remain © Ultralytics Inc., licensed AGPL-3.0 — use requires AGPL-3.0 compliance or an Ultralytics Enterprise License.

The validation results, this model card, and its metadata are Au-Zone Technologies' own contribution, licensed CC BY-NC 4.0 (Attribution — NonCommercial) — see the repository LICENSE for the full text and citation requirements.

Citation

@software{edgefirst_yolo11_seg,
  title = {{YOLO11 Segmentation — EdgeFirst Model Zoo}},
  author = {Au-Zone Technologies},
  url = {https://huggingface.co/EdgeFirst/yolo11-seg},
  year = {2026},
  license = {CC-BY-NC-4.0},
}

EdgeFirst Studio · GitHub · Docs · Au-Zone Technologies

Model weights © Ultralytics Inc. (AGPL-3.0) · Validation results & card © 2026 Au-Zone Technologies (CC BY-NC 4.0)

NXP®, i.MX, eIQ®, Neutron, and Ara240 are trademarks or products of NXP Semiconductors. Hailo is a trademark of Hailo Technologies Ltd. Jetson is a trademark of NVIDIA Corporation. All other trademarks are the property of their respective owners.
