YOLOv5 Detection — EdgeFirst Model Zoo

YOLOv5 Detection models trained on COCO 2017 (80 classes) and validated on real edge hardware through the EdgeFirst Profiler + Validator pipeline. Each row in the tables below cites the EdgeFirst Studio validation session (v-XXXX) that produced the measurement.

Part of the EdgeFirst Model Zoo.

Training experiment: View on EdgeFirst Studio — dataset, training configuration, metrics, and exported artifacts.

Legacy architecture, wide deployment base.

Reference accuracy — ONNX FP32

Accuracy ceiling for each size, measured against COCO val2017 (5,000 images) with pycocotools. Quantized and compiled artifacts (TFLite INT8, HEF, etc.) are graded against this reference per the EdgeFirst publication rule.

Size	Params	GFLOPs	mAP@0.5	mAP@0.5-0.95	mAP@0.75	Source
Nano	1.9M	4.5	47.75%	32.95%	35.62%	v-e61
Small	7.2M	16.5	57.33%	41.26%	44.95%	v-e74
Medium	21.2M	49.0	63.29%	46.99%	51.33%	v-e88
Large	46.5M	109.1	—	—	—	—
XLarge	86.7M	205.7	—	—	—	—

Sizes. The EdgeFirst Model Zoo currently validates Nano, Small, and Medium. The Large and XLarge variants are not evaluated at this time — their parameter and GFLOP counts are listed above for reference, with accuracy shown as —.

Accuracy methodology & relation to Ultralytics

Every model in this zoo uses the official Ultralytics pretrained weights, byte-for-byte — there is no re-training. These are the same models Ultralytics ships, measured on the deployment-realistic path: a fixed-input ONNX graph (square letterbox, rect=False), stock pycocotools AP@[maxDets=100], and COCO crowd regions scored as normal detections. Ultralytics' headline COCO numbers use their internal validator (rectangular inference, crowd-ignored, maxDets=300), so a small, fully-explained offset on identical weights is expected — not an accuracy deficit.

Reconciling nano detection (COCO val2017, mAP@0.5:0.95, FP32):

Source	Nano mAP@0.5:0.95	What it measures
EdgeFirst (this zoo)	32.95%	Full deployment path — fixed-input ONNX + `pycocotools`
Ultralytics-validator proxy	33.71%	Portable re-implementation of the Ultralytics validator
Ultralytics (official)	34.3%	Ultralytics' published COCO figure

The spread between the EdgeFirst and official Nano figures is about 1.35 pp (34.3% vs 32.95%). It decomposes into a ~0.6–0.7 pp methodology leg (square-letterbox / crowd / maxDets) and a ~0.5–0.9 pp deployment-decode leg. The methodology leg is measured, not assumed: a rect=True + crowd-ignored parity pass reproduces the official figure (e.g. YOLOv5n → 34.4 vs 34.3 official). The decode leg shrinks toward zero for the NMS-free YOLO26, which already matches its official number. None of the offset reflects a weight or training difference.
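As a quick arithmetic check on the Nano reconciliation table above, the gaps between successive rows can be computed directly. This is a minimal sketch that uses only the three published figures; the variable names are illustrative, and the code does not by itself attribute either gap to a specific cause.

```python
# Nano mAP@0.5:0.95 figures copied from the reconciliation table (percent).
edgefirst = 32.95
validator_proxy = 33.71
official = 34.3

# Differences in percentage points, rounded to the table's two decimals.
total_spread = round(official - edgefirst, 2)
proxy_to_official = round(official - validator_proxy, 2)
edgefirst_to_proxy = round(validator_proxy - edgefirst, 2)

print(f"total spread:        {total_spread:.2f} pp")
print(f"proxy -> official:   {proxy_to_official:.2f} pp")
print(f"EdgeFirst -> proxy:  {edgefirst_to_proxy:.2f} pp")
```

The two partial gaps add up to the total, and each falls inside one of the ranges quoted in the paragraph above.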

On-target validation results

Each row is one EdgeFirst Studio validation session. Click the Source link to inspect the full session — model artifact, dataset version, parameters, per-stage Perfetto trace, and the host hardware description (hostname, kernel version, SoC, NPU, profiler version).

Row conventions in the table below:

Rows whose Δ cell reads ref are the float reference runs each quantized/compiled measurement is graded against.
Rows without a number under the metric columns are validation sessions currently in progress, or a session not yet linked to its ONNX FP32 reference. The Studio Source link tracks the current status.
Rows whose Δ vs FP32 cell carries a ⚠ are below our accuracy expectations for that platform (more than 10 percentage points under the float reference). The numbers are real measurements on real hardware, reproducible from the linked Studio session, and we publish them as-is; we are investigating the results to make improvements, and the next snapshot of this card will reflect any recovered accuracy.
Precision varies by target: the ONNX reference rows are FP32; macOS CoreML and NVIDIA Jetson TensorRT run FP16; the NXP i.MX 8M Plus, NXP i.MX 95 Neutron, and Hailo NPUs run INT8. The NXP Ara240 DNPU runs a mixed INT8/INT16 scheme — most of the model is INT8, with the box-regression path (and the ops feeding it) promoted to INT16 to improve localization accuracy.
Decoder variants. EdgeFirst ships three INT8 split-decoders. The table headlines the accuracy-recovering ones: smart (per-tensor rescaling — best accuracy, extra CPU ops add some latency) and, where a smart run is absent, logical (the latency-optimized default — no CPU overhead, slightly lower accuracy). The combined decoder is the standard-quantization baseline (equivalent to typical single-scale INT8, and how the reference numbers are produced); it loses the most accuracy — especially on segmentation, where box/mask dynamic range collapses under one scale — so it is published only as a downloadable reference artifact and in the metrics export, never headlined here. Smart and logical exist precisely to recover that loss. Full converter documentation: EdgeFirst model conversion — these are the converters used by this Model Zoo and the EdgeFirst Performance Index report.
Platform-label suffixes. (FRDM), (Phytec), and (Verdin) name the development board or module a session ran on. The latency and throughput suffixes mark the two pipeline configurations used on some targets (in the table below: NXP i.MX 8M Plus on Verdin, NXP i.MX 95 Neutron, and NXP Ara240): the latency pipeline runs inference serially for the lowest per-frame latency; the throughput pipeline runs multiple inference workers for the highest FPS, which raises per-call inference time in exchange. Rows with neither suffix run a single pipeline.
End-to-end (ms) is the sequential per-image latency of the compute pipeline — preprocess → inference → postprocess. Image acquisition (camera or file load + JPEG decode) overlaps these stages and is excluded from this figure.
Realized FPS vs Core-throughput ceiling (FPS). Realized FPS is the measured steady-state throughput — the rate at which final results are actually delivered over the full validation pipeline. It normally exceeds 1000 / end-to-end because the runtime overlaps stages across frames, and it is the true, priority number. Core-throughput ceiling (FPS) (shown with a ~) is the accelerator's core ceiling — 1000 / device-compute-time, the rate the NPU/DNPU could sustain if it were the only bottleneck — so it is a possibly-achievable note, not a claim. It is read from the isolated device-compute stage, which (unlike the host capture/preprocess stages, whose measured time inflates when the pipeline is backpressured) is stable and load-independent. Whether a deployment approaches it depends on the surrounding pipeline, and two levers dominate: (1) host bottlenecks — these validation runs decode a JPEG per image, whereas a live camera pipeline skips that decode and can run closer to the ceiling; and (2) confidence threshold — validation runs at 0.001 to capture every detection for mAP, which makes NMS/decode heavy, while a deployment threshold of 0.25–0.75 produces far fewer candidate boxes and lighter postprocessing, raising realized FPS toward the ceiling.

Size	Platform	mAP@0.5	Δ vs FP32 (pp)	mAP@0.5-0.95	Inference (ms)	End-to-end (ms)	Realized FPS	Core-throughput ceiling (FPS)	Source
Nano	ONNX FP32 (AWS Graviton · 4-core)	47.74%	-0.01	32.94%	342.52	355.18	11.5	~3	v-e53
Nano	ONNX FP32 (AWS Graviton4 · 48-core)	47.74%	-0.01	32.94%	73.32	85.89	168.7	~14	v-e59
Nano	ONNX FP32 (AWS Graviton4 · 8-core)	47.74%	-0.01	32.94%	213.36	220.46	36.9	~5	v-e56
Nano	ONNX FP32 (Intel Core i9-13900F · 32-core)	47.76%	+0.01	32.95%	34.27	45.08	80.1	~80	v-a45
Nano	ONNX FP32 (Intel Xeon Platinum 8488C · 24-core)	47.75%	+0.00	32.95%	67.99	89.22	159.4	~15	v-e5b
Nano	ONNX FP32 (Intel Xeon Platinum 8488C · 4-core)	47.76%	+0.01	32.95%	147.37	160.10	51.2	~7	v-e58
Nano	ONNX FP32 (CUDA)	47.75%	ref	32.95%	6.76	15.78	328.9	~329	v-e61
Nano	ONNX FP32 (CUDA)	47.75%	+0.00	32.95%	7.71	12.85	357.8	~358	v-a87
Nano	ONNX FP16 (CUDA)	47.75%	+0.00	32.93%	6.00	11.65	456.8	~457	v-a9c
Nano	ONNX FP32	47.75%	+0.00	32.95%	244.15	260.73	48.1	~4	v-e06
Nano	ONNX FP32	47.74%	-0.01	32.94%	355.69	366.50	37.1	~3	v-df5
Nano	ONNX FP32	47.75%	+0.00	32.95%	62.87	80.54	187.6	~16	v-dc1
Nano	ONNX FP32	47.74%	-0.01	32.94%	23.34	33.78	124.3	~43	v-d9e
Nano	ONNX FP32	47.76%	+0.01	32.95%	36.62	54.17	98.5	~28	v-cbc
Nano	ONNX FP32	47.74%	-0.01	32.94%	36.87	44.18	26.8	~27	v-cbb
Nano	ONNX FP32	47.74%	-0.01	32.94%	28.26	38.71	137.1	~36	v-cba
Nano	ONNX FP32	47.76%	+0.01	32.95%	35.42	44.28	26.9	~28	v-cb8
Nano	ONNX FP32	47.75%	+0.00	32.95%	6.61	15.57	340.2	~341	v-ca3
Nano	ONNX FP32	47.74%	-0.01	32.94%	73.80	81.48	13.4	~14	v-c7e
Nano	Apple M2 Max — CoreML Neural Engine (FP16)	47.12%	-0.63	32.42%	1.62	5.39	862.2	~864	v-baf
Nano	Apple M2 Max — CoreML Metal GPU (FP16)	47.16%	-0.59	32.45%	5.53	9.41	477.4	~477	v-9cb
Nano	Apple M2 Max — CoreML CPU (FP16)	47.14%	-0.61	32.43%	15.43	19.38	121.3	~121	v-9cc
Nano	Apple iPhone 17 Pro — CoreML Neural Engine (FP16)	47.13%	-0.62	32.42%	1.81	6.52	807.4	~839	v-c49
Nano	Apple iPhone 17 Pro — CoreML Metal GPU (FP16)	47.16%	-0.59	32.45%	4.63	9.07	385.3	~393	v-c5d
Nano	Apple iPhone 17 Pro — CoreML CPU (FP16)	47.14%	-0.61	32.42%	15.70	22.16	120.4	~121	v-c53
Nano	Apple iPhone 15 Pro — CoreML Neural Engine (FP16)	47.14%	-0.61	32.42%	1.81	10.19	601.8	~673	v-c14
Nano	Apple iPhone 15 Pro — CoreML Metal GPU (FP16)	47.16%	-0.59	32.45%	12.47	16.72	151.3	~152	v-c28
Nano	Apple iPhone 15 Pro — CoreML CPU (FP16)	47.14%	-0.61	32.42%	17.16	25.34	108.7	~110	v-c1e
Nano	NXP i.MX 8M Plus + VeriSilicon NPU (FRDM)	46.53%	-1.22	31.89%	57.78	110.65	14.6	~17	v-8bf
Nano	NXP i.MX 8M Plus + VeriSilicon NPU (Verdin) — latency	44.51%	-3.24	29.47%	64.14	106.94	13.4	~15	v-c79
Nano	NXP i.MX 8M Plus + VeriSilicon NPU (Verdin) — throughput	46.53%	-1.22	31.89%	58.37	132.24	14.4	~17	v-c76
Nano	NXP i.MX 95 + eIQ Neutron NPU — latency	46.46%	-1.29	31.57%	12.76	38.67	61.2	~63	v-e30
Nano	NXP i.MX 95 + eIQ Neutron NPU — throughput	46.46%	-1.29	31.58%	32.56	67.58	85.3	~50	v-e31
Nano	NXP i.MX 95 + eIQ Neutron NPU (FRDM) — latency	45.86%	-1.89	31.14%	12.95	43.75	55.2	~56	v-8ad
Nano	NXP i.MX 95 + eIQ Neutron NPU (FRDM) — throughput	45.86%	-1.89	31.14%	20.97	59.88	75.1	~45	v-8ae
Nano	NXP i.MX 95 + eIQ Neutron NPU (Phytec) — throughput	45.86%	-1.89	31.14%	25.10	58.30	84.0	~49	v-897
Nano	NXP i.MX 95 + eIQ Neutron NPU (Verdin) — latency	46.46%	-1.29	31.57%	13.08	40.94	58.0	~60	v-e20
Nano	NXP i.MX 95 + eIQ Neutron NPU (Verdin) — throughput	46.46%	-1.29	31.58%	22.65	62.67	78.9	~47	v-e09
Nano	NXP Ara240 (FRDM) — latency	45.00%	-2.75	30.12%	8.28	20.52	104.1	~106	v-a10
Nano	NXP Ara240 (FRDM) — throughput	44.99%	-2.76	30.11%	8.66	31.50	193.5	~194	v-a11
Nano	Raspberry Pi 5 + Hailo-8L NPU	45.97%	-1.78	31.57%	14.67	26.44	66.6	~67	v-8de
Nano	NVIDIA Jetson Orin Nano (TensorRT FP16)	47.77%	+0.02	32.93%	8.54	23.99	271.2	~271	v-919
Small	ONNX FP32 (AWS Graviton · 4-core)	57.29%	-0.04	41.22%	688.02	699.25	5.8	~1	v-e54
Small	ONNX FP32 (AWS Graviton4 · 8-core)	57.29%	-0.04	41.23%	563.94	570.32	14.1	~2	v-e71
Small	ONNX FP32 (AWS Graviton4 · 48-core)	57.29%	-0.04	41.23%	187.11	198.04	68.0	~5	v-e64
Small	ONNX FP32 (Intel Core i9-13900F · 32-core)	57.32%	-0.01	41.26%	82.06	93.67	33.6	~34	v-a4c
Small	ONNX FP32 (Intel Xeon Platinum 8488C · 24-core)	57.33%	+0.00	41.26%	153.74	174.85	80.1	~7	v-e65
Small	ONNX FP32 (Intel Xeon Platinum 8488C · 4-core)	57.33%	+0.00	41.26%	302.69	312.31	25.9	~3	v-e79
Small	ONNX FP32 (CUDA)	57.33%	ref	41.26%	9.54	18.54	321.5	~322	v-e74
Small	ONNX FP32 (CUDA)	57.33%	+0.00	41.26%	13.69	19.29	224.0	~224	v-a8e
Small	ONNX FP16 (CUDA)	57.33%	+0.00	41.22%	9.73	15.36	319.3	~319	v-aa3
Small	ONNX FP32	57.33%	+0.00	41.26%	62.68	71.88	15.4	~16	v-ccf
Small	ONNX FP32	57.29%	-0.04	41.23%	90.00	97.06	11.1	~11	v-ccd
Small	ONNX FP32	57.29%	-0.04	41.23%	64.68	74.47	61.4	~15	v-cca
Small	ONNX FP32	57.33%	+0.00	41.26%	60.60	78.55	62.3	~17	v-cc8
Small	Apple M2 Max — CoreML Neural Engine (FP16)	56.72%	-0.61	40.68%	4.31	7.85	394.1	~404	v-9ea
Small	Apple M2 Max — CoreML Metal GPU (FP16)	56.75%	-0.58	40.76%	13.81	17.34	205.1	~205	v-9eb
Small	Apple M2 Max — CoreML CPU (FP16)	56.72%	-0.61	40.71%	29.96	34.04	64.2	~64	v-9ec
Small	Apple iPhone 17 Pro — CoreML Neural Engine (FP16)	56.72%	-0.61	40.70%	4.75	8.40	366.9	~372	v-c4a
Small	Apple iPhone 17 Pro — CoreML Metal GPU (FP16)	56.75%	-0.58	40.76%	12.18	15.35	157.2	~158	v-c5e
Small	Apple iPhone 17 Pro — CoreML CPU (FP16)	56.72%	-0.61	40.71%	35.75	41.89	54.4	~55	v-c54
Small	Apple iPhone 15 Pro — CoreML Neural Engine (FP16)	56.72%	-0.61	40.69%	5.35	10.17	313.2	~318	v-c15
Small	Apple iPhone 15 Pro — CoreML Metal GPU (FP16)	56.73%	-0.60	40.74%	32.94	40.39	57.9	~58	v-c29
Small	Apple iPhone 15 Pro — CoreML CPU (FP16)	56.72%	-0.61	40.71%	42.20	51.12	45.8	~46	v-c1f
Small	NXP i.MX 8M Plus + VeriSilicon NPU (FRDM)	56.87%	-0.46	40.69%	109.41	161.03	8.3	~9	v-939
Small	NXP i.MX 8M Plus + VeriSilicon NPU (Verdin) — latency	54.21%	-3.12	37.02%	116.35	158.76	7.9	~9	v-c8b
Small	NXP i.MX 8M Plus + VeriSilicon NPU (Verdin) — throughput	56.87%	-0.46	40.69%	110.65	181.56	8.2	~9	v-c8a
Small	NXP i.MX 95 + eIQ Neutron NPU — latency	57.43%	+0.10	40.93%	33.20	63.59	28.8	~29	v-e38
Small	NXP i.MX 95 + eIQ Neutron NPU — throughput	57.42%	+0.09	40.92%	247.74	276.95	31.9	~32	v-e39
Small	NXP i.MX 95 + eIQ Neutron NPU (FRDM) — latency	56.67%	-0.66	40.36%	33.73	67.63	27.3	~27	v-8fa
Small	NXP i.MX 95 + eIQ Neutron NPU (FRDM) — throughput	56.67%	-0.66	40.36%	124.58	159.61	31.2	~31	v-8fb
Small	NXP i.MX 95 + eIQ Neutron NPU (Phytec) — throughput	56.67%	-0.66	40.36%	121.68	154.45	32.0	~32	v-8f0
Small	NXP i.MX 95 + eIQ Neutron NPU (Verdin) — latency	57.43%	+0.10	40.92%	33.21	65.39	28.8	~29	v-e21
Small	NXP i.MX 95 + eIQ Neutron NPU (Verdin) — throughput	57.42%	+0.09	40.92%	250.01	281.20	31.6	~32	v-e10
Small	NXP Ara240 (FRDM) — latency	54.56%	-2.77	37.69%	13.47	25.54	68.1	~68	v-a1e
Small	NXP Ara240 (FRDM) — throughput	54.54%	-2.79	37.67%	13.50	26.58	96.3	~96	v-a1f
Small	Raspberry Pi 5 + Hailo-8L NPU	56.13%	-1.20	40.02%	29.36	43.90	32.1	~32	v-8eb
Small	NVIDIA Jetson Orin Nano (TensorRT FP16)	57.32%	-0.01	41.24%	22.04	34.77	174.6	~174	v-921
Medium	ONNX FP32 (AWS Graviton · 4-core)	63.30%	+0.01	47.00%	1697.56	1706.65	2.4	~1	v-e55
Medium	ONNX FP32 (AWS Graviton4 · 8-core)	63.30%	+0.01	47.00%	1416.24	1422.32	5.6	~1	v-e73
Medium	ONNX FP32 (AWS Graviton4 · 48-core)	63.30%	+0.01	47.00%	459.77	470.92	27.7	~2	v-e6b
Medium	ONNX FP32 (Intel Core i9-13900F · 32-core)	63.29%	+0.00	46.99%	200.15	211.96	14.1	~14	v-a53
Medium	ONNX FP32 (Intel Xeon Platinum 8488C · 24-core)	63.29%	+0.00	46.99%	364.00	385.39	34.1	~3	v-e6a
Medium	ONNX FP32 (Intel Xeon Platinum 8488C · 4-core)	63.29%	+0.00	46.99%	933.88	944.85	8.5	~1	v-e7a
Medium	ONNX FP32 (CUDA)	63.29%	ref	46.99%	21.19	29.12	176.2	~176	v-e88
Medium	ONNX FP32 (CUDA)	63.29%	+0.00	46.99%	33.81	40.48	97.0	~97	v-a95
Medium	ONNX FP16 (CUDA)	63.27%	-0.02	46.94%	20.03	26.34	166.1	~166	v-aaa
Medium	ONNX FP32	63.30%	+0.01	47.00%	216.23	223.31	4.6	~5	v-cd1
Medium	ONNX FP32	63.30%	+0.01	47.00%	152.85	162.79	26.1	~7	v-cce
Medium	ONNX FP32	63.29%	+0.00	46.99%	136.29	145.37	7.2	~7	v-cc7
Medium	ONNX FP32	63.29%	+0.00	46.99%	136.86	155.30	28.3	~7	v-cc6
Medium	Apple M2 Max — CoreML Neural Engine (FP16)	62.78%	-0.51	46.53%	11.53	15.16	162.2	~164	v-75b
Medium	Apple M2 Max — CoreML Metal GPU (FP16)	62.80%	-0.49	46.52%	33.27	37.50	87.8	~88	v-765
Medium	Apple M2 Max — CoreML CPU (FP16)	62.75%	-0.54	46.46%	56.58	60.68	34.6	~35	v-9dc
Medium	Apple iPhone 17 Pro — CoreML Neural Engine (FP16)	62.78%	-0.51	46.53%	11.72	15.15	161.2	~162	v-c48
Medium	Apple iPhone 17 Pro — CoreML Metal GPU (FP16)	62.80%	-0.49	46.51%	29.45	37.56	65.1	~66	v-c5c
Medium	Apple iPhone 17 Pro — CoreML CPU (FP16)	62.75%	-0.54	46.45%	91.73	98.82	21.5	~22	v-c52
Medium	Apple iPhone 15 Pro — CoreML Neural Engine (FP16)	62.78%	-0.51	46.54%	14.38	19.03	130.2	~131	v-c13
Medium	Apple iPhone 15 Pro — CoreML Metal GPU (FP16)	62.80%	-0.49	46.51%	102.48	117.41	18.9	~19	v-c27
Medium	Apple iPhone 15 Pro — CoreML CPU (FP16)	62.75%	-0.54	46.46%	104.76	113.87	18.8	~19	v-c1d
Medium	NXP i.MX 8M Plus + VeriSilicon NPU (FRDM)	62.79%	-0.50	46.34%	206.67	258.59	4.6	~5	v-97b
Medium	NXP i.MX 8M Plus + VeriSilicon NPU (Verdin) — latency	59.62%	-3.67	41.74%	215.14	257.32	4.4	~5	v-c9d
Medium	NXP i.MX 8M Plus + VeriSilicon NPU (Verdin) — throughput	62.79%	-0.50	46.34%	208.98	279.94	4.5	~5	v-c9c
Medium	NXP i.MX 95 + eIQ Neutron NPU — latency	63.42%	+0.13	46.86%	77.85	116.52	12.6	~13	v-e3d
Medium	NXP i.MX 95 + eIQ Neutron NPU — throughput	63.42%	+0.13	46.86%	604.29	640.13	13.2	~13	v-e3e
Medium	NXP i.MX 95 + eIQ Neutron NPU (FRDM) — latency	62.77%	-0.52	46.30%	78.06	110.95	12.3	~12	v-96b
Medium	NXP i.MX 95 + eIQ Neutron NPU (FRDM) — throughput	62.77%	-0.52	46.30%	301.76	335.74	13.1	~13	v-96c
Medium	NXP i.MX 95 + eIQ Neutron NPU (Phytec) — throughput	62.77%	-0.52	46.30%	297.44	329.30	13.2	~13	v-94b
Medium	NXP i.MX 95 + eIQ Neutron NPU (Verdin) — latency	63.42%	+0.13	46.86%	77.47	120.05	12.7	~13	v-e22
Medium	NXP i.MX 95 + eIQ Neutron NPU (Verdin) — throughput	63.42%	+0.13	46.87%	604.31	647.61	13.2	~13	v-e17
Medium	NXP Ara240 (FRDM) — latency	60.72%	-2.57	43.55%	24.38	36.41	39.0	~39	v-a2c
Medium	NXP Ara240 (FRDM) — throughput	60.70%	-2.59	43.54%	24.36	37.19	47.0	~47	v-a2d
Medium	Raspberry Pi 5 + Hailo-8L NPU	62.40%	-0.89	46.17%	57.38	72.71	16.2	~16	v-906
Medium	NVIDIA Jetson Orin Nano (TensorRT FP16)	63.31%	+0.02	46.96%	44.49	56.80	88.2	~88	v-92a
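Any row of the table can be sanity-checked by hand. The sketch below recomputes the Δ vs FP32 value, applies the 10 pp warning rule from the row conventions, and compares the strictly serial frame rate implied by End-to-end (ms) with the Realized FPS column. The function names are hypothetical helpers for illustration, not part of any EdgeFirst tool.

```python
def delta_pp(map_pct, ref_pct):
    """Accuracy change versus the FP32 reference, in percentage points."""
    return round(map_pct - ref_pct, 2)


def needs_warning(delta, limit_pp=10.0):
    """True when a row is more than limit_pp points under the reference."""
    return delta < -limit_pp


def serial_fps(end_to_end_ms):
    """Frame rate if frames were processed strictly one after another."""
    return 1000.0 / end_to_end_ms


# Nano on Apple M2 Max, CoreML Neural Engine (session v-baf):
# mAP@0.5 = 47.12, reference = 47.75, end-to-end = 5.39 ms, realized = 862.2 FPS.
delta = delta_pp(47.12, 47.75)
print(delta, needs_warning(delta))
print(round(serial_fps(5.39), 1), "FPS if serial, vs 862.2 FPS realized")
```

The realized figure is several times the serial rate, which is consistent with the runtime overlapping stages across frames.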

Validation pipeline

These results are produced by the EdgeFirst on-target validation pipeline:

EdgeFirst Profiler runs on the target hardware, executes the full inference pipeline (image load → decode → preprocess → inference → postprocess), and emits per-image predictions in EdgeFirst Arrow/Parquet plus a Perfetto trace.
EdgeFirst Validator consumes the predictions and trace, computes pycocotools accuracy metrics and per-stage timing summaries, and publishes the results to the Studio validation session.
EdgeFirst HAL (open source) provides the hardware-accelerated preprocessing and post-decoding primitives used at both validation and deployment time, so the timings measured here reflect the same accelerated paths a production runtime would take.

Inference latency is reported as the on-accelerator inference time. End-to-end latency is the sequential per-image latency across the compute pipeline — preprocessing, inference, and postprocessing; image acquisition (file or camera load and JPEG decode) overlaps these stages and is excluded from this figure.

Two throughput figures are reported. Realized FPS is the steady-state rate at which final results are emitted. It is computed directly from the profiler's per-frame result-emission timestamps over the steady-state stream, independently of the trace; the Perfetto trace's own FPS is used only as a fallback on sessions where that scalar isn't available. Realized FPS is the primary number, and it generally exceeds 1000 / end-to-end because the runtime overlaps stages across frames.

Core-throughput ceiling (FPS) is 1000 / device-compute-time, i.e. the throughput if the accelerator were the only bottleneck. It is taken from the isolated device-compute stage (on transfer-split runtimes the trace separates host↔device transfers from device compute), which is load-independent. Host capture/preprocess service times, by contrast, inflate under pipeline backpressure (the same 5000 JPEGs cost ~7.8 ms/frame serialized but far more under throughput backpressure), so a ceiling derived from the slowest stage would understate a fast accelerator. The ceiling is a possibly-achievable figure, not a measured result: reaching it depends on the deployment pipeline. A validation run decodes a JPEG per image and evaluates at a 0.001 confidence threshold (to capture every detection for mAP), both of which load the host and postprocess stages; a production camera pipeline (no JPEG decode) at a deployment threshold of 0.25–0.75 (far fewer candidate boxes through NMS) moves realized throughput toward the core-throughput ceiling.
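The ceiling formula is simple enough to write down. This sketch assumes the device-compute time is already known in milliseconds; that value comes from the trace and is not one of the published table columns, so the numbers used here are hypothetical.

```python
def core_ceiling_fps(device_compute_ms):
    """Throughput if the accelerator were the only bottleneck."""
    return 1000.0 / device_compute_ms


def ceiling_utilization(realized_fps, device_compute_ms):
    """Fraction of the core ceiling that the realized pipeline reaches."""
    return realized_fps / core_ceiling_fps(device_compute_ms)


# Hypothetical session: 5.0 ms of device compute per frame, 150 FPS realized.
ceiling = core_ceiling_fps(5.0)
used = ceiling_utilization(150.0, 5.0)
print(f"ceiling ~{ceiling:.0f} FPS, realized pipeline reaches {used:.0%} of it")
```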

See EdgeFirst Studio for the full validation pipeline.

Downloads

Artifacts are organized by deployment target. Each model file embeds the EdgeFirst edgefirst.json metadata (training session, dataset version, calibration artifact, converter chain) so a single file is sufficient for deployment — no sidecar configuration required.

Browse and download every artifact from the repository file tree. Files are organized into per-target folders and follow the naming convention yolov5{size}-det-{precision}[-smart]{extension}:

Target	Folder	Format
ONNX FP32	`onnx/`	`.onnx`
TFLite INT8	`tflite/`	`.tflite`
NXP i.MX 95 (eIQ Neutron)	`imx95/`	`.imx95.tflite`
NXP Ara240	`ara240/`	`.dvm`
RPi5 + Hailo-8L (13 TOPS)	`hailo/`	`.hailo8l.hef`
NVIDIA Jetson (TensorRT)	`jetson/`	`.engine`
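The naming convention and folder table can be combined into a small path builder. This is a sketch under assumptions: the target keys are made up for illustration, and only the size letter n and the precision string int8 are confirmed by the file name used elsewhere in this card (yolov5n-det-int8.tflite); other size letters and precision strings should be checked against the repository file tree.

```python
# Target key (hypothetical) -> (folder, extension), copied from the table above.
TARGETS = {
    "onnx": ("onnx/", ".onnx"),
    "tflite": ("tflite/", ".tflite"),
    "imx95": ("imx95/", ".imx95.tflite"),
    "ara240": ("ara240/", ".dvm"),
    "hailo8l": ("hailo/", ".hailo8l.hef"),
    "jetson": ("jetson/", ".engine"),
}


def artifact_path(target, size, precision, smart=False):
    """Build folder + yolov5{size}-det-{precision}[-smart]{extension}."""
    folder, extension = TARGETS[target]
    suffix = "-smart" if smart else ""
    return f"{folder}yolov5{size}-det-{precision}{suffix}{extension}"


print(artifact_path("tflite", "n", "int8"))
print(artifact_path("imx95", "n", "int8", smart=True))
```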


Inference example (Python)

from edgefirst.hal import Model, TensorImage

# Load the model — embedded edgefirst.json carries labels and decoder config
model = Model("yolov5n-det-int8.tflite")

# Run inference on an image
image = TensorImage.from_file("image.jpg")
results = model.predict(image)

# Iterate detections
for det in results.detections:
    print(f"{det.label}: {det.confidence:.2f} at {det.bbox}")

EdgeFirst HAL

Traceability

Every measurement in the tables above is reachable through the EdgeFirst Studio validation framework. The v-XXXX Source link on each row resolves to a public Studio URL of the form:

https://edgefirst.studio/public/validation/v-XXXX/details?mode=charts

The link lands on the Charts view — live system traces (CPU, memory, temperature, power) and per-stage timing recorded during the validation run. The Info and Metrics tabs on the same page carry the configuration and full COCO metric breakdown.

From there, the full provenance chain is one click deeper: training session ID, dataset version, calibration artifact, converter chain (e.g. TFLite quantizer + Neutron compile), validation parameters, and the host hardware description (hostname, kernel version, SoC, NPU, profiler version). The same model file you download from this repository embeds the same chain in its edgefirst.json metadata.
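The public URL form given above can be filled in programmatically from a Source cell. In this sketch the validation of the session ID (v- followed by lowercase hexadecimal digits) is an assumption inferred from the IDs shown in the tables, not a documented format.

```python
import re

URL_TEMPLATE = "https://edgefirst.studio/public/validation/{session}/details?mode=charts"

# Assumed ID shape, inferred from IDs such as v-e61 and v-8bf in the tables.
SESSION_ID = re.compile(r"v-[0-9a-f]+")


def session_url(session_id):
    """Return the public Charts-view URL for a validation session ID."""
    if not SESSION_ID.fullmatch(session_id):
        raise ValueError(f"unexpected session ID: {session_id!r}")
    return URL_TEMPLATE.format(session=session_id)


print(session_url("v-e61"))
```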

See also

Model	Task	Link
YOLOv8 Detection	Detection	EdgeFirst/yolov8-det
YOLOv8 Segmentation	Segmentation	EdgeFirst/yolov8-seg
YOLO11 Detection	Detection	EdgeFirst/yolo11-det
YOLO11 Segmentation	Segmentation	EdgeFirst/yolo11-seg
YOLO26 Detection	Detection	EdgeFirst/yolo26-det
YOLO26 Segmentation	Segmentation	EdgeFirst/yolo26-seg

Train your own with EdgeFirst Studio

Train on your own dataset with EdgeFirst Studio:

Free tier includes YOLO training with automatic INT8 quantization and edge deployment.
Upload datasets via EdgeFirst Recorder or COCO/YOLO format.
AI-assisted annotation with auto-labeling.
CameraAdaptor integration for native sensor format training.
Deploy trained models to edge devices via EdgeFirst Client.

Technical notes

Quantization pipeline

All TFLite INT8 models are produced by EdgeFirst's quantization pipeline (details):

ONNX export — standard Ultralytics export with simplify=True
TF-wrapped ONNX — box coordinates normalized to [0, 1] inside DFL decode
Split decoder — boxes and scores split into separate output tensors so each receives an independent INT8 quantization scale
Smart calibration — calibration samples selected via greedy coverage maximization; the artifact is content-addressed by parameter hash and cached in Studio for deterministic reuse
Full integer INT8 — uint8 input, int8 output, MLIR quantizer
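Two ideas in the calibration step, greedy coverage maximization and content addressing by parameter hash, can be illustrated generically. The sketch below is not EdgeFirst's implementation: it assumes coverage is measured over the set of class labels present in each image, and that the cache key is a SHA-256 digest of the canonically serialized parameters. Both choices, and every name in the code, are assumptions made for illustration.

```python
import hashlib
import json


def greedy_coverage(samples, k):
    """Pick up to k sample names, each time taking the one that adds
    the most items not yet covered (ties broken by name order)."""
    covered, chosen = set(), []
    remaining = dict(samples)
    while remaining and len(chosen) < k:
        name = max(sorted(remaining), key=lambda n: len(remaining[n] - covered))
        if not remaining[name] - covered:
            break  # no remaining sample adds anything new
        covered |= remaining.pop(name)
        chosen.append(name)
    return chosen


def cache_key(params):
    """Content address: hash of the parameters in a canonical JSON form."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical images and the classes each one contains.
images = {
    "img_a": {"person", "car"},
    "img_b": {"car"},
    "img_c": {"dog", "cat", "person"},
    "img_d": {"bus"},
}
print(greedy_coverage(images, k=2))
print(cache_key({"samples": 2, "seed": 0}))
```

Because the key depends only on the parameter values, the same parameters in any order map to the same cached artifact.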

Split decoder output format

Detection (e.g. yolov5n):

boxes — (1, 4, 8400) normalized [0, 1] coordinates
scores — (1, 80, 8400) per-class probabilities

Each tensor has its own quantization scale and zero point. The EdgeFirst HAL handles dequantization and reassembly automatically; no application code change is required across NPU targets.
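For readers who want to see what per-tensor dequantization and reassembly amount to, here is a minimal pure-Python sketch. It uses the standard affine rule real = scale * (q - zero_point). The toy tensors have 3 anchors and 2 classes instead of 8400 and 80, and every scale, zero point, and value is made up; the EdgeFirst HAL's own implementation is not shown here.

```python
def dequantize(values, scale, zero_point):
    """Map quantized integers back to real values: scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in values]


# Channel-major toy tensors, mirroring the (1, C, N) layout described above.
boxes_q = [[-128, 0, 127], [-128, 0, 127], [0, 64, 127], [0, 64, 127]]
scores_q = [[-128, 102, -100], [127, -128, -90]]

BOX_SCALE, BOX_ZP = 1 / 255, -128      # hypothetical box quantization
SCORE_SCALE, SCORE_ZP = 1 / 255, -128  # hypothetical score quantization

boxes = [dequantize(row, BOX_SCALE, BOX_ZP) for row in boxes_q]
scores = [dequantize(row, SCORE_SCALE, SCORE_ZP) for row in scores_q]


def candidates(boxes, scores, threshold=0.25):
    """Pair each anchor's best class with its box, keeping confident ones."""
    kept = []
    for i in range(len(boxes[0])):
        per_class = [row[i] for row in scores]
        best = max(range(len(per_class)), key=per_class.__getitem__)
        if per_class[best] >= threshold:
            kept.append((i, best, per_class[best], [row[i] for row in boxes]))
    return kept


for anchor, cls, score, box in candidates(boxes, scores):
    print(anchor, cls, round(score, 3), [round(v, 3) for v in box])
```

A real decoder would follow this with non-maximum suppression; that step is omitted to keep the sketch focused on the two independently scaled tensors.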

Embedded metadata

TFLite: edgefirst.json and labels.txt embedded in the ZIP-format model file
ONNX: edgefirst.json embedded in model.metadata_props

No sidecar files required; the model artifact is self-contained.
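Since the TFLite artifact is described as a ZIP-format file, the embedded metadata can in principle be read with Python's standard zipfile module. The sketch below assumes the member is stored at the archive root under the name edgefirst.json; the demo builds a stand-in file with placeholder bytes and a made-up JSON payload rather than a real model. For ONNX, the equivalent entry would be looked up in the key/value pairs of model.metadata_props.

```python
import json
import os
import tempfile
import zipfile


def read_embedded_metadata(model_path):
    """Read edgefirst.json from a ZIP-format model file (assumed root member)."""
    with zipfile.ZipFile(model_path) as archive:
        return json.loads(archive.read("edgefirst.json"))


# Demo on a stand-in file: placeholder bytes followed by an appended ZIP archive.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.tflite")
    with open(path, "wb") as handle:
        handle.write(b"\x00" * 64)  # stands in for the model payload
    with zipfile.ZipFile(path, "a") as archive:  # "a" appends a ZIP to a non-ZIP file
        archive.writestr("edgefirst.json", json.dumps({"example_key": "example_value"}))
    metadata = read_embedded_metadata(path)

print(metadata)
```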

Limitations

COCO bias — models trained on COCO (80 classes) inherit the dataset's biases (Western-centric scenes, particular object distributions, limited weather/lighting diversity).
Quantization loss — integer quantization introduces accuracy loss relative to FP32: INT8 on the NXP i.MX 8M Plus / i.MX 95 Neutron and Hailo NPUs, and a mixed INT8/INT16 scheme on the NXP Ara240 (the box-regression path is promoted to INT16 for localization accuracy). The magnitude per platform is shown in the Δ vs FP32 column above.
Configurations under active investigation — a subset of INT8 results measure below expectations and are marked ⚠ above; these are tracked for resolution, not accepted as final. The main cases are YOLO11 / YOLO26 on the NXP i.MX 8M Plus VeriSilicon NPU (the most constrained accelerator, where the newer architectures quantize poorly) and some NXP Ara240 segmentation runs. YOLO11 / YOLO26 on the NXP i.MX 95 eIQ Neutron NPU are not yet supported (a delegate limitation) and render without numbers. Each next card snapshot reflects any recovered accuracy.
Input resolution — all models expect 640×640 input; other resolutions require letterboxing.
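The letterboxing mentioned in the last item can be sketched as a geometry calculation: scale the image to fit inside 640×640 while preserving aspect ratio, then pad the remainder. Centered padding is assumed here because it is the common convention; this card does not state where the padding is placed, and the function name is hypothetical.

```python
def letterbox_geometry(width, height, target=640):
    """Return the scale, resized size, and (left, top, right, bottom) padding
    needed to fit a width x height image into a target x target square."""
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x, pad_y = target - new_w, target - new_h
    left, top = pad_x // 2, pad_y // 2
    return {
        "scale": scale,
        "resized": (new_w, new_h),
        "pad": (left, top, pad_x - left, pad_y - top),
    }


geometry = letterbox_geometry(1280, 720)
print(geometry)
```

Detections produced on the padded image can be mapped back to the original by subtracting the left and top padding and dividing by the scale.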

License

Model weights in this repository are derived from Ultralytics YOLO and remain © Ultralytics Inc., licensed AGPL-3.0 — use requires AGPL-3.0 compliance or an Ultralytics Enterprise License.

The validation results, this model card, and its metadata are Au-Zone Technologies' own contribution, licensed CC BY-NC 4.0 (Attribution — NonCommercial) — see the repository LICENSE for the full text and citation requirements.

Citation

@software{edgefirst_yolov5_det,
  title = { {YOLOv5 Detection — EdgeFirst Model Zoo} },
  author = {Au-Zone Technologies},
  url = {https://huggingface.co/EdgeFirst/yolov5-det},
  year = {2026},
  license = {CC-BY-NC-4.0},
}

EdgeFirst Studio · GitHub · Docs · Au-Zone Technologies

Model weights © Ultralytics Inc. (AGPL-3.0) · Validation results & card © 2026 Au-Zone Technologies (CC BY-NC 4.0)

NXP®, i.MX, eIQ®, Neutron, and Ara240 are trademarks or products of NXP Semiconductors. Hailo is a trademark of Hailo Technologies Ltd. Jetson is a trademark of NVIDIA Corporation. All other trademarks are the property of their respective owners.


