bh4


AI & ML interests

None yet

Recent Activity

new activity 16 days ago

cretz/FastWan2.2-TI2V-5B-ONNX-sharded:Inference code

liked a model about 2 months ago

litert-community/gemma-4-E2B-it-litert-lm

replied to reaperdoesntknow's post 2 months ago

We present a methodology for training small language models on CPU at FP32 precision that achieves capability-per-dollar efficiency orders of magnitude beyond GPU-based training. Across 15 models spanning four novel architecture families, namely Mixture of Attentions (MoA), cross-architecture fusion (Qemma), swarm intelligence (SAGI), and metric-space causal language models (DiscoverLM), total compute cost was $24 on a single AMD EPYC 9454P processor. We introduce seven methodological pillars: (1) FP32 precision preservation, with experiments demonstrating 5,810× single-operation error and 23,225× compounding error ratio for FP16 at network depth; (2) sparse cognitive architectures where 0.02–7% of parameters activate per token, matching CPU branching rather than GPU SIMD; (3) developmental curriculum training progressing from language to logic to transfer to depth; (4) continuous belt-fed data ingestion eliminating truncation waste; (5) hardware-native optimization for AMD Zen 4 via AOCL/OpenMP/NUMA-aware allocation; (6) self-regulating thermodynamic governance with emergent temperature measurement grounded in L2-star discrepancy; and (7) open-standard compute (AVX2 SIMD at FP32) free of proprietary vendor dependency. We argue that transformers were designed for GPU hardware rather than mathematical optimality, and that architectures designed for geometric correctness (metric-space attention, triangle inequality enforcement, sparse expert routing) naturally favor CPU execution. For sub-2B parameter models, CPU training produces more capable models at a fraction of the cost.
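The first pillar in the post above contrasts single-operation and compounding rounding error in FP16 versus FP32. The sketch below is a toy illustration of that general effect only; it is not the post's experiment and does not attempt to reproduce the 5,810× or 23,225× figures. It assumes a simple chained sum as the workload, and the helper names `round_to` and `chained_sum` are hypothetical, chosen for this example. It uses Python's standard `struct` module, whose `'e'` and `'f'` formats store IEEE 754 half and single precision values.

```python
import struct


def round_to(fmt, x):
    """Round a Python float to the nearest value storable in `fmt`.

    'e' is IEEE 754 half precision (FP16), 'f' is single precision (FP32).
    """
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def chained_sum(fmt, value, steps):
    """Add `value` to a running total `steps` times, rounding after every add.

    Rounding after each addition mimics keeping the accumulator in the
    reduced-precision format, so per-step rounding errors can compound.
    """
    v = round_to(fmt, value)
    total = 0.0
    for _ in range(steps):
        total = round_to(fmt, total + v)
    return total


steps = 1000
exact = 0.1 * steps  # reference value computed in double precision

err16 = abs(chained_sum("e", 0.1, steps) - exact)
err32 = abs(chained_sum("f", 0.1, steps) - exact)

print(f"FP16 absolute error after {steps} adds: {err16:.6f}")
print(f"FP32 absolute error after {steps} adds: {err32:.6f}")
```

The FP16 accumulator drifts because, as the running total grows, the spacing between representable values becomes comparable to the increment itself, so every addition is rounded coarsely. How this maps onto error at network depth in a real model depends on the operations involved and is not shown here.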

Organizations

None yet

spaces 2

WebGPU Chat Qwen2

Generate images from text prompts

Experimental Moondream WebGPU

Render 3D graphics using WebGPU

models 10

bh4/moonshine-tiny-vi-ONNX

Automatic Speech Recognition • Updated Dec 24, 2025 • 4

bh4/ge2b

Image-Text-to-Text • Updated Aug 4, 2025 • 1

bh4/IndicTrans3-beta-Q2_K-GGUF

5B • Updated May 19, 2025 • 7

bh4/whisper-ben

Automatic Speech Recognition • 0.8B • Updated Apr 28, 2025 • 7 • 1

bh4/checkpoint-50

0.2B • Updated Mar 30, 2025

bh4/diwhis-bn

Updated Dec 14, 2024

bh4/bb335m-Q4_K_M-GGUF

0.3B • Updated Nov 20, 2024 • 6

bh4/bb335m

0.3B • Updated Nov 20, 2024 • 3

bh4/mt5-small-lm-adapt-Q4_K_M-GGUF

0.3B • Updated Nov 17, 2024 • 1

bh4/flan-t5-small-Q4_K_M-GGUF

77M • Updated Nov 17, 2024 • 10

datasets 1

bh4/versang

Viewer • Updated Nov 26, 2024 • 6.09M • 44