s3nh PRO
s3nh
AI & ML interests
Quantization, LLMs, Deep Learning for good. Follow me if you like my work. Patreon.com/s3nh
Recent Activity
liked a model 10 days ago
netflix/void-model
updated a model 24 days ago
s3nh/Nanbeige-4.1-3B-Uncensored
updated a model 24 days ago
s3nh/SmolLLM-3B-Uncensored
Organizations
reacted to mitkox's post with 🚀 about 2 months ago
reacted to Tonic's post with 🔥 about 2 months ago
Post
3375
🙋🏻‍♂️ hello my lovelies,
it is with great pleasure i present to you my working one-click deploy 16GB ram completely free huggingface spaces deployment.
repo : Tonic/hugging-claw (use git clone to inspect)
literally the one-click link : Tonic/hugging-claw
you can also run it locally and see for yourself :
docker run -it -p 7860:7860 --platform=linux/amd64 \
-e HF_TOKEN="YOUR_VALUE_HERE" \
-e OPENCLAW_GATEWAY_TRUSTED_PROXIES="YOUR_VALUE_HERE" \
-e OPENCLAW_GATEWAY_PASSWORD="YOUR_VALUE_HERE" \
-e OPENCLAW_CONTROL_UI_ALLOWED_ORIGINS="YOUR_VALUE_HERE" \
registry.hf.space/tonic-hugging-claw:latest
just a few quite minor details i'll take care of but i wanted to share here first
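A minimal sketch of assembling the same command programmatically, so a missing value is caught before Docker starts. The variable names, port, platform flag, and image tag are copied from the command above; `build_docker_command` is a hypothetical helper for illustration and is not part of the Tonic/hugging-claw repo.

```python
import shlex

# Variable names and image tag are copied from the docker command in the post.
REQUIRED_VARS = [
    "HF_TOKEN",
    "OPENCLAW_GATEWAY_TRUSTED_PROXIES",
    "OPENCLAW_GATEWAY_PASSWORD",
    "OPENCLAW_CONTROL_UI_ALLOWED_ORIGINS",
]
IMAGE = "registry.hf.space/tonic-hugging-claw:latest"


def build_docker_command(values):
    """Return the docker argv list; `values` maps variable name to value.

    Hypothetical helper: it only mirrors the flags shown in the post.
    """
    missing = [name for name in REQUIRED_VARS if not values.get(name)]
    if missing:
        raise ValueError("missing values for: " + ", ".join(missing))
    cmd = ["docker", "run", "-it", "-p", "7860:7860", "--platform=linux/amd64"]
    for name in REQUIRED_VARS:
        cmd += ["-e", f"{name}={values[name]}"]
    cmd.append(IMAGE)
    return cmd


# Placeholder values only; substitute real ones before running the command.
example = {name: "YOUR_VALUE_HERE" for name in REQUIRED_VARS}
print(shlex.join(build_docker_command(example)))
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run` directly.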
reacted to MonsterMMORPG's post with 🔥 about 2 months ago
Post
2993
SECourses Musubi Trainer upgraded to V27, with FLUX 2, FLUX Klein, and Z-Image training added along with demo configs - amazingly VRAM optimized - read the news
App is here : https://www.patreon.com/posts/137551634
Full tutorial how to use and train : https://youtu.be/DPX3eBTuO_Y
reacted to codelion's post with 🔥 2 months ago
Post
6154
Introducing Dhara-70M: A diffusion language model that achieves 3.8x higher throughput than autoregressive models!
Key findings from our research on optimal architectures for small language models:
→ Depth beats width: 32 layers outperform 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% parameters but improve reasoning
We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.
Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: codelion/dhara-70m
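A quick worked check of the token budget described above. It assumes the 50-30-20 percentages map to the three sources in the order the post lists them (PDFs, filtered web, educational content); the post does not spell that mapping out, and the names below are illustrative only.

```python
# Figures from the post: 1B pretraining tokens, 100M extra tokens for conversion.
TOTAL_PRETRAIN_TOKENS = 1_000_000_000
CONVERSION_TOKENS = 100_000_000

# Assumption: percentages follow the order in which the post lists the sources.
MIX_PERCENT = {"pdfs": 50, "filtered_web": 30, "educational": 20}


def tokens_per_source(total_tokens, mix_percent):
    """Split a token budget across sources according to integer percentages."""
    if sum(mix_percent.values()) != 100:
        raise ValueError("mix percentages must sum to 100")
    return {name: total_tokens * pct // 100 for name, pct in mix_percent.items()}


budget = tokens_per_source(TOTAL_PRETRAIN_TOKENS, MIX_PERCENT)
conversion_overhead = CONVERSION_TOKENS / TOTAL_PRETRAIN_TOKENS

for name, tokens in budget.items():
    print(f"{name}: {tokens:,} tokens")
print(f"conversion adds {conversion_overhead:.0%} on top of pretraining")
```

Under that reading, the diffusion conversion costs one tenth of the original pretraining budget.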
reacted to giux78's post with 🔥 2 months ago
Post
237
Together with @mferraretto and @efederici we released #Nesso-4B, a new model specialized for agentic workflows.
mii-llm/nesso-4B
#Nesso-4B is a fine-tuned version of Qwen-4B, trained on a highly curated and balanced dataset designed specifically for multilingual agentic workflows and conversational use cases.
As shown in the video below, we simulate the new “cowork” from #Anthropic without any data sharing, all running on a consumer device. The model can be used to build agentic behavior in #privateAI environments.
Not every problem requires super intelligence: in many cases, intelligence at the edge is more than enough.
#Nesso4B #AgenticAI #PrivateAI #EdgeAI #OnDeviceAI
reacted to AdinaY's post with 🔥 2 months ago
Post
415
GLM just entered the OCR field🔥
zai-org/GLM-OCR
✨ 0.9B
✨ MIT licensed
✨ Multimodal GLM-V architecture
✨ #1 on OmniDocBench v1.5 (94.62)
reacted to raincandy-u's post with 🔥 2 months ago
Post
3030
Introducing Rain-v2: Democratizing LLM training on gaming GPUs! ⚡
Following Rain-100M, we’re scaling up. Rain-v2 features a larger training dataset.
We’ve published a comprehensive blog covering the end-to-end journey—from raw data collection to rigorous evaluation and safety testing.
HF Repo: 🤗 raincandy-u/Rain-v2
Blog: 📚 https://angelkawaii.xyz/2026/01/29/rain-v2/
Special thanks to the open-source community and the SmolLM2 team for their foundational work! 🚀
HuggingFaceTB
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model (2502.02737)
reacted to raincandy-u's post with 👍🔥 3 months ago
Post
5574
🤗 Just released Rain-100M, an experimental ~97M-parameter Qwen3-style language model trained from random initialization.
Repo: raincandy-u/Rain-100M
Data: HuggingFaceFW/fineweb-edu, ~3B tokens, English only
Tokenizer: custom 16k BPE, context length 4096
Architecture: 12 Transformer layers, hidden size 768, 12 heads, MLP 2048, SiLU, bf16
Rain-100M is a raw base model (not instruction-tuned or safety-aligned), aimed at small-scale research, debugging training pipelines, and CPU/edge experiments. If you run evaluations, finetunes, or visualizations with it, I would be very interested in your results!
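A rough sanity check of the "~97M" figure from the listed architecture. This is a back-of-the-envelope sketch, not the model's actual config: it assumes a vocabulary of 16,384 entries for "16k", tied input/output embeddings, plain multi-head attention with four square projections, a gated three-matrix MLP, no biases, and two norm weight vectors per layer plus a final norm.

```python
# Values taken from the post.
NUM_LAYERS = 12
HIDDEN_SIZE = 768
MLP_SIZE = 2048

# Assumption: "16k" is read as 16,384; the exact vocabulary size is not stated.
VOCAB_SIZE = 16_384


def estimate_parameters(layers, hidden, mlp, vocab):
    """Estimate parameter count under the assumptions listed in the lead-in."""
    attention = 4 * hidden * hidden   # q, k, v and output projections
    gated_mlp = 3 * hidden * mlp      # gate, up and down projections
    norms = 2 * hidden                # two norm weight vectors per layer
    per_layer = attention + gated_mlp + norms
    embeddings = vocab * hidden       # tied with the output head (assumption)
    final_norm = hidden
    return layers * per_layer + embeddings + final_norm


total = estimate_parameters(NUM_LAYERS, HIDDEN_SIZE, MLP_SIZE, VOCAB_SIZE)
print(f"estimated parameters: {total / 1e6:.1f}M")
```

Under these assumptions the estimate lands close to the stated ~97M, which suggests the embeddings are tied; untied embeddings would add roughly another 12.6M.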
reacted to sourceoftruthdata's post with ❤️🤗 5 months ago
Post
4327
Just tried to create an educational assistant for younger people who can struggle with visualisation of 'what is this sorcery all about'.
It's the first step of my spare-time projects: SFT on Qwen3-8B.
EduHelper is a child-friendly tutoring assistant fine-tuned from the Qwen3-8B base model using parameter-efficient fine-tuning (PEFT) with LoRA on the ajibawa-2023/Education-Young-Children dataset.
s3nh/EduHelp-8B
Glad to share my work, have a wonderful day!
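To illustrate why LoRA counts as parameter-efficient, here is a small sketch comparing the trainable parameters of a full weight matrix with those of a low-rank adapter on the same matrix. The rank of 16 and the 4096 x 4096 projection shape are assumptions chosen for illustration; the post does not state the rank, target modules, or any other training settings.

```python
def full_matrix_params(d_in, d_out):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Trainable parameters of a LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Assumed shapes for illustration only, not the EduHelper training config.
D_IN, D_OUT, RANK = 4096, 4096, 16

full = full_matrix_params(D_IN, D_OUT)
adapter = lora_params(D_IN, D_OUT, RANK)
print(f"full update: {full:,} parameters")
print(f"LoRA rank {RANK}: {adapter:,} parameters ({adapter / full:.2%} of full)")
```

With these example numbers the adapter trains under 1% of the parameters of the matrix it modifies, while the base weights stay frozen.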
reacted to ZennyKenny's post with 👍 6 months ago
posted an update 6 months ago
Post
761
EduHelp with more empathy, based on a model fine-tuned on
psychotherapeutic preferences, just landed on
Beck-8B as a base model, 13,000 steps on an educational dataset.
Time to go further and build more 🥰
s3nh/EduHelp_Beck_8B
Thanks to @basilic_ai for computations <3
replied to their post 6 months ago
Thanks!
posted an update 6 months ago
Post
4327
Just tried to create an educational assistant for younger people who can struggle with visualisation of 'what is this sorcery all about'.
It's the first step of my spare-time projects: SFT on Qwen3-8B.
EduHelper is a child-friendly tutoring assistant fine-tuned from the Qwen3-8B base model using parameter-efficient fine-tuning (PEFT) with LoRA on the ajibawa-2023/Education-Young-Children dataset.
s3nh/EduHelp-8B
Glad to share my work, have a wonderful day!
reacted to Severian's post with 👀 6 months ago
Post
430
New Technique to Deeply Poison AI on Images and Prove Creative Provenance
I've developed a new method to protect creative work from unauthorized AI training. My Poisonous Shield for Images algorithm embeds a deep, removal-resistant poison into the mathematical structure of your images. It's designed to be toxic to machine learning models, achieving up to 20-348% disruption in AI training convergence in benchmark tests.
Unlike traditional watermarks, this protection survives compression and resizing and is not removed by standard tools. The technique also embeds cryptographic proof of provenance directly into the image, verifying ownership and detecting tampering.
You can see examples and learn more about how and WHY it works better than current methods:
https://severian-poisonous-shield-for-images.static.hf.space
If you are interested in using this technology to protect your work from AI training and unauthorized use, please reach out to me. It is currently in the prototype phase but fully functioning and effective. Still working on expanding it to a production-grade usable app.
This is not intended as a pure self-promotion post. I am genuinely wanting to help creators and want to gauge interest from different communities. I've spent the past year and a half building this from scratch with new math and code to try and solve this massive problem.
reacted to Severian's post with 👍 6 months ago
Post
3259
MLX port of BDH (Baby Dragon Hatchling) is up!
I’ve ported the BDH ( https://github.com/pathwaycom/bdh ) model to MLX for Apple Silicon. It’s a faithful conversion of the PyTorch version: same math, same architecture (byte-level vocab, shared weights across layers, ReLU sparsity, RoPE attention with Q=K), with MLX-friendly APIs and a detailed README explaining the few API-level differences and why results are equivalent.
Code, docs, and training script are ready to use. You may need to adjust the training script a bit to fit your own custom dataset. Only tested on M4 so far, but it should work fine for any M1/M2/M3 users out there.
I’m currently training this MLX build on my Internal Knowledge Map (IKM) dataset Severian/Internal-Knowledge-Map
Training’s underway; expect a day or so before I publish weights. When it’s done, I’ll upload the checkpoint to Hugging Face for anyone to test.
Repo: https://github.com/severian42/BDH-MLX
HF model (coming soon): Severian/BDH-MLX
If you try it on your own data, feedback and PRs are welcome.
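For readers unfamiliar with the "byte-level vocab" mentioned above, this is a generic sketch of what byte-level tokenization means: every UTF-8 byte is its own token, so the vocabulary has exactly 256 entries and no tokenizer training is needed. It is not taken from the BDH or BDH-MLX code; the function names are hypothetical.

```python
VOCAB_SIZE = 256  # one token id per possible byte value


def encode(text):
    """Turn text into a list of byte-valued token ids (0..255)."""
    return list(text.encode("utf-8"))


def decode(token_ids):
    """Turn byte-valued token ids back into text, replacing invalid sequences."""
    return bytes(token_ids).decode("utf-8", errors="replace")


sample = "dragon 🐉"
ids = encode(sample)
print(len(sample), "characters ->", len(ids), "tokens")
print(decode(ids) == sample)
```

The trade-off is longer sequences: non-ASCII characters expand to several tokens each, which matters when preparing a custom dataset for a fixed context length.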
reacted to mitkox's post with 🚀 6 months ago
Post
413
Hermes4 70B synthetic dataset generation on my desktop Z8 GPU rig:
307 tok/sec
1.1M tok/hour
The bottleneck for generating massive, high-quality reinforcement learning datasets is never the GPU compute; it's always the model's willingness to actually answer the darn question.
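The two throughput figures above are consistent with each other, as a one-line conversion shows. This sketch assumes the 307 tok/sec rate is sustained for the full hour, which the post does not state.

```python
SECONDS_PER_HOUR = 3600


def tokens_per_hour(tokens_per_second):
    """Convert a sustained per-second rate into tokens generated per hour."""
    return tokens_per_second * SECONDS_PER_HOUR


hourly = tokens_per_hour(307)
print(f"{hourly:,} tokens/hour (about {hourly / 1e6:.1f}M)")
```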