Open to Work

DedeProGames PRO

DedeProGames

AI & ML interests

Thinking and Agentic Finetuning

Recent Activity

upvoted a collection about 14 hours ago

Terminal Agent Research

updated a collection about 14 hours ago

Terminal Agent Research

updated a model about 14 hours ago

OrionLLM/Terminus-Qwen3-8b

upvoted a collection about 14 hours ago

Terminal Agent Research

Collection

Our research for small Terminal Agentic Models and Agentic datasets • 2 items • Updated about 14 hours ago • 1

updated a collection about 14 hours ago

Terminal Agent Research

Collection

Our research for small Terminal Agentic Models and Agentic datasets • 2 items • Updated about 14 hours ago • 1

updated a model about 14 hours ago

OrionLLM/Terminus-Qwen3-8b

Text Generation • 8B • Updated about 14 hours ago • 176 • 2

reacted to their post with 😎🧠👍 about 16 hours ago

Post

2935

🔥 GRM2 - The small one that surpasses the big ones.
What if a 3B-parameter model could beat a 32B-parameter model on every benchmark? We prove that it can.
GRM2 is a 3B-parameter model based on the Llama architecture, trained for long reasoning and high performance on complex tasks. It is the first 3B-parameter model to outperform Qwen3-32B on ALL benchmarks, and it outperforms o3-mini on almost all benchmarks.
🤗 Model: OrionLLM/GRM2-3b
It is also the first 3B-parameter model to generate over 1000 lines of code and to achieve a score of 39.0 on xBench-DeepSearch-2510.

🚀 Chat with GRM:
DedeProGames/GRM2-Chat

🏆 Download official GGUFs: OrionLLM/GRM2-3b-GGUF

updated a dataset about 16 hours ago

OrionLLM/OpenAgentInstruct

Viewer • Updated about 16 hours ago • 15.2k • 18 • 1

upvoted a collection about 16 hours ago

Medical Research

Collection

Our research for medical models and datasets • 4 items • Updated about 16 hours ago • 1

updated a collection about 16 hours ago

Medical Research

Collection

Our research for medical models and datasets • 4 items • Updated about 16 hours ago • 1

updated a model about 16 hours ago

DedeProGames/medqwen-1.5b

Text Generation • 2B • Updated about 16 hours ago • 334 • 2

updated a dataset about 16 hours ago

OrionLLM/OpenMedicalInstruct

Viewer • Updated about 16 hours ago • 20k • 43 • 2

upvoted a collection about 16 hours ago

GRM-2.5

Collection

Reasoning models for complex reasoning, challenging tasks, and all kinds of chat and everyday use. • 2 items • Updated 4 days ago • 1

reacted to anakin87's post with 🔥❤️ 1 day ago

Post

3230

📣 I just published a free course on Reinforcement Learning Environments for Language Models!

📌 COURSE: https://github.com/anakin87/llm-rl-environments-lil-course

Over the past year, we've seen a shift in LLM Post-Training.
Previously, Supervised Fine-Tuning was the most important part: making models imitate curated Question-Answer pairs.

Now we also have Reinforcement Learning with Verifiable Rewards. With techniques like GRPO, models can learn through trial and error in dynamic environments. They can climb to new heights without relying on expensively prepared data.
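
To make the group-based idea concrete, here is a minimal sketch, not the course's code. It assumes a hypothetical exact-match reward (`exact_match_reward`) and normalizes rewards within one group of completions sampled for the same prompt, which is the core of how GRPO-style methods score samples relative to each other. Using the population standard deviation is an assumption of this sketch.

```python
from statistics import mean, pstdev


def exact_match_reward(completion: str, expected: str) -> float:
    """Hypothetical verifiable reward: 1.0 on an exact match, else 0.0."""
    return 1.0 if completion.strip() == expected.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same prompt.

    Uses the population standard deviation (an assumption of this sketch);
    eps avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for the prompt "2 + 2 = ?" (made-up data).
completions = ["4", "5", " 4 ", "four"]
rewards = [exact_match_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)
```

Correct completions end up with positive advantages and incorrect ones with negative advantages, so no hand-labeled preference data is needed, only a reward that can be checked programmatically.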

But what actually are these environments in practice❓ And how do you build them effectively❓

Fascinated by these concepts, I spent time exploring this space through experiments, post-training Small Language Models.
I've packaged everything I learned into this short course.

What you'll learn

🔹 Agents, Environments, and LLMs: how to map Reinforcement Learning concepts to the LLM domain
🔹 How to use Verifiers (open-source library by Prime Intellect) to build RL environments as software artifacts
🔹 Common patterns: How to build single-turn, multi-turn, and tool-use environments

🔹 Hands-on: turn a small language model (LFM2-2.6B by LiquidAI) into a Tic Tac Toe master
🔸 Build the game Environment
🔸 Use it to generate synthetic data for SFT warm-up
🔸 Group-based Reinforcement Learning
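
For a rough sense of what "building the game Environment" can look like, here is a minimal sketch in plain Python. It does not use the Verifiers API; the class name `TicTacToeEnv`, the reward values, and the illegal-move penalty are all assumptions made for illustration.

```python
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


class TicTacToeEnv:
    """Hypothetical minimal Tic Tac Toe environment (not the Verifiers API)."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.board = [" "] * 9
        self.player = "X"
        self.done = False
        return self.render()

    def render(self):
        """Return the board as text, suitable for placing in an LLM prompt."""
        return "\n".join("|".join(self.board[i:i + 3]) for i in (0, 3, 6))

    def winner(self):
        for a, b, c in WIN_LINES:
            if self.board[a] != " " and self.board[a] == self.board[b] == self.board[c]:
                return self.board[a]
        return None

    def step(self, move: int):
        """Apply a move (0-8) for the current player.

        Returns (observation, reward, done); the reward is from the
        point of view of the player who just moved.
        """
        if self.done:
            raise ValueError("game is over")
        if not 0 <= move <= 8 or self.board[move] != " ":
            self.done = True
            return self.render(), -1.0, True  # assumed penalty for an illegal move
        self.board[move] = self.player
        if self.winner() == self.player:
            self.done = True
            return self.render(), 1.0, True
        if " " not in self.board:
            self.done = True
            return self.render(), 0.0, True  # draw
        self.player = "O" if self.player == "X" else "X"
        return self.render(), 0.0, False


env = TicTacToeEnv()
for move in [0, 3, 1, 4]:  # X and O alternate
    env.step(move)
obs, reward, done = env.step(2)  # X completes the top row
```

Because the outcome of each game is checked by the environment itself, the same object can be used both to generate synthetic games for an SFT warm-up and to supply rewards during reinforcement learning.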

If you're interested in building "little worlds" where LLMs can learn, this course is for you.

---

🤗🕹️ Play against the trained model: anakin87/LFM2-2.6B-mr-tictactoe

📚 HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe

liked a Space 1 day ago

VOID

👀

VOID: Video Object and Interaction Deletion

New activity in SL-AI/CRePE-Mini 1 day ago

Update README.md

#1 opened 1 day ago by DedeProGames
