Text Generation
Transformers
Safetensors
PyTorch
English
llama
mathematical-reasoning
chain-of-thought
educational
interpretable-ai
thinking-model
text-generation-inference
Instructions to use shivash/enhanced-hybrid-transformer-768d-trained-thinking with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use shivash/enhanced-hybrid-transformer-768d-trained-thinking with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="shivash/enhanced-hybrid-transformer-768d-trained-thinking")# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("shivash/enhanced-hybrid-transformer-768d-trained-thinking") model = AutoModelForCausalLM.from_pretrained("shivash/enhanced-hybrid-transformer-768d-trained-thinking") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use shivash/enhanced-hybrid-transformer-768d-trained-thinking with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "shivash/enhanced-hybrid-transformer-768d-trained-thinking" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "shivash/enhanced-hybrid-transformer-768d-trained-thinking", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/shivash/enhanced-hybrid-transformer-768d-trained-thinking
- SGLang
How to use shivash/enhanced-hybrid-transformer-768d-trained-thinking with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "shivash/enhanced-hybrid-transformer-768d-trained-thinking" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "shivash/enhanced-hybrid-transformer-768d-trained-thinking", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "shivash/enhanced-hybrid-transformer-768d-trained-thinking" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "shivash/enhanced-hybrid-transformer-768d-trained-thinking", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use shivash/enhanced-hybrid-transformer-768d-trained-thinking with Docker Model Runner:
docker model run hf.co/shivash/enhanced-hybrid-transformer-768d-trained-thinking
Enhanced Hybrid Transformer with Internal Thinking Capabilities
This model extends the base hybrid transformer architecture with explicit chain-of-thought reasoning capabilities, allowing it to show its internal thinking process before providing answers.
π§ Key Features
- Explicit Reasoning: Uses
<thinking>tags to show step-by-step problem analysis - Mathematical Problem Solving: Trained specifically on GSM8K and AQuA-RAT datasets
- Dual-Temperature Generation: Different temperatures for reasoning (0.8) vs final answers (0.2)
- Extended Context: 768 token context length optimized for complex reasoning
- Self-Verification: Includes checking and validation steps in reasoning process
π Training Details
- Base Model: Enhanced Hybrid Transformer (768 dimensions)
- Training Data: 20,000 samples from GSM8K (65%) and AQuA-RAT (35%)
- Training Epochs: 6 epochs with cosine learning rate scheduling
- Context Length: 768 tokens
- Special Tokens:
<thinking>and</thinking>for reasoning sections - Final Training Loss: 2.9602
- Final Validation Loss: 1.8570
π Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("shivash/enhanced-hybrid-transformer-768d-trained-thinking")
tokenizer = AutoTokenizer.from_pretrained("shivash/enhanced-hybrid-transformer-768d-trained-thinking")
# Example usage
prompt = "Problem: A bakery sells 24 cupcakes in the morning and 36 cupcakes in the afternoon. If each cupcake costs $2, how much money did they make?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
**inputs,
max_new_tokens=250,
temperature=0.7,
do_sample=True,
pad_token_id=tokenizer.pad_token_id
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
π Expected Output Format
The model will show its reasoning process:
Problem: A bakery sells 24 cupcakes in the morning and 36 cupcakes in the afternoon. If each cupcake costs $2, how much money did they make?
<thinking>
Let me break down this problem step by step:
1. First, I need to find the total number of cupcakes sold
2. Then multiply by the price per cupcake
3. Morning sales: 24 cupcakes
4. Afternoon sales: 36 cupcakes
5. Total cupcakes = 24 + 36 = 60 cupcakes
6. Price per cupcake = $2
7. Total revenue = 60 Γ $2 = $120
Let me verify this makes sense...
</thinking>
Looking at this problem, I need to calculate the total revenue.
Total cupcakes sold = 24 + 36 = 60 cupcakes
Total money made = 60 Γ $2 = $120
Therefore, the bakery made $120.
π― Model Capabilities vs Standard Models
- Transparent Reasoning: Unlike black-box models, shows explicit thinking steps
- Educational Value: Useful for understanding problem-solving approaches
- Mathematical Focus: Specialized for arithmetic and word problems
- Self-Checking: Includes verification steps in reasoning process
- Interpretable: Clear separation between reasoning and final answers
π Training Configuration
TRAIN_CONFIG = {
"num_epochs": 6,
"batch_size": 4,
"gradient_accumulation_steps": 8,
"learning_rate": 4e-5,
"warmup_ratio": 0.15,
"weight_decay": 8e-3,
"max_length": 768,
"temperature_thinking": 0.8,
"temperature_final": 0.2
}
π Model Architecture
Based on a hybrid transformer architecture with:
- 142,427,136 parameters
- 768-dimensional embeddings
- Enhanced tokenization for thinking patterns
- Specialized attention mechanisms for reasoning tasks
π Training Datasets
- GSM8K: Grade school math word problems (65% of training data)
- AQuA-RAT: Algebraic reasoning with rationale (35% of training data)
- All examples enhanced with explicit thinking patterns
β οΈ Limitations
- Primarily trained on mathematical reasoning tasks
- May struggle with non-mathematical questions
- Thinking process is structured but may not always be perfect
- Limited to 768 token context length
Model trained and uploaded by shivash using enhanced hybrid transformer architecture with thinking capabilities.
- Downloads last month
- -
Datasets used to train shivash/enhanced-hybrid-transformer-768d-trained-thinking
Benchmark β’ Updated β’ 17.6k β’ 956k β’ 1.33k
deepmind/aqua_rat
Viewer β’ Updated β’ 196k β’ 8.77k β’ 72