Instructions to use shivash/enhanced-hybrid-transformer-768d-trained-thinking with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use shivash/enhanced-hybrid-transformer-768d-trained-thinking with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="shivash/enhanced-hybrid-transformer-768d-trained-thinking")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("shivash/enhanced-hybrid-transformer-768d-trained-thinking")
model = AutoModelForCausalLM.from_pretrained("shivash/enhanced-hybrid-transformer-768d-trained-thinking")

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use shivash/enhanced-hybrid-transformer-768d-trained-thinking with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "shivash/enhanced-hybrid-transformer-768d-trained-thinking"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "shivash/enhanced-hybrid-transformer-768d-trained-thinking",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/shivash/enhanced-hybrid-transformer-768d-trained-thinking

SGLang

How to use shivash/enhanced-hybrid-transformer-768d-trained-thinking with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "shivash/enhanced-hybrid-transformer-768d-trained-thinking" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "shivash/enhanced-hybrid-transformer-768d-trained-thinking",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "shivash/enhanced-hybrid-transformer-768d-trained-thinking" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "shivash/enhanced-hybrid-transformer-768d-trained-thinking",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use shivash/enhanced-hybrid-transformer-768d-trained-thinking with Docker Model Runner:
```
docker model run hf.co/shivash/enhanced-hybrid-transformer-768d-trained-thinking
```

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Enhanced Hybrid Transformer with Internal Thinking Capabilities

This model extends the base hybrid transformer architecture with explicit chain-of-thought reasoning capabilities, allowing it to show its internal thinking process before providing answers.

🧠 Key Features

Explicit Reasoning: Uses <thinking> tags to show step-by-step problem analysis
Mathematical Problem Solving: Trained specifically on GSM8K and AQuA-RAT datasets
Dual-Temperature Generation: Different temperatures for reasoning (0.8) vs final answers (0.2)
Extended Context: 768 token context length optimized for complex reasoning
Self-Verification: Includes checking and validation steps in reasoning process

📊 Training Details

Base Model: Enhanced Hybrid Transformer (768 dimensions)
Training Data: 20,000 samples from GSM8K (65%) and AQuA-RAT (35%)
Training Epochs: 6 epochs with cosine learning rate scheduling
Context Length: 768 tokens
Special Tokens: <thinking> and </thinking> for reasoning sections
Final Training Loss: 2.9602
Final Validation Loss: 1.8570

🚀 Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("shivash/enhanced-hybrid-transformer-768d-trained-thinking")
tokenizer = AutoTokenizer.from_pretrained("shivash/enhanced-hybrid-transformer-768d-trained-thinking")

# Example usage
prompt = "Problem: A bakery sells 24 cupcakes in the morning and 36 cupcakes in the afternoon. If each cupcake costs $2, how much money did they make?"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=250,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

📝 Expected Output Format

The model will show its reasoning process:

Problem: A bakery sells 24 cupcakes in the morning and 36 cupcakes in the afternoon. If each cupcake costs $2, how much money did they make?

<thinking>
Let me break down this problem step by step:

1. First, I need to find the total number of cupcakes sold
2. Then multiply by the price per cupcake
3. Morning sales: 24 cupcakes
4. Afternoon sales: 36 cupcakes
5. Total cupcakes = 24 + 36 = 60 cupcakes
6. Price per cupcake = $2
7. Total revenue = 60 × $2 = $120

Let me verify this makes sense...
</thinking>

Looking at this problem, I need to calculate the total revenue.

Total cupcakes sold = 24 + 36 = 60 cupcakes
Total money made = 60 × $2 = $120

Therefore, the bakery made $120.

🎯 Model Capabilities vs Standard Models

Transparent Reasoning: Unlike black-box models, shows explicit thinking steps
Educational Value: Useful for understanding problem-solving approaches
Mathematical Focus: Specialized for arithmetic and word problems
Self-Checking: Includes verification steps in reasoning process
Interpretable: Clear separation between reasoning and final answers

📈 Training Configuration

TRAIN_CONFIG = {
    "num_epochs": 6,
    "batch_size": 4,
    "gradient_accumulation_steps": 8,
    "learning_rate": 4e-5,
    "warmup_ratio": 0.15,
    "weight_decay": 8e-3,
    "max_length": 768,
    "temperature_thinking": 0.8,
    "temperature_final": 0.2
}

🔄 Model Architecture

Based on a hybrid transformer architecture with:

142,427,136 parameters
768-dimensional embeddings
Enhanced tokenization for thinking patterns
Specialized attention mechanisms for reasoning tasks

📚 Training Datasets

GSM8K: Grade school math word problems (65% of training data)
AQuA-RAT: Algebraic reasoning with rationale (35% of training data)
All examples enhanced with explicit thinking patterns

⚠️ Limitations

Primarily trained on mathematical reasoning tasks
May struggle with non-mathematical questions
Thinking process is structured but may not always be perfect
Limited to 768 token context length

Model trained and uploaded by shivash using enhanced hybrid transformer architecture with thinking capabilities.

Downloads last month: -

Safetensors

Model size

0.1B params

Tensor type

F32

shivash
/

enhanced-hybrid-transformer-768d-trained-thinking