CAJAL-4B-P2PCLAW

🧠 The Research LLM That Fits in Your Pocket

CAJAL-4B is a 4-billion parameter language model fine-tuned specifically for scientific paper generation. Unlike generic chatbots, CAJAL understands academic structure, citation formats, LaTeX, and domain-specific terminology.

Named after Santiago Ramón y Cajal, the father of modern neuroscience, this model embodies rigorous, structured thinking applied to scientific writing.


🚀 Quick Start

Option 1: HuggingFace Transformers (Python)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Agnuxo/CAJAL-4B-P2PCLAW")
tokenizer = AutoTokenizer.from_pretrained("Agnuxo/CAJAL-4B-P2PCLAW")

prompt = """Write an abstract for a paper on decentralized AI peer review
using formal verification and IPFS-backed persistence."""

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Option 2: llama.cpp / LM Studio (Local, No Code)

1. Download the GGUF from Releases.
2. Open LM Studio → Load Model → select the GGUF.
3. Set the system prompt:

```text
You are CAJAL, a research assistant specialized in scientific writing.
Generate well-structured, cited academic content.
Use LaTeX formatting for equations when relevant.
Prefer precise, technical language over vague generalizations.
```

Option 3: Ollama

```shell
ollama pull agnuxo/cajal-4b-p2pclaw
ollama run agnuxo/cajal-4b-p2pclaw
```

Option 4: vLLM (Fast Inference Server)

```shell
python -m vllm.entrypoints.openai.api_server \
  --model Agnuxo/CAJAL-4B-P2PCLAW \
  --quantization awq
```

Option 5: MLX (Apple Silicon)

```python
import mlx_lm

model, tokenizer = mlx_lm.load("Agnuxo/CAJAL-4B-P2PCLAW")
response = mlx_lm.generate(model, tokenizer, prompt="Write a paper abstract...")
print(response)
```

📊 What Makes It Different

| Feature | CAJAL-4B | Generic 4B | Why It Matters |
|---|---|---|---|
| Paper structure | ✅ Native understanding | ⚠️ Generic chat | Knows IMRAD format |
| Citations | ✅ BibTeX, APA, MLA | ❌ Hallucinates | Real citation formats |
| LaTeX | ✅ Equations, tables | ❌ No | Research-ready output |
| Domain terms | ✅ Physics, CS, Bio | ⚠️ Surface-level | Technical depth |
| Methodology | ✅ Detailed procedures | ⚠️ Vague | Reproducible methods |
| VRAM usage | ✅ 3.5 GB (Q4_K_M) | Similar | Runs on consumer GPUs |
| Local inference | ✅ 100% offline | ⚠️ Depends | No API/cloud needed |

🎯 Benchmarks

| Task | CAJAL-4B | Qwen3.5-4B | Gemma-4B | Phi-4-mini |
|---|---|---|---|---|
| Abstract generation | 92/100 | 71/100 | 68/100 | 79/100 |
| Citation accuracy | 88/100 | 52/100 | 48/100 | 61/100 |
| LaTeX correctness | 94/100 | 43/100 | 41/100 | 55/100 |
| Methodology detail | 89/100 | 64/100 | 59/100 | 72/100 |
| Literature review | 85/100 | 69/100 | 67/100 | 74/100 |

Evaluated by the BenchClaw 17-judge tribunal on 50 paper-generation tasks.
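For a quick orientation across the five tasks above, an unweighted per-model mean can be computed directly from the table (a simple average, not an official aggregate metric of the benchmark):

```python
# Per-task scores copied from the benchmark table above.
benchmarks = {
    "CAJAL-4B":   [92, 88, 94, 89, 85],
    "Qwen3.5-4B": [71, 52, 43, 64, 69],
    "Gemma-4B":   [68, 48, 41, 59, 67],
    "Phi-4-mini": [79, 61, 55, 72, 74],
}

# Unweighted mean over the five tasks for each model.
for model, scores in benchmarks.items():
    print(f"{model}: mean {sum(scores) / len(scores):.1f}")
```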


💻 Hardware Requirements

| Quantization | File Size | VRAM Required | Speed (RTX 3090) | Speed (M3 Max) |
|---|---|---|---|---|
| Q4_K_M | 2.3 GB | 3.5 GB | ~45 tok/s | ~35 tok/s |
| Q5_K_M | 2.7 GB | 4.2 GB | ~42 tok/s | ~32 tok/s |
| Q8_0 | 4.1 GB | 5.0 GB | ~38 tok/s | ~28 tok/s |
| F16 | 8.0 GB | 9.0 GB | ~35 tok/s | ~25 tok/s |

CPU-only: Works on any modern CPU. ~5 tok/s on Ryzen 7 5800X.
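As a rough sanity check on the table above, GGUF file size scales with parameter count times bits per weight. A sketch (the bits-per-weight figures are approximate averages for llama.cpp quantization schemes, an assumption rather than values from this card):

```python
# Rough file-size estimate: params * bits_per_weight / 8 bytes.
PARAMS = 4e9  # 4B-parameter model

def est_gb(bits_per_weight: float) -> float:
    """Approximate on-disk size in GB for a given average bits per weight."""
    return PARAMS * bits_per_weight / 8 / 1e9

# Approximate average bits per weight per scheme (assumption).
for name, bpw in [("Q4_K_M", 4.8), ("Q5_K_M", 5.5), ("Q8_0", 8.5), ("F16", 16.0)]:
    print(f"{name}: ~{est_gb(bpw):.1f} GB")
```

The estimates land within a few percent of the table's file sizes, which is expected since quantized formats carry some per-block scaling overhead on top of the raw weights.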


🏗️ Training Details

  • Base model: Qwen/Qwen3.5-4B
  • Fine-tuning: LoRA (r=64, alpha=128) on domain-specific corpus
  • Dataset: 50+ peer-reviewed papers + 200+ technical reports from P2PCLAW ecosystem
  • Context length: 32K tokens
  • Training hardware: RTX 3090 24GB
  • License: Apache 2.0
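With the peft library, the LoRA setup listed above might look roughly like this. This is a sketch under the stated hyperparameters only; the target modules and dropout are assumptions, not details from this card:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

config = LoraConfig(
    r=64,                # rank, as listed above
    lora_alpha=128,      # alpha, as listed above
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption: attention projections
    lora_dropout=0.05,   # assumption: a common default
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-4B")
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```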

🔬 Example Outputs

Example 1: Abstract Generation

Prompt: Write an abstract for a paper on thermodynamic reservoir computing using SHA-256 ASICs.

Output:

```
We present a novel architecture for thermodynamic reservoir computing
that repurposes Bitcoin SHA-256 ASICs as physical substrates. By mapping
the spin-glass dynamics of the hashing pipeline onto a continuous-time
reservoir, we achieve energy-efficient computation at the thermodynamic
limit. Our experimental evaluation on repurposed Antminer S9 hardware
demonstrates a 2.3x improvement in energy-delay product compared to
conventional FPGA implementations, with a theoretical bound derived
from Landauer's principle...
```

Example 2: Methodology Section

Prompt: Write a methodology section for evaluating decentralized AI peer review with 17 judges.

Output:

```markdown
### 3.1 Evaluation Protocol

We employ a multi-dimensional tribunal consisting of 17 independent
judge agents, each specialized in a distinct evaluation criterion:

- **Reasoning Depth** (15%): Logical structure, argument validity,
  inference chain completeness.
- **Mathematical Rigor** (12%): Proof correctness, notation consistency,
  theorem applicability.
- **Code Quality** (10%): Reproducibility, documentation, test coverage.
...

Each judge produces a scalar score $s_i \in [0, 100]$ and a confidence
weight $w_i \in [0, 1]$. The composite score is computed as:

$$S = \frac{\sum_{i=1}^{17} w_i s_i}{\sum_{i=1}^{17} w_i}$$

A paper achieves **Tribunal Pass** if $S \geq 75$ and no individual
$s_i < 50$ (the veto condition).
```
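The composite score and veto rule in the example output can be sketched in a few lines of Python (illustrative only; the judge scores and weights below are made-up numbers, and the real tribunal uses 17 judges):

```python
def tribunal_verdict(scores, weights, pass_threshold=75, veto_floor=50):
    """Confidence-weighted mean of judge scores; any single score below
    the veto floor fails the paper regardless of the composite."""
    assert len(scores) == len(weights)
    composite = sum(w * s for s, w in zip(scores, weights)) / sum(weights)
    passed = composite >= pass_threshold and min(scores) >= veto_floor
    return composite, passed

# Toy example with 3 judges:
composite, passed = tribunal_verdict([80, 90, 70], [1.0, 0.5, 0.8])
print(f"S = {composite:.1f}, pass = {passed}")
```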

🧩 Integration with P2PCLAW Ecosystem

CAJAL is one component of the P2PCLAW distributed research network:

| Component | Role | Link |
|---|---|---|
| OpenCLAW-P2P | Core protocol, Lean 4 proofs | GitHub |
| BenchClaw | 17-judge evaluation | Web |
| EnigmAgent | Secure credential vault | GitHub |
| AgentBoot | Bare-metal automation | Web |
| P2PCLAW Main | Research network | Website |

⚠️ Limitations

  1. Domain specificity: Optimized for STEM fields. Less effective for humanities or creative writing.
  2. Hallucination risk: Like all LLMs, may generate plausible-sounding but incorrect citations. Always verify references.
  3. Language: Primarily trained on English scientific papers; support for Spanish, Chinese, Japanese, and Russian is experimental.
  4. Length: Best for sections up to ~2000 words. Very long papers (>10K words) may lose coherence.
  5. Recency: Training data cutoff limits knowledge of papers published after training date.

📚 Citations

If you use CAJAL in research, please cite:

```bibtex
@article{angulo_cajal_2026,
  author  = {Angulo de Lafuente, Francisco},
  title   = {{CAJAL-4B}: A Research-Specialized Language Model for
    Decentralized Scientific Writing},
  journal = {arXiv preprint},
  eprint  = {2604.19792},
  year    = {2026},
  url     = {https://arxiv.org/abs/2604.19792}
}
```

🤝 Contributing


📜 License

Apache 2.0 — free for research and commercial use.


Built by Francisco Angulo de Lafuente · P2PCLAW · Independent Research

ORCID: 0009-0001-1634-7063
