GPT-X2-125M CIx Long-Context

Model Summary

This model is a custom-code derivative of AxiomicLabs/GPT-X2-125M, adapted for experimental long-context causal language modeling and architecture research.

The repository includes a Hugging Face Transformers-compatible GPT-X2 implementation with optional Symplectic Metric-RoPE Governor support. It also ships training utilities built around CIxOpt, a heterogeneous optimizer that routes large projection matrices, sensitive normalization parameters, and optional governor modules to different update rules.

The model is intended as a research checkpoint for compact long-context generation, positional encoding experiments, optimizer testing, and continued fine-tuning.

Base Model

Base model: AxiomicLabs/GPT-X2-125M
Model family: GPT-X2
Task: Causal language modeling / text generation
Language: English
Library: Hugging Face Transformers with custom code
License: Apache 2.0, unless otherwise restricted by upstream model or dataset terms

Architecture

This implementation uses a compact decoder-only GPT-X2-style architecture.

Default configuration:

```text
model_type: gptx2
vocab_size: 32768
hidden_size: 576
num_hidden_layers: 30
num_attention_heads: 9
num_key_value_heads: 3
head_dim: 64
intermediate_size: 1536
max_position_embeddings: 32768
rope_theta: 100000.0
rms_norm_eps: 1e-6
tie_word_embeddings: true
```
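As a sanity check on the "125M" in the model name, the default configuration above can be turned into a rough parameter count. The sketch below assumes a Llama-style layout that this card does not spell out: bias-free q/k/v/o projections, a three-matrix SwiGLU MLP, two RMSNorm weights per block plus one final norm, and a single tied embedding matrix. The function name `estimate_params` is illustrative and not part of the repository.

```python
def estimate_params(vocab_size=32768, hidden_size=576, num_layers=30,
                    num_heads=9, num_kv_heads=3, head_dim=64,
                    intermediate_size=1536):
    """Rough parameter count under the layout assumptions stated above."""
    q_out = num_heads * head_dim       # 9 * 64 = 576
    kv_out = num_kv_heads * head_dim   # 3 * 64 = 192 (grouped-query attention)

    attn = hidden_size * q_out         # q_proj
    attn += 2 * hidden_size * kv_out   # k_proj and v_proj
    attn += q_out * hidden_size        # o_proj

    mlp = 3 * hidden_size * intermediate_size  # gate, up, and down projections
    norms = 2 * hidden_size                    # two RMSNorm weights per block

    per_layer = attn + mlp + norms
    embed = vocab_size * hidden_size   # tied with the LM head, counted once
    final_norm = hidden_size
    return num_layers * per_layer + embed + final_norm


total = estimate_params()
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
```

Under these assumptions the estimate is 125,077,824 parameters, of which about 18.9M sit in the tied embedding matrix. Governor modules, if enabled, would add to this figure.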

Core architecture features:

Decoder-only causal language model
30 transformer blocks
576 hidden size
9 query attention heads
3 key/value heads
Grouped-query attention
Rotary position embeddings
Optional YaRN-style RoPE scaling support
RMSNorm
SwiGLU MLP layers
Tied input and output embeddings
Dynamic cache support for generation
Left-padding-aware position ID handling
Safe causal language modeling loss behavior when labels are masked
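Left-padding-aware position IDs are usually derived from the attention mask, so that the first real token of every sequence sits at position 0 no matter how much padding precedes it. The pure-Python sketch below illustrates that idea on nested lists. It is not the repository's code: the helper name and the choice of 0 as the placeholder position for padded slots are assumptions.

```python
def position_ids_from_mask(attention_mask, pad_position=0):
    """Count real tokens from the left; padded slots get a placeholder."""
    position_ids = []
    for row in attention_mask:
        seen = 0  # number of real tokens encountered so far in this row
        ids = []
        for m in row:
            if m:
                ids.append(seen)
                seen += 1
            else:
                ids.append(pad_position)
        position_ids.append(ids)
    return position_ids


batch_mask = [
    [0, 0, 1, 1, 1],  # left-padded sequence with three real tokens
    [1, 1, 1, 1, 1],  # full-length sequence
]
print(position_ids_from_mask(batch_mask))
# [[0, 0, 0, 1, 2], [0, 1, 2, 3, 4]]
```

Padded slots are masked out of attention, so the placeholder value does not affect the result. What matters is that real tokens receive the same positions they would get without padding.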

Symplectic Metric-RoPE Governor

This checkpoint’s codebase includes an optional experimental mechanism called Symplectic Metric-RoPE Governor.

The governor is designed to test whether rotary position encoding can be modulated through a learned phase-space control layer while preserving identity behavior at initialization.

When enabled, the governor adds:

Global Hamiltonian-style clock state
Per-layer local symplectic clock state
Metric projection modules
Mass and spin deltas for rotary frequency modulation
Context-aware clock updates
Beam-search-safe clock-state reordering
Clock diagnostics
Optional clock regularization terms

The metric projection layers are initialized so that, with the governor projections zeroed, the model starts from standard RoPE behavior and only learns deviations from it during training.
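The identity-at-initialization property can be illustrated with a toy example. The first function computes the standard RoPE inverse frequencies for this card's `head_dim` and `rope_theta`. The second applies a purely hypothetical multiplicative modulation, `freq * (1 + delta)`. The card does not give the governor's actual equations, so this form is an assumption chosen only to show why a zeroed projection reproduces plain RoPE.

```python
def rope_inv_freq(head_dim=64, rope_theta=100000.0):
    """Standard RoPE inverse frequencies: rope_theta ** (-2i / head_dim)."""
    return [rope_theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def modulated_inv_freq(inv_freq, delta):
    """Hypothetical modulation (not the governor's real formula)."""
    return [f * (1.0 + d) for f, d in zip(inv_freq, delta)]


base = rope_inv_freq()

# A zeroed projection would emit all-zero deltas, leaving frequencies untouched.
zero_delta = [0.0] * len(base)
print(modulated_inv_freq(base, zero_delta) == base)

# Any learned non-zero delta moves the frequencies away from standard RoPE.
small_delta = [0.01] * len(base)
print(modulated_inv_freq(base, small_delta) == base)
```

The first check prints `True` and the second prints `False`, which is the behavior one would want from any governor-on vs governor-off ablation starting point.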

Relevant configuration fields include:

```text
use_symplectic_rope
symplectic_global_k_dim
symplectic_local_k_dim
symplectic_global_dt
symplectic_local_dt
symplectic_n_steps
symplectic_context_scale
symplectic_momentum_inject_scale
metric_grad_scale
metric_mass_global_scale
metric_spin_global_scale
metric_mass_local_scale
metric_spin_local_scale
metric_radial_base
clock_reg_coeff
clock_metric_reg_coeff
clock_state_reg_coeff
clock_smooth_reg_coeff
return_clock_diagnostics
```

CIxOpt Optimizer

Training and experimentation were designed around CIxOpt, a custom heterogeneous optimizer.

CIxOpt supports:

AdamW-style adaptive updates
Lion-style sign momentum
AdaMax routing
ASGD-style averaging
Optional low-rank projected momentum
Native foreach vectorization where dimensions allow
Gradient centralization
Decoupled weight decay
Discrepancy-aware caution filtering for sign updates
Activation-aware decay hooks
fp32 optimizer state for fp16/bf16 parameter safety
Parameter-name registration for architecture-aware routing
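To make "Lion-style sign momentum" with decoupled weight decay concrete, here is a minimal pure-Python sketch of the published Lion update rule applied to a flat list of floats. It is not CIxOpt's implementation, which this card does not show. The function name, default hyperparameters, and list-based representation are assumptions for illustration.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(param, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """One Lion-style step; returns the new parameters and momentum."""
    new_param, new_momentum = [], []
    for p, g, m in zip(param, grad, momentum):
        # The update direction is the sign of an interpolated momentum.
        direction = sign(beta1 * m + (1 - beta1) * g)
        # Decoupled weight decay acts on the parameter, not on the gradient.
        new_param.append(p - lr * (direction + weight_decay * p))
        # The stored momentum uses a separate, slower interpolation factor.
        new_momentum.append(beta2 * m + (1 - beta2) * g)
    return new_param, new_momentum


params, mom = lion_step([1.0, -1.0], [0.5, -0.5], [0.0, 0.0], lr=0.1)
print(params, mom)
```

Because only the sign of the direction is used, every coordinate with a non-zero direction moves by the same magnitude `lr` (plus the weight-decay term). This is one reason such updates are often paired with a more precise adaptive rule for sensitive parameters.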

The optimizer can route different parts of the model differently. For example:

```text
large projection matrices        -> sign-momentum style updates
normalization / sensitive params -> AdamW-style updates
embedding / lm-head surfaces     -> AdaMax-compatible routing
governor / clock parameters      -> precise adaptive update path
```
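The routing table above could be approximated by a name-based classifier like the one below. This is a hedged sketch, not CIxOpt's registration API: the function name, the substring rules, the returned labels, and the example parameter names are all hypothetical.

```python
def route_parameter(name, shape):
    """Return a hypothetical update-path label for one named parameter."""
    lowered = name.lower()
    if "governor" in lowered or "clock" in lowered:
        return "precise_adaptive"
    if "embed" in lowered or "lm_head" in lowered:
        return "adamax"
    if "norm" in lowered or len(shape) < 2:
        return "adamw"          # normalization weights and other 1-D params
    return "sign_momentum"      # remaining 2-D projection matrices


# Example parameter names are illustrative; real names depend on the model code.
example_params = {
    "model.layers.0.self_attn.q_proj.weight": (576, 576),
    "model.layers.0.input_layernorm.weight": (576,),
    "model.embed_tokens.weight": (32768, 576),
    "model.layers.0.governor.metric_proj.weight": (64, 576),
}

groups = {}
for name, shape in example_params.items():
    groups.setdefault(route_parameter(name, shape), []).append(name)

for label, names in groups.items():
    print(label, names)
```

Checking governor and embedding names before the generic rules keeps a governor-owned projection matrix from falling through to the sign-momentum path.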

This makes the model useful for studying optimizer behavior in compact architectures where not every parameter type should be treated the same way.

Intended Use

This model is intended for:

Compact causal language modeling research
Long-context generation experiments
RoPE / YaRN / positional encoding studies
Symplectic Metric-RoPE experiments
CIxOpt optimizer testing
Continuation training experiments
Architecture ablations
Instruction-style fine-tuning experiments
Lightweight local text-generation prototypes

Out-of-Scope Use

This is an experimental research checkpoint. It should not be used as a sole authority or autonomous decision-maker in high-stakes settings.

Do not rely on this model alone for:

Medical advice
Legal conclusions
Financial decisions
Emergency response
Personnel screening
Critical infrastructure operations
Surveillance targeting
Autonomous cyber operations
Any setting requiring verified factual accuracy

Limitations

Known or expected limitations:

May hallucinate facts, dates, citations, or technical details
May inherit limitations from AxiomicLabs/GPT-X2-125M
May be sensitive to prompt formatting
Long-context support does not guarantee accurate long-range reasoning
Symplectic Metric-RoPE behavior is experimental
CIxOpt-based training may produce behavior different from AdamW-trained baselines
Safety behavior has not been fully evaluated
Benchmark results are not yet included
Generated outputs should be reviewed before use

Installation

Install the core dependencies:

```bash
pip install torch transformers safetensors
```

Because this model uses custom architecture code, load it with:

```python
trust_remote_code=True
```

Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "YOUR_USERNAME/YOUR_MODEL_REPO"

# trust_remote_code=True is required because the architecture is custom code.
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Fall back to EOS as the padding token if the tokenizer defines none.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

prompt = "Explain why stable positional encoding matters for long-context language models."

inputs = tokenizer(
    prompt,
    return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    output = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        repetition_penalty=1.05,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Chat-Style Usage

If the tokenizer includes a chat template:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "YOUR_USERNAME/YOUR_MODEL_REPO"

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {
        "role": "user",
        "content": "What is Symplectic Metric-RoPE and why might it help long-context modeling?",
    }
]

# Render the conversation with the tokenizer's chat template and tokenize it.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

with torch.inference_mode():
    output = model.generate(
        **inputs,
        max_new_tokens=384,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        repetition_penalty=1.05,
        pad_token_id=tokenizer.eos_token_id,
    )

# Drop the prompt tokens so only the newly generated reply is decoded.
generated = output[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```

Suggested Generation Settings

Balanced exploratory generation:

```python
generation_config = {
    "max_new_tokens": 384,
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.95,
    "repetition_penalty": 1.05,
}
```

More deterministic generation:

```python
generation_config = {
    "max_new_tokens": 384,
    "do_sample": False,
}
```

Training Notes

This model is trained as a causal language model.

Recommended training setup:

```text
loss: causal language modeling loss
padding labels: -100
optimizer: CIxOpt or AdamW-compatible optimizer
gradient clipping: recommended
use_cache during training: false
mixed precision: bf16 preferred where supported
```

When training with padded batches, labels should mask padding tokens:

```python
labels = input_ids.clone()
labels[attention_mask == 0] = -100
```

For chat-style supervised fine-tuning, assistant-only label masking is recommended when possible.
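Assistant-only label masking means the loss is computed only on tokens the assistant produced, while system and user tokens are set to the ignore index. The sketch below shows the masking step on plain Python lists. How the assistant spans are located depends on the chat template and is not covered here; the helper name, the half-open `(start, end)` span format, and the example token IDs are assumptions.

```python
IGNORE_INDEX = -100  # label value skipped by the causal LM loss


def mask_non_assistant(input_ids, assistant_spans):
    """Keep labels only inside assistant spans given as (start, end) ranges."""
    labels = [IGNORE_INDEX] * len(input_ids)
    for start, end in assistant_spans:
        # Copy the original token IDs back in for the assistant's tokens.
        labels[start:end] = input_ids[start:end]
    return labels


# Made-up token IDs: positions 0-4 are the user turn, 5-8 the assistant reply.
input_ids = [1, 10, 11, 12, 2, 20, 21, 22, 2]
labels = mask_non_assistant(input_ids, [(5, 9)])
print(labels)
# [-100, -100, -100, -100, -100, 20, 21, 22, 2]
```

Padding positions can be masked afterwards in the same way as in the padded-batch snippet, since both steps write the same ignore index.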

Evaluation

Formal benchmark results have not yet been added.

Recommended evaluations:

Held-out perplexity
Short-context and long-context generation checks
IFEval-style instruction following
Small reasoning suites
Repetition and degeneration testing
Side-by-side comparison against AxiomicLabs/GPT-X2-125M
Long-context retrieval and recall probes
Governor-on vs governor-off ablations, if applicable
CIxOpt vs AdamW optimizer comparisons
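For the held-out perplexity item, the quantity to report is the exponential of the mean per-token negative log-likelihood. The sketch below assumes the per-token losses (natural log, with masked positions already excluded) have been collected elsewhere; the function name and the example values are illustrative.

```python
import math


def perplexity(token_nlls):
    """exp(mean negative log-likelihood) over the scored tokens."""
    if not token_nlls:
        raise ValueError("need at least one scored token")
    return math.exp(sum(token_nlls) / len(token_nlls))


# If every token is predicted with probability 1/4, perplexity is about 4.
uniform_nlls = [math.log(4.0)] * 8
print(round(perplexity(uniform_nlls), 6))
```

When comparing against AxiomicLabs/GPT-X2-125M, both models should be scored on the same tokenized text and the same context length, since perplexity is not comparable across different tokenizations.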

Safety and Responsible Use

This model may produce plausible but incorrect outputs. Users should independently verify important claims.

Before deployment, evaluate for:

Hallucination rate
Bias and toxicity
Prompt injection sensitivity
Refusal behavior
Domain-specific factuality
Robustness under long-context prompting
Failure modes introduced by custom positional encoding experiments

Citation

Base model:

```bibtex
@misc{axiomiclabs_gptx2_125m,
  title        = {GPT-X2-125M},
  author       = {AxiomicLabs},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/AxiomicLabs/GPT-X2-125M}}
}
```

If referencing this derivative checkpoint, cite the model repository and base model together.

Author / Maintainer

Fine-tuning, custom architecture work, and optimizer experimentation by: Convergent Intelligence LLC

Research areas include AI systems, mathematical frameworks, intelligence analysis, optimizer design, and efficient language-model adaptation.

Disclaimer

This checkpoint is provided for research and experimentation. It is not a verified expert system. Outputs require human review, especially in factual, technical, legal, medical, financial, operational, or safety-critical settings.

Model tree for reaperdoesntknow/GPT-X2-125M-CIx-Long-Context

Base model: AxiomicLabs/GPT-X2-125M
Model size: 0.1B params (Safetensors)
Tensor type: BF16
Collection: Convergent Optimizations (various models trained with our heterogeneous optimizer designed for efficient parameter routing across large projection matrices)