# Llama-3.1-8B-Aurora-Chat v3
**Best Aurora chat model in our zoo** (eval 2.80/5, +59% over base).
LoRA fine-tune of meta-llama/Llama-3.1-8B-Instruct specialized for the
ALCF Aurora supercomputer (Intel Xeon Sapphire
Rapids + Intel GPU Max 1550 / Ponte Vecchio, oneAPI / SYCL, PBS Pro).
Off-the-shelf code LLMs hallucinate Aurora specifics: they suggest `nvcc` instead of
`icpx -fsycl`, `srun`/`aprun` instead of `mpiexec`, NERSC's `/global/cfs` instead of
`/lus/flare`, and CUDA device strings instead of `xpu`. This adapter teaches the base
model the actual Aurora toolchain, file-system layout, scheduler conventions, and
recommended PyTorch/TensorFlow/SYCL idioms.
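For concreteness, a short illustrative sketch of those substitutions (file names like `app.cpp` and the rank counts are placeholders; the PyTorch check assumes an XPU-enabled build such as the one provided by Aurora's `frameworks` module):

```bash
# NVIDIA-cluster habit              ->  Aurora equivalent

# nvcc app.cu -o app                ->  compile SYCL code with the oneAPI C++ compiler
icpx -fsycl app.cpp -o app

# srun -n 12 ./app                  ->  PBS Pro systems launch MPI jobs via mpiexec
mpiexec -n 12 --ppn 12 ./app

# /global/cfs/... (NERSC)           ->  ALCF project storage lives on flare
ls /lus/flare/projects/<project>

# torch.device("cuda")              ->  Intel GPUs show up as the "xpu" device
python -c 'import torch; print(torch.xpu.is_available())'
```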
## Model summary
| | |
|---|---|
| Base model | meta-llama/Llama-3.1-8B-Instruct |
| Format | GGUF, f16; single file, llama.cpp / Ollama / LM Studio compatible |
| Fine-tuning | LoRA (PEFT): r=32, α=64, dropout 0.0, 2 epochs |
| Optimizer | AdamW (fused), lr 2e-4 cosine, warmup 3%, batch 1 × grad-accum 8 |
| Precision / seq-len | bf16, 1,536 tokens |
| Training data | aurora-docs-distill-multirank: 4,495 ChatML rows |
| Train loss (final) | 0.6224 |
| Hardware | 1 Aurora PVC tile (1/12 of a node, 64 GB HBM), IPEX + PyTorch 2.10 XPU backend |
| Eval (53-Q Aurora, 0–5) | 2.80 / 5 (base 1.76, +59.1%) |
## Quick start
**On Aurora** (PVC GPU, SYCL llama.cpp build), in an interactive PBS session:

```bash
# 1. Grab a debug node
qsub -I -A <project> -q debug -l select=1 -l walltime=01:00:00 -l filesystems=home:flare

# 2. Load the toolchain
module load frameworks
source /lus/flare/projects/<project>/scripts/env.sh   # or your own oneAPI setup
export ONEAPI_DEVICE_SELECTOR=level_zero:gpu

# 3. Download to flare (NOT $HOME; its quota is small)
hf download shazzadulimun/llama31-8b-aurora-chat-v3-gguf --local-dir /lus/flare/projects/<project>/models/aurora-chat-v3

# 4. Run on a single PVC tile
/path/to/llama.cpp/build_sycl/bin/llama-cli \
  -m /lus/flare/projects/<project>/models/aurora-chat-v3/*.gguf \
  -ngl 999 -sm none --temp 0.0 -cnv \
  -p "How do I launch one MPI rank per GPU tile on Aurora?"
```
**Anywhere else** (laptop, workstation, any GPU):

```bash
hf download shazzadulimun/llama31-8b-aurora-chat-v3-gguf --local-dir ./model
./llama-cli -m ./model/*.gguf -ngl 999 --temp 0.0 -cnv
```
Or via Ollama / LM Studio: `ollama run hf.co/shazzadulimun/llama31-8b-aurora-chat-v3-gguf`
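llama.cpp builds also include `llama-server`, which exposes an OpenAI-compatible endpoint; a quick sketch for serving the downloaded GGUF locally (the port is arbitrary):

```bash
# Serve the model over HTTP
./llama-server -m ./model/*.gguf -ngl 999 --port 8080 &

# Query it with a standard chat-completions request
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "What scheduler does Aurora use?"}
        ],
        "temperature": 0.0
      }'
```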
## Training data
Distilled from openai/gpt-oss-120b on ALCF Sophia (vLLM) over 416 cleaned chunks of
docs.alcf.anl.gov/aurora. 4,495
training rows + 562 validation rows in ChatML format with embedded
chain-of-thought (**Reasoning:** / **Answer:**).
**Broad coverage, parallel-rank distillation.** 20 worker ranks each took a disjoint slice (~21 chunks) of the cleaned docs.alcf.anl.gov/aurora corpus and asked the teacher for chain-of-thought QA pairs. Disjoint slicing maximizes phrasing diversity (each rank sees fresh context) while still covering every chunk exactly once.
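To make the slicing concrete, here is a hypothetical sketch of how 416 chunks divide across 20 ranks; the actual partitioning lives in the reproduction scripts linked below:

```bash
# Hypothetical illustration: 416 chunks over 20 ranks -> slices of 20-21 chunks, no overlap
NCHUNKS=416
NRANKS=20
for RANK in $(seq 0 $((NRANKS - 1))); do
  START=$(( RANK * NCHUNKS / NRANKS ))        # inclusive
  END=$(( (RANK + 1) * NCHUNKS / NRANKS ))    # exclusive
  echo "rank $RANK -> chunks [$START, $END) ($((END - START)) chunks)"
done
```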
Full corpus + reproduction scripts: SIslamMun/Generator @ aurora-datasets-2026-04-30.
## Evaluation
53-question Aurora-domain holdout (programming models, ML/AI, systems/ops, debugging).
Judged by gpt-oss-120b on a 0–5 scale.
| Model | Avg | Δ vs. base |
|---|---|---|
| **Llama-3.1-8B-Aurora-Chat v3** (-A data), best | 2.80 | +59% |
| Llama-3.1-8B-Aurora-Ops v3 | 2.31 | +31% |
| Llama-3.1-8B-Aurora-Chat v1 (-B data, single-rank ablation) | 2.45 | +39% |
| Llama-3.1-8B-Aurora-ML v3 | 2.13 | +21% |
| Llama-3.1-8B-Aurora-Coder v3 | 1.97 | +12% |
| meta-llama/Llama-3.1-8B-Instruct (base) | 1.76 | – |
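The Δ column is plain arithmetic over the two averages; for example, the headline number:

```bash
# Chat v3 vs. base: (2.80 - 1.76) / 1.76 * 100 = +59.1%
awk 'BEGIN { printf "%+.1f%%\n", (2.80 - 1.76) / 1.76 * 100 }'
```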
Closed frontier models (gpt-4o, claude-sonnet-4-5, the gpt-oss-120b teacher) score 3.6–4.1 on the same holdout; the goal here isn't to beat them, it's to distill enough Aurora knowledge into a small open model that runs on a single PVC tile.
## Limitations
- **Synthetic-data biases.** The teacher (gpt-oss-120b) can confabulate plausible-looking but incorrect commands. Treat outputs as a verifiable first draft, not authoritative (see the verification sketch after this list).
- **Doc snapshot fixed at 2026-04-29.** Module versions, queue names, and APIs change; anything published after that date isn't reflected here.
- **Aurora-only.** Specifics (`/lus/flare`, `xpu`, PBS queues) won't transfer to Frontier, Polaris, or other systems.
- **Keep temperature ≤ 0.1 for technical answers.** Higher temperatures invite invented flag names and paths.
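A practical habit when acting on the model's suggestions: verify queue names, modules, and binaries against the live system first. These are all stock PBS/Lmod/shell commands, nothing model-specific:

```bash
# Does the suggested queue exist on this system?
qstat -Q

# Is the suggested module actually installed, and at which versions?
module avail frameworks

# Does the suggested compiler resolve on PATH?
command -v icpx && icpx --version
```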
## Citation
```bibtex
@misc{aurora-llms-2026,
  title  = {Llama-3.1-8B-Aurora-Chat v3},
  author = {Islam Mun, Shazzadul},
  year   = {2026},
  url    = {https://huggingface.co/shazzadulimun/llama31-8b-aurora-chat-v3-gguf},
  note   = {LoRA fine-tune of Llama-3.1-8B-Instruct; data distilled from gpt-oss-120b on docs.alcf.anl.gov/aurora}
}
```
## License
Apache-2.0 for the adapter weights and synthetic training data. The source corpus is public
ALCF user documentation. The base model retains its own license; see
meta-llama/Llama-3.1-8B-Instruct.