Instructions to use Flexan/RthItalia-PINDARO-AI-CODE-GGUF with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Flexan/RthItalia-PINDARO-AI-CODE-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Flexan/RthItalia-PINDARO-AI-CODE-GGUF",
	filename="PINDARO-AI-CODE.IQ3_M.gguf",
)

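# create_chat_completion returns an OpenAI-style response dict.
response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
# Print only the assistant's reply text.
print(response["choices"][0]["message"]["content"])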

Local Apps

llama.cpp

How to use Flexan/RthItalia-PINDARO-AI-CODE-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Flexan/RthItalia-PINDARO-AI-CODE-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Flexan/RthItalia-PINDARO-AI-CODE-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Flexan/RthItalia-PINDARO-AI-CODE-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Flexan/RthItalia-PINDARO-AI-CODE-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Flexan/RthItalia-PINDARO-AI-CODE-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Flexan/RthItalia-PINDARO-AI-CODE-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Flexan/RthItalia-PINDARO-AI-CODE-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Flexan/RthItalia-PINDARO-AI-CODE-GGUF:Q4_K_M

Use Docker

docker model run hf.co/Flexan/RthItalia-PINDARO-AI-CODE-GGUF:Q4_K_M
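
Once `llama-server` is running, any OpenAI-compatible client can talk to it. The sketch below uses only the Python standard library. It assumes the server listens on http://localhost:8080 (llama-server's default; adjust if you passed `--host` or `--port`), and the helper names `build_chat_request` and `ask` are illustrative, not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server's default address; change it if you passed --host/--port.
BASE_URL = "http://localhost:8080"


def build_chat_request(prompt, base_url=BASE_URL, max_tokens=256):
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the prompt to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(ask("Write a Python function add(a, b)."))` prints the model's reply. The context window is 2048 tokens, so keep prompt plus `max_tokens` below that.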


vLLM

How to use Flexan/RthItalia-PINDARO-AI-CODE-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Flexan/RthItalia-PINDARO-AI-CODE-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Flexan/RthItalia-PINDARO-AI-CODE-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/Flexan/RthItalia-PINDARO-AI-CODE-GGUF:Q4_K_M

Ollama
How to use Flexan/RthItalia-PINDARO-AI-CODE-GGUF with Ollama:
```
ollama run hf.co/Flexan/RthItalia-PINDARO-AI-CODE-GGUF:Q4_K_M
```

Unsloth Studio

How to use Flexan/RthItalia-PINDARO-AI-CODE-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Flexan/RthItalia-PINDARO-AI-CODE-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Flexan/RthItalia-PINDARO-AI-CODE-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Flexan/RthItalia-PINDARO-AI-CODE-GGUF to start chatting

Docker Model Runner
How to use Flexan/RthItalia-PINDARO-AI-CODE-GGUF with Docker Model Runner:
```
docker model run hf.co/Flexan/RthItalia-PINDARO-AI-CODE-GGUF:Q4_K_M
```

Lemonade

How to use Flexan/RthItalia-PINDARO-AI-CODE-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Flexan/RthItalia-PINDARO-AI-CODE-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.RthItalia-PINDARO-AI-CODE-GGUF-Q4_K_M

List all available models

lemonade list

GGUF Files for PINDARO-AI-CODE

These are the GGUF files for RthItalia/PINDARO-AI-CODE.

Downloads

| GGUF Link | Quantization | Description |
|---|---|---|
| Download | Q2_K | Lowest quality |
| Download | Q3_K_S | |
| Download | IQ3_S | Integer quant, preferable over Q3_K_S |
| Download | IQ3_M | Integer quant |
| Download | Q3_K_M | |
| Download | Q3_K_L | |
| Download | IQ4_XS | Integer quant |
| Download | Q4_K_S | Fast with good performance |
| Download | Q4_K_M | Recommended: Perfect mix of speed and performance |
| Download | Q5_K_S | |
| Download | Q5_K_M | |
| Download | Q6_K | Very good quality |
| Download | Q8_0 | Best quality |
| Download | f16 | Full precision, don't bother; use a quant |

Note from Flexan

I provide GGUFs and quantizations of publicly available models that do not have a GGUF equivalent available yet, usually for models I deem interesting and wish to try out.

If there are some quants missing that you'd like me to add, you may request one in the community tab. If you want to request a public model to be converted, you can also request that in the community tab. If you have questions regarding this model, please refer to the original model repo.

You can find more info about me and what I do here.

MODEL_CARD - PINDARO AI CODE

Date: 2026-03-02
Model path: e:\Pindaro\PINDARO AI CODE

1. Model Identity

Name: PINDARO AI CODE
Family: LLaMA-style causal LM
Intended role: coding assistant
Format support:
- Hugging Face (model.safetensors)
- GGUF F16 (pindaro-f16.gguf)
- GGUF Q4_K_M (pindaro-q4_k_m.gguf)

2. Technical Specs

Architecture: LlamaForCausalLM
model_type: llama
Layers: 22
Hidden size: 2048
Attention heads: 32
KV heads: 4
Intermediate size: 5632
Max context: 2048
Vocab size: 32002
Tensor count in safetensors: 201
Parameter count (computed): 1,100,056,576
Dtype in config: float16
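
The computed parameter and tensor counts can be reproduced from the other figures in this section. The sketch below assumes the standard Llama layout: no bias terms, grouped-query attention, two RMSNorm weights per layer, and separate (untied) input-embedding and `lm_head` matrices. These assumptions are not stated in the card, but they do reproduce both listed numbers.

```python
# Specs copied from the list above.
VOCAB, HIDDEN, LAYERS = 32002, 2048, 22
HEADS, KV_HEADS, INTERMEDIATE = 32, 4, 5632

head_dim = HIDDEN // HEADS        # 64
kv_dim = KV_HEADS * head_dim      # 256 (grouped-query attention)

# Assumptions: no bias terms, untied input embedding and lm_head.
attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim   # q_proj, o_proj + k_proj, v_proj
mlp = 3 * HIDDEN * INTERMEDIATE                         # gate, up, down projections
norms = 2 * HIDDEN                                      # two RMSNorm weights per layer
per_layer = attention + mlp + norms

# Layers + final norm + input embedding + lm_head.
total_params = LAYERS * per_layer + HIDDEN + 2 * VOCAB * HIDDEN
# 9 weight tensors per layer + embedding, final norm, lm_head.
total_tensors = LAYERS * 9 + 3

print(f"{total_params:,}")   # 1,100,056,576
print(total_tensors)         # 201
```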

3. Chat / Prompt Format

Template is aligned to registered special tokens:

<|noesis|> (id 32000)
<|end|> (id 32001)

Configured template:

{{ bos_token }}{% for message in messages %}<|noesis|>
{% if message['role'] == 'system' %}### System
{{ message['content'] }}
{% elif message['role'] == 'user' %}### Question
{{ message['content'] }}
{% elif message['role'] == 'assistant' %}### Answer
{{ message['content'] }}
{% endif %}<|end|>
{% endfor %}{% if add_generation_prompt %}<|noesis|>
### Answer
```
{% endif %}
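
To make the resulting prompt layout concrete, here is a plain-Python approximation of the template above. It is a sketch, not the authoritative renderer: `tokenizer.apply_chat_template` remains the reference. The `bos_token` default of `<s>` is an assumption (typical for Llama tokenizers), and because it is unclear from the listing whether the three-backtick line after `### Answer` belongs to the template itself, it is exposed as the hypothetical flag `open_code_fence`.

```python
def render_prompt(messages, bos_token="<s>", add_generation_prompt=True,
                  open_code_fence=True):
    """Approximate the card's chat template in plain Python."""
    headers = {
        "system": "### System",
        "user": "### Question",
        "assistant": "### Answer",
    }
    parts = [bos_token]
    for message in messages:
        parts.append("<|noesis|>\n")
        header = headers.get(message["role"])
        if header is not None:  # unknown roles render as an empty turn
            parts.append(f"{header}\n{message['content']}\n")
        parts.append("<|end|>\n")
    if add_generation_prompt:
        parts.append("<|noesis|>\n### Answer\n")
        if open_code_fence:
            # Three backticks, as shown in the template listing.
            parts.append("`" * 3 + "\n")
    return "".join(parts)


prompt = render_prompt(
    [{"role": "user", "content": "Write a Python function add(a, b)."}],
    open_code_fence=False,
)
print(prompt)
```

Each turn is wrapped in `<|noesis|>` and `<|end|>`, and the generation prompt leaves an open `### Answer` turn for the model to complete.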

4. Local Artifact Integrity (SHA256)

model.safetensors: F77C27B8BABF9FCAB83A7DC68BA58934E8C8C031C9F10B4B73E802D4FBFE0CEC
config.json: B37C45060F3E2F5F9B91903C9CCB32F3C21076E809954FDA6C01D987CD8F25CC
generation_config.json: 6FF47E725C0EC6D0F1895670DE7EE68E61A4F99703F6C8E89AEA6AB14EA02DC3
tokenizer.json: 51433F06369AC3E597DFA23A811215E3511B8F86588A830DED72344B76A193EE
tokenizer_config.json: A0567C49A117AF9AF332874CFD333DDD622A09C5E9765131CEEE6344CB22A3DE
tokenizer.model: 9E556AFD44213B6BD1BE2B850EBBBD98F5481437A8021AFAF58EE7FB1818D347
special_tokens_map.json: D7805E093432AFCDE852968CDEBA3DE08A6FE66E77609F4701DECB87FC492F33
added_tokens.json: ECE349D292E246EAC9A9072C1730F023E61567984A828FB0D25DCCB14E3B7592
pindaro-f16.gguf: BDAAEB6FB712E9A4D952082CF415B05C7D076B33786D39063BBFB3A7E5DB2031
pindaro-q4_k_m.gguf: 5F98CC3454774ED5ED80D71A71ADFD0DAFF760FC9EEF0900DDD4F7EDA2E20FEF
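
A downloaded copy can be checked against these digests with a few lines of Python. This is a minimal sketch: the helper names `sha256_of` and `verify` are illustrative, and only two of the digests above are copied into `EXPECTED` to keep the example short.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the uppercase hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def verify(folder, expected):
    """Map each filename to True/False (digest match) or None (file missing)."""
    results = {}
    for name, want in expected.items():
        path = Path(folder) / name
        results[name] = sha256_of(path) == want.upper() if path.is_file() else None
    return results


# Two entries copied from the list above; extend with the rest as needed.
EXPECTED = {
    "config.json": "B37C45060F3E2F5F9B91903C9CCB32F3C21076E809954FDA6C01D987CD8F25CC",
    "pindaro-q4_k_m.gguf": "5F98CC3454774ED5ED80D71A71ADFD0DAFF760FC9EEF0900DDD4F7EDA2E20FEF",
}
```

Calling `verify(r"e:\Pindaro\PINDARO AI CODE", EXPECTED)` returns one entry per file. Streaming in chunks keeps memory use flat even for the multi-gigabyte GGUF files.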

5. Smoke Tests (2026-03-02)

Environment:

Python 3.11.9
Transformers 4.57.3
Torch 2.10.0+cpu

Results:

AutoConfig load: PASS
AutoTokenizer load: PASS
AutoModel load: PASS
Chat-template render: PASS
Template special-token alignment: PASS
Deterministic generation: PASS

Observed non-blocking warning:

Folder name with spaces may trigger a Python module-name warning in some runtimes.

6. Known Issues

Folder-name warning risk

The folder name PINDARO AI CODE contains spaces; some tools emit a module-naming warning.

Attention-mask warning in some calls

Because pad_token equals eos_token, the attention mask cannot be inferred reliably from the input IDs; pass attention_mask explicitly to generate() for stable behavior.

7. Recommended Next Steps

Optional packaging cleanup

Rename folder to a no-space slug (example: PINDARO_AI_CODE) when compatible with your deployment scripts.

Add coding eval gate

HumanEval pass@1
MBPP subset
Prompt-format adherence checks
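
The card does not say how such a gate would be scored. One common choice for HumanEval-style evaluation is the unbiased pass@k estimator, sketched below. The `gate` helper and its `threshold` default are hypothetical placeholders, not values taken from this model card.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples of which c passed the tests."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the chance that k samples drawn
    # without replacement contain no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def gate(per_problem, k=1, threshold=0.10):
    """Average pass@k over (n, c) pairs and compare to a hypothetical threshold."""
    score = sum(pass_at_k(n, c, k) for n, c in per_problem) / len(per_problem)
    return score, score >= threshold
```

For k = 1 the estimate reduces to c / n per problem, so with greedy decoding (one sample per problem) pass@1 is simply the fraction of problems solved.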

8. Usage Example

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

path = r"e:\Pindaro\PINDARO AI CODE"
tokenizer = AutoTokenizer.from_pretrained(path, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(path, local_files_only=True, dtype=torch.float16)

messages = [
    {"role": "system", "content": "You are a coding assistant."},
    {"role": "user", "content": "Write a Python function add(a, b)."},
]

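# return_dict=True also returns the attention mask. pad_token equals eos_token
# for this model, so the mask is passed explicitly (see Known Issues).
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
)
outputs = model.generate(**inputs, max_new_tokens=80, do_sample=False)
# Keep special tokens so <|noesis|> and <|end|> stay visible in the output.
print(tokenizer.decode(outputs[0], skip_special_tokens=False))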

9. Limitations and Safety

No training-data statement is included in this folder.
No official benchmark sheet is included.
Code generation can be plausible but wrong; always run tests.

10. Release Readiness

Current status: READY FOR LOCAL USE.

Packaging/runtime blockers are resolved.
Remaining items are evaluation and packaging polish.

Downloads last month: 62

Format: GGUF
Model size: 1B params
Architecture: llama
Hardware compatibility: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit

Model tree for Flexan/RthItalia-PINDARO-AI-CODE-GGUF

Base model: RthItalia/PINDARO-AI-CODE
Quantized (1): this model

Collection including Flexan/RthItalia-PINDARO-AI-CODE-GGUF

Community GGUFs: This collection contains quantized GGUF files for community models that did not have GGUF equivalents available yet. I do not own these models. (58 items, updated Apr 16)