Mesh LLM

Qwen3-8B-Q4_K_M

Distributed GGUF inference package for Mesh LLM

GGUF layer package for running Qwen3-8B-Q4_K_M across a local Mesh LLM cluster.

This package is derived from unsloth/Qwen3-8B-GGUF. It contains the original Q4_K_M GGUF weights, split into per-layer artifacts for distributed inference.

Highlights

  • Run locally: private inference on your hardware.
  • Pool multiple machines: split layers across peers.
  • OpenAI-compatible: serve /v1/chat/completions locally.
  • Package variant: Q4_K_M layer package.

Model Overview

| Property | Value |
| --- | --- |
| Source model | unsloth/Qwen3-8B-GGUF |
| Model id | unsloth/Qwen3-8B-GGUF |
| Family | Qwen3 |
| Parameter scale | 8B |
| Quantization | Q4_K_M |
| Layer count | 36 |
| Activation width | 4096 |
| Package size | 4.9 GB |
| Source file | Qwen3-8B-Q4_K_M.gguf |
| Package repo | meshllm/Qwen3-8B-Q4_K_M-layers |

Recommended Use

  • Local and private inference with Mesh LLM.
  • Multi-machine serving when the full GGUF is too large for one host.
  • OpenAI-compatible chat/completions workflows through Mesh LLM's local API.

For upstream architecture details, chat template guidance, sampling recommendations, license terms, and benchmark notes, see the source model card: unsloth/Qwen3-8B-GGUF.
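
To get a feel for multi-machine serving, it can help to sketch how the 36 transformer layers might be divided among peers. The function below is purely illustrative: `plan_layers` is a hypothetical helper, and the even, contiguous split it computes is an assumption, not Mesh LLM's actual placement logic. For rough sizing, the 36 layer artifacts total 4.1 GB, or about 0.11 GB per layer on average.

```shell
# Illustrative only: divide TOTAL layers into contiguous ranges over PEERS
# machines, giving the first (TOTAL % PEERS) peers one extra layer.
# This is NOT Mesh LLM's scheduler, just a planning aid.
plan_layers() {
  total="$1"
  peers="$2"
  base=$((total / peers))
  extra=$((total % peers))
  start=0
  i=0
  while [ "$i" -lt "$peers" ]; do
    count="$base"
    if [ "$i" -lt "$extra" ]; then
      count=$((base + 1))
    fi
    end=$((start + count - 1))
    echo "peer $i: layers $start-$end ($count layers)"
    start=$((end + 1))
    i=$((i + 1))
  done
}

# Example: 36 layers over 3 machines.
plan_layers 36 3
```

With three peers, each one would hold 12 layers under this assumption. The shared embeddings and output head artifacts are not counted here.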

Quickstart

# Run this on each machine that should contribute memory/compute.
mesh-llm serve --model "meshllm/Qwen3-8B-Q4_K_M-layers" --split
# Check the mesh and discover the OpenAI-compatible model name.
curl -s http://localhost:3131/api/status
curl -s http://localhost:3131/v1/models
# Send an OpenAI-compatible chat request.
curl -s http://localhost:3131/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "unsloth/Qwen3-8B-GGUF",
    "messages": [{"role": "user", "content": "Write a tiny hello-world function in Rust."}],
    "max_tokens": 128
  }'
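
For repeated requests, the curl call above can be wrapped in a small helper. This is a minimal sketch: `MESH_URL`, `chat_payload`, and `mesh_chat` are hypothetical names introduced here, and the default address assumes the same localhost:3131 endpoint used in the commands above. Pass the model name that /v1/models reports.

```shell
# Assumption: the mesh listens on localhost:3131, as in the commands above.
MESH_URL="${MESH_URL:-http://localhost:3131}"

# Build a minimal chat request body.
# Arguments: <model> <prompt> [max_tokens, default 128]
# The prompt is inserted verbatim, so it must not contain double quotes or
# backslashes (no JSON escaping is done here).
chat_payload() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}], "max_tokens": %s}' \
    "$1" "$2" "${3:-128}"
}

# Send the request to the OpenAI-compatible endpoint.
mesh_chat() {
  curl -s "$MESH_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$(chat_payload "$1" "$2" "$3")"
}

# Example (requires a running mesh):
# mesh_chat "unsloth/Qwen3-8B-GGUF" "Say hello in one sentence."
```

Keeping payload construction separate from the network call makes it easy to inspect the JSON before sending it.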

Package Variant

| Property | Value |
| --- | --- |
| Format | layer-package |
| Canonical source ref | unsloth/Qwen3-8B-GGUF@main/Qwen3-8B-Q4_K_M.gguf |
| Source revision | main |
| Source SHA-256 | 120307ba529eb2439d6c430d94104dabd578497bc7bfe7e322b5d9933b449bd4 |
| Skippy ABI | 0.1.22 |
| Package manifest SHA-256 | 7e3ecea929276d13b71a835ceee5cfccd2c075bd35f04a9e2bc7042c28bce0a6 |

What Is Included

| Artifact | Path | Contents | SHA-256 |
| --- | --- | --- | --- |
| Manifest | model-package.json | Package schema, source identity, checksums | 7e3ecea929276d13b71a835ceee5cfccd2c075bd35f04a9e2bc7042c28bce0a6 |
| Metadata | shared/metadata.gguf | 0 tensors, 5.7 MB | 0f665e355aceb65f6187f2961a64febb626e4da4dff785f575745f2d4c935542 |
| Embeddings | shared/embeddings.gguf | 1 tensor, 339.5 MB | 7bb6ba6790073cd060241136d8f0e4ba0665b887fbe7d1947bdcf7abda8605ae |
| Output head | shared/output.gguf | 2 tensors, 492.5 MB | 35393b015f2eea59d8652506d75455a306330c0e54a4ecf2d5de86559f150d56 |
| Transformer layers | layers/layer-*.gguf | 36 layer artifacts, 396 tensors, 4.1 GB | see model-package.json |
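
The published checksums can be compared against a local download. The sketch below assumes GNU coreutils `sha256sum` is available (on macOS, `shasum -a 256` prints the same format); `verify_sha256` is a hypothetical helper, and the directory in the commented example is a placeholder for wherever the package was downloaded.

```shell
# Verify one artifact against an expected SHA-256 digest.
# Usage: verify_sha256 <file> <expected-hex-digest>
verify_sha256() {
  file="$1"
  expected="$2"
  # sha256sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(sha256sum "$file" | cut -d' ' -f1)"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}

# Example (placeholder download directory):
# verify_sha256 ./Qwen3-8B-Q4_K_M-layers/shared/metadata.gguf \
#   0f665e355aceb65f6187f2961a64febb626e4da4dff785f575745f2d4c935542
```

Digests for the individual layer files are listed in model-package.json rather than in the table, so they would need to be read from the manifest.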

Validation

This package was generated by the Mesh LLM HF Jobs splitter from mesh-llm ref main and validated before upload with:

skippy-model-package validate-package "/source/Qwen3-8B-Q4_K_M.gguf" "$PACKAGE_DIR"
