code-daemon-denoise-v1
A tiny, fast bilingual (EN + RU) word denoiser — it decides whether a single word form is a meaningful technical term (keep) or noise / ballast (drop). It ships with the UltraCode MCP server, where it runs in the knowledge-graph pipeline: classifying the UNKNOWN word forms harvested from a codebase's docs/identifiers so the search vocabulary stays clean.
- Frozen encoder: intfloat/multilingual-e5-small (XLM-RoBERTa, 384-dim), no weight changes. Mean-pooling + L2-norm are baked into the graph.
- Trained linear head: a logistic-regression probe (scikit-learn) over the 384-dim embedding, folded with its input scaler into a single affine P(keep) = sigmoid(w·e + b). Ships as denoise_head.json ({dim, w[384], b, strip_threshold}); no Python at runtime, the daemon does the dot product in-process.
- Vocab-pruned: the 250k-token SentencePiece vocab is cut by character class to Latin + Cyrillic + punctuation (142k tokens), lossless for EN + RU, dropping the INT8 weights from 121 MB to **76 MB**. The pruned-vocab id map is folded into a remap-Gather at the model input.
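The remap idea can be illustrated with a toy embedding table. This is only a sketch under stated assumptions: the 10-token vocabulary, the `kept_ids` set, and the fallback of dropped ids to row 0 are all hypothetical, and the fairseq +1 offset is ignored. It shows why a lookup through a remap table gives the same vectors as the full table for every kept token.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: a 10-token "full" vocabulary with 4-dim embeddings.
full_vocab_size, dim = 10, 4
full_table = rng.normal(size=(full_vocab_size, dim)).astype(np.float32)

# Hypothetical set of token ids that survive pruning (sorted, unique).
kept_ids = np.array([0, 1, 2, 5, 7, 8])

# The pruned table keeps only the surviving rows.
pruned_table = full_table[kept_ids]

# Remap: original id -> row index in the pruned table. Dropped ids fall
# back to row 0 here (an assumption made for this sketch only).
remap = np.zeros(full_vocab_size, dtype=np.int64)
remap[kept_ids] = np.arange(len(kept_ids))


def embed_pruned(original_ids):
    """Look up embeddings for original ids against the pruned table."""
    return pruned_table[remap[original_ids]]


ids = np.array([[1, 5, 8], [7, 2, 0]])
same = np.array_equal(embed_pruned(ids), full_table[ids])
```

Because the caller keeps passing original ids, the tokenizer does not need to know that the table was pruned.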
How it was made
- Encoder: export the frozen mE5-small to ONNX with mean-pool + L2-norm fused, prune the embedding table to the kept character classes, and PTQ-quantize to INT8 (NNCF) for OpenVINO.
- Head: embed a bilingual word-label set (EN: WordNet/BNC mid-frequency lemmas; RU: Taiga/OpenCorpora/Nerus mid-Zipf) plus per-language manual gold, fit LogisticRegression(class_weight="balanced"), then fold StandardScaler + LR into one (w, b). A strip_threshold (default 0.95) trades strip precision vs recall.
Words are embedded with a fixed "vocab: " prefix (the daemon prefixes every candidate word the same
way) so that very short inputs are not dropped by batch de-duplication. The head is trained on the
prefixed embeddings, so reproduce the prefix for standalone use.
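Folding a standardizing scaler into a linear model is plain algebra: with z = (e - mean) / scale, the logit w_lr·z + b_lr equals (w_lr / scale)·e + (b_lr - (w_lr / scale)·mean). The sketch below checks this on random data. It is not the project's build script; `fold_scaler_into_linear` is a hypothetical helper. With scikit-learn, the inputs would typically come from `StandardScaler.mean_`, `StandardScaler.scale_`, `LogisticRegression.coef_[0]`, and `LogisticRegression.intercept_[0]`.

```python
import numpy as np


def fold_scaler_into_linear(mean, scale, w_lr, b_lr):
    """Fold z = (e - mean) / scale followed by w_lr.z + b_lr into w.e + b."""
    w = w_lr / scale
    b = b_lr - np.dot(w, mean)
    return w, b


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Random stand-ins for a fitted scaler and logistic-regression head.
rng = np.random.default_rng(0)
dim = 384
mean = rng.normal(size=dim)
scale = rng.uniform(0.5, 2.0, size=dim)
w_lr = rng.normal(size=dim)
b_lr = 0.3
e = rng.normal(size=(5, dim))  # five fake embeddings

# Two-step path: standardize, then apply the linear head.
two_step = sigmoid(((e - mean) / scale) @ w_lr + b_lr)

# One-step path: a single affine on the raw embedding.
w, b = fold_scaler_into_linear(mean, scale, w_lr, b_lr)
one_step = sigmoid(e @ w + b)
```

Both paths give the same probabilities up to floating-point rounding, so only `w` and `b` need to ship.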
Built for speed
- Short, single-word inputs: one length bucket only, batch 64 × seq 40 (-s_…_b64_s40).
- INT8 weights (OpenVINO CPU); the embedding mean-pool + L2-norm are fused into the graph so the output is already [batch, 384].
- CPU-first by design: on the daemon it runs on OpenVINO CPU and is moved to a discrete GPU (TensorRT / TVM) only when the card is large (≥12 GB total VRAM) with free room.
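To make the fixed bucket and the fused pooling concrete, here is a NumPy-only sketch. It is not the daemon's code: `pad_to_bucket` and `mean_pool_l2` are hypothetical helpers, the random array stands in for the encoder's per-token hidden states, pad id 0 follows the usage snippet in this card, and truncating long inputs to 40 tokens is an assumption.

```python
import numpy as np

BATCH, SEQ, DIM = 64, 40, 384  # bucket shape and embedding width from this card


def pad_to_bucket(token_lists, pad_id=0):
    """Pad (or truncate) up to BATCH token lists into fixed [BATCH, SEQ] arrays."""
    if len(token_lists) > BATCH:
        raise ValueError("split the input into chunks of at most 64 words")
    ids = np.full((BATCH, SEQ), pad_id, dtype=np.int64)
    mask = np.zeros((BATCH, SEQ), dtype=np.int64)
    for row, toks in enumerate(token_lists):
        toks = toks[:SEQ]  # assumption: overly long inputs are simply cut
        ids[row, : len(toks)] = toks
        mask[row, : len(toks)] = 1
    return ids, mask


def mean_pool_l2(token_embeddings, mask):
    """Masked mean over the sequence axis, then L2-normalize each row."""
    m = mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * m).sum(axis=1)
    counts = np.clip(m.sum(axis=1), 1.0, None)  # guard fully padded rows
    pooled = summed / counts
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)


# Two fake token-id sequences; the remaining 62 rows are padding.
ids, mask = pad_to_bucket([[2, 11, 12, 3], [2, 21, 3]])

# Stand-in for per-token encoder outputs of shape [BATCH, SEQ, DIM].
fake_hidden = np.random.default_rng(0).normal(size=(BATCH, SEQ, DIM)).astype(np.float32)
emb = mean_pool_l2(fake_hidden, mask)  # [BATCH, DIM]
```

In a fixed-shape engine the padded rows still produce an output; a caller would keep only the first `len(token_lists)` rows of the result.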
Intended use
Per-word "is this a technical term?" classification for cleaning a search vocabulary. Encode a word
(with the "vocab: " prefix) with the bundled SentencePiece + mE5-small, then apply the linear head:
import json

import numpy as np
import onnxruntime as ort
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="sentencepiece.bpe.model")
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
with open("denoise_head.json") as f:
    head = json.load(f)  # {dim, w[dim], b, strip_threshold}
w, b, thr = np.array(head["w"], np.float32), head["b"], head["strip_threshold"]

def p_keep(words, max_len=40):
    toks = [[2, *sp.encode("vocab: " + x)[: max_len - 2], 3] for x in words]  # bos … eos
    L = max(len(t) for t in toks)
    ids = np.array([t + [0] * (L - len(t)) for t in toks], dtype=np.int64)  # pad=0
    mask = (ids != 0).astype(np.int64)
    emb = sess.run(None, {"input_ids": ids, "attention_mask": mask})[0]  # mean-pooled+L2 [B,384]
    return 1.0 / (1.0 + np.exp(-(emb @ w + b)))  # P(keep)

scores = p_keep(["mutex", "tensorrt", "пожалуйста", "asdfgh"])
# keep where score >= thr ; the rest is ballast
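The final thresholding step can be written out as a small helper. This is a sketch only: `split_by_threshold` is a hypothetical name, and the words and scores below are made up for illustration, not model output.

```python
import numpy as np


def split_by_threshold(words, scores, thr):
    """Return (keep, drop): keep where score >= thr, drop the rest."""
    scores = np.asarray(scores, dtype=np.float64)
    keep = [w for w, s in zip(words, scores) if s >= thr]
    drop = [w for w, s in zip(words, scores) if s < thr]
    return keep, drop


# Made-up inputs, chosen to show that a score equal to thr is kept.
words = ["alpha", "beta", "gamma"]
made_up_scores = [0.99, 0.95, 0.40]
keep, drop = split_by_threshold(words, made_up_scores, thr=0.95)
```

In real use, `scores` would be the array returned by `p_keep` and `thr` the `strip_threshold` read from denoise_head.json.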
What's in this repo
Pre-compiled, ready-to-run engines named per runtime × GPU arch × OS (single s bucket):
- OpenVINO *_ov_cpu_int8_b64_s40.{xml,bin}: Intel/AMD/any CPU, INT8 (the default lane).
- TensorRT *_{win_x64,linux_x64}_trt_sm_{86,89,120}.engine: NVIDIA, BF16 (optional GPU lane; the INT8 lane is OV CPU, as this remap-baked SentencePiece ONNX isn't compatible with generic INT8 PTQ).
- TVM *_b64_s40_{win_x64,…}_tvm_vulkan.{dll,so}: Vulkan fallback (optional GPU lane).
- Head: denoise_head.json (the trained affine; required).
- Tokenizer: sentencepiece.bpe.model (+ tokenizer_config.json). The daemon feeds raw SentencePiece ids; the fairseq +1 offset and pruned-vocab remap are baked into the ONNX.
- ONNX source: model.onnx (FP32, pruned, mean-pool + L2-norm + remap fused), the build source for the TRT/TVM engines and for standalone onnxruntime use.
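Since the head file is required and its schema is small, a standalone consumer might validate it on load. The sketch below assumes only the fields named in this card ({dim, w, b, strip_threshold}); `load_head` is a hypothetical helper, and the temporary file holds made-up values so the example runs without the real repo.

```python
import json
import os
import tempfile

import numpy as np


def load_head(path):
    """Load the affine head and check that its fields are consistent."""
    with open(path, encoding="utf-8") as f:
        head = json.load(f)
    w = np.asarray(head["w"], dtype=np.float32)
    if w.ndim != 1 or w.shape[0] != head["dim"]:
        raise ValueError(f"expected {head['dim']} weights, got shape {w.shape}")
    thr = float(head["strip_threshold"])
    if not 0.0 <= thr <= 1.0:
        raise ValueError("strip_threshold must be a probability in [0, 1]")
    return w, float(head["b"]), thr


# Demo with a made-up 4-dim head written to a temporary file.
fake_head = {"dim": 4, "w": [0.1, -0.2, 0.3, 0.0], "b": 0.5, "strip_threshold": 0.95}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "denoise_head.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(fake_head, f)
    w_demo, b_demo, thr_demo = load_head(path)
```

Failing early on a dimension mismatch avoids a harder-to-read shape error later in the dot product.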
Evaluation
On a frozen held-out word set (EN + RU): SAFE F1 ≈ 0.79, BALLAST F1 ≈ 0.84, strip precision ≈
0.88 at strip_threshold = 0.95. The INT8 vocab-pruned build matches the full-vocab FP build (F1 0.79
vs 0.79) at 38 % of the size.
License & attribution
The encoder weights are intfloat/multilingual-e5-small
(Apache-2.0), redistributed here in compiled form unchanged; this repo is therefore released under
Apache-2.0. The linear head and the build/quantization tooling are original to UltraCode. Backbone:
XLM-RoBERTa. Not legal advice.