---
language: en
library_name: lf4
license: mit
pipeline_tag: sentence-similarity
tags:
- lf4
- lf4-static-embedding
- static-embedding
- 4-bit
- quantized
- code-search
- tool-search
- embedding
- codebase
- semantic-search
---
# Vortex-Embed-4.7M
`Vortex-Embed-4.7M` is a **4-bit quantized static sentence embedding model** for high-throughput semantic code search and tool retrieval. It produces 256-dimensional embeddings from a **4.7 MB** file and runs without deep learning frameworks such as PyTorch or Hugging Face Transformers, which suits it to edge devices, local IDE plugins, and resource-constrained CLI tools.
This model is the native, default embedder inside [**vortexa**](https://github.com/OEvortex/vortexa), the open-source AST-aware codebase indexing and semantic search engine.

---
## ⚡ Key Highlights
* **Zero Heavy Dependencies:** Built strictly on NumPy, Safetensors, and Tokenizers. No PyTorch, no execution graphs, no CUDA requirements.
* **Aggressive Compression:** Compressed **6.4×** via LF4 block-quantization while retaining **99.69%** cosine similarity relative to the unquantized FP32 baseline.
* **Fast Execution:** Sub-millisecond inference (~0.15 ms per text string), with brute-force search time that scales linearly with index size.
---
## Performance Benchmarks
### Quantization Fidelity & Speed
All metrics were measured on a commodity x86 CPU.
| Metric | Value | Notes |
| :--- | :--- | :--- |
| **Cosine Preservation (vs FP32)** | `0.9969` | Near-zero degradation in vector geometry |
| **Mean Squared Error (MSE)** | `0.257` | Squared reconstruction error averaged across the vocabulary |
| **Inference Latency** | `~0.15 ms` | Per encoded text string |
| **Cold Boot / Load Time** | `~144 ms` | From reading the file on disk to a ready in-memory model |
| **Local Search Latency** | `14.6 ms` | P50 latency across 2,707 indexed code chunks |
| **Tool Search Accuracy** | `100%` | 15/15 strict functional tool-intent matches |
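
The card does not state exactly how the two fidelity figures were computed. One plausible reading, sketched below under that assumption, is a row-wise cosine similarity averaged over the vocabulary plus an element-wise mean squared error between the FP32 matrix and its dequantized counterpart. The `fidelity` helper and the toy matrices are hypothetical and are not part of the model's tooling.

```python
import numpy as np

def fidelity(reference, approx):
    """Return (mean row-wise cosine similarity, mean squared error)."""
    dot = np.sum(reference * approx, axis=1)
    norms = np.linalg.norm(reference, axis=1) * np.linalg.norm(approx, axis=1)
    cosine = float(np.mean(dot / norms))
    mse = float(np.mean((reference - approx) ** 2))
    return cosine, mse

# Toy stand-ins (assumption): a random "FP32" matrix and a noisy copy of it.
rng = np.random.default_rng(0)
reference = rng.normal(size=(100, 256)).astype(np.float32)
noise = rng.normal(scale=0.05, size=reference.shape).astype(np.float32)
approx = reference + noise

cosine, mse = fidelity(reference, approx)
print(f"cosine={cosine:.4f} mse={mse:.4f}")
```
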
### Architectural Efficiency Comparison
Why choose a quantized static embedding over a traditional Transformer-based bi-encoder architecture?
| Architectural Feature | Vortex-Embed-4.7M (Static) | BGE / BERT-Base (Transformer) |
| :--- | :--- | :--- |
| **Inference Latency** | **0.15 ms** | ~50.0 ms |
| **Cold Start Latency** | **144 ms** | ~5000 ms |
| **On-Disk Footprint** | **4.7 MB** | ~400+ MB |
| **Hardware Prerequisite** | **Commodity CPU** | Dedicated GPU Highly Recommended |
| **Domain Performance** | **Optimized for Code / Tools** | General Text Semantics |
---
## 🛠️ Architecture & Quantization Details
The model is a learned token-to-embedding lookup matrix stored with custom **LF4 per-block quantization**. A sentence is encoded by tokenizing it, looking up each token's row and dequantizing it on the fly, mean pooling the rows, and L2-normalizing the result.
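
The pooling and normalization steps can be sketched in a few lines of NumPy. This is a minimal illustration, not the model's own code: the `embed` helper and the tiny float table are hypothetical, and the table is assumed to be already dequantized, so tokenization and 4-bit unpacking are left out.

```python
import numpy as np

def embed(token_ids, table):
    """Mean-pool the table rows for one tokenized sentence, then L2-normalize."""
    rows = table[np.asarray(token_ids, dtype=np.intp)]  # row lookup: (n_tokens, dims)
    pooled = rows.mean(axis=0)                          # mean pooling over tokens
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled

# Hypothetical 5-token vocabulary with 4 dimensions, already dequantized.
table = np.arange(20, dtype=np.float32).reshape(5, 4)
vec = embed([0, 2, 3], table)
print(vec.shape, float(np.linalg.norm(vec)))
```

Because every output vector has unit length, the dot product of two embeddings equals their cosine similarity.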
### Structural Topology
```text
vocab_size = 29,528 | dimensions = 256 | bits = 4 | block_size = 32
```
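
These four parameters are enough to check the headline size and compression figures. The arithmetic below assumes float16 scales and zero-points, as the tensor layout lists them, and ignores the safetensors header and tokenizer files, so it approximates the weight payload only.

```python
# Storage arithmetic derived from the topology parameters above.
vocab_size, dims, bits, block_size = 29_528, 256, 4, 32

packed_cols = dims * bits // 8        # two 4-bit codes per byte -> bytes per row
blocks_per_row = dims // block_size   # one scale and one zero-point per block

packed_bytes = vocab_size * packed_cols
# Assumption: scales and zero-points are float16 (2 bytes each).
meta_bytes = 2 * vocab_size * blocks_per_row * 2
quantized_bytes = packed_bytes + meta_bytes
fp32_bytes = vocab_size * dims * 4

print(packed_cols, blocks_per_row)             # 128 8
print(quantized_bytes)                         # 4724480 (about 4.7 MB)
print(round(fp32_bytes / quantized_bytes, 1))  # 6.4
```
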
### Tensor Layout Matrix
The weights are stored in a standard `.safetensors` file as three named tensors:
| Tensor | Data Type | Shape | Description |
| --- | --- | --- | --- |
| `embedding_packed` | `uint8` | `(29528, 128)` | Packed 4-bit codes, two per byte (256 codes per row) |
| `embedding_scales` | `float16` | `(29528, 8)` | Per-block scale factor (8 blocks of 32 dimensions per row) |
| `embedding_zeros` | `float16` | `(29528, 8)` | Per-block zero-point offset (8 blocks of 32 dimensions per row) |
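
Given these shapes, inline dequantization could look roughly like the sketch below. Two details are assumptions that the card does not confirm: that the low nibble of each byte holds the first of its two codes, and that values are restored as `code * scale + zero`. The `dequantize_rows` helper is hypothetical and is run here on random toy tensors rather than the real weights; the actual LF4 format may differ on either point.

```python
import numpy as np

def dequantize_rows(packed, scales, zeros, block_size=32):
    """Unpack 4-bit codes and rescale them block by block.

    Assumed, not confirmed: low nibble first, and value = code * scale + zero.
    """
    low = packed & 0x0F    # first code in each byte (assumed order)
    high = packed >> 4     # second code in each byte
    codes = np.stack([low, high], axis=-1).reshape(packed.shape[0], -1)
    # Expand per-block parameters so each one covers block_size columns.
    scale = np.repeat(scales.astype(np.float32), block_size, axis=1)
    zero = np.repeat(zeros.astype(np.float32), block_size, axis=1)
    return codes.astype(np.float32) * scale + zero

# Toy tensors with the same per-row shapes as the table: 128 bytes, 8 blocks.
rng = np.random.default_rng(0)
packed = rng.integers(0, 256, size=(3, 128), dtype=np.uint8)
scales = np.full((3, 8), 0.5, dtype=np.float16)
zeros = np.full((3, 8), -4.0, dtype=np.float16)

rows = dequantize_rows(packed, scales, zeros)
print(rows.shape)  # (3, 256)
```
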
---
## Quickstart Installation & Usage
### Prerequisite Environment
```bash
pip install numpy safetensors tokenizers
```
### 1. Seamless Codebase Indexing (Via `vortexa`)
For turnkey directory indexing, search, and MCP support, use the official core engine:
```bash
pip install vortexa
```
```python
from vortexa.core.indexer import CodebaseIndexer
# Native integration: vortexa resolves and loads Vortex-Embed-4.7M out of the box
indexer = CodebaseIndexer(root='.')
stats = indexer.index()
# Execute high-speed vector retrieval across code chunks
results = indexer.search('find CSV parser or file tokenizer', top_k=5)
```
### 2. Standalone Low-Level Inference (No Torch Pipeline)
For custom applications or minimal CLI tools requiring zero framework overhead:
```python
from lf4_model import LF4StaticEmbedding
# Load the model by its repository ID
model = LF4StaticEmbedding.from_pretrained('VTXAI/Vortex-Embed-4.7M')
# Encode text directly into normalized NumPy arrays
doc_emb = model.encode(['search the web', 'read file'])
query_emb = model.encode(['open a text file'])
# Rank the documents against the query by similarity
scores, indices = model.search(query_emb, doc_emb, top_k=2)
```
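
For intuition about what such a search does, a brute-force top-k ranking over L2-normalized embeddings can be written in plain NumPy. The `top_k_search` function below is a hypothetical stand-in, not the library's `model.search`; it assumes both inputs are 2-D arrays with unit-length rows, so a dot product equals cosine similarity.

```python
import numpy as np

def top_k_search(query_emb, doc_emb, top_k=10):
    """Score every document against every query and keep the best top_k."""
    scores = query_emb @ doc_emb.T              # cosine, given unit-length rows
    k = min(top_k, doc_emb.shape[0])
    order = np.argsort(-scores, axis=1)[:, :k]  # best-first document indices
    return np.take_along_axis(scores, order, axis=1), order

docs = np.eye(3, dtype=np.float32)              # three orthogonal unit vectors
query = np.array([[0.0, 1.0, 0.0]], dtype=np.float32)
scores, indices = top_k_search(query, docs, top_k=2)
print(indices[0][0], scores[0][0])  # 1 1.0
```

Each query is compared against every document, so the cost grows linearly with the number of indexed vectors.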
### 3. Sentence-Transformers Framework Compatibility
If you prefer standard ML pipelines, the model can also be loaded through Sentence-Transformers using its static backend. Note that `sentence-transformers` itself depends on PyTorch:
```bash
pip install sentence-transformers
```
```python
from sentence_transformers import SentenceTransformer
# Load using the explicit static processing engine
model = SentenceTransformer('VTXAI/Vortex-Embed-4.7M', backend='static')
embeddings = model.encode(['search the web', 'read file'])
```
---
## Citation & Attributions
If you use this model or the `vortexa` engine in research, production, or industrial applications, please cite the repository with the following BibTeX entry:
```bibtex
@software{vortex-embed-4.7m,
title = {Vortex-Embed-4.7M: High-Performance 4-Bit Static Embedding Topology},
author = {VortexAI},
year = {2025},
url = {https://huggingface.co/VTXAI/Vortex-Embed-4.7M}
}
```