caiovicentino1/Qwen3.5-9B-EOQ-Dynamic-BitPacked

@@ -8,61 +8,143 @@ tags:
   - dynamic
   - bitpacked
   - awq
 base_model: Qwen/Qwen3.5-9B
 license: apache-2.0
 ---
 # Qwen3.5-9B EOQ v2 Dynamic + AWQ BitPacked
-**4.93 GB download** with near-FP16 quality (PPL 6.80 vs 6.37).
-EOQ v2 combines three techniques:
 1. **Per-tensor mixed-bit** (Q3-Q6 by sensitivity)
 2. **AWQ pre-scaling** (activation-aware weight protection)
 3. **Bit-packing** (actual N-bit storage)
-## Verified Benchmark
-| Format | Download | PPL (WikiText-2) | Delta | Compression |
-|--------|----------|-------------------|-------|-------------|
-| FP16 | 17.9 GB | 6.37 | --- | 1.0x |
-| EOQ Q5 int8 | 9.1 GB | 7.09 | +0.72 | 2.0x |
-| EOQ Dynamic v1 (no AWQ) | 4.93 GB | 7.26 | +0.89 | 3.64x |
-| **EOQ v2 (Dynamic + AWQ)** | **4.93 GB** | **6.80** | **+0.43** | **3.64x** |
-- **AWQ halved the PPL delta**: +0.89 to +0.43 at identical size
-- **3.64x compression** with only +0.43 PPL degradation
-- **27 tok/s** on H100 (92% of FP16 speed)
-- GPU tested: RTX PRO 6000 Blackwell + NVIDIA H100 80GB
-### Bit Allocation
-| Tensor Type | Bits | Params | Share |
-|-------------|------|--------|-------|
-| MLP gate/up | Q3 | 3,221M | 36% |
-| MLP down | Q4 | 1,611M | 18% |
-| Attn Q/K/V + embed + other | Q5 | 2,567M | 29% |
-| Attn O + lm_head | Q6 | 1,554M | 17% |
-| Norms/biases/SSM | FP16 | 1M | 0% |
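
As a rough consistency check on the bit allocation table, the weighted average bit width and the implied weight payload can be computed from the listed parameter counts. This is a sketch and not the author's tooling: it assumes the counts are in millions of parameters, treats the FP16 row as 16 bits, and ignores quantization metadata such as scales, so the result should land somewhat below the published 4.93 GB.

```python
# Bit widths and parameter counts (in millions) copied from the table.
allocation = {
    "MLP gate/up": (3, 3221),
    "MLP down": (4, 1611),
    "Attn Q/K/V + embed + other": (5, 2567),
    "Attn O + lm_head": (6, 1554),
    "Norms/biases/SSM": (16, 1),
}

total_params_m = sum(params for _, params in allocation.values())
total_bits_m = sum(bits * params for bits, params in allocation.values())

# Weighted average bits per weight across all tensor types.
avg_bits = total_bits_m / total_params_m

# Weights-only payload in decimal GB (assumption: no scales or zero points counted).
payload_gb = total_bits_m * 1e6 / 8 / 1e9

print(f"total params: {total_params_m} M")
print(f"average bits per weight: {avg_bits:.2f}")
print(f"packed weight payload: {payload_gb:.2f} GB")
```

The difference between this estimate and the 4.93 GB download would be consistent with per-tensor or per-group metadata, though the card does not break that down.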
-### What is AWQ?
-AWQ (Activation-Aware Weight Quantization) pre-scales weight channels by activation importance before quantization. Channels that see large activations get scaled up, making them more robust to quantization noise. The optimal scaling factor is found via grid search over alpha in [0,1] where scales = importance^alpha.
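
The grid search described above can be sketched in NumPy. This is a minimal illustration, not the code used to produce this model: the importance measure (mean absolute activation per input channel), the round-to-nearest quantizer, and the mean-squared output error objective are all assumptions chosen for brevity, and the function names are hypothetical.

```python
import numpy as np

def quantize_rtn(w, bits):
    """Symmetric per-row round-to-nearest quantization, returned dequantized."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard against all-zero rows
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def awq_search(w, x, bits=4, n_grid=21):
    """Grid-search alpha in [0, 1] with scales = importance ** alpha.

    w: (out_features, in_features) weights, x: (samples, in_features) activations.
    Returns (best_alpha, best_error, errors_by_alpha).
    """
    importance = np.maximum(np.abs(x).mean(axis=0), 1e-8)  # per input channel
    reference = x @ w.T  # full-precision layer output
    errors = {}
    for alpha in np.linspace(0.0, 1.0, n_grid):
        scales = importance ** alpha
        # Scale channels up, quantize, then fold the scale back out.
        w_q = quantize_rtn(w * scales, bits) / scales
        errors[float(alpha)] = float(np.mean((reference - x @ w_q.T) ** 2))
    best_alpha = min(errors, key=errors.get)
    return best_alpha, errors[best_alpha], errors

# Synthetic data with a few high-magnitude activation channels.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 64))
x[:, :4] *= 20.0
w = rng.normal(size=(32, 64))

best_alpha, best_err, errors = awq_search(w, x, bits=3)
print(f"best alpha: {best_alpha:.2f}")
print(f"error without scaling (alpha=0): {errors[0.0]:.4f}")
print(f"error at best alpha: {best_err:.4f}")
```

Because alpha = 0 yields scales of 1 (plain quantization), the selected alpha can never do worse than the unscaled baseline on the calibration data. In a real deployment the inverse scale is typically folded into the preceding operation rather than applied to the dequantized weights, which is mathematically equivalent.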
 ## Usage
-```python
-from huggingface_hub import snapshot_download
-import sys
-local = snapshot_download("caiovicentino1/Qwen3.5-9B-EOQ-Dynamic-BitPacked")
-sys.path.insert(0, local)
-from eoq_loader import load_eoq_model
-model, tokenizer = load_eoq_model("caiovicentino1/Qwen3.5-9B-EOQ-Dynamic-BitPacked")
-inputs = tokenizer("Hello!", return_tensors="pt").to(model.device)
-output = model.generate(**inputs, max_new_tokens=100)
-print(tokenizer.decode(output[0], skip_special_tokens=True))
-```
 ## Links

   - dynamic
   - bitpacked
   - awq
+  - torchao
 base_model: Qwen/Qwen3.5-9B
 license: apache-2.0
 ---
 # Qwen3.5-9B EOQ v2 Dynamic + AWQ BitPacked
+**4.93 GB download** | **43 tok/s** | **6.3 GB VRAM** | **PPL 6.68**
+EOQ v2 combines four techniques for maximum compression with near-FP16 quality and speed:
 1. **Per-tensor mixed-bit** (Q3-Q6 by sensitivity)
 2. **AWQ pre-scaling** (activation-aware weight protection)
 3. **Bit-packing** (actual N-bit storage)
+4. **torchao inference** (optimized CUDA INT4 kernels)
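
To make the bit-packing item concrete, the sketch below packs N-bit unsigned codes densely into bytes and unpacks them again with NumPy. It is an illustration only: the card does not describe the on-disk layout, so the MSB-first ordering and the helper names `pack_bits` and `unpack_bits` are assumptions.

```python
import numpy as np

def pack_bits(values, bits):
    """Pack unsigned integers (each < 2**bits, bits <= 8) into a dense uint8 array."""
    values = np.asarray(values, dtype=np.uint8)
    if values.size and int(values.max()) >= (1 << bits):
        raise ValueError(f"values do not fit in {bits} bits")
    # Expand each value to 8 bits (MSB first) and keep only the low `bits` bits.
    bit_matrix = np.unpackbits(values[:, None], axis=1)[:, -bits:]
    # Concatenate all kept bits and repack; the final byte is zero-padded.
    return np.packbits(bit_matrix.ravel())

def unpack_bits(packed, bits, count):
    """Recover `count` values of `bits` bits each from a packed uint8 array."""
    flat = np.unpackbits(packed)[: count * bits].reshape(count, bits)
    place_values = 1 << np.arange(bits - 1, -1, -1)  # e.g. [4, 2, 1] for 3 bits
    return (flat * place_values).sum(axis=1).astype(np.uint8)

rng = np.random.default_rng(0)
codes = rng.integers(0, 8, size=1000, dtype=np.uint8)  # 3-bit codes

packed = pack_bits(codes, bits=3)
restored = unpack_bits(packed, bits=3, count=codes.size)

print(f"unpacked storage: {codes.nbytes} bytes")
print(f"packed storage:   {packed.nbytes} bytes")
print(f"round trip ok:    {np.array_equal(codes, restored)}")
```

Storing 3-bit codes one per byte would waste 5 of every 8 bits; dense packing is what lets the download size track the nominal bit width.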
+## Benchmark (RTX PRO 6000 Blackwell)
+### Compression (EOQ)
+| Format | Download | PPL (WikiText-2) | Delta |
+|--------|----------|-----|-------|
+| FP16 | 17.9 GB | 6.37 | --- |
+| **EOQ v2 BitPacked** | **4.93 GB** | **6.80** | **+0.43** |
+### Inference (torchao)
+| Method | tok/s | Speed vs FP16 | VRAM | VRAM saved |
+|--------|-------|---------|------|----------|
+| FP16 | 45.7 | 1.00x | 17.9 GB | --- |
+| **torchao INT4** | **43.3** | **0.95x** | **6.3 GB** | **65%** |
+| BitsAndBytes NF4 | 34.6 | 0.76x | 7.7 GB | 57% |
+### Full Pipeline
+**3.64x smaller download, 95% of FP16 speed, 65% less VRAM, PPL +0.43 from FP16.**
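
The headline ratios can be re-derived from the two benchmark tables. The snippet below is a simple arithmetic check using the rounded values as printed in this card; the dictionary keys are illustrative names, not part of any API.

```python
# Values copied from the Compression and Inference tables.
fp16 = {"download_gb": 17.9, "tok_s": 45.7, "vram_gb": 17.9}
eoq = {"download_gb": 4.93, "tok_s": 43.3, "vram_gb": 6.3}

compression = fp16["download_gb"] / eoq["download_gb"]
relative_speed = eoq["tok_s"] / fp16["tok_s"]
vram_saving = 1 - eoq["vram_gb"] / fp16["vram_gb"]

print(f"compression: {compression:.2f}x")
print(f"speed relative to FP16: {relative_speed:.0%}")
print(f"VRAM saving: {vram_saving:.0%}")
```

With these rounded inputs the compression ratio comes out at about 3.63x, so the published 3.64x presumably derives from unrounded file sizes.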
 ## Usage
 ## Links