caiovicentino1 committed on
Commit 471802c · verified · 1 Parent(s): c69fd67

Add torchao inference results: 43 tok/s, 6.3 GB VRAM, 95% FP16 speed

Files changed (1)
  1. README.md +117 -35
README.md CHANGED
@@ -8,61 +8,143 @@ tags:
  - dynamic
  - bitpacked
  - awq
  base_model: Qwen/Qwen3.5-9B
  license: apache-2.0
  ---

  # Qwen3.5-9B EOQ v2 Dynamic + AWQ BitPacked

- **4.93 GB download** with near-FP16 quality (PPL 6.80 vs 6.37).

- EOQ v2 combines three techniques:
  1. **Per-tensor mixed-bit** (Q3-Q6 by sensitivity)
  2. **AWQ pre-scaling** (activation-aware weight protection)
  3. **Bit-packing** (actual N-bit storage)

- ## Verified Benchmark

- | Format | Download | PPL (WikiText-2) | Delta | Compression |
- |--------|----------|-------------------|-------|-------------|
- | FP16 | 17.9 GB | 6.37 | --- | 1.0x |
- | EOQ Q5 int8 | 9.1 GB | 7.09 | +0.72 | 2.0x |
- | EOQ Dynamic v1 (no AWQ) | 4.93 GB | 7.26 | +0.89 | 3.64x |
- | **EOQ v2 (Dynamic + AWQ)** | **4.93 GB** | **6.80** | **+0.43** | **3.64x** |

- - **AWQ halved the PPL delta**: +0.89 to +0.43 at identical size
- - **3.64x compression** with only +0.43 PPL degradation
- - **27 tok/s** on H100 (92% of FP16 speed)
- - GPU tested: RTX PRO 6000 Blackwell + NVIDIA H100 80GB

- ### Bit Allocation

- | Tensor Type | Bits | Params | Share |
- |-------------|------|--------|-------|
- | MLP gate/up | Q3 | 3,221M | 36% |
- | MLP down | Q4 | 1,611M | 18% |
- | Attn Q/K/V + embed + other | Q5 | 2,567M | 29% |
- | Attn O + lm_head | Q6 | 1,554M | 17% |
- | Norms/biases/SSM | FP16 | 1M | 0% |
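
As a rough cross-check of the Bit Allocation table, the bit widths and parameter counts can be turned into an average bit width and an approximate packed size. The sketch below only redoes that arithmetic. It assumes decimal gigabytes and ignores quantization scales and file metadata, which are assumptions of this example rather than details stated in the README.

```python
# Bit widths and parameter counts (in millions) copied from the Bit Allocation table.
ALLOCATION = [
    ("MLP gate/up", 3, 3221),
    ("MLP down", 4, 1611),
    ("Attn Q/K/V + embed + other", 5, 2567),
    ("Attn O + lm_head", 6, 1554),
    ("Norms/biases/SSM", 16, 1),
]


def summarize(allocation):
    """Return (total params in millions, average bits per weight, size in GB)."""
    total_params_m = sum(params for _, _, params in allocation)
    total_mbits = sum(bits * params for _, bits, params in allocation)
    avg_bits = total_mbits / total_params_m
    size_gb = total_mbits / 8 / 1000  # megabits -> megabytes -> decimal GB
    return total_params_m, avg_bits, size_gb


total_params_m, avg_bits, size_gb = summarize(ALLOCATION)
print(f"{total_params_m}M params, {avg_bits:.2f} bits/weight, {size_gb:.2f} GB packed")
```

Under these assumptions the packed weights come out somewhat below the 4.93 GB download. The remainder is presumably scales and other stored metadata, though the README does not break this down.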

- ### What is AWQ?

- AWQ (Activation-Aware Weight Quantization) pre-scales weight channels by activation importance before quantization. Channels that see large activations get scaled up, making them more robust to quantization noise. The optimal scaling factor is found via grid search over alpha in [0,1] where scales = importance^alpha.
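
The grid search described above can be illustrated with a small pure-Python toy. This is a minimal sketch, not the EOQ implementation: the helper names (`quantize_row`, `output_error`, `awq_search`), the symmetric round-to-nearest quantizer, and the use of mean absolute activation as the importance measure are all assumptions made for illustration. Dividing the quantized weights by the scales stands in for folding the inverse scale into the layer input.

```python
import random


def quantize_row(row, bits):
    """Symmetric round-to-nearest quantization of one row, returned dequantized."""
    qmax = 2 ** (bits - 1) - 1
    step = (max(abs(w) for w in row) / qmax) or 1.0  # avoid a zero step size
    return [max(-qmax, min(qmax, round(w / step))) * step for w in row]


def output_error(weights, quantized, activations):
    """Sum of squared differences between full-precision and quantized outputs."""
    err = 0.0
    for x in activations:
        for w_row, q_row in zip(weights, quantized):
            y = sum(w * xi for w, xi in zip(w_row, x))
            y_q = sum(q * xi for q, xi in zip(q_row, x))
            err += (y - y_q) ** 2
    return err


def awq_search(weights, activations, bits=3, grid=11):
    """Grid-search alpha in [0, 1] with scales = importance ** alpha."""
    n_in = len(weights[0])
    # Per-input-channel importance: mean absolute activation (clamped above zero).
    importance = [
        max(sum(abs(x[j]) for x in activations) / len(activations), 1e-8)
        for j in range(n_in)
    ]
    best_alpha, best_err = None, None
    for k in range(grid):
        alpha = k / (grid - 1)
        scales = [imp ** alpha for imp in importance]
        # Scale channels up, quantize, then undo the scaling.
        quantized = []
        for row in weights:
            scaled = quantize_row([w * s for w, s in zip(row, scales)], bits)
            quantized.append([q / s for q, s in zip(scaled, scales)])
        err = output_error(weights, quantized, activations)
        if best_err is None or err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha, best_err


random.seed(0)
weights = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
# Input channel 0 sees much larger activations than the others.
activations = [[random.gauss(0, 10 if j == 0 else 1) for j in range(8)] for _ in range(32)]

baseline = output_error(weights, [quantize_row(r, 3) for r in weights], activations)
alpha, err = awq_search(weights, activations)
print(f"best alpha={alpha:.1f}, error={err:.3f} (plain rounding: {baseline:.3f})")
```

Because alpha = 0 gives all scales equal to 1, plain rounding is always one of the candidates, so the search can only match or improve on it for the calibration data used.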
 
 

  ## Usage

- ```python
- from huggingface_hub import snapshot_download
- import sys
- local = snapshot_download("caiovicentino1/Qwen3.5-9B-EOQ-Dynamic-BitPacked")
- sys.path.insert(0, local)
- from eoq_loader import load_eoq_model
- model, tokenizer = load_eoq_model("caiovicentino1/Qwen3.5-9B-EOQ-Dynamic-BitPacked")
-
- inputs = tokenizer("Hello!", return_tensors="pt").to(model.device)
- output = model.generate(**inputs, max_new_tokens=100)
- print(tokenizer.decode(output[0], skip_special_tokens=True))
- ```

  ## Links

  - dynamic
  - bitpacked
  - awq
+ - torchao
  base_model: Qwen/Qwen3.5-9B
  license: apache-2.0
  ---

  # Qwen3.5-9B EOQ v2 Dynamic + AWQ BitPacked

+ **4.93 GB download** | **43 tok/s** | **6.3 GB VRAM** | **PPL 6.68**

+ EOQ v2 combines four techniques for maximum compression with near-FP16 quality and speed:
  1. **Per-tensor mixed-bit** (Q3-Q6 by sensitivity)
  2. **AWQ pre-scaling** (activation-aware weight protection)
  3. **Bit-packing** (actual N-bit storage)
+ 4. **torchao inference** (optimized CUDA INT4 kernels)
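
To make the bit-packing item in the list above concrete, here is a minimal sketch of storing N-bit codes without padding each one to a full byte. The function names `pack_bits` and `unpack_bits` and the little-endian bit order are assumptions of this example. The actual on-disk layout used by the EOQ files is not described in this README.

```python
def pack_bits(values, bits):
    """Pack unsigned integers of width `bits` into bytes, lowest bits first."""
    buffer = 0
    for i, value in enumerate(values):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
        buffer |= value << (i * bits)
    n_bytes = (len(values) * bits + 7) // 8  # round up to whole bytes
    return buffer.to_bytes(n_bytes, "little")


def unpack_bits(data, bits, count):
    """Recover `count` unsigned integers of width `bits` from packed bytes."""
    buffer = int.from_bytes(data, "little")
    mask = (1 << bits) - 1
    return [(buffer >> (i * bits)) & mask for i in range(count)]


codes = [5, 0, 7, 3, 1, 6, 2, 4]  # eight 3-bit quantization codes
packed = pack_bits(codes, bits=3)
restored = unpack_bits(packed, bits=3, count=len(codes))
print(len(codes), "codes ->", len(packed), "bytes; round trip ok:", restored == codes)
```

Eight 3-bit codes occupy 3 bytes here instead of the 8 bytes an int8 container would use, which is the kind of saving "actual N-bit storage" refers to.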

+ ## Benchmark (RTX PRO 6000 Blackwell)

+ ### Compression (EOQ)

+ | Format | Download | PPL | Delta |
+ |--------|----------|-----|-------|
+ | FP16 | 17.9 GB | 6.37 | --- |
+ | **EOQ v2 BitPacked** | **4.93 GB** | **6.80** | **+0.43** |

+ ### Inference (torchao)

+ | Method | tok/s | Speedup | VRAM | VRAM saved |
+ |--------|-------|---------|------|------------|
+ | FP16 | 45.7 | 1.00x | 17.9 GB | --- |
+ | **torchao INT4** | **43.3** | **0.95x** | **6.3 GB** | **65%** |
+ | BitsAndBytes NF4 | 34.6 | 0.76x | 7.7 GB | 57% |
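
The speedup and savings columns in the inference table appear to be derived from the FP16 row. The sketch below recomputes them from the measured tok/s and VRAM figures. It assumes the savings column is relative to the 17.9 GB FP16 VRAM figure, which matches the published percentages but is not stated explicitly.

```python
# Measured values copied from the inference table: (tokens per second, VRAM in GB).
FP16_TOK_S, FP16_VRAM_GB = 45.7, 17.9
MEASURED = {
    "torchao INT4": (43.3, 6.3),
    "BitsAndBytes NF4": (34.6, 7.7),
}


def derived(tok_s, vram_gb):
    """Return (speed relative to FP16, fraction of VRAM saved versus FP16)."""
    speedup = tok_s / FP16_TOK_S
    vram_saving = 1 - vram_gb / FP16_VRAM_GB
    return speedup, vram_saving


results = {name: derived(*values) for name, values in MEASURED.items()}
for name, (speedup, saving) in results.items():
    print(f"{name}: {speedup:.2f}x FP16 speed, {saving:.0%} less VRAM")
```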
 
 

+ ### Full Pipeline

+
+
+ **3.64x smaller download, 95% speed, 65% less VRAM, PPL +0.43 from FP16.**

  ## Usage

 
  ## Links