Longer inference time

by dittops - opened Apr 1, 2024

Discussion

dittops

Apr 1, 2024

Inference time seems higher than for a normal fp16 model. I was expecting better throughput, since that is supposed to be an advantage of 1-bit models.

Andriy

Apr 2, 2024

The advantage of 1-bit models is that they are 32 times smaller than a 32-bit model. Inference on a 1-bit model includes the overhead of dequantization.
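To make the trade-off concrete, here is a minimal pure-Python sketch of sign-only (1-bit) packing. It is only an illustration under stated assumptions, not this model's actual quantization scheme: the helper names (`pack_signs`, `dequantize`) and the single mean-absolute-value scale are hypothetical. It shows the 32x storage saving against fp32, and the extra unpacking step that has to run before the weights can be used in a dot product.

```python
import struct

def pack_signs(weights):
    """Pack the sign of each weight into one bit (1 = non-negative, 0 = negative)."""
    packed = bytearray((len(weights) + 7) // 8)
    for i, w in enumerate(weights):
        if w >= 0:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)

def dequantize(packed, n, scale):
    """Expand packed bits back to +scale / -scale floats.

    This is the extra work a 1-bit model pays at inference time
    when no native 1-bit matmul kernel is available.
    """
    return [scale if (packed[i // 8] >> (i % 8)) & 1 else -scale for i in range(n)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

weights = [0.5, -0.25, 0.75, -1.0, 0.1, -0.3, 0.9, -0.6]

# Assumption: one scale per tensor, taken as the mean absolute weight.
scale = sum(abs(w) for w in weights) / len(weights)

packed = pack_signs(weights)

# Storage comparison: 4 bytes per fp32 weight vs 1 bit per packed weight.
fp32_bytes = len(struct.pack(f"{len(weights)}f", *weights))
ratio = fp32_bytes / len(packed)

# Inference path: unpack first, then do the usual floating-point math.
x = [1.0] * len(weights)
y = dot(dequantize(packed, len(weights), scale), x)

print(f"fp32: {fp32_bytes} bytes, packed: {len(packed)} bytes, ratio: {ratio}")
```

The size ratio here ignores the stored scale and any layers kept in higher precision, so a real checkpoint would shrink by somewhat less. The point is that the `dequantize` call is pure overhead compared with an fp16 model, whose weights are already in a form the matmul can consume.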

dittops

Apr 2, 2024

However, the paper reports a significant improvement in both memory usage and throughput.
