Good luck!
Dmitrii Kostakov (kostakoff)
AI & ML interests: MLOps
Recent Activity
- updated a collection (favorite) about 9 hours ago
- liked a model (X-Omni/X-Omni-En) about 9 hours ago
- liked a model (p-e-w/Qwen3-8B-heretic) about 12 hours ago
- reacted to nicolay-r's post about 17 hours ago
I know this space is mostly for sharing work, but in this case I am open to work.
I know there are outstanding research labs and teams following this space.
I would genuinely love to find ways to contribute, learn from strong lab environments, and help shape ideas into working systems.
What I bring (https://nicolayr.com):
- Applied NLP & deployment of LLM-powered workflows with reasoning for IR (LangChain, LiteLLM)
- Architecture engineering: Transformers and/or backends (PyTorch, TensorFlow, flaxformer)
- End-to-end engineering: Frontend (JS, ReactJS) → Backend REST APIs (FastAPI) / Keycloak → Docker / NGINX → Cloud / MLOps
- Domain-specific experience in Healthcare (deploy & handle: DICOM-SR/SEG, NIfTI; databases: ORTHANC; frontend: OHIF / Cornerstone)
- Passion for open-source NLP tooling for handling data (https://github.com/nicolay-r)
Would be happy to connect or hear any relevant suggestions on finding a team.
The AGPL-3.0 license is a death sentence for software:
https://github.com/p-e-w/heretic?tab=AGPL-3.0-1-ov-file#readme
Teams and organizations will never use it.
Excellent set!
I'll probably have something similar, but in my next life.
posted an update 3 days ago
I found it very funny that the Hugging Face profile has a specific section where we can share our hardware.
It really brings back memories of the good old days when we used to flex our custom PC specs on enthusiast forums 20 years ago! That inspired me to fill out my own profile and share it here.
And this is my first set of GPUs that I am using to learn MLOps:
- RTX 3090 – the best one; unfortunately it doesn't support the latest FP8 and FP4, but it's still very powerful.
- Tesla V100 – performance is almost on par with the RTX 3090, just much older.
- Tesla P100 – old, and doesn't have tensor cores, but it can still handle small models.
- Radeon MI50 – old, similar to the P100, but uses ROCm instead of CUDA, which is actually a pretty good experience to set up.
- GTX 1080 Ti – mostly useless, no fast FP16 support.
- GTX 1660 – first generation of the Turing architecture, but mostly useless.
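For anyone wondering which of these features a given card exposes, here is a rough lookup keyed by CUDA compute capability. The table is my own simplified summary for illustration, not an official NVIDIA matrix; per-chip details vary (some low-end Turing chips, for example, trade tensor cores for dedicated FP16 units).

```python
# Simplified map from CUDA compute capability to the precision
# features discussed above. Illustrative only; per-chip details vary.
SUPPORT = {
    # (major, minor): (name, tensor_cores, fast_fp16, fp8)
    (6, 0): ("Pascal (P100)", False, True, False),   # fast FP16, no tensor cores
    (6, 1): ("Pascal (1080 Ti)", False, False, False),
    (7, 0): ("Volta (V100)", True, True, False),
    (7, 5): ("Turing", True, True, False),
    (8, 6): ("Ampere (RTX 3090)", True, True, False),
    (8, 9): ("Ada (RTX 40xx)", True, True, True),    # adds FP8
}

def describe(cc):
    """Summarize the features for a (major, minor) compute capability."""
    name, tensor_cores, fast_fp16, fp8 = SUPPORT[cc]
    feats = [label for label, present in
             [("tensor cores", tensor_cores),
              ("fast FP16", fast_fp16),
              ("FP8", fp8)] if present]
    return f"{name}: " + (", ".join(feats) if feats else "FP32 only")
```

On a CUDA machine you would get the lookup key from PyTorch's `torch.cuda.get_device_capability()`, which returns the `(major, minor)` tuple for the active GPU.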
posted an update 10 days ago
My home lab for AI models - llmlaba v1
After I began learning MLOps I realized that I needed some kind of home lab: there are a lot of GPUs that I need to learn how to set up and test.
So I spent some time researching which platform I could buy or build.
My requirements were:
- Limited budget
- Power supply of 1 kW or higher
- A few PCIe slots, to be able to install more than one GPU
- Zero maintenance cost: I don't want to spend a lot of time or money maintaining lab hardware, except for the GPUs
I chose the Intel Mac Pro 7.1:
- Acceptable prices on eBay
- Excellent cooling
- 1.4 kW power supply
- 7 PCIe slots
- Zero maintenance: I don't need to do anything with the Mac Pro hardware; it just works
- Classic UEFI boot loader
It requires a bit of OS preparation:
1. Install Ubuntu 24.04 (it works with the general PC ISO image)
2. Set up T2 drivers:
sudo apt install -y dkms linux-headers-$(uname -r) applesmc-t2 apple-bce lm-sensors
3. Install t2fanrd to manually manage fans (/etc/t2fand.conf): https://wiki.t2linux.org/guides/fan/
4. Fix the PCIe BAR: add pci=realloc to GRUB_CMDLINE_LINUX_DEFAULT so the Linux kernel properly initializes server GPUs without a Graphics Output Protocol
5. Install the NVIDIA GPU driver:
sudo apt install nvidia-driver-570
And it works!
I was able to run a server-grade Nvidia Tesla P100 (required a DIY air duct), and consumer Nvidia Titan X, Titan V, and GTX 1080 cards on the old Mac Pro 7.1 - even three in parallel.
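Step 4 above is the easiest one to get wrong, so for reference this is the kind of change it amounts to (standard Ubuntu paths; merge pci=realloc into whatever options your /etc/default/grub already sets):

```shell
# /etc/default/grub -- step 4: ask the kernel to reallocate PCIe BARs,
# so headless server GPUs without a Graphics Output Protocol come up.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=realloc"

# Then regenerate the boot configuration and reboot:
sudo update-grub
```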
posted an update 27 days ago
I created a list of models with permissive licenses (Apache-2.0, MIT, OpenRAIL) and raw FP16 weights.
LLM:
- Mistral 7B v1
- Falcon 7B
- GLM-4 9B
- OLMo 3 7B
- Yi 9B
- Qwen3 8B
- InternLM3 8B
- Phi-4
Multimodal LLM:
- Pixtral 12B
- Qwen3-VL-8B-Instruct
Picture generation:
- Stable Diffusion 1.5
- Stable Diffusion 2.0
- Stable Diffusion XL
Video generation:
- Wan 2.1 VACE (Diffusers)
TTS:
- Suno Bark
This can be very useful for those who are just starting their AI LLM journey in PyTorch, like me.
Suggestions in the comments are welcome.
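Restating the list as a small catalog makes it easy to filter by modality in code. The names below are taken from the post; mapping them to exact Hub repo ids (the strings you would pass to `from_pretrained`) is deliberately left to the reader.

```python
# The model list above as a dict keyed by modality. Names follow the
# post, not exact Hugging Face Hub repo ids.
CATALOG = {
    "llm": ["Mistral 7B v1", "Falcon 7B", "GLM-4 9B", "OLMo 3 7B",
            "Yi 9B", "Qwen3 8B", "InternLM3 8B", "Phi-4"],
    "multimodal": ["Pixtral 12B", "Qwen3-VL-8B-Instruct"],
    "image": ["Stable Diffusion 1.5", "Stable Diffusion 2.0",
              "Stable Diffusion XL"],
    "video": ["Wan 2.1 VACE (Diffusers)"],
    "tts": ["Suno Bark"],
}

def models(kind=None):
    """Return all model names, or only those of one modality."""
    if kind is not None:
        return list(CATALOG[kind])
    return [name for group in CATALOG.values() for name in group]
```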