Embedl

Team

company

https://www.embedl.com

embedl

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

JonnaMat updated a model about 12 hours ago

embedl/Cosmos-Reason2-2B-W4A16-Edge2

JonnaMat updated a Space about 14 hours ago

embedl/Edge-Inference-Benchmarks

JonnaMat new activity about 14 hours ago

embedl/Edge-Inference-Benchmarks:Demo

View all activity

JonnaMat

updated a model about 12 hours ago

embedl/Cosmos-Reason2-2B-W4A16-Edge2

Image-Text-to-Text • 2B • Updated about 12 hours ago • 13k • 11

JonnaMat

updated a Space about 14 hours ago

Edge Inference Benchmarks

🚀

On-Device benchmarks across devices and models.

JonnaMat

in embedl/Edge-Inference-Benchmarks about 14 hours ago

Demo

#5 opened 3 days ago by

JonnaMat

updated a dataset about 14 hours ago

embedl/documentation-images

Viewer • Updated about 14 hours ago • 7 • 3.67k

JonnaMat

updated 3 collections about 18 hours ago

posted an update 2 days ago

Post

5851

🚀 FlashHead: Efficient Drop-In Replacement for the Classification Head in Language Model Inference

🔎 Check out our latest FlashHead-enabled model: embedl/Cosmos-Reason2-2B-W4A16-Edge2-FlashHead

🧩 Seamless integration with vllm:

docker run --rm -it \
  --network host \
  --shm-size=8g \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  --runtime=nvidia \
  --name=vllm-serve \
  -e HF_TOKEN=hf_*** \
  -e HF_HOME=/root/.cache/huggingface \
  embedl/vllm:latest-jetson-orin-flashhead \
  vllm serve "embedl/Cosmos-Reason2-2B-W4A16-Edge2-FlashHead" \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.75 \
    --max-num-seqs 2 \
    --trust-remote-code

1 reply

JonnaMat

in embedl/Edge-Inference-Benchmarks 3 days ago

Add accuracy data

#4 opened 9 days ago by

JonnaMat

published a model 3 days ago

embedl/Cosmos-Reason2-2B-W4A16-Edge2-FlashHead

Image-Text-to-Text • 2B • Updated 3 days ago • 749 • 6

AI & ML interests

Recent Activity

Team members 6

embedl's activity

Edge Inference Benchmarks

Demo

Add accuracy data