SDXL TensorRT-RTX BF16 Ampere
TensorRT-RTX optimized engines for Stable Diffusion XL with BF16 precision, built for NVIDIA Ampere GPUs of compute capability 8.6 (RTX 30 series).
Model Details
- Base Model: stabilityai/stable-diffusion-xl-base-1.0
- Architecture: AMPERE (Compute Capability 8.6)
- Precision: BF16 (16-bit brain floating point)
- TensorRT-RTX Version: 1.0.0.21
- Image Resolution: 1024x1024
- Batch Size: 1 (static)
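To make the BF16 entry concrete: bfloat16 keeps the sign bit and 8 exponent bits of float32 but only 7 mantissa bits, so a BF16 value is the upper half of a float32 word. The following is a minimal sketch in plain Python, not code from this repository; the helper name `to_bf16` is illustrative, and NaN/infinity handling is left out.

```python
import struct

def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16-representable value (ties to even).

    Illustrative helper only; NaN and infinity are not handled.
    """
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add a rounding bias so that dropping the low 16 bits rounds to nearest even.
    bits += 0x7FFF + ((bits >> 16) & 1)
    # Keep sign, exponent, and the top 7 mantissa bits.
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(to_bf16(1.0))      # 1.0
print(to_bf16(3.14159))  # 3.140625
```

The second result shows the precision cost: with 7 mantissa bits, values between 2 and 4 are spaced 1/64 apart, while the exponent range stays the same as float32.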
Engine Files
This repository contains 4 TensorRT engine files:
- clip.trt1.0.0.21.plan: CLIP text encoder
- clip2.trt1.0.0.21.plan: CLIP text encoder 2
- unetxl.trt1.0.0.21.plan: U-Net XL diffusion model
- vae.trt1.0.0.21.plan: VAE decoder
Total Size: 6.5GB
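Before loading the engines, it can help to confirm that all four files are present after download. The sketch below uses only the standard library and the file names listed above; the function names and the `path/to/engines` placeholder are assumptions for illustration.

```python
from pathlib import Path

# Engine file names as listed in this card.
EXPECTED_ENGINES = (
    "clip.trt1.0.0.21.plan",
    "clip2.trt1.0.0.21.plan",
    "unetxl.trt1.0.0.21.plan",
    "vae.trt1.0.0.21.plan",
)

def missing_engines(engine_dir):
    """Return the expected engine file names that are absent from engine_dir."""
    root = Path(engine_dir)
    return [name for name in EXPECTED_ENGINES if not (root / name).is_file()]

def total_size_gb(engine_dir):
    """Sum the sizes of the expected engine files that exist, in GB (10**9 bytes)."""
    root = Path(engine_dir)
    sizes = [(root / n).stat().st_size for n in EXPECTED_ENGINES if (root / n).is_file()]
    return sum(sizes) / 1e9

if __name__ == "__main__":
    engine_dir = "path/to/engines"  # placeholder directory
    print("missing:", missing_engines(engine_dir))
    print("size on disk (GB):", round(total_size_gb(engine_dir), 2))
```

An empty "missing" list and a size close to the total stated above suggest the download is complete; this does not validate the contents of the engines themselves.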
Hardware Requirements
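- NVIDIA RTX 30 series (RTX 3060, 3070, 3080, 3090). The A100 is also an Ampere GPU, but it reports compute capability 8.0, so it does not match the 8.6 target of these engines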
- Compute Capability 8.6
- 12GB VRAM or more recommended
- TensorRT-RTX 1.0.0.21 runtime
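The compute capability and VRAM requirements above can be checked from a script. This is a sketch under assumptions: it relies on `nvidia-smi` being on the PATH and on a driver recent enough to expose the `compute_cap` query field, and the function names are illustrative. It does not check the TensorRT-RTX runtime version.

```python
import subprocess

REQUIRED_CC = "8.6"                # compute capability stated in this card
RECOMMENDED_VRAM_MIB = 12 * 1024   # 12GB recommendation from this card

def parse_gpu_line(line):
    """Parse one 'name, compute_cap, memory.total' CSV row from nvidia-smi."""
    name, cc, mem = [field.strip() for field in line.rsplit(",", 2)]
    return name, cc, int(mem)

def check_gpu(line):
    """Return a list of problems for one GPU row; an empty list means it matches."""
    name, cc, mem_mib = parse_gpu_line(line)
    problems = []
    if cc != REQUIRED_CC:
        problems.append(f"{name}: compute capability {cc}, engines target {REQUIRED_CC}")
    if mem_mib < RECOMMENDED_VRAM_MIB:
        problems.append(f"{name}: {mem_mib} MiB VRAM, below the recommended 12GB")
    return problems

def query_gpus():
    """Return one CSV row per GPU; needs the NVIDIA driver and nvidia-smi installed."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return [row for row in result.stdout.splitlines() if row.strip()]
```

On a machine with an NVIDIA driver, `for row in query_gpus(): print(check_gpu(row))` prints an empty list for each GPU that meets both requirements.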
Usage
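# Example usage with the TensorRT-RTX backend.
# TensorRTRTXBackend is imported from the imageai_server package.
from imageai_server.shared.tensorrt_rtx_backend import TensorRTRTXBackend

backend = TensorRTRTXBackend()
# Directory holding the .plan engine files from this repository.
backend.load_engines("path/to/engines")
image = backend.generate("A beautiful sunset over mountains")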
Performance
- Inference Speed: ~2-3 seconds per image (RTX 3090)
- Memory Usage: ~6-8GB VRAM
- Optimizations: Static shapes, BF16 precision, Ampere-specific kernels
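To reproduce a seconds-per-image figure on your own hardware, a small timing harness is enough. This is a generic sketch, not the method used to obtain the numbers above; `benchmark` is an illustrative name, and it assumes the callable returns only once the image is fully generated.

```python
import statistics
import time

def benchmark(generate, prompt, warmup=1, runs=5):
    """Time generate(prompt) and return (median_seconds, all_timings)."""
    # Warm-up calls absorb one-time costs such as lazy initialization; they are not timed.
    for _ in range(warmup):
        generate(prompt)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(prompt)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings), timings
```

With the backend object from the Usage section, `benchmark(backend.generate, "A beautiful sunset over mountains")` would return the median latency and the individual timings.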
License
This model is released under the same license as the base SDXL model (CreativeML Open RAIL++-M).
Built With
- TensorRT-RTX 1.0.0.21
- NVIDIA Diffusion Demo
- Built on NVIDIA GeForce RTX 3090 (Ampere 8.6)