| license: mit | |
| base_model: | |
| - stabilityai/stable-diffusion-xl-base-1.0 | |
| pipeline_tag: text-to-image | |
| # SarcasmDiffusion — SDXL Fused Meme Generator | |
| **Model type:** Stable Diffusion XL (Base 1.0) fine‑tuned via **LoRA** (merged/fused) to learn the *visual* style of sarcastic/ironic memes. | |
| **Author:** Ricardo Urdaneta (github.com/Ricardouchub) | |
| --- | |
| ## Overview | |
| SarcasmDiffusion is a diffusion-based generative model focused on producing **clean meme-style photographs** that are suitable for **caption overlays** (text is added *after* generation). The model was LoRA‑fine‑tuned on a filtered and enriched subset of the *Hateful Memes* dataset to capture stylistic cues of humorous/ironic memes while **avoiding offensive content**. | |
| - **Base:** `stabilityai/stable-diffusion-xl-base-1.0` | |
| - **Fine‑tuning:** LoRA on the **UNet** only; **VAE** and **text encoders** are frozen. | |
| - **Exported artifact:** **Fused SDXL** (no external LoRA required at inference). | |
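As a rough illustration of what "fused" means, the sketch below folds a low-rank update into a base weight matrix, so that only one matrix is needed at inference. It is a toy example in plain Python lists, not the export script used for this model. The names `fuse_lora`, `lora_down`, and `lora_up` are hypothetical, and the `alpha / rank` scaling is the common LoRA convention, assumed here rather than confirmed by this card.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def fuse_lora(weight, lora_down, lora_up, alpha):
    """Return weight + (alpha / rank) * (lora_up @ lora_down).

    weight:    out x in base matrix
    lora_down: rank x in matrix (often called A)
    lora_up:   out x rank matrix (often called B)
    """
    rank = len(lora_down)
    scale = alpha / rank  # assumed convention: update is scaled by alpha / rank
    delta = matmul(lora_up, lora_down)  # (out x rank) @ (rank x in) -> (out x in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy rank-1 example; alpha=2 gives the same scale (2.0) as alpha=16 with r=8.
base = [[1.0, 0.0], [0.0, 1.0]]
down = [[1.0, 2.0]]
up = [[0.5], [0.0]]
fused = fuse_lora(base, down, up, alpha=2)
print(fused)
```

After fusing, the low-rank factors can be discarded, which is why no external LoRA file is needed when loading this model.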
| > This model focuses on **style transfer for meme aesthetics** (composition, lighting, “stock-photo vibe”), *not* on rendering text inside images. Add titles/subtitles with your own overlay function or editor. | |
| --- | |
| ## Intended Use | |
| - Generating **meme-ready images** with space at the top/bottom for captions. | |
| - Creative exploration of humorous/ironic visual setups controlled by prompts. | |
| - Educational/portfolio use for **LoRA fine‑tuning workflows** with SDXL. | |
| ### Out of Scope / Limitations | |
| - **No text rendering inside the image** (explicitly discouraged via negative prompts). | |
| - May produce **stock-like** aesthetics by design. | |
| - Not suitable for generating or amplifying **harmful, hateful, or NSFW** content. | |
| - As with all text-to-image systems, prompts with ambiguous semantics can yield unpredictable outputs. | |
| --- | |
| ## Training Summary | |
| - **Base model:** SDXL Base 1.0 | |
| - **LoRA rank / alpha / dropout:** `r=8`, `alpha=16`, `dropout=0.05` | |
| - **Resolution:** 1024 (training); common inference at 768–896 for speed | |
| - **Batch:** 1 (gradient accumulation = 4) | |
| - **Steps:** ~9k (≈2 epochs on ~5k images) | |
| - **Learning Rate:** 0.0001 | |
| - **Precision:** fp16 (LoRA params kept in fp32 during training) | |
| - **Optimizer:** AdamW | |
| - **Scheduler:** cosine with warmup (recommended) | |
| - **Frozen:** VAE, text_encoder, text_encoder_2 | |
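The batch, accumulation, step, and epoch figures above can be related with a little arithmetic. The sketch below assumes that "steps" counts micro-batches (forward/backward passes) rather than optimizer updates. That reading is an assumption, chosen because it reproduces the "≈2 epochs" figure. The function name is hypothetical.

```python
def training_schedule(micro_steps, batch_size, grad_accum, num_images):
    """Relate step count, batch settings, and dataset size.

    Assumes `micro_steps` counts forward/backward passes, not optimizer updates.
    """
    samples_seen = micro_steps * batch_size
    return {
        "effective_batch": batch_size * grad_accum,    # samples per optimizer update
        "optimizer_updates": micro_steps // grad_accum,
        "epochs": samples_seen / num_images,
    }


schedule = training_schedule(micro_steps=9_000, batch_size=1, grad_accum=4, num_images=5_000)
print(schedule)
```

If the step count instead referred to optimizer updates, the same settings would correspond to roughly four times as many passes over the data.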
| ### Data | |
| - Source: *Hateful Memes* (Facebook AI). | |
| - We **excluded** samples labeled hateful and applied **NLP enrichment** to the rest: | |
| - Emotion scoring (GoEmotions distilled) and irony scoring (RoBERTa‑irony). | |
| - Heuristics and percentile thresholds on these scores assign each sample a tone: `humor / irony / neutral`. | |
| - Final training CSV: prompts balanced by tone; **negative prompts** to avoid text overlays, low‑quality artifacts, watermarks/logos, and unsafe content. | |
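The exact heuristics and percentile cutoffs are not given here, so the following is only a sketch of how percentile-based tone labeling could work. The field names `humor` and `irony`, the 75th-percentile cutoff, and the tie-breaking rule are all assumptions made for illustration.

```python
from statistics import quantiles


def percentile(values, pct):
    """Return the pct-th percentile (1 to 99) of values, using inclusive interpolation."""
    return quantiles(values, n=100, method="inclusive")[pct - 1]


def assign_tones(samples, pct=75):
    """Label each sample as 'humor', 'irony', or 'neutral' from its scores.

    A sample gets a tone when its score exceeds the dataset-wide percentile cutoff.
    If both scores exceed their cutoffs, the larger score wins (assumed rule).
    """
    humor_cut = percentile([s["humor"] for s in samples], pct)
    irony_cut = percentile([s["irony"] for s in samples], pct)
    tones = []
    for s in samples:
        is_humor = s["humor"] > humor_cut
        is_irony = s["irony"] > irony_cut
        if is_irony and (not is_humor or s["irony"] >= s["humor"]):
            tones.append("irony")
        elif is_humor:
            tones.append("humor")
        else:
            tones.append("neutral")
    return tones


# Made-up scores for four captions, for demonstration only.
scores = [
    {"humor": 0.9, "irony": 0.1},
    {"humor": 0.1, "irony": 0.8},
    {"humor": 0.2, "irony": 0.2},
    {"humor": 0.1, "irony": 0.1},
]
tones = assign_tones(scores)
print(tones)
```

Using dataset-wide percentiles rather than fixed thresholds keeps the tone classes from collapsing when the classifiers' raw scores are skewed.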
| > The dataset is **not** included here. Please obtain *Hateful Memes* under its original terms and reproduce the preprocessing if needed. | |
| --- | |
| ## Safety, Ethics & Mitigations | |
| - Samples labeled hateful were filtered out, and **negative prompts** are used to avoid NSFW content, hateful content, and text overlays. | |
| - Despite mitigations, **misuse is possible**. Users are responsible for **prompting responsibly** and complying with local laws and platform policies. | |
| - Do not use the model to create defamatory, harassing, discriminatory, or otherwise harmful imagery. | |
| **Known risks:** dataset biases may remain; aesthetic biases (stock-photo look); occasional failure to respect negative prompts. | |
| --- | |
| ## How to Use | |
| ```python | |
| import torch | |
| from diffusers import AutoPipelineForText2Image | |
| # Load the fused SDXL weights in half precision on the GPU. | |
| # On CPU, use torch_dtype=torch.float32 and .to("cpu") instead (much slower). | |
| pipe = AutoPipelineForText2Image.from_pretrained( | |
|     "Ricardouchub/SarcasmDiffusion", | |
|     torch_dtype=torch.float16, | |
| ).to("cuda") | |
| prompt = ( | |
|     "sarcastic meme about checking the fridge for the third time, " | |
|     "centered subject, plain background, high-contrast photo, stock photo style" | |
| ) | |
| negative = "nsfw, hate speech, slur, watermark, logo, low quality, blurry, busy background, text overlay" | |
| # A fixed seed makes the output reproducible. | |
| g = torch.Generator(device=pipe.device).manual_seed(123) | |
| image = pipe( | |
|     prompt, | |
|     negative_prompt=negative, | |
|     num_inference_steps=22, | |
|     guidance_scale=6.3, | |
|     width=896, | |
|     height=896, | |
|     generator=g, | |
| ).images[0] | |
| image.save("sample.png") | |
| ``` | |
| ### Prompting Tips | |
| - Add **layout hints**: “centered subject”, “plain background”, “space at top and bottom”. | |
| - Keep **negative prompts** to avoid logos/text/NSFW. | |
| - Use seeds for reproducibility; typical settings are `num_inference_steps` 18–28, `guidance_scale` 5.5–7.5, and `width`/`height` 768–1024. | |
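The tips above can be bundled into a small helper that builds the keyword arguments for a pipeline call. This is a sketch only. `build_call_kwargs`, `LAYOUT_HINTS`, and the "sarcastic meme about ..." template are hypothetical names and conventions taken from the usage example, and the divisible-by-8 check reflects the input validation Diffusers applies to SDXL image sizes.

```python
LAYOUT_HINTS = ("centered subject", "plain background", "space at top and bottom")
NEGATIVE_PROMPT = (
    "nsfw, hate speech, slur, watermark, logo, low quality, "
    "blurry, busy background, text overlay"
)


def build_call_kwargs(topic, steps=22, guidance=6.3, size=896):
    """Assemble keyword arguments for pipe(...) from a short topic description."""
    if size % 8 != 0:
        raise ValueError("width and height must be divisible by 8")
    return {
        "prompt": ", ".join((f"sarcastic meme about {topic}", *LAYOUT_HINTS)),
        "negative_prompt": NEGATIVE_PROMPT,
        "num_inference_steps": steps,
        "guidance_scale": guidance,
        "width": size,
        "height": size,
    }


kwargs = build_call_kwargs("checking the fridge for the third time")
print(kwargs["prompt"])
```

With a loaded pipeline and a seeded generator `g`, the call then becomes `pipe(**kwargs, generator=g).images[0]`.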
| --- | |
| ## Environment & Compatibility | |
| To ensure full compatibility when loading this model (fused SDXL with LoRA merged), use the following library versions: | |
| | Library | Recommended Version | Notes | | |
| |----------|--------------------|-------| | |
| | **Python** | 3.10 – 3.12 | Tested on Colab (Python 3.12) | | |
| | **PyTorch** | 2.6.0 + CUDA 12.4 | Any CUDA ≥ 12 works | | |
| | **diffusers** | **0.35.1** | Core inference & model loading | | |
| | **transformers** | **4.45.2** | Required for compatibility with the SDXL CLIP text encoders (`CLIPTextModel`) | |
| | **accelerate** | **1.10.1** | Device and fp16 inference management | | |
| | **huggingface_hub** | **0.23.5** | Compatible with diffusers 0.35.x | | |
| | **safetensors** | ≥ 0.4.5 | For secure model weights loading | | |
| **Install in Colab or local environment:** | |
| ```bash | |
| pip install "diffusers==0.35.1" "transformers==4.45.2" "accelerate==1.10.1" "huggingface_hub==0.23.5" safetensors | |
| ``` | |
| > **Important:** | |
| > Using newer versions (e.g., `transformers ≥ 4.56`) may break compatibility due to API changes in `CLIPTextModel` (`offload_state_dict` argument). | |
| > Always match the versions above for smooth loading. | |
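To confirm that an environment matches the pinned versions before loading the model, a quick check along these lines may help. It is a sketch using only the standard library. The `PINNED` mapping copies the versions from the install command above, and `check_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the pinned install command in this section.
PINNED = {
    "diffusers": "0.35.1",
    "transformers": "4.45.2",
    "accelerate": "1.10.1",
    "huggingface_hub": "0.23.5",
}


def check_versions(pinned=PINNED, get_version=version):
    """Return {package: installed_version_or_None} for every package that differs."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = installed
    return mismatches


print(check_versions() or "all pinned versions match")
```

An empty result means every pinned package is installed at the expected version. Any entry in the returned dictionary points at a package to reinstall.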
| --- | |
| ## License | |
| - **Code:** MIT | |
| - **Model weights:** follow the base model’s license (Stability AI / SDXL Base 1.0). | |
| - **Data:** Users must obtain *Hateful Memes* from its source and agree to its terms. | |
| > By using this model, you agree not to generate content that is illegal, harmful, or violates rights of others. | |
| --- | |
| ## Evaluation | |
| Qualitative assessment via fixed prompt sheets (humor/irony/neutral). Suggested automatic metrics for future work: CLIP‑score vs. caption, aesthetic predictors, and human preference studies. | |
| --- | |
| ## Acknowledgments | |
| - Stability AI — SDXL Base 1.0 | |
| - Hugging Face — Diffusers, Accelerate, PEFT | |
| - Facebook AI — Hateful Memes dataset |