Open to Collab

Felix Friedrich

felfri

https://felifri.github.io/

AI & ML interests

Multimodal AI; Post-training/Inference; AI Safety; AI Alignment

Recent Activity

liked a model 2 days ago

black-forest-labs/FLUX.2-klein-base-4B

upvoted a collection 2 days ago

FLUX.2

liked a model 2 days ago

black-forest-labs/FLUX.2-klein-9b-kv

authored 3 papers 2 months ago

Who Evaluates AI's Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations

Paper • 2511.05613 • Published Nov 6, 2025

ActivationReasoning: Logical Reasoning in Latent Activation Spaces

Paper • 2510.18184 • Published Oct 21, 2025 • 1

Inference-time Physics Alignment of Video Generative Models with Latent World Models

Paper • 2601.10553 • Published Jan 15, 2026 • 12

authored a paper 6 months ago

Measuring and Guiding Monosemanticity

Paper • 2506.19382 • Published Jun 24, 2025 • 2

authored 16 papers 9 months ago

SLR: An Automated Synthesis Framework for Scalable Logical Reasoning

Paper • 2506.15787 • Published Jun 18, 2025 • 2

How to Train your Text-to-Image Model: Evaluating Design Choices for Synthetic Training Captions

Paper • 2506.16679 • Published Jun 20, 2025 • 1

Class Attribute Inference Attacks: Inferring Sensitive Class Information by Diffusion-Based Attribute Manipulations

Paper • 2303.09289 • Published Mar 16, 2023 • 2

MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation

Paper • 2305.15296 • Published May 24, 2023 • 1

Mitigating Inappropriateness in Image Generation: Can there be Value in Reflecting the World's Ugliness?

Paper • 2305.18398 • Published May 28, 2023 • 2

Interactively Providing Explanations for Transformer Language Models

Paper • 2110.02058 • Published Sep 2, 2021 • 1

Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You

Paper • 2401.16092 • Published Jan 29, 2024 • 1

A Typology for Exploring the Mitigation of Shortcut Behavior

Paper • 2203.03668 • Published Mar 4, 2022 • 1

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

Paper • 2404.00399 • Published Mar 30, 2024 • 42

ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming

Paper • 2404.08676 • Published Apr 6, 2024 • 3

LLavaGuard: VLM-based Safeguards for Vision Dataset Curation and Safety Assessment

Paper • 2406.05113 • Published Jun 7, 2024 • 3

SCAR: Sparse Conditioned Autoencoders for Concept Detection and Steering in LLMs

Paper • 2411.07122 • Published Nov 11, 2024 • 2
