arxiv:2510.09462
Mikhail Terekhov
terekhov
AI & ML interests
Reinforcement Learning, Multi-objective Reinforcement Learning, RLHF
Recent Activity
liked a dataset about 22 hours ago
RoganInglis/control-tax
upvoted a paper 6 months ago
Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols
authored a paper 6 months ago
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders