Papers
arxiv:2604.03253

Self-Execution Simulation Improves Coding Models

Published on Mar 11
Submitted by Gallil Maimon on Apr 7
Authors:

Abstract

Code large language models can be trained to simulate program execution step-by-step, improving competitive programming performance through supervised fine-tuning and reinforcement learning with verifiable rewards.

AI-generated summary

A promising research direction in enabling LLMs to generate consistently correct code involves addressing their inability to properly estimate program execution, particularly for code they generate. In this work, we demonstrate that Code LLMs can be trained to simulate program execution in a step-by-step manner and that this capability can be leveraged to improve competitive programming performance. Our approach combines supervised fine-tuning on natural language execution traces, textual explanations grounded in true execution, with reinforcement learning using verifiable rewards. We introduce two complementary objectives: output prediction given code and inputs, and solving competitive programming tasks with either ground-truth or self-predicted execution feedback. These objectives enable models to perform self-verification over multiple candidate solutions, and iterative self-fixing by simulating test execution. Across multiple competitive programming benchmarks, our method yields consistent improvements over standard reasoning approaches. We further present ablations and analysis to elucidate the role of execution simulation and its limitations.

Community

Paper author Paper submitter

🚨New paper🚨 Self-Execution Simulation Improves Coding Models

Current reasoning CodeLMs reason in natural language before providing an answer to programming tasks.

We show that CodeLMs can be post-trained to explicitly simulate execution of tests in order to verify and fix their proposed solutions, leading to additional gains!

Self-Execution Simulation Improves Coding Models

This paper trains code LLMs to simulate program execution step-by-step, combining supervised fine-tuning on natural language execution traces with reinforcement learning using verifiable rewards. The resulting models can predict program outputs, self-verify candidate solutions against predicted execution behavior, and iteratively self-fix incorrect code — all without an external interpreter at inference time.

Key Idea

The core idea is to teach a language model to mentally execute code. Given source code and inputs, the model produces a step-by-step natural language trace of the execution and predicts the output. This execution simulation capability is trained via SFT on curated traces, then refined with RL where the reward is whether the predicted output matches the ground-truth result of actual execution.
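The verifiable-reward part of this recipe can be made concrete with a minimal sketch. The function below is illustrative, not the paper's implementation: it assumes a reward of 1.0 when the model's predicted output matches the result of actually running the program on the given input, and 0.0 otherwise (the paper's exact reward shaping and sandboxing may differ).

```python
import subprocess
import sys

def execution_reward(code: str, test_input: str, predicted_output: str) -> float:
    """Verifiable reward sketch: run the program for real and compare
    its output with the model's predicted output.

    Hypothetical helper for illustration; names and signature are
    assumptions, not the paper's API.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],  # execute the candidate program
        input=test_input,
        capture_output=True,
        text=True,
        timeout=5,
    )
    actual_output = result.stdout.strip()
    # Binary reward: exact match between prediction and ground truth
    return 1.0 if predicted_output.strip() == actual_output else 0.0
```

During RL training, this signal requires a real interpreter; at inference time the trained model replaces it with its own simulated execution.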

[Figure: Execution simulation]

Method / Approach

The trained execution simulator serves two downstream objectives. First, self-verification: given a competitive programming problem and multiple candidate solutions, the model simulates execution on test inputs and selects the candidate whose predicted outputs are most consistent. Second, self-fixing: when the simulated execution reveals an incorrect output, the model uses the trace to diagnose the bug and generate a corrected solution, iterating until the simulated output matches expectations.
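The self-verification objective can be sketched as a selection rule over candidates. The snippet below is an assumption-laden illustration: `simulate(code, inp)` stands in for the model's execution simulator, and "most consistent" is interpreted here as majority agreement across candidates' predicted outputs (the paper may use a different consistency criterion).

```python
from collections import Counter

def select_candidate(candidates, test_inputs, simulate):
    """Pick the candidate whose simulated outputs best agree with the
    per-input majority across all candidates.

    `simulate` is a stand-in for the model's learned execution
    simulator; this selection rule is a hypothetical sketch.
    """
    # One tuple of predicted outputs per candidate solution
    predictions = [
        tuple(simulate(code, inp) for inp in test_inputs)
        for code in candidates
    ]
    # Majority-vote output for each test input
    majority = tuple(
        Counter(p[i] for p in predictions).most_common(1)[0][0]
        for i in range(len(test_inputs))
    )
    # Score each candidate by agreement with the majority vector
    scores = [sum(a == b for a, b in zip(p, majority)) for p in predictions]
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]
```

The appeal of this scheme is that it needs no reference outputs: a buggy candidate tends to disagree with the consensus of its siblings.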

[Figure: Self-verification]

[Figure: Self-fixing]
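The self-fixing loop described above can likewise be sketched in a few lines. Both `simulate` (the model's execution simulator) and `repair` (the model regenerating code from a diagnosed failure) are hypothetical stand-ins here; the iteration cap and stopping condition are assumptions for illustration.

```python
def self_fix(problem, code, tests, simulate, repair, max_iters=3):
    """Iteratively repair `code` until its *simulated* outputs match
    the expected ones, or the iteration budget runs out.

    `simulate(code, inp)` and `repair(problem, code, failures)` are
    placeholders for model calls, not a real API.
    """
    for _ in range(max_iters):
        # Collect (input, expected, predicted) triples that disagree
        failures = [
            (inp, expected, got)
            for inp, expected in tests
            if (got := simulate(code, inp)) != expected
        ]
        if not failures:
            return code  # all simulated tests pass
        # Let the model use the failing traces to propose a fix
        code = repair(problem, code, failures)
    return code
```

Because the loop checks simulated rather than real execution, it can run without an interpreter at inference time, at the cost of trusting the simulator's predictions.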

Results

Self-execution simulation improves performance on competitive programming benchmarks through both better candidate selection (self-verification) and iterative repair (self-fixing). The approach demonstrates that internalizing execution semantics gives code models a powerful self-supervision signal that complements traditional generate-and-test workflows.


Get this paper in your agent:

hf papers read 2604.03253
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
