ishidalab

university

https://takashiishida.github.io

Recent Activity

tksii authored a paper 1 day ago

Mitigating Reward Hacking in RLHF via Advantage Sign Robustness

tksii authored a paper 1 day ago

LLM Routing with Dueling Feedback

tksii authored a paper 1 day ago

Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests

Papers

Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests

How Can I Publish My LLM Benchmark Without Giving the True Answers Away?

tksii

authored 3 papers 1 day ago

Mitigating Reward Hacking in RLHF via Advantage Sign Robustness

Paper • 2604.02986 • Published Apr 3 • 2

LLM Routing with Dueling Feedback

Paper • 2510.00841 • Published Oct 1, 2025

Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests

Paper • 2606.07379 • Published 8 days ago • 5

skydddoogg

authored a paper 1 day ago

Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests

Paper • 2606.07379 • Published 8 days ago • 5

skydddoogg

in ishidalab/capcode 1 day ago

Add task category and license metadata

#2 opened 1 day ago by nielsr

skydddoogg

submitted a paper to Daily Papers 2 days ago

Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests

Paper • 2606.07379 • Published 8 days ago • 5

skydddoogg

updated a dataset 5 days ago

ishidalab/capcode

Viewer • Updated 1 day ago • 756 • 41

skydddoogg

published a dataset 5 days ago

ishidalab/capcode

Viewer • Updated 1 day ago • 756 • 41

tksii

updated a dataset 13 days ago

ishidalab/capbencher

Viewer • Updated 13 days ago • 15.5k • 103 • 2

skydddoogg

authored a paper 4 months ago

How Can I Publish My LLM Benchmark Without Giving the True Answers Away?

Paper • 2505.18102 • Published May 23, 2025 • 2

tksii

authored 2 papers 4 months ago

EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements

Paper • 2506.08762 • Published Jun 10, 2025

How Can I Publish My LLM Benchmark Without Giving the True Answers Away?

Paper • 2505.18102 • Published May 23, 2025 • 2

skydddoogg

updated a dataset 4 months ago

ishidalab/capbencher

Viewer • Updated 13 days ago • 15.5k • 103 • 2

tksii

published a dataset 4 months ago

ishidalab/capbencher

Viewer • Updated 13 days ago • 15.5k • 103 • 2

tksii

updated a dataset 4 months ago

ishidalab/capbencher

Viewer • Updated 13 days ago • 15.5k • 103 • 2

Team members 4
