Iterate on difficult problems

Difficulty: Advanced

Best for:

Problems where each iteration can be scored, but the best result usually takes many passes.
Tasks with visual or subjective outputs that need both deterministic checks and an LLM-as-a-judge score.
Long-running Codex sessions where you want progress tracked clearly instead of relying on context.

Starter prompt

I have a difficult task in this workspace and I want you to run it as an eval-driven improvement loop.

Before changing anything:

Read AGENTS.md.
Find the script or command that scores the current output.

Iteration loop:

Make one focused improvement at a time.
Re-run the eval command after each meaningful change.
Log the scores and what changed.
Inspect generated artifacts directly. If the output is visual, use view_image.
Keep going until both the overall score and the...
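
The "log the scores and what changed" step can be sketched as a small append-only log, so progress is tracked in a file rather than in the session's context. This is a minimal sketch under stated assumptions, not part of Codex: the function names, the JSON Lines file name `eval_log.jsonl`, the score keys `overall` and `judge`, and the sample values are all hypothetical. The stopping condition is deliberately left out, because the prompt above is truncated before it states one.

```python
import json
import tempfile
from pathlib import Path


def log_iteration(log_path, iteration, change, scores):
    """Append one record per eval run so progress survives a lost context."""
    record = {"iteration": iteration, "change": change, "scores": scores}
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def load_log(log_path):
    """Read back every logged record, oldest first."""
    path = Path(log_path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_iteration(records, key="overall"):
    """Return the record with the highest score for `key`, or None if there is none."""
    scored = [r for r in records if key in r["scores"]]
    return max(scored, key=lambda r: r["scores"][key], default=None)


if __name__ == "__main__":
    # Hypothetical file name and made-up scores, for illustration only.
    log_file = Path(tempfile.mkdtemp()) / "eval_log.jsonl"
    log_iteration(log_file, 1, "baseline", {"overall": 0.62, "judge": 0.55})
    log_iteration(log_file, 2, "tightened layout margins", {"overall": 0.71, "judge": 0.60})
    best = best_iteration(load_log(log_file))
    print(best["iteration"], best["change"])
```

Keeping the deterministic score and the LLM-as-a-judge score as separate keys in the same record makes it easy to see when one improves while the other regresses.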
