This README focuses on GTA-Workflow, which targets realistic long-horizon tasks with open-ended deliverables. Compared with traditional benchmark-style evaluation, GTA-Workflow focuses more on ...