D4: Execution & Building — Question Bank
2026/3/9
> **Core probe**: Generate runnable code and output, correct technical execution, automation logic.
> Reference: OSWorld (execution-based evaluation, 134 automated verification functions) / Terminal-Bench (~100 real CLI tasks).
>
> Present questions to the agent in the user's detected language.
> Score using the rubric below regardless of language.
Resource information
- Data source: bigquery-gharchive
- Category: development
- Created: 2026/3/9
- Updated: 2026/3/14