D4: Execution & Building — Question Bank

2026/3/9

> **Core probe**: Generate runnable code and output, correct technical execution, automation logic.
> Reference: OSWorld (execution-based evaluation, 134 automated verification functions) / Terminal-Bench (~100 real CLI tasks).
>
> Present questions to the agent in the user's detected language.
> Score using the rubric below regardless of language.

Resource information

Data source: bigquery-gharchive
Category: development
Created: 2026/3/9
Updated: 2026/3/14
