Datasets

Dataset                             Tasks
MichaelY310/devopsgym                 728
pgcodellm/rebench-v2-test              20
factory-ai/legacy-bench                10
maxbittker/runebench                   32
scale-ai/swe-atlas-qna                124
webgen-bench/webgen-bench             101
cookbook/test                           1
scale-ai/swe-atlas-tw                  90
cais/swebenchpro                      731
futurehouse/bixbench                  205
quesma/compilebench                    15
bigcode/bigcodebench-hard-complete    145
LiteCoder/LiteCoder-rl                602
grafana/o11y-bench                     63
nvats/codeskills-bench                 23
gabeorlanski/slopcodebench             36
yanagiorigami/frontier-cs             172
sierra-research/tau3-bench            375
rexbench/rexbench                       2
minnesotanlp/aar                    1,400
cmu/refav                           1,500
adyen/dabstep                         450
xlang/ds-1000                       1,000
apple/mmau                          1,000
openthoughts/openthoughts-tblite      100
openai/mmmlu                          150
meta/mlgym-bench                       12
featurebench/featurebench-lite         30
featurebench/featurebench-modal       200
bigcode/humanevalfix                  164