Refact-Bench is a benchmarking tool designed to evaluate AI models on software engineering tasks using the SWE-Bench framework. It provides a standardized environment for testing model performance on real-world programming challenges extracted from GitHub issues.
Before installing Refact-Bench, ensure you have the following:
- Python 3.7 or higher
- Docker installed and running
- Git
- pip package manager
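A quick way to confirm these prerequisites are in place before you start (these checks are only a convenience, not part of Refact-Bench itself):

```bash
python3 --version    # should report 3.7 or higher
docker info          # fails if the Docker daemon is not running
git --version
pip --version
```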
First, install the required Python packages:
```bash
pip install -e .
```
This will install all dependencies listed in `setup.py`, including the `refact` package.
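As a quick sanity check that the editable install registered the packages (the exact distribution names may differ, so this simply filters the output of `pip list`):

```bash
pip list | grep -i refact
```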
Clone the Refact repository and build the necessary components. To reproduce SWE evaluation results, you need to use the following branches of `refact`:
- https://github.com/smallcloudai/refact/tree/swe-boosted-prompt for SWE-lite
- https://github.com/smallcloudai/refact/tree/swe-boosted-prompt-verified for SWE-verified
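To check out one of these branches at clone time, a sketch using the SWE-lite branch (swap in `swe-boosted-prompt-verified` for SWE-verified; otherwise clone as below and `git checkout` the branch afterwards):

```bash
git clone --branch swe-boosted-prompt https://github.com/smallcloudai/refact.git
```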
```bash
git clone https://github.com/smallcloudai/refact.git
pip install -e ./refact/refact-agent/engine/python_binding_and_cmdline
fakeide compile-static-lsp release
```
This step compiles the Language Server Protocol (LSP) implementation needed for code analysis.
```bash
cd ./refact/refact-agent/engine/
cargo build --release
mkdir -p ./python_binding_and_cmdline/refact/bin/
cp ./target/release/refact-lsp ./python_binding_and_cmdline/refact/bin/refact-lsp
```
This builds the Rust-based LSP binary and places it in the correct location for the Python package to use.
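To confirm the binary ended up where the Python package expects it (same path as in the copy command above, run from the engine directory):

```bash
ls -lh ./python_binding_and_cmdline/refact/bin/refact-lsp
```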
Create the Docker integration configuration directory:
```bash
mkdir -p ~/.config/refact/integrations.d/
```
Then set up the Docker integration configuration (this will overwrite any existing Docker integration config):
```bash
cat > ~/.config/refact/integrations.d/docker.yaml << 'EOF'
label: ref
docker_daemon_address: ''
docker_cli_path: docker
remote_docker: false
ssh_host: ''
ssh_user: root
ssh_port: '22'
ssh_identity_file: ''
available:
  on_your_laptop: true
  when_isolated: true
confirmation:
  ask_user: []
  deny: []
EOF
```
This configuration allows Refact-Bench to use Docker for creating isolated environments for each benchmark task.
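Before running benchmarks, it can help to confirm that Docker itself works and that the config file was written; a hedged sanity check, not a required step:

```bash
docker run --rm hello-world                        # Docker can pull and run containers
cat ~/.config/refact/integrations.d/docker.yaml    # the integration config is in place
```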
To run a benchmark task, use the `fakeide run` command. For example, to run the `swe-verified` tasks using the `claude-3-7-sonnet` model:
```bash
fakeide run --api-key <REFACT-API-KEY> --model claude-3-7-sonnet --docker tasks/swe/verified --experiment my-experiment
```
Replace `<REFACT-API-KEY>` with your Refact API key and `my-experiment` with a name to group your benchmark runs.
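The same pattern works for the other task sets shipped in this repository; for instance, a run against SWE-Bench Lite (only the task path and experiment name change, and the experiment name here is just an example):

```bash
fakeide run --api-key <REFACT-API-KEY> --model claude-3-7-sonnet --docker tasks/swe/lite --experiment my-lite-experiment
```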
To collect the results after running tasks:
```bash
fakeide collect --experiment my-experiment
```
The results of the benchmark will be stored in `./results/`.
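The exact layout of the collected output is not documented here, so the simplest way to see what a run produced is to list the directory:

```bash
ls -R ./results/
```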
If you want to test models on your self-hosted server, specify the `--address-url` parameter with your local address:
```bash
fakeide run --address-url http://localhost:8080 --api-key <API-KEY> --model refact/claude-3-7-sonnet --docker tasks/swe/verified
```
Note: Your server should be started on `0.0.0.0`. A common use case with a remote node is to forward the port over SSH:
```bash
ssh <node-name> -L 0.0.0.0:8008:0.0.0.0:8008
```
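Putting the two together, a sketch of a full self-hosted run. This assumes the server on the remote node listens on port 8008, matching the forwarding example above; adjust the port to whatever your server actually uses:

```bash
# keep the tunnel in the background (-N: no remote command, -f: fork after auth)
ssh -N -f <node-name> -L 0.0.0.0:8008:0.0.0.0:8008
# then point fakeide at the forwarded port
fakeide run --address-url http://localhost:8008 --api-key <API-KEY> --model refact/claude-3-7-sonnet --docker tasks/swe/verified
```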
The `translation.py` script in the `tasks/swe` directory is used to prepare SWE-Bench tasks for evaluation. It converts SWE-Bench datasets into the format required by Refact-Bench:
```bash
cd tasks/swe
python translation.py
```
This script processes the SWE-Bench datasets (Lite, Lite-dev, and Verified) and generates the necessary task files in the respective directories.
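After the script finishes, the generated task files should appear under the dataset directories; a quick spot-check, run from the `tasks/swe` directory as in the commands above:

```bash
ls verified | head
ls lite | head
ls lite-dev | head
```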
The main components of Refact-Bench are:
- `refact_scenarios/`: Core Python package with the implementation of the benchmarking framework
- `tasks/`: Contains the benchmark tasks
  - `swe/`: SWE-Bench related tasks
    - `verified/`: Verified SWE-Bench tasks
    - `lite/`: SWE-Bench Lite tasks
    - `lite-dev/`: Development subset of SWE-Bench Lite
    - `translation.py`: Script to prepare SWE-Bench tasks
- `fakeide-logs/`: Contains logs from benchmark runs
If a run fails or produces unexpected results, check the logs in the `fakeide-logs/` directory.
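The log file naming is not documented here, so a generic way to find the most recent run's logs is to sort by modification time:

```bash
ls -lt fakeide-logs/ | head
```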