by agentrebench · Agent Tool · ★ 68
AgentRE-Bench

A benchmark for evaluating LLM agents on long-horizon reverse engineering tasks with deterministic scoring.

Platform: Linux/Unix (ELF x86-64). Windows PE support planned for a future release.

AgentRE-Bench gives an LLM agent a compiled ELF binary and a set of Linux static analysis tools (strings, objdump, readelf, etc.), then measures how well it can identify C2 infrastructure, encoding schemes, anti-analysis techniques, and communication protocols, all without human guidance.

Why This Benchmark?

Why Synthetic?

All 13 binaries are compiled from purpose-built C sources with known ground truths. This gives us:

- Deterministic judging: every field has an exact expected answer, no ambiguity
- Controlled difficulty progression: from plaintext TCP shells (level 1) to metamorphic droppers with RC4 encryption (level 13)
- Reproducibility: anyone can compile identical binaries and verify scores

Real malware would require subjective expert judgment and introduce licensing, ethics, and reproducibility issues. Synthetic samples eliminate all of that while testing the same analytical capabilities.

Why Agentic?

Traditional RE benchmarks ask a model a question and check the answer.
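To make the "deterministic judging" idea above concrete, here is a minimal sketch of field-by-field exact-match scoring against a known ground truth. It is not AgentRE-Bench's actual scorer: the function name `score_submission`, the field names (`c2_host`, `c2_port`, `encoding`), the case-insensitive normalization, and the equal weighting of fields are all assumptions made for illustration.

```python
from typing import Mapping


def score_submission(expected: Mapping[str, str], submitted: Mapping[str, str]) -> float:
    """Return the fraction of ground-truth fields that the submission matches exactly.

    Hypothetical helper: the real benchmark may weight fields or normalize differently.
    """
    if not expected:
        raise ValueError("ground truth must contain at least one field")

    def norm(value: object) -> str:
        # Assumed normalization: ignore surrounding whitespace and letter case.
        return str(value).strip().lower()

    hits = sum(
        1
        for field, truth in expected.items()
        if field in submitted and norm(submitted[field]) == norm(truth)
    )
    return hits / len(expected)


# Hypothetical ground truth for one sample and one agent answer.
ground_truth = {"c2_host": "10.0.0.5", "c2_port": "4444", "encoding": "none"}
agent_answer = {"c2_host": "10.0.0.5", "c2_port": "4444", "encoding": "xor"}

score = score_submission(ground_truth, agent_answer)
print(f"{score:.2f}")  # prints 0.67 (two of three fields match)
```

Because every field has one expected value, the same submission always yields the same score, which is the property the README attributes to synthetic samples.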
| Stars | 68 |
| Forks | 7 |
| Language | Python |
| Category | Agent Tool |
| License | MIT |
| Quality Score | 71.7/100 |
| Open Issues | 1 |
| Last Updated | 2026-07-01 |
| Created | 2026-02-12 |
| Platforms | python |
| Est. Tokens | ~16k |
AgentRE-Bench is an agentic benchmark that evaluates state-of-the-art models on long-horizon reverse engineering tasks, measuring their ability to analyze binaries, use tooling effectively, and reason. It is categorized as an Agent Tool with 68 GitHub stars.
AgentRE-Bench is primarily written in Python.
You can find installation instructions and usage details in the AgentRE-Bench GitHub repository at github.com/agentrebench/AgentRE-Bench. The project has 68 stars and 7 forks.
AgentRE-Bench is released under the MIT license, making it free to use and modify according to the license terms.