AgentRE-Bench — Agent Tool by agentrebench

Last updated: 2026-07-01 · Indexed by AgentSkillsHub · Auto-synced every 8h

About AgentRE-Bench

AgentRE-Bench is a benchmark for evaluating LLM agents on long-horizon reverse engineering tasks with deterministic scoring.

Platform: Linux/Unix (ELF x86-64). Windows PE support is planned for a future release.

AgentRE-Bench gives an LLM agent a compiled ELF binary and a set of Linux static analysis tools (strings, objdump, readelf, etc.), then measures how well it can identify C2 infrastructure, encoding schemes, anti-analysis techniques, and communication protocols, all without human guidance.

Why This Benchmark?

Why Synthetic?

All 13 binaries are compiled from purpose-built C sources with known ground truths. This gives us:

Deterministic judging: every field has an exact expected answer, with no ambiguity.
Controlled difficulty progression: from plaintext TCP shells (level 1) to metamorphic droppers with RC4 encryption (level 13).
Reproducibility: anyone can compile identical binaries and verify scores.

Real malware would require subjective expert judgment and introduce licensing, ethics, and reproducibility issues. Synthetic samples eliminate all of that while testing the same analytical capabilities.

Why Agentic?

Traditional RE benchmarks ask a model a question and check the answer.
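The deterministic judging described above (every field has an exact expected answer) can be illustrated with a minimal sketch. This is not the benchmark's actual scorer: the field names, the example values, the normalization step, and the function name `score_submission` are all assumptions made for illustration only.

```python
from typing import Dict

# Hypothetical ground truth for one sample. The field names and values
# are illustrative assumptions, not the benchmark's real schema.
GROUND_TRUTH: Dict[str, object] = {
    "c2_host": "10.0.0.5",
    "c2_port": "4444",
    "encoding": "xor",
    "protocol": "tcp",
}


def normalize(value: object) -> str:
    """Compare values case-insensitively and ignore surrounding whitespace."""
    return str(value).strip().lower()


def score_submission(submission: Dict[str, object],
                     truth: Dict[str, object]) -> float:
    """Return the fraction of ground-truth fields the agent matched exactly."""
    if not truth:
        raise ValueError("ground truth must not be empty")
    correct = sum(
        1
        for field, expected in truth.items()
        if field in submission
        and normalize(submission[field]) == normalize(expected)
    )
    return correct / len(truth)


if __name__ == "__main__":
    # Hypothetical agent answer: three fields right, protocol wrong.
    answer = {
        "c2_host": "10.0.0.5",
        "c2_port": 4444,
        "encoding": "XOR ",
        "protocol": "udp",
    }
    print(score_submission(answer, GROUND_TRUTH))
```

Because each field is compared against a single expected value, the same submission always receives the same score, which is the property the synthetic samples are meant to guarantee.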

Quick Facts

Stars	68
Forks	7
Language	Python
Category	Agent Tool
License	MIT
Quality Score	71.7/100
Open Issues	1
Last Updated	2026-07-01
Created	2026-02-12
Platforms	python
Est. Tokens	~16k

Compatible Skills

These tools work well together with AgentRE-Bench for enhanced workflows:

Reversecore_MCP (56% match; signals: semantic similarity 0.18, complementary, same language, similar popularity, shared platform)

Popular Python Agent Tools

TrendRadar ⭐ 60.2k · MCP Server
gpt-researcher ⭐ 27.9k · MCP Server
Scrapling ⭐ 67.2k · MCP Server
serena ⭐ 26.0k · MCP Server
MaxKB ⭐ 21.6k · MCP Server

Frequently Asked Questions

What is AgentRE-Bench?

AgentRE-Bench is an agentic benchmark that evaluates state-of-the-art models on long-horizon reverse engineering tasks, measuring their ability to analyze binaries, use tooling effectively, and reason. It is categorized as an Agent Tool and has 68 GitHub stars.

What programming language is AgentRE-Bench written in?

AgentRE-Bench is primarily written in Python.

How do I install or use AgentRE-Bench?

You can find installation instructions and usage details in the AgentRE-Bench GitHub repository at github.com/agentrebench/AgentRE-Bench. The project has 68 stars and 7 forks.

What license does AgentRE-Bench use?

AgentRE-Bench is released under the MIT license, making it free to use and modify according to the license terms.
