How We Secure 43,000+ AI Agent Tools: A Solo Developer's Security Journey

When you run a directory of 43,000+ AI agent tools, every new repository is a potential attack vector. This is the story of how we went from zero security to a multi-layered defense system — built by one developer, inspired by the open-source community.

The Wake-Up Call

Agent Skills Hub started as a simple idea: index every AI agent tool on GitHub and help developers find the right one. We built the crawler, the scoring engine, the frontend. By March 2026, we had 25,000+ repositories indexed. Life was good.

Then we started noticing things.

A "Claude Code skill" that piped environment variables to an external server via curl. An "MCP server" that silently installed a cron job to download and execute remote scripts. A "productivity tool" whose README contained base64-encoded payloads.

We were indexing these tools. Linking to them. Recommending them in our scenario pages. And we had zero security scanning.

That was the wake-up call.

The Timeline

Mar 20 Phase 1 — First rule-based security scanner. Basic regex patterns for curl|bash, rm -rf, credential access. Three grades: safe/caution/unsafe.
Mar 21 Phase 2 — Added LLM deep analysis for flagged repos. Used MiniMax API (OpenAI-compatible) to reduce false positives. Built interactive Analyzer page.
Mar 24 Phase 3 — Discovered SlowMist's Agent Security Framework. Rewrote entire scanner with 11 red-flag categories and 5-tier trust hierarchy.
Mar 24 Phase 3.5 — Converted Analyzer to pure frontend. No backend dependency — security scanning runs entirely in the browser.
Mar 31 Phase 4 — Fixed critical RLS vulnerabilities on subscribers table. Moved to SECURITY DEFINER RPC pattern for all write operations.

Five commits. Eleven days. From zero to a multi-layered security system.

Phase 1: The Naive Scanner

The first version was embarrassingly simple. A handful of regex patterns looking for obvious red flags:

// V1: "Is this obviously malicious?"
if (readme.match(/curl.*\|.*bash/)) grade = "unsafe";
if (readme.match(/rm\s+-rf\s+\//)) grade = "unsafe";
if (readme.match(/\.env/)) grade = "caution";

It caught the obvious stuff. But it also flagged every legitimate tool that had curl | bash install instructions (which is... most of them). The false positive rate was terrible.

We needed something smarter.

Phase 2: LLM as Second Opinion

The idea was simple: let the regex scanner do the fast first pass, then send flagged repos to an LLM for semantic analysis. The LLM could understand context — a curl | bash installing Homebrew is fine; a curl | bash downloading from a random IP is not.

We built a system prompt that explicitly lists common false positives:

Common FALSE POSITIVES to watch for:
- curl|bash install scripts for well-known package managers
- API key configuration instructions (OPENAI_API_KEY, etc.)
- sudo usage in Docker setup or system package installation
- eval() in legitimate template engines or REPL tools

Initially we used the Anthropic API, but switched to MiniMax's OpenAI-compatible endpoint for cost efficiency. The LLM returns structured JSON with grade, confidence score, and specific findings with mitigations.
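Before trusting the model's verdict, it pays to validate the JSON it returns. A minimal sketch — the interface and field names here are illustrative, not the scanner's actual schema:

```typescript
// Hypothetical shape of the LLM's structured verdict.
interface LlmVerdict {
  grade: "safe" | "caution" | "unsafe";
  confidence: number; // 0..1
  findings: { pattern: string; reason: string; mitigation: string }[];
}

// Parse and sanity-check the model's output before trusting it;
// malformed or out-of-range responses are rejected, not guessed at.
function parseVerdict(raw: string): LlmVerdict | null {
  try {
    const v = JSON.parse(raw);
    if (!["safe", "caution", "unsafe"].includes(v.grade)) return null;
    if (typeof v.confidence !== "number" || v.confidence < 0 || v.confidence > 1) {
      return null;
    }
    if (!Array.isArray(v.findings)) return null;
    return v as LlmVerdict;
  } catch {
    return null;
  }
}
```

Rejecting rather than repairing bad output matters: an LLM that fails to produce valid JSON should trigger a retry or a conservative fallback grade, never a silent "safe".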

This cut false positives significantly. But the rule engine itself was still too primitive.

Phase 3: The SlowMist Rewrite

Everything changed when we discovered slowmist/slowmist-agent-security (302 stars).

SlowMist is a well-known blockchain security team. Their Agent Security Framework defines 11 categories of red flags specifically for AI agent tools. Not generic security patterns — agent-specific threats like memory theft, agent config exfiltration, and skill supply chain attacks.

We rewrote the entire scanner around their framework.

The 11 Red-Flag Categories

| # | Category | What It Catches | Example Pattern |
|---|----------|-----------------|-----------------|
| 1 | Data Exfiltration | Sending local data to external servers | curl -d $(cat ~/.ssh/id_rsa) |
| 2 | Credential Harvest | Extracting API keys from environment | env \| grep -i key |
| 3 | Sensitive Dir Access | Reading SSH keys, AWS credentials | cat ~/.aws/credentials |
| 4 | Agent Memory Theft | Stealing agent memory/identity files | cat MEMORY.md \| curl |
| 5 | Dynamic Code Exec | Running obfuscated/dynamic code | base64 -d \| bash |
| 6 | Privilege Escalation | Gaining root access | chmod 777, chown root |
| 7 | Persistence | Surviving reboots via cron/bashrc | crontab, >> ~/.bashrc |
| 8 | Reverse Shell | Opening backdoor connections | nc -e /bin/sh |
| 9 | Destructive | Wiping filesystem | rm -rf / |
| 10 | Obfuscation | Hiding malicious intent | hex-encoded payloads, rot13+eval |
| 11 | Supply Chain | Install-and-execute attacks | npm install x && node x |
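In code, a rule table like this is just data. A sketch of the idea — the categories mirror the table above, but the specific patterns and severities here are illustrative, not the scanner's real rule set:

```typescript
// SlowMist-style rules as data: each rule names a category, a
// severity, and a regex. Patterns below are examples only.
type Severity = "critical" | "high" | "medium";

interface Rule {
  category: string;
  severity: Severity;
  pattern: RegExp;
}

const RULES: Rule[] = [
  { category: "Data Exfiltration", severity: "critical", pattern: /curl\s+-d\s+\$\(cat\s/ },
  { category: "Dynamic Code Exec", severity: "critical", pattern: /base64\s+-d.*\|\s*(ba)?sh/ },
  { category: "Reverse Shell",     severity: "critical", pattern: /nc\s+(-[a-z]+\s+)*-e\s+\/bin\/sh/ },
  { category: "Persistence",       severity: "high",     pattern: />>\s*~\/\.bashrc/ },
];

// Return every rule a README trips; the caller maps hits to a grade.
function scan(readme: string): Rule[] {
  return RULES.filter((r) => r.pattern.test(readme));
}
```

Keeping rules as data rather than hard-coded conditionals makes it trivial to add a category, tune a severity, or share the same rule table between the Python and TypeScript implementations.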

Category #4 — Agent Memory Theft — was the eye-opener. Traditional security scanners don't look for MEMORY.md or SOUL.md exfiltration. But in the AI agent world, these files contain your entire context, preferences, and work history. Stealing them is the agent equivalent of identity theft.

The Trust Hierarchy

Not all red flags are equal. A sudo command in an Anthropic official repo is very different from a sudo command in a zero-star repo with no license.

We implemented a 5-tier trust hierarchy that adjusts severity based on the source:

| Tier | Label | Criteria | Effect on Grading |
|------|-------|----------|-------------------|
| 1 | Official Org | anthropics, openai, google, microsoft, nvidia... | High flags → caution (not unsafe) |
| 2 | Known Security Team | slowmist, trailofbits, openzeppelin... | High flags → caution |
| 3 | High-Star + Licensed | ≥1,000 stars + open-source license | High flags → caution |
| 4 | Moderate Trust | ≥100 stars + license | Single high flag → caution; multiple → unsafe |
| 5 | Unknown Source | Everything else | Any high flag → unsafe |

The key insight: trust is not binary. A graduated system catches real threats while giving established projects the benefit of the doubt.
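The downgrade logic described in the table reduces to a few lines. A sketch, with tier numbers matching the table (1 = official org, 5 = unknown source):

```typescript
type Grade = "safe" | "caution" | "unsafe";

// Map a source's trust tier and its count of high-severity flags to
// a final grade, following the table above.
function gradeFromFlags(tier: number, highFlagCount: number): Grade {
  if (highFlagCount === 0) return "safe";
  if (tier <= 3) return "caution"; // trusted sources get downgraded, not condemned
  if (tier === 4) return highFlagCount === 1 ? "caution" : "unsafe";
  return "unsafe"; // unknown source: any high flag is disqualifying
}
```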

Code Block Awareness

One subtle but critical feature: the scanner checks if a pattern match occurs inside a Markdown code block. A curl | bash in a README's prose is suspicious. The same pattern inside a fenced code block (teaching users how to install) is usually legitimate.

function isInCodeBlock(text, pos) {
  const before = text.slice(0, pos);
  return (before.split("```").length - 1) % 2 === 1;
}

This single function eliminated about 40% of false positives.
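Wiring the check into the matcher looks roughly like this — a sketch, with proseMatches as an illustrative helper name (the fence string is written with escapes so the snippet itself survives Markdown tooling):

```typescript
// Three backticks, escaped so this snippet can live inside Markdown.
const FENCE = "\u0060\u0060\u0060";

function isInCodeBlock(text: string, pos: number): boolean {
  const before = text.slice(0, pos);
  return (before.split(FENCE).length - 1) % 2 === 1;
}

// Indices of pattern matches that fall OUTSIDE fenced code blocks;
// matches inside fences are treated as install instructions, not threats.
function proseMatches(text: string, pattern: RegExp): number[] {
  const hits: number[] = [];
  for (const m of text.matchAll(new RegExp(pattern.source, "g"))) {
    if (m.index !== undefined && !isInCodeBlock(text, m.index)) {
      hits.push(m.index);
    }
  }
  return hits;
}
```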

Phase 4: Database Security

While building the content security layer, we discovered something worse: our database was wide open.

The subscribers table — containing the email addresses of 58 newsletter subscribers — had permissive Row Level Security (RLS) policies. The Supabase anon key (which is public, embedded in the frontend JavaScript) could read every subscriber's email and write to the table directly.

The fix: remove all direct table access and move everything to SECURITY DEFINER RPC functions:

-- Before: direct table access (DANGEROUS)
INSERT INTO subscribers (email) VALUES ('user@example.com');

-- After: controlled RPC (SAFE)
SELECT subscribe('user@example.com');
-- Function handles validation, token generation, duplicate check
-- Runs with elevated privileges, returns only status string

Three functions — subscribe(), verify_email(), unsubscribe() — each with strict input validation and minimal return data. The anon key can call these RPCs but cannot touch the table directly.

A gotcha we hit: PostgreSQL's gen_random_bytes() lives in the extensions schema. Our initial migration failed because we set search_path = public without including extensions. A subtle bug that only showed up in production.
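The fix, roughly — a sketch only, with the function body simplified and names matching those mentioned above:

```sql
-- Sketch: declare search_path on the function itself so
-- gen_random_bytes() (extensions schema) resolves at runtime.
CREATE OR REPLACE FUNCTION subscribe(p_email text)
RETURNS text
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = public, extensions  -- 'extensions' is the crucial addition
AS $$
BEGIN
  -- validation, duplicate check, token generation elided
  RETURN 'ok';
END;
$$;
```

Pinning search_path on a SECURITY DEFINER function is also a hardening measure in its own right: it prevents a caller's search_path from redirecting unqualified names inside the function.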

The Architecture Today

Layer 1: Rule Engine

27 regex patterns across 3 severity levels (critical/high/medium). Pure string matching, zero API calls. Runs in <10ms per repo. Available both server-side (Python) and client-side (TypeScript).

Layer 2: Trust Hierarchy

5-tier source reputation system. Adjusts severity grades based on author org, star count, and license presence. Prevents false positives on established projects.

Layer 3: LLM Analysis

Semantic deep-dive for flagged repos. Understands context, identifies false positives, returns structured findings with confidence scores.

Layer 4: Database RLS

SECURITY DEFINER RPC functions for all write operations. Zero direct table access from frontend. Anon key is truly read-only.

Projects That Inspired Us

slowmist/slowmist-agent-security ★ 302
The foundation of our scanner. 11 red-flag categories designed specifically for AI agent security.
bruc3van/agent-skills-guard ★ 323
Desktop app for Agent Skills security scanning and visual management. Built with Rust.
seojoonkim/prompt-guard ★ 140
Multi-language prompt injection defense system. Informed our thinking on agent input security.
gendigitalinc/sage ★ 162
Agent Detection & Response (ADR) layer. The concept of runtime agent monitoring influenced our trust hierarchy.
cordum-io/cordum ★ 457
Open agent control plane. Pre-execution policy governance for autonomous AI agents.
asamassekou10/ship-safe ★ 335
CLI security scanner for the agentic era. Detects CI/CD misconfigs and agent supply chain attacks.
requie/LLMSecurityGuide ★ 61
Comprehensive LLM security reference covering OWASP Top 10 for LLM applications.
SkillsBench (arXiv:2602.12670) Paper
84 tasks, 7,308 trajectories. The academic foundation for our quality scoring engine (v3), which feeds into trust assessment.

Lessons Learned

1. Agent-specific threats are real

Traditional security tools miss agent-specific attack vectors. Nobody was scanning for MEMORY.md exfiltration or .claude/sessions theft before SlowMist published their framework. If you're building in the agent ecosystem, you need agent-aware security.

2. Trust hierarchy beats binary classification

Early versions classified everything as safe or unsafe. This was useless — too many false positives on legitimate tools, and users stopped paying attention. The 5-tier trust system lets us say "this pattern is concerning, but the source is reputable, so proceed with caution" instead of just "UNSAFE".

3. Client-side scanning is underrated

Our scanner runs entirely in the browser (TypeScript). No API calls, no backend dependency. Users can analyze any GitHub repo in real-time without us ever seeing their query. Privacy by architecture.

4. RLS is not security by default

Supabase's Row Level Security creates a false sense of safety. Enabling RLS without writing proper policies is worse than no RLS — it makes you think you're protected when you're not. Always audit your policies with the anon key.

5. The community is the best security team

Every project listed above was found through our own directory. We index 43,000+ tools, and the security tools in our own index taught us how to secure the directory itself. That's the beauty of open source.

What's Next

The AI agent ecosystem is growing at 500+ new repos per week. Security can't be an afterthought. If you're building agent tools, consider running our scanner on your own README — or better yet, contribute to SlowMist's framework.

Try the scanner: Visit Agent Skills Hub and click "Analyzer" to scan any GitHub repo in real-time.
Source code: Our scanner is open-source at github.com/ZhuYansen/agent-skills-hub.