How We Secure 43,000+ AI Agent Tools: A Solo Developer's Security Journey

When you run a directory of 43,000+ AI agent tools, every new repository is a potential attack vector. This is the story of how we went from zero security to a multi-layered defense system — built by one developer, inspired by the open-source community.

The Wake-Up Call

Agent Skills Hub started as a simple idea: index every AI agent tool on GitHub and help developers find the right one. We built the crawler, the scoring engine, the frontend. By March 2026, we had 25,000+ repositories indexed. Life was good.

Then we started noticing things.

A "Claude Code skill" that piped environment variables to an external server via curl. An "MCP server" that silently installed a cron job to download and execute remote scripts. A "productivity tool" whose README contained base64-encoded payloads.

We were indexing these tools. Linking to them. Recommending them in our scenario pages. And we had zero security scanning.

That was the wake-up call.

The Timeline

Mar 20 Phase 1 — First rule-based security scanner. Basic regex patterns for curl|bash, rm -rf, credential access. Three grades: safe/caution/unsafe.
Mar 21 Phase 2 — Added LLM deep analysis for flagged repos. Used MiniMax API (OpenAI-compatible) to reduce false positives. Built interactive Analyzer page.
Mar 24 Phase 3 — Discovered SlowMist's Agent Security Framework. Rewrote entire scanner with 11 red-flag categories and 5-tier trust hierarchy.
Mar 24 Phase 3.5 — Converted Analyzer to pure frontend. No backend dependency — security scanning runs entirely in the browser.
Mar 31 Phase 4 — Fixed critical RLS vulnerabilities on subscribers table. Moved to SECURITY DEFINER RPC pattern for all write operations.

Five commits. Eleven days. From zero to a multi-layered security system.

Phase 1: The Naive Scanner

The first version was embarrassingly simple. A handful of regex patterns looking for obvious red flags:

// V1: "Is this obviously malicious?"
if (readme.match(/curl.*\|.*bash/)) grade = "unsafe";
if (readme.match(/rm\s+-rf\s+\//)) grade = "unsafe";
if (readme.match(/\.env/)) grade = "caution";

It caught the obvious stuff. But it also flagged every legitimate tool that had curl | bash install instructions (which is... most of them). The false positive rate was terrible.

We needed something smarter.

Phase 2: LLM as Second Opinion

The idea was simple: let the regex scanner do the fast first pass, then send flagged repos to an LLM for semantic analysis. The LLM could understand context — a curl | bash installing Homebrew is fine; a curl | bash downloading from a random IP is not.

We built a system prompt that explicitly lists common false positives:

Common FALSE POSITIVES to watch for:
- curl|bash install scripts for well-known package managers
- API key configuration instructions (OPENAI_API_KEY, etc.)
- sudo usage in Docker setup or system package installation
- eval() in legitimate template engines or REPL tools

Initially we used the Anthropic API, but switched to MiniMax's OpenAI-compatible endpoint for cost efficiency. The LLM returns structured JSON with grade, confidence score, and specific findings with mitigations.
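Before trusting the model's verdict, it pays to validate the JSON it returns. A minimal sketch — the interface and field names here are illustrative, not the scanner's actual schema:

```typescript
// Hypothetical shape of the LLM's structured verdict.
interface LlmVerdict {
  grade: "safe" | "caution" | "unsafe";
  confidence: number; // 0..1
  findings: { pattern: string; reason: string; mitigation: string }[];
}

// Parse and sanity-check the model's output before trusting it;
// malformed or out-of-range responses are rejected, not guessed at.
function parseVerdict(raw: string): LlmVerdict | null {
  try {
    const v = JSON.parse(raw);
    if (!["safe", "caution", "unsafe"].includes(v.grade)) return null;
    if (typeof v.confidence !== "number" || v.confidence < 0 || v.confidence > 1) {
      return null;
    }
    if (!Array.isArray(v.findings)) return null;
    return v as LlmVerdict;
  } catch {
    return null;
  }
}
```

Rejecting rather than repairing bad output matters: an LLM that fails to produce valid JSON should trigger a retry or a conservative fallback grade, never a silent "safe".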

This cut false positives significantly. But the rule engine itself was still too primitive.

Phase 3: The SlowMist Rewrite

Everything changed when we discovered slowmist/slowmist-agent-security (302 stars).

SlowMist is a well-known blockchain security team. Their Agent Security Framework defines 11 categories of red flags specifically for AI agent tools. Not generic security patterns — agent-specific threats like memory theft, agent config exfiltration, and skill supply chain attacks.

We rewrote the entire scanner around their framework.

The 11 Red-Flag Categories

| # | Category | What It Catches | Example Pattern |
|---|----------|-----------------|-----------------|
| 1 | Data Exfiltration | Sending local data to external servers | curl -d $(cat ~/.ssh/id_rsa) |
| 2 | Credential Harvest | Extracting API keys from environment | env \| grep -i key |
| 3 | Sensitive Dir Access | Reading SSH keys, AWS credentials | cat ~/.aws/credentials |
| 4 | Agent Memory Theft | Stealing agent memory/identity files | cat MEMORY.md \| curl |
| 5 | Dynamic Code Exec | Running obfuscated/dynamic code | base64 -d \| bash |
| 6 | Privilege Escalation | Gaining root access | chmod 777, chown root |
| 7 | Persistence | Surviving reboots via cron/bashrc | crontab, >> ~/.bashrc |
| 8 | Reverse Shell | Opening backdoor connections | nc -e /bin/sh |
| 9 | Destructive | Wiping filesystem | rm -rf / |
| 10 | Obfuscation | Hiding malicious intent | hex-encoded payloads, rot13+eval |
| 11 | Supply Chain | Install-and-execute attacks | npm install x && node x |
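In code, a rule table like this is just data. A sketch of the idea — the categories mirror the table above, but the specific patterns and severities here are illustrative, not the scanner's real rule set:

```typescript
// SlowMist-style rules as data: each rule names a category, a
// severity, and a regex. Patterns below are examples only.
type Severity = "critical" | "high" | "medium";

interface Rule {
  category: string;
  severity: Severity;
  pattern: RegExp;
}

const RULES: Rule[] = [
  { category: "Data Exfiltration", severity: "critical", pattern: /curl\s+-d\s+\$\(cat\s/ },
  { category: "Dynamic Code Exec", severity: "critical", pattern: /base64\s+-d.*\|\s*(ba)?sh/ },
  { category: "Reverse Shell",     severity: "critical", pattern: /nc\s+(-[a-z]+\s+)*-e\s+\/bin\/sh/ },
  { category: "Persistence",       severity: "high",     pattern: />>\s*~\/\.bashrc/ },
];

// Return every rule a README trips; the caller maps hits to a grade.
function scan(readme: string): Rule[] {
  return RULES.filter((r) => r.pattern.test(readme));
}
```

Keeping rules as data rather than hard-coded conditionals makes it trivial to add a category, tune a severity, or share the same rule table between the Python and TypeScript implementations.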

Category #4 — Agent Memory Theft — was the eye-opener. Traditional security scanners don't look for MEMORY.md or SOUL.md exfiltration. But in the AI agent world, these files contain your entire context, preferences, and work history. Stealing them is the agent equivalent of identity theft.

The Trust Hierarchy

Not all red flags are equal. A sudo command in an Anthropic official repo is very different from a sudo command in a zero-star repo with no license.

We implemented a 5-tier trust hierarchy that adjusts severity based on the source:

| Tier | Label | Criteria | Effect on Grading |
|------|-------|----------|-------------------|
| 1 | Official Org | anthropics, openai, google, microsoft, nvidia... | High flags → caution (not unsafe) |
| 2 | Known Security Team | slowmist, trailofbits, openzeppelin... | High flags → caution |
| 3 | High-Star + Licensed | ≥1,000 stars + open-source license | High flags → caution |
| 4 | Moderate Trust | ≥100 stars + license | Single high flag → caution; multiple → unsafe |
| 5 | Unknown Source | Everything else | Any high flag → unsafe |

The key insight: trust is not binary. A graduated system catches real threats while giving established projects the benefit of the doubt.
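The downgrade logic described in the table reduces to a few lines. A sketch, with tier numbers matching the table (1 = official org, 5 = unknown source):

```typescript
type Grade = "safe" | "caution" | "unsafe";

// Map a source's trust tier and its count of high-severity flags to
// a final grade, following the table above.
function gradeFromFlags(tier: number, highFlagCount: number): Grade {
  if (highFlagCount === 0) return "safe";
  if (tier <= 3) return "caution"; // trusted sources get downgraded, not condemned
  if (tier === 4) return highFlagCount === 1 ? "caution" : "unsafe";
  return "unsafe"; // unknown source: any high flag is disqualifying
}
```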

Code Block Awareness

One subtle but critical feature: the scanner checks if a pattern match occurs inside a Markdown code block. A curl | bash in a README's prose is suspicious. The same pattern inside a fenced code block (teaching users how to install) is usually legitimate.

function isInCodeBlock(text, pos) {
  const before = text.slice(0, pos);
  return (before.split("```").length - 1) % 2 === 1;
}

This single function eliminated about 40% of false positives.
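Wiring the check into the matcher looks roughly like this — a sketch, with proseMatches as an illustrative helper name (the fence string is written with escapes so the snippet itself survives Markdown tooling):

```typescript
// Three backticks, escaped so this snippet can live inside Markdown.
const FENCE = "\u0060\u0060\u0060";

function isInCodeBlock(text: string, pos: number): boolean {
  const before = text.slice(0, pos);
  return (before.split(FENCE).length - 1) % 2 === 1;
}

// Indices of pattern matches that fall OUTSIDE fenced code blocks;
// matches inside fences are treated as install instructions, not threats.
function proseMatches(text: string, pattern: RegExp): number[] {
  const hits: number[] = [];
  for (const m of text.matchAll(new RegExp(pattern.source, "g"))) {
    if (m.index !== undefined && !isInCodeBlock(text, m.index)) {
      hits.push(m.index);
    }
  }
  return hits;
}
```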

Phase 4: Database Security

While building the content security layer, we discovered something worse: our database was wide open.

The subscribers table — containing the email addresses of 58 newsletter subscribers — had permissive Row Level Security (RLS) policies. The Supabase anon key (which is public, embedded in the frontend JavaScript) could read every subscriber's email and write to the table directly.

The fix: remove all direct table access and move everything to SECURITY DEFINER RPC functions:

-- Before: direct table access (DANGEROUS)
INSERT INTO subscribers (email) VALUES ('user@example.com');

-- After: controlled RPC (SAFE)
SELECT subscribe('user@example.com');
-- Function handles validation, token generation, duplicate check
-- Runs with elevated privileges, returns only status string

Three functions — subscribe(), verify_email(), unsubscribe() — each with strict input validation and minimal return data. The anon key can call these RPCs but cannot touch the table directly.

A gotcha we hit: PostgreSQL's gen_random_bytes() lives in the extensions schema. Our initial migration failed because we set search_path = public without including extensions. A subtle bug that only showed up in production.
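The fix, roughly — a sketch only, with the function body simplified and names matching those mentioned above:

```sql
-- Sketch: declare search_path on the function itself so
-- gen_random_bytes() (extensions schema) resolves at runtime.
CREATE OR REPLACE FUNCTION subscribe(p_email text)
RETURNS text
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = public, extensions  -- 'extensions' is the crucial addition
AS $$
BEGIN
  -- validation, duplicate check, token generation elided
  RETURN 'ok';
END;
$$;
```

Pinning search_path on a SECURITY DEFINER function is also a hardening measure in its own right: it prevents a caller's search_path from redirecting unqualified names inside the function.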

The Architecture Today

Layer 1: Rule Engine

27 regex patterns across 3 severity levels (critical/high/medium). Pure string matching, zero API calls. Runs in <10ms per repo. Available both server-side (Python) and client-side (TypeScript).

Layer 2: Trust Hierarchy

5-tier source reputation system. Adjusts severity grades based on author org, star count, and license presence. Prevents false positives on established projects.

Layer 3: LLM Analysis

Semantic deep-dive for flagged repos. Understands context, identifies false positives, returns structured findings with confidence scores.

Layer 4: Database RLS

SECURITY DEFINER RPC functions for all write operations. Zero direct table access from frontend. Anon key is truly read-only.

Projects That Inspired Us

slowmist/slowmist-agent-security ★ 302
The foundation of our scanner. 11 red-flag categories designed specifically for AI agent security.
bruc3van/agent-skills-guard ★ 323
Desktop app for Agent Skills security scanning and visual management. Built with Rust.
seojoonkim/prompt-guard ★ 140
Multi-language prompt injection defense system. Informed our thinking on agent input security.
gendigitalinc/sage ★ 162
Agent Detection & Response (ADR) layer. The concept of runtime agent monitoring influenced our trust hierarchy.
cordum-io/cordum ★ 457
Open agent control plane. Pre-execution policy governance for autonomous AI agents.
asamassekou10/ship-safe ★ 335
CLI security scanner for the agentic era. Detects CI/CD misconfigs and agent supply chain attacks.
requie/LLMSecurityGuide ★ 61
Comprehensive LLM security reference covering OWASP Top 10 for LLM applications.
SkillsBench (arXiv:2602.12670) Paper
84 tasks, 7,308 trajectories. The academic foundation for our quality scoring engine (v3), which feeds into trust assessment.

Lessons Learned

1. Agent-specific threats are real

Traditional security tools miss agent-specific attack vectors. Nobody was scanning for MEMORY.md exfiltration or .claude/sessions theft before SlowMist published their framework. If you're building in the agent ecosystem, you need agent-aware security.

2. Trust hierarchy beats binary classification

Early versions classified everything as safe or unsafe. This was useless — too many false positives on legitimate tools, and users stopped paying attention. The 5-tier trust system lets us say "this pattern is concerning, but the source is reputable, so proceed with caution" instead of just "UNSAFE".

3. Client-side scanning is underrated

Our scanner runs entirely in the browser (TypeScript). No API calls, no backend dependency. Users can analyze any GitHub repo in real-time without us ever seeing their query. Privacy by architecture.

4. RLS is not security by default

Supabase's Row Level Security creates a false sense of safety. Enabling RLS without writing proper policies is worse than no RLS — it makes you think you're protected when you're not. Always audit your policies with the anon key.

5. The community is the best security team

Every project listed above was found through our own directory. We index 43,000+ tools, and the security tools in our own index taught us how to secure the directory itself. That's the beauty of open source.

What's Next

The AI agent ecosystem is growing at 500+ new repos per week. Security can't be an afterthought. If you're building agent tools, consider running our scanner on your own README — or better yet, contribute to SlowMist's framework.

Try the scanner: Visit Agent Skills Hub and click "Analyzer" to scan any GitHub repo in real-time.
Source code: Our scanner is open-source at github.com/ZhuYansen/agent-skills-hub.