SkillGuard: SSL for Agent Skills

Unforgeable cryptographic safety guardrails for OpenClaw agent skills.
Powered by Jolt Atlas zero-knowledge machine learning proofs · Paid via x402 on Base
How It Works
Before an agent can transact, it installs skills — packages of code that give it new abilities. This is the agent skill supply chain. If a malicious skill enters the supply chain, every downstream action is compromised: stolen credentials, exfiltrated data, unauthorized transactions. SkillGuard is the cryptographic checkpoint at the boundary.
1. Skill submitted: A developer publishes a skill to ClawHub, or submits skill data directly via the API.
2. Features extracted: 35 security signals are analyzed: shell exec, reverse shells, credential access, obfuscation, entropy, exfiltration, density ratios, and interaction terms. Code blocks in SKILL.md are extracted and scanned.
3. Classified with proof: A neural network classifies the skill, and Jolt Atlas generates a ~53 KB SNARK proof of the computation.
4. Anyone verifies: The proof can be independently verified in milliseconds, with no trust in SkillGuard required.
5. Classification made: The result is ALLOW, FLAG, or DENY, and the proof becomes a tamper-proof safety certificate for the skill.
Like an SSL certificate for agent skills: every classification is backed by a zero-knowledge machine learning proof — a cryptographic receipt that the neural network ran correctly on the claimed inputs. The proof reveals the decision but not the model weights. Anyone can verify. No one can forge.
How the Classifier Works
The classifier is a small neural network that learned to spot dangerous patterns by studying hundreds of examples of safe and malicious skills. It doesn't "understand" code — it counts security-relevant signals and uses learned patterns to decide.
Feature Extraction (35 signals)
Each skill is reduced to 35 numbers: shell exec count, network calls, obfuscation score, reverse shell patterns, credential access, author reputation, download counts, byte entropy, code density ratios, and more. Code blocks inside SKILL.md files are extracted and scanned even when no separate scripts exist. Each value is scaled to 0–128.
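A minimal sketch of this stage, assuming a simplified signal set. It extracts fenced code blocks from a SKILL.md string, computes Shannon byte entropy, counts a few shell-exec patterns, and maps each raw value linearly onto 0–128. The regexes, the per-feature caps, and all helper names are hypothetical. SkillGuard's actual 35 features and normalization rules are not specified in this description.

```python
import math
import re
from collections import Counter

FENCE = "`" * 3  # a Markdown code fence (three backticks)
# Hypothetical pattern: capture the body of each fenced block, skipping the info string.
CODE_BLOCK_RE = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)
# Hypothetical shell-exec signal; the real detector is not specified.
SHELL_EXEC_RE = re.compile(r"os\.system|subprocess\.|child_process")


def extract_code_blocks(markdown: str) -> list[str]:
    """Return the body of every fenced code block in a Markdown document."""
    return CODE_BLOCK_RE.findall(markdown)


def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def scale_to_0_128(value: float, max_value: float) -> int:
    """Map [0, max_value] linearly onto an integer in [0, 128], clamping outliers."""
    if max_value <= 0:
        return 0
    clamped = min(max(value, 0.0), max_value)
    return round(clamped / max_value * 128)


def extract_features(skill_md: str) -> dict[str, int]:
    """Scan only the code embedded in SKILL.md and return two scaled signals."""
    code = "\n".join(extract_code_blocks(skill_md))
    return {
        # A cap of 10 calls is a hypothetical normalization choice.
        "shell_exec_count": scale_to_0_128(len(SHELL_EXEC_RE.findall(code)), 10),
        # 8 bits per byte is the maximum possible Shannon entropy.
        "byte_entropy": scale_to_0_128(byte_entropy(code.encode("utf-8")), 8.0),
    }
```

Scanning the embedded code blocks separately is what lets a skill with no standalone scripts still produce non-zero signals.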
Neural Network (4,460 params)
Three layers of neurons (35→56→40→4) mix the features into progressively higher-level patterns. Layer 1 detects basic combinations ("obfuscation + shell exec"). Layer 2 builds compound indicators ("credential stealer" vs "API client"). Layer 3 outputs four scores: SAFE, CAUTION, DANGEROUS, MALICIOUS. Highest wins.
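If the layers are fully connected with one bias per neuron, the stated widths reproduce the 4,460-parameter count exactly. The ordering in `CLASSES` follows the order listed above. It is an assumption about the output layout, not a documented fact.

```python
# Layer widths from the text: 35 inputs, hidden layers of 56 and 40, 4 output classes.
LAYER_SIZES = [35, 56, 40, 4]
CLASSES = ["SAFE", "CAUTION", "DANGEROUS", "MALICIOUS"]  # assumed output order


def count_params(sizes: list[int]) -> int:
    """Weights (n_in * n_out) plus biases (n_out) for each fully connected layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def predicted_class(scores: list[int]) -> str:
    """'Highest wins': return the label of the largest output score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return CLASSES[best]


print(count_params(LAYER_SIZES))  # 4460 = (35*56+56) + (56*40+40) + (40*4+4)
```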
Confidence & Decisions
Raw scores are converted to probabilities by a softmax with temperature T = 12.8. If the distribution is spread out (high entropy), the skill is flagged for human review. DANGEROUS and MALICIOUS results need at least 50% confidence to produce DENY; otherwise they produce FLAG. SAFE and CAUTION results produce ALLOW unless entropy is too high.
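The decision rule can be sketched as follows. The temperature (12.8) and the 50% DENY threshold come from the text. `MAX_ENTROPY_BITS` is a hypothetical cutoff, because the real entropy threshold is not stated. Applying the entropy check before the per-class rules, and feeding raw output scores straight into the softmax, are also assumptions.

```python
import math

CLASSES = ["SAFE", "CAUTION", "DANGEROUS", "MALICIOUS"]  # assumed output order
TEMPERATURE = 12.8       # from the text
DENY_CONFIDENCE = 0.50   # from the text: at least 50% confidence for DENY
MAX_ENTROPY_BITS = 1.5   # hypothetical threshold; the real value is not stated


def softmax(scores, temperature):
    """Temperature-scaled softmax, shifted by the max score for numerical stability."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs):
    """Shannon entropy of a probability vector; 2.0 bits is the maximum for 4 classes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def decide(scores):
    """Return (top class, decision) following the rules described above."""
    probs = softmax(scores, TEMPERATURE)
    top = max(range(len(probs)), key=lambda i: probs[i])
    label, confidence = CLASSES[top], probs[top]
    if entropy_bits(probs) > MAX_ENTROPY_BITS:
        return label, "FLAG"  # too uncertain: route to human review
    if label in ("DANGEROUS", "MALICIOUS"):
        return label, "DENY" if confidence >= DENY_CONFIDENCE else "FLAG"
    return label, "ALLOW"
```

A larger temperature flattens the distribution, so T controls how readily the entropy check fires.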
Integer Arithmetic & ZK Proofs
All math uses fixed-point integers with a scale factor of 128 (the integer 128 represents 1.0) instead of floating point, because the entire forward pass runs inside a zero-knowledge SNARK (Jolt Atlas). This produces a ~53 KB cryptographic proof that the classification was computed correctly. Anyone can verify the proof without trusting SkillGuard or seeing the model weights.
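Here is a sketch of one fixed-point dense layer. It assumes that weights, biases, and activations are all stored as integers at scale 128. Multiplying two scale-128 values yields a product at scale 128*128, so the accumulator is divided by 128 once before the bias is added. This sketch floors that division. The rounding mode Jolt Atlas actually uses is not stated.

```python
SCALE = 128  # fixed-point scale from the text: the integer 128 represents 1.0


def to_fixed(x: float) -> int:
    """Encode a real number as a scale-128 integer."""
    return round(x * SCALE)


def dense_fixed(x, weights, biases, relu=True):
    """One fully connected layer using only integer arithmetic.

    x: activations at scale 128; weights: one row per output neuron, at scale 128;
    biases: at scale 128. Returns activations at scale 128.
    """
    out = []
    for row, b in zip(weights, biases):
        acc = sum(w * xi for w, xi in zip(row, x))  # scale 128 * 128
        y = acc // SCALE + b                         # rescale (floor), then add bias
        out.append(max(y, 0) if relu else y)         # optional ReLU
    return out
```

For example, inputs 0.5 and 1.0 encode as [64, 128]. Weights 1.5 and 0.5 encode as [192, 64]. The layer returns [160], which decodes to 1.25, matching 1.5*0.5 + 0.5*1.0.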
Trained on 690 skill profiles (balanced across 4 classes, including unknown-metadata samples). 93.9% cross-validated accuracy. Adversarially hardened with FGSM perturbations.
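FGSM (Fast Gradient Sign Method) moves each input one step of size epsilon in the direction of the sign of the loss gradient. The sketch below applies it to the 0–128 feature vector. It assumes the gradient has already been computed by a floating-point training copy of the model. Clamping back to the feature range and the helper names are assumptions, not SkillGuard's published training code.

```python
def sign(v: float) -> int:
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_perturb(features, loss_grad, epsilon):
    """FGSM step: x_adv = clamp(x + epsilon * sign(dL/dx), 0, 128)."""
    return [
        min(max(x + epsilon * sign(g), 0), 128)
        for x, g in zip(features, loss_grad)
    ]
```

In adversarial training, perturbed copies such as these are typically added to the training set with their original labels.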
x402 Payment
Price per provable classification: $0.001 USDC
Includes: zkML proof
Network: Base (eip155:8453)
Protocol: HTTP 402 / x402
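On-chain, token amounts are integers denominated in the token's smallest unit. USDC uses 6 decimals, so the $0.001 price corresponds to 1,000 base units. Whether SkillGuard's 402 response quotes the price in base units is an assumption, and `usdc_to_atomic` is a hypothetical helper.

```python
from decimal import Decimal

USDC_DECIMALS = 6  # USDC uses 6 decimal places


def usdc_to_atomic(amount: str) -> int:
    """Convert a decimal USDC amount such as "0.001" to integer base units."""
    units = Decimal(amount) * (10 ** USDC_DECIMALS)  # Decimal avoids float rounding
    if units != units.to_integral_value():
        raise ValueError(f"{amount} USDC is finer than 6 decimal places")
    return int(units)


print(usdc_to_atomic("0.001"))  # 1000
```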
zkML Proving
Proving scheme: Jolt / Dory
Model size: 4,460 params (35-56-40-4)
Endpoints
Method  Path                          Auth             Price
POST    /api/v1/evaluate              API key or x402  $0.001
POST    /api/v1/verify                None             Free
GET     /health                       None             Free
GET     /stats                        None             Free
GET     /openapi.json                 None             Free
GET     /.well-known/ai-plugin.json   None             Free
GET     /.well-known/llms.txt         None             Free