# Skill Verifier

Master orchestrator for skill verification: routes html_sandbox skills to server-side Playwright sandbox execution, and text skills through the 3-pass classify/scan/analyze pipeline.

## Quick Reference

> **Version:** 2.1.0
> **Purpose:** Master workflow for verifying a SkillSlap skill. Detects the skill's render mode
> and routes to the correct verification pipeline: server-side sandbox execution for `html_sandbox`
> skills, or the manual 3-pass pipeline for `terminal`/`output_render` skills.
> References other toolkit skills by tag: `classifier`, `scanner`, `tester`.

---

## 1. Overview

SkillSlap skills have three **render modes** that determine how they execute and what verification
evidence gets captured:

| Render Mode | What it means | Verification method |
|---|---|---|
| `html_sandbox` | Skill has a self-contained HTML file run in a browser | **System route** — server executes Playwright, captures screenshots + video |
| `output_render` | Skill's agent output is HTML/SVG (skill itself is text) | Manual 3-pass pipeline |
| `terminal` | Text-based instructions, outputs to terminal | Manual 3-pass pipeline |

**Always detect the render mode first** (Step 1) before choosing a pipeline.

---

## 2. Prerequisites

- SkillSlap API access (Bearer token)
- Anthropic API key set in your SkillSlap profile (required for system verification)
- The following toolkit skills (find via `GET /api/skills?tag=toolkit`):
  - **Skill Classifier** (tags: `classifier`, `toolkit`)
  - **Malware Scanner** (tags: `scanner`, `toolkit`)
  - **API Tester** (tags: `tester`, `toolkit`) — optional, for API-type skills

---

## 3. Step 1: Fetch the Skill and Detect Render Mode

```http
GET /api/skills/{id}
Authorization: Bearer <token>
```

Extract: `title`, `description`, `content`, `tags`, `version`, `render_mode`, `content_checksum`

Also fetch the skill's files to check for HTML:

```http
GET /api/skills/{id}/files
Authorization: Bearer <token>
```

**Determine the pipeline to use:**

```
IF render_mode == "html_sandbox"
  OR any file has extension .html or mime_type "text/html":
    → Use Pipeline A: System Verification (Section 4)
ELSE:
    → Use Pipeline B: Manual 3-Pass Verification (Section 5)
```
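The routing rule above can be sketched in Python. This is a minimal sketch: the `filename` and `mime_type` field names are assumptions about the `/files` response shape; adjust to the actual API payload.

```python
def choose_pipeline(render_mode, files):
    """Return "A" (system verification) or "B" (manual 3-pass).

    `files` is the decoded list from GET /api/skills/{id}/files;
    the `filename`/`mime_type` keys are assumed, not confirmed.
    """
    has_html = any(
        f.get("filename", "").endswith(".html")
        or f.get("mime_type") == "text/html"
        for f in files
    )
    if render_mode == "html_sandbox" or has_html:
        return "A"
    return "B"
```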

---

## 4. Pipeline A — System Verification (html_sandbox skills)

Use this for skills with `render_mode: "html_sandbox"` or any attached `.html` file.

**The system verification route handles everything server-side:**
- AI analysis (classify, malware scan, quality scoring) via your Anthropic API key
- Playwright Chromium sandbox execution (isolated, no external network)
- Screenshot capture at 0s / 1s / 3s
- WebM video recording of the full execution
- Thumbnail upload to storage (`verification-screenshots/previews/{id}.png`)
- `render_mode` and `preview_thumbnail_path` updated on the skill automatically
- Verification record created with full `execution_trace` + `demo_execution_trace`

**You do not need to run any of this manually.** Just POST to the system route:

```http
POST /api/skills/{id}/verifications/system
Authorization: Bearer <token>
Content-Type: application/json

{}
```

**Requirements:**
- You must be the skill **owner**
- Your SkillSlap profile must have an **Anthropic API key** configured
  (`PATCH /api/users/profile` with `{ "anthropic_api_key": "sk-ant-..." }`)

**Response (202 Accepted):**
```json
{
  "verification_id": "<uuid>",
  "status": "running",
  "message": "System verification started"
}
```

Poll for completion:

```http
GET /api/skills/{id}/verifications/system/latest
Authorization: Bearer <token>
```

Wait until `status` is `"passed"` or `"failed"`. On `"passed"`:
- The skill's `render_mode` is set to `"html_sandbox"`
- `preview_thumbnail_path` points to the captured screenshot in storage
- `demo_execution_trace` contains `visual_output` steps (screenshots) and a `video_output` step
- The skill card automatically shows the screenshot thumbnail, plus the live sandbox iframe on hover
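
The polling step can be sketched as a simple loop. The HTTP client is left to the caller: `fetch_latest` is any callable that GETs `/api/skills/{id}/verifications/system/latest` and returns the decoded JSON dict. Interval and timeout values are illustrative, not API requirements.

```python
import time

def poll_verification(fetch_latest, interval_s=5, timeout_s=300):
    """Poll until the verification status is "passed" or "failed"."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = fetch_latest()
        if result.get("status") in ("passed", "failed"):
            return result
        time.sleep(interval_s)  # still "running" — wait and re-poll
    raise TimeoutError("system verification did not finish in time")
```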

**If the skill fails system verification:**
- Check `execution_trace.steps` for `error` steps and JS console errors
- Fix the HTML (reduce complexity, remove external dependencies, fix JS errors)
- Re-run system verification

---

## 5. Pipeline B — Manual 3-Pass Verification (terminal / output_render skills)

Use this for text-based instruction skills and skills that produce output rendered externally.

### Step 1: Classify

Follow the **Skill Classifier** instructions to produce a `SkillClassification`:

```json
{
  "type": "agent_instructions",
  "requirements": { "api_access": false },
  "risk_level": "low",
  "reasoning": "..."
}
```

Record the classification in your execution trace.

### Step 2: Malware Scan

Follow the **Malware Scanner** instructions to produce a `MalwareScanResult`:

```json
{
  "scan_passed": true,
  "risk_level": "safe",
  "findings": [],
  "summary": "No threats detected."
}
```

**If the malware scan fails (`risk_level` is `"high"` or `"critical"`):**
- Stop the pipeline
- Set verification status to `failed`
- Include the malware findings in the `security_scan` field of your submission

### Step 3: Quality Analysis

Score the skill across 5 dimensions (0.0–1.0 each):

- **Clarity** — Instructions are clear and unambiguous
- **Completeness** — Covers all steps, edge cases, prerequisites
- **Security** — Free of security concerns
- **Executability** — An agent/human can follow and produce a result
- **Quality** — Professional formatting, well-structured

**Overall Score Formula:**
```
overall = security × 0.25 + clarity × 0.20 + completeness × 0.20 + executability × 0.20 + quality × 0.15
```
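
The formula above, as a small helper (dimension names follow this document; each score is expected in the 0.0–1.0 range):

```python
# Weights from the Overall Score Formula above — they sum to 1.0.
WEIGHTS = {
    "security": 0.25,
    "clarity": 0.20,
    "completeness": 0.20,
    "executability": 0.20,
    "quality": 0.15,
}

def overall_score(scores):
    """Weighted overall score across the 5 quality dimensions."""
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)
```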

### Step 4: API Testing (Optional)

If classification indicates `api_workflow` and `api_access: true`:
- Follow the **API Tester** instructions
- Parse HTTP examples, execute requests, validate responses

### Step 5: Submit Results

```http
POST /api/skills/{id}/verifications
Authorization: Bearer <token>
Content-Type: application/json

{
  "tier": "community",
  "verification_mode": "local",
  "execution_trace": {
    "version": "1.0",
    "started_at": "<iso>",
    "completed_at": "<iso>",
    "steps": [ ... ],
    "summary": "Verification passed with 85% score"
  },
  "agent_info": {
    "model_name": "<your-model>",
    "model_provider": "<your-provider>",
    "agent_name": "<your-agent-name>",
    "agent_version": "<your-version>"
  }
}
```

---

## 6. Pass/Fail Criteria (Pipeline B)

The verification **passes** if ALL of the following are true:

1. Malware scan passed (`scan_passed: true`)
2. Security score >= 0.5
3. No critical or high security findings
4. Overall weighted score >= 0.5
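
The four criteria can be checked together. One assumption in this sketch: findings are assumed to carry a `severity` field with values like `"high"`/`"critical"`; the scanner's actual finding schema may differ.

```python
def verification_passes(scan, scores, overall):
    """Apply the four Pipeline B pass criteria from this section."""
    no_severe = not any(
        f.get("severity") in ("high", "critical")  # assumed field name
        for f in scan.get("findings", [])
    )
    return (
        scan.get("scan_passed") is True      # 1. malware scan passed
        and scores.get("security", 0.0) >= 0.5  # 2. security score
        and no_severe                         # 3. no high/critical findings
        and overall >= 0.5                    # 4. overall weighted score
    )
```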

---

## 7. Execution Trace Step Types

Build a structured trace with these step types:

| Type | Description |
|------|-------------|
| `info` | Informational messages |
| `ai_prompt` | AI model prompt (include model, provider, preview) |
| `ai_response` | AI model response (include tokens, parse success) |
| `api_request` | HTTP request made |
| `api_response` | HTTP response received |
| `assertion` | Pass/fail check |
| `visual_output` | Screenshot (image_data_uri, width, height) |
| `video_output` | Video recording (video_data_uri, mime_type, duration_ms) |
| `error` | Error encountered |

Each step must have a `timestamp` (ISO 8601).
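
A small builder can enforce the step-type table and the timestamp rule. The `message` key and the keyword-argument extras are assumptions about the step shape; match them to the actual `execution_trace` schema.

```python
from datetime import datetime, timezone

ALLOWED_STEP_TYPES = {
    "info", "ai_prompt", "ai_response", "api_request", "api_response",
    "assertion", "visual_output", "video_output", "error",
}

def trace_step(step_type, message, **fields):
    """Build one execution-trace step with an ISO 8601 timestamp."""
    if step_type not in ALLOWED_STEP_TYPES:
        raise ValueError(f"unknown step type: {step_type}")
    step = {
        "type": step_type,
        "message": message,  # assumed field name
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    step.update(fields)  # e.g. image_data_uri, width, height for visual_output
    return step
```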

---

## 8. Verification Modes

When submitting (Pipeline B), specify `verification_mode`:

| Mode | Description |
|------|-------------|
| `local` | Agent ran the skill locally on its own machine |
| `remote` | Agent ran the skill on a remote server |
| `sandboxed` | Agent ran the skill in a Docker sandbox |
| `system` | Platform-managed (system route only — use Pipeline A) |

---

## 9. Error Handling

- If any step fails, record an `error` step in the trace
- If AI fails to respond, retry once before marking as failed
- Always submit a verification result, even on failure — the trace is valuable
- Include `error_message` in the verification for human review
- For html_sandbox skills: if system verification fails, check JS errors and simplify the HTML
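
The "retry once" rule for AI calls can be sketched as a tiny wrapper (the bare `except Exception` is deliberately broad for illustration; narrow it to your client's error types in practice):

```python
def with_one_retry(call):
    """Run `call`; on the first failure retry once, then let it raise."""
    try:
        return call()
    except Exception:
        return call()  # second and final attempt
```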

---

## 10. Generating Playground Assets

Every skill must have a visual asset for its card in the Slap Stack feed. The card media
priority is: **thumbnail → audio → sandbox iframe → terminal trace → dark box**. Your job
after verifying is to ensure the skill has the richest possible asset at the highest priority.

### By skill type:

**`html_sandbox` — canvas games, interactive tools, visualizations**
The live sandbox iframe appears on the card automatically via `render_mode === 'html_sandbox'`
(priority 3). System verification (Section 4) also captures a screenshot → `preview_thumbnail_path`
(priority 1), so these cards get both. No extra work needed after Pipeline A completes.

**Audio skills (`has_audio: true`)**
The `SkillCardAudioVisualizer` renders on the card automatically (priority 2).
No extra work needed.

**AI / text agent skills (`terminal` or `output_render`, invocation_type `agent` or `user`)**
These need a `## Playground` section added to their skill content. The playground is a
self-contained HTML page (no external dependencies) that shows a pre-canned example of
the skill in action — realistic input on the left, styled output on the right. This is
generated once by the agent and baked into the skill. No live AI is needed on the card.

**Agent workflow skills**
Add a `## Playground` section containing a self-contained HTML flowchart or step diagram
showing the workflow visually (e.g. Red → Green → Refactor for TDD Workflow).

**Context / rules skills**
Add a `## Playground` section containing a styled HTML summary card listing the key rules
or conventions the skill enforces.

---

### Generating a `## Playground` section for text skills

**Step 1 — Pick a seed input.** Choose a realistic, concrete input that exercises the
skill's core capability. For a Code Reviewer: a short function with a real bug. For a
PR Description Generator: a sample diff. Keep it small enough to render clearly.

**Step 2 — Run the skill.** Apply the skill to the seed input and capture the actual output.

**Step 3 — Build the HTML.** Wrap input + output in a self-contained dark-theme HTML page.
Requirements:
- No external CDN or network dependencies (all CSS/JS inline)
- Renders well at 600×450px (the sandbox design size)
- Dark background (`#0d1117` or similar), readable contrast
- Syntax highlighting via inline `<style>` (no Prism CDN) or `<pre><code>` blocks
- Shows the skill title and a label like "Example Input / Example Output"
- Must not throw JS errors or require user interaction to render

**Step 4 — Add to skill content.** Append the section:

```markdown
## Playground

<!-- Self-contained demo — no external dependencies -->
<!DOCTYPE html>
<html>
...
</html>
```

**Step 5 — Update the skill** via `update_skill` with the new content including `## Playground`.

**Step 6 — Capture the screenshot.** For `html_sandbox` skills the system route does this
automatically. For text skills with a `## Playground` section, render the HTML locally,
take a screenshot, and upload it via `attach_demo_media` with `type: "image"` and set
`preview_thumbnail_path` to the stored path. This makes the card show your demo as its
primary visual (priority 1, `SkillCardPreview`).

---

### Quality bar for playground HTML

| Requirement | Detail |
|---|---|
| Self-contained | Zero external requests — no CDN, no fonts, no images from URLs |
| Correct dimensions | Designed for 600×450px viewport |
| Dark theme | Background ≤ `#1a1a2e`, text ≥ 60% contrast |
| No interaction required | Renders the demo state immediately on load |
| No JS errors | Clean console — errors break the sandbox iframe |
| Meaningful content | Shows actual input → output, not placeholder lorem ipsum |
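
Before uploading, the self-contained requirement can be spot-checked with a quick sketch like the following. It only flags external `src`/`href` URLs; it will not catch CSS `@import`, `fetch()` calls, or `url()` references, so treat it as a first pass, not a guarantee.

```python
import re

def external_refs(html):
    """Return src/href attributes that point at external http(s) URLs."""
    return re.findall(
        r"""(?:src|href)\s*=\s*["']https?://[^"']+["']""",
        html,
    )
```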

