
Security scanner

Scan prompts for injection, leakage, and safety risks before publishing, with streaming results and a handoff to Refine.


The Security scanner checks prompt text for risk categories such as injection, data exfiltration, role confusion, and instruction override. Each scan returns a risk score, findings with severity levels, and recommendations, so you can address problems before you share or publish.
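To make the shape of a scan result concrete, here is a minimal Python sketch. The class names, field names, and severity levels are assumptions made for illustration; the scanner's actual result schema is not described in this article.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List, Optional


class Severity(IntEnum):
    # Hypothetical severity levels; the product may use different names.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class Finding:
    category: str        # e.g. "injection" or "data_exfiltration"
    severity: Severity
    recommendation: str


@dataclass
class ScanResult:
    findings: List[Finding] = field(default_factory=list)

    def highest_severity(self) -> Optional[Severity]:
        """Return the most severe level found, or None for a clean scan."""
        return max((f.severity for f in self.findings), default=None)

    def by_category(self) -> Dict[str, List[Finding]]:
        """Group findings by category so each risk type can be reviewed together."""
        grouped: Dict[str, List[Finding]] = {}
        for finding in self.findings:
            grouped.setdefault(finding.category, []).append(finding)
        return grouped
```

Grouping by category mirrors how the article lists risk types; the risk score itself is left out because its formula is not documented here.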

Routes:

  • Personal: /dashboard/scanner, pending at /dashboard/scanner/pending, results at /dashboard/scanner/{scanId}
  • Organization: /dashboard/organization/{orgId}/security-scanner with the same pending and detail pattern
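The route patterns above can be sketched as a small set of path builders. This assumes that "the same pending and detail pattern" means appending `/pending` or `/{scanId}` to the organization base path; the function names are hypothetical.

```python
from typing import Optional
from urllib.parse import quote


def scanner_base(org_id: Optional[str] = None) -> str:
    """Return the personal scanner path, or the organization path if org_id is given."""
    if org_id is None:
        return "/dashboard/scanner"
    return f"/dashboard/organization/{quote(org_id, safe='')}/security-scanner"


def pending_route(org_id: Optional[str] = None) -> str:
    """Path shown while a scan is starting."""
    return f"{scanner_base(org_id)}/pending"


def detail_route(scan_id: str, org_id: Optional[str] = None) -> str:
    """Path of the results page for one scan."""
    return f"{scanner_base(org_id)}/{quote(scan_id, safe='')}"
```

Percent-encoding the IDs keeps an unexpected character in an ID from changing the path structure.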

A policy can set requireScanBeforePublish, in which case lifecycle approval fails until a passing scan exists.
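The effect of this policy flag can be illustrated with a short Python sketch. Only the flag name requireScanBeforePublish comes from this article; the `Policy` and `Scan` classes, the `passed` field, and the `can_approve` function are hypothetical, and what counts as a "passing" scan is defined by the product, not by this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    # Mirrors the requireScanBeforePublish policy setting.
    require_scan_before_publish: bool = False


@dataclass
class Scan:
    passed: bool  # assumed flag; the real pass criteria are not documented here


def can_approve(policy: Policy, latest_scan: Optional[Scan]) -> bool:
    """Return True if lifecycle approval may proceed under the given policy."""
    if not policy.require_scan_before_publish:
        return True  # no scan gate configured
    # With the gate on, approval needs a scan that exists and passed.
    return latest_scan is not None and latest_scan.passed
```

For example, `can_approve(Policy(require_scan_before_publish=True), None)` is `False`, because no scan exists yet.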

  1. Paste or load prompt text

    Open the scanner and enter up to 4,000 characters, or prefill the text from a library prompt via query parameters.

  2. Choose target model

    Select the model label the scan simulates against (e.g. GPT-4o, Claude). The choice affects how findings are framed.

  3. Start scan

    Submit the prompt to queue a scan session. You are redirected to a **pending** page while the scan starts, then to the scan detail route.

  4. Watch progress

    Results stream in as progress events. Findings appear as they are detected, and the session completes with a summary and a risk badge.

  5. Act on results

    Use **Fix with Refine** to open Refine Agent with the scan context, test in **Playground**, or return to the library. Re-scan after edits.
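The input limit and the streaming behavior in the steps above can be sketched in Python as follows. The 4,000-character limit comes from this article; the event types (`progress`, `finding`, `complete`) and their fields are assumptions made for illustration, since the actual event format is not documented here.

```python
from typing import Any, Dict, Iterable

MAX_PROMPT_CHARS = 4000  # limit stated in the first step


def validate_prompt(text: str) -> str:
    """Reject prompt text longer than the scanner's input limit."""
    if len(text) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"prompt has {len(text)} characters; the limit is {MAX_PROMPT_CHARS}"
        )
    return text


def consume_events(events: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold a stream of scan events into the state a results page would show."""
    state: Dict[str, Any] = {
        "progress": 0,
        "findings": [],
        "summary": None,
        "done": False,
    }
    for event in events:
        kind = event.get("type")  # hypothetical event discriminator
        if kind == "progress":
            state["progress"] = event["percent"]
        elif kind == "finding":
            # Findings are appended as they arrive, before the scan finishes.
            state["findings"].append(event["finding"])
        elif kind == "complete":
            state["summary"] = event.get("summary")
            state["done"] = True
            break  # the session is over; ignore anything after completion
    return state
```

Because the state is updated one event at a time, a page built this way can render partial findings while the scan is still running.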
