Prompt Firewall
The Prompt Firewall is a route-attached inspection layer that scans request payloads for sensitive text and metadata before any provider call. It is intentionally explicit and intentionally limited: a deterministic detection layer for well-defined risk patterns, not a semantic prompt-injection defense.
The Prompt Firewall is one layer of defense within the governed execution path. Use it alongside the project policy engine and provider-side guardrails; it is not a substitute for either.
Strengthen-only model
The Prompt Firewall enforces a non-weakenable platform baseline. Every project starts with the full set of platform-baseline rules in block mode. From there, projects on the right plan tier can add detection rules and tighten detection profiles. They cannot remove or relax baseline rules.
This is a deliberate, locked posture. Three properties hold for every project on every supported route:
- The baseline is enforced. Platform-baseline rules always run in `block` mode for every project on every supported route.
- Weakening is silently corrected. Attempts to set a baseline rule to `allow`, or to set its `enabled` flag to `false`, are silently rewritten back to the baseline. Strengthening configuration that conflicts with the baseline is normalized at write time, so the deployed state is always at or above the floor.
- Strengthening is additive. Custom rules and stricter profiles add coverage on top of the baseline. They never replace baseline rules.
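The write-time normalization described above can be pictured as a function that pins every baseline rule to the floor instead of rejecting the request. This is a minimal sketch, not Keel's implementation. The per-rule override shape (`{"action": ..., "enabled": ...}`), the rule ids, and the helper name `normalize_overrides` are all hypothetical.

```python
from copy import deepcopy
from typing import Dict

# Hypothetical baseline rule ids; the real rule list is part of the runtime.
BASELINE_RULE_IDS = frozenset({"example_api_key", "example_pem_private_key"})
FLOOR = {"action": "block", "enabled": True}


def normalize_overrides(requested: Dict[str, dict]) -> Dict[str, dict]:
    """Pin every baseline rule to the floor, silently discarding weaker settings.

    `requested` maps rule id -> {"action": ..., "enabled": ...} (a hypothetical
    shape). The input is not mutated; entries for other rule ids are kept as-is.
    """
    normalized = deepcopy(requested)
    for rule_id in BASELINE_RULE_IDS:
        # No error is raised: a weakening attempt is rewritten, not rejected.
        normalized[rule_id] = dict(FLOOR)
    return normalized
```

An attempt to set a baseline rule to `allow` comes back as `block`. Every other baseline rule also appears in the result, so the stored state never falls below the floor.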
The “Option C” label captures this in one phrase: customers can strengthen, never weaken below the platform baseline. Strengthening configuration is a separate authoring surface, gated by plan tier; the baseline is universal.
Per-surface coverage matrix
The Prompt Firewall runs on most public execution surfaces. Coverage is route-scoped — a route either runs the firewall or it does not — and the field categories inspected vary by surface.
| Surface | Firewall runs | Message content | Tool / function arguments | Multimodal references | Metadata fields | Filename / URL fields |
|---|---|---|---|---|---|---|
| POST /v1/executions | Yes | Yes | Yes | Yes | Yes | Yes |
| POST /v1/execute | Yes | Yes | Yes | Provider-dependent | Provider-dependent | Provider-dependent |
| POST /v1/proxy/openai | Yes | Yes | Yes | Yes | Yes | Yes |
| POST /v1/proxy/anthropic | Yes | Yes | Yes | Limited | Yes | No |
| POST /v1/proxy/google | Yes | Yes | Yes | Yes | Yes | Yes |
| POST /v1/proxy/xai | Yes | Yes | Limited | No | No | No |
| POST /v1/proxy/meta | Yes | Yes | Limited | Limited | No | No |
| POST /v1/permits | No | — | — | — | — | — |
| POST /v1/jobs (submission) | Deferred | — | — | — | — | — |
Reading the matrix:
- Yes: the field category is inspected on this surface against every applicable rule.
- No: the field category is not inspected on this surface today.
- Limited: coverage exists but is narrower than on the broader-coverage routes.
- Provider-dependent: `POST /v1/execute` resolves the provider and model first, then runs the corresponding provider-specific extractor; coverage matches the resolved provider's row.
- Deferred: `POST /v1/jobs` does not run the firewall on submission. Governance and firewall evaluation happen later, in the worker execution path.
- No (permits): `POST /v1/permits` is the canonical decision seam, not a provider-dispatch surface. Permit evaluation runs the policy engine but does not run the prompt firewall.
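One way to read the matrix is as a lookup from route to per-category coverage at request time. The sketch below transcribes the proxy rows from the table. For `POST /v1/execute` it follows the note that coverage matches the resolved provider's row. The category keys and the `coverage` helper are hypothetical names chosen for illustration.

```python
from typing import Dict, Optional, Tuple

# Category order follows the matrix columns after "Firewall runs".
CATEGORIES = ("messages", "tool_args", "multimodal", "metadata", "filename_url")

# Rows transcribed from the per-surface matrix (proxy routes).
PROVIDER_ROWS: Dict[str, Tuple[str, ...]] = {
    "openai": ("Yes", "Yes", "Yes", "Yes", "Yes"),
    "anthropic": ("Yes", "Yes", "Limited", "Yes", "No"),
    "google": ("Yes", "Yes", "Yes", "Yes", "Yes"),
    "xai": ("Yes", "Limited", "No", "No", "No"),
    "meta": ("Yes", "Limited", "Limited", "No", "No"),
}
NOT_INSPECTED = ("No",) * len(CATEGORIES)


def coverage(route: str, provider: Optional[str] = None) -> Dict[str, str]:
    """Return {category: "Yes" | "Limited" | "No"} for a route at request time."""
    if route == "POST /v1/executions":
        row = ("Yes",) * len(CATEGORIES)
    elif route.startswith("POST /v1/proxy/"):
        row = PROVIDER_ROWS.get(route.rsplit("/", 1)[-1], NOT_INSPECTED)
    elif route == "POST /v1/execute":
        # The provider/model is resolved first; coverage then follows that
        # provider's proxy row.
        if provider not in PROVIDER_ROWS:
            raise ValueError(f"unresolved or unknown provider: {provider!r}")
        row = PROVIDER_ROWS[provider]
    else:
        # Permits, job submission (deferred to the worker), and any route not
        # listed in the matrix: treat as not running firewall inspection.
        row = NOT_INSPECTED
    return dict(zip(CATEGORIES, row))
```

Routes absent from the matrix fall through to "not inspected" by design, which matches the guidance to treat unlisted routes as uncovered.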
What gets inspected
Inspection happens on category-level field classes drawn from the request payload. The categories are common to every route that runs the firewall; specific extractor field paths are part of the runtime, not the public contract.
The categories Keel inspects on covered routes:
- System, developer, and user message content — the conversation text the model would otherwise see directly.
- Tool and function arguments — text or JSON sent as tool descriptions, tool-use inputs, function-call arguments, or function-response payloads.
- Supported multimodal references — declared text descriptors and asset metadata attached to images, audio, or other multimodal inputs (captions, alt text, transcripts, asset descriptors).
- Request metadata text — text-bearing values supplied alongside the conversation (for example, free-form metadata fields that ride along with the request).
- Filename and URL references — filename and URL fields that surface in the request payload, with extracted-domain detection on URLs.
- Provider-shaped passthrough fields — text-bearing values supplied to forward provider-native arguments or options.
Coverage of each category by surface is shown in the matrix above. Newly introduced text-bearing payload fields may not be inspected until coverage is extended for that route — see Scope and Limits § Content inspection.
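An extractor with an explicit list of covered fields shows why a newly introduced field can go uninspected: anything outside the list is never scanned. In this minimal sketch, the top-level keys passed as `inspected_keys` and the helper names are hypothetical. Real extractors are route-specific, and their field paths are not part of the public contract. The path format mirrors the `field_path` value shown in the deny permit example later on this page.

```python
from typing import Any, Iterable, Iterator, List, Tuple


def iter_text_fields(value: Any, path: str) -> Iterator[Tuple[str, str]]:
    """Yield (field_path, text) for every string leaf under `value`."""
    if isinstance(value, str):
        yield path, value
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from iter_text_fields(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from iter_text_fields(child, f"{path}[{index}]")
    # Numbers, booleans, and None carry no text and are skipped.


def extract(payload: dict, inspected_keys: Iterable[str]) -> List[Tuple[str, str]]:
    """Collect text only from the top-level keys a route's extractor covers."""
    pairs: List[Tuple[str, str]] = []
    for key in inspected_keys:
        if key in payload:
            pairs.extend(iter_text_fields(payload[key], key))
    return pairs
```

With `inspected_keys=["messages", "metadata"]`, a payload's `messages[0].content` is collected. A `brand_new_field` at the top level is ignored until the extractor is extended to cover it.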
Platform baseline rule categories
The platform baseline is enforced in `block` mode for every project on every supported route. Its detection categories cover well-defined risk patterns and shapes:
| Category | What it covers |
|---|---|
| Credential leakage | Strings shaped like API keys, bearer tokens, custom API-key headers, and PEM-style private key blocks |
| Identifier exposure | Government identifier patterns (e.g. US SSN-shaped sequences) and payment-card-shaped strings |
| Prompt-attack shapes | Common pasteable phrasings used to redirect, override, or exfiltrate system instructions and context secrets |
| Sensitive resource references | Filenames suggesting sensitive content and URL/domain shapes flagged as risky |
These are deterministic pattern detectors, not semantic classifiers. They cover specific shapes the platform commits to blocking; they do not interpret intent. Specific rule identifiers and per-pattern coverage are part of the runtime, not part of the public contract — exact rule lists evolve as we add coverage.
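A deterministic detector amounts to a fixed set of compiled patterns and a count of how many times each one matches. Nothing in it interprets meaning. The rule ids and regular expressions below are illustrative stand-ins for three of the categories in the table. They are not Keel's actual rules, which are not published.

```python
import re
from typing import Dict

# Illustrative detectors only; Keel's actual rule ids and patterns are not public.
DETECTORS = {
    # PEM-style private key header, e.g. "-----BEGIN RSA PRIVATE KEY-----".
    "example_pem_private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    # "Bearer <token>" with a reasonably long token body.
    "example_bearer_token": re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/-]{20,}=*"),
    # US SSN-shaped sequence: 3-2-4 digits separated by hyphens.
    "example_us_ssn_shape": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan(text: str) -> Dict[str, int]:
    """Return {rule_id: occurrence_count} for every detector that matched."""
    counts: Dict[str, int] = {}
    for rule_id, pattern in DETECTORS.items():
        hits = len(pattern.findall(text))
        if hits:
            counts[rule_id] = hits
    return counts
```

The same input always produces the same result. The flip side is that a rephrased attack, or an unusual key format, passes through unless a pattern covers that exact shape.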
Strengthening: custom rules and profiles
Projects on Business plans and above can strengthen their firewall posture beyond the platform baseline. Strengthening is configured under prompt_firewall_strengthening on the project policy overrides and authored through the project policy surface:
`PATCH /v1/projects/{project_id}/policy`
This is a policy-authoring surface authenticated with a project owner’s user token, not a runtime surface authenticated with a project API key.
Custom rules
Custom rules are additional regex-based detectors that run alongside the platform baseline. Each custom rule has:
- `id`: lowercase letters, digits, and underscores only. Must not collide with platform-baseline rule ids.
- `name`: a human-readable label.
- `pattern`: a regular expression compiled at validation time.
- `action`: always `block`. Strengthening can only add blocks; it cannot author allow rules.
Custom rules are validated at write time. An invalid pattern, an id collision with a baseline rule, or any non-block action is rejected with a configuration error rather than silently dropped.
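The write-time checks can be sketched as a validator that raises instead of silently dropping a bad rule. Several details here are assumptions. Python's `re` module stands in for whatever regex dialect Keel compiles. `ConfigurationError` is a hypothetical local error type, not the API's error envelope. The baseline ids are made up.

```python
import re


class ConfigurationError(ValueError):
    """Hypothetical stand-in for the API's configuration error response."""


ID_PATTERN = re.compile(r"[a-z0-9_]+")
# Hypothetical baseline ids used only to demonstrate the collision check.
BASELINE_RULE_IDS = frozenset({"example_api_key", "example_pem_private_key"})


def validate_custom_rule(rule: dict) -> "re.Pattern":
    """Validate one custom rule and return its compiled pattern."""
    rule_id = rule.get("id", "")
    if not isinstance(rule_id, str) or not ID_PATTERN.fullmatch(rule_id):
        raise ConfigurationError(f"invalid id {rule_id!r}: use lowercase letters, digits, underscores")
    if rule_id in BASELINE_RULE_IDS:
        raise ConfigurationError(f"id {rule_id!r} collides with a platform-baseline rule")
    if rule.get("action") != "block":
        raise ConfigurationError("custom rules may only use action 'block'")
    pattern = rule.get("pattern")
    if not isinstance(pattern, str):
        raise ConfigurationError("pattern must be a string")
    try:
        return re.compile(pattern)
    except re.error as exc:
        raise ConfigurationError(f"pattern does not compile: {exc}") from exc
```

Rejecting at write time means a typo in a pattern surfaces immediately, rather than leaving a rule that silently never matches.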
Profiles
`profile` selects a named strictness level for the firewall:
- `baseline`: the default. Platform-baseline rules in `block` mode.
- `strict`: a forward-compatible extension point for additional baseline tightening. It is currently equivalent to `baseline` at runtime.
Configurations that already select `strict` will continue to validate cleanly when additional strict-mode coverage ships under that profile.
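Putting profiles and custom rules together, the effective rule set is the baseline, plus any profile extras, plus custom rules, and a custom rule can never displace a baseline entry. This sketch is built on assumptions. The rule-id-to-pattern map and the `PROFILE_EXTRAS` table are hypothetical. `strict` is modeled as adding nothing, matching its current runtime equivalence to `baseline`.

```python
from typing import Dict, List

# Hypothetical baseline rule set: rule id -> pattern source (illustrative only).
BASELINE_RULES: Dict[str, str] = {"example_us_ssn_shape": r"\b\d{3}-\d{2}-\d{4}\b"}

# "strict" is a forward-compatible extension point; today it adds nothing.
PROFILE_EXTRAS: Dict[str, Dict[str, str]] = {"baseline": {}, "strict": {}}


def effective_rules(profile: str, custom_rules: List[dict]) -> Dict[str, str]:
    """Combine baseline, profile extras, and custom rules into one rule map."""
    if profile not in PROFILE_EXTRAS:
        raise ValueError(f"unknown profile {profile!r}")
    rules = dict(BASELINE_RULES)
    rules.update(PROFILE_EXTRAS[profile])
    for rule in custom_rules:
        # setdefault never overwrites: strengthening adds rules, it cannot
        # replace a baseline rule even if validation were bypassed.
        rules.setdefault(rule["id"], rule["pattern"])
    return rules
```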
Worked example
Strengthen a project with one custom rule and the strict profile:
curl -X PATCH "https://api.keelapi.com/v1/projects/<project_id>/policy" \
-H "Authorization: Bearer <user_token>" \
-H "Content-Type: application/json" \
-d '{
"prompt_firewall_strengthening": {
"profile": "strict",
"custom_rules": [
{
"id": "internal_codename",
"name": "Block internal codename",
"pattern": "(?i)project\\s+sunrise",
"action": "block"
}
]
}
}'
A subsequent execution that contains `project sunrise` in any inspected field is blocked. The deny permit carries the matched rule:
{
"decision": "deny",
"reason_code": "prompt_firewall_blocked",
"policy_id": "prompt_firewall_v1",
"policy_version": "1.0.0",
"deny_details": {
"matched_rule_ids": ["internal_codename"],
"field_path": "messages[0].content",
"pattern_ids": ["internal_codename"],
"occurrence_counts": {"internal_codename": 1}
}
}
Custom rules and platform baseline rules are evaluated in the same pass. A custom-rule match produces the same deny outcome shape and the same deny-permit record as a platform-rule match.
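A single evaluation pass over the extracted fields, with baseline and custom rules in one map, is enough to produce a deny permit shaped like the example above. This is a sketch under stated assumptions. Reporting the first matching field as `field_path` and reusing the matched rule ids as `pattern_ids` are guesses that happen to reproduce the example; the real population logic is not documented here. The function name `evaluate` is hypothetical.

```python
import re
from typing import Dict, Iterable, Optional, Tuple


def evaluate(fields: Iterable[Tuple[str, str]], rules: Dict[str, str]) -> Optional[dict]:
    """Check every (field_path, text) pair against all rules in one pass.

    Returns a deny permit dict, or None when no rule matched.
    """
    counts: Dict[str, int] = {}
    first_path: Optional[str] = None
    for field_path, text in fields:
        for rule_id, pattern in rules.items():
            hits = len(re.findall(pattern, text))
            if hits:
                counts[rule_id] = counts.get(rule_id, 0) + hits
                if first_path is None:
                    first_path = field_path  # assumption: report the first hit
    if not counts:
        return None
    matched = sorted(counts)
    return {
        "decision": "deny",
        "reason_code": "prompt_firewall_blocked",
        "policy_id": "prompt_firewall_v1",
        "policy_version": "1.0.0",
        "deny_details": {
            "matched_rule_ids": matched,
            "field_path": first_path,
            "pattern_ids": matched,
            "occurrence_counts": counts,
        },
    }
```

Because a custom rule is just another entry in `rules`, its match yields exactly the same permit shape as a platform-rule match.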
Decision behavior on a firewall block
When a firewall rule matches, Keel persists a deny permit and stops the request before any provider call is made. The deny permit carries:
- `reason_code = "prompt_firewall_blocked"`
- `policy_id = "prompt_firewall_v1"`
- `policy_version = "1.0.0"`
- a `deny_details` object with `matched_rule_ids`, `field_path`, `pattern_ids`, and per-rule `occurrence_counts`
No provider dispatch occurs. The execution route also fails closed if blocked firewall state somehow reaches the dispatch boundary, so a firewall block is a hard stop regardless of where the block was detected in the route.
When the firewall passes (no configured rule matched the inspected text), Keel does not persist a separate firewall-pass record. Allow outcomes are logged but do not produce a replayable success event. Only blocks appear in Timeline Replay as standalone firewall events.
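The block-then-stop flow, together with the fail-closed check at the dispatch boundary, can be sketched as two guards in sequence. Only a block writes anything to storage. All names here are hypothetical, for illustration only: the `Store` class, the `prompt_firewall_block` event type, and the `FirewallBlocked` exception.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class FirewallBlocked(Exception):
    """Raised when a deny permit exists; carries the permit for the error envelope."""

    def __init__(self, permit: dict) -> None:
        super().__init__(permit.get("reason_code", "prompt_firewall_blocked"))
        self.permit = permit


@dataclass
class Store:
    """Hypothetical persistence for deny permits and Timeline Replay events."""
    permits: List[dict] = field(default_factory=list)
    timeline: List[dict] = field(default_factory=list)


def enforce(deny_permit: Optional[dict], store: Store) -> None:
    if deny_permit is None:
        return  # pass: no firewall-pass record, no replayable success event
    store.permits.append(deny_permit)
    store.timeline.append({"type": "prompt_firewall_block", "permit": deny_permit})
    raise FirewallBlocked(deny_permit)


def dispatch(deny_permit: Optional[dict], call_provider: Callable[[], str]) -> str:
    # Fail closed: re-check at the dispatch boundary in case blocked state
    # arrives here by some other path.
    if deny_permit is not None:
        raise FirewallBlocked(deny_permit)
    return call_provider()


def handle(deny_permit: Optional[dict], store: Store, call_provider: Callable[[], str]) -> str:
    enforce(deny_permit, store)
    return dispatch(deny_permit, call_provider)
```

The second check in `dispatch` looks redundant when `handle` is the only caller. Its value is that it still holds if some future code path reaches dispatch without going through `enforce`.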
prompt_firewall_blocked is part of the firewall denial reason-code system, distinct from the permit reason-code lexicon documented in Errors › Permit reason codes. See Errors for per-surface error envelope shapes and how firewall denials surface across permit, executions, execute, and proxy routes.
Operational boundaries
A few operational details worth knowing when reasoning about firewall behavior in production:
- Idempotency replay. Successful idempotency replays on replay-capable proxy routes can return without a fresh firewall pass, because the replayed response is the prior response. `POST /v1/executions` and `POST /v1/execute` use a different reservation path from `POST /v1/proxy/*`.
- Allow events are not persisted. Only firewall blocks are persisted as standalone events. A passing inspection does not produce a separate evidentiary record beyond the permit and the execution record themselves.
- Async submission is deferred. `POST /v1/jobs` does not run the firewall on submission. The firewall runs in the worker execution path when the job dispatches.
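The idempotency-replay caveat comes down to ordering. The cache lookup happens before the firewall, so a replay hit returns the stored response without a fresh inspection. This sketch of a replay-capable proxy route is hypothetical: the in-memory `ReplayCache` class and its `handle` method stand in for Keel's actual reservation mechanism. It also assumes a blocked request raises before anything is cached, so only successful responses are ever replayed.

```python
from typing import Callable, Dict


class ReplayCache:
    """Hypothetical idempotency store for a replay-capable proxy route."""

    def __init__(self) -> None:
        self._responses: Dict[str, str] = {}

    def handle(self, key: str, run_firewall: Callable[[], None], execute: Callable[[], str]) -> str:
        if key in self._responses:
            # Replay: the prior response is returned as-is, with no fresh firewall pass.
            return self._responses[key]
        run_firewall()  # assumed to raise on a block, so blocked requests are never cached
        response = execute()
        self._responses[key] = response
        return response
```

A second call with the same key returns the first response, even though the firewall callback runs only once.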
What this surface does and does not claim
- The firewall is an inbound, deterministic detection layer for well-defined risk patterns. It is not a semantic prompt-injection defense and does not score open-ended adversarial content.
- The firewall does not OCR images, transcribe audio or video, or analyze pixels. Non-text media is only covered when upstream callers already supply extracted text fields.
- The firewall does not score model outputs or scan provider responses.
- Route coverage is explicit, not universal. Routes outside the per-surface matrix above should be treated as not running firewall inspection.
- The firewall is one defense layer. It does not replace the project policy engine, output filtering on your side, or provider-side safety controls.
For the broader content-inspection scope and how the firewall fits into the full non-claim model, see Scope and Limits § Content inspection.
Plan tier availability
| Capability | Plan |
|---|---|
| Platform-baseline rules at `block` on every supported route | Every plan |
| Strengthening — custom rules and profiles | Business and Enterprise |
Projects below the strengthening gate still receive full platform-baseline protection. The gate controls strengthening (custom rules and profiles), not baseline coverage.
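The entitlement split can be sketched as a capability check. Every plan gets the baseline, and only eligible plans may submit `prompt_firewall_strengthening`. The plan identifiers (including `starter`), the capability names, and the use of `PermissionError` are hypothetical. See Plans & Entitlements for the real tiers and the API's actual rejection behavior.

```python
# Hypothetical plan identifiers; see Plans & Entitlements for the real tiers.
STRENGTHENING_PLANS = frozenset({"business", "enterprise"})


def capabilities(plan: str) -> frozenset:
    """Return the firewall capabilities a plan is entitled to."""
    caps = {"platform_baseline"}  # enforced on every plan
    if plan.lower() in STRENGTHENING_PLANS:
        caps.add("strengthening")  # custom rules and profiles
    return frozenset(caps)


def check_policy_patch(plan: str, body: dict) -> None:
    """Reject a policy PATCH that tries to strengthen on an ineligible plan."""
    if "prompt_firewall_strengthening" in body and "strengthening" not in capabilities(plan):
        raise PermissionError("prompt_firewall_strengthening requires Business or Enterprise")
```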
For the full per-tier feature matrix, see Plans & Entitlements.
Related pages
- Security — the broader security boundary
- Errors — error envelope shape and firewall denial reason codes
- Permits — the deny-permit shape that firewall blocks emit
- Executions — the execution surface that runs the firewall
- Plans & Entitlements — full plan-tier entitlement matrix