AI Security for Apps (formerly Firewall for AI) can detect when an LLM prompt touches on unsafe or unwanted subjects. There are two layers of topic detection:

  • Default unsafe topics — A built-in set of safety categories that detect harmful content such as violent crimes, hate speech, and sexual content.
  • Custom topics — Topics you define to match your organization's specific policies, such as "competitors" or "financial advice".

Default unsafe topics

When AI Security for Apps is enabled, it automatically evaluates prompts against a set of default unsafe topic categories and populates two fields:

  • LLM Unsafe topic detected (cf.llm.prompt.unsafe_topic_detected) — whether the prompt matched any unsafe topic category
  • LLM Unsafe topic categories (cf.llm.prompt.unsafe_topic_categories) — the specific categories that matched

Default unsafe topic categories

Category | Description
S1       | Violent crimes
S2       | Non-violent crimes
S3       | Sex-related crimes
S4       | Child sexual exploitation
S5       | Defamation
S6       | Specialized advice
S7       | Privacy
S8       | Intellectual property
S9       | Indiscriminate weapons
S10      | Hate
S11      | Suicide and self-harm
S12      | Sexual content
S13      | Elections
S14      | Code interpreter abuse
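When processing logs or analytics, it can help to translate the raw category codes back into readable labels. The mapping below is taken directly from the table above; the helper function itself is an illustrative sketch, not part of the product.

```python
# Mapping of the default unsafe topic category codes to their descriptions,
# as listed in the table above. Useful for turning the raw codes found in
# cf.llm.prompt.unsafe_topic_categories into readable log output.
UNSAFE_TOPIC_CATEGORIES = {
    "S1": "Violent crimes",
    "S2": "Non-violent crimes",
    "S3": "Sex-related crimes",
    "S4": "Child sexual exploitation",
    "S5": "Defamation",
    "S6": "Specialized advice",
    "S7": "Privacy",
    "S8": "Intellectual property",
    "S9": "Indiscriminate weapons",
    "S10": "Hate",
    "S11": "Suicide and self-harm",
    "S12": "Sexual content",
    "S13": "Elections",
    "S14": "Code interpreter abuse",
}

def describe(categories):
    """Translate a list of category codes into 'code: label' strings."""
    return [f"{c}: {UNSAFE_TOPIC_CATEGORIES.get(c, 'Unknown')}" for c in categories]
```

For example, `describe(["S1", "S10"])` returns `["S1: Violent crimes", "S10: Hate"]`.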

Example rules — default unsafe topics

Block any prompt with unsafe content

  • When incoming requests match:

    Field                     | Operator | Value
    LLM Unsafe topic detected | equals   | True

    Expression when using the editor:
    (cf.llm.prompt.unsafe_topic_detected)

  • Action: Block

Block only specific unsafe categories

  • When incoming requests match:

    Field                       | Operator | Value
    LLM Unsafe topic categories | is in    | S1: Violent crimes, S10: Hate

    Expression when using the editor:
    (any(cf.llm.prompt.unsafe_topic_categories[*] in {"S1" "S10"}))

  • Action: Block
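The matching semantics of the two example expressions can be sketched in a few lines. This is a simplified model, assuming the two fields arrive as a boolean and a list of category codes; the evaluation logic is illustrative, not Cloudflare's rules engine.

```python
def block_any_unsafe(request):
    """Models: (cf.llm.prompt.unsafe_topic_detected)
    True whenever any default unsafe topic was detected in the prompt."""
    return request.get("unsafe_topic_detected", False)

def block_specific(request, blocked=frozenset({"S1", "S10"})):
    """Models: (any(cf.llm.prompt.unsafe_topic_categories[*] in {"S1" "S10"}))
    True when at least one detected category is in the blocked set."""
    return any(c in blocked for c in request.get("unsafe_topic_categories", []))
```

So a prompt tagged with `["S10"]` would match both rules, while one tagged with `["S12"]` would match only the first.
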


Custom topics

Custom topic detection lets you define your own topics and AI Security for Apps will score each prompt against them. You can then use these scores in custom rules or rate limiting rules to block, challenge, or log matching requests.

This capability uses a zero-shot classification model that evaluates prompts at runtime. No model training is required.
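To make the control flow concrete, the sketch below scores a prompt against a set of custom topics and applies a threshold. The production system uses a zero-shot classification model; here a naive token-overlap score stands in for the model so the shape of the workflow (score each topic, act on a threshold) is visible. The topic names, threshold, and scoring function are all assumptions for illustration.

```python
def score_topics(prompt, topics):
    """Score a prompt against custom topics.

    `topics` maps label -> descriptive topic string. A real deployment uses a
    zero-shot classifier; this token-overlap score is a stand-in so the
    surrounding logic can run without a model.
    """
    prompt_tokens = set(prompt.lower().split())
    scores = {}
    for label, description in topics.items():
        desc_tokens = set(description.lower().split())
        scores[label] = len(prompt_tokens & desc_tokens) / max(len(desc_tokens), 1)
    return scores

# Hypothetical topic definitions: a label plus a descriptive topic string.
topics = {
    "financial_advice": "requests for financial advice or investment guidance",
}

def matched_topics(prompt, topics, threshold=0.2):
    """Return the labels whose score meets the (illustrative) threshold."""
    return [label for label, s in score_topics(prompt, topics).items() if s >= threshold]
```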

How custom topics work

  1. You define a list of up to 20 custom topics via the dashboard or API. Each topic consists of:
    • A label — Used in rule expressions and analytics
    • A topic string — The descriptive text the model uses to classify prompts
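A topic definition, then, pairs a short label with the descriptive text the model classifies against. The sketch below shows one plausible in-code shape for such definitions and a check of the documented 20-topic limit; the dictionary keys and helper are assumptions for illustration, not the exact dashboard or API schema.

```python
# Illustrative custom-topic definitions: each has a label (used in rule
# expressions and analytics) and a topic string (used by the classifier).
custom_topics = [
    {"label": "competitors", "topic": "mentions of competitor companies or their products"},
    {"label": "financial_advice", "topic": "requests for financial or investment advice"},
]

MAX_CUSTOM_TOPICS = 20  # the documented limit on custom topics

def validate(topics):
    """Check a list of topic definitions against the documented constraints."""
    if len(topics) > MAX_CUSTOM_TOPICS:
        raise ValueError(f"at most {MAX_CUSTOM_TOPICS} custom topics are allowed")
    for t in topics:
        if not t.get("label") or not t.get("topic"):
            raise ValueError("each topic needs a label and a topic string")
    return True
```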