Consumer platform · EU, US-Federal, BR · Owner: Jordan Vega

Aurora Content Moderation Pipeline

Multi-stage classifier and LLM judge for moderating user-generated content across hate speech, harassment, and CSAM categories.

System characterization

Lifecycle stage: monitoring
Autonomy level: autonomous
Security exposure: critical
Indigenous data: No
Affected populations: End users posting content, End users viewing content, Trust & safety reviewers
Created: 2025-09-04

Open framework items

  • Evade ML model · High
    Active adversarial population; daily new evasion patterns observed.
  • LLM jailbreak · High
    LLM-judge has been jailbroken in red-team exercises; monitoring tightened.
  • Transparency and explainability · Moderate
    Transparency reports published; appeals data not yet disaggregated.
  • Recourse and redress · Moderate
    Appeals path exists; SLA review in progress.
  • Manage · Moderate
    Active risk responses; continuous monitoring resourced.
