Consumer platform · EU, US-Federal, BR · Owner: Jordan Vega
Aurora Content Moderation Pipeline
Multi-stage classifier and LLM judge for moderating user-generated content across hate speech, harassment, and CSAM categories.
System characterization
- Lifecycle stage: monitoring
- Autonomy level: autonomous
- Security exposure: critical
- Indigenous data: No
- Affected populations: End users posting content, End users viewing content, Trust & safety reviewers
- Created: 2025-09-04
Open framework items
- High · Evade ML model · Active adversarial population; daily new evasion patterns observed.
- High · LLM jailbreak · LLM-judge has been jailbroken in red-team exercises; monitoring tightened.
- Moderate · Transparency and explainability · Transparency reports published; appeals data not yet disaggregated.
- Moderate · Recourse and redress · Appeals path exists; SLA review in progress.
- Moderate · Manage · Active risk responses; continuous monitoring resourced.