AI-safety

Published on
April 15, 2026
The Zero-Trust Gap in LLMs, How Encoders Can Protect Your AI
llms nlp ai-safety fine-tuning modern-bert
Published on
March 23, 2025
PangolinGuard - ModernBERT Model for AI Guardrails
llms nlp ai-safety fine-tuning modern-bert hugging-face-transformers deep-dive
LLM-based applications face security challenges in the form of prompt injections and jailbreaks. This project reviews the key architectural improvements underpinning ModernBERT and fine-tunes the model to detect malicious prompts. PangolinGuard closely approximates the performance of Claude 3.7 on a mixed benchmark while maintaining low latency (< 40 ms).
Published on
December 19, 2024
MINERVA - A Multi-Agent LLM System for Digital Scam Protection
llms agents function-calling ai-safety autogen rdi-berkeley ai-engineering
As highlighted by the FBI, digital scams cause devastating impacts across society. MINERVA is an AutoGen implementation of seven agents that helps users identify scam attempts, achieving higher accuracy than baseline prompt methods (88.3% vs. 69.5%).
