AI-safety
- LLM-based applications face security challenges in the form of prompt injections and jailbreaks. This project reviews the key architectural improvements underpinning ModernBERT and implements fine-tuning to identify malicious prompts. PangolinGuard closely approximates the performance of Claude 3.7 on a mixed benchmark while maintaining low latency (< 40 ms).
- As highlighted by the FBI, digital scams cause devastating impacts across society. MINERVA is an AutoGen implementation of seven agents that helps users identify scam attempts, achieving higher accuracy than baseline prompt methods (88.3% vs. 69.5%).