NLP

Published on March 23, 2025
PangolinGuard - ModernBERT Model for AI Guardrails
llms nlp ai-safety fine-tuning modern-bert hugging-face-transformers deep-dive
LLM-based applications face critical security challenges in the form of prompt injections and jailbreaks. This project dives into the key architectural improvements underpinning ModernBERT and demonstrates how to fine-tune it to distinguish malicious prompts from benign ones. Our model closely approximates the performance of Claude 3.7 and Gemini Flash 2.0 on a mixed benchmark (NotInject, BIPIA, Wildguard-Benign, and PINT), while maintaining low latency (<40 ms).
Published on July 22, 2024
Taxonomy Completion with Embedding Quantization and an LLM-based Pipeline - A Case Study in Computational Linguistics
llms nlp embeddings quantization clustering topic-modeling deep-dive
The ever-growing volume of research publications necessitates efficient methods for structuring such knowledge. This project implements an end-to-end automated solution using Machine Learning (UMAP, HDBSCAN), Embedding Quantization, Prompt Engineering, and an LLM pipeline to classify a dataset of 25,000 arXiv publications under a novel taxonomy of classes.

PangolinGuard - ModernBERT Model for AI Guardrails