The ever-growing volume of research publications calls for efficient methods of structuring that knowledge. This project implements an end-to-end automated pipeline that combines machine learning (UMAP, HDBSCAN), embedding quantization, prompt engineering, and an LLM to classify a dataset of 25,000 arXiv publications under a novel taxonomy of classes.
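To make the embedding quantization step more concrete, here is a minimal sketch of one common scheme, binary quantization, where each embedding dimension is reduced to a single bit. The overview above does not say which scheme the project uses, so treat this as an assumption; the helper names `binary_quantize` and `hamming_distance` and the toy vectors are hypothetical and for illustration only.

```python
from typing import Sequence


def binary_quantize(embedding: Sequence[float]) -> int:
    """Map a float vector to a bit string stored in a Python int (1 bit per dimension)."""
    bits = 0
    for value in embedding:
        # Shift left and append 1 for positive components, 0 otherwise.
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two quantized vectors differ."""
    return bin(a ^ b).count("1")


# Toy 8-dimensional "embeddings" (made-up values); real text embeddings
# typically have hundreds of dimensions.
doc_a = [0.12, -0.40, 0.33, 0.05, -0.27, 0.91, -0.08, 0.44]
doc_b = [0.10, -0.35, 0.29, -0.02, -0.30, 0.85, -0.11, 0.40]
doc_c = [-0.50, 0.22, -0.31, -0.07, 0.64, -0.12, 0.09, -0.48]

codes = [binary_quantize(v) for v in (doc_a, doc_b, doc_c)]

# Similar vectors differ in few bits, dissimilar ones in many.
print(hamming_distance(codes[0], codes[1]))
print(hamming_distance(codes[0], codes[2]))
```

The appeal of a scheme like this is storage and speed: each dimension shrinks from a 32-bit float to one bit, and comparing two documents becomes an XOR plus a bit count, at the cost of some loss in similarity precision.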