The ever-growing volume of research publications necessitates efficient methods for structuring academic knowledge. We present an end-to-end automated solution that combines machine learning (UMAP, HDBSCAN), embedding quantization, prompt engineering, and an LLM pipeline to classify a dataset of 25,000 arXiv research publications under a novel taxonomy of classes.
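To make the embedding quantization step concrete, here is a minimal sketch of scalar (int8) quantization in plain Python. The text above does not say which quantization scheme is used, so the per-dimension min/max calibration and the helper names `calibrate` and `quantize_int8` are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def calibrate(embeddings: Sequence[Vector]) -> List[Tuple[float, float]]:
    """Return the (min, max) range of each dimension over a calibration set."""
    return [(min(dim), max(dim)) for dim in zip(*embeddings)]


def quantize_int8(vec: Vector, ranges: Sequence[Tuple[float, float]]) -> List[int]:
    """Map each float to an integer in [-128, 127] using the calibrated ranges.

    Hypothetical helper: an assumed scalar-quantization scheme, not
    necessarily the one used in the pipeline described above.
    """
    quantized = []
    for x, (lo, hi) in zip(vec, ranges):
        if hi == lo:
            # Constant dimension: carries no information, store 0.
            quantized.append(0)
            continue
        clipped = min(max(x, lo), hi)           # clamp values outside the range
        scaled = (clipped - lo) / (hi - lo)     # normalise to [0, 1]
        quantized.append(round(scaled * 255) - 128)
    return quantized


# Toy 2-dimensional "embeddings" standing in for real model output.
toy_embeddings = [[0.0, -1.0], [1.0, 1.0], [0.25, 0.5]]
toy_ranges = calibrate(toy_embeddings)
toy_quantized = [quantize_int8(v, toy_ranges) for v in toy_embeddings]
```

Each value then fits in one byte instead of the four bytes of a float32, at the cost of some precision. That trade-off is usually acceptable when the vectors are only used for similarity search or clustering.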