The ever-growing volume of research publications necessitates efficient methods for structuring academic knowledge. We present an end-to-end automated solution using embedding quantization and an LLM pipeline. Our case study starts with a dataset of 25,000 arXiv publications from Computational Linguistics, which we organize under a novel taxonomy of classes.
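The overview does not say which embedding quantization scheme the pipeline uses. The sketch below assumes binary quantization, one common choice. Each dimension of a float embedding is reduced to a single bit (1 if positive, else 0), and candidates are ranked by Hamming distance between the packed codes. That cuts storage from 32 bits to 1 bit per dimension compared with float32. The function names and the toy vectors are hypothetical illustrations, not the authors' code.

```python
from typing import List, Sequence


def binary_quantize(embedding: Sequence[float]) -> int:
    """Pack each dimension into one bit of an int: 1 if the value is positive, else 0.

    Hypothetical helper; assumes binary (sign-based) quantization.
    """
    bits = 0
    for i, value in enumerate(embedding):
        if value > 0:
            bits |= 1 << i
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two packed binary codes."""
    return bin(a ^ b).count("1")


def nearest(query: Sequence[float], corpus: List[Sequence[float]], k: int = 2) -> List[int]:
    """Return indices of the k corpus embeddings closest to `query` in Hamming distance.

    Ties are broken by corpus index so the ranking is deterministic.
    """
    q = binary_quantize(query)
    codes = [binary_quantize(e) for e in corpus]
    ranked = sorted(range(len(codes)), key=lambda i: (hamming_distance(q, codes[i]), i))
    return ranked[:k]


if __name__ == "__main__":
    # Toy 4-dimensional "embeddings"; real models produce hundreds of dimensions.
    corpus = [
        [0.9, -0.2, 0.4, -0.7],
        [-0.5, 0.3, -0.1, 0.8],
        [0.7, -0.4, 0.2, -0.1],
    ]
    query = [0.8, -0.3, 0.5, -0.6]
    print(nearest(query, corpus, k=2))  # prints [0, 2]
```

In practice, binary codes are often used only for a fast first-pass retrieval. The top candidates are then rescored with higher-precision embeddings to recover accuracy lost to quantization.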