A library for contextual pruning of Large Language Models (LLMs)
In AI research, optimizing Large Language Models (LLMs) remains a significant challenge, crucial for advancing the field’s practical applications and sustainability. Building on foundational work from Professor Song Han’s lab at MIT, this codebase consolidates work done for an MIT 6.5940 project and introduces a novel approach to developing Mini-GPTs via contextual pruning. Our methodology strategically prunes the architecture of traditional LLMs, such as Phi-1.5, retaining core functionality while drastically reducing model size. We applied the technique across diverse and complex datasets, including US law, medical Q&A, Skyrim dialogue, English-Taiwanese translation, and economics articles. Contextual pruning is a promising method for building domain-specific LLMs, and this research is a building block toward future development with additional compute, refined fine-tuning, and quantization.
Authors: Tim Valicenti, Justice Vidal, and Ritik Patnaik
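
The project’s exact pruning criterion and this library’s API are not reproduced here; the sketch below only illustrates the general idea of contextual pruning under stated assumptions. It scores each linear layer’s output neurons by their mean absolute activation on a small domain calibration set and zeroes the weakest fraction. The model ID `microsoft/phi-1_5`, the `calibration_texts`, and `prune_fraction` are illustrative placeholders, not values taken from the project.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder calibration texts standing in for a domain dataset (e.g. US law).
calibration_texts = [
    "The court held that the statute of limitations had expired.",
    "Under federal law, the defendant bears the burden of proof.",
]

model_name = "microsoft/phi-1_5"  # assumed model; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Accumulate mean |activation| per output neuron for every linear layer.
activation_sums = {}

def make_hook(name):
    def hook(module, inputs, output):
        # Average absolute activations over batch and sequence dimensions,
        # leaving one score per output feature.
        stats = output.detach().abs().mean(dim=tuple(range(output.dim() - 1)))
        activation_sums[name] = activation_sums.get(name, 0) + stats
    return hook

handles = [
    module.register_forward_hook(make_hook(name))
    for name, module in model.named_modules()
    if isinstance(module, torch.nn.Linear)
]

with torch.no_grad():
    for text in calibration_texts:
        inputs = tokenizer(text, return_tensors="pt")
        model(**inputs)

for h in handles:
    h.remove()

# Zero the weights of the 20% of output neurons least active on this domain.
prune_fraction = 0.2  # illustrative value
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear) and name in activation_sums:
        scores = activation_sums[name]
        k = int(prune_fraction * scores.numel())
        if k == 0:
            continue
        idx = torch.topk(scores, k, largest=False).indices
        with torch.no_grad():
            module.weight[idx, :] = 0.0
            if module.bias is not None:
                module.bias[idx] = 0.0
```

Zeroing rows is a stand-in for structural removal: to actually shrink the model, each pruned `nn.Linear` would be rebuilt with fewer output features and its downstream consumers resized to match.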