MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
Apr 1, 2024 · Akide Liu, Jing Liu, Zizheng Pan, Yefei He, Gholamreza Haffari, Bohan Zhuang
Type: Journal article
Publication: NeurIPS 2024
Last updated on Apr 1, 2024