Audio Language Model

Papers

  • State-Space Large Audio Language Models, arXiv, 2411.15685, arxiv, pdf, citation: -1

    Saurabhchand Bhati, Yuan Gong, Leonid Karlinsky, ..., Rogerio Feris, James Glass

  • 🌟 Scaling Speech-Text Pre-training with Synthetic Interleaved Data, arXiv, 2411.17607, arxiv, pdf, citation: -1

    Aohan Zeng, Zhengxiao Du, Mingdao Liu, ..., Yuxiao Dong, Jie Tang

  • 🌟 A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models, arXiv, 2411.08742, arxiv, pdf, citation: -1

    Dingdong Wang, Mingyu Cui, Dongchao Yang, ..., Xueyuan Chen, Helen Meng

  • Roadmap towards Superhuman Speech Understanding using Large Language Models, arXiv, 2410.13268, arxiv, pdf, citation: -1

    Fan Bu, Yuhao Zhang, Xidong Wang, ..., Qun Liu, Haizhou Li

Survey

  • Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks, arXiv, 2411.05361, arxiv, pdf, citation: -1

    Chien-yu Huang, Wei-Chih Chen, Shu-wen Yang, ..., Shinji Watanabe, Hung-yi Lee

Evaluation

  • What Do Speech Foundation Models Not Learn About Speech?, arXiv, 2410.12948, arxiv, pdf, citation: -1

    Abdul Waheed, Hanin Atwany, Bhiksha Raj, ..., Rita Singh

  • Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models, arXiv, 2410.23861, arxiv, pdf, citation: -1

    Hao Yang, Lizhen Qu, Ehsan Shareghi, ..., Gholamreza Haffari

  • MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark, arXiv, 2410.19168, arxiv, pdf, citation: -1

    S Sakshi, Utkarsh Tyagi, Sonal Kumar, ..., Sreyan Ghosh, Dinesh Manocha · (sakshi113.github)

Projects

Toolkits

Misc
