- State-Space Large Audio Language Models, arXiv, 2411.15685, arxiv, pdf, citation: -1
  Saurabhchand Bhati, Yuan Gong, Leonid Karlinsky, ..., Rogerio Feris, James Glass
- 🌟 Scaling Speech-Text Pre-training with Synthetic Interleaved Data, arXiv, 2411.17607, arxiv, pdf, citation: -1
  Aohan Zeng, Zhengxiao Du, Mingdao Liu, ..., Yuxiao Dong, Jie Tang
- 🌟 A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models, arXiv, 2411.08742, arxiv, pdf, citation: -1
  Dingdong Wang, Mingyu Cui, Dongchao Yang, ..., Xueyuan Chen, Helen Meng
- Roadmap towards Superhuman Speech Understanding using Large Language Models, arXiv, 2410.13268, arxiv, pdf, citation: -1
  Fan Bu, Yuhao Zhang, Xidong Wang, ..., Qun Liu, Haizhou Li
- Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks, arXiv, 2411.05361, arxiv, pdf, citation: -1
  Chien-yu Huang, Wei-Chih Chen, Shu-wen Yang, ..., Shinji Watanabe, Hung-yi Lee
- What Do Speech Foundation Models Not Learn About Speech?, arXiv, 2410.12948, arxiv, pdf, citation: -1
  Abdul Waheed, Hanin Atwany, Bhiksha Raj, ..., Rita Singh
- Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models, arXiv, 2410.23861, arxiv, pdf, citation: -1
  Hao Yang, Lizhen Qu, Ehsan Shareghi, ..., Gholamreza Haffari
- MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark, arXiv, 2410.19168, arxiv, pdf, citation: -1
  S Sakshi, Utkarsh Tyagi, Sonal Kumar, ..., Sreyan Ghosh, Dinesh Manocha · (sakshi113.github)