Sound

Authors and titles for recent submissions

  • Mon, 6 Oct 2025
  • Fri, 3 Oct 2025
  • Thu, 2 Oct 2025
  • Wed, 1 Oct 2025
  • Tue, 30 Sep 2025


Total of 128 entries (this page shows entries 1-50)

Mon, 6 Oct 2025 (showing 16 of 16 entries)

[1] arXiv:2510.02995 [pdf, html, other]
Title: AudioToolAgent: An Agentic Framework for Audio-Language Models
Gijs Wijngaard, Elia Formisano, Michel Dumontier
Subjects: Sound (cs.SD)
[2] arXiv:2510.02916 [pdf, html, other]
Title: SALSA-V: Shortcut-Augmented Long-form Synchronized Audio from Videos
Amir Dellali, Luca A. Lanzendörfer, Florian Grötschla, Roger Wattenhofer
Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[3] arXiv:2510.02915 [pdf, html, other]
Title: WavInWav: Time-domain Speech Hiding via Invertible Neural Network
Wei Fan, Kejiang Chen, Xiangkun Wang, Weiming Zhang, Nenghai Yu
Comments: 13 pages, 5 figures, project page: this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[4] arXiv:2510.02864 [pdf, html, other]
Title: Forensic Similarity for Speech Deepfakes
Viola Negroni, Davide Salvi, Daniele Ugo Leonzio, Paolo Bestagini, Stefano Tubaro
Comments: Submitted to IEEE OJSP
Subjects: Sound (cs.SD)
[5] arXiv:2510.02848 [pdf, other]
Title: Flamed-TTS: Flow Matching Attention-Free Models for Efficient Generating and Dynamic Pacing Zero-shot Text-to-Speech
Hieu-Nghia Huynh-Nguyen, Huynh Nguyen Dang, Ngoc-Son Nguyen, Van Nguyen
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[6] arXiv:2510.02597 [pdf, html, other]
Title: TART: A Comprehensive Tool for Technique-Aware Audio-to-Tab Guitar Transcription
Akshaj Gupta, Andrea Guzman, Anagha Badriprasad, Hwi Joo Park, Upasana Puranik, Robin Netzorg, Jiachen Lian, Gopala Krishna Anumanchipalli
Subjects: Sound (cs.SD)
[7] arXiv:2510.02500 [pdf, html, other]
Title: Latent Multi-view Learning for Robust Environmental Sound Representations
Sivan Sing, Julia Wilkins, Magdalena Fuentes, Juan Pablo Bello
Comments: Accepted to DCASE 2025 Workshop. 4+1 pages, 2 figures, 2 tables
Subjects: Sound (cs.SD)
[8] arXiv:2510.02401 [pdf, html, other]
Title: Linear RNNs for autoregressive generation of long music samples
Konrad Szewczyk, Daniel Gallo Fernández, James Townsend
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[9] arXiv:2510.02382 [pdf, html, other]
Title: Accelerated Convolutive Transfer Function-Based Multichannel NMF Using Iterative Source Steering
Xuemai Xie, Xianrui Wang, Liyuan Zhang, Yichen Yang, Shoji Makino
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[10] arXiv:2510.03117 (cross-list from cs.CV) [pdf, html, other]
Title: Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction
Kaisi Guan, Xihua Wang, Zhengfeng Lai, Xin Cheng, Peng Zhang, XiaoJiang Liu, Ruihua Song, Meng Cao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[11] arXiv:2510.03115 (cross-list from cs.CL) [pdf, html, other]
Title: Listening or Reading? Evaluating Speech Awareness in Chain-of-Thought Speech-to-Text Translation
Jacobo Romero-Díaz, Gerard I. Gállego, Oriol Pareras, Federico Costa, Javier Hernando, Cristina España-Bonet
Subjects: Computation and Language (cs.CL); Sound (cs.SD)
[12] arXiv:2510.03093 (cross-list from cs.CL) [pdf, html, other]
Title: Revisiting Direct Speech-to-Text Translation with Speech LLMs: Better Scaling than CoT Prompting?
Oriol Pareras, Gerard I. Gállego, Federico Costa, Cristina España-Bonet, Javier Hernando
Subjects: Computation and Language (cs.CL); Sound (cs.SD)
[13] arXiv:2510.03025 (cross-list from eess.AS) [pdf, html, other]
Title: CVSM: Contrastive Vocal Similarity Modeling
Christos Garoufis, Athanasia Zlatintsi, Petros Maragos
Comments: 13 pages, 3 tables, 8 figures. Submitted article at IEEE Trans. on Audio, Speech and Language Proc. (pre-print version)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[14] arXiv:2510.02672 (cross-list from eess.AS) [pdf, html, other]
Title: STSM-FiLM: A FiLM-Conditioned Neural Architecture for Time-Scale Modification of Speech
Dyah A. M. G. Wisnu, Ryandhimas E. Zezario, Stefano Rini, Fo-Rui Li, Yan-Tsung Peng, Hsin-Min Wang, Yu Tsao
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[15] arXiv:2510.02398 (cross-list from eess.AS) [pdf, html, other]
Title: When Voice Matters: Evidence of Gender Disparity in Positional Bias of SpeechLLMs
Shree Harsha Bokkahalli Satish, Gustav Eje Henter, Éva Székely
Comments: 16 pages, 5 figures, To Appear in SPECOM 2025
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[16] arXiv:2510.02320 (cross-list from eess.AS) [pdf, html, other]
Title: WEE-Therapy: A Mixture of Weak Encoders Framework for Psychological Counseling Dialogue Analysis
Yongqi Kang, Yong Zhao
Comments: 5 pages
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)

Fri, 3 Oct 2025 (showing 19 of 19 entries)

[17] arXiv:2510.02187 [pdf, html, other]
Title: High-Fidelity Speech Enhancement via Discrete Audio Tokens
Luca A. Lanzendörfer, Frédéric Berdoz, Antonis Asonitis, Roger Wattenhofer
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[18] arXiv:2510.02171 [pdf, html, other]
Title: Go witheFlow: Real-time Emotion Driven Audio Effects Modulation
Edmund Dervakos, Spyridon Kantarelis, Vassilis Lyberatos, Jason Liartis, Giorgos Stamou
Comments: Accepted at NeurIPS Creative AI Track 2025: Humanity
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[19] arXiv:2510.02110 [pdf, other]
Title: SoundReactor: Frame-level Online Video-to-Audio Generation
Koichi Saito, Julian Tanke, Christian Simon, Masato Ishii, Kazuki Shimada, Zachary Novack, Zhi Zhong, Akio Hayakawa, Takashi Shibuya, Yuki Mitsufuji
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[20] arXiv:2510.01968 [pdf, html, other]
Title: Multi-bit Audio Watermarking
Luca A. Lanzendörfer, Kyle Fearne, Florian Grötschla, Roger Wattenhofer
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[21] arXiv:2510.01963 [pdf, html, other]
Title: Bias beyond Borders: Global Inequalities in AI-Generated Music
Ahmet Solak, Florian Grötschla, Luca A. Lanzendörfer, Roger Wattenhofer
Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[22] arXiv:2510.01958 [pdf, other]
Title: Exploring Resolution-Wise Shared Attention in Hybrid Mamba-U-Nets for Improved Cross-Corpus Speech Enhancement
Nikolai Lund Kühne, Jesper Jensen, Jan Østergaard, Zheng-Hua Tan
Comments: Submitted to IEEE for possible publication
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[23] arXiv:2510.01903 [pdf, html, other]
Title: MelCap: A Unified Single-Codebook Neural Codec for High-Fidelity Audio Compression
Jingyi Li, Zhiyuan Zhao, Yunfei Liu, Lijian Lin, Ye Zhu, Jiahao Wu, Qiuqiang Kong, Yu Li
Comments: 9 pages, 4 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[24] arXiv:2510.01891 [pdf, html, other]
Title: HRTFformer: A Spatially-Aware Transformer for Personalized HRTF Upsampling in Immersive Audio Rendering
Xuyi Hu, Jian Li, Shaojie Zhang, Stefan Goetz, Lorenzo Picinali, Ozgur B. Akan, Aidan O. T. Hogg
Comments: 10 pages and 5 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[25] arXiv:2510.01812 [pdf, html, other]
Title: SingMOS-Pro: An Comprehensive Benchmark for Singing Quality Assessment
Yuxun Tang, Lan Liu, Wenhao Feng, Yiwen Zhao, Jionghao Han, Yifeng Yu, Jiatong Shi, Qin Jin
Comments: 4 pages, 5 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[26] arXiv:2510.01722 [pdf, html, other]
Title: Emotional Text-To-Speech Based on Mutual-Information-Guided Emotion-Timbre Disentanglement
Jianing Yang, Sheng Li, Takahiro Shinozaki, Yuki Saito, Hiroshi Saruwatari
Comments: In Proceedings of the 17th Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2025)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[27] arXiv:2510.01462 [pdf, html, other]
Title: RealClass: A Framework for Classroom Speech Simulation with Public Datasets and Game Engines
Ahmed Adel Attia, Jing Liu, Carol Espy Wilson
Comments: arXiv admin note: substantial text overlap with arXiv:2506.09206
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[28] arXiv:2510.02181 (cross-list from cs.HC) [pdf, html, other]
Title: EvolveCaptions: Empowering DHH Users Through Real-Time Collaborative Captioning
Liang-Yuan Wu, Dhruv Jain
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[29] arXiv:2510.02158 (cross-list from cs.CR) [pdf, html, other]
Title: Mirage Fools the Ear, Mute Hides the Truth: Precise Targeted Adversarial Attacks on Polyphonic Sound Event Detection Systems
Junjie Su, Weifei Jin, Yuxin Cao, Derui Wang, Kai Ye, Jie Hao
Subjects: Cryptography and Security (cs.CR); Sound (cs.SD)
[30] arXiv:2510.02066 (cross-list from cs.CL) [pdf, html, other]
Title: Chain-of-Thought Reasoning in Streaming Full-Duplex End-to-End Spoken Dialogue Systems
Siddhant Arora, Jinchuan Tian, Hayato Futami, Jiatong Shi, Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[31] arXiv:2510.02044 (cross-list from cs.CL) [pdf, html, other]
Title: Stream RAG: Instant and Accurate Spoken Dialogue Systems with Streaming Tool Usage
Siddhant Arora, Haidar Khan, Kai Sun, Xin Luna Dong, Sajal Choudhary, Seungwhan Moon, Xinyuan Zhang, Adithya Sagar, Surya Teja Appini, Kaushik Patnaik, Sanat Sharma, Shinji Watanabe, Anuj Kumar, Ahmed Aly, Yue Liu, Florian Metze, Zhaojiang Lin
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[32] arXiv:2510.01860 (cross-list from eess.AS) [pdf, html, other]
Title: SLAP: Learning Speaker and Health-Related Representations from Natural Language Supervision
Angelika Ando, Auguste Crabeil, Adrien Lesage, Rachid Riad
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[33] arXiv:2510.01698 (cross-list from cs.IR) [pdf, html, other]
Title: TalkPlay-Tools: Conversational Music Recommendation with LLM Tool Calling
Seungheon Doh, Keunwoo Choi, Juhan Nam
Comments: Accepted for publication at The Workshop on AI for Music, Neural Information Processing Systems (NeurIPS-AI4Music)
Subjects: Information Retrieval (cs.IR); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[34] arXiv:2510.01284 (cross-list from cs.MM) [pdf, html, other]
Title: Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation
Chetwin Low, Weimin Wang, Calder Katyal
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[35] arXiv:2510.01254 (cross-list from cs.CL) [pdf, html, other]
Title: Do Bias Benchmarks Generalise? Evidence from Voice-based Evaluation of Gender Bias in SpeechLLMs
Shree Harsha Bokkahalli Satish, Gustav Eje Henter, Éva Székely
Comments: 5 pages, 2 figures, submitted to IEEE ICASSP 2026
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Thu, 2 Oct 2025 (showing first 15 of 29 entries)

[36] arXiv:2510.01109 [pdf, html, other]
Title: NLDSI-BWE: Non Linear Dynamical Systems-Inspired Multi Resolution Discriminators for Speech Bandwidth Extension
Tarikul Islam Tamiti, Anomadarshi Barua
Subjects: Sound (cs.SD)
[37] arXiv:2510.01082 [pdf, html, other]
Title: HVAC-EAR: Eavesdropping Human Speech Using HVAC Systems
Tarikul Islam Tamiti, Biraj Joshi, Rida Hasan, Anomadarshi Barua
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR)
[38] arXiv:2510.00981 [pdf, html, other]
Title: FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates
Jiaqi Li, Yao Qian, Yuxuan Hu, Leying Zhang, Xiaofei Wang, Heng Lu, Manthan Thakker, Jinyu Li, Sheng Zhao, Zhizheng Wu
Subjects: Sound (cs.SD)
[39] arXiv:2510.00743 [pdf, html, other]
Title: From Scores to Preferences: Redefining MOS Benchmarking for Speech Quality Reward Modeling
Yifei Cao, Changhao Jiang, Jiabao Zhuang, Jiajun Sun, Ming Zhang, Zhiheng Xi, Hui Li, Shihan Dou, Yuran Wang, Yunke Zhang, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[40] arXiv:2510.00657 [pdf, html, other]
Title: XPPG-PCA: Reference-free automatic speech severity evaluation with principal components
Bence Mark Halpern, Thomas B. Tienkamp, Teja Rebernik, Rob J.J.H. van Son, Sebastiaan A.H.J. de Visscher, Max J.H. Witjes, Defne Abur, Tomoki Toda
Comments: 14 pages, 4 figures. Author Accepted Manuscript version of the IEEE Selected Topics in Signal Processing with the same title
Subjects: Sound (cs.SD)
[41] arXiv:2510.00639 [pdf, html, other]
Title: Reference-free automatic speech severity evaluation using acoustic unit language modelling
Bence Mark Halpern, Tomoki Toda
Comments: 5 pages. Proceedings of the 6th ACM International Conference on Multimedia in Asia Workshops
Journal-ref: In Proceedings of the 6th ACM International Conference on Multimedia in Asia Workshops (pp. 1-5) (2024)
Subjects: Sound (cs.SD)
[42] arXiv:2510.00628 [pdf, html, other]
Title: Hearing the Order: Investigating Selection Bias in Large Audio-Language Models
Yu-Xiang Lin, Chen-An Li, Sheng-Lun Wei, Po-Chun Chen, Hsin-Hsi Chen, Hung-yi Lee
Comments: The first two authors contributed equally. Submitted to ICASSP 2026
Subjects: Sound (cs.SD); Computation and Language (cs.CL)
[43] arXiv:2510.00626 [pdf, html, other]
Title: When Silence Matters: The Impact of Irrelevant Audio on Text Reasoning in Large Audio-Language Models
Chen-An Li, Tzu-Han Lin, Hung-yi Lee
Comments: 5 pages; submitted to ICASSP 2026
Subjects: Sound (cs.SD); Computation and Language (cs.CL)
[44] arXiv:2510.00522 [pdf, html, other]
Title: ARIONet: An Advanced Self-supervised Contrastive Representation Network for Birdsong Classification and Future Frame Prediction
Md. Abdur Rahman, Selvarajah Thuseethan, Kheng Cher Yeo, Reem E. Mohamed, Sami Azam
Subjects: Sound (cs.SD)
[45] arXiv:2510.00485 [pdf, html, other]
Title: PodEval: A Multimodal Evaluation Framework for Podcast Audio Generation
Yujia Xiao, Liumeng Xue, Lei He, Xinyi Chen, Aemon Yat Fei Chiu, Wenjie Tian, Shaofei Zhang, Qiuqiang Kong, Xinfa Zhu, Wei Xue, Tan Lee
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[46] arXiv:2510.00395 [pdf, html, other]
Title: SAGE-Music: Low-Latency Symbolic Music Generation via Attribute-Specialized Key-Value Head Sharing
Jiaye Tan, Haonan Luo, Linfeng Song, Shuaiqi Chen, Yishan Lyu, Zian Zhong, Roujia Wang, Daniel Jiang, Haoran Zhang, Jiaming Bai, Haoran Cheng, Q. Vera Liao, Hao-Wen Dong
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[47] arXiv:2510.00356 [pdf, html, other]
Title: Dereverberation Using Binary Residual Masking with Time-Domain Consistency
Daniel G. Williams
Comments: 6 pages, 1 figure
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[48] arXiv:2510.00264 [pdf, html, other]
Title: Low Resource Audio Codec Challenge Baseline Systems
Yusuf Ziya Isik, Rafał Łaganowski
Comments: Low-Resource Audio Codec Challenge 2025
Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[49] arXiv:2510.00052 [pdf, html, other]
Title: A Recall-First CNN for Sleep Apnea Screening from Snoring Audio
Anushka Mallick, Afiya Noorain, Ashwin Menon, Ashita Solanki, Keertan Balaji
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[50] arXiv:2510.00030 [pdf, html, other]
Title: Temporal-Aware Iterative Speech Model for Dementia Detection
Chukwuemeka Ugwu, Oluwafemi Oyeleke
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)