Sound

Authors and titles for October 2025

Total of 64 entries; showing entries 1-50

[1] arXiv:2510.00006 [pdf, other]: Title: Unpacking Musical Symbolism in Online Communities: Content-Based and Network-Centric Approaches

Kajwan Ziaoddini

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Computers and Society (cs.CY); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[2] arXiv:2510.00030 [pdf, html, other]: Title: Temporal-Aware Iterative Speech Model for Dementia Detection

Chukwuemeka Ugwu, Oluwafemi Oyeleke

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[3] arXiv:2510.00052 [pdf, html, other]: Title: A Recall-First CNN for Sleep Apnea Screening from Snoring Audio

Anushka Mallick, Afiya Noorain, Ashwin Menon, Ashita Solanki, Keertan Balaji

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[4] arXiv:2510.00264 [pdf, html, other]: Title: Low Resource Audio Codec Challenge Baseline Systems

Yusuf Ziya Isik, Rafał Łaganowski

Comments: Low-Resource Audio Codec Challenge 2025

Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[5] arXiv:2510.00356 [pdf, html, other]: Title: Dereverberation Using Binary Residual Masking with Time-Domain Consistency

Daniel G. Williams

Comments: 6 pages, 1 figure

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[6] arXiv:2510.00395 [pdf, html, other]: Title: SAGE-Music: Low-Latency Symbolic Music Generation via Attribute-Specialized Key-Value Head Sharing

Jiaye Tan, Haonan Luo, Linfeng Song, Shuaiqi Chen, Yishan Lyu, Zian Zhong, Roujia Wang, Daniel Jiang, Haoran Zhang, Jiaming Bai, Haoran Cheng, Q. Vera Liao, Hao-Wen Dong

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[7] arXiv:2510.00485 [pdf, html, other]: Title: PodEval: A Multimodal Evaluation Framework for Podcast Audio Generation

Yujia Xiao, Liumeng Xue, Lei He, Xinyi Chen, Aemon Yat Fei Chiu, Wenjie Tian, Shaofei Zhang, Qiuqiang Kong, Xinfa Zhu, Wei Xue, Tan Lee

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[8] arXiv:2510.00522 [pdf, html, other]: Title: ARIONet: An Advanced Self-supervised Contrastive Representation Network for Birdsong Classification and Future Frame Prediction

Md. Abdur Rahman, Selvarajah Thuseethan, Kheng Cher Yeo, Reem E. Mohamed, Sami Azam

Subjects: Sound (cs.SD)
[9] arXiv:2510.00626 [pdf, html, other]: Title: When Silence Matters: The Impact of Irrelevant Audio on Text Reasoning in Large Audio-Language Models

Chen-An Li, Tzu-Han Lin, Hung-yi Lee

Comments: 5 pages; submitted to ICASSP 2026

Subjects: Sound (cs.SD); Computation and Language (cs.CL)
[10] arXiv:2510.00628 [pdf, html, other]: Title: Hearing the Order: Investigating Selection Bias in Large Audio-Language Models

Yu-Xiang Lin, Chen-An Li, Sheng-Lun Wei, Po-Chun Chen, Hsin-Hsi Chen, Hung-yi Lee

Comments: The first two authors contributed equally. Submitted to ICASSP 2026

Subjects: Sound (cs.SD); Computation and Language (cs.CL)
[11] arXiv:2510.00639 [pdf, html, other]: Title: Reference-free automatic speech severity evaluation using acoustic unit language modelling

Bence Mark Halpern, Tomoki Toda

Comments: 5 pages. Proceedings of the 6th ACM International Conference on Multimedia in Asia Workshops

Journal-ref: In Proceedings of the 6th ACM International Conference on Multimedia in Asia Workshops (pp. 1-5) (2024)

Subjects: Sound (cs.SD)
[12] arXiv:2510.00657 [pdf, html, other]: Title: XPPG-PCA: Reference-free automatic speech severity evaluation with principal components

Bence Mark Halpern, Thomas B. Tienkamp, Teja Rebernik, Rob J.J.H. van Son, Sebastiaan A.H.J. de Visscher, Max J.H. Witjes, Defne Abur, Tomoki Toda

Comments: 14 pages, 4 figures. Author Accepted Manuscript version of the IEEE Selected Topics in Signal Processing with the same title

Subjects: Sound (cs.SD)
[13] arXiv:2510.00743 [pdf, html, other]: Title: From Scores to Preferences: Redefining MOS Benchmarking for Speech Quality Reward Modeling

Yifei Cao, Changhao Jiang, Jiabao Zhuang, Jiajun Sun, Ming Zhang, Zhiheng Xi, Hui Li, Shihan Dou, Yuran Wang, Yunke Zhang, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[14] arXiv:2510.00981 [pdf, html, other]: Title: FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates

Jiaqi Li, Yao Qian, Yuxuan Hu, Leying Zhang, Xiaofei Wang, Heng Lu, Manthan Thakker, Jinyu Li, Sheng Zhao, Zhizheng Wu

Subjects: Sound (cs.SD)
[15] arXiv:2510.01082 [pdf, html, other]: Title: HVAC-EAR: Eavesdropping Human Speech Using HVAC Systems

Tarikul Islam Tamiti, Biraj Joshi, Rida Hasan, Anomadarshi Barua

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR)
[16] arXiv:2510.01109 [pdf, html, other]: Title: NLDSI-BWE: Non Linear Dynamical Systems-Inspired Multi Resolution Discriminators for Speech Bandwidth Extension

Tarikul Islam Tamiti, Anomadarshi Barua

Subjects: Sound (cs.SD)
[17] arXiv:2510.01462 [pdf, html, other]: Title: RealClass: A Framework for Classroom Speech Simulation with Public Datasets and Game Engines

Ahmed Adel Attia, Jing Liu, Carol Espy Wilson

Comments: arXiv admin note: substantial text overlap with arXiv:2506.09206

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[18] arXiv:2510.01722 [pdf, html, other]: Title: Emotional Text-To-Speech Based on Mutual-Information-Guided Emotion-Timbre Disentanglement

Jianing Yang, Sheng Li, Takahiro Shinozaki, Yuki Saito, Hiroshi Saruwatari

Comments: In Proceedings of the 17th Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2025)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[19] arXiv:2510.01812 [pdf, html, other]: Title: SingMOS-Pro: An Comprehensive Benchmark for Singing Quality Assessment

Yuxun Tang, Lan Liu, Wenhao Feng, Yiwen Zhao, Jionghao Han, Yifeng Yu, Jiatong Shi, Qin Jin

Comments: 4 pages, 5 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[20] arXiv:2510.01891 [pdf, html, other]: Title: HRTFformer: A Spatially-Aware Transformer for Personalized HRTF Upsampling in Immersive Audio Rendering

Xuyi Hu, Jian Li, Shaojie Zhang, Stefan Goetz, Lorenzo Picinali, Ozgur B. Akan, Aidan O. T. Hogg

Comments: 10 pages and 5 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[21] arXiv:2510.01903 [pdf, html, other]: Title: MelCap: A Unified Single-Codebook Neural Codec for High-Fidelity Audio Compression

Jingyi Li, Zhiyuan Zhao, Yunfei Liu, Lijian Lin, Ye Zhu, Jiahao Wu, Qiuqiang Kong, Yu Li

Comments: 9 pages, 4 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[22] arXiv:2510.01958 [pdf, other]: Title: Exploring Resolution-Wise Shared Attention in Hybrid Mamba-U-Nets for Improved Cross-Corpus Speech Enhancement

Nikolai Lund Kühne, Jesper Jensen, Jan Østergaard, Zheng-Hua Tan

Comments: Submitted to IEEE for possible publication

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[23] arXiv:2510.01963 [pdf, html, other]: Title: Bias beyond Borders: Global Inequalities in AI-Generated Music

Ahmet Solak, Florian Grötschla, Luca A. Lanzendörfer, Roger Wattenhofer

Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[24] arXiv:2510.01968 [pdf, html, other]: Title: Multi-bit Audio Watermarking

Luca A. Lanzendörfer, Kyle Fearne, Florian Grötschla, Roger Wattenhofer

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[25] arXiv:2510.02110 [pdf, other]: Title: SoundReactor: Frame-level Online Video-to-Audio Generation

Koichi Saito, Julian Tanke, Christian Simon, Masato Ishii, Kazuki Shimada, Zachary Novack, Zhi Zhong, Akio Hayakawa, Takashi Shibuya, Yuki Mitsufuji

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[26] arXiv:2510.02171 [pdf, html, other]: Title: Go witheFlow: Real-time Emotion Driven Audio Effects Modulation

Edmund Dervakos, Spyridon Kantarelis, Vassilis Lyberatos, Jason Liartis, Giorgos Stamou

Comments: Accepted at NeurIPS Creative AI Track 2025: Humanity

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[27] arXiv:2510.02187 [pdf, html, other]: Title: High-Fidelity Speech Enhancement via Discrete Audio Tokens

Luca A. Lanzendörfer, Frédéric Berdoz, Antonis Asonitis, Roger Wattenhofer

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[28] arXiv:2510.02382 [pdf, html, other]: Title: Accelerated Convolutive Transfer Function-Based Multichannel NMF Using Iterative Source Steering

Xuemai Xie, Xianrui Wang, Liyuan Zhang, Yichen Yang, Shoji Makino

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[29] arXiv:2510.02401 [pdf, html, other]: Title: Linear RNNs for autoregressive generation of long music samples

Konrad Szewczyk, Daniel Gallo Fernández, James Townsend

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[30] arXiv:2510.02500 [pdf, html, other]: Title: Latent Multi-view Learning for Robust Environmental Sound Representations

Sivan Sing, Julia Wilkins, Magdalena Fuentes, Juan Pablo Bello

Comments: Accepted to DCASE 2025 Workshop. 4+1 pages, 2 figures, 2 tables

Subjects: Sound (cs.SD)
[31] arXiv:2510.02597 [pdf, html, other]: Title: TART: A Comprehensive Tool for Technique-Aware Audio-to-Tab Guitar Transcription

Akshaj Gupta, Andrea Guzman, Anagha Badriprasad, Hwi Joo Park, Upasana Puranik, Robin Netzorg, Jiachen Lian, Gopala Krishna Anumanchipalli

Subjects: Sound (cs.SD)
[32] arXiv:2510.02848 [pdf, other]: Title: Flamed-TTS: Flow Matching Attention-Free Models for Efficient Generating and Dynamic Pacing Zero-shot Text-to-Speech

Hieu-Nghia Huynh-Nguyen, Huynh Nguyen Dang, Ngoc-Son Nguyen, Van Nguyen

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[33] arXiv:2510.02864 [pdf, html, other]: Title: Forensic Similarity for Speech Deepfakes

Viola Negroni, Davide Salvi, Daniele Ugo Leonzio, Paolo Bestagini, Stefano Tubaro

Comments: Submitted @ IEEE OJSP

Subjects: Sound (cs.SD)
[34] arXiv:2510.02915 [pdf, html, other]: Title: WavInWav: Time-domain Speech Hiding via Invertible Neural Network

Wei Fan, Kejiang Chen, Xiangkun Wang, Weiming Zhang, Nenghai Yu

Comments: 13 pages, 5 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[35] arXiv:2510.02916 [pdf, html, other]: Title: SALSA-V: Shortcut-Augmented Long-form Synchronized Audio from Videos

Amir Dellali, Luca A. Lanzendörfer, Florian Grötschla, Roger Wattenhofer

Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[36] arXiv:2510.02995 [pdf, html, other]: Title: AudioToolAgent: An Agentic Framework for Audio-Language Models

Gijs Wijngaard, Elia Formisano, Michel Dumontier

Subjects: Sound (cs.SD)
[37] arXiv:2510.00050 (cross-list from cs.MM) [pdf, html, other]: Title: Object-AVEdit: An Object-level Audio-Visual Editing Model

Youquan Fu, Ruiyang Si, Hongfa Wang, Dongzhan Zhou, Jiacheng Sun, Ping Luo, Di Hu, Hongyuan Zhang, Xuelong Li

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[38] arXiv:2510.00180 (cross-list from eess.AS) [pdf, html, other]: Title: DiffAU: Diffusion-Based Ambisonics Upscaling

Amit Milstein, Nir Shlezinger, Boaz Rafaely

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[39] arXiv:2510.00218 (cross-list from eess.AS) [pdf, html, other]: Title: Descriptor:: Extended-Length Audio Dataset for Synthetic Voice Detection and Speaker Recognition (ELAD-SVDSR)

Rahul Vijaykumar, Ajan Ahmed, John Parker, Dinesh Pendyala, Aidan Collins, Stephanie Schuckers, Masudul H. Imtiaz

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[40] arXiv:2510.00238 (cross-list from eess.AS) [pdf, html, other]: Title: Room Impulse Response Synthesis via Differentiable Feedback Delay Networks for Efficient Spatial Audio Rendering

Armin Gerami, Ramani Duraiswami

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[41] arXiv:2510.00256 (cross-list from eess.AS) [pdf, html, other]: Title: Subjective quality evaluation of personalized own voice reconstruction systems

Mattes Ohlenbusch, Christian Rollwage, Simon Doclo, Jan Rennies

Comments: Submitted to Acta Acustica

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[42] arXiv:2510.00313 (cross-list from eess.AS) [pdf, html, other]: Title: Post-Training Quantization for Audio Diffusion Transformers

Tanmay Khandelwal, Magdalena Fuentes

Comments: 5 pages, 4 figures, accepted at IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2025

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[43] arXiv:2510.00346 (cross-list from eess.AS) [pdf, html, other]: Title: Learning Domain-Robust Bioacoustic Representations for Mosquito Species Classification with Contrastive Learning and Distribution Alignment

Yuanbo Hou, Zhaoyi Liu, Xin Shen, Stephen Roberts

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[44] arXiv:2510.00582 (cross-list from cs.CL) [pdf, html, other]: Title: SAGE-LD: Towards Scalable and Generalizable End-to-End Language Diarization via Simulated Data Augmentation

Sangmin Lee, Woongjib Choi, Jihyun Kim, Hong-Goo Kang

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD)
[45] arXiv:2510.00771 (cross-list from eess.AS) [pdf, html, other]: Title: UniverSR: Unified and Versatile Audio Super-Resolution via Vocoder-Free Flow Matching

Woongjib Choi, Sangmin Lee, Hyungseob Lim, Hong-Goo Kang

Comments: Submitted to ICASSP 2026

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD); Signal Processing (eess.SP)
[46] arXiv:2510.00952 (cross-list from eess.AS) [pdf, html, other]: Title: CL-UZH submission to the NIST SRE 2024 Speaker Recognition Evaluation

Aref Farhadipour, Shiran Liu, Masoumeh Chapariniya, Valeriia Perepelytsia, Srikanth Madikeri, Teodora Vukovic, Volker Dellwo

Comments: CL-UZH submission for the NIST SRE 2024 Evaluation plan

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[47] arXiv:2510.00982 (cross-list from eess.AS) [pdf, html, other]: Title: Spiralformer: Low Latency Encoder for Streaming Speech Recognition with Circular Layer Skipping and Early Exiting

Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe

Comments: Accepted for ASRU 2025

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[48] arXiv:2510.01157 (cross-list from cs.CL) [pdf, html, other]: Title: Backdoor Attacks Against Speech Language Models

Alexandrine Fortier, Thomas Thebaud, Jesús Villalba, Najim Dehak, Patrick Cardinal

Subjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Sound (cs.SD)
[49] arXiv:2510.01176 (cross-list from cs.GR) [pdf, html, other]: Title: Audio Driven Real-Time Facial Animation for Social Telepresence

Jiye Lee, Chenghui Li, Linh Tran, Shih-En Wei, Jason Saragih, Alexander Richard, Hanbyul Joo, Shaojie Bai

Comments: SIGGRAPH Asia 2025

Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)
[50] arXiv:2510.01254 (cross-list from cs.CL) [pdf, html, other]: Title: Do Bias Benchmarks Generalise? Evidence from Voice-based Evaluation of Gender Bias in SpeechLLMs

Shree Harsha Bokkahalli Satish, Gustav Eje Henter, Éva Székely

Comments: 5 pages, 2 Figures, Submitted to IEEE ICASSP 2026

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
