Multimedia

Authors and titles for recent submissions

Total of 34 entries

Thu, 4 Dec 2025 (showing 3 of 3 entries)

[1] arXiv:2512.03521: Title: Cross-Space Synergy: A Unified Framework for Multimodal Emotion Recognition in Conversation

Xiaosen Lyu, Jiayu Xiong, Yuren Chen, Wanlong Wang, Xiaoqing Dai, Jing Wang

Comments: Accepted to AAAI 2026

Subjects: Multimedia (cs.MM); Machine Learning (cs.LG)
[2] arXiv:2512.03087: Title: When Harmful Content Gets Camouflaged: Unveiling Perception Failure of LVLMs with CamHarmTI

Yanhui Li, Qi Zhou, Zhihong Xu, Huizhong Guo, Wenhai Wang, Dongxia Wang

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[3] arXiv:2512.03566 (cross-list from cs.CV): Title: GAOT: Generating Articulated Objects Through Text-Guided Diffusion Models

Hao Sun, Lei Fan, Donglin Di, Shaohui Liu

Comments: Accepted by ACM MM Asia 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Wed, 3 Dec 2025 (showing 6 of 6 entries)

[4] arXiv:2512.02584: Title: Stepwise Schema-Guided Prompting Framework with Parameter Efficient Instruction Tuning for Multimedia Event Extraction

Xiang Yuan, Xinrong Chen, Haochen Li, Hang Yang, Guanyu Wang, Weiping Li, Tong Mo

Comments: Accepted by 2025 IEEE International Conference on Multimedia and Expo

Subjects: Multimedia (cs.MM)
[5] arXiv:2512.02533: Title: PopSim: Social Network Simulation for Social Media Popularity Prediction

Yijun Liu, Wu Liu, Xiaoyan Gu, Allen He, Weiping Wang, Yongdong Zhang

Subjects: Multimedia (cs.MM)
[6] arXiv:2512.02906 (cross-list from cs.CV): Title: MRD: Multi-resolution Retrieval-Detection Fusion for High-Resolution Image Understanding

Fan Yang, Kaihao Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[7] arXiv:2512.02792 (cross-list from cs.CV): Title: HUD: Hierarchical Uncertainty-Aware Disambiguation Network for Composed Video Retrieval

Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Haokun Wen, Weili Guan

Comments: Accepted by ACM MM 2025

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[8] arXiv:2512.02652 (cross-list from cs.SD): Title: Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training

Hong-Jie You, Jie-Jing Shao, Xiao-Wen Yang, Lin-Han Jia, Lan-Zhe Guo, Yu-Feng Li

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[9] arXiv:2512.02650 (cross-list from cs.CV): Title: Hear What Matters! Text-conditioned Selective Video-to-Audio Generation

Junwon Lee, Juhan Nam, Jiyoung Lee

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Tue, 2 Dec 2025 (showing 9 of 9 entries)

[10] arXiv:2512.01442: Title: PSA-MF: Personality-Sentiment Aligned Multi-Level Fusion for Multimodal Sentiment Analysis

Heng Xie, Kang Zhu, Zhengqi Wen, Jianhua Tao, Xuefei Liu, Ruibo Fu, Changsheng Li

Comments: AAAI 2026 accepted

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[11] arXiv:2512.01267: Title: ZO-ASR: Zeroth-Order Fine-Tuning of Speech Foundation Models without Back-Propagation

Yuezhang Peng, Yuxin Liu, Yao Li, Sheng Wang, Fei Wen, Xie Chen

Comments: 2025 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

Subjects: Multimedia (cs.MM); Sound (cs.SD)
[12] arXiv:2512.00928: Title: Augmenting Intra-Modal Understanding in MLLMs for Robust Multimodal Keyphrase Generation

Jiajun Cao, Qinggang Zhang, Yunbo Tang, Zhishang Xiang, Chang Yang, Jinsong Su

Subjects: Multimedia (cs.MM)
[13] arXiv:2512.00883: Title: Audio-Visual World Models: Towards Multisensory Imagination in Sight and Sound

Jiahua Wang, Shannan Yan, Leqi Zheng, Jialong Wu, Yaoxin Mao

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[14] arXiv:2512.01603 (cross-list from cs.CL): Title: MAC-SLU: Multi-Intent Automotive Cabin Spoken Language Understanding Benchmark

Yuezhang Peng, Chonghao Cai, Ziang Liu, Shuai Fan, Sheng Jiang, Hua Xu, Yuxin Liu, Qiguang Chen, Kele Xu, Yao Li, Sheng Wang, Libo Qin, Xie Chen

Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)
[15] arXiv:2512.00537 (cross-list from cs.HC): Title: Speculating on the Role of Media Architecture in Post-disaster Rebuilding and Recovery: Insights from Architects and Interaction Designers

Berk Goksenin Tan, Oguzhan Ozcan

Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Emerging Technologies (cs.ET); Multimedia (cs.MM)
[16] arXiv:2512.00451 (cross-list from cs.SD): Title: STCTS: Generative Semantic Compression for Ultra-Low Bitrate Speech via Explicit Text-Prosody-Timbre Decomposition

Siyu Wang, Haitao Li, Donglai Zhu

Comments: The complete source code and online speech reconstruction demo are publicly available at this https URL

Subjects: Sound (cs.SD); Multimedia (cs.MM)
[17] arXiv:2512.00120 (cross-list from cs.SD): Title: Art2Music: Generating Music for Art Images with Multi-modal Feeling Alignment

Jiaying Hong, Ting Zhu, Thanet Markchom, Huizhi Liang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[18] arXiv:2512.00115 (cross-list from cs.SD): Title: MoLT: Mixture of Layer-Wise Tokens for Efficient Audio-Visual Learning

Kyeongha Rho, Hyeongkeun Lee, Jae Won Cho, Joon Son Chung

Comments: 10 pages, 5 figures

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Mon, 1 Dec 2025 (showing 12 of 12 entries)

[19] arXiv:2511.22576: Title: A Progressive Evaluation Framework for Multicultural Analysis of Story Visualization

Janak Kapuriya, Ali Hatami, Paul Buitelaar

Subjects: Multimedia (cs.MM)
[20] arXiv:2511.22463: Title: Orthogonal Disentanglement with Projected Feature Alignment for Multimodal Emotion Recognition in Conversation

Xinyi Che, Wenbo Wang, Jian Guan, Qijun Zhao

Comments: 10 pages, 1 figure

Subjects: Multimedia (cs.MM)
[21] arXiv:2511.22447: Title: Angle-Optimized Partial Disentanglement for Multimodal Emotion Recognition in Conversation

Xinyi Che, Wenbo Wang, Yuanbo Hou, Mingjie Xie, Qijun Zhao, Jian Guan

Comments: 10 pages, 7 figures

Subjects: Multimedia (cs.MM)
[22] arXiv:2511.22229: Title: VSpeechLM: A Visual Speech Language Model for Visual Text-to-Speech Task

Yuyue Wang, Xin Cheng, Yihan Wu, Xihua Wang, Jinchuan Tian, Ruihua Song

Comments: MM Asia 2025

Subjects: Multimedia (cs.MM)
[23] arXiv:2511.21780: Title: 3MDiT: Unified Tri-Modal Diffusion Transformer for Text-Driven Synchronized Audio-Video Generation

Yaoru Li, Heyu Si, Federico Landi, Pilar Oplustil Gallegos, Ioannis Koutsoumpas, O. Ricardo Cortez Vazquez, Ruiju Fu, Qi Guo, Xin Jin, Shunyu Liu, Mingli Song

Subjects: Multimedia (cs.MM); Sound (cs.SD)
[24] arXiv:2511.21698: Title: TIP and Polish: Text-Image-Prototype Guided Multi-Modal Generation via Commonality-Discrepancy Modeling and Refinement

Zhiyong Ma, Jiahao Chen, Qingyuan Chuai, Zhengping Li

Comments: Submitted to ICASSP 2026

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[25] arXiv:2511.21694: Title: A Survey of Information Disorder on Video-Sharing Platforms

Meiyu Li, Wei Ai, Naeemul Hassan

Comments: Accepted by 2025 IEEE International Conference on Content-Based Multimedia Indexing

Subjects: Multimedia (cs.MM); Computers and Society (cs.CY)
[26] arXiv:2511.21693: Title: Designing a Multimodal Viewer for Piano Performance Analysis -- a Pedagogy-First Approach

Joonhyung Bae, Hyeyoon Cho, Kirak Kim, Dawon Park, Taegyun Kwon, Yoon-Seok Choi, Hyeon Hur, Shigeru Kai, Yohei Wada, Satoshi Obata, Akira Maezawa, Jaebum Park, Jonghwa Park, Juhan Nam

Subjects: Multimedia (cs.MM)
[27] arXiv:2511.22805 (cross-list from cs.CV): Title: From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images

Yiming Chen, Junlin Han, Tianyi Bai, Shengbang Tong, Filippos Kokkinos, Philip Torr

Comments: Project page with codes/datasets/models: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[28] arXiv:2511.22715 (cross-list from cs.CV): Title: ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering

Alberto Compagnoni, Marco Morini, Sara Sarto, Federico Cocchi, Davide Caffagni, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[29] arXiv:2511.22055 (cross-list from cs.CV): Title: OralGPT-Omni: A Versatile Dental Multimodal Large Language Model

Jing Hao, Yuci Liang, Lizhuo Lin, Yuxuan Fan, Wenkai Zhou, Kaixin Guo, Zanting Ye, Yanpeng Sun, Xinyu Zhang, Yanqi Yang, Qiankun Li, Hao Tang, James Kit-Hon Tsoi, Linlin Shen, Kuo Feng Hung

Comments: 47 pages, 42 figures, 13 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[30] arXiv:2511.22046 (cross-list from cs.NI): Title: AutoRec: Accelerating Loss Recovery for Live Streaming in a Multi-Supplier Market

Tong Li, Xu Yan, Bo Wu, Cheng Luo, Fuyu Wang, Jiuxiang Zhu, Haoyi Fang, Xinle Du, Ke Xu

Subjects: Networking and Internet Architecture (cs.NI); Multimedia (cs.MM); Image and Video Processing (eess.IV)

Thu, 27 Nov 2025 (showing 4 of 4 entries)

[31] arXiv:2511.21244: Title: PixelatedScatter: Arbitrary-level Visual Abstraction for Large-scale Multiclass Scatterplots

Ziheng Guo, Tianxiang Wei, Zeyu Li, Lianghao Zhang, Sisi Li, Jiawan Zhang

Subjects: Multimedia (cs.MM)
[32] arXiv:2511.21146: Title: AV-Edit: Multimodal Generative Sound Effect Editing via Audio-Visual Semantic Joint Control

Xinyue Guo, Xiaoran Yang, Lipan Zhang, Jianxuan Yang, Zhao Wang, Jian Luan

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[33] arXiv:2511.20732: Title: Prompt-Aware Adaptive Elastic Weight Consolidation for Continual Learning in Medical Vision-Language Models

Ziyuan Gao, Philippe Morel

Comments: Accepted by 32nd International Conference on MultiMedia Modeling (MMM 2026)

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[34] arXiv:2511.20961 (cross-list from cs.NI): Title: Performance Evaluation of Low-Latency Live Streaming of MPEG-DASH UHD video over Commercial 5G NSA/SA Network

Kasidis Arunruangsirilert, Bo Wei, Hang Song, Jiro Katto

Comments: 2022 International Conference on Computer Communications and Networks (ICCCN), 25-28 July 2022, Honolulu, HI, USA

Subjects: Networking and Internet Architecture (cs.NI); Multimedia (cs.MM); Image and Video Processing (eess.IV)

Thu, 4 Dec 2025 (showing 3 of 3 entries)

Wed, 3 Dec 2025 (showing 6 of 6 entries)

Tue, 2 Dec 2025 (showing 9 of 9 entries)

Mon, 1 Dec 2025 (showing 12 of 12 entries)

Thu, 27 Nov 2025 (showing 4 of 4 entries)