Alignment

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2025-02-20 | Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework | Yuming Yang et al. | 2502.14864v1 | null |
| 2025-02-20 | AVD2: Accident Video Diffusion for Accident Video Description | Cheng Li et al. | 2502.14801v1 | null |
| 2025-02-20 | HiddenDetect: Detecting Jailbreak Attacks against Large Vision-Language Models via Monitoring Hidden States | Yilei Jiang et al. | 2502.14744v1 | null |
| 2025-02-20 | WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models | Yifu Chen et al. | 2502.14727v1 | null |
| 2025-02-20 | ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors | Yuguo Yin et al. | 2502.14627v1 | null |
| 2025-02-20 | Dynamic Preference-based Multi-modal Trip Planning of Public Transport and Shared Mobility | Yimeng Zhang et al. | 2502.14528v1 | null |
| 2025-02-20 | Integrating Extra Modality Helps Segmentor Find Camouflaged Objects Well | Chengyu Fang et al. | 2502.14471v1 | null |
| 2025-02-20 | Visual and Auditory Aesthetic Preferences Across Cultures | Harin Lee et al. | 2502.14439v1 | null |
| 2025-02-20 | MedFuncta: Modality-Agnostic Representations Based on Efficient Neural Fields | Paul Friedrich et al. | 2502.14401v1 | link |
| 2025-02-20 | SegAnyPET: Universal Promptable Segmentation from Positron Emission Tomography Images | Yichi Zhang et al. | 2502.14351v1 | null |
| 2025-02-20 | Bolide infrasound signal morphology and yield estimates: A case study of two events detected by a dense acoustic sensor network | Trevor C. Wilson et al. | 2502.14232v1 | null |
| 2025-02-20 | SleepGMUformer: A gated multimodal temporal neural network for sleep staging | Chenjun Zhao et al. | 2502.14227v1 | null |
| 2025-02-20 | Bridging Text and Vision: A Multi-View Text-Vision Registration Approach for Cross-Modal Place Recognition | Tianyi Shang et al. | 2502.14195v1 | link |
| 2025-02-20 | A modal logic translation of the AGM axioms for belief revision | Giacomo Bonanno et al. | 2502.14176v1 | null |
| 2025-02-19 | Additive Enrichment from Coderelictions | Jean-Simon Pacaud Lemay et al. | 2502.14134v1 | null |
| 2025-02-19 | Object-centric Binding in Contrastive Language-Image Pretraining | Rim Assouel et al. | 2502.14113v1 | null |
| 2025-02-19 | Triad: Vision Foundation Model for 3D Magnetic Resonance Imaging | Shansong Wang et al. | 2502.14064v1 | null |
| 2025-02-19 | Latent Distribution Decoupling: A Probabilistic Framework for Uncertainty-Aware Multimodal Emotion Recognition | Jingwang Huang et al. | 2502.13954v1 | link |
| 2025-02-19 | A Chain-of-Thought Subspace Meta-Learning for Few-shot Image Captioning with Large Vision and Language Models | Hao Huang et al. | 2502.13942v1 | null |
| 2025-02-19 | Multi-view Video-Pose Pretraining for Operating Room Surgical Activity Recognition | Idris Hamoud et al. | 2502.13883v1 | null |
| 2025-02-19 | MEX: Memory-efficient Approach to Referring Multi-Object Tracking | Huu-Thien Tran et al. | 2502.13875v1 | null |
| 2025-02-19 | Generative Video Semantic Communication via Multimodal Semantic Fusion with Large Model | Hang Yin et al. | 2502.13838v1 | null |
| 2025-02-19 | Building Age Estimation: A New Multi-Modal Benchmark Dataset and Community Challenge | Nikolaos Dionelis et al. | 2502.13818v1 | null |
| 2025-02-19 | Exploring Embodied Emotional Communication: A Human-oriented Review of Mediated Social Touch | Liwen He et al. | 2502.13816v1 | null |
| 2025-02-19 | From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education | Yi-Fan Zhang et al. | 2502.13789v1 | null |
| 2025-02-19 | GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking | Florian Schneider et al. | 2502.13766v1 | null |
| 2025-02-19 | Cascading CMA-ES Instances for Generating Input-diverse Solution Batches | Maria Laura Santoni et al. | 2502.13730v1 | link |
| 2025-02-19 | Adapting Large Language Models for Time Series Modeling via a Novel Parameter-efficient Adaptation Method | Juyuan Zhang et al. | 2502.13725v1 | null |
| 2025-02-19 | Event-Based Video Frame Interpolation With Cross-Modal Asymmetric Bidirectional Motion Fields | Taewoo Kim et al. | 2502.13716v1 | link |
| 2025-02-19 | TALKPLAY: Multimodal Music Recommendation with Large Language Models | Seungheon Doh et al. | 2502.13713v2 | null |