| 2025-04-24 | Quadratic Interest Network for Multimodal Click-Through Rate Prediction | Honghao Li et al. | 2504.17699v1 | null |
| 2025-04-24 | Data-Driven Calibration of Prediction Sets in Large Vision-Language Models Based on Inductive Conformal Prediction | Yuanchang Ye et al. | 2504.17671v1 | null |
| 2025-04-24 | RGB-D Tracking via Hierarchical Modality Aggregation and Distribution Network | Boyue Xu et al. | 2504.17595v1 | null |
| 2025-04-24 | A compact laser-plasma source for high-repetition-rate bi-modal X-ray and electron imaging | Angana Mondal et al. | 2504.17560v1 | null |
| 2025-04-24 | A Comprehensive Survey of Knowledge-Based Vision Question Answering Systems: The Lifecycle of Knowledge in Visual Reasoning Task | Jiaqi Deng et al. | 2504.17547v1 | null |
| 2025-04-24 | An introduction to R package mvs | Wouter van Loon et al. | 2504.17546v1 | null |
| 2025-04-24 | Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs | Tiancheng Gu et al. | 2504.17432v1 | null |
| 2025-04-24 | Evaluating and Mitigating Bias in AI-Based Medical Text Generation | Xiuying Chen et al. | 2504.17279v1 | null |
| 2025-04-24 | Contrastive Learning for Continuous Touch-Based Authentication | Mengyu Qiao et al. | 2504.17271v1 | null |
| 2025-04-24 | Towards Generalized and Training-Free Text-Guided Semantic Manipulation | Yu Hong et al. | 2504.17269v1 | null |
| 2025-04-24 | Symbolic Representation for Any-to-Any Generative Tasks | Jiaqi Chen et al. | 2504.17261v1 | null |
| 2025-04-24 | Multi-Modal Traffic Analysis: Integrating Time-Series Forecasting, Accident Prediction, and Image Classification | Nivedita M et al. | 2504.17232v1 | null |
| 2025-04-24 | Visual and textual prompts for enhancing emotion recognition in video | Zhifeng Wang et al. | 2504.17224v1 | null |
| 2025-04-24 | Towards Generalizable Deepfake Detection with Spatial-Frequency Collaborative Learning and Hierarchical Cross-Modal Fusion | Mengyu Qiao et al. | 2504.17223v1 | null |
| 2025-04-24 | A Genealogy of Multi-Sensor Foundation Models in Remote Sensing | Kevin Lane et al. | 2504.17177v1 | null |
| 2025-04-24 | Improving Human-Autonomous Vehicle Interaction in Complex Systems | Robert Kaufman et al. | 2504.17170v1 | null |
| 2025-04-24 | PhysioSync: Temporal and Cross-Modal Contrastive Learning Inspired by Physiological Synchronization for EEG-Based Emotion Recognition | Kai Cui et al. | 2504.17163v1 | null |
| 2025-04-23 | Observation of Double Hysteresis in CoFe$_2$O$_4$/MnFe$_2$O$_4$ Core/Shell Nanoparticles and Its Contribution to AC Heat Induction | Jie Wang et al. | 2504.16904v1 | null |
| 2025-04-23 | Exploring zero-shot structure-based protein fitness prediction | Arnav Sharma et al. | 2504.16886v1 | null |
| 2025-04-23 | Decoupled Global-Local Alignment for Improving Compositional Understanding | Xiaoxing Hu et al. | 2504.16801v1 | null |
| 2025-04-23 | 4D Multimodal Co-attention Fusion Network with Latent Contrastive Alignment for Alzheimer's Diagnosis | Yuxiang Wei et al. | 2504.16798v1 | null |
| 2025-04-23 | Towards Explainable AI: Multi-Modal Transformer for Video-based Image Description Generation | Lakshita Agarwal et al. | 2504.16788v1 | null |
| 2025-04-23 | Advanced Chest X-Ray Analysis via Transformer-Based Image Descriptors and Cross-Model Attention Mechanism | Lakshita Agarwal et al. | 2504.16774v1 | null |
| 2025-04-23 | Tri-FusionNet: Enhancing Image Description Generation with Transformer-based Fusion Network and Dual Attention Mechanism | Lakshita Agarwal et al. | 2504.16761v1 | null |
| 2025-04-23 | An Expressive Coalgebraic Modal Logic for Cellular Automata | Henning Basold et al. | 2504.16735v1 | null |
| 2025-04-23 | A Diff-Attention Aware State Space Fusion Model for Remote Sensing Classification | Wenping Ma et al. | 2504.16665v1 | null |
| 2025-04-23 | Online and feasible presentability: from trees to modal algebras | Nikolay Bazhenov et al. | 2504.16663v1 | null |
| 2025-04-23 | WiFi based Human Fall and Activity Recognition using Transformer based Encoder Decoder and Graph Neural Networks | Younggeol Cho et al. | 2504.16655v1 | null |
| 2025-04-23 | MMHCL: Multi-Modal Hypergraph Contrastive Learning for Recommendation | Xu Guo et al. | 2504.16576v1 | null |
| 2025-04-23 | Transformers for Complex Query Answering over Knowledge Hypergraphs | Hong Ting Tsang et al. | 2504.16537v1 | null |