| 2024-11-29 | Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark | Joseph Heyward et al. | 2411.19941v1 | null |
| 2024-11-29 | Dynamic EEG-fMRI mapping: Revealing the relationship between brain connectivity and cognitive state | Guiran Liu et al. | 2411.19922v1 | null |
| 2024-11-29 | Handling irresolvable conflicts in the Semantic Web: an RDF-based conflict-tolerant version of the Deontic Traditional Scheme | Livio Robaldo et al. | 2411.19918v1 | link |
| 2024-11-29 | Nonparametric Estimation for a Log-concave Distribution Function with Interval-censored Data | Chi Wing Chu et al. | 2411.19878v1 | null |
| 2024-11-29 | SpaRC: Sparse Radar-Camera Fusion for 3D Object Detection | Philipp Wolters et al. | 2411.19860v1 | null |
| 2024-11-29 | SDR-GNN: Spectral Domain Reconstruction Graph Neural Network for Incomplete Multimodal Learning in Conversational Emotion Recognition | Fangze Fu et al. | 2411.19822v1 | null |
| 2024-11-29 | CAREL: Instruction-guided reinforcement learning with cross-modal auxiliary objectives | Armin Saghafian et al. | 2411.19787v1 | link |
| 2024-11-29 | MoTe: Learning Motion-Text Diffusion Model for Multiple Generation Tasks | Yiming Wu et al. | 2411.19786v1 | null |
| 2024-11-29 | LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos | Tiantian Geng et al. | 2411.19772v1 | null |
| 2024-11-29 | JetFormer: An Autoregressive Generative Model of Raw Images and Text | Michael Tschannen et al. | 2411.19722v1 | null |
| 2024-11-29 | Multimodal Whole Slide Foundation Model for Pathology | Tong Ding et al. | 2411.19666v1 | link |
| 2024-11-29 | Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings | Qiong Wu et al. | 2411.19628v1 | link |
| 2024-11-29 | Self-Supervised Denoiser Framework | Emilien Valat et al. | 2411.19593v1 | null |
| 2024-11-29 | Enhancing AI microscopy for foodborne bacterial classification via adversarial domain adaptation across optical and biological variability | Siddhartha Bhattacharya et al. | 2411.19514v1 | null |
| 2024-11-29 | Interleaved-Modal Chain-of-Thought | Jun Gao et al. | 2411.19488v1 | null |
| 2024-11-29 | Effective Fine-Tuning of Vision-Language Models for Accurate Galaxy Morphology Analysis | Ruoqi Wang et al. | 2411.19475v1 | null |
| 2024-11-29 | Look Every Frame All at Once: Video-Ma$^2$mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing | Hosu Lee et al. | 2411.19460v1 | null |
| 2024-11-29 | Adaptive Interactive Segmentation for Multimodal Medical Imaging via Selection Engine | Zhi Li et al. | 2411.19447v1 | link |
| 2024-11-28 | CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections | Mohamed Fazli Imam et al. | 2411.19346v1 | link |
| 2024-11-28 | Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs | Anirudh Phukan et al. | 2411.19187v1 | null |
| 2024-11-28 | HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos | Prithviraj Banerjee et al. | 2411.19167v1 | null |
| 2024-11-28 | On Moving Object Segmentation from Monocular Video with Transformers | Christian Homeyer et al. | 2411.19141v1 | null |
| 2024-11-28 | Headache to Overstock? Promoting Long-tail Items through Debiased Product Bundling | Shuo Xu et al. | 2411.19107v1 | null |
| 2024-11-28 | PCDreamer: Point Cloud Completion Through Multi-view Diffusion Priors | Guangshun Wei et al. | 2411.19036v1 | null |
| 2024-11-28 | Perception of Visual Content: Differences Between Humans and Foundation Models | Nardiena A. Pratama et al. | 2411.18968v1 | null |
| 2024-11-28 | Second harmonic generation with 48% conversion efficiency from cavity polygon modes in a monocrystalline lithium niobate microdisk resonator | Chao Sun et al. | 2411.18870v1 | null |
| 2024-11-28 | CrossTracker: Robust Multi-modal 3D Multi-Object Tracking via Cross Correction | Lipeng Gu et al. | 2411.18850v1 | null |
| 2024-11-27 | Stratified Non-Negative Tensor Factorization | Alexander Sietsema et al. | 2411.18805v1 | null |
| 2024-11-27 | MRI Breast tissue segmentation using nnU-Net for biomechanical modeling | Melika Pooyan et al. | 2411.18784v1 | null |
| 2024-11-27 | Decoding Non-Linearity and Complexity: Deep Tabular Learning Approaches for Materials Science | Vahid Attari et al. | 2411.18717v1 | null |