Multi-modal

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2025-01-16 | Distilling Multi-modal Large Language Models for Autonomous Driving | Deepti Hegde et al. | 2501.09757v1 | null |
| 2025-01-16 | The Goofus & Gallant Story Corpus for Practical Value Alignment | Md Sultan Al Nahian et al. | 2501.09707v1 | null |
| 2025-01-16 | Cueless EEG imagined speech for subject identification: dataset and benchmarks | Ali Derakhshesh et al. | 2501.09700v1 | link |
| 2025-01-16 | Data mining the functional architecture of the brain's circuitry | Adam S. Charles et al. | 2501.09684v1 | null |
| 2025-01-16 | Authenticated Delegation and Authorized AI Agents | Tobin South et al. | 2501.09674v1 | null |
| 2025-01-16 | Fluholoscopy. Compact and Simple Platform Combining Fluorescence and Holographic Microscopy | David Alonso et al. | 2501.09639v1 | null |
| 2025-01-16 | LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading | Kuan-Ming Liu et al. | 2501.09636v1 | null |
| 2025-01-16 | Metric Learning with Progressive Self-Distillation for Audio-Visual Embedding Learning | Donghuo Zeng et al. | 2501.09608v1 | null |
| 2025-01-16 | Fabrication of Mode-Matched, Low-Loss Optical Resonators by Combination of FIB-Milling and CO$_2$ Laser Ablation | Patrick Maier et al. | 2501.09577v1 | null |
| 2025-01-16 | A Multi-agent System for Hybrid Optimization | Eric S. Fraga et al. | 2501.09563v1 | null |
| 2025-01-16 | Text-driven Adaptation of Foundation Models for Few-shot Surgical Workflow Analysis | Tingxuan Chen et al. | 2501.09555v1 | null |
| 2025-01-16 | Resolution enhancement in quantitative phase microscopy: a review | Vicente Mico et al. | 2501.09548v1 | null |
| 2025-01-16 | AdaFV: Accelerating VLMs with Self-Adaptive Cross-Modality Attention Mixture | Jiayi Han et al. | 2501.09532v1 | null |
| 2025-01-16 | Self-interfering high harmonic beam arrays driven by Hermite-Gaussian beams | David D. Schmidt et al. | 2501.09507v1 | null |
| 2025-01-16 | Multimodal Marvels of Deep Learning in Medical Diagnosis: A Comprehensive Review of COVID-19 Detection | Md Shofiqul Islama et al. | 2501.09506v1 | link |
| 2025-01-16 | Omni-Emotion: Extending Video MLLM with Detailed Face and Audio Modeling for Multimodal Emotion Analysis | Qize Yang et al. | 2501.09502v1 | null |
| 2025-01-16 | VanGogh: A Unified Multimodal Diffusion-based Framework for Video Colorization | Zixun Fang et al. | 2501.09499v1 | null |
| 2025-01-16 | Optimal taxes and subsidies to incentivize modal shift for inner-city freight transport | Krissada Tundulyasaree et al. | 2501.09467v1 | null |
| 2025-01-16 | AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring | Xinyi Wang et al. | 2501.09428v1 | null |
| 2025-01-16 | Joint Transmission and Deblurring: A Semantic Communication Approach Using Events | Pujing Yang et al. | 2501.09396v1 | null |
| 2025-01-16 | PAL: Prompting Analytic Learning with Missing Modality for Multi-Modal Class-Incremental Learning | Xianghu Yue et al. | 2501.09352v1 | null |
| 2025-01-16 | LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport | Kyeongha Rho et al. | 2501.09291v1 | link |
| 2025-01-16 | Graded Courrent PDL | Chun-Yu Lin et al. | 2501.09285v1 | null |
| 2025-01-16 | Text Semantics to Flexible Design: A Residential Layout Generation Method Based on Stable Diffusion Model | Zijin Qiu et al. | 2501.09279v1 | null |
| 2025-01-16 | OpticFusion: Multi-Modal Neural Implicit 3D Reconstruction of Microstructures by Fusing White Light Interferometry and Optical Microscopy | Shuo Chen et al. | 2501.09259v1 | link |
| 2025-01-15 | Unified Few-shot Crack Segmentation and its Precise 3D Automatic Measurement in Concrete Structures | Pengru Deng et al. | 2501.09203v1 | null |
| 2025-01-15 | Cancer-Net PCa-Seg: Benchmarking Deep Learning Models for Prostate Cancer Segmentation Using Synthetic Correlated Diffusion Imaging | Jarett Dewbury et al. | 2501.09185v1 | null |
| 2025-01-15 | Beyond Speaker Identity: Text Guided Target Speech Extraction | Mingyue Huo et al. | 2501.09169v1 | null |
| 2025-01-15 | A Non-autoregressive Model for Joint STT and TTS | Vishal Sunder et al. | 2501.09104v1 | null |
| 2025-01-15 | Vision Foundation Models for Computed Tomography | Suraj Pai et al. | 2501.09001v1 | null |