| 2025-01-16 | Distilling Multi-modal Large Language Models for Autonomous Driving | Deepti Hegde et.al. | 2501.09757v1 | null |
| 2025-01-16 | The Goofus & Gallant Story Corpus for Practical Value Alignment | Md Sultan Al Nahian et.al. | 2501.09707v1 | null |
| 2025-01-16 | Cueless EEG imagined speech for subject identification: dataset and benchmarks | Ali Derakhshesh et.al. | 2501.09700v1 | link |
| 2025-01-16 | Data mining the functional architecture of the brain's circuitry | Adam S. Charles et.al. | 2501.09684v1 | null |
| 2025-01-16 | Authenticated Delegation and Authorized AI Agents | Tobin South et.al. | 2501.09674v1 | null |
| 2025-01-16 | Fluholoscopy. Compact and Simple Platform Combining Fluorescence and Holographic Microscopy | David Alonso et.al. | 2501.09639v1 | null |
| 2025-01-16 | LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading | Kuan-Ming Liu et.al. | 2501.09636v1 | null |
| 2025-01-16 | Metric Learning with Progressive Self-Distillation for Audio-Visual Embedding Learning | Donghuo Zeng et.al. | 2501.09608v1 | null |
| 2025-01-16 | Fabrication of Mode-Matched, Low-Loss Optical Resonators by Combination of FIB-Milling and CO$_2$ Laser Ablation | Patrick Maier et.al. | 2501.09577v1 | null |
| 2025-01-16 | A Multi-agent System for Hybrid Optimization | Eric S. Fraga et.al. | 2501.09563v1 | null |
| 2025-01-16 | Text-driven Adaptation of Foundation Models for Few-shot Surgical Workflow Analysis | Tingxuan Chen et.al. | 2501.09555v1 | null |
| 2025-01-16 | Resolution enhancement in quantitative phase microscopy: a review | Vicente Mico et.al. | 2501.09548v1 | null |
| 2025-01-16 | AdaFV: Accelerating VLMs with Self-Adaptive Cross-Modality Attention Mixture | Jiayi Han et.al. | 2501.09532v1 | null |
| 2025-01-16 | Self-interfering high harmonic beam arrays driven by Hermite-Gaussian beams | David D. Schmidt et.al. | 2501.09507v1 | null |
| 2025-01-16 | Multimodal Marvels of Deep Learning in Medical Diagnosis: A Comprehensive Review of COVID-19 Detection | Md Shofiqul Islama et.al. | 2501.09506v1 | link |
| 2025-01-16 | Omni-Emotion: Extending Video MLLM with Detailed Face and Audio Modeling for Multimodal Emotion Analysis | Qize Yang et.al. | 2501.09502v1 | null |
| 2025-01-16 | VanGogh: A Unified Multimodal Diffusion-based Framework for Video Colorization | Zixun Fang et.al. | 2501.09499v1 | null |
| 2025-01-16 | Optimal taxes and subsidies to incentivize modal shift for inner-city freight transport | Krissada Tundulyasaree et.al. | 2501.09467v1 | null |
| 2025-01-16 | AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring | Xinyi Wang et.al. | 2501.09428v1 | null |
| 2025-01-16 | Joint Transmission and Deblurring: A Semantic Communication Approach Using Events | Pujing Yang et.al. | 2501.09396v1 | null |
| 2025-01-16 | PAL: Prompting Analytic Learning with Missing Modality for Multi-Modal Class-Incremental Learning | Xianghu Yue et.al. | 2501.09352v1 | null |
| 2025-01-16 | LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport | Kyeongha Rho et.al. | 2501.09291v1 | link |
| 2025-01-16 | Graded Courrent PDL | Chun-Yu Lin et.al. | 2501.09285v1 | null |
| 2025-01-16 | Text Semantics to Flexible Design: A Residential Layout Generation Method Based on Stable Diffusion Model | Zijin Qiu et.al. | 2501.09279v1 | null |
| 2025-01-16 | OpticFusion: Multi-Modal Neural Implicit 3D Reconstruction of Microstructures by Fusing White Light Interferometry and Optical Microscopy | Shuo Chen et.al. | 2501.09259v1 | link |
| 2025-01-15 | Unified Few-shot Crack Segmentation and its Precise 3D Automatic Measurement in Concrete Structures | Pengru Deng et.al. | 2501.09203v1 | null |
| 2025-01-15 | Cancer-Net PCa-Seg: Benchmarking Deep Learning Models for Prostate Cancer Segmentation Using Synthetic Correlated Diffusion Imaging | Jarett Dewbury et.al. | 2501.09185v1 | null |
| 2025-01-15 | Beyond Speaker Identity: Text Guided Target Speech Extraction | Mingyue Huo et.al. | 2501.09169v1 | null |
| 2025-01-15 | A Non-autoregressive Model for Joint STT and TTS | Vishal Sunder et.al. | 2501.09104v1 | null |
| 2025-01-15 | Vision Foundation Models for Computed Tomography | Suraj Pai et.al. | 2501.09001v1 | null |