2025-02-20 | Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework | Yuming Yang et al. | 2502.14864v1 | null
2025-02-20 | Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation | Yue Yang et al. | 2502.14846v1 | null
2025-02-20 | Dynamic Concepts Personalization from Single Videos | Rameen Abdal et al. | 2502.14844v1 | null
2025-02-20 | LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models | Shangqing Tu et al. | 2502.14834v1 | null
2025-02-20 | Improving the Diffusability of Autoencoders | Ivan Skorokhodov et al. | 2502.14831v1 | null
2025-02-20 | Turning on the Light: Polymorphism-Induced Photoluminescence in Cysteine Crystals | Debarshi Banerjee et al. | 2502.14826v1 | null
2025-02-20 | FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis | Fadillah Maani et al. | 2502.14807v1 | null
2025-02-20 | A Survey on Text-Driven 360-Degree Panorama Generation | Hai Wang et al. | 2502.14799v1 | null
2025-02-20 | SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features | Michael Tschannen et al. | 2502.14786v1 | null
2025-02-20 | ReVision: A Dataset and Baseline VLM for Privacy-Preserving Task-Oriented Visual Instruction Rewriting | Abhijit Mishra et al. | 2502.14780v1 | null
2025-02-20 | DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models | Hongji Yang et al. | 2502.14779v1 | null
2025-02-20 | Harnessing PDF Data for Improving Japanese Large Multimodal Models | Jeonghun Baek et al. | 2502.14778v1 | null
2025-02-20 | MedVAE: Efficient Automated Interpretation of Medical Images with Large-Scale Generalizable Autoencoders | Maya Varma et al. | 2502.14753v1 | null
2025-02-20 | AIdeation: Designing a Human-AI Collaborative Ideation System for Concept Designers | Wen-Fan Wang et al. | 2502.14747v1 | null
2025-02-20 | H$α$ Variability of AB Aur b with the Hubble Space Telescope: Probing the Nature of a Protoplanet Candidate with Accretion Light Echoes | Brendan P. Bowler et al. | 2502.14736v1 | null
2025-02-20 | Model-based time super-sampling of turbulent flow field sequences | Qihong Lorena Li-Hu et al. | 2502.14722v1 | null
2025-02-20 | TRUSWorthy: Toward Clinically Applicable Deep Learning for Confident Detection of Prostate Cancer in Micro-Ultrasound | Mohamed Harmanani et al. | 2502.14707v1 | null
2025-02-20 | Constraints on optical and near-infrared variability in the localisation of the long-period radio transient GLEAM-X J1627-52 | J. D. Lyman et al. | 2502.14688v1 | null
2025-02-20 | MAGO-SP: Detection and Correction of Water-Fat Swaps in Magnitude-Only VIBE MRI | Robert Graf et al. | 2502.14659v1 | null
2025-02-20 | NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization | Zheyuan Zhang et al. | 2502.14638v1 | null
2025-02-20 | Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion | Jiangyuan Liu et al. | 2502.14616v1 | null
2025-02-20 | A Millimeter-Wave Photometric Camera for Long-Range Imaging Through Optical Obscurants Using Kinetic Inductance Detectors | Jack Sayers et al. | 2502.14607v1 | null
2025-02-20 | Emergent Goldstone flat bands and spontaneous symmetry breaking with type-B Goldstone modes | Huan-Qiang Zhou et al. | 2502.14605v1 | null
2025-02-20 | Vision Foundation Models in Medical Image Analysis: Advances and Challenges | Pengchen Liang et al. | 2502.14584v1 | null
2025-02-20 | Self-supervised Monocular Depth Estimation Robust to Reflective Surface Leveraged by Triplet Mining | Wonhyeok Choi et al. | 2502.14573v1 | null
2025-02-20 | Multiscale Byte Language Models -- A Hierarchical Architecture for Causal Million-Length Sequence Modeling | Eric Egli et al. | 2502.14553v1 | null
2025-02-20 | Modeling Tidal Streams and Tidal Tails around Galaxies Using Deep Wendelstein Imaging Data | Jan-Niklas Pippert et al. | 2502.14531v1 | null
2025-02-20 | Accelerated X-Ray Fluorescence Computed Tomography via Multi-Pencil-Beam Excitation | Ryder M. Schmidt et al. | 2502.14524v1 | null
2025-02-20 | PLPHP: Per-Layer Per-Head Vision Token Pruning for Efficient Large Vision-Language Models | Yu Meng et al. | 2502.14504v1 | null
2025-02-20 | LXLv2: Enhanced LiDAR Excluded Lean 3D Object Detection with Fusion of 4D Radar and Camera | Weiyi Xiong et al. | 2502.14503v1 | null