Image Caption

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2025-02-20 | Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework | Yuming Yang et al. | 2502.14864v1 | null |
| 2025-02-20 | Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation | Yue Yang et al. | 2502.14846v1 | null |
| 2025-02-20 | Dynamic Concepts Personalization from Single Videos | Rameen Abdal et al. | 2502.14844v1 | null |
| 2025-02-20 | LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models | Shangqing Tu et al. | 2502.14834v1 | null |
| 2025-02-20 | Improving the Diffusability of Autoencoders | Ivan Skorokhodov et al. | 2502.14831v1 | null |
| 2025-02-20 | Turning on the Light: Polymorphism-Induced Photoluminescence in Cysteine Crystals | Debarshi Banerjee et al. | 2502.14826v1 | null |
| 2025-02-20 | FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis | Fadillah Maani et al. | 2502.14807v1 | null |
| 2025-02-20 | A Survey on Text-Driven 360-Degree Panorama Generation | Hai Wang et al. | 2502.14799v1 | null |
| 2025-02-20 | SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features | Michael Tschannen et al. | 2502.14786v1 | null |
| 2025-02-20 | ReVision: A Dataset and Baseline VLM for Privacy-Preserving Task-Oriented Visual Instruction Rewriting | Abhijit Mishra et al. | 2502.14780v1 | null |
| 2025-02-20 | DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models | Hongji Yang et al. | 2502.14779v1 | null |
| 2025-02-20 | Harnessing PDF Data for Improving Japanese Large Multimodal Models | Jeonghun Baek et al. | 2502.14778v1 | null |
| 2025-02-20 | MedVAE: Efficient Automated Interpretation of Medical Images with Large-Scale Generalizable Autoencoders | Maya Varma et al. | 2502.14753v1 | null |
| 2025-02-20 | AIdeation: Designing a Human-AI Collaborative Ideation System for Concept Designers | Wen-Fan Wang et al. | 2502.14747v1 | null |
| 2025-02-20 | H$α$ Variability of AB Aur b with the Hubble Space Telescope: Probing the Nature of a Protoplanet Candidate with Accretion Light Echoes | Brendan P. Bowler et al. | 2502.14736v1 | null |
| 2025-02-20 | Model-based time super-sampling of turbulent flow field sequences | Qihong Lorena Li-Hu et al. | 2502.14722v1 | null |
| 2025-02-20 | TRUSWorthy: Toward Clinically Applicable Deep Learning for Confident Detection of Prostate Cancer in Micro-Ultrasound | Mohamed Harmanani et al. | 2502.14707v1 | null |
| 2025-02-20 | Constraints on optical and near-infrared variability in the localisation of the long-period radio transient GLEAM-X J1627-52 | J. D. Lyman et al. | 2502.14688v1 | null |
| 2025-02-20 | MAGO-SP: Detection and Correction of Water-Fat Swaps in Magnitude-Only VIBE MRI | Robert Graf et al. | 2502.14659v1 | null |
| 2025-02-20 | NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization | Zheyuan Zhang et al. | 2502.14638v1 | null |
| 2025-02-20 | Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion | Jiangyuan Liu et al. | 2502.14616v1 | null |
| 2025-02-20 | A Millimeter-Wave Photometric Camera for Long-Range Imaging Through Optical Obscurants Using Kinetic Inductance Detectors | Jack Sayers et al. | 2502.14607v1 | null |
| 2025-02-20 | Emergent Goldstone flat bands and spontaneous symmetry breaking with type-B Goldstone modes | Huan-Qiang Zhou et al. | 2502.14605v1 | null |
| 2025-02-20 | Vision Foundation Models in Medical Image Analysis: Advances and Challenges | Pengchen Liang et al. | 2502.14584v1 | null |
| 2025-02-20 | Self-supervised Monocular Depth Estimation Robust to Reflective Surface Leveraged by Triplet Mining | Wonhyeok Choi et al. | 2502.14573v1 | null |
| 2025-02-20 | Multiscale Byte Language Models -- A Hierarchical Architecture for Causal Million-Length Sequence Modeling | Eric Egli et al. | 2502.14553v1 | null |
| 2025-02-20 | Modeling Tidal Streams and Tidal Tails around Galaxies Using Deep Wendelstein Imaging Data | Jan-Niklas Pippert et al. | 2502.14531v1 | null |
| 2025-02-20 | Accelerated X-Ray Fluorescence Computed Tomography via Multi-Pencil-Beam Excitation | Ryder M. Schmidt et al. | 2502.14524v1 | null |
| 2025-02-20 | PLPHP: Per-Layer Per-Head Vision Token Pruning for Efficient Large Vision-Language Models | Yu Meng et al. | 2502.14504v1 | null |
| 2025-02-20 | LXLv2: Enhanced LiDAR Excluded Lean 3D Object Detection with Fusion of 4D Radar and Camera | Weiyi Xiong et al. | 2502.14503v1 | null |