2025-02-20 |
Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts |
Sara Ghaboura et al. |
2502.14865v1 |
null |
2025-02-20 |
Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework |
Yuming Yang et al. |
2502.14864v1 |
null |
2025-02-20 |
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation |
Yue Yang et al. |
2502.14846v1 |
null |
2025-02-20 |
Dynamic Concepts Personalization from Single Videos |
Rameen Abdal et al. |
2502.14844v1 |
null |
2025-02-20 |
LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models |
Shangqing Tu et al. |
2502.14834v1 |
null |
2025-02-20 |
Improving the Diffusability of Autoencoders |
Ivan Skorokhodov et al. |
2502.14831v1 |
null |
2025-02-20 |
Turning on the Light: Polymorphism-Induced Photoluminescence in Cysteine Crystals |
Debarshi Banerjee et al. |
2502.14826v1 |
null |
2025-02-20 |
Cross Validation for Correlated Data in Regression and Classification Models, with Applications to Deep Learning |
Oren Yuval et al. |
2502.14808v1 |
null |
2025-02-20 |
FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis |
Fadillah Maani et al. |
2502.14807v1 |
null |
2025-02-20 |
A Survey on Text-Driven 360-Degree Panorama Generation |
Hai Wang et al. |
2502.14799v1 |
null |
2025-02-20 |
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features |
Michael Tschannen et al. |
2502.14786v1 |
null |
2025-02-20 |
ReVision: A Dataset and Baseline VLM for Privacy-Preserving Task-Oriented Visual Instruction Rewriting |
Abhijit Mishra et al. |
2502.14780v1 |
null |
2025-02-20 |
DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models |
Hongji Yang et al. |
2502.14779v1 |
null |
2025-02-20 |
Harnessing PDF Data for Improving Japanese Large Multimodal Models |
Jeonghun Baek et al. |
2502.14778v1 |
null |
2025-02-20 |
Sparse Activations as Conformal Predictors |
Margarida M. Campos et al. |
2502.14773v1 |
null |
2025-02-20 |
MedVAE: Efficient Automated Interpretation of Medical Images with Large-Scale Generalizable Autoencoders |
Maya Varma et al. |
2502.14753v1 |
null |
2025-02-20 |
AIdeation: Designing a Human-AI Collaborative Ideation System for Concept Designers |
Wen-Fan Wang et al. |
2502.14747v1 |
null |
2025-02-20 |
Robust Information Selection for Hypothesis Testing with Misclassification Penalties |
Jayanth Bhargav et al. |
2502.14738v1 |
null |
2025-02-20 |
H$\alpha$ Variability of AB Aur b with the Hubble Space Telescope: Probing the Nature of a Protoplanet Candidate with Accretion Light Echoes |
Brendan P. Bowler et al. |
2502.14736v1 |
null |
2025-02-20 |
Model-based time super-sampling of turbulent flow field sequences |
Qihong Lorena Li-Hu et al. |
2502.14722v1 |
null |
2025-02-20 |
Internal Incoherency Scores for Constraint-based Causal Discovery Algorithms |
Sofia Faltenbacher et al. |
2502.14719v1 |
null |
2025-02-20 |
TRUSWorthy: Toward Clinically Applicable Deep Learning for Confident Detection of Prostate Cancer in Micro-Ultrasound |
Mohamed Harmanani et al. |
2502.14707v1 |
null |
2025-02-20 |
Constraints on optical and near-infrared variability in the localisation of the long-period radio transient GLEAM-X J1627-52 |
J. D. Lyman et al. |
2502.14688v1 |
null |
2025-02-20 |
Beyond the Surface: Uncovering Implicit Locations with LLMs for Personalized Local News |
Gali Katz et al. |
2502.14660v1 |
null |
2025-02-20 |
MAGO-SP: Detection and Correction of Water-Fat Swaps in Magnitude-Only VIBE MRI |
Robert Graf et al. |
2502.14659v1 |
null |
2025-02-20 |
NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization |
Zheyuan Zhang et al. |
2502.14638v1 |
null |
2025-02-20 |
Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion |
Jiangyuan Liu et al. |
2502.14616v1 |
null |
2025-02-20 |
A Millimeter-Wave Photometric Camera for Long-Range Imaging Through Optical Obscurants Using Kinetic Inductance Detectors |
Jack Sayers et al. |
2502.14607v1 |
null |
2025-02-20 |
Emergent Goldstone flat bands and spontaneous symmetry breaking with type-B Goldstone modes |
Huan-Qiang Zhou et al. |
2502.14605v1 |
null |
2025-02-20 |
Noisy Test-Time Adaptation in Vision-Language Models |
Chentao Cao et al. |
2502.14604v1 |
null |