2025-02-20 |
Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework |
Yuming Yang et.al. |
2502.14864v1 |
null |
2025-02-20 |
Derived invariants of gentle orders |
Wassilij Gnedin et.al. |
2502.14852v1 |
null |
2025-02-20 |
LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models |
Shangqing Tu et.al. |
2502.14834v1 |
null |
2025-02-20 |
Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs |
Danni Liu et.al. |
2502.14830v1 |
null |
2025-02-20 |
Exploring Advanced Techniques for Visual Question Answering: A Comprehensive Comparison |
Aiswarya Baby et.al. |
2502.14827v1 |
null |
2025-02-20 |
eC-Tab2Text: Aspect-Based Text Generation from e-Commerce Product Tables |
Luis Antonio Gutiérrez Guanilo et.al. |
2502.14820v1 |
null |
2025-02-20 |
FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis |
Fadillah Maani et.al. |
2502.14807v1 |
null |
2025-02-20 |
A Survey on Text-Driven 360-Degree Panorama Generation |
Hai Wang et.al. |
2502.14799v1 |
null |
2025-02-20 |
Humanoid-VLA: Towards Universal Humanoid Control with Visual Integration |
Pengxiang Ding et.al. |
2502.14795v1 |
null |
2025-02-20 |
Structurally Disentangled Feature Fields Distillation for 3D Understanding and Editing |
Yoel Levy et.al. |
2502.14789v1 |
null |
2025-02-20 |
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features |
Michael Tschannen et.al. |
2502.14786v1 |
null |
2025-02-20 |
Tracking and Assigning Jobs to a Markov Machine |
Subhankar Banerjee et.al. |
2502.14783v1 |
null |
2025-02-20 |
ReVision: A Dataset and Baseline VLM for Privacy-Preserving Task-Oriented Visual Instruction Rewriting |
Abhijit Mishra et.al. |
2502.14780v1 |
null |
2025-02-20 |
DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models |
Hongji Yang et.al. |
2502.14779v1 |
null |
2025-02-20 |
Multi-Objective Causal Bayesian Optimization |
Shriya Bhatija et.al. |
2502.14755v1 |
null |
2025-02-20 |
AIdeation: Designing a Human-AI Collaborative Ideation System for Concept Designers |
Wen-Fan Wang et.al. |
2502.14747v1 |
null |
2025-02-20 |
YOLOv12: A Breakdown of the Key Architectural Features |
Mujadded Al Rabbani Alif et.al. |
2502.14740v1 |
null |
2025-02-20 |
Robust Information Selection for Hypothesis Testing with Misclassification Penalties |
Jayanth Bhargav et.al. |
2502.14738v1 |
null |
2025-02-20 |
FLIGHT: Facility Location Integrating Generalized, Holistic Theory of Welfare |
Avyukta Manjunatha Vummintala et.al. |
2502.14732v1 |
null |
2025-02-20 |
Beyond Performance Scores: Directed Functional Connectivity as a Brain-Based Biomarker for Motor Skill Learning and Retention |
Anil Kamat et.al. |
2502.14731v1 |
null |
2025-02-20 |
Multi-dataset synergistic in supervised learning to pre-label structural components in point clouds from shell construction scenes |
Lukas Rauch et.al. |
2502.14721v1 |
null |
2025-02-20 |
Instrumented mouthguards in elite sports: Validity and head acceleration event (HAE) incidence in NCAA American Football |
Mario Rotundo et.al. |
2502.14710v1 |
null |
2025-02-20 |
Machine learning assisted tracking of magnetic objects using quantum diamond magnetometry |
Fernando Meneses et.al. |
2502.14683v1 |
null |
2025-02-20 |
AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO |
Alan Dao et.al. |
2502.14669v1 |
null |
2025-02-20 |
InstructAgent: Building User Controllable Recommender via LLM Agent |
Wujiang Xu et.al. |
2502.14662v1 |
null |
2025-02-20 |
Variance Reduction Methods Do Not Need to Compute Full Gradients: Improved Efficiency through Shuffling |
Daniil Medyakov et.al. |
2502.14648v1 |
null |
2025-02-20 |
Length-Controlled Margin-Based Preference Optimization without Reference Model |
Gengxu Li et.al. |
2502.14643v1 |
null |
2025-02-20 |
NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization |
Zheyuan Zhang et.al. |
2502.14638v1 |
null |
2025-02-20 |
Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion |
Jiangyuan Liu et.al. |
2502.14616v1 |
null |
2025-02-20 |
Curiosity Driven Multi-agent Reinforcement Learning for 3D Game Testing |
Raihana Ferdous et.al. |
2502.14606v1 |
null |