Vision Transformer

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2025-01-16 | Distilling Multi-modal Large Language Models for Autonomous Driving | Deepti Hegde et al. | 2501.09757v1 | null |
| 2025-01-16 | SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces | Sumit Chaturvedi et al. | 2501.09756v1 | null |
| 2025-01-16 | Learnings from Scaling Visual Tokenizers for Reconstruction and Generation | Philippe Hansen-Estruch et al. | 2501.09755v1 | null |
| 2025-01-16 | Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues | Youngjoon Jang et al. | 2501.09754v1 | null |
| 2025-01-16 | SRE-Conv: Symmetric Rotation Equivariant Convolution for Biomedical Image Classification | Yuexi Du et al. | 2501.09753v1 | link |
| 2025-01-16 | FAST: Efficient Action Tokenization for Vision-Language-Action Models | Karl Pertsch et al. | 2501.09747v1 | null |
| 2025-01-16 | Improvement of Data Analytics Techniques in Reflection High Energy Electron Diffraction to Enable Machine Learning | Patrick T. Gemperline et al. | 2501.09743v1 | link |
| 2025-01-16 | ComplexVAD: Detecting Interaction Anomalies in Video | Furkan Mumcu et al. | 2501.09733v1 | null |
| 2025-01-16 | Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps | Nanye Ma et al. | 2501.09732v1 | null |
| 2025-01-16 | Generating particle physics Lagrangians with transformers | Yong Sheng Koay et al. | 2501.09729v1 | null |
| 2025-01-16 | A Simple Aerial Detection Baseline of Multimodal Language Models | Qingyun Li et al. | 2501.09720v1 | link |
| 2025-01-16 | FLOL: Fast Baselines for Real-World Low-Light Enhancement | Juan C. Benito et al. | 2501.09718v1 | null |
| 2025-01-16 | Practical Continual Forgetting for Pre-trained Vision Models | Hongbo Zhao et al. | 2501.09705v1 | link |
| 2025-01-16 | Infinity norm bounds for the inverse of Nekrasov matrices using scaling matrices | Héctor Orera et al. | 2501.09704v1 | null |
| 2025-01-16 | Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key | Zhihe Yang et al. | 2501.09695v1 | null |
| 2025-01-16 | Fine-Grained Image-Text Correspondence with Cost Aggregation for Open-Vocabulary Part Segmentation | Jiho Choi et al. | 2501.09688v1 | null |
| 2025-01-16 | Robin: a Suite of Multi-Scale Vision-Language Models and the CHIRP Evaluation Benchmark | Alexis Roger et al. | 2501.09672v1 | null |
| 2025-01-16 | Unitary Expressions: A Necessary Abstraction for Extensible Quantum Programming Languages and Systems | Ed Younis et al. | 2501.09667v1 | null |
| 2025-01-16 | Approaching optimal microwave-acoustic transduction on lithium niobate using SQUID arrays | A. Hugot et al. | 2501.09661v1 | null |
| 2025-01-16 | A Survey of Research in Large Language Models for Electronic Design Automation | Jingyu Pan et al. | 2501.09655v1 | null |
| 2025-01-16 | NS-Gym: Open-Source Simulation Environments and Benchmarks for Non-Stationary Markov Decision Processes | Nathaniel S. Keplinger et al. | 2501.09646v1 | link |
| 2025-01-16 | Supersolid dipolar phases in planar geometry: effects of tilted polarization | Daniel Lima et al. | 2501.09641v1 | null |
| 2025-01-16 | Unified Face Matching and Physical-Digital Spoofing Attack Detection | Arun Kunwar et al. | 2501.09635v1 | null |
| 2025-01-16 | Optimal paths and dynamical symmetry breaking in the current fluctuations of driven diffusive media | Pablo I. Hurtado et al. | 2501.09629v1 | null |
| 2025-01-16 | WMamba: Wavelet-based Mamba for Face Forgery Detection | Siran Peng et al. | 2501.09617v1 | null |
| 2025-01-16 | Metric Learning with Progressive Self-Distillation for Audio-Visual Embedding Learning | Donghuo Zeng et al. | 2501.09608v1 | null |
| 2025-01-16 | From Scarcity to Capability: Empowering Fake News Detection in Low-Resource Languages with LLMs | Hrithik Majumdar Shibu et al. | 2501.09604v1 | link |
| 2025-01-16 | Mesh2SLAM in VR: A Fast Geometry-Based SLAM Framework for Rapid Prototyping in Virtual Reality Applications | Carlos Augusto Pinheiro de Sousa et al. | 2501.09600v1 | null |
| 2025-01-16 | Atleus: Accelerating Transformers on the Edge Enabled by 3D Heterogeneous Manycore Architectures | Pratyush Dhingra et al. | 2501.09588v1 | null |
| 2025-01-16 | Sequential PatchCore: Anomaly Detection for Surface Inspection using Synthetic Impurities | Runzhou Mao et al. | 2501.09579v1 | null |