Vision Transformer

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-11-29 | T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs | Shukang Yin et al. | 2411.19951v2 | link |
| 2024-11-29 | AlphaTablets: A Generic Plane Representation for 3D Planar Reconstruction from Monocular Videos | Yuze He et al. | 2411.19950v1 | null |
| 2024-11-29 | DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation | Zhiqiang Shen et al. | 2411.19946v1 | link |
| 2024-11-29 | Free-form Generation Enhances Challenging Clothed Human Modeling | Hang Ye et al. | 2411.19942v1 | null |
| 2024-11-29 | Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark | Joseph Heyward et al. | 2411.19941v1 | null |
| 2024-11-29 | VLSBench: Unveiling Visual Leakage in Multimodal Safety | Xuhao Hu et al. | 2411.19939v1 | null |
| 2024-11-29 | It's Quick to be Square: Fast Quadratisation for Quantum Toolchains | Lukas Schmidbauer et al. | 2411.19934v1 | null |
| 2024-11-29 | On Domain-Specific Post-Training for Multimodal Large Language Models | Daixuan Cheng et al. | 2411.19930v1 | null |
| 2024-11-29 | SIMS: Simulating Human-Scene Interactions with Real World Script Planning | Wenjia Wang et al. | 2411.19921v1 | null |
| 2024-11-29 | Quantifying the synthetic and real domain gap in aerial scene understanding | Alina Marcu et al. | 2411.19913v1 | null |
| 2024-11-29 | $C^{3}$-NeRF: Modeling Multiple Scenes via Conditional-cum-Continual Neural Radiance Fields | Prajwal Singh et al. | 2411.19903v1 | null |
| 2024-11-29 | GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting | Zixuan Chen et al. | 2411.19895v2 | link |
| 2024-11-29 | FlowCLAS: Enhancing Normalizing Flow Via Contrastive Learning For Anomaly Segmentation | Chang Won Lee et al. | 2411.19888v1 | null |
| 2024-11-29 | SpaRC: Sparse Radar-Camera Fusion for 3D Object Detection | Philipp Wolters et al. | 2411.19860v1 | null |
| 2024-11-29 | Towards Class-wise Robustness Analysis | Tejaswini Medi et al. | 2411.19853v1 | null |
| 2024-11-29 | A Visual-inertial Localization Algorithm using Opportunistic Visual Beacons and Dead-Reckoning for GNSS-Denied Large-scale Applications | Liqiang Zhang, Ye Tian, Dongyan Wei et al. | 2411.19845v1 | null |
| 2024-11-29 | Scaling Transformers for Low-Bitrate High-Quality Speech Coding | Julian D Parker et al. | 2411.19842v1 | link |
| 2024-11-29 | Parallel Stacked Aggregated Network for Voice Authentication in IoT-Enabled Smart Devices | Awais Khan et al. | 2411.19841v1 | null |
| 2024-11-29 | Feedback-driven object detection and iterative model improvement | Sönke Tenckhoff et al. | 2411.19835v1 | link |
| 2024-11-29 | SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens | Chi Su et al. | 2411.19824v1 | null |
| 2024-11-29 | Gaussian multi-target filtering with target dynamics driven by a stochastic differential equation | Ángel F. García-Fernández et al. | 2411.19814v1 | null |
| 2024-11-29 | The Rayleigh-Taylor instability in a binary quantum fluid | Yanda Geng et al. | 2411.19807v1 | null |
| 2024-11-29 | Linear methods for non-linear inverse problems | Geerten Koers et al. | 2411.19797v1 | null |
| 2024-11-29 | MoTe: Learning Motion-Text Diffusion Model for Multiple Generation Tasks | Yiming Wu et al. | 2411.19786v1 | null |
| 2024-11-29 | PerLA: Perceptive 3D Language Assistant | Guofeng Mei et al. | 2411.19774v1 | null |
| 2024-11-29 | LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos | Tiantian Geng et al. | 2411.19772v1 | null |
| 2024-11-29 | LaVIDE: A Language-Vision Discriminator for Detecting Changes in Satellite Image with Map References | Shuguo Jiang et al. | 2411.19758v1 | null |
| 2024-11-29 | Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models | Kaican Li et al. | 2411.19757v1 | link |
| 2024-11-29 | DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering | Yihao Wang et al. | 2411.19756v1 | null |
| 2024-11-29 | Explicit error bounds of the SE and DE formulas for integrals with logarithmic and algebraic singularity | Tomoaki Okayama et al. | 2411.19755v1 | null |