| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-11-29 | T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs | Shukang Yin et al. | 2411.19951v2 | link |
| 2024-11-29 | AlphaTablets: A Generic Plane Representation for 3D Planar Reconstruction from Monocular Videos | Yuze He et al. | 2411.19950v1 | null |
| 2024-11-29 | DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation | Zhiqiang Shen et al. | 2411.19946v1 | link |
| 2024-11-29 | Free-form Generation Enhances Challenging Clothed Human Modeling | Hang Ye et al. | 2411.19942v1 | null |
| 2024-11-29 | Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark | Joseph Heyward et al. | 2411.19941v1 | null |
| 2024-11-29 | VLSBench: Unveiling Visual Leakage in Multimodal Safety | Xuhao Hu et al. | 2411.19939v1 | null |
| 2024-11-29 | It's Quick to be Square: Fast Quadratisation for Quantum Toolchains | Lukas Schmidbauer et al. | 2411.19934v1 | null |
| 2024-11-29 | On Domain-Specific Post-Training for Multimodal Large Language Models | Daixuan Cheng et al. | 2411.19930v1 | null |
| 2024-11-29 | SIMS: Simulating Human-Scene Interactions with Real World Script Planning | Wenjia Wang et al. | 2411.19921v1 | null |
| 2024-11-29 | Quantifying the synthetic and real domain gap in aerial scene understanding | Alina Marcu et al. | 2411.19913v1 | null |
| 2024-11-29 | $C^{3}$-NeRF: Modeling Multiple Scenes via Conditional-cum-Continual Neural Radiance Fields | Prajwal Singh et al. | 2411.19903v1 | null |
| 2024-11-29 | GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting | Zixuan Chen et al. | 2411.19895v2 | link |
| 2024-11-29 | FlowCLAS: Enhancing Normalizing Flow Via Contrastive Learning For Anomaly Segmentation | Chang Won Lee et al. | 2411.19888v1 | null |
| 2024-11-29 | SpaRC: Sparse Radar-Camera Fusion for 3D Object Detection | Philipp Wolters et al. | 2411.19860v1 | null |
| 2024-11-29 | Towards Class-wise Robustness Analysis | Tejaswini Medi et al. | 2411.19853v1 | null |
| 2024-11-29 | A Visual-inertial Localization Algorithm using Opportunistic Visual Beacons and Dead-Reckoning for GNSS-Denied Large-scale Applications | Liqiang Zhang, Ye Tian, Dongyan Wei et al. | 2411.19845v1 | null |
| 2024-11-29 | Scaling Transformers for Low-Bitrate High-Quality Speech Coding | Julian D Parker et al. | 2411.19842v1 | link |
| 2024-11-29 | Parallel Stacked Aggregated Network for Voice Authentication in IoT-Enabled Smart Devices | Awais Khan et al. | 2411.19841v1 | null |
| 2024-11-29 | Feedback-driven object detection and iterative model improvement | Sönke Tenckhoff et al. | 2411.19835v1 | link |
| 2024-11-29 | SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens | Chi Su et al. | 2411.19824v1 | null |
| 2024-11-29 | Gaussian multi-target filtering with target dynamics driven by a stochastic differential equation | Ángel F. García-Fernández et al. | 2411.19814v1 | null |
| 2024-11-29 | The Rayleigh-Taylor instability in a binary quantum fluid | Yanda Geng et al. | 2411.19807v1 | null |
| 2024-11-29 | Linear methods for non-linear inverse problems | Geerten Koers et al. | 2411.19797v1 | null |
| 2024-11-29 | MoTe: Learning Motion-Text Diffusion Model for Multiple Generation Tasks | Yiming Wu et al. | 2411.19786v1 | null |
| 2024-11-29 | PerLA: Perceptive 3D Language Assistant | Guofeng Mei et al. | 2411.19774v1 | null |
| 2024-11-29 | LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos | Tiantian Geng et al. | 2411.19772v1 | null |
| 2024-11-29 | LaVIDE: A Language-Vision Discriminator for Detecting Changes in Satellite Image with Map References | Shuguo Jiang et al. | 2411.19758v1 | null |
| 2024-11-29 | Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models | Kaican Li et al. | 2411.19757v1 | link |
| 2024-11-29 | DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering | Yihao Wang et al. | 2411.19756v1 | null |
| 2024-11-29 | Explicit error bounds of the SE and DE formulas for integrals with logarithmic and algebraic singularity | Tomoaki Okayama et al. | 2411.19755v1 | null |