ArXiv Computer Vision research for Thursday, June 13, 2024.
00:21: INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance
02:11: Large-Scale Evaluation of Open-Set Image Classification Techniques
03:43: PC-LoRA: Low-Rank Adaptation for Progressive Model Compression with Knowledge Distillation
05:00: MMRel: A Relation Understanding Dataset and Benchmark in the MLLM Era
06:41: Auto-Vocabulary Segmentation for LiDAR Points
07:30: AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring
08:43: EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts
10:23: Fine-Grained Domain Generalization with Feature Structuralization
12:03: SR-CACO-2: A Dataset for Confocal Fluorescence Microscopy Image Super-Resolution
14:13: ReMI: A Dataset for Reasoning with Multiple Images
15:41: A Large-scale Universal Evaluation Benchmark For Face Forgery Detection
17:26: Thoracic Surgery Video Analysis for Surgical Phase Recognition
18:58: Reducing Task Discrepancy of Text Encoders for Zero-Shot Composed Image Retrieval
20:40: Adaptive Slot Attention: Object Discovery with Dynamic Slot Number
22:26: CLIP-Driven Cloth-Agnostic Feature Learning for Cloth-Changing Person Re-Identification
24:22: Enhanced Object Detection: A Study on Vast Vocabulary Object Detection Track for V3Det Challenge 2024
25:21: Optimizing Visual Question Answering Models for Driving: Bridging the Gap Between Human and Machine Attention Patterns
26:30: WildlifeReID-10k: Wildlife re-identification dataset with 10k individual animals
27:44: MGRQ: Post-Training Quantization For Vision Transformer With Mixed Granularity Reconstruction
29:28: Comparison Visual Instruction Tuning
30:51: MirrorCheck: Efficient Adversarial Defense for Vision-Language Models
32:14: Deep Transformer Network for Monocular Pose Estimation of Ship-Based UAV
33:10: Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
34:33: Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models
36:04: StableMaterials: Enhancing Diversity in Material Generation via Semi-Supervised Learning
37:30: Parameter-Efficient Active Learning for Foundational models
38:31: Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation
40:22: Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases
42:38: Towards AI Lesion Tracking in PET/CT Imaging: A Siamese-based CNN Pipeline applied on PSMA PET/CT Scans
44:36: Memory-Efficient Sparse Pyramid Attention Networks for Whole Slide Image Analysis
46:19: Instance-level quantitative saliency in multiple sclerosis lesion segmentation
48:37: CMC-Bench: Towards a New Paradigm of Visual Signal Compression
50:05: Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs
52:05: CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models