liangdong-xjtu / ArXivQA-LD

WIP - Automated Question Answering for ArXiv Papers with Large Language Models

Home Page: https://huggingface.co/datasets/taesiri/arxiv_qa

List of Papers

2023

October 2023

  • Improved Baselines with Visual Instruction Tuning - [2310.03744] [QA].
  • Aligning Text-to-Image Diffusion Models with Reward Backpropagation - [2310.03739] [QA].
  • Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency - [2310.03734] [QA].
  • MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning - [2310.03731] [QA].
  • HeaP: Hierarchical Policies for Web Actions using LLMs - [2310.03720] [QA].
  • A Long Way to Go: Investigating Length Correlations in RLHF - [2310.03716] [QA].
  • DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines - [2310.03714] [QA].
  • Agent Instructs Large Language Models to be General Zero-Shot Reasoners - [2310.03710] [QA].
  • Drag View: Generalizable Novel View Synthesis with Unposed Imagery - [2310.03704] [QA].
  • Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion - [2310.03502] [QA].
  • FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation - [2310.03214] [QA].
  • Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning - [2310.03094] [QA].
  • Retrieval meets Long Context Large Language Models - [2310.03025] [QA].
  • How FaR Are Large Language Models From Agents with Theory-of-Mind? - [2310.03051] [QA].
  • EcoAssistant: Using LLM Assistant More Affordably and Accurately - [2310.03046] [QA].
  • MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts - [2310.02255] [QA].
  • MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens - [2310.02239] [QA].
  • Think before you speak: Training Language Models With Pause Tokens - [2310.02226] [QA].
  • What do we learn from a large-scale study of pre-trained visual representations in sim and real environments? - [2310.02219] [QA].
  • Language Models Represent Space and Time - [2310.02207] [QA].
  • Large Language Models Cannot Self-Correct Reasoning Yet - [2310.01798] [QA].
  • Can large language models provide useful feedback on research papers? A large-scale empirical analysis - [2310.01783] [QA].
  • ImageNet-OOD: Deciphering Modern Out-of-Distribution Detection Algorithms - [2310.01755] [QA].
  • Large Language Models as Analogical Reasoners - [2310.01714] [QA].
  • ImagenHub: Standardizing the evaluation of conditional image generation models - [2310.01596] [QA].
  • SmartPlay: A Benchmark for LLMs as Intelligent Agents - [2310.01557] [QA].
  • Neutrinos from muon-rich ultra high energy electromagnetic cascades: The MUNHECA code - [2310.01510] [QA].
  • DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model - [2310.01412] [QA].
  • Conditional Diffusion Distillation - [2310.01407] [QA].
  • Representation Engineering: A Top-Down Approach to AI Transparency - [2310.01405] [QA].
  • RA-DIT: Retrieval-Augmented Dual Instruction Tuning - [2310.01352] [QA].
  • Label Supervised LLaMA Finetuning - [2310.01208] [QA].
  • Enable Language Models to Implicitly Learn Self-Improvement From Data - [2310.00898] [QA].
  • (Dynamic) Prompting might be all you need to repair Compressed LLMs - [2310.00867] [QA].
  • Analyzing and Mitigating Object Hallucination in Large Vision-Language Models - [2310.00754] [QA].
  • RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models - [2310.00746] [QA].
  • FELM: Benchmarking Factuality Evaluation of Large Language Models - [2310.00741] [QA].
  • UniAudio: An Audio Foundation Model Toward Universal Audio Generation - [2310.00704] [QA].

September 2023

  • PixArt-$α$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis - [2310.00426] [QA].
  • AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ - [2310.00367] [QA].
  • Efficient Streaming Language Models with Attention Sinks - [2309.17453] [QA].
  • The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) - [2309.17421] [QA].
  • Directly Fine-Tuning Diffusion Models on Differentiable Rewards - [2309.17400] [QA].
  • GAIA-1: A Generative World Model for Autonomous Driving - [2309.17080] [QA].
  • Demystifying CLIP Data - [2309.16671] [QA].
  • RealFill: Reference-Driven Generation for Authentic Image Completion - [2309.16668] [QA].
  • DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation - [2309.16653] [QA].
  • ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning - [2309.16650] [QA].
  • Deep Geometrized Cartoon Line Inbetweening - [2309.16643] [QA].
  • Qwen Technical Report - [2309.16609] [QA].
  • Vision Transformers Need Registers - [2309.16588] [QA].
  • Text-to-3D using Gaussian Splatting - [2309.16585] [QA].
  • GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond - [2309.16583] [QA].
  • MotionLM: Multi-Agent Motion Forecasting as Language Modeling - [2309.16534] [QA].
  • CCEdit: Creative and Controllable Video Editing via Diffusion Models - [2309.16496] [QA].
  • Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation - [2309.16429] [QA].
  • AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models - [2309.16414] [QA].
  • Dark Side Augmentation: Generating Diverse Night Examples for Metric Learning - [2309.16351] [QA].
  • Language models in molecular discovery - [2309.16235] [QA].
  • AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model - [2309.16058] [QA].
  • Effective Long-Context Scaling of Foundation Models - [2309.16039] [QA].
  • Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation - [2309.15818] [QA].
  • Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack - [2309.15807] [QA].
  • Aperture Diffraction for Compact Snapshot Spectral Imaging - [2309.16372] [QA].
  • Borges and AI - [2310.01425] [QA].
  • Jointly Training Large Autoregressive Multimodal Models - [2309.15564] [QA].
  • Finite Scalar Quantization: VQ-VAE Made Simple - [2309.15505] [QA].
  • Graph Neural Prompting with Large Language Models - [2309.15427] [QA].
  • NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions - [2309.15426] [QA].
  • DECO: Dense Estimation of 3D Human-Scene Contact In The Wild - [2309.15273] [QA].
  • VPA: Fully Test-Time Visual Prompt Adaptation - [2309.15251] [QA].
  • Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition - [2309.15223] [QA].
  • LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models - [2309.15103] [QA].
  • Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models - [2309.15098] [QA].
  • VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning - [2309.15091] [QA].
  • RPEFlow: Multimodal Fusion of RGB-PointCloud-Event for Joint Optical Flow and Scene Flow Estimation - [2309.15082] [QA].
  • Large Language Model Alignment: A Survey - [2309.15025] [QA].
  • Treating Motion as Option with Output Selection for Unsupervised Video Object Segmentation - [2309.14786] [QA].
  • QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models - [2309.14717] [QA].
  • NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space - [2309.14616] [QA].
  • Efficient Post-training Quantization with FP8 Formats - [2309.14592] [QA].
  • CWCL: Cross-Modal Transfer with Continuously Weighted Contrastive Loss - [2309.14580] [QA].
  • Aligning Large Multimodal Models with Factually Augmented RLHF - [2309.14525] [QA].
  • DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models - [2309.14509] [QA].
  • Extreme Parkour with Legged Robots - [2309.14341] [QA].
  • Electronic properties, correlated topology and Green's function zeros - [2309.14340] [QA].
  • DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention - [2309.14327] [QA].
  • Physics of Language Models: Part 3.2, Knowledge Manipulation - [2309.14402] [QA].
  • Small-scale proxies for large-scale Transformer training instabilities - [2309.14322] [QA].
  • Tiled Multiplane Images for Practical 3D Photography - [2309.14291] [QA].
  • Only 5% Attention Is All You Need: Efficient Long-range Document-level Neural Machine Translation - [2309.14174] [QA].
  • May I Ask a Follow-up Question? Understanding the Benefits of Conversations in Neural Network Explainability - [2309.13965] [QA].
  • VidChapters-7M: Video Chapters at Scale - [2309.13952] [QA].
  • Impact of Human-AI Interaction on User Trust and Reliance in AI-Assisted Qualitative Coding - [2309.13858] [QA].
  • Evaluating Cognitive Maps and Planning in Large Language Models with CogEval - [2309.15129] [QA].
  • Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve - [2309.13638] [QA].
  • LOGICSEG: Parsing Visual Semantics with Neural Logic Learning and Reasoning - [2309.13556] [QA].
  • MediViSTA-SAM: Zero-shot Medical Video Analysis with Spatio-temporal SAM Adaptation - [2309.13539] [QA].
  • Attention Is All You Need For Blind Room Volume Estimation - [2309.13504] [QA].
  • Learning Invariant Representations with a Nonparametric Nadaraya-Watson Head - [2309.13377] [QA].
  • MLPST: MLP is All You Need for Spatio-Temporal Prediction - [2309.13363] [QA].
  • Exploring Large Language Models' Cognitive Moral Development through Defining Issues Test - [2309.13356] [QA].
  • Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic - [2309.13339] [QA].
  • Calibrating LLM-Based Evaluator - [2309.13308] [QA].
  • Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks - [2309.13256] [QA].
  • Spatial-frequency channels, shape bias, and adversarial robustness - [2309.13190] [QA].
  • E(2)-Equivariant Graph Planning for Navigation - [2309.13043] [QA].
  • MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation - [2309.13042] [QA].
  • Robotic Offline RL from Internet Videos via Value-Function Pre-Training - [2309.13041] [QA].
  • NeRRF: 3D Reconstruction and View Synthesis for Transparent and Specular Objects with Neural Refractive-Reflective Fields - [2309.13039] [QA].
  • Privacy Assessment on Reconstructed Images: Are Existing Evaluation Metrics Faithful to Human Perception? - [2309.13038] [QA].
  • GELLO: A General, Low-Cost, and Intuitive Teleoperation Framework for Robot Manipulators - [2309.13037] [QA].
  • PyPose v0.6: The Imperative Programming Interface for Robotics - [2309.13035] [QA].
  • Memory-augmented conformer for improved end-to-end long-form ASR - [2309.13029] [QA].
  • Graph Neural Network for Stress Predictions in Stiffened Panels Under Uniform Loading - [2309.13022] [QA].
  • A Hybrid Deep Learning-based Approach for Optimal Genotype by Environment Selection - [2309.13021] [QA].
  • Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model - [2309.13018] [QA].
  • Understanding Deep Gradient Leakage via Inversion Influence Functions - [2309.13016] [QA].
  • Efficient N:M Sparse DNN Training Using Algorithm, Architecture, and Dataflow Co-Design - [2309.13015] [QA].
  • Performance Analysis of UNet and Variants for Medical Image Segmentation - [2309.13013] [QA].
  • ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs - [2309.13007] [QA].
  • Deep3DSketch+: Rapid 3D Modeling from Single Free-hand Sketches - [2309.13006] [QA].
  • Pursuing Counterfactual Fairness via Sequential Autoencoder Across Domains - [2309.13005] [QA].
  • Expressive variational quantum circuits provide inherent privacy in federated learning - [2309.13002] [QA].
  • Audience-specific Explanations for Machine Translation - [2309.12998] [QA].
  • Point Cloud Network: An Order of Magnitude Improvement in Linear Layer Parameter Count - [2309.12996] [QA].
  • Deep learning probability flows and entropy production rates in active matter - [2309.12991] [QA].
  • License Plate Recognition Based On Multi-Angle View Model - [2309.12972] [QA].
  • Higher-order Graph Convolutional Network with Flower-Petals Laplacians on Simplicial Complexes - [2309.12971] [QA].
  • PI-RADS v2 Compliant Automated Segmentation of Prostate Zones Using co-training Motivated Multi-task Dual-Path CNN - [2309.12970] [QA].
  • Detect Every Thing with Few Examples - [2309.12969] [QA].
  • Nested Event Extraction upon Pivot Element Recognition - [2309.12960] [QA].
  • On Data Fabrication in Collaborative Vehicular Perception: Attacks and Countermeasures - [2309.12955] [QA].
  • Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation - [2309.12943] [QA].
  • Trusta: Reasoning about Assurance Cases with Formal Methods and Large Language Models - [2309.12941] [QA].
  • Self-Explanation Prompting Improves Dialogue Understanding in Large Language Models - [2309.12940] [QA].
  • Frustrated with Code Quality Issues? LLMs can Help! - [2309.12938] [QA].
  • Evolving Spiking Neural Networks to Mimic PID Control for Autonomous Blimps - [2309.12937] [QA].
  • TopRoBERTa: Topology-Aware Authorship Attribution of Deepfake Texts - [2309.12934] [QA].
  • CodePlan: Repository-level Coding using LLMs and Planning - [2309.12499] [QA].
  • DualToken-ViT: Position-aware Efficient Vision Transformer with Dual Token Fusion - [2309.12424] [QA].
  • LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent - [2309.12311] [QA].
  • LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models - [2309.12307] [QA].
  • PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation - [2309.12303] [QA].
  • The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A" - [2309.12288] [QA].
  • MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models - [2309.12284] [QA].
  • Boolformer: Symbolic Regression of Logic Functions with Transformers - [2309.12207] [QA].
  • LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset - [2309.11998] [QA].
  • MEFLUT: Unsupervised 1D Lookup Tables for Multi-exposure Image Fusion - [2309.11847] [QA].
  • A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models - [2309.11674] [QA].
  • BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model - [2309.11568] [QA].
  • A Large-scale Dataset for Audio-Language Representation Learning - [2309.11500] [QA].
  • DreamLLM: Synergistic Multimodal Comprehension and Creation - [2309.11499] [QA].
  • FreeU: Free Lunch in Diffusion U-Net - [2309.11497] [QA].
  • Chain-of-Verification Reduces Hallucination in Large Language Models - [2309.11495] [QA].
  • SCREWS: A Modular Framework for Reasoning with Revisions - [2309.13075] [QA].
  • Kosmos-2.5: A Multimodal Literate Model - [2309.11419] [QA].
  • OpenChat: Advancing Open-source Language Models with Mixed-Quality Data - [2309.11235] [QA].
  • The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute - [2309.11197] [QA].
  • AutoSynth: Learning to Generate 3D Training Data for Object Point Cloud Registration - [2309.11170] [QA].
  • Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation - [2309.11160] [QA].
  • More complex encoder is not all you need - [2309.11139] [QA].
  • Contrastive Pseudo Learning for Open-World DeepFake Attribution - [2309.11132] [QA].
  • Dense 2D-3D Indoor Prediction with Sound via Aligned Cross-Modal Distillation - [2309.11081] [QA].
  • Weak Supervision for Label Efficient Visual Bug Detection - [2309.11077] [QA].
  • The Topology and Geometry of Neural Representations - [2309.11028] [QA].
  • Controllable Dynamic Appearance for Neural 3D Portraits - [2309.11009] [QA].
  • RMT: Retentive Networks Meet Vision Transformers - [2309.11523] [QA].
  • LMDX: Language Model-based Document Information Extraction and Localization - [2309.10952] [QA].
  • End-to-End Speech Recognition Contextualization with Large Language Models - [2309.10917] [QA].
  • SlimPajama-DC: Understanding Data Combinations for LLM Training - [2309.10818] [QA].
  • Sound Source Localization is All about Cross-Modal Alignment - [2309.10724] [QA].
  • OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch - [2309.10706] [QA].
  • Language Modeling Is Compression - [2309.10668] [QA].
  • NDDepth: Normal-Distance Assisted Monocular Depth Estimation - [2309.10592] [QA].
  • FoleyGen: Visually-Guided Audio Generation - [2309.10537] [QA].
  • AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration - [2309.10438] [QA].
  • PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training - [2309.10400] [QA].
  • Baichuan 2: Open Large-scale Language Models - [2309.10305] [QA].
  • 360$^\circ$ Reconstruction From a Single Image Using Space Carved Outpainting - [2309.10279] [QA].
  • Stabilizing RLHF through Advantage Model and Selective Rehearsal - [2309.10202] [QA].
  • Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions - [2309.10150] [QA].
  • Unified Coarse-to-Fine Alignment for Video-Text Retrieval - [2309.10091] [QA].
  • Multimodal Foundation Models: From Specialists to General-Purpose Assistants - [2309.10020] [QA].
  • MindAgent: Emergent Gaming Interaction - [2309.09971] [QA].
  • Generating and Imputing Tabular Data via Diffusion and Flow-based Gradient-Boosted Trees - [2309.09968] [QA].
  • An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models - [2309.09958] [QA].
  • Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering - [2309.09724] [QA].
  • CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation - [2309.09709] [QA].
  • Adapting Large Language Models via Reading Comprehension - [2309.09530] [QA].
  • LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models - [2309.09506] [QA].
  • Discovering Sounding Objects by Audio Queries for Audio Visual Segmentation - [2309.09501] [QA].
  • CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages - [2309.09400] [QA].
  • Augmenting text for spoken language understanding with Large Language Models - [2309.09390] [QA].
  • Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles - [2309.09369] [QA].
  • OWL: A Large Language Model for IT Operations - [2309.09298] [QA].
  • LivelySpeaker: Towards Semantic-Aware Co-Speech Gesture Generation - [2309.09294] [QA].
  • Contrastive Decoding Improves Reasoning in Large Language Models - [2309.09117] [QA].
  • Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT) - [2309.08968] [QA].
  • Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? - [2309.08963] [QA].
  • Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca - [2309.08958] [QA].
  • PDFTriage: Question Answering over Long, Structured Documents - [2309.08872] [QA].
  • S3-DST: Structured Open-Domain Dialogue Segmentation and State Tracking in the Era of LLMs - [2309.08827] [QA].
  • Stack-and-Delay: a new codebook pattern for music generation - [2309.08804] [QA].
  • Enhance audio generation controllability through representation similarity regularization - [2309.08773] [QA].
  • BANSAC: A dynamic BAyesian Network for adaptive SAmple Consensus - [2309.08690] [QA].
  • Sparse Autoencoders Find Highly Interpretable Features in Language Models - [2309.08600] [QA].
  • Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes - [2309.08588] [QA].
  • Compositional Foundation Models for Hierarchical Planning - [2309.08587] [QA].
  • Replacing softmax with ReLU in Vision Transformers - [2309.08586] [QA].
  • Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers - [2309.08532] [QA].
  • Scaling Laws for Sparsely-Connected Foundation Models - [2309.08520] [QA].
  • Using Large Language Models for Knowledge Engineering (LLMKE): A Case Study on Wikidata - [2309.08491] [QA].
  • Deformable Neural Radiance Fields using RGB and Event Cameras - [2309.08416] [QA].
  • Cure the headache of Transformers via Collinear Constrained Attention - [2309.08646] [QA].
  • Investigating Answerability of LLMs for Long-Form Question Answering - [2309.08210] [QA].
  • LASER: LLM Agent with State-Space Exploration for Web Navigation - [2309.08172] [QA].
  • Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding - [2309.08168] [QA].
  • RADE: Reference-Assisted Dialogue Evaluation for Open-Domain Dialogue - [2309.08156] [QA].
  • Retrieval-Augmented Text-to-Audio Generation - [2309.08051] [QA].
  • Leveraging Contextual Information for Effective Entity Salience Detection - [2309.07990] [QA].
  • Viewpoint Textual Inversion: Unleashing Novel View Synthesis with Pretrained 2D Diffusion Models - [2309.07986] [QA].
  • A Data Source for Reasoning Embodied Agents - [2309.07974] [QA].
  • Language Embedded Radiance Fields for Zero-Shot Task-Oriented Grasping - [2309.07970] [QA].
  • ALWOD: Active Learning for Weakly-Supervised Object Detection - [2309.07914] [QA].
  • Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning - [2309.07911] [QA].
  • TEMPO: Efficient Multi-View Pose Estimation, Tracking, and Forecasting - [2309.07910] [QA].
  • Generative Image Dynamics - [2309.07906] [QA].
  • Ambiguity-Aware In-Context Learning with Large Language Models - [2309.07900] [QA].
  • Agents: An Open-source Framework for Autonomous Language Agents - [2309.07870] [QA].
  • The Rise and Potential of Large Language Model Based Agents: A Survey - [2309.07864] [QA].
  • TextBind: Multi-turn Interleaved Multimodal Instruction-following - [2309.08637] [QA].
  • OmnimatteRF: Robust Omnimatte with 3D Background Modeling - [2309.07749] [QA].
  • Efficiently Robustify Pre-trained Models - [2309.07499] [QA].
  • EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization - [2309.07471] [QA].
  • Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation? - [2309.07462] [QA].
  • Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts - [2309.07430] [QA].
  • Flexible Visual Recognition by Evidential Modeling of Confusion and Ignorance - [2309.07403] [QA].
  • AudioSR: Versatile Audio Super-resolution at Scale - [2309.07314] [QA].
  • Pretraining on the Test Set Is All You Need - [2309.08632] [QA].
  • All you need is spin: SU(2) equivariant variational quantum circuits based on spin networks - [2309.07250] [QA].
  • Text-Guided Generation and Editing of Compositional 3D Avatars - [2309.07125] [QA].
  • RAIN: Your Language Models Can Align Themselves without Finetuning - [2309.07124] [QA].
  • Tree-Structured Shading Decomposition - [2309.07122] [QA].
  • SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection - [2309.07084] [QA].
  • Efficient Reinforcement Learning for Jumping Monopods - [2309.07038] [QA].
  • DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models - [2309.06933] [QA].
  • MagiCapture: High-Resolution Multi-Concept Portrait Customization - [2309.06895] [QA].
  • Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit? - [2309.06891] [QA].
  • Leveraging SE(3) Equivariance for Learning 3D Geometric Shape Assembly - [2309.06810] [QA].
  • Dynamic NeRFs for Soccer Scenes - [2309.06802] [QA].
  • Cognitive Mirage: A Review of Hallucinations in Large Language Models - [2309.06794] [QA].
  • MPI-Flow: Learning Realistic Optical Flow with Multiplane Images - [2309.06714] [QA].
  • VLSlice: Interactive Vision-and-Language Slice Discovery - [2309.06703] [QA].
  • Generalizable Neural Fields as Partially Observed Neural Processes - [2309.06660] [QA].
  • Statistical Rejection Sampling Improves Preference Optimization - [2309.06657] [QA].
  • A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale - [2309.06497] [QA].
  • Learning Disentangled Avatars with Hybrid 3D Representations - [2309.06441] [QA].
  • LEAP Hand: Low-Cost, Efficient, and Anthropomorphic Hand for Robot Learning - [2309.06440] [QA].
  • InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation - [2309.06380] [QA].
  • Recovering from Privacy-Preserving Masking with Large Language Models - [2309.08628] [QA].
  • Modality Unifying Network for Visible-Infrared Person Re-Identification - [2309.06262] [QA].
  • Efficient Memory Management for Large Language Model Serving with PagedAttention - [2309.06180] [QA].
  • AstroLLaMA: Towards Specialized Foundation Models in Astronomy - [2309.06126] [QA].
  • Uncovering mesa-optimization algorithms in Transformers - [2309.05858] [QA].
  • Large Language Models for Compiler Optimization - [2309.07062] [QA].
  • SHIFT3D: Synthesizing Hard Inputs For Tricking 3D Detectors - [2309.05810] [QA].
  • PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models - [2309.05793] [QA].
  • Diffusion-Guided Reconstruction of Everyday Hand-Object Interaction Clips - [2309.05663] [QA].
  • Large Language Model for Science: A Study on P vs. NP - [2309.05689] [QA].
  • UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase - [2309.05573] [QA].
  • ITI-GEN: Inclusive Text-to-Image Generation - [2309.05569] [QA].
  • NExT-GPT: Any-to-Any Multimodal LLM - [2309.05519] [QA].
  • Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs - [2309.05516] [QA].
  • Textbooks Are All You Need II: phi-1.5 technical report - [2309.05463] [QA].
  • Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning - [2309.05444] [QA].
  • Class-Incremental Grouping Network for Continual Audio-Visual Learning - [2309.05281] [QA].
  • Multi3DRefer: Grounding Text Description to Multiple 3D Objects - [2309.05251] [QA].
  • Does Writing with Language Models Reduce Content Diversity? - [2309.05196] [QA].
  • Towards Viewpoint Robustness in Bird's Eye View Segmentation - [2309.05192] [QA].
  • Beyond Skin Tone: A Multidimensional Measure of Apparent Skin Color - [2309.05148] [QA].
  • 3D Implicit Transporter for Temporally Consistent Keypoint Discovery - [2309.05098] [QA].
  • Multi-view Self-supervised Disentanglement for General Image Denoising - [2309.05049] [QA].
  • Mitigating Word Bias in Zero-shot Prompt-based Classifiers - [2309.04992] [QA].
  • Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation - [2309.04946] [QA].
  • Effective Real Image Editing with Accelerated Iterative Diffusion Inversion - [2309.04907] [QA].
  • Leveraging Large Language Models for Exploiting ASR Uncertainty - [2309.04842] [QA].
  • Neurons in Large Language Models: Dead, N-gram, Positional - [2309.04827] [QA].
  • Towards Real-World Burst Image Super-Resolution: Benchmark and Method - [2309.04803] [QA].
  • VeRi3D: Generative Vertex-based Radiance Fields for 3D Controllable Human Image Synthesis - [2309.04800] [QA].
  • Towards Robust Model Watermark via Reducing Parametric Vulnerability - [2309.04777] [QA].
  • SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning - [2309.04766] [QA].
  • When to Learn What: Model-Adaptive Data Augmentation Curriculum - [2309.04747] [QA].
  • FIAT: Fusing learning paradigms with Instruction-Accelerated Tuning - [2309.04663] [QA].
  • MADLAD-400: A Multilingual And Document-Level Large Audited Dataset - [2309.04662] [QA].
  • Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf - [2309.04658] [QA].
  • Dynamic Mesh-Aware Radiance Fields - [2309.04581] [QA].
  • When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale - [2309.04564] [QA].
  • Examining Autoexposure for Challenging Scenes - [2309.04542] [QA].
  • Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving - [2309.04422] [QA].
  • DeformToon3D: Deformable 3D Toonification from Neural Radiance Fields - [2309.04410] [QA].
  • Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts - [2309.04354] [QA].
  • The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion - [2309.04509] [QA].
  • From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting - [2309.04269] [QA].
  • Towards Practical Capture of High-Fidelity Relightable Avatars - [2309.04247] [QA].
  • Unsupervised Object Localization with Representer Point Selection - [2309.04172] [QA].
  • NESTLE: a No-Code Tool for Statistical Analysis of Legal Corpus - [2309.04146] [QA].
  • Evaluation and Mitigation of Agnosia in Multimodal Large Language Models - [2309.04041] [QA].
  • CDFSL-V: Cross-Domain Few-Shot Learning for Videos - [2309.03989] [QA].
  • LanSER: Language-Model Supported Speech Emotion Recognition - [2309.03978] [QA].
  • ImageBind-LLM: Multi-modality Instruction Tuning - [2309.03905] [QA].
  • Tracking Anything with Decoupled Video Segmentation - [2309.03903] [QA].
  • Learning Continuous Exposure Value Representations for Single-Image HDR Reconstruction - [2309.03900] [QA].
  • The Making and Breaking of Camouflage - [2309.03899] [QA].
  • ProPainter: Improving Propagation and Transformer for Video Inpainting - [2309.03897] [QA].
  • InstructDiffusion: A Generalist Modeling Interface for Vision Tasks - [2309.03895] [QA].
  • A Function Interpretation Benchmark for Evaluating Interpretability Methods - [2309.03886] [QA].
  • DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models - [2309.03883] [QA].
  • On Large Language Models' Selection Bias in Multi-Choice Questions - [2309.03882] [QA].
  • FLM-101B: An Open LLM and How to Train It with $100K Budget - [2309.03852] [QA].
  • Panoramas from Photons - [2309.03811] [QA].
  • SimNP: Learning Self-Similarity Priors Between Neural Points - [2309.03809] [QA].
  • Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption - [2309.03729] [QA].
  • Efficient Adaptive Human-Object Interaction Detection with Concept-guided Memory - [2309.03696] [QA].
  • Large-Scale Automatic Audiobook Creation - [2309.03926] [QA].
  • Evaluating ChatGPT as a Recommender System: A Rigorous Approach - [2309.03613] [QA].
  • Enhancing Sample Utilization through Sample Adaptive Augmentation in Semi-Supervised Learning - [2309.03598] [QA].
  • Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model - [2309.03550] [QA].
  • Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation - [2309.03549] [QA].
  • Temporal Collection and Distribution for Referring Video Object Segmentation - [2309.03473] [QA].
  • SyncDreamer: Generating Multiview-consistent Images from a Single-view Image - [2309.03453] [QA].
  • Large Language Models as Optimizers - [2309.03409] [QA].
  • Distribution-Aware Prompt Tuning for Vision-Language Models - [2309.03406] [QA].
  • Robotic Table Tennis: A Case Study into a High Speed Learning System - [2309.03315] [QA].
  • Matcha-TTS: A fast TTS architecture with conditional flow matching - [2309.03199] [QA].
  • Bayes' Rays: Uncertainty Quantification for Neural Radiance Fields - [2309.03185] [QA].
  • SLiMe: Segment Like Me - [2309.03179] [QA].
  • ResFields: Residual Neural Fields for Spatiotemporal Signals - [2309.03160] [QA].
  • MyoDex: A Generalizable Prior for Dexterous Manipulation - [2309.03130] [QA].
  • Dynamic Hyperbolic Attention Network for Fine Hand-object Reconstruction - [2309.02965] [QA].
  • GPT Can Solve Mathematical Problems Without a Calculator - [2309.03241] [QA].
  • Zero-Resource Hallucination Prevention for Large Language Models - [2309.02654] [QA].
  • Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning - [2309.02591] [QA].
  • Physically Grounded Vision-Language Models for Robotic Manipulation - [2309.02561] [QA].
  • A skeletonization algorithm for gradient-based optimization - [2309.02527] [QA].
  • GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction - [2309.02436] [QA].
  • Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach - [2309.02429] [QA].
  • Cognitive Architectures for Language Agents - [2309.02427] [QA].
  • EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding - [2309.02423] [QA].
  • Doppelgangers: Learning to Disambiguate Images of Similar Structures - [2309.02420] [QA].
  • Generating Realistic Images from In-the-wild Sounds - [2309.02405] [QA].
  • Prototype-based Dataset Comparison - [2309.02401] [QA].
  • Explaining grokking through circuit efficiency - [2309.02390] [QA].
  • CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning - [2309.02301] [QA].
  • Making Large Language Models Better Reasoners with Alignment - [2309.02144] [QA].
  • Multi-label affordance mapping from egocentric vision - [2309.02120] [QA].
  • Iterative Superquadric Recomposition of 3D Objects from Multiple Views - [2309.02102] [QA].
  • Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples - [2309.02041] [QA].
  • Data-Juicer: A One-Stop Data Processing System for Large Language Models - [2309.02033] [QA].
  • RawHDR: High Dynamic Range Image Reconstruction from a Single Raw Image - [2309.02020] [QA].
  • NICE: CVPR 2023 Challenge on Zero-shot Image Captioning - [2309.01961] [QA].
  • Empowering Low-Light Image Enhancer through Customized Learnable Priors - [2309.01958] [QA].
  • Towards Universal Image Embeddings: A Large-Scale Dataset and Challenge for Generic Image Representations - [2309.01858] [QA].
  • One Wide Feedforward is All You Need - [2309.01826] [QA].
  • Are Emergent Abilities in Large Language Models just In-Context Learning? - [2309.01809] [QA].
  • An Empirical Analysis for Zero-Shot Multi-Label Classification on COVID-19 CT Scans and Uncurated Reports - [2309.01740] [QA].
  • Mask-Attention-Free Transformer for 3D Instance Segmentation - [2309.01692] [QA].
  • AGG-Net: Attention Guided Gated-convolutional Network for Depth Image Completion - [2309.01624] [QA].
  • Raw Data Is All You Need: Virtual Axle Detector with Enhanced Receptive Field - [2309.01574] [QA].
  • A Blackbox Model Is All You Need to Breach Privacy: Smart Grid Forecasting Models as a Use Case - [2309.01523] [QA].
  • Unified Pre-training with Pseudo Texts for Text-To-Image Person Re-identification - [2309.01420] [QA].
  • Memory augment is All You Need for image restoration - [2309.01377] [QA].
  • EMR-MSF: Self-Supervised Recurrent Monocular Scene Flow Exploiting Ego-Motion Rigidity - [2309.01296] [QA].
  • SOAR: Scene-debiasing Open-set Action Recognition - [2309.01265] [QA].
  • Towards Generic Image Manipulation Detection with Weakly-Supervised Self-Consistency Learning - [2309.01246] [QA].
  • LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for Vision-Language Models - [2309.01155] [QA].
  • EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment - [2309.01151] [QA].
  • Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration - [2309.01131] [QA].
  • CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection - [2309.01093] [QA].
  • Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning - [2309.01083] [QA].
  • ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models - [2309.00986] [QA].
  • eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models - [2309.00964] [QA].
  • Two-in-One Depth: Bridging the Gap Between Monocular and Binocular Self-supervised Depth Estimation - [2309.00933] [QA].
  • Domain Generalization via Balancing Training Difficulty and Model Capability - [2309.00844] [QA].
  • Few shot font generation via transferring similarity guided global style and quantization local style - [2309.00827] [QA].
  • Instability of the solitary waves for the Generalized Benjamin-Bona-Mahony Equation - [2309.0791] [QA].
  • Contrastive Feature Masking Open-Vocabulary Vision Transformer - [2309.00775] [QA].
  • Learning Shared Safety Constraints from Multi-task Demonstrations - [2309.00711] [QA].
  • Searching for a Leptophilic Z' and a 3-3-1 symmetry at CLIC - [2309.0681] [QA].
  • Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following - [2309.00615] [QA].
  • CityDreamer: Compositional Generative Model of Unbounded 3D Cities - [2309.00610] [QA].
  • Rieger, Schwabe, Suess-de Vries: The Sunny Beats of Resonance - [2309.0666] [QA].
  • VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation - [2309.00398] [QA].
  • FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning - [2309.00363] [QA].
  • Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior - [2309.00359] [QA].
  • RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback - [2309.00267] [QA].
  • A Massively Parallel Dynamic Programming for Approximate Rectangle Escape Problem - [2309.0242] [QA].
  • Object-Centric Multiple Object Tracking - [2309.00233] [QA].
  • Human-Inspired Facial Sketch Synthesis with Dynamic Adaptation - [2309.00216] [QA].
  • Pseudo-magnetic fields in square lattices - [2309.0212] [QA].
  • Empirical Modeling of Variance in Medium Frequency R-Mode Time-of-Arrival Measurements - [2309.0202] [QA].

August 2023

  • Block occurrences in the binary expansion - [2309.0142] [QA].
  • YaRN: Efficient Context Window Extension of Large Language Models - [2309.00071] [QA].
  • SoDaCam: Software-defined Cameras via Single-Photon Imaging - [2309.00066] [QA].
  • FACET: Fairness in Computer Vision Evaluation Benchmark - [2309.00035] [QA].
  • PointLLM: Empowering Large Language Models to Understand Point Clouds - [2308.16911] [QA].
  • StyleInV: A Temporal Style Modulated Inversion Network for Unconditional Video Generation - [2308.16909] [QA].
  • InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion - [2308.16905] [QA].
  • Transformers as Support Vector Machines - [2308.16898] [QA].
  • EMDB: The Electromagnetic Database of Global 3D Human Pose and Shape in the Wild - [2308.16894] [QA].
  • GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields - [2308.16891] [QA].
  • TouchStone: Evaluating Vision-Language Models by Language Models - [2308.16890] [QA].
  • The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants - [2308.16884] [QA].
  • SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation - [2308.16876] [QA].
  • Coarse-to-Fine Amodal Segmentation with Shape Prior - [2308.16825] [QA].
  • Can Programming Languages Boost Each Other via Instruction Tuning? - [2308.16824] [QA].
  • Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models - [2308.16777] [QA].
  • Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images - [2308.16758] [QA].
  • Parsing is All You Need for Accurate Gait Recognition in the Wild - [2308.16739] [QA].
  • ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation - [2308.16689] [QA].
  • Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images - [2308.16582] [QA].
  • MVDream: Multi-view Diffusion for 3D Generation - [2308.16512] [QA].
  • Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations - [2308.16505] [QA].
  • PivotNet: Vectorized Pivot Learning for End-to-end HD Map Construction - [2308.16477] [QA].
  • Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models - [2308.16463] [QA].
  • Improving Lens Flare Removal with General Purpose Pipeline and Multiple Light Sources Recovery - [2308.16460] [QA].
  • BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge - [2308.16458] [QA].
  • Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff - [2308.16454] [QA].
  • Emergence of Segmentation with Minimalistic White-Box Transformers - [2308.16271] [QA].
  • Active Neural Mapping - [2308.16246] [QA].
  • Learning Vision-based Pursuit-Evasion Robot Policies - [2308.16185] [QA].
  • SAM-Med2D - [2308.16184] [QA].
  • MMVP: Motion-Matrix-based Video Prediction - [2308.16154] [QA].
  • LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models - [2308.16137] [QA].
  • Response: Emergent analogical reasoning in large language models - [2308.16118] [QA].
  • Learned Image Reasoning Prior Penetrates Deep Unfolding Network for Panchromatic and Multi-Spectral Image Fusion - [2308.16083] [QA].
  • RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation - [2308.15975] [QA].
  • WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model - [2308.15962] [QA].
  • LLaSM: Large Language and Speech Model - [2308.15930] [QA].
  • Reconstructing Groups of People with Hypergraph Relational Reasoning - [2308.15844] [QA].
  • Introducing Language Guidance in Prompt-based Continual Learning - [2308.15827] [QA].
  • WeatherBench 2: A benchmark for the next generation of data-driven global weather models - [2308.15560] [QA].
  • Canonical Factors for Hybrid Neural Fields - [2308.15461] [QA].
  • Shatter and Gather: Learning Referring Image Segmentation with Text Supervision - [2308.15512] [QA].
  • Efficient Model Personalization in Federated Learning via Client-Specific Prompt Generation - [2308.15367] [QA].
  • CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation - [2308.15226] [QA].
  • Evaluation and Analysis of Hallucination in Large Vision-Language Models - [2308.15126] [QA].
  • Learning to Upsample by Learning to Sample - [2308.15085] [QA].
  • Class Prior-Free Positive-Unlabeled Learning with Taylor Variational Loss for Hyperspectral Remote Sensing Imagery - [2308.15081] [QA].
  • Exploring Model Transferability through the Lens of Potential Energy - [2308.15074] [QA].
  • Pose-Free Neural Radiance Fields via Implicit Pose Regularization - [2308.15049] [QA].
  • Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models - [2308.15022] [QA].
  • Vision Grid Transformer for Document Layout Analysis - [2308.14978] [QA].
  • LLM-Based Human-Robot Collaboration Framework for Manipulation Tasks - [2308.14972] [QA].
  • Vector Search with OpenAI Embeddings: Lucene Is All You Need - [2308.14963] [QA].
  • Read-only Prompt Optimization for Vision-Language Few-shot Learning - [2308.14960] [QA].
  • NSF: Neural Surface Fields for Human Modeling from Monocular Depth - [2308.14847] [QA].
  • CLNeRF: Continual Learning Meets NeRF - [2308.14816] [QA].
  • Efficient Discovery and Effective Evaluation of Visual Perceptual Similarity: A Benchmark and Beyond - [2308.14753] [QA].
  • AI Deception: A Survey of Examples, Risks, and Potential Solutions - [2308.14752] [QA].
  • R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras - [2308.14713] [QA].
  • S-TREK: Sequential Translation and Rotation Equivariant Keypoints for local feature extraction - [2308.14598] [QA].
  • Referring Image Segmentation Using Text Supervision - [2308.14575] [QA].
  • LAC: Latent Action Composition for Skeleton-based Action Segmentation - [2308.14500] [QA].
  • Priority-Centric Human Motion Generation in Discrete Latent Space - [2308.14480] [QA].
  • Multi-Modal Neural Radiance Field for Monocular Dense SLAM with a Light-Weight ToF Sensor - [2308.14383] [QA].
  • ZhuJiu: A Multi-dimensional, Multi-faceted Chinese Benchmark for Large Language Models - [2308.14353] [QA].
  • DISC-MedLLM: Bridging General Large Language Models and Real-World Medical Consultation - [2308.14346] [QA].
  • Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection - [2308.14286] [QA].
  • HoloFusion: Towards Photo-realistic 3D Generative Modeling - [2308.14244] [QA].
  • High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net - [2308.14221] [QA].
  • Sparse Sampling Transformer with Uncertainty-Driven Ranking for Unified Removal of Raindrops and Rain Streaks - [2308.14153] [QA].
  • Unaligned 2D to 3D Translation with Conditional Vector-Quantized Code Diffusion using Transformers - [2308.14152] [QA].
  • Semi-Supervised Learning in the Few-Shot Zero-Shot Scenario - [2308.14119] [QA].
  • MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records - [2308.14089] [QA].
  • 4D Myocardium Reconstruction with Decoupled Motion and Shape Model - [2308.14083] [QA].
  • Reconstructing Interacting Hands with Interaction Prior from Monocular Images - [2308.14082] [QA].
  • Nonrigid Object Contact Estimation With Regional Unwrapping Transformer - [2308.14074] [QA].
  • Hierarchical Contrastive Learning for Pattern-Generalizable Image Corruption Detection - [2308.14061] [QA].
  • Domain-Specificity Inducing Transformers for Source-Free Domain Adaptation - [2308.14023] [QA].
  • Calibrating Panoramic Depth Estimation for Practical Localization and Mapping - [2308.14005] [QA].
  • LDL: Line Distance Functions for Panoramic Localization - [2308.13989] [QA].
  • Prior-guided Source-free Domain Adaptation for Human Pose Estimation - [2308.13954] [QA].
  • Late Stopping: Avoiding Confidently Learning from Mislabeled Examples - [2308.13862] [QA].
  • Beyond One-to-One: Rethinking the Referring Image Segmentation - [2308.13853] [QA].
  • Point-Query Quadtree for Crowd Counting, Localization, and More - [2308.13814] [QA].
  • ORES: Open-vocabulary Responsible Visual Synthesis - [2308.13785] [QA].
  • Generalized Lightness Adaptation with Channel Selective Normalization - [2308.13783] [QA].
  • MST-compression: Compressing and Accelerating Binary Neural Networks with Minimum Spanning Tree - [2308.13735] [QA].
  • ISR-LLM: Iterative Self-Refined Large Language Model for Long-Horizon Sequential Task Planning - [2308.13724] [QA].
  • Joint Modeling of Feature, Correspondence, and a Compressed Memory for Video Object Segmentation - [2308.13505] [QA].
  • A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance - [2308.13504] [QA].
  • Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers - [2308.13494] [QA].
  • Leveraging Knowledge and Reinforcement Learning for Enhanced Reliability of Language Models - [2308.13467] [QA].
  • Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models - [2308.13437] [QA].
  • Nougat: Neural Optical Understanding for Academic Documents - [2308.13418] [QA].
  • SoTaNa: The Open-Source Software Development Assistant - [2308.13416] [QA].
  • Harvard Glaucoma Detection and Progression: A Multimodal Multitask Dataset and Generalization-Reinforced Semi-Supervised Learning - [2308.13411] [QA].
  • Relighting Neural Radiance Fields with Shadow and Highlight Hints - [2308.13404] [QA].
  • Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs - [2308.13387] [QA].
  • Distribution-Aligned Diffusion for Human Mesh Recovery - [2308.13369] [QA].
  • ConSlide: Asynchronous Hierarchical Interaction Transformer with Breakup-Reorganize Rehearsal for Continual Whole Slide Image Analysis - [2308.13324] [QA].
  • SVQNet: Sparse Voxel-Adjacent Query Network for 4D Spatio-Temporal LiDAR Semantic Segmentation - [2308.13323] [QA].
  • A Game of Bundle Adjustment -- Learning Efficient Convergence - [2308.13270] [QA].
  • Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation - [2308.13266] [QA].
  • Unpaired Multi-domain Attribute Translation of 3D Facial Shapes with a Square and Symmetric Geometric Map - [2308.13245] [QA].
  • Black-box Unsupervised Domain Adaptation with Bi-directional Atkinson-Shiffrin Memory - [2308.13236] [QA].
  • ReST: A Reconfigurable Spatial-Temporal Graph Model for Multi-Camera Multi-Object Tracking - [2308.13229] [QA].
  • MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning - [2308.13218] [QA].
  • IOMatch: Simplifying Open-Set Semi-Supervised Learning with Joint Inliers and Outliers Utilization - [2308.13168] [QA].
  • Diff-Retinex: Rethinking Low-light Image Enhancement with A Generative Diffusion Model - [2308.13164] [QA].
  • SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research - [2308.13149] [QA].
  • OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models - [2308.13137] [QA].
  • MLLM-DataEngine: An Iterative Refinement Approach for MLLM - [2308.13566] [QA].
  • Preserving Modality Structure Improves Multi-Modal Learning - [2308.13077] [QA].
  • NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes - [2308.12967] [QA].
  • Scenimefy: Learning to Craft Anime Scene via Semi-Supervised Image-to-Image Translation - [2308.12968] [QA].
  • Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities - [2308.12966] [QA].
  • Dense Text-to-Image Generation with Attention Modulation - [2308.12964] [QA].
  • MapPrior: Bird's-Eye View Map Layout Estimation with Generative Models - [2308.12963] [QA].
  • Motion-Guided Masking for Spatiotemporal Representation Learning - [2308.12962] [QA].
  • Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment - [2308.12960] [QA].
  • Code Llama: Open Foundation Models for Code - [2308.12950] [QA].
  • Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining? - [2308.12898] [QA].
  • Boosting Semantic Segmentation from the Perspective of Explicit Class Embeddings - [2308.12894] [QA].
  • ToonTalker: Cross-Domain Face Reenactment - [2308.12866] [QA].
  • Fast Adversarial Training with Smooth Convergence - [2308.12857] [QA].
  • On Offline Evaluation of 3D Object Detection for Autonomous Driving - [2308.12779] [QA].
  • LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition - [2308.12774] [QA].
  • VIGC: Visual Instruction Generation and Correction - [2308.12714] [QA].
  • A Parse-Then-Place Approach for Generating Graphic Layouts from Textual Descriptions - [2308.12700] [QA].
  • PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation - [2308.12604] [QA].
  • Logic-induced Diagnostic Reasoning for Semi-supervised Semantic Segmentation - [2308.12595] [QA].
  • Self-supervised Learning of Implicit Shape Representation with Dense Correspondence for Deformable Objects - [2308.12590] [QA].
  • Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language Navigation - [2308.12587] [QA].
  • Hyperbolic Audio-visual Zero-shot Learning - [2308.12558] [QA].
  • Synchronize Feature Extracting and Matching: A Single Branch Framework for 3D Object Tracking - [2308.12549] [QA].
  • CALM: A Multi-task Benchmark for Comprehensive Assessment of Language Model Bias - [2308.12539] [QA].
  • Masked Autoencoders are Efficient Class Incremental Learners - [2308.12510] [QA].
  • CGMI: Configurable General Multi-Agent Interaction Framework - [2308.12503] [QA].
  • With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning - [2308.12383] [QA].
  • Vision Transformer Adapters for Generalizable Multitask Learning - [2308.12372] [QA].
  • AdVerb: Visually Guided Audio Dereverberation - [2308.12370] [QA].
  • Continual Zero-Shot Learning through Semantically Guided Generative Random Walks - [2308.12366] [QA].
  • Diffusion-based Image Translation with Label Guidance for Domain Adaptive Semantic Segmentation - [2308.12350] [QA].
  • Improving Generative Model-based Unfolding with Schrödinger Bridges - [2308.12351] [QA].
  • CHORUS: Learning Canonicalized 3D Human-Object Spatial Relations from Unbounded Synthesized Images - [2308.12288] [QA].
  • Simple is Better and Large is Not Enough: Towards Ensembling of Foundational Language Models - [2308.12272] [QA].
  • Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning - [2308.12219] [QA].
  • SG-Former: Self-guided Transformer with Evolving Token Reallocation - [2308.12216] [QA].
  • CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No - [2308.12213] [QA].
  • Curriculum Learning with Adam: The Devil Is in the Wrong Details - [2308.12202] [QA].
  • Sign Language Translation with Iterative Prototype - [2308.12191] [QA].
  • SILT: Shadow-aware Iterative Label Tuning for Learning to Detect Shadows from Noisy Labels - [2308.12064] [QA].
  • DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration - [2308.12058] [QA].
  • Aligning Language Models with Offline Reinforcement Learning from Human Feedback - [2308.12050] [QA].
  • Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages - [2308.12038] [QA].
  • RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D - [2308.12035] [QA].
  • From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models - [2308.12014] [QA].
  • RankMixup: Ranking-Based Mixup Training for Network Calibration - [2308.11990] [QA].
  • Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields - [2308.11974] [QA].
  • EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE - [2308.11971] [QA].
  • OFVL-MS: Once for Visual Localization across Multiple Indoor Scenes - [2308.11928] [QA].
  • Recovering a Molecule's 3D Dynamics from Liquid-phase Electron Microscopy Movies - [2308.11927] [QA].
  • LFS-GAN: Lifelong Few-Shot Image Generation - [2308.11917] [QA].
  • Semantic-Aware Implicit Template Learning via Part Deformation Consistency - [2308.11916] [QA].
  • ACLS: Adaptive and Conditional Label Smoothing for Network Calibration - [2308.11911] [QA].
  • Camera-Driven Representation Learning for Unsupervised Domain Adaptive Person Re-identification - [2308.11901] [QA].
  • Does Physical Adversarial Example Really Matter to Autonomous Driving? Towards System-Level Effect of Adversarial Object Evasion Attack - [2308.11894] [QA].
  • SUMMIT: Source-Free Adaptation of Uni-Modal Models to Multi-Modal Targets - [2308.11880] [QA].
  • Semi-Supervised Learning via Weight-aware Distillation under Class Distribution Mismatch - [2308.11874] [QA].
  • Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations - [2308.11796] [QA].
  • Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts - [2308.11793] [QA].
  • Understanding Hessian Alignment for Domain Generalization - [2308.11778] [QA].
  • Efficient Controllable Multi-Task Architectures - [2308.11744] [QA].
  • Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape - [2308.11737] [QA].
  • Efficient Benchmarking (of Language Models) - [2308.11696] [QA].
  • Delving into Motion-Aware Matching for Monocular 3D Object Tracking - [2308.11607] [QA].
  • StoryBench: A Multifaceted Benchmark for Continuous Story Visualization - [2308.11606] [QA].
  • SPANet: Frequency-balancing Token Mixer using Spectral Pooling Aggregation Modulation - [2308.11568] [QA].
  • Multi-event Video-Text Retrieval - [2308.11551] [QA].
  • TrackFlow: Multi-Object Tracking with Normalizing Flows - [2308.11513] [QA].
  • Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition - [2308.11489] [QA].
  • Learning a More Continuous Zero Level Set in Unsigned Distance Fields through Level Set Projection - [2308.11441] [QA].
  • A Survey on Large Language Model based Autonomous Agents - [2308.11432] [QA].
  • ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes - [2308.11417] [QA].
  • How Much Temporal Long-Term Context is Needed for Action Segmentation? - [2308.11358] [QA].
  • Exemplar-Free Continual Transformer with Convolutions - [2308.11357] [QA].
  • ProAgent: Building Proactive Cooperative AI with Large Language Models - [2308.11339] [QA].
  • GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training - [2308.11331] [QA].
  • CiteTracker: Correlating Image and Text for Visual Tracking - [2308.11322] [QA].
  • CNN based Cuneiform Sign Detection Learned from Annotated 3D Renderings and Mapped Photographs with Illumination Augmentation - [2308.11277] [QA].
  • HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations - [2308.11261] [QA].
  • ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts - [2308.11236] [QA].
  • LDP-Feat: Image Features with Local Differential Privacy - [2308.11223] [QA].
  • DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment - [2308.11206] [QA].
  • ViLLA: Fine-Grained Vision-Language Representation Learning from Real-World Data - [2308.11194] [QA].
  • Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models - [2308.11186] [QA].
  • MEGA: Multimodal Alignment Aggregation and Distillation For Cinematic Video Segmentation - [2308.11185] [QA].
  • ReFit: Recurrent Fitting Network for 3D Human Recovery - [2308.11184] [QA].
  • Hierarchical Point-based Active Learning for Semi-supervised Point Cloud Semantic Segmentation - [2308.11166] [QA].
  • Domain Generalization via Rationale Invariance - [2308.11158] [QA].
  • Efficient View Synthesis with Neural Radiance Distribution Field - [2308.11130] [QA].
  • LAN-HDR: Luminance-based Alignment Network for High Dynamic Range Video Reconstruction - [2308.11116] [QA].
  • CAME: Contrastive Automated Model Evaluation - [2308.11111] [QA].
  • Recursive Video Lane Detection - [2308.11106] [QA].
  • MosaiQ: Quantum Generative Adversarial Networks for Image Generation on NISQ Computers - [2308.11096] [QA].
  • Video OWL-ViT: Temporally-consistent open-world localization in video - [2308.11093] [QA].
  • Audio-Visual Class-Incremental Learning - [2308.11073] [QA].
  • TeD-SPAD: Temporal Distinctiveness for Self-supervised Privacy-preservation for video Anomaly Detection - [2308.11072] [QA].
  • Neural Amortized Inference for Nested Multi-agent Reasoning - [2308.11071] [QA].
  • MetaGCD: Learning to Continually Learn in Generalized Category Discovery - [2308.11063] [QA].
  • UnLoc: A Unified Framework for Video Localization Tasks - [2308.11062] [QA].
  • Coordinate Quantized Neural Implicit Representations for Multi-view Reconstruction - [2308.11025] [QA].
  • Spectral Graphormer: Spectral Graph-based Transformer for Egocentric Two-Hand Reconstruction using Multi-View Color Images - [2308.11015] [QA].
  • Few-Shot Physically-Aware Articulated Mesh Generation via Hierarchical Deformation - [2308.10898] [QA].
  • Can Language Models Learn to Listen? - [2308.10897] [QA].
  • EigenPlaces: Training Viewpoint Robust Models for Visual Place Recognition - [2308.10832] [QA].
  • Pixel Adaptive Deep Unfolding Transformer for Hyperspectral Image Reconstruction - [2308.10820] [QA].
  • Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers - [2308.10814] [QA].
  • Improving Continuous Sign Language Recognition with Cross-Lingual Signs - [2308.10809] [QA].
  • MGMAE: Motion Guided Masking for Video Masked Autoencoding - [2308.10794] [QA].
  • Instruction Tuning for Large Language Models: A Survey - [2308.10792] [QA].
  • WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models - [2308.10755] [QA].
  • On the Adversarial Robustness of Multi-Modal Foundation Models - [2308.10741] [QA].
  • Patch Is Not All You Need - [2308.10729] [QA].
  • Vanishing Point Estimation in Uncalibrated Images with Prior Gravity Direction - [2308.10694] [QA].
  • Learning Clothing and Pose Invariant 3D Shape Representation for Long-Term Person Re-Identification - [2308.10658] [QA].
  • GaitPT: Skeletons Are All You Need For Gait Recognition - [2308.10623] [QA].
  • A step towards understanding why classification helps regression - [2308.10603] [QA].
  • Image-free Classifier Injection for Zero-Shot Classification - [2308.10599] [QA].
  • CHORD: Category-level Hand-held Object Reconstruction via Shape Deformation - [2308.10574] [QA].
  • Self-Feedback DETR for Temporal Action Detection - [2308.10570] [QA].
  • Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations - [2308.10554] [QA].
  • QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection - [2308.10515] [QA].
  • Large Language Model as a User Simulator - [2308.11534] [QA].
  • Texture Generation on 3D Meshes with Point-UV Diffusion - [2308.10490] [QA].
  • ADNet: Lane Shape Prediction via Anchor Decomposition - [2308.10481] [QA].
  • STEERER: Resolving Scale Variations for Counting and Localization via Selective Inheritance Learning - [2308.10468] [QA].
  • Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models - [2308.10462] [QA].
  • Privacy-Preserving Face Recognition Using Random Frequency Components - [2308.10461] [QA].
  • Explore and Tell: Embodied Visual Captioning in 3D Environments - [2308.10447] [QA].
  • When Prompt-based Incremental Learning Does Not Meet Strong Pretraining - [2308.10445] [QA].
  • X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events - [2308.10441] [QA].
  • GPT-in-the-Loop: Adaptive Decision-Making for Multiagent Systems - [2308.10435] [QA].
  • Diffusion Model as Representation Learner - [2308.10916] [QA].
  • Simple Baselines for Interactive Video Retrieval with Questions and Answers - [2308.10402] [QA].
  • FairBench: A Four-Stage Automatic Framework for Detecting Stereotypes and Biases in Large Language Models - [2308.10397] [QA].
  • Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models - [2308.10379] [QA].
  • LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models - [2308.11462] [QA].
  • Strata-NeRF: Neural Radiance Fields for Stratified Scenes - [2308.10337] [QA].
  • Coordinate Transformer: Achieving Single-stage Multi-person Mesh Recovery from Videos - [2308.10334] [QA].
  • Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting - [2308.10315] [QA].
  • DVGaze: Dual-View Gaze Estimation - [2308.10310] [QA].
  • Representation Disparity-aware Distillation for 3D Object Detection - [2308.10308] [QA].
  • Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation - [2308.10306] [QA].
  • Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video - [2308.10305] [QA].
  • DomainAdaptor: A Novel Approach to Test-time Adaptation - [2308.10297] [QA].
  • DomainDrop: Suppressing Domain-Sensitive Channels for Domain Generalization - [2308.10285] [QA].
  • GPFL: Simultaneously Learning Global and Personalized Feature Information for Personalized Federated Learning - [2308.10279] [QA].
  • CharacterChat: Learning towards Conversational AI with Personalized Social Support - [2308.10278] [QA].
  • Minimalist Traffic Prediction: Linear Layer Is All You Need - [2308.10276] [QA].
  • StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data - [2308.10253] [QA].
  • GeT: Generative Target Structure Debiasing for Domain Adaptation - [2308.10205] [QA].
  • ChatEDA: A Large Language Model Powered Autonomous Agent for EDA - [2308.10204] [QA].
  • ViT-Lens: Towards Omni-modal Representations - [2308.10185] [QA].
  • Neural Interactive Keypoint Detection - [2308.10174] [QA].
  • VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation - [2308.10172] [QA].
  • FashionNTM: Multi-turn Fashion Image Retrieval via Cascaded Memory - [2308.10170] [QA].
  • Unilaterally Aggregated Contrastive Learning with Hierarchical Augmentation for Anomaly Detection - [2308.10155] [QA].
  • A Survey on Fairness in Large Language Models - [2308.10149] [QA].
  • ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer - [2308.10147] [QA].
  • OCHID-Fi: Occlusion-Robust Hand Pose Estimation in 3D via RF-Vision - [2308.10146] [QA].
  • ExpeL: LLM Agents Are Experiential Learners - [2308.10144] [QA].
  • March in Chat: Interactive Prompting for Remote Embodied Referring Expression - [2308.10141] [QA].
  • AutoReP: Automatic ReLU Replacement for Fast Private Network Inference - [2308.10134] [QA].
  • TransFace: Calibrating Transformer Training for Face Recognition from a Data-Centric Perspective - [2308.10133] [QA].
  • 3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation - [2308.10123] [QA].
  • HollowNeRF: Pruning Hashgrid-Based NeRFs with Trainable Collision Mitigation - [2308.10122] [QA].
  • Robust Mixture-of-Expert Training for Convolutional Neural Networks - [2308.10110] [QA].
  • Root Pose Decomposition Towards Generic Non-rigid 3D Reconstruction with Monocular Videos - [2308.10089] [QA].
  • GameEval: Evaluating LLMs on Conversational Games - [2308.10032] [QA].
  • Single Image Reflection Separation via Component Synergy - [2308.10027] [QA].
  • Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation - [2308.10016] [QA].
  • Partition-and-Debias: Agnostic Biases Mitigation via A Mixture of Biases-Specific Experts - [2308.10005] [QA].
  • ClothesNet: An Information-Rich 3D Garment Model Repository with Simulated Clothes Environment - [2308.09987] [QA].
  • FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models - [2308.09975] [QA].
  • Disposable Transfer Learning for Selective Source Task Unlearning - [2308.09971] [QA].
  • Tackling Vision Language Tasks Through Learning Inner Monologues - [2308.09970] [QA].
  • Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos - [2308.09951] [QA].
  • Scene-Aware Feature Matching - [2308.09949] [QA].
  • Weakly-Supervised Action Localization by Hierarchically-structured Latent Attention Modeling - [2308.09946] [QA].
  • On the Robustness of Open-World Test-Time Training: Self-Training with Dynamic Prototype Expansion - [2308.09942] [QA].
  • Understanding Self-attention Mechanism via Dynamical System Perspective - [2308.09939] [QA].
  • BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions - [2308.09936] [QA].
  • MDCS: More Diverse Experts with Consistency Self-distillation for Long-tailed Recognition - [2308.09922] [QA].
  • VI-Net: Boosting Category-level 6D Object Pose Estimation via Learning Decoupled Rotations on the Spherical Representations - [2308.09916] [QA].
  • Scalable Video Object Segmentation with Simplified Framework - [2308.09903] [QA].
  • SwinLSTM: Improving Spatiotemporal Prediction Accuracy using Swin Transformer and LSTM - [2308.09891] [QA].
  • Calibrating Uncertainty for Semi-Supervised Crowd Counting - [2308.09887] [QA].
  • Forecast-MAE: Self-supervised Pre-training for Motion Forecasting with Masked Autoencoders - [2308.09882] [QA].
  • Skill Transformer: A Monolithic Policy for Mobile Manipulation - [2308.09873] [QA].
  • A Theory of Topological Derivatives for Inverse Rendering of Geometry - [2308.09865] [QA].
  • How susceptible are LLMs to Logical Fallacies? - [2308.09853] [QA].
  • Learning from A Single Graph is All You Need for Near-Shortest Path Routing in Wireless Networks - [2308.09829] [QA].
  • VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control - [2308.09804] [QA].
  • Long-range Multimodal Pretraining for Movie Understanding - [2308.09775] [QA].
  • Smoothness Similarity Regularization for Few-Shot GAN Adaptation - [2308.09717] [QA].
  • Robust Monocular Depth Estimation under Challenging Conditions - [2308.09711] [QA].
  • Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment - [2308.09662] [QA].
  • Is context all you need? Scaling Neural Sign Language Translation to Large Domains of Discourse - [2308.09622] [QA].
  • LaRS: A Diverse Panoptic Maritime Obstacle Detection Dataset and Benchmark - [2308.09618] [QA].
  • ChatHaruhi: Reviving Anime Character in Reality via Large Language Model - [2308.09597] [QA].
  • StableVideo: Text-driven Consistency-aware Diffusion Video Editing - [2308.09592] [QA].
  • WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct - [2308.09583] [QA].
  • PUMGPT: A Large Vision-Language Model for Product Understanding - [2308.09568] [QA].
  • Normalization Is All You Need: Understanding Layer-Normalized Federated Learning under Extreme Label Shift - [2308.09565] [QA].
  • Deep Equilibrium Object Detection - [2308.09564] [QA].
  • Meta-ZSDETR: Zero-shot DETR with Meta-learning - [2308.09540] [QA].
  • Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning - [2308.09534] [QA].
  • Leveraging Intrinsic Properties for Non-Rigid Garment Alignment - [2308.09519] [QA].
  • ResQ: Residual Quantization for Video Perception - [2308.09511] [QA].
  • Vision Relation Transformer for Unbiased Scene Graph Generation - [2308.09472] [QA].
  • Scope is all you need: Transforming LLMs for HPC Code - [2308.09440] [QA].
  • MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection - [2308.09421] [QA].
  • Generalizable Decision Boundaries: Dualistic Meta-Learning for Open Set Domain Generalization - [2308.09391] [QA].
  • DReg-NeRF: Deep Registration for Neural Radiance Fields - [2308.09386] [QA].
  • Label-Free Event-based Object Recognition via Joint Learning with Image Reconstruction from Events - [2308.09383] [QA].
  • Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models - [2308.09363] [QA].
  • RLIPv2: Fast Scaling of Relational Language-Image Pre-training - [2308.09351] [QA].
  • Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching - [2308.09346] [QA].
  • Audio-Visual Glance Network for Efficient Video Recognition - [2308.09322] [QA].
  • Towards Attack-tolerant Federated Learning via Critical Parameter Analysis - [2308.09318] [QA].
  • Retro-FPN: Retrospective Feature Pyramid Network for Point Cloud Semantic Segmentation - [2308.09314] [QA].
  • Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge - [2308.09311] [QA].
  • DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability - [2308.09306] [QA].
  • Human Part-wise 3D Motion Context Learning for Sign Language Recognition - [2308.09305] [QA].
  • NAPA-VQ: Neighborhood Aware Prototype Augmentation with Vector Quantization for Continual Learning - [2308.09297] [QA].
  • Self-Calibrated Cross Attention Network for Few-Shot Segmentation - [2308.09294] [QA].
  • Diverse Cotraining Makes Strong Semi-Supervised Segmentor - [2308.09281] [QA].
  • Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos - [2308.09247] [QA].
  • Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos - [2308.09245] [QA].
  • SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos - [2308.09244] [QA].
  • ASAG: Building Strong One-Decoder-Layer Sparse Detectors via Adaptive Sparse Anchor Generation - [2308.09242] [QA].
  • Generalized Sum Pooling for Metric Learning - [2308.09228] [QA].
  • FedPerfix: Towards Partial Model Personalization of Vision Transformers in Federated Learning - [2308.09160] [QA].
  • The Unreasonable Effectiveness of Large Language-Vision Models for Source-free Video Domain Adaptation - [2308.09139] [QA].
  • ImGeoNet: Image-induced Geometry-aware Voxel Representation for Multi-view 3D Object Detection - [2308.09098] [QA].
  • SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning - [2308.09040] [QA].
  • Reinforced Self-Training (ReST) for Language Modeling - [2308.08998] [QA].
  • Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction - [2308.08942] [QA].
  • Identity-Seeking Self-Supervised Representation Learning for Generalizable Person Re-identification - [2308.08887] [QA].
  • Event-Guided Procedure Planning from Instructional Videos with Text Supervision - [2308.08885] [QA].
  • Towards Semi-supervised Learning with Non-random Missing Labels - [2308.08872] [QA].
  • Spatially and Spectrally Consistent Deep Functional Maps - [2308.08871] [QA].
  • D-IF: Uncertainty-aware Human Digitization via Implicit Distribution Field - [2308.08857] [QA].
  • Realistic Full-Body Tracking from Sparse Observations via Joint-Level Modeling - [2308.08855] [QA].
  • CMB: A Comprehensive Medical Benchmark in Chinese - [2308.08833] [QA].
  • Fast Inference and Update of Probabilistic Density Estimation on Trajectory Prediction - [2308.08824] [QA].
  • MixBag: Bag-Level Data Augmentation for Learning from Label Proportions - [2308.08822] [QA].
  • Label Shift Adapter for Test-Time Adaptation under Covariate and Label Shifts - [2308.08810] [QA].
  • Long-Range Grouping Transformer for Multi-View 3D Reconstruction - [2308.08724] [QA].
  • V-FUSE: Volumetric Depth Map Fusion with Long-Range Constraints - [2308.08715] [QA].
  • Dynamic Neural Network is All You Need: Understanding the Robustness of Dynamic Mechanisms in Neural Networks - [2308.08709] [QA].
  • TeCH: Text-guided Reconstruction of Lifelike Clothed Humans - [2308.08545] [QA].
  • MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions - [2308.08544] [QA].
  • Learning to Distill Global Representation for Sparse-View CT - [2308.08463] [QA].
  • ALIP: Adaptive Language-Image Pre-training with Synthetic Caption - [2308.08428] [QA].
  • Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer - [2308.08414] [QA].
  • SIGMA: Scale-Invariant Global Sparse Shape Matching - [2308.08393] [QA].
  • Agglomerative Transformer for Human-Object Interaction Detection - [2308.08370] [QA].
  • Membrane Potential Batch Normalization for Spiking Neural Networks - [2308.08359] [QA].
  • Stable and Causal Inference for Discriminative Self-supervised Deep Visual Representations - [2308.08321] [QA].
  • Dual-Stream Diffusion Net for Text-to-Video Generation - [2308.08316] [QA].
  • SceNeRFlow: Time-Consistent Reconstruction of General Dynamic Scenes - [2308.08258] [QA].
  • MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation - [2308.08239] [QA].
  • Inherent Redundancy in Spiking Neural Networks - [2308.08227] [QA].
  • Low-Light Image Enhancement with Illumination-Aware Gamma Correction and Complete Image Modelling Network - [2308.08220] [QA].
  • Unsupervised Domain Adaptive Detection with Network Stability Analysis - [2308.08182] [QA].
  • Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis - [2308.08157] [QA].
  • AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework - [2308.08155] [QA].
  • GPA-3D: Geometry-aware Prototype Alignment for Unsupervised Domain Adaptive 3D Object Detection from Point Clouds - [2308.08140] [QA].
  • OmniZoomer: Learning to Move and Zoom in on Sphere at High-Resolution - [2308.08114] [QA].
  • View Consistent Purification for Accurate Cross-View Localization - [2308.08110] [QA].
  • Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation - [2308.08090] [QA].
  • DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory - [2308.08089] [QA].
  • Shortcut-V2V: Compression Framework for Video-to-Video Translation based on Temporal Redundancy Reduction - [2308.08011] [QA].
  • Teach LLMs to Personalize -- An Approach inspired by Writing Education - [2308.07968] [QA].
  • CoDeF: Content Deformation Fields for Temporally Consistent Video Processing - [2308.07926] [QA].
  • RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models - [2308.07922] [QA].
  • Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification - [2308.07921] [QA].
  • Helping Hands: An Object-Aware Ego-Centric Video Recognition Model - [2308.07918] [QA].
  • Relightable and Animatable Neural Avatar from Sparse-View Video - [2308.07903] [QA].
  • Through the Lens of Core Competency: Survey on Evaluation of Large Language Models - [2308.07902] [QA].
  • Memory-and-Anticipation Transformer for Online Action Understanding - [2308.07893] [QA].
  • Link-Context Learning for Multimodal LLMs - [2308.07891] [QA].
  • ObjectSDF++: Improved Object-Compositional Neural Implicit Surfaces - [2308.07868] [QA].
  • StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models - [2308.07863] [QA].
  • Robustness Over Time: Understanding Adversarial Examples' Effectiveness on Longitudinal Versions of Large Language Models - [2308.07847] [QA].
  • ImbSAM: A Closer Look at Sharpness-Aware Minimization in Class-Imbalanced Recognition - [2308.07815] [QA].
  • Learning to Identify Critical States for Reinforcement Learning from Videos - [2308.07795] [QA].
  • DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding - [2308.07787] [QA].
  • Identity-Consistent Aggregation for Video Object Detection - [2308.07737] [QA].
  • UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation - [2308.07732] [QA].
  • DiffGuard: Semantic Mismatch-Guided Out-of-Distribution Detection using Pre-trained Diffusion Models - [2308.07687] [QA].
  • Boosting Multi-modal Model Performance with Adaptive Gradient Modulation - [2308.07686] [QA].
  • Attention Is Not All You Need Anymore - [2308.07661] [QA].
  • From Commit Message Generation to History-Aware Commit Message Completion - [2308.07655] [QA].
  • EQ-Net: Elastic Quantization Neural Networks - [2308.07650] [QA].
  • Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval - [2308.07648] [QA].
  • Backpropagation Path Search On Adversarial Transferability - [2308.07625] [QA].
  • Story Visualization by Online Text Augmentation with Context Memory - [2308.07575] [QA].
  • 3DHacker: Spectrum-based Decision Boundary Generation for Hard-label 3D Point Cloud Attack - [2308.07546] [QA].
  • DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation - [2308.07498] [QA].
  • Exploring the Intersection of Large Language Models and Agent-Based Modeling via Prompt Engineering - [2308.07411] [QA].
  • Text Injection for Capitalization and Turn-Taking Prediction in Speech Models - [2308.07395] [QA].
  • PARIS: Part-level Reconstruction and Motion Analysis for Articulated Objects - [2308.07391] [QA].
  • Platypus: Quick, Cheap, and Powerful Refinement of LLMs - [2308.07317] [QA].
  • Jurassic World Remake: Bringing Ancient Fossils Back to Life via Zero-Shot Long Image-to-Image Translation - [2308.07316] [QA].
  • Group Pose: A Simple Baseline for End-to-End Multi-person Pose Estimation - [2308.07313] [QA].
  • The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation - [2308.07286] [QA].
  • Context-Aware Planning and Environment-Aware Memory for Instruction Following Embodied Agents - [2308.07241] [QA].
  • RestoreFormer++: Towards Real-World Blind Face Restoration from Undegraded Key-Value Pairs - [2308.07228] [QA].
  • Unified Data-Free Compression: Pruning and Quantization without Fine-Tuning - [2308.07209] [QA].
  • ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate - [2308.07201] [QA].
  • OctoPack: Instruction Tuning Code Large Language Models - [2308.07124] [QA].
  • CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation - [2308.07146] [QA].
  • Occ$^2$Net: Robust Image Matching Based on 3D Occupancy Estimation for Occluded Regions - [2308.16160] [QA].
  • Mind your Language (Model): Fact-Checking LLMs and their Role in NLP Research and Practice - [2308.07120] [QA].
  • Large Language Models for Information Retrieval: A Survey - [2308.07107] [QA].
  • Masked Motion Predictors are Strong 3D Action Representation Learners - [2308.07092] [QA].
  • S3IM: Stochastic Structural SIMilarity and Its Unreasonable Effectiveness for Neural Fields - [2308.07032] [QA].
  • ACTIVE: Towards Highly Transferable 3D Physical Camouflage for Universal and Robust Vehicle Evasion - [2308.07009] [QA].
  • Global Features are All You Need for Image Retrieval and Reranking - [2308.06954] [QA].
  • Knowing Where to Focus: Event-aware Transformer for Video Grounding - [2308.06947] [QA].
  • CBA: Improving Online Continual Learning via Continual Bias Adaptor - [2308.06925] [QA].
  • CausalLM is not optimal for in-context learning - [2308.06912] [QA].
  • Exploring Lightweight Hierarchical Vision Transformers for Efficient Visual Tracking - [2308.06904] [QA].
  • Towards Open-Set Test-Time Adaptation Utilizing the Wisdom of Crowds in Entropy Minimization - [2308.06879] [QA].
  • SpeechX: Neural Codec Language Model as a Versatile Speech Transformer - [2308.06873] [QA].
  • RMP-Loss: Regularizing Membrane Potential Distribution for Spiking Neural Networks - [2308.06787] [QA].
  • Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning - [2308.06777] [QA].
  • Unsupervised Image Denoising in Real-World Scenarios via Self-Collaboration Parallel Generative Adversarial Branches - [2308.06776] [QA].
  • Dual Meta-Learning with Longitudinally Generalized Regularization for One-Shot Brain Tissue Segmentation Across the Human Lifespan - [2308.06774] [QA].
  • AerialVLN: Vision-and-Language Navigation for UAVs - [2308.06735] [QA].
  • IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models - [2308.06721] [QA].
  • Compositional Feature Augmentation for Unbiased Scene Graph Generation - [2308.06712] [QA].
  • Camouflaged Image Synthesis Is All You Need to Boost Camouflaged Detection - [2308.06701] [QA].
  • Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation - [2308.06693] [QA].
  • Estimator Meets Equilibrium Perspective: A Rectified Straight Through Estimator for Binary Neural Networks Training - [2308.06689] [QA].
  • 3DMOTFormer: Graph Transformer for Online 3D Multi-Object Tracking - [2308.06635] [QA].
  • VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use - [2308.06595] [QA].
  • Cyclic Test-Time Adaptation on Monocular Video for 3D Human Mesh Reconstruction - [2308.06554] [QA].
  • Revisiting Vision Transformer from the View of Path Ensemble - [2308.06548] [QA].
  • SegPrompt: Boosting Open-world Segmentation via Category-level Prompt Learning - [2308.06531] [QA].
  • BEV-DG: Cross-Modal Learning under Bird's-Eye View for Domain Generalization of 3D Semantic Segmentation - [2308.06530] [QA].
  • One-bit Flip is All You Need: When Bit-flip Attack Meets Model Training - [2308.07934] [QA].
  • Tiny and Efficient Model for the Edge Detection Generalization - [2308.06468] [QA].
  • Multi-Label Knowledge Distillation - [2308.06453] [QA].
  • Detecting and Preventing Hallucinations in Large Vision Language Models - [2308.06394] [QA].
  • U-RED: Unsupervised 3D Shape Retrieval and Deformation for Partial Point Clouds - [2308.06383] [QA].
  • Enhancing Network Management Using Code Generated by Large Language Models - [2308.06261] [QA].
  • Self-Alignment with Instruction Backtranslation - [2308.06259] [QA].
  • FunnyBirds: A Synthetic Vision Dataset for a Part-Based Analysis of Explainable AI Methods - [2308.06248] [QA].
  • Exploring Predicate Visual Context in Detecting of Human-Object Interactions - [2308.06202] [QA].
  • Improving Joint Speech-Text Representations Without Alignment - [2308.06125] [QA].
  • Composable Function-preserving Expansions for Transformer Architectures - [2308.06103] [QA].
  • Out-of-Distribution Detection for Monocular Depth Estimation - [2308.06072] [QA].
  • Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning - [2308.06038] [QA].
  • Enhancing Generalization of Universal Adversarial Perturbation through Gradient Aggregation - [2308.06015] [QA].
  • Cyclic-Bootstrap Labeling for Weakly Supervised Object Detection - [2308.05991] [QA].
  • TrajPAC: Towards Robustness Verification of Pedestrian Trajectory Prediction Models - [2308.05985] [QA].
  • BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents - [2308.05960] [QA].
  • Generalizing Event-Based Motion Deblurring in Real-World Scenarios - [2308.05932] [QA].
  • Collaborative Tracking Learning for Frame-Rate-Insensitive Multi-Object Tracking - [2308.05911] [QA].
  • PIPPA: A Partially Synthetic Conversational Dataset - [2308.05884] [QA].
  • PlankAssembly: Robust 3D Reconstruction from Three Orthographic Views with Learnt Shape Programs - [2308.05744] [QA].
  • Follow Anything: Open-set detection, tracking, and following in real-time - [2308.05737] [QA].
  • AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining - [2308.05734] [QA].
  • FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models - [2308.05733] [QA].
  • PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers - [2308.05732] [QA].
  • Hard No-Box Adversarial Attack on Skeleton-Based Human Action Recognition with Skeleton-Motion-Informed Gradient - [2308.05681] [QA].
  • 2D3D-MATR: 2D-3D Matching Transformer for Detection-free Registration between Images and Point Clouds - [2308.05667] [QA].
  • Self-Supervised Monocular Depth Estimation by Direction-aware Cumulative Convolution Network - [2308.05605] [QA].
  • Cross-Domain Product Representation Learning for Rich-Content E-Commerce - [2308.05550] [QA].
  • Look at the Neighbor: Distortion-aware Unsupervised Domain Adaptation for Panoramic Semantic Segmentation - [2308.05493] [QA].
  • LLM As DBA - [2308.05481] [QA].
  • Benchmarking Algorithmic Bias in Face Recognition: An Experimental Approach Using Synthetic Faces and Human Evaluation - [2308.05441] [QA].
  • Deep Fusion Transformer Network with Weighted Vector-Wise Keypoints Voting for Robust 6D Object Pose Estimation - [2308.05438] [QA].
  • SC3K: Self-supervised and Coherent 3D Keypoints Estimation from Rotated, Noisy, and Decimated Point Cloud Data - [2308.05410] [QA].
  • Learning Gabor Texture Features for Fine-Grained Recognition - [2308.05396] [QA].
  • Enhancing Trust in LLM-Based AI Automation Agents: New Considerations and Future Challenges - [2308.05391] [QA].
  • Interaction-aware Joint Attention Estimation Using People Attributes - [2308.05382] [QA].
  • Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment - [2308.05374] [QA].
  • Flexible Isosurface Extraction for Gradient-Based Mesh Optimization - [2308.05371] [QA].
  • Pseudo-label Alignment for Semi-supervised Instance Segmentation - [2308.05359] [QA].
  • OpenProteinSet: Training data for structural biology at scale - [2308.05326] [QA].
  • RLSAC: Reinforcement Learning enhanced Sample Consensus for End-to-End Robust Estimation - [2308.05318] [QA].
  • Alexa, play with robot: Introducing the First Alexa Prize SimBot Challenge on Embodied AI - [2308.05221] [QA].
  • LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation - [2308.05095] [QA].
  • Feature Modulation Transformer: Cross-Refinement of Global Representation via High-Frequency Prior for Image Super-Resolution - [2308.05022] [QA].
  • Robust Object Modeling for Visual Tracking - [2308.05140] [QA].
  • IDiff-Face: Synthetic-based Face Recognition through Fizzy Identity-Conditioned Diffusion Models - [2308.04995] [QA].
  • Foreground Object Search by Distilling Composite Image Feature - [2308.04990] [QA].
  • Prototypical Kernel Learning and Open-set Foreground Perception for Generalized Few-shot Semantic Segmentation - [2308.04952] [QA].
  • SelectNAdapt: Support Set Selection for Few-Shot Domain Adaptation - [2308.04946] [QA].
  • LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking - [2308.04945] [QA].
  • Cross-view Semantic Alignment for Livestreaming Product Recognition - [2308.04912] [QA].
  • MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation - [2308.04829] [QA].
  • WaveNeRF: Wavelet-based Generalizable Neural Radiance Fields - [2308.04826] [QA].
  • Joint-Relation Transformer for Multi-Person Motion Prediction - [2308.04808] [QA].
  • PointMBF: A Multi-scale Bidirectional Fusion Network for Unsupervised RGB-D Point Cloud Registration - [2308.04782] [QA].
  • Objects do not disappear: Video object detection by single-frame object location anticipation - [2308.04770] [QA].
  • Bird's-Eye-View Scene Graph for Vision-Language Navigation - [2308.04758] [QA].
  • JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models - [2308.04729] [QA].
  • GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization - [2308.04699] [QA].
  • Score Priors Guided Deep Variational Inference for Unsupervised Real-World Single Image Denoising - [2308.04682] [QA].
  • Accelerating LLM Inference with Staged Speculative Decoding - [2308.04623] [QA].
  • Rendering Humans from Object-Occluded Monocular Videos - [2308.04622] [QA].
  • Shepherd: A Critic for Language Model Generation - [2308.04592] [QA].
  • LATR: 3D Lane Detection from Monocular Images with Transformer - [2308.04583] [QA].
  • FocalFormer3D: Focusing on Hard Instance for 3D Object Detection - [2308.04556] [QA].
  • Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation - [2308.04549] [QA].
  • SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore - [2308.04430] [QA].
  • DELFlow: Dense Efficient Learning of Scene Flow for Large-Scale Point Clouds - [2308.04383] [QA].
  • 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment - [2308.04352] [QA].
  • A Comparative Study of Code Generation using ChatGPT 3.5 across 10 Programming Languages - [2308.04477] [QA].
  • Lossy and Lossless (L$^2$) Post-training Model Size Compression - [2308.04269] [QA].
  • FLIRT: Feedback Loop In-context Red Teaming - [2308.04265] [QA].
  • Exploring Transformers for Open-world Instance Segmentation - [2308.04206] [QA].
  • D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation - [2308.04197] [QA].
  • Under-Display Camera Image Restoration with Scattering Effect - [2308.04163] [QA].
  • EPCFormer: Expression Prompt Collaboration Transformer for Universal Referring Video Object Segmentation - [2308.04162] [QA].
  • Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions - [2308.04152] [QA].
  • OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation - [2308.04126] [QA].
  • 3D Gaussian Splatting for Real-Time Radiance Field Rendering - [2308.04079] [QA].
  • Enhancing Adversarial Robustness in Low-Label Regime via Adaptively Weighted Regularization and Knowledge Distillation - [2308.04061] [QA].
  • Gentopia: A Collaborative Platform for Tool-Augmented LLMs - [2308.04030] [QA].
  • AgentSims: An Open-Source Sandbox for Large Language Model Evaluation - [2308.04026] [QA].
  • Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning - [2308.04016] [QA].
  • Continual Pre-Training of Large Language Models: How to (re)warm your model? - [2308.04014] [QA].
  • Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval - [2308.04008] [QA].
  • PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection - [2308.03982] [QA].
  • Simple synthetic data reduces sycophancy in large language models - [2308.03958] [QA].
  • TIJO: Trigger Inversion with Joint Optimization for Defending Multimodal Backdoored Models - [2308.03906] [QA].
  • From Sky to the Ground: A Large-scale Benchmark and Simple Baseline Towards Real Rain Removal - [2308.03867] [QA].
  • 3D Motion Magnification: Visualizing Subtle Motions with Time Varying Radiance Fields - [2308.03757] [QA].
  • Tiny LVLM-eHub: Early Multimodal Experiments with Bard - [2308.03729] [QA].
  • Scaling may be all you need for achieving human-level object recognition capacity with human-like visual experience - [2308.03712] [QA].
  • AgentBench: Evaluating LLMs as Agents - [2308.03688] [QA].
  • Learning Concise and Descriptive Attributes for Visual Recognition - [2308.03685] [QA].
  • AvatarVerse: High-quality & Stable 3D Avatar Creation from Text and Pose - [2308.03610] [QA].
  • FeatEnHancer: Enhancing Hierarchical Features for Object Detection and Beyond Under Low-Light Vision - [2308.03594] [QA].
  • AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning - [2308.03526] [QA].
  • Lighting Every Darkness in Two Pairs: A Calibration-Free Pipeline for RAW Denoising - [2308.03448] [QA].
  • TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents - [2308.03427] [QA].
  • RecycleGPT: An Autoregressive Language Model with Recyclable Module - [2308.03421] [QA].
  • GaFET: Learning Geometry-aware Facial Expression Translation from In-The-Wild Images - [2308.03413] [QA].
  • Heterogeneous Forgetting Compensation for Class-Incremental Learning - [2308.03374] [QA].
  • Dual Aggregation Transformer for Image Super-Resolution - [2308.03364] [QA].
  • Foundation Model based Open Vocabulary Task Planning and Executive System for General Purpose Service Robots - [2308.03357] [QA].
  • SciGraphQA: A Large-Scale Synthetic Multi-Turn Question-Answering Dataset for Scientific Graphs - [2308.03349] [QA].
  • Part-Aware Transformer for Generalizable Person Re-identification - [2308.03322] [QA].
  • Studying Large Language Model Generalization with Influence Functions - [2308.03296] [QA].
  • SynJax: Structured Probability Distributions for JAX - [2308.03291] [QA].
  • FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search - [2308.03290] [QA].
  • Multi-Label Self-Supervised Learning with Scene Images - [2308.03286] [QA].
  • Environment-Invariant Curriculum Relation Learning for Fine-Grained Scene Graph Generation - [2308.03282] [QA].
  • Mirror-NeRF: Learning Neural Radiance Fields for Mirrors with Whitted-Style Ray Tracing - [2308.03280] [QA].
  • UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition - [2308.03279] [QA].
  • A Benchmark for Chinese-English Scene Text Image Super-resolution - [2308.03262] [QA].
  • Source-free Domain Adaptive Human Pose Estimation - [2308.03202] [QA].
  • Building Safe and Reliable AI systems for Safety Critical Tasks with Vision-Language Processing - [2308.03176] [QA].
  • CGBA: Curvature-aware Geometric Black-box Attack - [2308.03163] [QA].
  • Prototypes-oriented Transductive Few-shot Learning with Conditional Transport - [2308.03047] [QA].
  • Learning Fine-Grained Features for Pixel-wise Video Correspondences - [2308.03040] [QA].
  • Pre-Trained Large Language Models for Industrial Control - [2308.03028] [QA].
  • Focus the Discrepancy: Intra- and Inter-Correlation Learning for Image Anomaly Detection - [2308.02983] [QA].
  • An Adaptive Model Ensemble Adversarial Attack for Boosting Adversarial Transferability - [2308.02897] [QA].
  • Sketch and Text Guided Diffusion Model for Colored Point Cloud Generation - [2308.02874] [QA].
  • Learning Unified Decompositional and Compositional NeRF for Editable Novel View Synthesis - [2308.02840] [QA].
  • EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education - [2308.02773] [QA].
  • DeDrift: Robust Similarity Search under Content Drift - [2308.02752] [QA].
  • ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation - [2308.03793] [QA].
  • MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities - [2308.02490] [QA].
  • Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP - [2308.02487] [QA].
  • Getting the Ball Rolling: Learning a Dexterous Policy for a Biomimetic Tendon-Driven Hand with Rolling Contact Joints - [2308.02453] [QA].
  • Text2KGBench: A Benchmark for Ontology-Driven Knowledge Graph Generation from Text - [2308.02357] [QA].
  • FB-BEV: BEV Representation from Forward-Backward View Transformations - [2308.02236] [QA].
  • ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation - [2308.02223] [QA].
  • Scaling Clinical Trial Matching Using Large Language Models: A Case Study in Oncology - [2308.02180] [QA].
  • Learning Referring Video Object Segmentation from Weak Annotation - [2308.02162] [QA].
  • Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization - [2308.02151] [QA].
  • Multi-interactive Feature Learning and a Full-time Multi-modality Benchmark for Image Fusion and Segmentation - [2308.02097] [QA].
  • The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World - [2308.01907] [QA].
  • DETR Doesn't Need Multi-Scale or Locality Design - [2308.01904] [QA].
  • ConceptLab: Creative Generation using Diffusion Prior Constraints - [2308.02669] [QA].
  • ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation - [2308.01861] [QA].
  • Scaling Relationship on Learning Mathematical Reasoning with Large Language Models - [2308.01825] [QA].
  • RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension - [2308.02299] [QA].
  • Point2Mask: Point-supervised Panoptic Segmentation via Optimal Transport - [2308.01779] [QA].
  • Ambient Adventures: Teaching ChatGPT on Developing Complex Stories - [2308.01734] [QA].
  • LiDAR-Camera Panoptic Segmentation via Geometry-Consistent and Semantic-Aware Alignment - [2308.01686] [QA].
  • A Multidimensional Analysis of Social Biases in Vision Transformers - [2308.01948] [QA].
  • InterAct: Exploring the Potentials of ChatGPT as a Cooperative Agent - [2308.01552] [QA].
  • Get the Best of Both Worlds: Improving Accuracy and Transferability by Grassmann Class Representation - [2308.01547] [QA].
  • MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies - [2308.01546] [QA].
  • Multimodal Neurons in Pretrained Text-Only Transformers - [2308.01544] [QA].
  • TDMD: A Database for Dynamic Color Mesh Subjective and Objective Quality Explorations - [2308.01499] [QA].
  • Target-point Attention Transformer: A novel trajectory predict network for end-to-end autonomous driving - [2308.01496] [QA].
  • Efficient neural supersampling on a novel gaming dataset - [2308.01483] [QA].
  • HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions - [2308.01477] [QA].
  • Training Data Protection with Compositional Diffusion Models - [2308.01937] [QA].
  • VertexSerum: Poisoning Graph Neural Networks for Link Inference - [2308.01469] [QA].
  • From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion - [2308.02560] [QA].
  • On $κ$-solutions and canonical neighborhoods in 4d Ricci flow - [2308.01448] [QA].
  • OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models - [2308.01390] [QA].
  • DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales - [2308.01320] [QA].
  • Computational Long Exposure Mobile Photography - [2308.01379] [QA].
  • More Context, Less Distraction: Visual Classification by Inferring and Conditioning on Contextual Attributes - [2308.01313] [QA].
  • Revisiting DETR Pre-training for Object Detection - [2308.01300] [QA].
  • XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models - [2308.01263] [QA].
  • A Hyper-pixel-wise Contrastive Learning Augmented Segmentation Network for Old Landslide Detection Using High-Resolution Remote Sensing Images and Digital Elevation Model Data - [2308.01251] [QA].
  • Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation - [2308.01240] [QA].
  • LSF-IDM: Automotive Intrusion Detection Model with Lightweight Attribution and Semantic Fusion - [2308.01237] [QA].
  • Grounded Image Text Matching with Mismatched Relation Reasoning - [2308.01236] [QA].
  • Geometric wakes in collimators and step transitions of arbitrary cross-sections: conformal mapping approach - [2308.01235] [QA].
  • One Tree to Rule Them All: Poly-Logarithmic Universal Steiner Tree - [2308.01199] [QA].
  • Improving Generalization in Visual Reinforcement Learning via Conflict-aware Gradient Agreement Augmentation - [2308.01194] [QA].
  • Towards Understanding the Capability of Large Language Models on Code Clone Detection: A Survey - [2308.01191] [QA].
  • Three-level Dicke quantum battery - [2308.01188] [QA].
  • Multiobjective Optimization of Non-Smooth PDE-Constrained Problems - [2308.01113] [QA].
  • Black hole thermodynamics in Horndeski theories - [2308.01082] [QA].
  • MammoDG: Generalisable Deep Learning Breaks the Limits of Cross-Domain Multi-Center Breast Cancer Screening - [2308.01057] [QA].
  • Stability Analysis for a Class of Heterogeneous Catalysis Models - [2308.01049] [QA].
  • Dynamic Token Pruning in Plain Vision Transformers for Semantic Segmentation - [2308.01045] [QA].
  • An improved infrastructure for the IceCube realtime system - [2308.01031] [QA].
  • Model-agnostic search for the quasinormal modes of gravitational wave echoes - [2308.01017] [QA].
  • Enhancing Representation Learning for Periodic Time Series with Floss: A Frequency Domain Regularization Approach - [2308.01011] [QA].
  • From Sparse to Soft Mixtures of Experts - [2308.00951] [QA].
  • Cosmological Distance Measurement of 12 Nearby Supernovae IIP with ROTSE-IIIB - [2308.00916] [QA].
  • ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation - [2308.00906] [QA].
  • VLUCI: Variational Learning of Unobserved Confounders for Counterfactual Inference - [2308.00904] [QA].
  • Weak localization in radiative transfer of acoustic waves in a randomly-fluctuating slab - [2308.00822] [QA].
  • Optimal design of plane elastic membranes using the convexified Föppl's model - [2308.00811] [QA].
  • Body Knowledge and Uncertainty Modeling for Monocular 3D Human Body Reconstruction - [2308.00799] [QA].
  • LISA: Reasoning Segmentation via Large Language Model - [2308.00692] [QA].
  • Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models - [2308.00675] [QA].
  • Note: Stokes-Einstein relation without hydrodynamic diameter in the TIP4P/Ice water model - [2308.00653] [QA].
  • ELFNet: Evidential Local-global Fusion for Stereo Matching - [2308.00728] [QA].
  • Detecting Cloud Presence in Satellite Images Using the RGB-based CLIP Vision-Language Model - [2308.00541] [QA].
  • Understanding URDF: A Dataset and Analysis - [2308.00514] [QA].
  • Stochastic Geometry Based Modeling and Analysis on Network NOMA in Downlink CoMP Systems - [2308.00499] [QA].
  • A many-sorted epistemic logic for chromatic hypergraphs - [2308.00477] [QA].
  • FLatten Transformer: Vision Transformer using Focused Linear Attention - [2308.00442] [QA].
  • SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning - [2308.00436] [QA].
  • DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving - [2308.00398] [QA].
  • Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning - [2308.02533] [QA].
  • Deep Image Harmonization with Learnable Augmentation - [2308.00376] [QA].
  • Deep Image Harmonization with Globally Guided Feature Transformation and Relation Distillation - [2308.00356] [QA].
  • MetaGPT: Meta Programming for Multi-Agent Collaborative Framework - [2308.00352] [QA].
  • Artifact: Measuring and Mitigating Gaps in Structural Testing - [2308.00316] [QA].
  • Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models - [2308.00304] [QA].
  • Online Prototype Learning for Online Continual Learning - [2308.00301] [QA].
  • CLAMS: A Cluster Ambiguity Measure for Estimating Perceptual Variability in Visual Clustering - [2308.00284] [QA].
  • Improving Pixel-based MIM by Reducing Wasted Modeling Capability - [2308.00261] [QA].
  • GOALS-JWST: Gas Dynamics and Excitation in NGC7469 revealed by NIRSpec - [2308.00209] [QA].

July 2023

  • Predicting masked tokens in stochastic locations improves masked image modeling - [2308.00566] [QA].
  • Learning to Model the World with Language - [2308.01399] [QA].
  • Discovering Adaptable Symbolic Algorithms from Scratch - [2307.16890] [QA].
  • Virtual Prompt Injection for Instruction-Tuned Large Language Models - [2307.16888] [QA].
  • Shortcut Partitions in Minor-Free Graphs: Steiner Point Removal, Distance Oracles, Tree Covers, and More - [2308.00555] [QA].
  • Revisiting the Parameter Efficiency of Adapters from the Perspective of Precision Redundancy - [2307.16867] [QA].
  • Random Sub-Samples Generation for Self-Supervised Real Image Denoising - [2307.16825] [QA].
  • ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs - [2307.16789] [QA].
  • UniVTG: Towards Unified Video-Language Temporal Grounding - [2307.16715] [QA].
  • DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation - [2307.16687] [QA].
  • Guiding Image Captioning Models Toward More Specific Captions - [2307.16686] [QA].
  • Graph Structure from Point Clouds: Geometric Attention is All You Need - [2307.16662] [QA].
  • CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification - [2307.16634] [QA].
  • FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration - [2307.16617] [QA].
  • Transferable Decoding with Visual Entities for Zero-Shot Image Captioning - [2307.16525] [QA].
  • Towards General Low-Light Raw Noise Synthesis and Modeling - [2307.16508] [QA].
  • MovieChat: From Dense Token to Sparse Memory for Long Video Understanding - [2307.16449] [QA].
  • DRAW: Defending Camera-shooted RAW against Image Manipulation - [2307.16418] [QA].
  • DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization - [2307.16415] [QA].
  • Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks - [2307.16395] [QA].
  • JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery - [2307.16377] [QA].
  • LP-MusicCaps: LLM-Based Pseudo Music Captioning - [2307.16372] [QA].
  • AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos? - [2307.16368] [QA].
  • Benchmarking and Analyzing Robust Point Cloud Recognition: Bag of Tricks for Defending Adversarial Examples - [2307.16361] [QA].
  • Evaluating ChatGPT and GPT-4 for Visual Programming - [2308.02522] [QA].
  • Unified Model for Image, Video, Audio and Language Tasks - [2307.16184] [QA].
  • Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models - [2307.16180] [QA].
  • SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension - [2307.16125] [QA].
  • Separate Scene Text Detector for Unseen Scripts is Not All You Need - [2307.15991] [QA].
  • XMem++: Production-level Video Segmentation From Few Annotated Frames - [2307.15958] [QA].
  • CMDA: Cross-Modality Domain Adaptation for Nighttime Semantic Segmentation - [2307.15942] [QA].
  • What can Discriminator do? Towards Box-free Ownership Verification of Generative Adversarial Network - [2307.15860] [QA].
  • RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control - [2307.15818] [QA].
  • The Hydra Effect: Emergent Self-repair in Language Model Computations - [2307.15771] [QA].
  • MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking - [2307.15700] [QA].
  • Scaling Data Generation in Vision-and-Language Navigation - [2307.15644] [QA].
  • Robust Distortion-free Watermarks for Language Models - [2307.15593] [QA].
  • Beating Backdoor Attack at Its Own Game - [2307.15539] [QA].
  • Exploring Format Consistency for Instruction Tuning - [2307.15504] [QA].
  • FeedbackLogs: Recording and Incorporating Stakeholder Feedback into Machine Learning Pipelines - [2307.15475] [QA].
  • Is One Epoch All You Need For Multi-Fidelity Hyperparameter Optimization? - [2307.15422] [QA].
  • Uncertainty-aware Unsupervised Multi-Object Tracking - [2307.15409] [QA].
  • Supervised Homography Learning with Realistic Dataset Generation - [2307.15353] [QA].
  • Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding - [2307.15337] [QA].
  • Dynamic PlenOctree for Adaptive Sampling Refinement in Explicit NeRF - [2307.15333] [QA].
  • TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts - [2307.15324] [QA].
  • Multiple Instance Learning Framework with Masked Hard Instance Mining for Whole Slide Image Classification - [2307.15254] [QA].
  • Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback - [2307.15217] [QA].
  • PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization - [2307.15199] [QA].
  • Med-Flamingo: a Multimodal Medical Few-shot Learner - [2307.15189] [QA].
  • Seal-3D: Interactive Pixel-Level Editing for Neural Radiance Fields - [2307.15131] [QA].
  • To Adapt or Not to Adapt? Real-Time Adaptation for Semantic Segmentation - [2307.15063] [QA].
  • Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation - [2308.07931] [QA].
  • Learning Depth Estimation for Transparent and Mirror Surfaces - [2307.15052] [QA].
  • Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained Vision-Language Models - [2307.15049] [QA].
  • Universal and Transferable Adversarial Attacks on Aligned Language Models - [2307.15043] [QA].
  • TEDi: Temporally-Entangled Diffusion for Long-Term Motion Synthesis - [2307.15042] [QA].
  • Diverse Inpainting and Editing with GAN Inversion - [2307.15033] [QA].
  • SuperCLUE: A Comprehensive Chinese Large Language Model Benchmark - [2307.15020] [QA].
  • How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges - [2307.15016] [QA].
  • Scaling TransNormer to 175 Billion Parameters - [2307.14995] [QA].
  • S$^3$: Social-network Simulation System with Large Language Model-Empowered Agents - [2307.14984] [QA].
  • Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models - [2307.14971] [QA].
  • PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback - [2307.14936] [QA].
  • Seeing through the Brain: Image Reconstruction of Visual Perception from Human Brain Signals - [2308.02510] [QA].
  • Towards Deeply Unified Depth-aware Panoptic Segmentation with Bi-directional Guidance Learning - [2307.14786] [QA].
  • Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining - [2307.14768] [QA].
  • Test Time Adaptation for Blind Image Quality Assessment - [2307.14735] [QA].
  • P2C: Self-Supervised Point Cloud Completion from Single Partial Clouds - [2307.14726] [QA].
  • Pre-training Vision Transformers with Very Limited Synthesized Images - [2307.14710] [QA].
  • Taxonomy Adaptive Cross-Domain Adaptation in Medical Imaging via Optimization Trajectory Distillation - [2307.14709] [QA].
  • 360VOT: A New Benchmark Dataset for Omnidirectional Visual Object Tracking - [2307.14630] [QA].
  • NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection - [2307.14620] [QA].
  • TextManiA: Enriching Visual Feature by Text-driven Manifold Augmentation - [2307.14611] [QA].
  • Clustering based Point Cloud Representation Learning for 3D Analysis - [2307.14605] [QA].
  • Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition - [2307.14535] [QA].
  • MiDaS v3.1 -- A Model Zoo for Robust Monocular Relative Depth Estimation - [2307.14460] [QA].
  • Three Bricks to Consolidate Watermarks for Large Language Models - [2308.00113] [QA].
  • MAMo: Leveraging Memory and Attention for Monocular Video Depth Estimation - [2307.14336] [QA].
  • WavJourney: Compositional Audio Creation with Large Language Models - [2307.14335] [QA].
  • Towards Generalist Biomedical AI - [2307.14334] [QA].
  • G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory - [2307.14277] [QA].
  • Large Language Models are Competitive Near Cold-start Recommenders for Language- and Item-based Preferences - [2307.14225] [QA].
  • ADAPT: Efficient Multi-Agent Trajectory Prediction with Adaptation - [2307.14187] [QA].
  • Creative Birds: Self-Supervised Single-View 3D Style Transfer - [2307.14127] [QA].
  • Leveraging Implicit Feedback from Deployment Data in Dialogue - [2307.14117] [QA].
  • Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching - [2307.14071] [QA].
  • Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models - [2307.14061] [QA].
  • 3D Semantic Subspace Traverser: Empowering 3D Generative Model with Shape Editing Capability - [2307.14051] [QA].
  • Controllable Guide-Space for Generalizable Face Forgery Detection - [2307.14039] [QA].
  • Adaptive Frequency Filters As Efficient Global Token Mixers - [2307.14008] [QA].
  • Tracking Anything in High Quality - [2307.13974] [QA].
  • AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception - [2307.13933] [QA].
  • Spatio-Temporal Domain Awareness for Multi-Agent Collaborative Perception - [2307.13929] [QA].
  • trajdata: A Unified Interface to Multiple Human Trajectory Datasets - [2307.13924] [QA].
  • Points-to-3D: Bridging the Gap between Sparse Points and Shape-Controllable Text-to-3D Generation - [2307.13908] [QA].
  • WebArena: A Realistic Web Environment for Building Autonomous Agents - [2307.13854] [QA].
  • How to Scale Your EMA - [2307.13813] [QA].
  • E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning - [2307.13770] [QA].
  • PlaneRecTR: Unified Query Learning for 3D Plane Recovery from a Single View - [2307.13756] [QA].
  • Foundational Models Defining a New Era in Vision: A Survey and Outlook - [2307.13721] [QA].
  • Composite Diffusion | whole >= Σparts - [2307.13720] [QA].
  • ARB: Advanced Reasoning Benchmark for Large Language Models - [2307.13692] [QA].
  • RecursiveDet: End-to-End Region-based Recursive Object Detection - [2307.13619] [QA].
  • Model Calibration in Dense Classification with Adaptive Label Perturbation - [2307.13539] [QA].
  • Spectrum-guided Multi-granularity Referring Video Object Segmentation - [2307.13537] [QA].
  • Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection - [2307.13529] [QA].
  • FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios - [2307.13528] [QA].
  • Weakly-supervised 3D Pose Transfer with Keypoints - [2307.13459] [QA].
  • Predicting Code Coverage without Execution - [2307.13383] [QA].
  • Unmasking Anomalies in Road-Scene Segmentation - [2307.13316] [QA].
  • LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition - [2307.13269] [QA].
  • Conditional Cross Attention Network for Multi-Space Embedding without Entanglement in Only a SINGLE Network - [2307.13254] [QA].
  • GaPro: Box-Supervised 3D Point Cloud Instance Segmentation Using Gaussian Processes as Pseudo Labelers - [2307.13251] [QA].
  • Strivec: Sparse Tri-Vector Radiance Fields - [2307.13226] [QA].
  • GraspGPT: Leveraging Semantic Knowledge from a Large Language Model for Task-Oriented Grasping - [2307.13204] [QA].
  • Contrastive Example-Based Control - [2307.13101] [QA].
  • LLM-Rec: Personalized Recommendation via Prompting Large Language Models - [2307.15780] [QA].
  • 3D-LLM: Injecting the 3D World into Large Language Models - [2307.12981] [QA].
  • A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models - [2307.12980] [QA].
  • Evaluating the Ripple Effects of Knowledge Editing in Language Models - [2307.12976] [QA].
  • DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting - [2307.12972] [QA].
  • Aligning Large Language Models with Human: A Survey - [2307.12966] [QA].
  • RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment - [2307.12950] [QA].
  • GridMM: Grid Memory Map for Vision-and-Language Navigation - [2307.12907] [QA].
  • A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis - [2307.12856] [QA].
  • Multiscale Video Pretraining for Long-Term Activity Forecasting - [2307.12854] [QA].
  • Fast Full-frame Video Stabilization with Iterative Optimization - [2307.12774] [QA].
  • COCO-O: A Benchmark for Object Detectors under Natural Distribution Shifts - [2307.12730] [QA].
  • Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction - [2307.12729] [QA].
  • MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features - [2307.12698] [QA].
  • PG-RCNN: Semantic Surface Point Generation for 3D Object Detection - [2307.12637] [QA].
  • CTVIS: Consistent Training for Online Video Instance Segmentation - [2307.12616] [QA].
  • Less is More: Focus Attention for Efficient DETR - [2307.12612] [QA].
  • PRIOR: Prototype Representation Joint Learning from Medical Images and Reports - [2307.12577] [QA].
  • A Good Student is Cooperative and Reliable: CNN-Transformer Collaborative Learning for Semantic Segmentation - [2307.12574] [QA].
  • Interpolating between Images with Diffusion Models - [2307.12560] [QA].
  • PUMA: Secure Inference of LLaMA-7B in Five Minutes - [2307.12533] [QA].
  • Cross Contrasting Feature Perturbation for Domain Generalization - [2307.12502] [QA].
  • TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition - [2307.12493] [QA].
  • Rethinking Data Distillation: Do Not Overlook Calibration - [2307.12463] [QA].
  • ProtoFL: Unsupervised Federated Learning via Prototypical Distillation - [2307.12450] [QA].
  • Augmented Box Replay: Overcoming Foreground Shift for Incremental Object Detection - [2307.12427] [QA].
  • Testing Hateful Speeches against Policies - [2307.12418] [QA].
  • Learning Navigational Visual Representations with Semantic Map Supervision - [2307.12335] [QA].
  • TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering - [2307.12291] [QA].
  • Downstream-agnostic Adversarial Examples - [2307.12280] [QA].
  • Geometry-Aware Adaptation for Pretrained Models - [2307.12226] [QA].
  • LoLep: Single-View View Synthesis with Locally-Learned Planes and Self-Attention Occlusion Inference - [2307.12217] [QA].
  • LIST: Learning Implicitly from Spatial Transformers for Single-View 3D Reconstruction - [2307.12194] [QA].
  • Optimized Network Architectures for Large Language Model Training with Billions of Parameters - [2307.12169] [QA].
  • Hallucination Improves the Performance of Unsupervised Visual Representation Learning - [2307.12168] [QA].
  • DIP-RL: Demonstration-Inferred Preference Learning in Minecraft - [2307.12158] [QA].
  • Spatial Self-Distillation for Object Detection with Inaccurate Bounding Boxes - [2307.12101] [QA].
  • Discovering Spatio-Temporal Rationales for Video Question Answering - [2307.12058] [QA].
  • On the Effectiveness of Spectral Discriminators for Perceptual Quality Improvement - [2307.12027] [QA].
  • Learning Vision-and-Language Navigation from YouTube Videos - [2307.11984] [QA].
  • Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels? - [2307.11978] [QA].
  • CARTIER: Cartographic lAnguage Reasoning Targeted at Instruction Execution for Robots - [2307.11865] [QA].
  • HybridAugment++: Unified Frequency Spectra Perturbations for Model Robustness - [2307.11823] [QA].
  • Enhancing CLIP with GPT-4: Harnessing Visual Descriptions as Prompts - [2307.11661] [QA].
  • OxfordTVG-HIC: Can Machine Make Humorous Captions from Images? - [2307.11636] [QA].
  • Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation - [2307.11545] [QA].
  • CopyRNeRF: Protecting the CopyRight of Neural Radiance Fields - [2307.11526] [QA].
  • CORE: Cooperative Reconstruction for Multi-Agent Perception - [2307.11514] [QA].
  • SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view 3D Object Detection - [2307.11477] [QA].
  • Distribution Shift Matters for Knowledge Distillation with Webly Collected Images - [2307.11469] [QA].
  • Strip-MLP: Efficient Token Interaction for Vision MLP - [2307.11458] [QA].
  • Prompting Large Language Models with Speech Recognition Abilities - [2307.11795] [QA].
  • FaceCLIPNeRF: Text-driven 3D Face Manipulation using Deformable Neural Radiance Fields - [2307.11418] [QA].
  • Deep Directly-Trained Spiking Neural Networks for Object Detection - [2307.11411] [QA].
  • Subject-Diffusion:Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning - [2307.11410] [QA].
  • Latent-OFER: Detect, Mask, and Reconstruct with Latent Vectors for Occluded Facial Expression Recognition - [2307.11404] [QA].
  • CLR: Channel-wise Lightweight Reprogramming for Continual Learning - [2307.11386] [QA].
  • What can a Single Attention Layer Learn? A Study Through the Random Features Lens - [2307.11353] [QA].
  • Tuning Pre-trained Model via Moment Probing - [2307.11342] [QA].
  • Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields - [2307.11335] [QA].
  • DPM-OT: A New Diffusion Probabilistic Model Based on Optimal Transport - [2307.11308] [QA].
  • PourIt!: Weakly-supervised Liquid Perception from a Single Image for Visual Closed-Loop Robotic Pouring - [2307.11299] [QA].
  • MAS: Towards Resource-Efficient Federated Multiple-Task Learning - [2307.11285] [QA].
  • Brain2Music: Reconstructing Music from Human Brain Activity - [2307.11078] [QA].
  • AlignDet: Aligning Pre-training and Fine-tuning in Object Detection - [2307.11077] [QA].
  • Cascade-DETR: Delving into High-Quality Universal Object Detection - [2307.11035] [QA].
  • General Image-to-Image Translation with One-Shot Image Guidance - [2307.14352] [QA].
  • Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image - [2307.10984] [QA].
  • Improving Online Lane Graph Extraction by Object-Lane Clustering - [2307.10947] [QA].
  • Proxy Anchor-based Unsupervised Learning for Continuous Generalized Category Discovery - [2307.10943] [QA].
  • PASTA: Pretrained Action-State Transformer Agents - [2307.10936] [QA].
  • FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets - [2307.10928] [QA].
  • Diffusion Sampling with Momentum for Mitigating Divergence Artifacts - [2307.11118] [QA].
  • The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning - [2307.10907] [QA].
  • BlendFace: Re-designing Identity Encoders for Face-Swapping - [2307.10854] [QA].
  • BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion - [2307.10816] [QA].
  • Meta-Transformer: A Unified Framework for Multimodal Learning - [2307.10802] [QA].
  • HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces - [2307.10797] [QA].
  • See More and Know More: Zero-shot Point Cloud Segmentation via Multi-modal Visual Data - [2307.10782] [QA].
  • Urban Radiance Field Representation with Deformable Neural Mesh Primitives - [2307.10776] [QA].
  • Kick Back & Relax: Learning to Reconstruct the World by Watching SlowTV - [2307.10713] [QA].
  • Lighting up NeRF via Unsupervised Decomposition and Enhancement - [2307.10664] [QA].
  • SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models - [2307.10635] [QA].
  • Physics-Driven Turbulence Image Restoration with Stochastic Refinement - [2307.10603] [QA].
  • Flatness-Aware Minimization for Domain Generalization - [2307.11108] [QA].
  • Instruction-following Evaluation through Verbalizer Manipulation - [2307.10558] [QA].
  • EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization - [2307.10554] [QA].
  • TokenFlow: Consistent Diffusion Features for Consistent Video Editing - [2307.10373] [QA].
  • DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering - [2307.10173] [QA].
  • DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI - [2307.10172] [QA].
  • Challenges and Applications of Large Language Models - [2307.10169] [QA].
  • LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs - [2307.10168] [QA].
  • Improving Multimodal Datasets with Image Captioning - [2307.10350] [QA].
  • FABRIC: Personalizing Diffusion Models with Iterative Feedback - [2307.10159] [QA].
  • Android in the Wild: A Large-Scale Dataset for Android Device Control - [2307.10088] [QA].
  • Unsupervised Accuracy Estimation of Deep Visual Models using Domain-Adaptive Adversarial Perturbation without Source Samples - [2307.10062] [QA].
  • MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions - [2307.10008] [QA].
  • Hierarchical Spatio-Temporal Representation Learning for Gait Recognition - [2307.09856] [QA].
  • What do neural networks learn in image classification? A frequency shortcut perspective - [2307.09829] [QA].
  • Density-invariant Features for Distant Point Cloud Registration - [2307.09788] [QA].
  • Text2Layer: Layered Image Generation using Latent Diffusion Model - [2307.09781] [QA].
  • Towards Building More Robust Models with Frequency Bias - [2307.09763] [QA].
  • Generative Prompt Model for Weakly Supervised Object Localization - [2307.09756] [QA].
  • Space Engage: Collaborative Space Supervision for Contrastive-based Semi-Supervised Semantic Segmentation - [2307.09755] [QA].
  • CPCM: Contextual Point Cloud Modeling for Weakly-supervised Point Cloud Semantic Segmentation - [2307.10316] [QA].
  • AesPA-Net: Aesthetic Pattern-Aware Style Transfer Networks - [2307.09724] [QA].
  • Towards Saner Deep Image Registration - [2307.09696] [QA].
  • GlobalMapper: Arbitrary-Shaped Urban Layout Generation - [2307.09693] [QA].
  • Towards A Unified Agent with Foundation Models - [2307.09668] [QA].
  • Object-aware Gaze Target Detection - [2307.09662] [QA].
  • Promoting Exploration in Memory-Augmented Adam using Critical Momenta - [2307.09638] [QA].
  • Conditional 360-degree Image Synthesis for Immersive Indoor Scene Decoration - [2307.09621] [QA].
  • ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning - [2307.09474] [QA].
  • Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla - [2307.09458] [QA].
  • OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation - [2307.09356] [QA].
  • Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis - [2307.09323] [QA].
  • Biomaker CA: a Biome Maker project using Cellular Automata - [2307.09320] [QA].
  • EigenTrajectory: Low-Rank Descriptors for Multi-Modal Trajectory Forecasting - [2307.09306] [QA].
  • Llama 2: Open Foundation and Fine-Tuned Chat Models - [2307.09288] [QA].
  • Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding - [2307.09267] [QA].
  • Augmenting CLIP with Improved Visio-Linguistic Reasoning - [2307.09233] [QA].
  • NU-MCC: Multiview Compressive Coding with Neighborhood Decoder and Repulsive UDF - [2307.09112] [QA].
  • LA-Net: Landmark-Aware Learning for Reliable Facial Expression Recognition under Label Noise - [2307.09023] [QA].
  • How is ChatGPT's behavior changing over time? - [2307.09009] [QA].
  • Ord2Seq: Regarding Ordinal Regression as Label Sequence Prediction - [2307.09004] [QA].
  • Towards Authentic Face Restoration with Iterative Diffusion Models and Beyond - [2307.08996] [QA].
  • Local or Global: Selective Knowledge Assimilation for Federated Learning with Limited Labels - [2307.08809] [QA].
  • Similarity Min-Max: Zero-Shot Day-Night Domain Adaptation - [2307.08779] [QA].
  • GEAR: Augmenting Language Models with Generalizable and Efficient Tool Resolution - [2307.08775] [QA].
  • Diffusion Models Beat GANs on Image Classification - [2307.08702] [QA].
  • AlpaGasus: Training A Better Alpaca with Fewer Data - [2307.08701] [QA].
  • Neural Video Depth Stabilizer - [2307.08695] [QA].
  • TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT - [2307.08674] [QA].
  • Retentive Network: A Successor to Transformer for Large Language Models - [2307.08621] [QA].
  • BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs - [2307.08581] [QA].
  • Scale-Aware Modulation Meet Transformer - [2307.08579] [QA].
  • Does Visual Pretraining Help End-to-End Reasoning? - [2307.08506] [QA].
  • BUS: Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization - [2307.08504] [QA].
  • Cumulative Spatial Knowledge Distillation for Vision Transformers - [2307.08500] [QA].
  • Differentiable Transportation Pruning - [2307.08483] [QA].
  • SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training - [2307.08476] [QA].
  • Not All Steps are Created Equal: Selective Diffusion Distillation for Image Manipulation - [2307.08448] [QA].
  • DOT: A Distillation-Oriented Trainer - [2307.08436] [QA].
  • On the application of Large Language Models for language teaching and assessment technology - [2307.08393] [QA].
  • Dynamic Snake Convolution based on Topological Geometric Constraints for Tubular Structure Segmentation - [2307.08388] [QA].
  • Self-supervised Monocular Depth Estimation: Let's Talk About The Weather - [2307.08357] [QA].
  • ShiftNAS: Improving One-shot NAS via Probability Shift - [2307.08300] [QA].
  • Random Boxes Are Open-world Object Detectors - [2307.08249] [QA].
  • Towards Self-Assembling Artificial Neural Networks through Neural Developmental Programs - [2307.08197] [QA].
  • Measuring Faithfulness in Chain-of-Thought Reasoning - [2307.13702] [QA].
  • Question Decomposition Improves the Faithfulness of Model-Generated Reasoning - [2307.11768] [QA].
  • Feedback is All You Need: Real-World Reinforcement Learning with Approximate Physics-Based Models - [2307.08168] [QA].
  • Planting a SEED of Vision in Large Language Model - [2307.08041] [QA].
  • Multi-Object Discovery by Low-Dimensional Object Motion - [2307.08027] [QA].
  • Householder Projector for Unsupervised Latent Semantics Discovery - [2307.08012] [QA].
  • Towards Viewpoint-Invariant Visual Recognition via Adversarial Training - [2307.10235] [QA].
  • Language Conditioned Traffic Generation - [2307.07947] [QA].
  • Revisiting Domain-Adaptive 3D Object Detection by Reliable, Diverse and Class-balanced Pseudo-Labeling - [2307.07944] [QA].
  • CVSformer: Cross-View Synthesis Transformer for Semantic Scene Completion - [2307.07938] [QA].
  • Communicative Agents for Software Development - [2307.07924] [QA].
  • Is Imitation All You Need? Generalized Decision-Making with Dual-Phase Training - [2307.07909] [QA].
  • Handwritten and Printed Text Segmentation: A Signature Case Study - [2307.07887] [QA].
  • Unified Adversarial Patch for Cross-modal Attacks in the Physical World - [2307.07859] [QA].
  • Adaptive Nonlinear Latent Transformation for Conditional Face Editing - [2307.07790] [QA].
  • Bidirectionally Deformable Motion Modulation For Video-based Human Pose Transfer - [2307.07754] [QA].
  • INVE: Interactive Neural Video Editing - [2307.07663] [QA].
  • RFLA: A Stealthy Reflected Light Adversarial Attack in the Physical World - [2307.07653] [QA].
  • CoTracker: It is Better to Track Together - [2307.07635] [QA].
  • NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis - [2307.07511] [QA].
  • DreamTeacher: Pretraining Image Backbones with Deep Generative Models - [2307.07487] [QA].
  • Multimodal Distillation for Egocentric Action Recognition - [2307.07483] [QA].
  • Improving Zero-Shot Generalization for CLIP with Synthesized Prompts - [2307.07397] [QA].
  • Mitigating Adversarial Vulnerability through Causal Parameter Estimation by Adversarial Double Machine Learning - [2307.07250] [QA].
  • FreeCOS: Self-Supervised Learning from Fractals and Unlabeled Images for Curvilinear Object Segmentation - [2307.07245] [QA].
  • Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts - [2307.07218] [QA].
  • Multimodal Motion Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection - [2307.07205] [QA].
  • Learning to Retrieve In-Context Examples for Large Language Models - [2307.07164] [QA].
  • Bootstrapping Vision-Language Learning with Decoupled Language Pre-training - [2307.07063] [QA].
  • DIALGEN: Collaborative Human-LM Generated Dialogues for Improved Understanding of Human-Human Conversations - [2307.07047] [QA].
  • HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models - [2307.06949] [QA].
  • In-context Autoencoder for Context Compression in a Large Language Model - [2307.06945] [QA].
  • InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation - [2307.06942] [QA].
  • Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation - [2307.06940] [QA].
  • mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs - [2307.06930] [QA].
  • Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models - [2307.06925] [QA].
  • Generating Benchmarks for Factuality Evaluation of Language Models - [2307.06908] [QA].
  • Copy Is All You Need - [2307.06962] [QA].
  • Assessing the Ability of ChatGPT to Screen Articles for Systematic Reviews - [2307.06464] [QA].
  • Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events - [2307.06439] [QA].
  • T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation - [2307.06350] [QA].
  • Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution - [2307.06304] [QA].
  • Instruction Mining: High-Quality Instruction Data Selection for Large Language Models - [2307.06290] [QA].
  • MMBench: Is Your Multi-modal Model an All-around Player? - [2307.06281] [QA].
  • SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning - [2307.06135] [QA].
  • VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View - [2307.06082] [QA].
  • PolyLM: An Open Source Polyglot Large Language Model - [2307.06018] [QA].
  • VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models - [2307.05973] [QA].
  • Giving Robots a Hand: Learning Generalizable Manipulation with Eye-in-Hand Human Video Demonstrations - [2307.05959] [QA].
  • GLA-GCN: Global-local Adaptive Graph Convolutional Network for 3D Human Pose Estimation from Monocular Video - [2307.05853] [QA].
  • Towards Robust and Efficient Continual Language Learning - [2307.05741] [QA].
  • Stack More Layers Differently: High-Rank Training Through Low-Rank Updates - [2307.05695] [QA].
  • Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives - [2307.05473] [QA].
  • Self-consistency for open-ended generations - [2307.06857] [QA].
  • EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone - [2307.05463] [QA].
  • Efficient 3D Articulated Human Generation with Layered Surface Volumes - [2307.05462] [QA].
  • Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features - [2307.05454] [QA].
  • Self-Supervised Learning with Lie Symmetries for Partial Differential Equations - [2307.05432] [QA].
  • Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration - [2307.05300] [QA].
  • Generative Pretraining in Multimodality - [2307.05222] [QA].
  • DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks - [2307.05628] [QA].
  • Test-Time Training on Video Streams - [2307.05014] [QA].
  • Monotone deep Boltzmann machines - [2307.04990] [QA].
  • Secrets of RLHF in Large Language Models Part I: PPO - [2307.04964] [QA].
  • Semantic-SAM: Segment and Recognize Anything at Any Granularity - [2307.04767] [QA].
  • SITTA: A Semantic Image-Text Alignment for Image Captioning - [2307.05591] [QA].
  • Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement - [2307.04751] [QA].
  • RoCo: Dialectic Multi-Robot Collaboration with Large Language Models - [2307.04738] [QA].
  • AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning - [2307.04725] [QA].
  • Large Language Models as General Pattern Machines - [2307.04721] [QA].
  • International Institutions for Advanced AI - [2307.04699] [QA].
  • VampNet: Music Generation via Masked Acoustic Token Modeling - [2307.04686] [QA].
  • AnyTeleop: A General Vision-Based Dexterous Robot Arm-Hand Teleoperation System - [2307.04577] [QA].
  • Improving Factuality of Abstractive Summarization via Contrastive Reward Learning - [2307.04507] [QA].
  • RLTF: Reinforcement Learning from Unit Test Feedback - [2307.04349] [QA].
  • Convex Decomposition of Indoor Scenes - [2307.04246] [QA].
  • Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird's Eye View - [2307.04106] [QA].
  • SVIT: Scaling up Visual Instruction Tuning - [2307.04087] [QA].
  • Toward Interactive Dictation - [2307.04008] [QA].
  • On decoder-only architecture for speech-to-text and large language model integration - [2307.03917] [QA].
  • Large Language Models for Supply Chain Optimization - [2307.03875] [QA].
  • Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation - [2307.03869] [QA].
  • AutoDecoding Latent 3D Diffusion Models - [2307.05445] [QA].
  • Equivariant Single View Pose Prediction Via Induced and Restricted Representations - [2307.03704] [QA].
  • Decomposing the Generalization Gap in Imitation Learning for Visual Robotic Manipulation - [2307.03659] [QA].
  • GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest - [2307.03601] [QA].
  • One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention - [2307.03576] [QA].
  • Discovering Hierarchical Achievements in Reinforcement Learning via Contrastive Learning - [2307.03486] [QA].
  • Solvent: A Framework for Protein Folding - [2307.04603] [QA].
  • Goal-Conditioned Predictive Coding as an Implicit Planner for Offline Reinforcement Learning - [2307.03406] [QA].
  • Teaching Arithmetic to Small Transformers - [2307.03381] [QA].
  • BiPhone: Modeling Inter Language Phonetic Influences in Text - [2307.03322] [QA].
  • Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers - [2307.03183] [QA].
  • Lost in the Middle: How Language Models Use Long Contexts - [2307.03172] [QA].
  • Focused Transformer: Contrastive Training for Context Scaling - [2307.03170] [QA].
  • VideoGLUE: Video General Understanding Evaluation of Foundation Models - [2307.03166] [QA].
  • Distilling Large Vision-Language Model with Out-of-Distribution Generalizability - [2307.03135] [QA].
  • Frontier AI Regulation: Managing Emerging Risks to Public Safety - [2307.03718] [QA].
  • A Survey on Evaluation of Large Language Models - [2307.03109] [QA].
  • Improving Retrieval-Augmented Large Language Models via Data Importance Learning - [2307.03027] [QA].
  • Style Over Substance: Evaluation Biases for Large Language Models - [2307.03025] [QA].
  • Contrast Is All You Need - [2307.02882] [QA].
  • What Should Data Science Education Do with Large Language Models? - [2307.02792] [QA].
  • Training Models to Generate, Recognize, and Reframe Unhelpful Thoughts - [2307.02768] [QA].
  • Wireless Multi-Agent Generative AI: From Connected Intelligence to Collective Intelligence - [2307.02757] [QA].
  • SkipDecode: Autoregressive Skip Decoding with Batching and Caching for Efficient LLM Inference - [2307.02628] [QA].
  • LongNet: Scaling Transformers to 1,000,000,000 Tokens - [2307.02486] [QA].
  • Building Cooperative Embodied Agents Modularly with Large Language Models - [2307.02485] [QA].
  • Elastic Decision Transformer - [2307.02484] [QA].
  • Jailbroken: How Does LLM Safety Training Fail? - [2307.02483] [QA].
  • Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks - [2307.02477] [QA].
  • What Matters in Training a GPT4-Style Language Model with Multimodal Inputs? - [2307.02469] [QA].
  • Using Rewrite Strategies for Efficient Functional Automatic Differentiation - [2307.02447] [QA].
  • DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models - [2307.02421] [QA].
  • MSViT: Dynamic Mixed-Scale Tokenization for Vision Transformers - [2307.02321] [QA].
  • Rethinking Multiple Instance Learning for Whole Slide Image Classification: A Good Instance Classifier is All You Need - [2307.02249] [QA].
  • Open-Source Large Language Models Outperform Crowd Workers and Approach ChatGPT in Text-Annotation Tasks - [2307.02179] [QA].
  • Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning - [2307.03692] [QA].
  • Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning - [2307.02053] [QA].
  • SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis - [2307.01952] [QA].
  • Physics-based Motion Retargeting from Sparse Inputs - [2307.01938] [QA].
  • Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners - [2307.01928] [QA].
  • Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning - [2307.01849] [QA].
  • Embodied Task Planning with Large Language Models - [2307.01848] [QA].
  • Collaborative Score Distillation for Consistent Visual Synthesis - [2307.04787] [QA].
  • DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation - [2307.01831] [QA].
  • Pretraining is All You Need: A Multi-Atlas Enhanced Transformer Framework for Autism Spectrum Disorder Classification - [2307.01759] [QA].
  • Synthetic is all you need: removing the auxiliary data assumption for membership inference attacks against synthetic data - [2307.01701] [QA].
  • mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding - [2307.02499] [QA].
  • ChildPlay: A New Benchmark for Understanding Children's Gaze Behaviour - [2307.01630] [QA].
  • On Hofstadter's G-sequence - [2307.1471] [QA].
  • Hybrid two-level MCMC for Bayesian Inverse Problems - [2307.1463] [QA].
  • Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection - [2307.1462] [QA].
  • Multi-Task Learning Improves Performance In Deep Argument Mining Models - [2307.1401] [QA].
  • EIGER IV: The cool 10$^4$K circumgalactic environment of high-$z$ galaxies reveals remarkably efficient IGM enrichment - [2307.1273] [QA].
  • Real-time Monocular Full-body Capture in World Space via Sequential Proxy-to-Motion Learning - [2307.01200] [QA].
  • Segment Anything Meets Point Tracking - [2307.01197] [QA].
  • Variational integrals on Hessian spaces: partial regularity for critical points - [2307.1191] [QA].
  • Characterisation of three-body loss in ${}^{166}$Er and optimised production of large Bose-Einstein condensates - [2307.1245] [QA].
  • Improving Language Plasticity via Pretraining with Active Forgetting - [2307.01163] [QA].
  • SCITUNE: Aligning Large Language Models with Scientific Multimodal Instructions - [2307.01139] [QA].
  • MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion - [2307.01097] [QA].
  • Scalable quantum neural networks by few quantum resources - [2307.1017] [QA].
  • Visual Instruction Tuning with Polite Flamingo - [2307.01003] [QA].
  • NOMA-Assisted Grant-Free Transmission: How to Design Pre-Configured SNR Levels? - [2307.0990] [QA].
  • Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset - [2307.00818] [QA].
  • SketchMetaFace: A Learning-based Sketching Interface for High-fidelity 3D Character Face Modeling - [2307.00804] [QA].
  • EmoGen: Eliminating Subjective Bias in Emotional Music Generation - [2307.01229] [QA].
  • JourneyDB: A Benchmark for Generative Image Understanding - [2307.00716] [QA].
  • LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance - [2307.00522] [QA].
  • Almost sure bounds for a weighted Steinhaus random multiplicative function - [2307.0499] [QA].
  • One Copy Is All You Need: Resource-Efficient Streaming of Medical Imaging Data at Scale - [2307.00438] [QA].
  • ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models - [2307.00398] [QA].
  • DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment - [2307.00329] [QA].
  • Personality Traits in Large Language Models - [2307.00184] [QA].

June 2023

  • Meta-training with Demonstration Retrieval for Efficient Few-shot Learning - [2307.00119] [QA].
  • Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control - [2307.00117] [QA].
  • Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing - [2306.17848] [QA].
  • Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors - [2306.17843] [QA].
  • SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs - [2306.17842] [QA].
  • Statler: State-Maintaining Language Models for Embodied Reasoning - [2306.17840] [QA].
  • DisCo: Disentangled Control for Referring Human Dance Generation in Real World - [2307.00040] [QA].
  • Stay on topic with Classifier-Free Guidance - [2306.17806] [QA].
  • Topologically Attributed Graphs for Shape Discrimination - [2306.17805] [QA].
  • The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit - [2306.17759] [QA].
  • Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting - [2306.17563] [QA].
  • Preference Ranking Optimization for Human Alignment - [2306.17492] [QA].
  • ReMaX: Relaxing for Better Training on Efficient Panoptic Segmentation - [2306.17319] [QA].
  • Towards Zero-Shot Scale-Aware Monocular Depth Estimation - [2306.17253] [QA].
  • Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors - [2306.17156] [QA].
  • Generate Anything Anywhere in Any Scene - [2306.17154] [QA].
  • Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation - [2306.17115] [QA].
  • LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding - [2306.17107] [QA].
  • End-to-end Autonomous Driving: Challenges and Frontiers - [2306.16927] [QA].
  • BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion - [2306.16940] [QA].
  • DreamDiffusion: Generating High-Quality Images from Brain EEG Signals - [2306.16934] [QA].
  • One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization - [2306.16928] [QA].
  • NeuralFuse: Learning to Improve the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes - [2306.16869] [QA].
  • ArrayBot: Reinforcement Learning for Generalizable Distributed Manipulation through Touch - [2306.16857] [QA].
  • Benchmarking Large Language Model Capabilities for Conditional Generation - [2306.16793] [QA].
  • Dynamic-Resolution Model Learning for Object Pile Manipulation - [2306.16700] [QA].
  • KITE: Keypoint-Conditioned Policies for Semantic Manipulation - [2306.16605] [QA].
  • An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs - [2306.16601] [QA].
  • LLM Calibration and Automatic Hallucination Detection via Pareto Optimal Self-supervision - [2306.16564] [QA].
  • Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language - [2306.16410] [QA].
  • On the Exploitability of Instruction Tuning - [2306.17194] [QA].
  • Towards Measuring the Representation of Subjective Global Opinions in Language Models - [2306.16388] [QA].
  • Inferring the Goals of Communicating Agents from Actions and Instructions - [2306.16207] [QA].
  • SVNR: Spatially-variant Noise Removal with Denoising Diffusion - [2306.16052] [QA].
  • Positive Label Is All You Need for Multi-Label Classification - [2306.16016] [QA].
  • Accelerating Transducers through Adjacent Token Merging - [2306.16009] [QA].
  • Confidence Ranking for CTR Prediction - [2307.1206] [QA].
  • Subclass-balancing Contrastive Learning for Long-tailed Recognition - [2306.15925] [QA].
  • Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias - [2306.15895] [QA].
  • HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution - [2306.15794] [QA].
  • REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction - [2306.15724] [QA].
  • PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment - [2306.15667] [QA].
  • CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a $10,000 Budget; An Extra $4,000 Unlocks 81.8% Accuracy - [2306.15658] [QA].
  • Asynchronous Algorithmic Alignment with Cocycles - [2306.15632] [QA].
  • LeanDojo: Theorem Proving with Retrieval-Augmented Language Models - [2306.15626] [QA].
  • Extending Context Window of Large Language Models via Positional Interpolation - [2306.15595] [QA].
  • Explainable Multimodal Emotion Reasoning - [2306.15401] [QA].
  • Length Generalization in Arithmetic Transformers - [2306.15400] [QA].
  • 3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement - [2306.15354] [QA].
  • MindDial: Belief Dynamics Tracking with Theory-of-Mind Modeling for Situated Neural Dialogue Generation - [2306.15253] [QA].
  • Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic - [2306.15195] [QA].
  • MIMIC: Masked Image Modeling with Image Correspondences - [2306.15128] [QA].
  • Understanding In-Context Learning via Supportive Pretraining Data - [2306.15091] [QA].
  • RVT: Robotic View Transformer for 3D Object Manipulation - [2306.14896] [QA].
  • Supervised Pretraining Can Learn In-Context Reinforcement Learning - [2306.14892] [QA].
  • Restart Sampling for Improving Generative Processes - [2306.14878] [QA].
  • Are aligned neural networks adversarially aligned? - [2306.15447] [QA].
  • ViNT: A Foundation Model for Visual Navigation - [2306.14846] [QA].
  • Kosmos-2: Grounding Multimodal Large Language Models to the World - [2306.14824] [QA].
  • MotionGPT: Human Motion as a Foreign Language - [2306.14795] [QA].
  • SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality - [2306.14610] [QA].
  • Aligning Large Multi-Modal Model with Robust Instruction Tuning - [2306.14565] [QA].
  • A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis - [2306.14544] [QA].
  • CEIL: Generalized Contextual Imitation Learning - [2306.14534] [QA].
  • ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks - [2306.14525] [QA].
  • RoboCook: Long-Horizon Elasto-Plastic Object Manipulation with Diverse Tools - [2306.14447] [QA].
  • DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing - [2306.14435] [QA].
  • Faster Segment Anything: Towards Lightweight SAM for Mobile Applications - [2306.14289] [QA].
  • BiFF: Bi-level Future Fusion with Polyline-based Coordinate for Interactive Trajectory Prediction - [2306.14161] [QA].
  • DomainStudio: Fine-Tuning Diffusion Models for Domain-Driven Image Generation using Limited Data - [2306.14153] [QA].
  • Language models are weak learners - [2306.14101] [QA].
  • SEEDS: Emulation of Weather Forecast Ensembles with Diffusion Models - [2306.14066] [QA].
  • DesCo: Learning Object Recognition with Rich Language Descriptions - [2306.14060] [QA].
  • H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models - [2306.14048] [QA].
  • Thinking Like an Annotator: Generation of Dataset Labeling Instructions - [2306.14035] [QA].
  • Cross-Validation Is All You Need: A Statistical Approach To Label Noise Estimation - [2306.13990] [QA].
  • Beyond Scale: the Diversity Coefficient as a Data Quality Metric Demonstrates LLMs are Pre-trained on Formally Diverse Data - [2306.13840] [QA].
  • LLM-Assisted Content Analysis: Using Large Language Models to Support Deductive Coding - [2306.14924] [QA].
  • Swin-Free: Achieving Better Cross-Window Attention and Efficiency with Size-varying Window - [2306.13776] [QA].
  • Zero-shot spatial layout conditioning for text-to-image diffusion models - [2306.13754] [QA].
  • Bring Your Own Data! Self-Supervised Evaluation for Large Language Models - [2306.13651] [QA].
  • GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models - [2306.13649] [QA].
  • OpenMask3D: Open-Vocabulary 3D Instance Segmentation - [2306.13631] [QA].
  • System-Level Natural Language Feedback - [2306.13588] [QA].
  • Scaling MLPs: A Tale of Inductive Bias - [2306.13575] [QA].
  • A Survey on Multimodal Large Language Models - [2306.13549] [QA].
  • DreamEditor: Text-Driven 3D Scene Editing with Neural Fields - [2306.13455] [QA].
  • Long-range Language Modeling with Self-retrieval - [2306.13421] [QA].
  • MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models - [2306.13394] [QA].
  • Evading Forensic Classifiers with Attribute-Conditioned Adversarial Faces - [2306.13091] [QA].
  • Continuous Layout Editing of Single Images with Diffusion Models - [2306.13078] [QA].
  • Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing - [2306.12929] [QA].
  • AudioPaLM: A Large Language Model That Can Speak and Listen - [2306.12925] [QA].
  • Learning from Visual Observation via Offline Pretrained State-to-Go Transformer - [2306.12860] [QA].
  • Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields - [2306.12760] [QA].
  • SoftGPT: Learn Goal-oriented Soft Object Manipulation Skills by Generative Pre-trained Heterogeneous Graph Transformer - [2306.12677] [QA].
  • From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought - [2306.12672] [QA].
  • Towards Regulatable AI Systems: Technical Gaps and Policy Opportunities - [2306.12609] [QA].
  • Local 3D Editing via 3D Distillation of CLIP Knowledge - [2306.12570] [QA].
  • FFCV: Accelerating Training by Removing Data Bottlenecks - [2306.12517] [QA].
  • Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference - [2306.12509] [QA].
  • DreamTime: An Improved Optimization Strategy for Text-to-3D Content Creation - [2306.12422] [QA].
  • OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents - [2306.16527] [QA].
  • Fast Segment Anything - [2306.12156] [QA].
  • Mass-Producing Failures of Multimodal Systems with Language Models - [2306.12105] [QA].
  • HSR-Diff: Hyperspectral Image Super-Resolution via Conditional Diffusion Models - [2307.12085] [QA].
  • EquiformerV2: Improved Equivariant Transformer for Scaling to Higher-Degree Representations - [2306.12059] [QA].
  • Training Transformers with 4-bit Integers - [2306.11987] [QA].
  • Opportunities and Risks of LLMs for Scalable Deliberation with Polis - [2306.11932] [QA].
  • Randomized Quantization is All You Need for Differential Privacy in Federated Learning - [2306.11913] [QA].
  • SPRINT: Scalable Policy Pre-Training via Language Instruction Relabeling - [2306.11886] [QA].
  • Diffusion with Forward Models: Solving Stochastic Inverse Problems Without Direct Supervision - [2306.11719] [QA].
  • RoboCat: A Self-Improving Foundation Agent for Robotic Manipulation - [2306.11706] [QA].
  • Textbooks Are All You Need - [2306.11644] [QA].
  • Improving Image Captioning Descriptiveness by Ranking and LLM-based Fusion - [2306.11593] [QA].
  • HomeRobot: Open-Vocabulary Mobile Manipulation - [2306.11565] [QA].
  • Improving visual image reconstruction from human brain activity using latent diffusion models via multiple decoded inputs - [2306.11536] [QA].
  • RM-PRT: Realistic Robotic Manipulation Simulator and Benchmark with Progressive Reasoning Tasks - [2306.11335] [QA].
  • Dynamic Perceiver for Efficient Visual Recognition - [2306.11248] [QA].
  • Quilt-1M: One Million Image-Text Pairs for Histopathology - [2306.11207] [QA].
  • Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset - [2306.11167] [QA].
  • FSAR: Federated Skeleton-based Action Recognition with Adaptive Topology Structure and Knowledge Distillation - [2306.11046] [QA].
  • RepoFusion: Training Code Models to Understand Your Repository - [2306.10998] [QA].
  • BayLing: Bridging Cross-lingual Alignment and Instruction Following through Interactive Translation for Large Language Models - [2306.10968] [QA].
  • MotionGPT: Finetuned LLMs are General-Purpose Motion Generators - [2306.10900] [QA].
  • 3D VR Sketch Guided 3D Shape Prototyping and Exploration - [2306.10830] [QA].
  • Multitrack Music Transcription with a Time-Frequency Perceiver - [2306.10785] [QA].
  • Guiding Language Models of Code with Global Context using Monitors - [2306.10763] [QA].
  • UniMC: A Unified Framework for Long-Term Memory Conversation via Relevance Representation Learning - [2306.10543] [QA].
  • Point-Cloud Completion with Pretrained Text-to-image Diffusion Models - [2306.10533] [QA].
  • CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents - [2306.10376] [QA].
  • GLIMMER: generalized late-interaction memory reranker - [2306.10231] [QA].
  • ZeRO++: Extremely Efficient Collective Communication for Giant Model Training - [2306.10209] [QA].
  • Meta-Personalizing Vision-Language Models to Find Named Instances in Video - [2306.10169] [QA].
  • MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing - [2306.10012] [QA].
  • CLIP2Protect: Protecting Facial Privacy using Text-Guided Makeup via Adversarial Latent Search - [2306.10008] [QA].
  • Robot Learning with Sensorimotor Pre-training - [2306.10007] [QA].
  • Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering - [2306.09996] [QA].
  • Evaluating Superhuman Models with Consistency Checks - [2306.09983] [QA].
  • LabelBench: A Comprehensive Framework for Benchmarking Label-Efficient Learning - [2306.09910] [QA].
  • Demystifying GPT Self-Repair for Code Generation - [2306.09896] [QA].
  • AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation - [2306.09864] [QA].
  • Full Parameter Fine-tuning for Large Language Models with Limited Resources - [2306.09782] [QA].
  • Gradient is All You Need? - [2306.09778] [QA].
  • Scaling Open-Vocabulary Object Detection - [2306.09683] [QA].
  • OCTScenes: A Versatile Real-World Dataset of Tabletop Scenes for Object-Centric Learning - [2306.09682] [QA].
  • CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models - [2306.09635] [QA].
  • CAJun: Continuous Adaptive Jumping using a Learned Centroidal Controller - [2306.09557] [QA].
  • Block-State Transformer - [2306.09539] [QA].
  • Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models - [2306.11732] [QA].
  • Inverse Scaling: When Bigger Isn't Better - [2306.09479] [QA].
  • Explore, Establish, Exploit: Red Teaming Language Models from Scratch - [2306.09442] [QA].
  • Seeing the World through Your Eyes - [2306.09348] [QA].
  • UrbanIR: Large-Scale Urban Scene Inverse Rendering from a Single Video - [2306.09349] [QA].
  • Rosetta Neurons: Mining the Common Units in a Model Zoo - [2306.09346] [QA].
  • Evaluating Data Attribution for Text-to-Image Models - [2306.09345] [QA].
  • Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis - [2306.09341] [QA].
  • DreamHuman: Animatable 3D Avatars from Text - [2306.09329] [QA].
  • Language-Guided Music Recommendation for Video via Prompt Analogies - [2306.09327] [QA].
  • Neural Relighting with Subsurface Scattering by Learning the Radiance Transfer Gradient - [2306.09322] [QA].
  • Diffusion Models for Zero-Shot Open-Vocabulary Segmentation - [2306.09316] [QA].
  • Can Language Models Teach Weaker Agents? Teacher Explanations Improve Students via Theory of Mind - [2306.09299] [QA].
  • KoLA: Carefully Benchmarking World Knowledge of Large Language Models - [2306.09296] [QA].
  • A9 Intersection Dataset: All You Need for Urban 3D Camera-LiDAR Roadside Perception - [2306.09266] [QA].
  • LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models - [2306.09265] [QA].
  • Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories - [2306.09224] [QA].
  • CMMLU: Measuring massive multitask language understanding in Chinese - [2306.09212] [QA].
  • NAVI: Category-Agnostic Image Collections with High-Quality 3D Shape and Pose Annotations - [2306.09109] [QA].
  • Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration - [2306.09093] [QA].
  • Behavioral Cloning via Search in Embedded Demonstration Dataset - [2306.09082] [QA].
  • Re-Benchmarking Pool-Based Active Learning for Binary Classification - [2306.08954] [QA].
  • LOVM: Language-Only Vision Model Selection - [2306.08893] [QA].
  • EPIC Fields: Marrying 3D Geometry and Video Understanding - [2306.08731] [QA].
  • VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing - [2306.08707] [QA].
  • Toward Grounded Social Reasoning - [2306.08651] [QA].
  • Language to Rewards for Robotic Skill Synthesis - [2306.08647] [QA].
  • Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models - [2306.08641] [QA].
  • AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn - [2306.08640] [QA].
  • TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement - [2306.08637] [QA].
  • Anticipatory Music Transformer - [2306.08620] [QA].
  • WizardCoder: Empowering Code Large Language Models with Evol-Instruct - [2306.08568] [QA].
  • Knowledge Distillation of Large Language Models - [2306.08543] [QA].
  • TryOnDiffusion: A Tale of Two UNets - [2306.08276] [QA].
  • Contrastive Loss is All You Need to Recover Analogies as Parallel Lines - [2306.08221] [QA].
  • Agile Catching with Whole-Body MPC and Blackbox Policy Learning - [2306.08205] [QA].
  • h2oGPT: Democratizing Large Language Models - [2306.08161] [QA].
  • Large-scale Language Model Rescoring on Long-form Data - [2306.08133] [QA].
  • AVIS: Autonomous Visual Information Seeking with Large Language Models - [2306.08129] [QA].
  • DORSal: Diffusion for Object-centric Representations of Scenes $\textit{et al.}$ - [2306.08068] [QA].
  • Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training - [2306.08055] [QA].
  • Efficient 3D Semantic Segmentation with Superpoint Transformer - [2306.08045] [QA].
  • Neural Scene Chronology - [2306.07970] [QA].
  • GeneCIS: A Benchmark for General Conditional Image Similarity - [2306.07969] [QA].
  • arXiVeri: Automatic table verification with GPT - [2306.07968] [QA].
  • One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning - [2306.07967] [QA].
  • Hidden Biases of End-to-End Driving Models - [2306.07957] [QA].
  • Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation - [2306.07954] [QA].
  • Questioning the Survey Responses of Large Language Models - [2306.07951] [QA].
  • Image Captioners Are Scalable Vision Learners Too - [2306.07915] [QA].
  • WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences - [2306.07906] [QA].
  • Viewset Diffusion: (0-)Image-Conditioned 3D Generative Models from 2D Data - [2306.07881] [QA].
  • Area is all you need: repeatable elements make stronger adversarial attacks - [2306.07768] [QA].
  • E2E-LOAD: End-to-End Long-form Online Action Detection - [2306.07703] [QA].
  • SayTap: Language to Quadrupedal Locomotion - [2306.07580] [QA].
  • Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second - [2306.07552] [QA].
  • TART: A plug-and-play Transformer module for task-agnostic reasoning - [2306.07536] [QA].
  • Require Process Control? LSTMc is all you need! - [2306.07510] [QA].
  • AniFaceDrawing: Anime Portrait Exploration during Your Sketching - [2306.07476] [QA].
  • 3D molecule generation by denoising voxel grids - [2306.07473] [QA].
  • Instant Multi-View Head Capture through Learnable Registration - [2306.07437] [QA].
  • Controlling Text-to-Image Diffusion by Orthogonal Finetuning - [2306.07280] [QA].
  • Scalable 3D Captioning with Pretrained Models - [2306.07279] [QA].
  • Retrieval-Enhanced Contrastive Vision-Text Models - [2306.07196] [QA].
  • Benchmarking Neural Network Training Algorithms - [2306.07179] [QA].
  • Augmenting Language Models with Long-Term Memory - [2306.07174] [QA].
  • Transformers learn through gradual rank increase - [2306.07042] [QA].
  • Small Temperature is All You Need for Differentiable Architecture Search - [2306.06855] [QA].
  • Weakly supervised information extraction from inscrutable handwritten document images - [2306.06823] [QA].
  • Attention, Compilation, and Solver-based Symbolic Analysis are All You Need - [2306.06755] [QA].
  • LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark - [2306.06687] [QA].
  • Face0: Instantaneously Conditioning a Text-to-Image Model on a Face - [2306.06638] [QA].
  • RestGPT: Connecting Large Language Models with Real-World RESTful APIs - [2306.06624] [QA].
  • High-Fidelity Audio Compression with Improved RVQGAN - [2306.06546] [QA].
  • Learning Image-Adaptive Codebooks for Class-Agnostic Image Restoration - [2306.06513] [QA].
  • Aladdin: Zero-Shot Hallucination of Stylized 3D Assets from Abstract Scene Descriptions - [2306.06212] [QA].
  • FasterViT: Fast Vision Transformers with Hierarchical Attention - [2306.06189] [QA].
  • Value function estimation using conditional diffusion models for control - [2306.07290] [QA].
  • Realistic Saliency Guided Image Enhancement - [2306.06092] [QA].
  • Mind2Web: Towards a Generalist Agent for the Web - [2306.06070] [QA].
  • GANeRF: Leveraging Discriminators to Optimize Neural Radiance Fields - [2306.06044] [QA].
  • DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds - [2306.06023] [QA].
  • S$^{3}$: Increasing GPU Utilization during Generative Inference for Higher Throughput - [2306.06000] [QA].
  • GPT-Calls: Enhancing Call Segmentation and Tagging by Generating Synthetic Conversations via Large Language Models - [2306.07941] [QA].
  • Evaluating the Social Impact of Generative AI Systems in Systems and Society - [2306.05949] [QA].
  • Can Large Language Models Infer Causation from Correlation? - [2306.05836] [QA].
  • Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation - [2306.05783] [QA].
  • Embodied Executable Policy Learning with Language-based Scene Summarization - [2306.05696] [QA].
  • Judging LLM-as-a-judge with MT-Bench and Chatbot Arena - [2306.05685] [QA].
  • On the Importance of Feature Decorrelation for Unsupervised Representation Learning in Reinforcement Learning - [2306.05637] [QA].
  • Speech-to-Text Adapter and Speech-to-Entity Retriever Augmented LLMs for Speech Understanding - [2306.07944] [QA].
  • BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping - [2306.05544] [QA].
  • Multi-Modal Classifiers for Open-Vocabulary Object Detection - [2306.05493] [QA].
  • Grounded Text-to-Image Synthesis with Attention Refocusing - [2306.05427] [QA].
  • Background Prompting for Improved Object Depth - [2306.05428] [QA].
  • MIMIC-IT: Multi-Modal In-Context Instruction Tuning - [2306.05425] [QA].
  • Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models - [2306.05424] [QA].
  • Tracking Everything Everywhere All at Once - [2306.05422] [QA].
  • Scaling Spherical CNNs - [2306.05420] [QA].
  • R-MAE: Regions Meet Masked Autoencoders - [2306.05411] [QA].
  • LU-NeRF: Scene and Pose Estimation by Synchronizing Local Unposed NeRFs - [2306.05410] [QA].
  • Matting Anything - [2306.05399] [QA].
  • Modular Visual Question Answering via Code Generation - [2306.05392] [QA].
  • Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models - [2306.05357] [QA].
  • Simple and Controllable Music Generation - [2306.05284] [QA].
  • M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models - [2306.05179] [QA].
  • SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions - [2306.05178] [QA].
  • PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization - [2306.05087] [QA].
  • ScaleDet: A Scalable Multi-Dataset Object Detector - [2306.04849] [QA].
  • Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts - [2306.04845] [QA].
  • Optimizing ViViT Training: Time and Memory Reduction for Action Recognition - [2306.04822] [QA].
  • INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models - [2306.04757] [QA].
  • How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources - [2306.04751] [QA].
  • Improving Open Language Models by Learning from Organic Interactions - [2306.04707] [QA].
  • On the Reliability of Watermarks for Large Language Models - [2306.04634] [QA].
  • Designing a Better Asymmetric VQGAN for StableDiffusion - [2306.04632] [QA].
  • ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections - [2306.04619] [QA].
  • PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts - [2306.04528] [QA].
  • Improving neural network representations using human similarity judgments - [2306.04507] [QA].
  • Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards - [2306.04488] [QA].
  • M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning - [2306.04387] [QA].
  • Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks - [2306.04362] [QA].
  • MobileNMT: Enabling Translation in 15MB and 30ms - [2306.04235] [QA].
  • Benchmarking Foundation Models with Language-Model-as-an-Examiner - [2306.04181] [QA].
  • Increasing Diversity While Maintaining Accuracy: Text Data Generation with Large Language Models and Human Interventions - [2306.04140] [QA].
  • Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer - [2306.04076] [QA].
  • Transferable Adversarial Robustness for Categorical Data via Universal Robust Embeddings - [2306.04064] [QA].
  • LLMZip: Lossless Text Compression using Large Language Models - [2306.04050] [QA].
  • Certified Reasoning with Language Models - [2306.04031] [QA].
  • Triggering Multi-Hop Reasoning for Question Answering in Language Models using Soft Prompts and Random Walks - [2306.04009] [QA].
  • ATT3D: Amortized Text-to-3D Object Synthesis - [2306.07349] [QA].
  • ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory - [2306.03901] [QA].
  • Emergent Correspondence from Image Diffusion - [2306.03881] [QA].
  • Deductive Verification of Chain-of-Thought Reasoning - [2306.03872] [QA].
  • LEACE: Perfect linear concept erasure in closed form - [2306.03819] [QA].
  • Learning to Ground Instructional Articles in Videos through Narrations - [2306.03802] [QA].
  • Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach - [2306.03604] [QA].
  • On Pitfalls of Test-Time Adaptation - [2306.03536] [QA].
  • Recognize Anything: A Strong Image Tagging Model - [2306.03514] [QA].
  • Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias - [2306.03509] [QA].
  • Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis - [2306.03504] [QA].
  • A Grasp Pose is All You Need: Learning Multi-fingered Grasping with Deep Reinforcement Learning from Vision and Touch - [2306.03484] [QA].
  • Natural Language Commanding via Program Synthesis - [2306.03460] [QA].
  • Large Language Models of Code Fail at Completing Code with Potential Bugs - [2306.03438] [QA].
  • GaitGCI: Generative Counterfactual Intervention for Gait Recognition - [2306.03428] [QA].
  • DVIS: Decoupled Video Instance Segmentation Framework - [2306.03413] [QA].
  • Vid2Act: Activate Offline Videos for Visual RL - [2306.03360] [QA].
  • Stabilizing Contrastive RL: Techniques for Offline Goal Reaching - [2306.03346] [QA].
  • Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents - [2306.03314] [QA].
  • A Static Evaluation of Code Completion by Large Language Models - [2306.03203] [QA].
  • Neuralangelo: High-Fidelity Neural Surface Reconstruction - [2306.03092] [QA].
  • MotionDiffuser: Controllable Multi-Agent Motion Prediction using Diffusion - [2306.03083] [QA].
  • InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models - [2306.03082] [QA].
  • HeadSculpt: Crafting 3D Head Avatars with Text - [2306.03038] [QA].
  • PokemonChat: Auditing ChatGPT for Pokémon Universe Knowledge - [2306.03024] [QA].
  • BeyondPixels: A Comprehensive Review of the Evolution of Neural Radiance Fields - [2306.03000] [QA].
  • PolyVoice: Language Models for Speech to Speech Translation - [2306.02982] [QA].
  • Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding - [2306.02858] [QA].
  • Scene as Occupancy - [2306.02851] [QA].
  • Orca: Progressive Learning from Complex Explanation Traces of GPT-4 - [2306.02707] [QA].
  • LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion - [2306.02561] [QA].
  • RecAgent: A Novel Simulation Paradigm for Recommender Systems - [2306.02552] [QA].
  • PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model - [2306.02531] [QA].
  • A Technical Report for Polyglot-Ko: Open-Source Large-Scale Korean Language Models - [2306.02254] [QA].
  • SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model - [2306.02245] [QA].
  • Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models - [2306.02080] [QA].
  • Prompting Is All You Need: Automated Android Bug Replay with Large Language Models - [2306.01987] [QA].
  • AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap - [2306.01941] [QA].
  • RITA: Group Attention is All You Need for Timeseries Analytics - [2306.01926] [QA].
  • The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation - [2306.01923] [QA].
  • VisualGPTScore: Visio-Linguistic Reasoning with Multimodal Generative Pre-Training Scores - [2306.01879] [QA].
  • Probabilistic Adaptation of Text-to-Video Models - [2306.01872] [QA].
  • Binary and Ternary Natural Language Generation - [2306.01841] [QA].
  • DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model - [2306.01736] [QA].
  • Evaluating Language Models for Mathematics through Interactions - [2306.01694] [QA].
  • Fine-Grained Human Feedback Gives Better Rewards for Language Model Training - [2306.01693] [QA].
  • Harnessing large-language models to generate private synthetic text - [2306.01684] [QA].
  • STUDY: Socially Aware Temporally Causal Decoder Recommender Systems - [2306.07946] [QA].
  • Segment Anything in High Quality - [2306.01567] [QA].
  • Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection - [2306.01438] [QA].
  • An Empirical Study on Challenging Math Problem Solving with GPT-4 - [2306.01337] [QA].
  • LoCoOp: Few-Shot Out-of-Distribution Detection via Prompt Learning - [2306.01293] [QA].
  • Responsible Task Automation: Empowering Large Language Models as Responsible Task Automators - [2306.01242] [QA].
  • Faster Causal Attention Over Large Sequences Through Sparse Flash Attention - [2306.01160] [QA].
  • The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only - [2306.01116] [QA].
  • Reimagining Retrieval Augmented Language Models for Answering Queries - [2306.01061] [QA].
  • Diffusion Self-Guidance for Controllable Image Generation - [2306.00986] [QA].
  • StyleDrop: Text-to-Image Generation in Any Style - [2306.00983] [QA].
  • StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners - [2306.00984] [QA].
  • SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds - [2306.00980] [QA].
  • AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration - [2306.00978] [QA].
  • ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation - [2306.00971] [QA].
  • The Hidden Language of Diffusion Models - [2306.00966] [QA].
  • Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation - [2306.00964] [QA].
  • The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects - [2306.00956] [QA].
  • Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance - [2306.00943] [QA].
  • STEVE-1: A Generative Model for Text-to-Behavior in Minecraft - [2306.00937] [QA].
  • Inserting Anybody in Diffusion Models via Celeb Basis - [2306.00926] [QA].
  • T2IAT: Measuring Valence and Stereotypical Biases in Text-to-Image Generation - [2306.00905] [QA].
  • LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day - [2306.00890] [QA].
  • Birth of a Transformer: A Memory Viewpoint - [2306.00802] [QA].
  • Microstructure quality control of steels using deep learning - [2306.0797] [QA].
  • GPT4Image: Can Large Pre-trained Models Help Vision Models on Perception Tasks? - [2306.00693] [QA].
  • Wuerstchen: Efficient Pretraining of Text-to-Image Models - [2306.00637] [QA].
  • ReviewerGPT? An Exploratory Study on Using Large Language Models for Paper Reviewing - [2306.00622] [QA].
  • Exploring Open-Vocabulary Semantic Segmentation without Human Labels - [2306.00450] [QA].
  • Example-based Motion Synthesis via Generative Motion Matching - [2306.00378] [QA].
  • Thought Cloning: Learning to Think while Acting by Imitating Human Thinking - [2306.00323] [QA].
  • Rethinking Model Evaluation as Narrowing the Socio-Technical Gap - [2306.03100] [QA].

May 2023

  • From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces - [2306.00245] [QA].
  • Bytes Are All You Need: Transformers Operating Directly On File Bytes - [2306.00238] [QA].
  • SafeDiffuser: Safe Planning with Diffusion Probabilistic Models - [2306.00148] [QA].
  • MuseCoco: Generating Symbolic Music from Text - [2306.00110] [QA].
  • MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training - [2306.00107] [QA].
  • Humans in 4D: Reconstructing and Tracking Humans with Transformers - [2305.20091] [QA].
  • Improving CLIP Training with Language Rewrites - [2305.20088] [QA].
  • Too Large; Data Reduction for Vision-Language Pre-Training - [2305.20087] [QA].
  • Understanding and Mitigating Copying in Diffusion Models - [2305.20086] [QA].
  • Control4D: Dynamic Portrait Editing by Learning 4D GAN from 2D Diffusion-based Editor - [2305.20082] [QA].
  • Efficient Diffusion Policies for Offline Reinforcement Learning - [2305.20081] [QA].
  • Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust - [2305.20030] [QA].
  • Monotonic Location Attention for Length Generalization - [2305.20019] [QA].
  • Human or Not? A Gamified Approach to the Turing Test - [2305.20010] [QA].
  • Deliberate then Generate: Enhanced Prompting Framework for Text Generation - [2305.19835] [QA].
  • Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models - [2305.19595] [QA].
  • Neural Kernel Surface Reconstruction - [2305.19590] [QA].
  • CodeTF: One-stop Transformer Library for State-of-the-art Code LLM - [2306.00029] [QA].
  • PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Planning - [2305.19472] [QA].
  • The Impact of Positional Encoding on Length Generalization in Transformers - [2305.19466] [QA].
  • Bigger, Better, Faster: Human-level Atari with human-level efficiency - [2305.19452] [QA].
  • Blockwise Parallel Transformer for Large Context Models - [2305.19370] [QA].
  • AlteredAvatar: Stylizing Dynamic 3D Avatars with Fast Style Adaptation - [2305.19245] [QA].
  • Grammar Prompting for Domain-Specific Language Generation with Large Language Models - [2305.19234] [QA].
  • LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images - [2305.19164] [QA].
  • Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate - [2305.19118] [QA].
  • Nested Diffusion Processes for Anytime Image Generation - [2305.19066] [QA].
  • Rank-adaptive spectral pruning of convolutional layers during training - [2305.19059] [QA].
  • StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation - [2305.19012] [QA].
  • Independent Component Alignment for Multi-Task Learning - [2305.19000] [QA].
  • LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus - [2305.18802] [QA].
  • HiFA: High-fidelity Text-to-3D with Advanced Diffusion Guidance - [2305.18766] [QA].
  • VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions - [2305.18756] [QA].
  • GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction - [2305.18752] [QA].
  • Real-World Image Variation by Aligning Diffusion Inversion Chain - [2305.18729] [QA].
  • Faith and Fate: Limits of Transformers on Compositionality - [2305.18654] [QA].
  • Controllable Text-to-Image Generation with GPT-4 - [2305.18583] [QA].
  • PaLI-X: On Scaling up a Multilingual Vision and Language Model - [2305.18565] [QA].
  • Brainformers: Trading Simplicity for Efficiency - [2306.00008] [QA].
  • RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths - [2305.18295] [QA].
  • Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models - [2305.18292] [QA].
  • Direct Preference Optimization: Your Language Model is Secretly a Reward Model - [2305.18290] [QA].
  • Photoswap: Personalized Subject Swapping in Images - [2305.18286] [QA].
  • Contextual Object Detection with Multimodal Large Language Models - [2305.18279] [QA].
  • Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors - [2305.18274] [QA].
  • Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising - [2305.18264] [QA].
  • GlyphControl: Glyph Conditional Control for Visual Text Generation - [2305.18259] [QA].
  • TaleCrafter: Interactive Story Visualization with Multiple Characters - [2305.18247] [QA].
  • Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models - [2305.18189] [QA].
  • Code Prompting: a Neural Symbolic Method for Complex Reasoning in Large Language Models - [2305.18507] [QA].
  • Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning - [2305.18499] [QA].
  • BigTranslate: Augmenting Large Language Models with Multilingual Translation Capability over 100 Languages - [2305.18098] [QA].
  • Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation - [2305.18474] [QA].
  • DiffRate : Differentiable Compression Rate for Efficient Vision Transformers - [2305.17997] [QA].
  • Efficient Storage of Fine-Tuned Models via Low-Rank Approximation of Weight Residuals - [2305.18425] [QA].
  • Geometric Algebra Transformers - [2305.18415] [QA].
  • KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models - [2305.18373] [QA].
  • Data Minimization at Inference Time - [2305.17593] [QA].
  • Scalable Transformer for PDE Surrogate Modeling - [2305.17560] [QA].
  • The Curse of Recursion: Training on Generated Data Makes Models Forget - [2305.17493] [QA].
  • What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks - [2305.18365] [QA].
  • SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks - [2305.17390] [QA].
  • MPCHAT: Towards Multimodal Persona-Grounded Conversation - [2305.17388] [QA].
  • Augmenting Large Language Model Translators via Translation Memories - [2305.17367] [QA].
  • DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text - [2305.17359] [QA].
  • Fine-Tuning Language Models with Just Forward Passes - [2305.17333] [QA].
  • Beyond Positive Scaling: How Negation Impacts Scaling Trends of Language Models - [2305.17311] [QA].
  • Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance - [2305.17306] [QA].
  • SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL - [2306.00739] [QA].
  • Generating Images with Multimodal Language Models - [2305.17216] [QA].
  • Large Language Models as Tool Makers - [2305.17126] [QA].
  • Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time - [2305.17118] [QA].
  • High-Fidelity Image Compression with Score-based Generative Models - [2305.18231] [QA].
  • ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing - [2305.17098] [QA].
  • Mindstorms in Natural Language-Based Societies of Mind - [2305.17066] [QA].
  • SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation - [2305.17011] [QA].
  • Let the Flows Tell: Solving Graph Combinatorial Optimization Problems with GFlowNets - [2305.17010] [QA].
  • Three Towers: Flexible Contrastive Learning with Pretrained Image Models - [2305.16999] [QA].
  • Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation - [2305.16985] [QA].
  • Training Socially Aligned Language Models in Simulated Human Society - [2305.16960] [QA].
  • MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies - [2305.16958] [QA].
  • On Evaluating Adversarial Robustness of Large Vision-Language Models - [2305.16934] [QA].
  • MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting - [2305.16896] [QA].
  • Playing repeated games with Large Language Models - [2305.16867] [QA].
  • Randomized Positional Encodings Boost Length Generalization of Transformers - [2305.16843] [QA].
  • Selective Mixup Helps with Distribution Shifts, But Not (Only) because of Mixup - [2305.16817] [QA].
  • Do GPTs Produce Less Literal Translations? - [2305.16806] [QA].
  • Multimodal Recommendation Dialog with Subjective Preference: A New Challenge and Benchmark - [2305.18212] [QA].
  • A Closer Look at In-Context Learning under Distribution Shifts - [2305.16704] [QA].
  • AdaPlanner: Adaptive Planning from Feedback with Language Models - [2305.16653] [QA].
  • Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing - [2305.16635] [QA].
  • Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models - [2305.16582] [QA].
  • On the Tool Manipulation Capability of Open-source Large Language Models - [2305.16504] [QA].
  • ZeroAvatar: Zero-shot 3D Avatar Generation from a Single Image - [2305.16411] [QA].
  • Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory - [2305.17144] [QA].
  • Break-A-Scene: Extracting Multiple Concepts from a Single Image - [2305.16311] [QA].
  • Landmark Attention: Random-Access Infinite Context Length for Transformers - [2305.16300] [QA].
  • Voyager: An Open-Ended Embodied Agent with Large Language Models - [2305.16291] [QA].
  • DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models - [2305.16381] [QA].
  • ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation - [2305.16213] [QA].
  • Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer - [2305.16380] [QA].
  • ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst - [2305.16103] [QA].
  • Role-Play with Large Language Models - [2305.16367] [QA].
  • On Architectural Compression of Text-to-Image Diffusion Models - [2305.15798] [QA].
  • Custom-Edit: Text-Guided Image Editing with Customized Diffusion Models - [2305.15779] [QA].
  • On the Planning Abilities of Large Language Models -- A Critical Investigation - [2305.15771] [QA].
  • Efficient Neural Music Generation - [2305.15719] [QA].
  • The False Promise of Imitating Proprietary LLMs - [2305.15717] [QA].
  • PandaGPT: One Model To Instruction-Follow Them All - [2305.16355] [QA].
  • Manifold Diffusion Fields - [2305.15586] [QA].
  • Unsupervised Semantic Correspondence Using Stable Diffusion - [2305.15581] [QA].
  • Lexinvariant Language Models - [2305.16349] [QA].
  • SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning - [2305.15486] [QA].
  • LayoutGPT: Compositional Visual Planning and Generation with Large Language Models - [2305.15393] [QA].
  • Learning high-level visual representations from a child's perspective without strong inductive biases - [2305.15372] [QA].
  • Gorilla: Large Language Model Connected with Massive APIs - [2305.15334] [QA].
  • Visual Programming for Text-to-Image Generation and Evaluation - [2305.15328] [QA].
  • Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy - [2305.15294] [QA].
  • ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers - [2305.15272] [QA].
  • Revisiting Parallel Context Windows: A Frustratingly Simple Alternative and Chain-of-Thought Deterioration - [2305.15262] [QA].
  • Adaptive Policy Learning to Additional Tasks - [2305.15193] [QA].
  • Policy Learning based on Deep Koopman Representation - [2305.15188] [QA].
  • Semantic-Enhanced Differentiable Search Index Inspired by Learning Strategies - [2305.15115] [QA].
  • Dynamic Masking Rate Schedules for MLM Pretraining - [2305.15096] [QA].
  • Is GPT-4 a Good Data Analyst? - [2305.15038] [QA].
  • Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models - [2305.15023] [QA].
  • EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought - [2305.15021] [QA].
  • Reasoning with Language Model is Planning with World Model - [2305.14992] [QA].
  • IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models - [2305.14985] [QA].
  • Benchmarking Arabic AI with Large Language Models - [2305.14982] [QA].
  • Assessment of the Reliablity of a Model's Decision by Generalizing Attribution to the Wavelet Domain - [2305.14979] [QA].
  • Discriminator-Guided Multi-step Reasoning with Language Models - [2305.14934] [QA].
  • Leveraging GPT-4 for Automatic Translation Post-Editing - [2305.14878] [QA].
  • PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts - [2305.14839] [QA].
  • Adapting Language Models to Compress Contexts - [2305.14788] [QA].
  • Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models - [2305.14710] [QA].
  • ExpertPrompting: Instructing Large Language Models to be Distinguished Experts - [2305.14688] [QA].
  • Barkour: Benchmarking Animal-level Agility with Quadruped Robots - [2305.14654] [QA].
  • Enabling Large Language Models to Generate Text with Citations - [2305.14627] [QA].
  • Think Before You Act: Decision Transformers with Internal Working Memory - [2305.16338] [QA].
  • Attentiveness to Answer Choices Doesn't Always Entail High QA Accuracy - [2305.14596] [QA].
  • PEARL: Prompting Large Language Models to Plan and Execute Actions Over Long Documents - [2305.14564] [QA].
  • LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond - [2305.14540] [QA].
  • Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement - [2305.14497] [QA].
  • Video Prediction Models as Rewards for Reinforcement Learning - [2305.14343] [QA].
  • Automatic Model Selection with Large Language Models for Reasoning - [2305.14333] [QA].
  • Improving Factuality and Reasoning in Language Models through Multiagent Debate - [2305.14325] [QA].
  • ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models - [2305.14323] [QA].
  • RET-LLM: Towards a General Read-Write Memory for Large Language Models - [2305.14322] [QA].
  • CREATOR: Disentangling Abstract and Concrete Reasonings of Large Language Models through Tool Creation - [2305.14318] [QA].
  • QLoRA: Efficient Finetuning of Quantized LLMs - [2305.14314] [QA].
  • On Learning to Summarize with Large Language Models as References - [2305.14239] [QA].
  • REC-MV: REconstructing 3D Dynamic Cloth from Monocular Videos - [2305.14236] [QA].
  • Enhancing Chat Language Models by Scaling High-quality Instructional Conversations - [2305.14233] [QA].
  • Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks - [2305.14201] [QA].
  • DetGPT: Detect What You Need via Reasoning - [2305.14167] [QA].
  • Let's Think Frame by Frame: Evaluating Video Chain of Thought with Video Infilling and Prediction - [2305.13903] [QA].
  • PaD: Program-aided Distillation Specializes Large Models in Reasoning - [2305.13888] [QA].
  • OlaGPT: Empowering LLMs With Human-like Problem-Solving Abilities - [2305.16334] [QA].
  • Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models - [2305.13840] [QA].
  • Can Large Language Models Infer and Disagree Like Humans? - [2305.13788] [QA].
  • Perception Test: A Diagnostic Benchmark for Multimodal Video Models - [2305.13786] [QA].
  • Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks - [2305.13782] [QA].
  • Aligning Large Language Models through Synthetic Feedback - [2305.13735] [QA].
  • Text Is All You Need: Learning Language Representations for Sequential Recommendation - [2305.13731] [QA].
  • Prompting and Evaluating Large Language Models for Proactive Dialogues: Clarification, Target-guided, and Non-collaboration - [2305.13626] [QA].
  • Transformer-based Vulnerability Detection in Code at EditTime: Zero-shot, Few-shot, or Fine-tuning? - [2306.01754] [QA].
  • Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach - [2305.13579] [QA].
  • How Language Model Hallucinations Can Snowball - [2305.13534] [QA].
  • RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text - [2305.13304] [QA].
  • Training Diffusion Models with Reinforcement Learning - [2305.13301] [QA].
  • Interactive Natural Language Processing - [2305.13246] [QA].
  • LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities - [2305.13168] [QA].
  • ControlVideo: Training-free Controllable Text-to-Video Generation - [2305.13077] [QA].
  • Making Language Models Better Tool Learners with Execution Feedback - [2305.13068] [QA].
  • AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation - [2305.13050] [QA].
  • RWKV: Reinventing RNNs for the Transformer Era - [2305.13048] [QA].
  • Textually Pretrained Speech Language Models - [2305.13009] [QA].
  • Boosting Long-tailed Object Detection via Step-wise Learning on Smooth-tail Data - [2305.12833] [QA].
  • Keeping Up with the Language Models: Robustness-Bias Interplay in NLI Data and Models - [2305.12620] [QA].
  • GMD: Controllable Human Motion Synthesis via Guided Diffusion Models - [2305.12577] [QA].
  • Conditional Generative Modeling is All You Need for Marked Temporal Point Processes - [2305.12569] [QA].
  • Augmenting Autotelic Agents with Large Language Models - [2305.12487] [QA].
  • Advancing Referring Expression Segmentation Beyond Single Image - [2305.12452] [QA].
  • CodeCompose: A Large-Scale Industrial Deployment of AI-assisted Code Authoring - [2305.12050] [QA].
  • OPT-R: Exploring the Role of Explanations in Finetuning and Prompting for Reasoning Skills of Large Language Models - [2305.12001] [QA].
  • Exploring the Viability of Synthetic Query Generation for Relevance Prediction - [2305.11944] [QA].
  • XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages - [2305.11938] [QA].
  • Chupa: Carving 3D Clothed Humans from Skinned Shape Priors using 2D Diffusion Probabilistic Models - [2305.11870] [QA].
  • Scaling laws for language encoding models in fMRI - [2305.11863] [QA].
  • Multimodal Web Navigation with Instruction-Finetuned Foundation Models - [2305.11854] [QA].
  • Any-to-Any Generation via Composable Diffusion - [2305.11846] [QA].
  • How Does Generative Retrieval Scale to Millions of Passages? - [2305.11841] [QA].
  • SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models - [2305.11840] [QA].
  • Comparing Software Developers with ChatGPT: An Empirical Investigation - [2305.11837] [QA].
  • Pengi: An Audio Language Model for Audio Tasks - [2305.11834] [QA].
  • Cross-Lingual Supervision improves Large Language Models Pre-training - [2305.11778] [QA].
  • Neural Foundations of Mental Simulation: Future Prediction of Latent Representations on Dynamic Scenes - [2305.11772] [QA].
  • Controlling the Extraction of Memorized Data from Large Language Models via Prompt-Tuning - [2305.11759] [QA].
  • CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing - [2305.11738] [QA].
  • QUEST: A Retrieval Dataset of Entity-Seeking Queries with Implicit Set Operations - [2305.11694] [QA].
  • Learning Global-aware Kernel for Image Harmonization - [2305.11676] [QA].
  • Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity - [2305.11675] [QA].
  • Introspective Tips: Large Language Model for In-Context Decision Making - [2305.11598] [QA].
  • Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields - [2305.11588] [QA].
  • ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings - [2305.11554] [QA].
  • Empower Large Language Model to Perform Better on Industrial Domain-Specific Question Answering - [2305.11541] [QA].
  • RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought - [2305.11499] [QA].
  • Enhancing Personalized Dialogue Generation with Contrastive Latent Variables: Combining Sparse and Dense Persona - [2305.11482] [QA].
  • Towards Human-AI Collaborative Urban Science Research Enabled by Pre-trained Large Language Models - [2305.11418] [QA].
  • Visualizing Linguistic Diversity of Text Datasets Synthesized by Large Language Models - [2305.11364] [QA].
  • RoomDreamer: Text-Driven 3D Indoor Scene Synthesis with Coherent Geometry and Texture - [2305.11337] [QA].
  • Counterfactuals for Design: A Model-Agnostic Method For Design Recommendations - [2305.11308] [QA].
  • Towards Collaborative Plan Acquisition through Theory of Mind Modeling in Situated Dialogue - [2305.11271] [QA].
  • Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model - [2305.11176] [QA].
  • VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks - [2305.11175] [QA].
  • Going Denser with Open-Vocabulary Part Segmentation - [2305.11173] [QA].
  • TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models - [2305.11171] [QA].
  • Evidence of Meaning in Language Models Trained on Programs - [2305.11169] [QA].
  • TOME: A Two-stage Approach for Model-based Retrieval - [2305.11161] [QA].
  • LIMA: Less Is More for Alignment - [2305.11206] [QA].
  • UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild - [2305.11147] [QA].
  • SimOAP: Improve Coherence and Consistency in Persona-based Dialogue Generation via Over-sampling and Post-evaluation - [2305.11130] [QA].
  • mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences - [2305.11129] [QA].
  • LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation - [2305.11116] [QA].
  • PDP: Parameter-free Differentiable Pruning is All You Need - [2305.11203] [QA].
  • DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs - [2309.03907] [QA].
  • Inspecting the Geographical Representativeness of Images from Text-to-Image Models - [2305.11080] [QA].
  • SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality Medical Image Segmentation - [2305.11012] [QA].
  • SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities - [2305.11000] [QA].
  • Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold - [2305.10973] [QA].
  • An Android Robot Head as Embodied Conversational Agent - [2305.10945] [QA].
  • A Generalist Dynamics Model for Control - [2305.10912] [QA].
  • VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation - [2305.10874] [QA].
  • TextDiffuser: Diffusion Models as Text Painters - [2305.10855] [QA].
  • 3D Registration with Maximal Cliques - [2305.10854] [QA].
  • LDM3D: Latent Diffusion Model for 3D - [2305.10853] [QA].
  • GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework - [2305.10841] [QA].
  • Listen, Think, and Understand - [2305.10790] [QA].
  • OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding - [2305.10764] [QA].
  • CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training - [2305.10763] [QA].
  • Boost Vision Transformer with GPU-Friendly Sparsity and Quantization - [2305.10727] [QA].
  • Discriminative Diffusion Models as Few-shot Vision and Language Learners - [2305.10722] [QA].
  • Zero-Day Backdoor Attack against Text-to-Image Diffusion Models via Personalization - [2305.10701] [QA].
  • MolXPT: Wrapping Molecules with Text for Generative Pre-training - [2305.10688] [QA].
  • Language Models Meet World Models: Embodied Experiences Enhance Language Models - [2305.10626] [QA].
  • Tree of Thoughts: Deliberate Problem Solving with Large Language Models - [2305.10601] [QA].
  • Instruction Tuned Models are Quick Learners - [2306.05539] [QA].
  • IMAD: IMage-Augmented multi-modal Dialogue - [2305.10512] [QA].
  • FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention - [2305.10431] [QA].
  • Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models - [2305.10474] [QA].
  • DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining - [2305.10429] [QA].
  • SLiC-HF: Sequence Likelihood Calibration with Human Feedback - [2305.10425] [QA].
  • PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering - [2305.10415] [QA].
  • PaLM 2 Technical Report - [2305.10403] [QA].
  • What You See is What You Read? Improving Text-Image Alignment Evaluation - [2305.10400] [QA].
  • Elaborative Simplification as Implicit Questions Under Discussion - [2305.10387] [QA].
  • Evaluating Object Hallucination in Large Vision-Language Models - [2305.10355] [QA].
  • CostFormer: Cost Transformer for Cost Aggregation in Multi-view Stereo - [2305.10320] [QA].
  • Searching for Needles in a Haystack: On the Role of Incidental Bilingualism in PaLM's Translation Capability - [2305.10266] [QA].
  • MemoryBank: Enhancing Large Language Models with Long-Term Memory - [2305.10250] [QA].
  • Knowledge-enhanced Mixed-initiative Dialogue System for Emotional Support Conversations - [2305.10172] [QA].
  • Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback - [2305.10142] [QA].
  • Transfer Learning for Fine-grained Classification Using Semi-supervised Learning and Visual Transformers - [2305.10018] [QA].
  • DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning - [2305.10005] [QA].
  • Dual Semantic Knowledge Composed Multimodal Dialog Systems - [2305.09990] [QA].
  • Smart Word Suggestions for Writing Assistance - [2305.09975] [QA].
  • Towards Generalist Robots: A Promising Paradigm via Generative Simulation - [2305.10455] [QA].
  • Explaining black box text modules in natural language with language models - [2305.09863] [QA].
  • CoEdIT: Text Editing by Task-Specific Instruction Tuning - [2305.09857] [QA].
  • ConvXAI: Delivering Heterogeneous AI Explanations via Conversations to Support Human-AI Scientific Writing - [2305.09770] [QA].
  • Application-Agnostic Language Modeling for On-Device ASR - [2305.09764] [QA].
  • NerfBridge: Bringing Real-time, Online Neural Radiance Field Training to Robotics - [2305.09761] [QA].
  • A Video Is Worth 4096 Tokens: Verbalize Story Videos To Understand Them In Zero Shot - [2305.09758] [QA].
  • Understanding 3D Object Interaction from a Single Image - [2305.09664] [QA].
  • Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation - [2305.09662] [QA].
  • FitMe: Deep Photorealistic 3D Morphable Model Avatars - [2305.09641] [QA].
  • SoundStorm: Efficient Parallel Audio Generation - [2305.09636] [QA].
  • Towards Expert-Level Medical Question Answering with Large Language Models - [2305.09617] [QA].
  • Large Language Models are Built-in Autoregressive Search Engines - [2305.09612] [QA].
  • Cooperation Is All You Need - [2305.10449] [QA].
  • AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation - [2305.09515] [QA].
  • Online Continual Learning Without the Storage Constraint - [2305.09253] [QA].
  • Dual-Alignment Pre-training for Cross-lingual Sentence Embedding - [2305.09148] [QA].
  • Pre-Training to Learn in Context - [2305.09137] [QA].
  • SuSana Distancia is all you need: Enforcing class separability in metric learning via two novel distance-based loss functions for few-shot image classification - [2305.09062] [QA].
  • MV-Map: Offboard HD-Map Generation with Multi-view Consistency - [2305.08851] [QA].
  • Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts - [2305.08850] [QA].
  • Small Models are Valuable Plug-ins for Large Language Models - [2305.08848] [QA].
  • RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs - [2305.08844] [QA].
  • Straightening Out the Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks - [2305.08842] [QA].
  • Attacking Perceptual Similarity Metrics - [2305.08840] [QA].
  • AutoRecon: Automated 3D Object Discovery and Reconstruction - [2305.08810] [QA].
  • Interpretability at Scale: Identifying Causal Mechanisms in Alpaca - [2305.08809] [QA].
  • A Reproducible Extraction of Training Images from Diffusion Models - [2305.08694] [QA].
  • Natural Language Decomposition and Interpretation of Complex Utterances - [2305.08677] [QA].
  • DarkBERT: A Language Model for the Dark Side of the Internet - [2305.08596] [QA].
  • Common Diffusion Noise Schedules and Sample Steps are Flawed - [2305.08891] [QA].
  • TESS: Text-to-Text Self-Conditioned Simplex Diffusion - [2305.08379] [QA].
  • Symbol tuning improves in-context learning in language models - [2305.08298] [QA].
  • ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding - [2305.08275] [QA].
  • A Cognitive Stimulation Dialogue System with Multi-source Knowledge Fusion for Elders with Cognitive Impairment - [2305.08200] [QA].
  • GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content - [2305.07969] [QA].
  • Leveraging Large Language Models in Conversational Recommender Systems - [2305.07961] [QA].
  • CodeT5+: Open Code Large Language Models for Code Understanding and Generation - [2305.07922] [QA].
  • Improving Small Language Models on PubMedQA via Generative Data Augmentation - [2305.07804] [QA].
  • ACCENT: An Automatic Event Commonsense Evaluation Metric for Open-Domain Dialogue Systems - [2305.07797] [QA].
  • TinyStories: How Small Can Language Models Be and Still Speak Coherent English? - [2305.07759] [QA].
  • In Search of Verifiability: Explanations Rarely Enable Complementary Performance in AI-Advised Decision Making - [2305.07722] [QA].
  • What are the Desired Characteristics of Calibration Sets? Identifying Correlates on Long Form Scientific Summarization - [2305.07615] [QA].
  • Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation - [2305.07609] [QA].
  • Measuring Progress in Fine-grained Vision-and-Language Understanding - [2305.07558] [QA].
  • BlendFields: Few-Shot Example-Driven Facial Modeling - [2305.07514] [QA].
  • ArtGPT-4: Artistic Vision-Language Understanding with Adapter-enhanced MiniGPT-4 - [2305.07490] [QA].
  • Surfacing Biases in Large Language Models using Contrastive Input Decoding - [2305.07378] [QA].
  • Better speech synthesis through scaling - [2305.07243] [QA].
  • MMG-Ego4D: Multi-Modal Generalization in Egocentric Action Recognition - [2305.07214] [QA].
  • MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers - [2305.07185] [QA].
  • Masked Audio Text Encoders are Effective Multi-Modal Rescorers - [2305.07677] [QA].
  • Towards best practices in AGI safety and governance: A survey of expert opinion - [2305.07153] [QA].
  • EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention - [2305.07027] [QA].
  • Simple Token-Level Confidence Improves Caption Correctness - [2305.07021] [QA].
  • An Inverse Scaling Law for CLIP Training - [2305.07017] [QA].
  • Exploiting Diffusion Prior for Real-World Image Super-Resolution - [2305.07015] [QA].
  • Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers - [2305.07011] [QA].
  • Learning the Visualness of Text Using Large Vision-Language Models - [2305.10434] [QA].
  • Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting - [2305.07004] [QA].
  • Universal Source Separation with Weakly Labelled Data - [2305.07447] [QA].
  • CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model - [2305.06908] [QA].
  • A Category-theoretical Meta-analysis of Definitions of Disentanglement - [2305.06886] [QA].
  • Optimizing Memory Mapping Using Deep Reinforcement Learning - [2305.07440] [QA].
  • Distracting Downpour: Adversarial Weather Attacks for Motion Estimation - [2305.06716] [QA].
  • V2Meow: Meowing to the Visual Beat via Music Generation - [2305.06594] [QA].
  • Chain-of-Dictionary Prompting Elicits Translation in Large Language Models - [2305.06575] [QA].
  • How to Index Item IDs for Recommendation Foundation Models - [2305.06569] [QA].
  • Segment and Track Anything - [2305.06558] [QA].
  • Domain Incremental Lifelong Learning in an Open World - [2305.06555] [QA].
  • InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning - [2305.06500] [QA].
  • Do LLMs Understand User Preferences? Evaluating LLMs On User Rating Prediction - [2305.06474] [QA].
  • Perpetual Humanoid Control for Real-time Simulated Avatars - [2305.06456] [QA].
  • Bot or Human? Detecting ChatGPT Imposters with A Single Question - [2305.06424] [QA].
  • LACoS-BLOOM: Low-rank Adaptation with Contrastive objective on 8 bits Siamese-BLOOM - [2305.06404] [QA].
  • HumanRF: High-Fidelity Neural Radiance Fields for Humans in Motion - [2305.06356] [QA].
  • VideoChat: Chat-Centric Video Understanding - [2305.06355] [QA].
  • Reconstructing Animatable Categories from Videos - [2305.06351] [QA].
  • Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception - [2305.06324] [QA].
  • Summarizing, Simplifying, and Synthesizing Medical Evidence Using GPT-3 (with Varying Success) - [2305.06299] [QA].
  • Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era - [2305.06131] [QA].
  • The Compositional Structure of Bayesian Inference - [2305.06112] [QA].
  • Relightify: Relightable 3D Faces from a Single Image via Diffusion Models - [2305.06077] [QA].
  • GPT Models Meet Robotic Applications: Co-Speech Gesturing Chat System - [2306.01741] [QA].
  • Privacy-Preserving Recommender Systems with Synthetic Query Generation using Differentially Private Large Language Models - [2305.05973] [QA].
  • Fast Distributed Inference Serving for Large Language Models - [2305.05920] [QA].
  • SHS-Net: Learning Signed Hyper Surfaces for Oriented Normal Estimation of Point Clouds - [2305.05873] [QA].
  • Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? An Examination on Several Typical Tasks - [2305.05862] [QA].
  • Sketching the Future (STF): Applying Conditional Control Techniques to Text-to-Video Models - [2305.05845] [QA].
  • DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects - [2305.05706] [QA].
  • InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language - [2305.05662] [QA].
  • TidyBot: Personalized Robot Assistance with Large Language Models - [2305.05658] [QA].
  • Towards Building the Federated GPT: Federated Instruction Tuning - [2305.05644] [QA].
  • AudioSlots: A slot-centric generative model for audio separation - [2305.05591] [QA].
  • Recursions Are All You Need: Towards Efficient Deep Unfolding Networks - [2305.05505] [QA].
  • WikiWeb2M: A Page-Level Multimodal Wikipedia Dataset - [2305.05432] [QA].
  • Large Language Model Programs - [2305.05364] [QA].
  • Dialogue Planning via Brownian Bridge Stochastic Process for Goal-directed Proactive Dialogue - [2305.05290] [QA].
  • Distilling Script Knowledge from Large Language Models for Constrained Language Planning - [2305.05252] [QA].
  • SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models - [2305.05189] [QA].
  • FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance - [2305.05176] [QA].
  • Knowledge-enhanced Agents for Interactive Text Games - [2305.05091] [QA].
  • Multi-Task End-to-End Training Improves Conversational Recommendation - [2305.06218] [QA].
  • Recommender Systems with Generative Retrieval - [2305.05065] [QA].
  • NerfAcc: Efficient Sampling Accelerates NeRFs - [2305.04966] [QA].
  • A Drop of Ink Makes a Million Think: The Spread of False Information in Large Language Models - [2305.04812] [QA].
  • MultiModal-GPT: A Vision and Language Model for Dialogue with Humans - [2305.04790] [QA].
  • AvatarReX: Real-time Expressive Full-body Avatars - [2305.04789] [QA].
  • Controllable Light Diffusion for Portraits - [2305.04745] [QA].
  • Code Execution with Pre-trained Language Models - [2305.05383] [QA].
  • LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-tailed Multi-Label Visual Recognition - [2305.04536] [QA].
  • Video Object Segmentation in Panoptic Wild Scenes - [2305.04470] [QA].
  • Locally Attentional SDF Diffusion for Controllable 3D Shape Generation - [2305.04461] [QA].
  • Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models - [2305.04441] [QA].
  • A Variational Perspective on Solving Inverse Problems with Diffusion Models - [2305.04391] [QA].
  • Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting - [2305.04388] [QA].
  • Unified Demonstration Retriever for In-Context Learning - [2305.04320] [QA].
  • Multi-Space Neural Radiance Fields - [2305.04268] [QA].
  • Vcc: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens - [2305.04241] [QA].
  • Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning - [2305.04175] [QA].
  • X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages - [2305.04160] [QA].
  • Exploring Human-Like Translation Strategy with Large Language Models - [2305.04118] [QA].
  • Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models - [2305.04091] [QA].
  • Pre-training Language Model as a Multi-perspective Course Learner - [2305.03981] [QA].
  • Residual Prompt Tuning: Improving Prompt Tuning with Residual Reparameterization - [2305.03937] [QA].
  • Otter: A Multi-Modal Model with In-Context Instruction Tuning - [2305.03726] [QA].
  • Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos - [2305.03713] [QA].
  • LMEye: An Interactive Perception Network for Large Language Models - [2305.03701] [QA].
  • Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements - [2305.03695] [QA].
  • Mining bias-target Alignment from Voronoi Cells - [2305.03691] [QA].
  • COLA: A Benchmark for Compositional Text-to-image Retrieval - [2305.03689] [QA].
  • A Suite of Generative Tasks for Multi-Level Multimodal Webpage Understanding - [2305.03668] [QA].
  • Query Expansion by Prompting Large Language Models - [2305.03653] [QA].
  • T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Large Language Model Signals for Science Question Answering - [2305.03453] [QA].
  • TransESC: Smoothing Emotional Support Conversation via Turn-Level State Transition - [2305.03296] [QA].
  • Composite Motion Learning with Task Control - [2305.03286] [QA].
  • Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework - [2305.03268] [QA].
  • AttentionViz: A Global View of Transformer Attention - [2305.03210] [QA].
  • Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs - [2305.03111] [QA].
  • ZipIt! Merging Models from Different Tasks without Training - [2305.03053] [QA].
  • Tracking through Containers and Occluders in the Wild - [2305.03052] [QA].
  • Controllable Visual-Tactile Synthesis - [2305.03051] [QA].
  • NeuralEditor: Editing Neural Radiance Fields via Manipulating Point Clouds - [2305.03049] [QA].
  • Personalize Segment Anything Model with One Shot - [2305.03048] [QA].
  • Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision - [2305.03047] [QA].
  • Single-Shot Implicit Morphable Faces with Consistent Texture Parameterization - [2305.03043] [QA].
  • TUVF: Learning Generalizable Texture UV Radiance Fields - [2305.03040] [QA].
  • NeRSemble: Multi-view Radiance Field Reconstruction of Human Heads - [2305.03027] [QA].
  • Diffusion Explainer: Visual Explanation for Text-to-image Stable Diffusion - [2305.03509] [QA].
  • Masked Trajectory Models for Prediction, Representation, and Control - [2305.02968] [QA].
  • BranchNorm: Robustly Scaling Extremely Deep Transformers - [2305.02790] [QA].
  • A Survey on Proactive Dialogue Systems: Problems, Methods, and Prospects - [2305.02750] [QA].
  • Real-Time Neural Appearance Models - [2305.02678] [QA].
  • Caption Anything: Interactive Image Description with Diverse Multimodal Controls - [2305.02677] [QA].
  • Learning Language-Specific Layers for Multilingual Machine Translation - [2305.02665] [QA].
  • Semantically Structured Image Compression via Irregular Group-Based Decoupling - [2305.02586] [QA].
  • Should ChatGPT and Bard Share Revenue with Their Data Providers? A New Business Model for the AI Era - [2305.02555] [QA].
  • FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction - [2305.02549] [QA].
  • AutoML-GPT: Automatic Machine Learning with GPT - [2305.02499] [QA].
  • ChatGPT-steered Editing Instructor for Customization of Abstractive Summarization - [2305.02483] [QA].
  • Shap-E: Generating Conditional 3D Implicit Functions - [2305.02463] [QA].
  • Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs - [2305.02440] [QA].
  • Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents - [2305.02412] [QA].
  • Generating Synthetic Documents for Cross-Encoder Re-Rankers: A Comparative Study of ChatGPT and Human Experts - [2305.02320] [QA].
  • Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings - [2305.02317] [QA].
  • Uncovering ChatGPT's Capabilities in Recommender Systems - [2305.02182] [QA].
  • Zero-Shot Listwise Document Reranking with a Large Language Model - [2305.02156] [QA].
  • Multimodal Procedural Planning via Dual Text-Image Prompting - [2305.01795] [QA].
  • Automated Code generation for Information Technology Tasks in YAML through Large Language Models - [2305.02783] [QA].
  • Stars Are All You Need: A Distantly Supervised Pyramid Network for Document-Level End-to-End Sentiment Analysis - [2305.01710] [QA].
  • TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis - [2305.00976] [QA].
  • Unlimiformer: Long-Range Transformers with Unlimited Length Input - [2305.01625] [QA].
  • Transfer Visual Prompt Generator across LLMs - [2305.01278] [QA].
  • The Role of Summarization in Generative Agents: A Preliminary Perspective - [2305.01253] [QA].
  • ArK: Augmented Reality with Knowledge Interactive Emergent Ability - [2305.00970] [QA].
  • Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation - [2305.00955] [QA].
  • Hypernuclear event detection in the nuclear emulsion with Monte Carlo simulation and machine learning - [2305.0884] [QA].
  • Learning to Reason and Memorize with Self-Notes - [2305.00833] [QA].
  • Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation - [2305.00673] [QA].

April 2023

  • TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation - [2305.00447] [QA].
  • LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model - [2304.15010] [QA].
  • Topic-oriented Adversarial Attacks against Black-box Neural Ranking Models - [2304.14867] [QA].
  • A Unified Generative Retriever for Knowledge-Intensive Language Tasks via Prompt Learning - [2304.14856] [QA].
  • IMP: Iterative Matching and Pose Estimation with Adaptive Pooling - [2304.14837] [QA].
  • Multivariate Representation Learning for Information Retrieval - [2304.14522] [QA].
  • Framing the News: From Human Perception to Large Language Model Inferences - [2304.14456] [QA].
  • ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System - [2304.14407] [QA].
  • Large Language Models are Strong Zero-Shot Retriever - [2304.14233] [QA].
  • mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality - [2304.14178] [QA].
  • Categorification of Group Equivariant Neural Networks - [2304.14144] [QA].
  • ChatLog: Recording and Analyzing ChatGPT Across Time - [2304.14106] [QA].
  • Learning Human-Human Interactions in Images from Weak Textual Supervision - [2304.14104] [QA].
  • Is a prompt and a few samples all you need? Using GPT-4 for data augmentation in low-resource classification tasks - [2304.13861] [QA].
  • Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models - [2304.13835] [QA].
  • Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond - [2304.13712] [QA].
  • Multimodal Grounding for Embodied AI via Augmented Reality Headsets for Natural Language Driven Task Planning - [2304.13676] [QA].
  • Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System - [2304.13343] [QA].
  • EverLight: Indoor-Outdoor Editable HDR Lighting Estimation - [2304.13207] [QA].
  • SAFE: Machine Unlearning With Shard Graphs - [2304.13169] [QA].
  • Generative Relevance Feedback with Large Language Models - [2304.13157] [QA].
  • Answering Questions by Meta-Reasoning over Multiple Chains of Thought - [2304.13007] [QA].
  • Patch-based 3D Natural Scene Generation from a Single Example - [2304.12670] [QA].
  • Bayesian Optimization Meets Self-Distillation - [2304.12666] [QA].
  • Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks - [2304.12567] [QA].
  • GlyphDiffusion: Text Generation as Image Generation - [2304.12519] [QA].
  • On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research - [2304.12397] [QA].
  • Beyond the Pixel: a Photometrically Calibrated HDR Dataset for Luminance and Color Prediction - [2304.12372] [QA].
  • WizardLM: Empowering Large Language Models to Follow Complex Instructions - [2304.12244] [QA].
  • Track Anything: Segment Anything Meets Videos - [2304.11968] [QA].
  • ChatLLM Network: More brains, More intelligence - [2304.12998] [QA].
  • Universal Domain Adaptation via Compressive Attention Matching - [2304.11862] [QA].
  • Enhancing Fine-Tuning Based Backdoor Defense with Sharpness-Aware Minimization - [2304.11823] [QA].
  • Score-Based Diffusion Models as Principled Priors for Inverse Imaging - [2304.11751] [QA].
  • SketchXAI: A First Look at Explainability for Human Sketches - [2304.11744] [QA].
  • Walking Your LiDOG: A Journey Through Multiple Domains for LiDAR Semantic Segmentation - [2304.11705] [QA].
  • SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models - [2304.11619] [QA].
  • Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations - [2304.11267] [QA].
  • Emergent and Predictable Memorization in Large Language Models - [2304.11158] [QA].
  • ChatABL: Abductive Learning via Natural Language Interaction with ChatGPT - [2304.11107] [QA].
  • Can GPT-4 Perform Neural Architecture Search? - [2304.10970] [QA].
  • Auditing and Generating Synthetic Data with Controllable Trust Trade-offs - [2304.10819] [QA].
  • Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models - [2304.10700] [QA].
  • HM-ViT: Hetero-modal Vehicle-to-Vehicle Cooperative perception with vision transformer - [2304.10628] [QA].
  • Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels - [2304.10539] [QA].
  • MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models - [2304.10592] [QA].
  • Generalizing Neural Human Fitting to Unseen Poses With Articulated SE(3) Equivariance - [2304.10528] [QA].
  • Phoenix: Democratizing ChatGPT across Languages - [2304.10453] [QA].
  • SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation - [2304.10417] [QA].
  • SCoDA: Domain Adaptive Shape Completion for Real Scans - [2304.10179] [QA].
  • Learning Bottleneck Concepts in Image Classification - [2304.10131] [QA].
  • Recognizability Embedding Enhancement for Very Low-Resolution Face Recognition and Quality Estimation - [2304.10066] [QA].
  • MARS: Model-agnostic Biased Object Removal without Additional Supervision for Weakly-Supervised Semantic Segmentation - [2304.09913] [QA].
  • Evaluating Verifiability in Generative Search Engines - [2304.09848] [QA].
  • Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models - [2304.09842] [QA].
  • MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation - [2304.09801] [QA].
  • DarSwin: Distortion Aware Radial Swin Transformer - [2304.09691] [QA].
  • Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agent - [2304.09542] [QA].
  • Network Pruning Spaces - [2304.09453] [QA].
  • ASM: Adaptive Skinning Model for High-Quality 3D Face Modeling - [2304.09423] [QA].
  • To Compress or Not to Compress- Self-Supervised Learning and Information Theory: A Review - [2304.09355] [QA].
  • Fast Neural Scene Flow - [2304.09121] [QA].
  • Think Before You Act: Unified Policy for Interleaving Language Reasoning with Actions - [2304.11063] [QA].
  • In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT - [2304.08979] [QA].
  • SurfelNeRF: Neural Surfel Radiance Fields for Online Photorealistic Reconstruction of Indoor Scenes - [2304.08971] [QA].
  • Looking Through the Glass: Neural Surface Reconstruction Against High Specular Reflections - [2304.08706] [QA].
  • An Evaluation on Large Language Model Outputs: Discourse and Memorization - [2304.08637] [QA].
  • Visual Instruction Tuning - [2304.08485] [QA].
  • Towards Robust Prompts on Vision-Language Models - [2304.08479] [QA].
  • Learning to Compress Prompts with Gist Tokens - [2304.08467] [QA].
  • Efficient Video Action Detection with Token Dropout and Context Refinement - [2304.08451] [QA].
  • Tool Learning with Foundation Models - [2304.08354] [QA].
  • Magnitude of arithmetic scalar and matrix categories - [2304.08334] [QA].
  • Chain of Thought Prompt Tuning in Vision Language Models - [2304.07919] [QA].
  • Towards Better Instruction Following Language Models for Chinese: Investigating the Impact of Training Data and Evaluation - [2304.07854] [QA].
  • EGformer: Equirectangular Geometry-biased Transformer for 360 Depth Estimation - [2304.07803] [QA].
  • Self-collaboration Code Generation via ChatGPT - [2304.07590] [QA].
  • Tractable Control for Autoregressive Language Generation - [2304.07438] [QA].
  • DINOv2: Learning Robust Visual Features without Supervision - [2304.07193] [QA].
  • M2T: Masking Transformers Twice for Faster Decoding - [2304.07313] [QA].
  • Delta Denoising Score - [2304.07090] [QA].
  • DCFace: Synthetic Face Generation with Dual Condition Diffusion Model - [2304.07060] [QA].
  • DeePoint: Visual Pointing Recognition and Direction Estimation - [2304.06977] [QA].
  • Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text - [2304.06939] [QA].
  • Unified Out-Of-Distribution Detection: A Model-Specific Perspective - [2304.06813] [QA].
  • RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment - [2304.06767] [QA].
  • Expressive Text-to-Image Generation with Rich Text - [2304.06720] [QA].
  • Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction - [2304.06714] [QA].
  • What does CLIP know about a red circle? Visual prompt engineering for VLMs - [2304.06712] [QA].
  • DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer - [2304.06668] [QA].
  • DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning - [2304.06648] [QA].
  • Are LLMs All You Need for Task-Oriented Dialogue? - [2304.06556] [QA].
  • Perspectives on Large Language Models for Relevance Judgment - [2304.09161] [QA].
  • Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning - [2304.06461] [QA].
  • AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models - [2304.06364] [QA].
  • NeRFVS: Neural Radiance Fields for Free View Synthesis via Geometry Scaffolds - [2304.06287] [QA].
  • Language Instructed Reinforcement Learning for Human-AI Coordination - [2304.07297] [QA].
  • Asymmetrically-powered Neural Image Compression with Shallow Decoders - [2304.06244] [QA].
  • [CLS] Token is All You Need for Zero-Shot Semantic Segmentation - [2304.06212] [QA].
  • Probabilistic Human Mesh Recovery in 3D Scenes from Egocentric Views - [2304.06024] [QA].
  • VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs - [2304.06020] [QA].
  • Can Large Language Models Transform Computational Social Science? - [2305.03514] [QA].
  • Hard Patches Mining for Masked Image Modeling - [2304.05919] [QA].
  • Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL - [2304.05889] [QA].
  • Are Local Features All You Need for Cross-Domain Visual Place Recognition? - [2304.05887] [QA].
  • Mesh2Tex: Generating Mesh Textures from Image Queries - [2304.05868] [QA].
  • Factorized Inverse Path Tracing for Efficient and Accurate Material-Lighting Estimation - [2304.05669] [QA].
  • Instance-Aware Domain Generalization for Face Anti-Spoofing - [2304.05640] [QA].
  • ChatGPT is all you need to decolonize sub-Saharan Vocational Education - [2304.13728] [QA].
  • ChemCrow: Augmenting large-language models with chemistry tools - [2304.05376] [QA].
  • Toxicity in ChatGPT: Analyzing Persona-assigned Language Models - [2304.05335] [QA].
  • OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction - [2304.05316] [QA].
  • SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes - [2304.05170] [QA].
  • Teaching Large Language Models to Self-Debug - [2304.05128] [QA].
  • StageInteractor: Query-based Object Detector with Cross-stage Interaction - [2304.04978] [QA].
  • Gradient-based Uncertainty Attribution for Explainable Bayesian Deep Learning - [2304.04824] [QA].
  • A Cheaper and Better Diffusion Language Model with Soft-Masked Noise - [2304.04746] [QA].
  • Ambiguous Medical Image Segmentation using Diffusion Models - [2304.04745] [QA].
  • Detection Transformer with Stable Matching - [2304.04742] [QA].
  • Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition - [2304.04704] [QA].
  • Improved Test-Time Adaptation for Domain Generalization - [2304.04494] [QA].
  • Instance Neural Radiance Field - [2304.04395] [QA].
  • Graph-ToolFormer: To Empower LLMs with Graph Reasoning Ability via Prompt Augmented by ChatGPT - [2304.11116] [QA].
  • OpenAGI: When LLM Meets Domain Experts - [2304.04370] [QA].
  • Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions - [2304.04227] [QA].
  • Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification - [2304.04205] [QA].
  • Token Boosting for Robust Self-Supervised Visual Transformer Pre-training - [2304.04175] [QA].
  • Hi Sheldon! Creating Deep Personalized Characters from TV Shows - [2304.11093] [QA].
  • Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder - [2304.04052] [QA].
  • ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application - [2304.03893] [QA].
  • Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis - [2304.03869] [QA].
  • Why think step by step? Reasoning emerges from the locality of experience - [2304.03843] [QA].
  • Meta-causal Learning for Single Domain Generalization - [2304.03709] [QA].
  • Model-Agnostic Gender Debiased Image Captioning - [2304.03693] [QA].
  • Attention: Marginal Probability is All You Need? - [2304.04556] [QA].
  • Sheaf Neural Networks for Graph-based Recommender Systems - [2304.09097] [QA].
  • RED-PSM: Regularization by Denoising of Partially Separable Models for Dynamic Imaging - [2304.03483] [QA].
  • Generative Agents: Interactive Simulacra of Human Behavior - [2304.03442] [QA].
  • TopNet: Transformer-based Object Placement Network for Image Compositing - [2304.03372] [QA].
  • SegGPT: Segmenting Everything In Context - [2304.03284] [QA].
  • Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention - [2304.03282] [QA].
  • Retention Is All You Need - [2304.03103] [QA].
  • MULLER: Multilayer Laplacian Resizer for Vision - [2304.02859] [QA].
  • Learning Neural Eigenfunctions for Unsupervised Semantic Segmentation - [2304.02841] [QA].
  • Segment Anything - [2304.02643] [QA].
  • ENTL: Embodied Navigation Trajectory Learner - [2304.02639] [QA].
  • HNeRV: A Hybrid Neural Representation for Videos - [2304.02633] [QA].
  • Dynamic Point Fields - [2304.02626] [QA].
  • Generative Novel View Synthesis with 3D-Aware Diffusion Models - [2304.02602] [QA].
  • Detecting and Grounding Multi-Modal Media Manipulation - [2304.02556] [QA].
  • TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration - [2304.02419] [QA].
  • Effective control of two-dimensional Rayleigh--Bénard convection: invariant multi-agent reinforcement learning is all you need - [2304.02370] [QA].
  • SMPConv: Self-moving Point Representations for Continuous Convolution - [2304.02330] [QA].
  • Few-shot Semantic Image Synthesis with Class Affinity Transfer - [2304.02321] [QA].
  • How to choose your best allies for a transferable attack? - [2304.02312] [QA].
  • ERRA: An Embodied Representation and Reasoning Architecture for Long-horizon Language-conditioned Manipulation Tasks - [2304.02251] [QA].
  • GINA-3D: Learning to Generate Implicit Neural Assets in the Wild - [2304.02163] [QA].
  • FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding - [2304.02135] [QA].
  • Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing - [2304.02051] [QA].
  • GlueStick: Robust Image Matching by Sticking Points and Lines Together - [2304.02008] [QA].
  • MonoHuman: Animatable Human Neural Field from Monocular Video - [2304.02001] [QA].
  • LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models - [2304.01933] [QA].
  • Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion - [2304.01893] [QA].
  • Learning to Name Classes for Vision and Language Models - [2304.01830] [QA].
  • Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation - [2304.01816] [QA].
  • Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification - [2304.01804] [QA].
  • Towards Open-Vocabulary Video Instance Segmentation - [2304.01715] [QA].
  • HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering - [2304.01686] [QA].
  • On the Stability-Plasticity Dilemma of Class-Incremental Learning - [2304.01663] [QA].
  • Cross-Domain Image Captioning with Discriminative Finetuning - [2304.01662] [QA].
  • IterativePFN: True Iterative Point Cloud Filtering - [2304.01529] [QA].
  • Robust Outlier Rejection for 3D Registration with Variational Bayes - [2304.01514] [QA].
  • Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning - [2304.01482] [QA].
  • Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection - [2304.01464] [QA].
  • Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos - [2304.01436] [QA].
  • VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution - [2304.01434] [QA].
  • Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling - [2304.01373] [QA].
  • Monocular 3D Object Detection with Bounding Box Denoising in 3D by Perceiver - [2304.01289] [QA].
  • Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation - [2304.01279] [QA].
  • Asymptotic expansions for the maximum likelihood estimation errors of the rotating parameter of the gravitational wave from core-collapse supernovae - [2304.1267] [QA].
  • Neural Volumetric Memory for Visual Locomotion Control - [2304.01201] [QA].
  • Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data - [2304.01196] [QA].
  • Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement - [2304.01195] [QA].
  • Burstormer: Burst Image Restoration and Enhancement Transformer - [2304.01194] [QA].
  • Navigating to Objects Specified by Images - [2304.01192] [QA].
  • Generative Multiplane Neural Radiance for 3D-Aware Image Generation - [2304.01172] [QA].
  • Generative Diffusion Prior for Unified Image Restoration and Enhancement - [2304.01247] [QA].
  • ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model - [2304.01116] [QA].
  • DivClust: Controlling Diversity in Deep Clustering - [2304.01042] [QA].
  • Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction - [2304.00967] [QA].
  • Astroformer: More Data Might not be all you need for Classification - [2304.05350] [QA].
  • Few-shot Fine-tuning is All You Need for Source-free Domain Adaptation - [2304.00792] [QA].
  • Multi-Modal Representation Learning with Text-Driven Soft Masks - [2304.00719] [QA].
  • 3D Semantic Segmentation in the Wild: Learning Generalized Models for Adverse-Condition Point Clouds - [2304.00690] [QA].
  • Metrological detection of multipartite entanglement through dynamical symmetries - [2304.0564] [QA].
  • UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning - [2304.00464] [QA].
  • Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild - [2304.00451] [QA].
  • When Crowd Meets Persona: Creating a Large-Scale Open-Domain Persona Dialogue Corpus - [2304.00350] [QA].
  • Devil is in the Queries: Advancing Mask Transformers for Real-world Medical Image Segmentation and Out-of-Distribution Localization - [2304.00212] [QA].

March 2023

  • Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation - [2304.00152] [QA].
  • On stochastic MPC formulations with closed-loop guarantees: Analysis and a unifying framework - [2304.0069] [QA].
  • Weakly-Supervised Text-driven Contrastive Learning for Facial Behavior Understanding - [2304.00058] [QA].
  • LivePose: Online 3D Reconstruction from Monocular Video with Dynamic Camera Poses - [2304.00054] [QA].
  • Accelerating exploration and representation learning with offline pre-training - [2304.00046] [QA].
  • Choose Your Weapon: Survival Strategies for Depressed AI Academics - [2304.06035] [QA].
  • A Survey of Large Language Models - [2303.18223] [QA].
  • Assessing Language Model Deployment with Risk Cards - [2303.18190] [QA].
  • Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction - [2303.18125] [QA].
  • VDN-NeRF: Resolving Shape-Radiance Ambiguity via View-Dependence Normalization - [2303.17968] [QA].
  • Diffusion Action Segmentation - [2303.17959] [QA].
  • 3D-aware Image Generation using 2D Diffusion Models - [2303.17905] [QA].
  • Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning - [2303.17842] [QA].
  • Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations - [2303.17839] [QA].
  • Neural Microfacet Fields for Inverse Rendering - [2303.17806] [QA].
  • CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition - [2303.17778] [QA].
  • CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society - [2303.17760] [QA].
  • Optimal Input Gain: All You Need to Supercharge a Feed-Forward Neural Network - [2303.17732] [QA].
  • S-VolSDF: Sparse Multi-View Stereo Regularization of Neural Implicit Surfaces - [2303.17712] [QA].
  • Self-Refine: Iterative Refinement with Self-Feedback - [2303.17651] [QA].
  • SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer - [2303.17605] [QA].
  • TiDy-PSFs: Computational Imaging with Time-Averaged Dynamic Point-Spread-Functions - [2303.17583] [QA].
  • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face - [2303.17580] [QA].
  • Iterative Prompt Learning for Unsupervised Backlit Image Enhancement - [2303.17569] [QA].
  • Whose Opinions Do Language Models Reflect? - [2303.17548] [QA].
  • Language Models can Solve Computer Tasks - [2303.17491] [QA].
  • All You Need Is Sex for Diversity - [2303.17441] [QA].
  • WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research - [2303.17395] [QA].
  • Social Biases through the Text-to-Image Generation Lens - [2304.06034] [QA].
  • Mixed Autoencoder for Self-supervised Visual Representation Learning - [2303.17152] [QA].
  • NeILF++: Inter-Reflectable Light Fields for Geometry and Material Estimation - [2303.17147] [QA].
  • ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing - [2303.17096] [QA].
  • AutoAD: Movie Description in Context - [2303.16899] [QA].
  • ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance - [2303.16894] [QA].
  • Adaptive Superpixel for Active Learning in Semantic Segmentation - [2303.16817] [QA].
  • TTA-COPE: Test-Time Adaptation for Category-Level Object Pose Estimation - [2303.16730] [QA].
  • G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment - [2303.16634] [QA].
  • Adaptive Spot-Guided Transformer for Consistent Local Feature Matching - [2303.16624] [QA].
  • Personalised Language Modelling of Screen Characters Using Rich Metadata Annotations - [2303.16618] [QA].
  • Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks - [2303.16563] [QA].
  • Fair Federated Medical Image Segmentation via Client Contribution Estimation - [2303.16520] [QA].
  • Multi-View Azimuth Stereo via Tangent Space Consistency - [2303.16447] [QA].
  • TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs - [2303.16434] [QA].
  • ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models - [2303.16421] [QA].
  • Are Data-driven Explanations Robust against Out-of-distribution Data? - [2303.16390] [QA].
  • Communication-Efficient Vertical Federated Learning with Limited Overlapping Samples - [2303.16270] [QA].
  • Your Diffusion Model is Secretly a Zero-Shot Classifier - [2303.16203] [QA].
  • ASIC: Aligning Sparse in-the-wild Image Collections - [2303.16201] [QA].
  • LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention - [2303.16199] [QA].
