Vivek Ramanujan

When Worse is Better: Navigating the Compression-Generation Tradeoff in Visual Tokenization

Vivek Ramanujan, Kushal Tirumala, Armen Aghajanyan, Luke Zettlemoyer, Ali Farhadi

arXiv, 2024

We challenge the assumption that better image reconstruction leads to better generation in two-stage image generation models. We introduce Causally Regularized Tokenization (CRT), which optimizes the compression-generation trade-off by incorporating stage 2 generation knowledge into stage 1 training. Despite worse reconstruction, CRT achieves state-of-the-art ImageNet generation (2.18 FID) with 2-3× improved compute efficiency, using fewer tokens and parameters than previous methods.

From an Image to a Scene: Learning to Imagine the World from a Million 360 Videos

Matthew Wallingford, Anand Bhattad, Aditya Kusupati, Vivek Ramanujan, Matt Deitke, Sham Kakade, Aniruddha Kembhavi, Roozbeh Mottaghi, Wei-Chiu Ma, Ali Farhadi

In Proceedings at NeurIPS, 2024

We introduce 360-1M, a large-scale 360-degree video dataset, and Odin, a diffusion-based model for novel view synthesis. By leveraging the largest real-world, multi-view dataset to date, Odin can generate novel views of real-world scenes and infer scene geometry and layout, showing improved performance on standard view synthesis and 3D reconstruction benchmarks.

The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better

Scott Geng, Cheng-Yu Hsieh, Vivek Ramanujan, Matthew Wallingford, Chun-Liang Li, Pang Wei Koh, Ranjay Krishna

In Proceedings at NeurIPS, 2024

We investigate the effectiveness of synthetic images for training vision models by comparing them against retrieved real images from the generator's training data (LAION-2B). Our findings show that while synthetic data can be beneficial, it is consistently matched or outperformed by real images from a simple retrieval baseline, partly due to generator artifacts and inaccurate visual details in synthetic images.

Code

On the Connection between Pre-training Data Diversity and Fine-tuning Robustness

Vivek Ramanujan*, Thao Nguyen*, Sewoong Oh, Ludwig Schmidt, Ali Farhadi

(Spotlight) In Proceedings at NeurIPS, 2023

We investigate how pre-training data properties affect the robustness of fine-tuned models. Through extensive experiments across natural and synthetic datasets, we find that data quantity is the primary factor influencing downstream robustness, while other factors like label space, semantics, and image diversity have limited impact. We demonstrate this using the iWildCam-WILDS distribution shift benchmark, showing that even significant changes to pre-training class distribution don't affect robustness when total data quantity is preserved.

DataComp: In Search of the Next Generation of Multimodal Datasets

Samir Yitzhak Gadre*, Gabriel Ilharco*, Alex Fang*, (many more important authors) Vivek Ramanujan, (many more important authors), Vaishaal Shankar, Ludwig Schmidt

In Proceedings at NeurIPS (Datasets and Benchmarks Track), 2023

We introduce DataComp, a benchmark for multimodal dataset creation with a candidate pool of 12.8B image-text pairs. Our testbed enables systematic evaluation of dataset design choices through standardized CLIP training and evaluation on 38 downstream tasks. Our best baseline, DataComp-1B, achieves 79.2% zero-shot ImageNet accuracy with CLIP ViT-L/14, surpassing OpenAI's CLIP by 3.7 percentage points.

Code

Neural Priming for Sample-Efficient Adaptation

Matthew Wallingford*, Vivek Ramanujan*, Alex Fang, Aditya Kusupati, Roozbeh Mottaghi, Aniruddha Kembhavi, Ludwig Schmidt, Ali Farhadi

In Proceedings at NeurIPS, 2023

We introduce Neural Priming, a technique that enables large pretrained models to adapt to distribution shifts and downstream tasks with minimal labeled data. By recalling and conditioning on relevant pretraining data when presented with class names or unlabeled samples, Neural Priming achieves significant improvements across various benchmarks: 2.45% on ImageNet zero-shot, 3.81% on transfer learning tasks, and 1.41% on ImageNetV2 using test-time adaptation.

Code

Neural Radiance Field Codebooks

Matthew Wallingford, Aditya Kusupati, Alex Fang, Vivek Ramanujan, Aniruddha Kembhavi, Roozbeh Mottaghi, Ali Farhadi

International Conference on Learning Representations, 2023

We introduce Neural Radiance Field Codebooks (NRC), a method for learning object-centric representations through novel view reconstruction. NRC learns to reconstruct scenes using a dictionary of object codes decoded through a volumetric renderer, enabling discovery of reoccurring visual and geometric patterns. We demonstrate superior performance in object navigation, unsupervised segmentation, and depth ordering tasks across both synthetic and real scenes.

Matryoshka Representations for Adaptive Deployment

Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford, Aditya Sinha, Vivek Ramanujan, William Howard-Snyder, Kaifeng Chen, Sham Kakade, Prateek Jain, Ali Farhadi

In Proceedings at NeurIPS, 2022

We introduce Matryoshka Representation Learning (MRL), a method for learning flexible representations that can adapt to multiple downstream tasks with varying computational resources. MRL encodes information at different granularities, allowing a single embedding to adapt to computational constraints without additional inference cost. We demonstrate significant improvements in efficiency and accuracy across various tasks and modalities, including up to 14× smaller embedding sizes for ImageNet classification and retrieval.

Code

LLC: Accurate, Multi-Purpose Learnt Low-Dimensional Binary Codes

Aditya Kusupati, Matthew Wallingford, Vivek Ramanujan, Raghav Somani, Jae Sung Park, Krishna Pillutla, Prateek Jain, Sham Kakade, Ali Farhadi

Advances in Neural Information Processing Systems (NeurIPS), 2021

We propose a novel method for learning low-dimensional binary codes for instances and classes without requiring side-information. Our method learns extremely low-dimensional binary codes (~20 bits for ImageNet-1K) while maintaining near-optimal classification accuracy. The codes capture intrinsic data features, enabling efficient image retrieval and out-of-distribution detection tasks.

Code

Forward Compatible Training for Representation Learning

Vivek Ramanujan, Pavan Kumar Anasosalu Vasu, Ali Farhadi, Oncel Tuzel, Hadi Pouransari

In Proceedings at CVPR, 2022

In real-world visual retrieval systems, the embedding model is consistently updated. This requires embeddings for all images in the gallery to be recomputed for every new model, an expensive process known as backfilling. We present a method for forward compatible training (FCT) in which we prepare for the future version of a model by saving cheap auxiliary information about the present training task. We show empirically that this improves model compatibility on common large-scale datasets (ImageNet, Places-365, VGGFace2).

Code

Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent

Will Merrill, Vivek Ramanujan, Yoav Goldberg, Roy Schwartz, Noah Smith

(Oral) In Proceedings at EMNLP, 2022

The capacity of neural networks like the widely adopted transformer is known to be very high. Evidence is emerging that they learn successfully due to inductive bias in the training routine, typically a variant of gradient descent (GD). As the parameters grow in magnitude, we prove that the network approximates a discretized network with saturated activation functions. Such "saturated" networks are known to have a reduced capacity, compared to the full network family, that can be described in terms of formal languages and automata. Our results suggest saturation is a new characterization of an inductive bias implicit in GD that is of particular interest for NLP. We leverage the emergent discrete structure in a saturated transformer to analyze the role of different attention heads, finding that some focus locally on a small number of positions, while other heads compute global averages, allowing counting.

Supermasks in Superposition

Mitchell Wortsman*, Vivek Ramanujan*, Rosanne Liu, Aniruddha Kembhavi, Mohammad Rastegari, Jason Yosinski, Ali Farhadi

In Proceedings at NeurIPS, 2020

We present an application of hidden networks for continual learning, capable of learning thousands of tasks without catastrophic forgetting. We solve tasks individually, each solution corresponding to a subnetwork of a randomly initialized neural network. Using a superposition of these subnetworks, we demonstrate the viability of this model for task inference. Finally, we introduce a coherent hierarchy for continual learning problems.

Code / Blog

Soft Threshold Weight Reparameterization for Learnable Sparsity

Aditya Kusupati, Vivek Ramanujan, Raghav Somani, Mitchell Wortsman, Prateek Jain, Sham Kakade, Ali Farhadi

International Conference on Machine Learning, 2020

We introduce a new strategy for pruning neural networks based on the soft threshold reparameterization technique from signal processing. The layerwise sparsity budgets allow for very sparse but still highly performant trained models across a variety of architectures and tasks.

What's Hidden in a Randomly Weighted Neural Network?

Vivek Ramanujan*, Mitchell Wortsman*, Aniruddha Kembhavi, Ali Farhadi, Mohammad Rastegari

Computer Vision and Pattern Recognition, 2020

We demonstrate that you can find untrained subnetworks of common overparametrized convolutional neural networks at initialization that achieve performance similar to their densely trained counterparts.

Code and Project Page

Improving Shape Deformation in Unsupervised Image-to-Image Translation

Aaron Gokaslan, Vivek Ramanujan, Kwang-In Kim, Daniel Ritchie, James Tompkin

European Conference on Computer Vision, 2018

We improve on CycleGAN by allowing for better shape deformation between more disparate domains.
