Vivek Ramanujan

I am currently a PhD student at the University of Washington working with Ali Farhadi and Ludwig Schmidt on problems related to robust machine learning. Previously, I was a predoctoral researcher in the PRIOR (vision) group at the Allen Institute for Artificial Intelligence (AI2), where I was advised by Mohammad Rastegari and Aniruddha Kembhavi.

Research

I'm broadly interested in computer vision, machine learning, and optimization. See my Google Scholar for a consistently up-to-date publication list.

* denotes equal contribution
When Worse is Better: Navigating the Compression-Generation Tradeoff in Visual Tokenization
Vivek Ramanujan, Kushal Tirumala, Armen Aghajanyan, Luke Zettlemoyer, Ali Farhadi
arXiv, 2024
We challenge the assumption that better image reconstruction leads to better generation in two-stage image generation models. We introduce Causally Regularized Tokenization (CRT), which optimizes the compression-generation trade-off by incorporating stage 2 generation knowledge into stage 1 training. Despite worse reconstruction, CRT achieves state-of-the-art ImageNet generation (2.18 FID) with 2-3× improved compute efficiency, using fewer tokens and parameters than previous methods.
From an Image to a Scene: Learning to Imagine the World from a Million 360 Videos
Matthew Wallingford, Anand Bhattad, Aditya Kusupati, Vivek Ramanujan, Matt Deitke, Sham Kakade, Aniruddha Kembhavi, Roozbeh Mottaghi, Wei-Chiu Ma, Ali Farhadi
In Proceedings at NeurIPS, 2024
We introduce 360-1M, a large-scale 360-degree video dataset, and Odin, a diffusion-based model for novel view synthesis. By leveraging the largest real-world, multi-view dataset to date, Odin can generate novel views of real-world scenes and infer scene geometry and layout, showing improved performance on standard view synthesis and 3D reconstruction benchmarks.
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
Scott Geng, Cheng-Yu Hsieh, Vivek Ramanujan, Matthew Wallingford, Chun-Liang Li, Pang Wei Koh, Ranjay Krishna
In Proceedings at NeurIPS, 2024
We investigate the effectiveness of synthetic images for training vision models by comparing them against retrieved real images from the generator's training data (LAION-2B). Our findings show that while synthetic data can be beneficial, it is consistently matched or outperformed by real images from a simple retrieval baseline, partly due to generator artifacts and inaccurate visual details in synthetic images.
On the Connection between Pre-training Data Diversity and Fine-tuning Robustness
Vivek Ramanujan*, Thao Nguyen*, Sewoong Oh, Ludwig Schmidt, Ali Farhadi
(Spotlight) In Proceedings at NeurIPS, 2023
We investigate how pre-training data properties affect the robustness of fine-tuned models. Through extensive experiments across natural and synthetic datasets, we find that data quantity is the primary factor influencing downstream robustness, while other factors like label space, semantics, and image diversity have limited impact. We demonstrate this using the iWildCam-WILDS distribution shift benchmark, showing that even significant changes to pre-training class distribution don't affect robustness when total data quantity is preserved.
DataComp: In Search of the Next Generation of Multimodal Datasets
Samir Yitzhak Gadre*, Gabriel Ilharco*, Alex Fang*, (many more important authors) Vivek Ramanujan, (many more important authors), Vaishaal Shankar, Ludwig Schmidt
In Proceedings at NeurIPS (Datasets and Benchmarks Track), 2023
We introduce DataComp, a benchmark for multimodal dataset creation with a candidate pool of 12.8B image-text pairs. Our testbed enables systematic evaluation of dataset design choices through standardized CLIP training and evaluation on 38 downstream tasks. Our best baseline, DataComp-1B, achieves 79.2% zero-shot ImageNet accuracy with CLIP ViT-L/14, surpassing OpenAI's CLIP by 3.7%.
Neural Priming for Sample-Efficient Adaptation
Matthew Wallingford*, Vivek Ramanujan*, Alex Fang, Aditya Kusupati, Roozbeh Mottaghi, Aniruddha Kembhavi, Ludwig Schmidt, Ali Farhadi
In Proceedings at NeurIPS, 2023
We introduce Neural Priming, a technique that enables large pretrained models to adapt to distribution shifts and downstream tasks with minimal labeled data. By recalling and conditioning on relevant pretraining data when presented with class names or unlabeled samples, Neural Priming achieves significant improvements across various benchmarks: 2.45% on ImageNet zero-shot, 3.81% on transfer learning tasks, and 1.41% on ImageNetV2 using test-time adaptation.
Neural Radiance Field Codebooks
Matthew Wallingford, Aditya Kusupati, Alex Fang, Vivek Ramanujan, Aniruddha Kembhavi, Roozbeh Mottaghi, Ali Farhadi
International Conference on Learning Representations, 2023
We introduce Neural Radiance Field Codebooks (NRC), a method for learning object-centric representations through novel view reconstruction. NRC learns to reconstruct scenes using a dictionary of object codes decoded through a volumetric renderer, enabling discovery of reoccurring visual and geometric patterns. We demonstrate superior performance in object navigation, unsupervised segmentation, and depth ordering tasks across both synthetic and real scenes.
Matryoshka Representations for Adaptive Deployment
Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford, Aditya Sinha, Vivek Ramanujan, William Howard-Snyder, Kaifeng Chen, Sham Kakade, Prateek Jain, Ali Farhadi
In Proceedings at NeurIPS, 2022
We introduce Matryoshka Representation Learning (MRL), a method for learning flexible representations that can adapt to multiple downstream tasks with varying computational resources. MRL encodes information at different granularities, allowing a single embedding to adapt to computational constraints without additional inference cost. We demonstrate significant improvements in efficiency and accuracy across various tasks and modalities, including up to 14× smaller embedding sizes for ImageNet classification and retrieval.
LLC: Accurate, Multi-Purpose Learnt Low-Dimensional Binary Codes
Aditya Kusupati, Matthew Wallingford, Vivek Ramanujan, Raghav Somani, Jae Sung Park, Krishna Pillutla, Prateek Jain, Sham Kakade, Ali Farhadi
Advances in Neural Information Processing Systems (NeurIPS), 2021
We propose a novel method for learning low-dimensional binary codes for instances and classes without requiring side-information. Our method learns extremely low-dimensional binary codes (~20 bits for ImageNet-1K) while maintaining near-optimal classification accuracy. The codes capture intrinsic data features, enabling efficient image retrieval and out-of-distribution detection tasks.
Forward Compatible Training for Representation Learning
Vivek Ramanujan, Pavan Kumar Anasosalu Vasu, Ali Farhadi, Oncel Tuzel, Hadi Pouransari
In Proceedings at CVPR, 2022
In real-world visual retrieval systems, the embedding model is consistently updated. This requires embeddings for all images in the gallery to be recomputed for every new model, an expensive process known as backfilling. We present a method for forward compatible training (FCT) in which we prepare for the future version of a model by saving cheap auxiliary information about the present training task. We show empirically that this improves model compatibility on common large-scale datasets (ImageNet, Places-365, VGGFace2).
Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent
Will Merrill, Vivek Ramanujan, Yoav Goldberg, Roy Schwartz, Noah Smith
(Oral) In Proceedings at EMNLP, 2022
The capacity of neural networks like the widely adopted transformer is known to be very high. Evidence is emerging that they learn successfully due to inductive bias in the training routine, typically a variant of gradient descent (GD). As the parameters grow in magnitude, we prove that the network approximates a discretized network with saturated activation functions. Such "saturated" networks are known to have a reduced capacity compared to the full network family that can be described in terms of formal languages and automata. Our results suggest saturation is a new characterization of an inductive bias implicit in GD of particular interest for NLP. We leverage the emergent discrete structure in a saturated transformer to analyze the role of different attention heads, finding that some focus locally on a small number of positions, while other heads compute global averages, allowing counting.
Supermasks in Superposition
Mitchell Wortsman*, Vivek Ramanujan*, Rosanne Liu, Aniruddha Kembhavi, Mohammad Rastegari, Jason Yosinski, Ali Farhadi
In Proceedings at NeurIPS, 2020
We present an application of hidden networks for continual learning, capable of learning thousands of tasks without catastrophic forgetting. We solve tasks individually, each solution corresponding to a subnetwork of a randomly initialized neural network. Using a superposition of these subnetworks, we demonstrate the viability of this model for task inference. Finally, we introduce a coherent hierarchy for continual learning problems.
Soft Threshold Weight Reparameterization for Learnable Sparsity
Aditya Kusupati, Vivek Ramanujan, Raghav Somani, Mitchell Wortsman, Prateek Jain, Sham Kakade, Ali Farhadi
International Conference on Machine Learning, 2020
We introduce a new strategy for pruning neural networks based on the soft threshold reparametrization technique from signal processing. The layerwise sparsity budgets allow for very sparse but still highly performant trained models across a variety of architectures and tasks.
What's Hidden in a Randomly Weighted Neural Network?
Vivek Ramanujan*, Mitchell Wortsman*, Aniruddha Kembhavi, Ali Farhadi, Mohammad Rastegari
Computer Vision and Pattern Recognition, 2020
We demonstrate that you can find untrained subnetworks of common overparametrized convolutional neural networks at initialization that achieve performance similar to their densely trained counterparts.
Improving Shape Deformation in Unsupervised Image-to-Image Translation
Aaron Gokaslan, Vivek Ramanujan, Kwang-In Kim, Daniel Ritchie, James Tompkin
European Conference on Computer Vision, 2018
We improve on CycleGAN by allowing for better shape deformation between more disparate domains.

Service

Reviewer CVPR 2025
Reviewer ICLR 2024
Reviewer NeurIPS 2024
Teaching Assistant Computer Vision CS146 Spring 2018
Teaching Assistant Machine Learning CS142 Spring 2018
Teaching Assistant Applied Artificial Intelligence CS141 Spring 2017
Teaching Assistant Deep Learning CS2951K Fall 2016