profile photo

Vivek Ramanujan

I am currently a PhD student at the University of Washington, working with Ali Farhadi and Ludwig Schmidt on problems related to robust machine learning. Previously, I was a predoctoral researcher in the PRIOR (vision) group at the Allen Institute for Artificial Intelligence (AI2), where I was advised by Mohammad Rastegari and Aniruddha Kembhavi.

Email  /  CV  /  GitHub  /  Google Scholar

Research

I'm broadly interested in computer vision, machine learning, and optimization. See my Google Scholar for a consistently up-to-date publication list.

Forward Compatible Training for Representation Learning
Vivek Ramanujan, Pavan Kumar Anasosalu Vasu, Ali Farhadi, Oncel Tuzel, Hadi Pouransari
In Proceedings of CVPR, 2022

In real-world visual retrieval systems, the embedding model is updated regularly. Each update requires recomputing the embeddings of every image in the gallery for the new model, an expensive process known as backfilling. We present a method for forward compatible training (FCT) in which we prepare for a future version of a model by saving cheap auxiliary information about the present training task. We show empirically that this improves model compatibility on common large-scale datasets (ImageNet, Places-365, VGGFace2).

Code
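As a rough illustration of the backfill-free idea, the sketch below trains a small transformation that maps an old model's embedding, together with cheap saved side information, into the new model's embedding space. This is a minimal sketch, not the paper's exact recipe: old_encoder, new_encoder, and side_info_encoder are hypothetical frozen modules standing in for the paper's models, and the loss and architecture are simplified.

import torch
import torch.nn as nn

class EmbeddingTransform(nn.Module):
    # Maps (old embedding, side information) into the new model's embedding space.
    def __init__(self, old_dim, side_dim, new_dim, hidden_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(old_dim + side_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, new_dim),
        )

    def forward(self, old_emb, side_info):
        return self.net(torch.cat([old_emb, side_info], dim=-1))

def train_step(transform, old_encoder, new_encoder, side_info_encoder, images, optimizer):
    with torch.no_grad():                   # all three encoders stay frozen
        old_emb = old_encoder(images)       # embeddings already stored in the gallery
        side = side_info_encoder(images)    # cheap auxiliary features saved alongside them
        target = new_encoder(images)        # new model's embeddings (training time only)
    pred = transform(old_emb, side)
    loss = (1 - nn.functional.cosine_similarity(pred, target)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

At query time, new queries are encoded with the new model while gallery items keep their old embeddings passed through the learned transformation, so the gallery never needs to be re-encoded.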

Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent
Will Merrill, Vivek Ramanujan, Yoav Goldberg, Roy Schwartz, Noah Smith
In Proceedings of EMNLP (Oral), 2022

The capacity of neural networks like the widely adopted transformer is known to be very high. Evidence is emerging that they learn successfully due to inductive bias in the training routine, typically a variant of gradient descent (GD). We prove that, as the parameters grow in magnitude, the network approximates a discretized network with saturated activation functions. Such "saturated" networks are known to have a reduced capacity compared to the full network family, one that can be described in terms of formal languages and automata. Our results suggest saturation is a new characterization of an inductive bias implicit in GD that is of particular interest for NLP. We leverage the emergent discrete structure in a saturated transformer to analyze the role of different attention heads, finding that some focus locally on a small number of positions, while other heads compute global averages, allowing counting.
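For intuition about saturation, the toy snippet below (not from the paper's code) scales a fixed set of attention logits by an increasing constant, mimicking growing parameter norms: softmax attention collapses toward hard, argmax-style attention, which is the discrete, saturated behavior analyzed in the paper.

import torch

# Toy attention logits; scaling them up mimics growing parameter norms.
scores = torch.tensor([2.0, 1.0, 1.5, -0.5])
for c in (1.0, 10.0, 100.0):
    attn = torch.softmax(c * scores, dim=-1)
    print(c, attn)
# As c grows, the attention distribution concentrates on the argmax position,
# i.e. the layer behaves like a discretized, "saturated" transformer head.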

Supermasks in Superposition
Mitchell Wortsman*, Vivek Ramanujan*, Rosanne Liu, Aniruddha Kembhavi, Mohammad Rastegari, Jason Yosinski, Ali Farhadi
In Proceedings of NeurIPS, 2020

We present an application of hidden networks to continual learning, capable of learning thousands of tasks without catastrophic forgetting. We solve each task individually, with each solution corresponding to a subnetwork (supermask) of a fixed, randomly initialized neural network. Using a superposition of these subnetworks, we demonstrate the viability of this model for task inference. Finally, we introduce a coherent hierarchy of continual learning problems.

Code / Blog
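The sketch below shows a simplified version of the task-inference step for a single linear layer: each stored supermask is applied to the same frozen random weights, and the mask whose subnetwork yields the most confident (lowest-entropy) predictions is selected. The paper's superposition trick does this more efficiently, scoring all masks at once through a single gradient on a weighted mixture of masks; the names here are illustrative.

import torch
import torch.nn.functional as F

def output_entropy(logits):
    p = F.softmax(logits, dim=-1)
    return -(p * (p + 1e-12).log()).sum(dim=-1).mean()

def infer_task(x, fixed_weights, supermasks):
    # fixed_weights: frozen random weight matrix shared by all tasks.
    # supermasks: list of binary masks, one per previously learned task.
    entropies = []
    for mask in supermasks:
        logits = x @ (fixed_weights * mask)   # subnetwork = masked random weights
        entropies.append(output_entropy(logits))
    # The correct task's subnetwork tends to produce the most confident
    # (lowest-entropy) predictions on data from that task.
    return int(torch.stack(entropies).argmin())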

Soft Threshold Weight Reparameterization for Learnable Sparsity
Aditya Kusupati, Vivek Ramanujan, Raghav Somani, Mitchell Wortsman, Prateek Jain, Sham Kakade, Ali Farhadi
In Proceedings of ICML, 2020

We introduce a new strategy for pruning neural networks based on the soft-threshold reparameterization technique from signal processing. The learned layerwise sparsity budgets yield very sparse yet highly performant models across a variety of architectures and tasks.
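A minimal sketch of the reparameterization for a single linear layer is below; initialization and the exact choice of threshold function are simplified relative to the paper. The effective weight is a soft-thresholded version of the stored weight, with the threshold itself a learnable parameter, so each layer's sparsity level emerges during ordinary training.

import torch
import torch.nn as nn
import torch.nn.functional as F

class STRLinear(nn.Module):
    # Linear layer whose effective weights are soft-thresholded by a learnable,
    # per-layer threshold, so the layer's sparsity budget is learned jointly
    # with the weights.
    def __init__(self, in_features, out_features, s_init=-10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.s = nn.Parameter(torch.tensor(s_init))   # learnable threshold parameter

    def forward(self, x):
        threshold = torch.sigmoid(self.s)
        # Soft threshold: shrink magnitudes by `threshold`; weights whose
        # magnitude falls below it become exactly zero.
        w = torch.sign(self.weight) * F.relu(self.weight.abs() - threshold)
        return F.linear(x, w)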

What's Hidden in a Randomly Weighted Neural Network?
Vivek Ramanujan*, Mitchell Wortsman*, Aniruddha Kembhavi, Ali Farhadi, Mohammad Rastegari
In Proceedings of CVPR, 2020

We demonstrate that common overparameterized convolutional neural networks contain untrained subnetworks at initialization that achieve performance similar to their densely trained counterparts.

Code and Project Page
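The sketch below captures the core of the subnetwork-finding idea for a single linear layer, a simplified take on the paper's approach: the random weights stay frozen, each weight gets a learnable score, the top-scoring fraction k of weights forms the subnetwork, and gradients reach the scores through a straight-through estimator.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMask(torch.autograd.Function):
    @staticmethod
    def forward(ctx, scores, k):
        flat = scores.flatten()
        n_keep = max(1, int(k * flat.numel()))
        threshold = flat.topk(n_keep).values.min()
        return (scores >= threshold).float()

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None            # straight-through: pass gradient to the scores

class SubnetLinear(nn.Module):
    def __init__(self, in_features, out_features, k=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1,
                                   requires_grad=False)   # frozen random weights
        self.scores = nn.Parameter(torch.rand(out_features, in_features))
        self.k = k

    def forward(self, x):
        mask = TopKMask.apply(self.scores, self.k)
        return F.linear(x, self.weight * mask)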

Improving Shape Deformation in Unsupervised Image-to-Image Translation
Aaron Gokaslan, Vivek Ramanujan, Kwang-In Kim, Daniel Ritchie, James Tompkin
In Proceedings of ECCV, 2018

We improve on CycleGAN by allowing for better shape deformation between more disparate domains.

Service
Teaching Assistant, Computer Vision (CS146), Brown CS, Spring 2018

Teaching Assistant, Machine Learning (CS142), Brown CS, Spring 2018

Teaching Assistant, Applied Artificial Intelligence (CS141), Brown CS, Spring 2017

Teaching Assistant, Deep Learning (CS2951K), Brown CS, Fall 2016

The source for this website is a slightly modified version of Jonathan Barron's website.