Research
I'm broadly interested in computer vision, machine learning, and optimization. See my Google Scholar for an up-to-date publication list.
Forward Compatible Training for Representation Learning
Vivek Ramanujan, Pavan Kumar Anasosalu Vasu, Ali Farhadi, Oncel Tuzel, Hadi Pouransari
In Proceedings of CVPR, 2022
In real-world visual retrieval systems, the embedding model is regularly updated. Each update requires recomputing the embeddings of every image in the gallery, an expensive process known as backfilling. We present forward compatible training (FCT), in which we prepare for a future version of the model by saving cheap auxiliary information about the present training task. We show empirically that FCT improves compatibility between model versions on common large-scale datasets (ImageNet, Places-365, VGGFace2).
Code
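As a rough sketch of the idea (my illustration, not the exact recipe from the paper): old gallery embeddings and the cheap saved side information are mapped into the new model's embedding space by a small learned transformation, so the gallery never has to be recomputed. The dimensions, the MLP transformation, and the cosine alignment loss below are assumptions made for the example.

```python
# Hypothetical sketch of forward compatibility: map (old embedding, side info)
# into the new model's embedding space so the gallery is never backfilled.
# Dimensions, the MLP, and the loss are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompatibilityTransform(nn.Module):
    """Maps stored old-model embeddings plus side information to the new space."""
    def __init__(self, old_dim=128, side_dim=64, new_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(old_dim + side_dim, 512),
            nn.ReLU(),
            nn.Linear(512, new_dim),
        )

    def forward(self, old_emb, side_info):
        return self.mlp(torch.cat([old_emb, side_info], dim=-1))

# Toy training loop on stand-in random tensors: align transformed old
# embeddings with the new model's embeddings of the same images.
transform = CompatibilityTransform()
opt = torch.optim.Adam(transform.parameters(), lr=1e-3)
for _ in range(100):
    old_emb = torch.randn(32, 128)   # stored gallery embeddings (old model)
    side = torch.randn(32, 64)       # cheap auxiliary info saved at training time
    new_emb = torch.randn(32, 256)   # embeddings from the new model
    loss = 1 - F.cosine_similarity(transform(old_emb, side), new_emb).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```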
Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent
Will Merrill, Vivek Ramanujan, Yoav Goldberg, Roy Schwartz, Noah Smith
In Proceedings of EMNLP (Oral), 2022
The capacity of neural networks like the widely adopted transformer is known to be very high, and evidence is emerging that they learn successfully due to inductive bias in the training routine, typically a variant of gradient descent (GD). We study the growth of parameter norms during training and prove that, as the parameters grow in magnitude, the network approximates a discretized network with saturated activation functions. Such "saturated" networks are known to have a reduced capacity compared to the full network family, one that can be described in terms of formal languages and automata. Our results suggest that saturation is a new characterization of an inductive bias implicit in GD that is of particular interest for NLP. We leverage the emergent discrete structure in a saturated transformer to analyze the role of different attention heads, finding that some focus locally on a small number of positions while other heads compute global averages, allowing counting.
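As a toy numerical illustration of saturation (my own demo, not the paper's formal construction): scaling up a fixed set of attention scores, as happens when parameter norms grow, drives softmax attention toward a hard attention that averages uniformly over the maximum-scoring positions.

```python
# Toy demo: softmax attention approaches "saturated" (hard) attention as the
# scores are scaled up, e.g. by a growing parameter norm.
import torch

def saturated_attention(scores):
    """Uniform average over the argmax positions of the score vector."""
    mask = (scores == scores.max(dim=-1, keepdim=True).values).float()
    return mask / mask.sum(dim=-1, keepdim=True)

scores = torch.tensor([2.0, 5.0, 5.0, 1.0])
for scale in [1.0, 10.0, 100.0]:
    print(scale, torch.softmax(scale * scores, dim=-1))
print("saturated:", saturated_attention(scores))  # -> [0.0, 0.5, 0.5, 0.0]
```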
Supermasks in Superposition
Mitchell Wortsman*, Vivek Ramanujan*, Rosanne Liu, Aniruddha Kembhavi, Mohammad Rastegari, Jason Yosinski, Ali Farhadi
In Proceedings of NeurIPS, 2020
We present an application of hidden networks to continual learning that is capable of learning thousands of tasks without catastrophic forgetting. Each task is solved by a separate subnetwork (a supermask) of a single, randomly initialized neural network. Using a superposition of these subnetworks, we demonstrate the viability of this model for task inference. Finally, we introduce a coherent hierarchy of continual learning problems.
Code / Blog
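A simplified sketch of the mechanism (it scores each supermask directly rather than using the superposition-and-gradient trick from the paper, and all sizes are illustrative): one fixed random weight matrix is shared across tasks, each task learns a binary mask over it, and at test time the task is inferred by picking the mask whose output distribution has the lowest entropy.

```python
# Simplified supermask sketch: frozen random weights, one binary mask per task,
# and task inference by minimum output entropy. Sizes and the random masks are
# illustrative stand-ins for learned supermasks.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
W = torch.randn(784, 10)  # fixed random weights, never trained
masks = [torch.randint(0, 2, W.shape).float() for _ in range(5)]  # one per task

def forward(x, mask):
    return x @ (W * mask)  # a subnetwork is the masked random weight matrix

def infer_task(x):
    """Pick the task whose mask yields the most confident (lowest-entropy) output."""
    entropies = []
    for m in masks:
        p = F.softmax(forward(x, m), dim=-1)
        entropies.append(float(-(p * p.log()).sum(dim=-1).mean()))
    return entropies.index(min(entropies))

x = torch.randn(8, 784)  # a batch from an unknown task
print("inferred task:", infer_task(x))
```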
Soft Threshold Weight Reparameterization for Learnable Sparsity
Aditya Kusupati, Vivek Ramanujan, Raghav Somani, Mitchell Wortsman, Prateek Jain, Sham Kakade, Ali Farhadi
In Proceedings of the International Conference on Machine Learning (ICML), 2020
We introduce a new strategy for pruning neural networks based on the soft-threshold reparameterization technique from signal processing. The learned layerwise sparsity budgets yield very sparse yet highly performant trained models across a variety of architectures and tasks.
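The reparameterization itself is compact enough to sketch, assuming PyTorch: each layer keeps dense weights plus a learnable threshold parameter, and the forward pass uses soft-thresholded weights, which drives many entries exactly to zero. The layer sizes, the sigmoid threshold mapping, and the initialization below are illustrative choices, not the exact training recipe.

```python
# Minimal soft-threshold reparameterization sketch: the weights used in the
# forward pass are sign(w) * relu(|w| - g(s)), with s learned per layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class STRLinear(nn.Module):
    def __init__(self, in_features, out_features, s_init=-5.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.s = nn.Parameter(torch.tensor(s_init))  # learnable sparsity control

    def sparse_weight(self):
        threshold = torch.sigmoid(self.s)
        # soft threshold: shrink magnitudes by `threshold`, zeroing small weights
        return torch.sign(self.weight) * F.relu(self.weight.abs() - threshold)

    def forward(self, x):
        return F.linear(x, self.sparse_weight(), self.bias)

layer = STRLinear(256, 64)
y = layer(torch.randn(8, 256))
sparsity = (layer.sparse_weight() == 0).float().mean().item()
print(f"output shape {tuple(y.shape)}, current sparsity {sparsity:.2%}")
```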
What's Hidden in a Randomly Weighted Neural Network?
Vivek Ramanujan*, Mitchell Wortsman*, Aniruddha Kembhavi, Ali Farhadi, Mohammad Rastegari
In Proceedings of CVPR, 2020
We demonstrate that common overparameterized convolutional neural networks contain untrained subnetworks, found at initialization, that achieve performance similar to their densely trained counterparts.
Code and Project Page
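A hedged sketch of the underlying mechanism, in the spirit of the paper's edge-popup algorithm: weights stay frozen at their random initialization, each weight gets a learned score, and the forward pass keeps only the top-scoring fraction of weights, with straight-through gradients training the scores. The layer sizes and the keep-fraction k are illustrative.

```python
# Sketch of finding a subnetwork in a randomly weighted layer: frozen random
# weights, learned per-weight scores, top-k mask with a straight-through
# gradient so only the scores are trained.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMask(torch.autograd.Function):
    @staticmethod
    def forward(ctx, scores, k):
        flat = scores.flatten()
        n_keep = max(1, int(k * flat.numel()))
        threshold = flat.topk(n_keep).values.min()
        return (scores >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # straight-through estimator for the scores

class EdgePopupLinear(nn.Module):
    def __init__(self, in_features, out_features, k=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features),
                                   requires_grad=False)  # never trained
        self.scores = nn.Parameter(torch.rand(out_features, in_features))
        self.k = k

    def forward(self, x):
        mask = TopKMask.apply(self.scores, self.k)
        return F.linear(x, self.weight * mask)  # random weights, learned mask

layer = EdgePopupLinear(128, 10)
out = layer(torch.randn(4, 128))
print(out.shape)  # gradients reach `scores` only; `weight` stays random
```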
Improving Shape Deformation in Unsupervised Image-to-Image Translation
Aaron Gokaslan, Vivek Ramanujan, Kwang-In Kim, Daniel Ritchie, James Tompkin
In Proceedings of ECCV, 2018
We improve on CycleGAN by enabling better shape deformation when translating between more disparate domains.
The source for this website is a slightly modified version of Jonathan Barron's website.