Sharath Chandra Raparthy

I am currently a Member of Technical Staff at Reka AI, working on general-purpose multimodal AI agents.

Prior to joining Reka AI, I was an AI Resident at FAIR at Meta, where I collaborated closely with Roberta Raileanu. I was a core contributor to Llama 3, working on the tool-use and mathematical reasoning capabilities of the Llama 3 models. My research at FAIR primarily covered LLM reasoning and refinement, open-ended learning, and in-context reinforcement learning. I co-led Rainbow Teaming, a method that identifies vulnerabilities in LLMs and generates high-quality, diverse synthetic data to improve LLM robustness.

Before joining FAIR, I completed a Master's (with thesis) at Mila under the guidance of Prof. Irina Rish. My academic journey also included a valuable stint at Recursion, where I worked on GFlowNets for drug discovery.

Outside of AI research, my passions include photography, long-distance running, reading and cooking.

Email  /  GitHub  /  Google Scholar



Research

The Llama 3 Herd of Models
Llama Team
[ Blog / Arxiv / Model Card ]

We open-source Llama 3.1, a new family of foundation models with native support for multilinguality, coding, reasoning, and tool use, featuring a flagship 405B-parameter model with a 128K-token context window. The models perform comparably to GPT-4 across a wide range of tasks and are released alongside Llama Guard 3 for safety.

Llama-3 Preview Models
Llama Team
[ Blog ]

We introduce the Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes, achieving state-of-the-art performance for LLMs at these scales.

Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Mikayel Samvelyan*, Sharath Chandra Raparthy*, Andrei Lupu*, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu
Neural Information Processing Systems (NeurIPS), 2024

[ Paper / Website / tl;dr ]

Introducing Rainbow Teaming, a new method for generating diverse adversarial prompts for LLMs via LLMs. It's a versatile tool 🛠️ for diagnosing model vulnerabilities across domains and creating data to enhance robustness & safety.
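For a flavor of how the search works, here is a minimal sketch of the quality-diversity loop at the heart of the method; `mutate_prompt` and `judge_success` are hypothetical stand-ins for the mutator and judge LLM calls, and the categories and styles are toy examples.

```python
import random

RISK_CATEGORIES = ["fraud", "violence", "privacy"]
ATTACK_STYLES = ["role play", "hypotheticals", "misspellings"]

def mutate_prompt(prompt, cell):
    return prompt + f" [mutated toward {cell}]"    # stub for a mutator LLM

def judge_success(prompt):
    return random.random()                         # stub for a judge LLM

# The archive keeps the best prompt found so far per (category, style)
# cell, which makes the resulting prompt set diverse by construction.
archive = {}
seed = "placeholder seed prompt"
for _ in range(1000):
    cell = (random.choice(RISK_CATEGORIES), random.choice(ATTACK_STYLES))
    parent, _ = archive.get(cell, (seed, 0.0))
    candidate = mutate_prompt(parent, cell)
    score = judge_success(candidate)
    if cell not in archive or score > archive[cell][1]:
        archive[cell] = (candidate, score)         # elitist cell update
```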

GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements
Alex Havrilla, Sharath Chandra Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Roberta Raileanu
International Conference on Machine Learning (ICML), 2024

[ Paper / tl;dr ]

How do we bootstrap the reasoning-refinement capabilities of LLMs using synthetic data? Introducing "GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements". Applied to GSM8K, our method improves a strong RL-finetuned Llama-2 13B model by 12%.
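As a rough illustration of the decide-then-refine idea (not the paper's exact procedure): a stepwise reward model scores each prefix of a draft solution, and the score profile determines whether to rewrite globally or edit locally. All helper names below are hypothetical placeholders passed in by the caller.

```python
def refine(problem, draft_steps, sorm_score, global_refine, local_refine,
           threshold=0.5):
    # Score every prefix of the draft with the stepwise reward model.
    scores = [sorm_score(problem, draft_steps[: i + 1])
              for i in range(len(draft_steps))]
    if min(scores) >= threshold:
        return draft_steps                     # draft already looks correct
    first_bad = next(i for i, s in enumerate(scores) if s < threshold)
    if first_bad == 0:
        return global_refine(problem)          # rewrite from scratch
    # Keep the verified prefix, regenerate the rest from the first error.
    return draft_steps[:first_bad] + local_refine(problem,
                                                  draft_steps[:first_bad])
```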

Teaching Large Language Models to Reason with Reinforcement Learning
Alex Havrilla, Yuqing Du, Sharath Chandra Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Sainbayar Sukhbaatar, Roberta Raileanu
arXiv preprint

[ Paper / tl;dr ]

In this work, we set out to understand how different algorithms fare at improving LLM reasoning from feedback. We compare expert iteration, PPO, and return-conditioned RL using Llama-2 as the base model.
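As a sketch of one of the compared algorithms, expert iteration alternates sampling, filtering by correctness, and fine-tuning. The stubs below stand in for Llama-2 sampling, answer checking, and SFT; only the loop structure is the point.

```python
import random

def generate(model, problem):
    return f"solution-{random.randint(0, 9)}"      # stub: sample a solution

def is_correct(problem, solution):
    return solution.endswith(("0", "1"))           # stub: check final answer

def finetune(model, dataset):
    return model                                   # stub: supervised update

def expert_iteration(model, problems, rounds=3, k=8):
    for _ in range(rounds):
        # Rejection sampling: keep only solutions that pass the check.
        dataset = [(p, s)
                   for p in problems
                   for s in (generate(model, p) for _ in range(k))
                   if is_correct(p, s)]
        model = finetune(model, dataset)           # imitate the filtered set
    return model
```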

Generalization to New Sequential Decision Making Tasks with In-Context Learning
Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu
International Conference on Machine Learning (ICML), 2024

[ Paper / Code ]

Training autonomous agents to learn new tasks from few demonstrations is challenging, especially in sequential decision-making, where errors compound. In this paper, we show that training transformers on diverse offline datasets of trajectories enables in-context learning of out-of-distribution sequential decision-making tasks from just a handful of demonstrations.
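A minimal sketch of the inference-time setup (shapes and modules simplified, not the paper's code): embedded demonstrations of the new task are placed in the context ahead of the ongoing episode, and a causal transformer predicts the next action from the last position.

```python
import torch
import torch.nn as nn

d_model, n_actions = 64, 6
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
    num_layers=2,
)
action_head = nn.Linear(d_model, n_actions)

def next_action(demo_tokens: torch.Tensor, current_tokens: torch.Tensor) -> int:
    """demo_tokens: (n_demo, d_model) embedded demonstration steps;
    current_tokens: (n_cur, d_model) embedded steps of the ongoing episode."""
    ctx = torch.cat([demo_tokens, current_tokens]).unsqueeze(0)   # (1, T, d)
    causal = nn.Transformer.generate_square_subsequent_mask(ctx.shape[1])
    h = encoder(ctx, mask=causal)
    return action_head(h[0, -1]).argmax().item()  # act greedily on last token

# Example: 20 demonstration steps in context, 5 steps into a new episode.
a = next_action(torch.randn(20, d_model), torch.randn(5, d_model))
```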

Multi-Objective GFlowNets
Moksh Jain, Sharath Chandra Raparthy, Alex Hernandez-Garcia, Jarrid Rector-Brooks, Yoshua Bengio, Santiago Miret, Emmanuel Bengio
International Conference on Machine Learning (ICML), 2023

[ Paper / Code ]

We take a fresh look at the standard approach to multi-objective optimization in machine-learning applications such as drug discovery and material design, noting that existing methods fail to produce a diverse set of Pareto-optimal candidates. Motivated by the success of GFlowNets in single-objective settings, we introduce Multi-Objective GFlowNets (MOGFNs), built on a novel Conditional GFlowNet that handles the family of single-objective sub-problems obtained by decomposing the multi-objective problem. Ours is the first empirical study of Conditional GFlowNets, and we show that MOGFNs outperform existing methods on Hypervolume, R2-distance, and candidate diversity, including in active learning settings.
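A minimal sketch of the conditioning idea, assuming a simple weighted-sum scalarization (one option among several the paper considers); the GFlowNet itself and its training objective are in the paper.

```python
import numpy as np

def sample_preference(k: int) -> np.ndarray:
    """Sample a preference vector w on the k-simplex."""
    return np.random.dirichlet(np.ones(k))

def scalarized_reward(objectives: np.ndarray, w: np.ndarray) -> float:
    """Weighted-sum scalarization R(x | w) = w . R(x)."""
    return float(objectives @ w)

w = sample_preference(3)                        # condition for this episode
r = scalarized_reward(np.array([0.8, 0.2, 0.5]), w)
# A single GFlowNet policy takes w as an extra input during training,
# so sweeping w at sampling time traces out diverse Pareto candidates.
```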

Compositional Attention: Disentangling Search and Retrieval
Sarthak Mittal, Sharath Chandra Raparthy, Irina Rish, Yoshua Bengio and Guillaume Lajoie
International Conference on Learning Representations (ICLR), 2022
Spotlight Presentation
[ Paper / Code ]

We view the standard multi-head attention mechanism from a "search and retrieval" perspective and highlight its rigid pairing of keys and values. We propose a drop-in replacement, Compositional Attention, which removes this redundancy by disentangling searches from retrievals and composing them dynamically in a context-dependent way.
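A minimal sketch of the mechanism, with simplified shapes and a fixed selection query per search (in the paper, the selection query is derived from the search output rather than fixed):

```python
import torch

T, d, S, R, dk, dv = 10, 32, 4, 3, 8, 8
x = torch.randn(T, d)
Wq = torch.randn(S, d, dk)    # one query projection per search
Wk = torch.randn(S, d, dk)    # one key projection per search
Wv = torch.randn(R, d, dv)    # one value projection per retrieval
q_sel = torch.randn(S, dv)    # simplified per-search selection query

outputs = []
for s in range(S):
    # Search s: an attention pattern over the sequence, as in MHA.
    scores = (x @ Wq[s]) @ (x @ Wk[s]).T / dk ** 0.5              # (T, T)
    pattern = torch.softmax(scores, dim=-1)
    # Apply this single pattern to the values of *all* R retrievals.
    retrieved = torch.stack([pattern @ (x @ Wv[r]) for r in range(R)])
    # Soft selection over retrievals replaces MHA's fixed pairing.
    sel = torch.softmax(retrieved @ q_sel[s] / dv ** 0.5, dim=0)  # (R, T)
    outputs.append((sel.unsqueeze(-1) * retrieved).sum(dim=0))    # (T, dv)

out = torch.cat(outputs, dim=-1)  # (T, S * dv), one slice per search
```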

Continual Learning In Environments With Polynomial Mixing Times
Matthew Riemer*, Sharath Chandra Raparthy*, Ignacio Cases, Gopeshh Subbaraj, Maximilian Puelma Touzel and Irina Rish
Neural Information Processing Systems (NeurIPS), 2022
[ Paper / Code ]

In this work, we concentrate on a major contributor to poor scaling: the mixing time of the Markov chain induced by a policy. When ignored, mixing times create myopic biases in the optimization and thus impede success on the continual RL problems of greatest interest. We categorize these continual RL problems as Scalable MDPs and formally demonstrate that they exhibit polynomial mixing times. We discuss why existing RL algorithms struggle in this regime and propose three algorithms that clearly demonstrate improved sample efficiency.
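For intuition (illustrative only, not from the paper): the mixing time of the chain a fixed policy induces is controlled by the spectral gap of its transition matrix, so a small gap means a long horizon before estimates stop being myopic.

```python
import numpy as np

P = np.array([[0.99, 0.01],
              [0.02, 0.98]])                    # a slowly mixing chain
eigvals = np.sort(np.abs(np.linalg.eigvals(P)))
gap = 1.0 - eigvals[-2]                         # 1 - |second eigenvalue|
t_mix = np.log(1 / 0.25) / gap                  # ~46 steps to mix to eps=0.25
# In Scalable MDPs this horizon grows polynomially with problem size,
# and so does the bias from ignoring it.
```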

Curriculum in Gradient-Based Meta-Reinforcement Learning
Bhairav Mehta, Tristan Deleu*, Sharath Chandra Raparthy*, Christopher Pal, Liam Paull
ICLR BeTR-RL Workshop, 2020
[ Paper ]

In this work we study an under-studied component of meta-learning: the task distribution. We show that Model-Agnostic Meta-Learning (MAML) is sensitive to the task distribution, and that learning a curriculum of tasks, rather than sampling them uniformly, substantially improves adaptation performance.
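An illustrative sketch of curriculum sampling over tasks (the weighting rule here is made up for illustration, not the paper's scheme): early in meta-training it favors easy tasks, later hard ones, instead of MAML's uniform sampling.

```python
import random

def sample_task(tasks, difficulty, progress):
    """tasks: list of ids; difficulty: id -> [0, 1]; progress in [0, 1]."""
    weights = [(1 - progress) * (1 - difficulty[t]) + progress * difficulty[t]
               for t in tasks]
    return random.choices(tasks, weights=weights, k=1)[0]

tasks = ["t0", "t1", "t2"]
difficulty = {"t0": 0.1, "t1": 0.5, "t2": 0.9}
early = sample_task(tasks, difficulty, progress=0.0)   # biased toward t0
late = sample_task(tasks, difficulty, progress=1.0)    # biased toward t2
```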

CuNAS - CUriosity-driven Neural-Augmented Simulator
Sharath Chandra Raparthy, Melissa Mozifian, Liam Paull and Florian Golemo
RSS Sim2Real Workshop, 2021
[ Slides / Talk ]

Transferring policies from simulation to physical robots is an important open problem in deep reinforcement learning. Prior work introduced the model-based Neural-Augmented Simulator (NAS), which uses task-independent data to model the differences between the simulated and the real robot. In this work, we show that this method is sensitive to the sampling of motor actions and to the control frequency. To overcome this, we propose a simple extension based on artificial curiosity. We demonstrate on a physical robot that this leads to better exploration of the state space and, consequently, better transfer performance than the NAS baseline.
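A minimal sketch of the curiosity signal, assuming a toy linear forward model (hypothetical, for illustration): the intrinsic reward is the forward model's prediction error, steering data collection toward poorly modeled regions of the state space.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))                     # toy model: s' ~= W [s; a]

def predict(state, action):
    return W @ np.concatenate([state, action])

def curiosity_bonus(state, action, next_state):
    """Intrinsic reward = squared prediction error of the forward model;
    favoring high-bonus actions explores where the model is surprised."""
    return float(np.linalg.norm(predict(state, action) - next_state) ** 2)

s, a, s_next = rng.normal(size=4), rng.normal(size=2), rng.normal(size=4)
bonus = curiosity_bonus(s, a, s_next)
```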

Template from this website.