Sharath Chandra Raparthy

I am a Research Engineer at Google DeepMind, working in the Open-Endedness team.

Previously, I was a Member of Technical Staff at Reka AI, building general-purpose multimodal agents. Before that, I was an AI Resident at FAIR (Meta), where I was a core contributor to Llama 3 — shipping tool-use and mathematical reasoning capabilities — and co-led Rainbow Teaming, a method for stress-testing and improving LLM robustness at scale. My research spans LLM reasoning, open-ended learning, and in-context reinforcement learning.

I hold a Master's (with thesis) from Mila, advised by Irina Rish, and spent time at Recursion applying GFlowNets to drug discovery.

When not training models, you'll find me running long distances, cooking, reading, or out with a camera.


Research

Llama 3.1
The Llama 3 Herd of Models
Llama Team

We open-source Llama 3.1, a new family of foundation models with native support for multilinguality, coding, reasoning, and tool use, featuring a flagship 405B-parameter model with a 128K-token context window. The models perform comparably to GPT-4 across a range of tasks and ship with Llama Guard 3 for safety.

Llama 3
Llama-3 Preview Models
Llama Team

We introduce the Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. These models achieve state-of-the-art performance for LLMs at these scales.

Rainbow Teaming
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Mikayel Samvelyan*, Sharath Chandra Raparthy*, Andrei Lupu*, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu
Neural Information Processing Systems (NeurIPS), 2024

We introduce Rainbow Teaming, a new method for generating diverse adversarial prompts for LLMs via LLMs. It is a versatile tool for diagnosing model vulnerabilities across domains and for creating data that improves robustness and safety.
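
At its core the approach is a quality-diversity search in the spirit of MAP-Elites: an archive is indexed by prompt features (for example, risk category and attack style), an LLM mutates an existing prompt toward a target cell, a judge scores the result, and stronger prompts replace weaker ones. A rough sketch of that loop, where the descriptor axes and the mutate/score functions are hypothetical stand-ins for the LLM calls:

```python
# Sketch of a Rainbow-Teaming-style quality-diversity loop. The mutate/score
# functions are hypothetical stand-ins for LLM calls; the archive structure
# (risk category x attack style) follows the MAP-Elites idea the method builds on.
import random

RISK_CATEGORIES = ["fraud", "violence", "privacy"]        # illustrative axes only
ATTACK_STYLES = ["role_play", "hypothetical", "misspelling"]

def mutate_prompt(prompt: str, category: str, style: str) -> str:
    """Stand-in for the mutator LLM: rewrite `prompt` toward the target cell."""
    return f"[{category}/{style}] {prompt} (mutated)"

def attack_score(prompt: str) -> float:
    """Stand-in for the judge LLM scoring how effectively `prompt` elicits unsafe output."""
    return random.random()

def rainbow_teaming(seed_prompt: str, iterations: int = 200):
    archive = {}  # (category, style) -> (prompt, score)
    for _ in range(iterations):
        category = random.choice(RISK_CATEGORIES)
        style = random.choice(ATTACK_STYLES)
        parent = archive.get((category, style), (seed_prompt, 0.0))[0]
        candidate = mutate_prompt(parent, category, style)
        score = attack_score(candidate)
        # Keep the candidate only if it beats the current occupant of its cell.
        if score > archive.get((category, style), ("", -1.0))[1]:
            archive[(category, style)] = (candidate, score)
    return archive

if __name__ == "__main__":
    archive = rainbow_teaming("Tell me how to ...")
    for cell, (prompt, score) in archive.items():
        print(cell, round(score, 2), prompt[:60])
```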

GLoRe
GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements
Alex Havrilla, Sharath Chandra Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Roberta Raileanu
International Conference on Machine Learning (ICML), 2024

How can we bootstrap the reasoning refinement capabilities of LLMs using synthetic data? We introduce GLoRe, which trains refinement models to make both global and local corrections; on GSM8K, it improves a strong RL-finetuned Llama-2 13B by 12%.

Teaching LLMs to Reason
Teaching Large Language Models to Reason with Reinforcement Learning
Alex Havrilla, Yuqing Du, Sharath Chandra Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Sainbayar Sukhbaatar, Roberta Raileanu
arXiv preprint

In this work, we set out to understand how different algorithms fare at improving LLM reasoning from feedback. We compare expert iteration, PPO, and return-conditioned RL using Llama-2 as the base model.
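
As a rough illustration of the simplest of these algorithms, here is a sketch of expert iteration: sample several solutions per problem, keep only those whose final answers check out, finetune on the survivors, and repeat. The model, sampler, and answer checker below are hypothetical stand-ins.

```python
# Sketch of expert iteration for LLM reasoning, with hypothetical stand-ins
# for the model, the sampler, and the answer checker.
import random

def sample_solution(model, question: str) -> str:
    """Stand-in for sampling a chain-of-thought solution from the model."""
    return f"reasoning for {question} -> answer {random.randint(0, 20)}"

def is_correct(solution: str, answer: str) -> bool:
    """Stand-in for checking the final answer against the reference."""
    return solution.endswith(answer)

def finetune(model, dataset):
    """Stand-in for supervised finetuning on the filtered solutions."""
    return model  # a real implementation would update the weights here

def expert_iteration(model, problems, rounds: int = 3, samples_per_problem: int = 8):
    for _ in range(rounds):
        accepted = []
        for question, answer in problems:
            for _ in range(samples_per_problem):
                solution = sample_solution(model, question)
                if is_correct(solution, answer):      # keep only verified solutions
                    accepted.append((question, solution))
        model = finetune(model, accepted)             # train on self-generated data
    return model

if __name__ == "__main__":
    problems = [("2 + 3", "answer 5"), ("4 * 4", "answer 16")]
    expert_iteration(model=None, problems=problems)
```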

In-Context Learning for SDM
Generalization to New Sequential Decision Making Tasks with In-Context Learning
Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu
International Conference on Machine Learning (ICML), 2024

Training autonomous agents to learn new tasks from few demonstrations is challenging, especially for sequential decision making, which is sensitive to errors. We show that training transformers on diverse offline datasets of trajectories enables in-context learning of out-of-distribution sequential decision-making tasks from just a handful of demonstrations.
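
A minimal sketch of the in-context setup, assuming a causal transformer over tokenized (state, action, reward) steps: a few demonstration trajectories of the new task are concatenated in front of the current episode, and next-action logits are read off the final position. The dimensions and tokenization below are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal in-context policy sketch: demonstrations + query trajectory in one
# causal sequence, next-action prediction at every step.
import torch
import torch.nn as nn

class InContextPolicy(nn.Module):
    def __init__(self, obs_dim=16, act_dim=4, d_model=64, n_layers=2, n_heads=4, max_len=512):
        super().__init__()
        self.embed = nn.Linear(obs_dim + act_dim + 1, d_model)  # (state, one-hot action, reward)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.act_head = nn.Linear(d_model, act_dim)

    def forward(self, tokens):                        # tokens: (B, T, obs+act+1)
        T = tokens.shape[1]
        h = self.embed(tokens) + self.pos(torch.arange(T, device=tokens.device))
        causal = torch.triu(torch.full((T, T), float("-inf"), device=tokens.device), diagonal=1)
        h = self.encoder(h, mask=causal)              # only attend to the past
        return self.act_head(h)                       # next-action logits at every step

# Usage: put a few demonstration trajectories of the *new* task in the context,
# append the current (partial) episode, and read the last-step logits.
demos = torch.randn(1, 3 * 50, 16 + 4 + 1)            # 3 demos of 50 steps each
query = torch.randn(1, 10, 16 + 4 + 1)                # current episode so far
logits = InContextPolicy()(torch.cat([demos, query], dim=1))
action = logits[0, -1].argmax().item()
```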

Multi-Objective GFlowNets
Multi-Objective GFlowNets
Moksh Jain, Sharath Chandra Raparthy, Alex Hernandez-Garcia, Jarrid Rector-Brooks, Yoshua Bengio, Santiago Miret, Emmanuel Bengio
International Conference on Machine Learning (ICML), 2023

We examine multi-objective optimization in applications like drug discovery and material design, noting the failure of existing methods to achieve diverse Pareto-optimal candidates. We introduce Multi-Objective GFlowNets (MOGFNs), featuring a novel Conditional GFlowNet that outperforms existing methods in Hypervolume, R2-distance, and candidate diversity.
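
A minimal sketch of the preference-conditioning idea: sample a preference vector on the simplex, scalarize the objectives with it, and condition a single sampling policy on that vector so one model covers different trade-offs. The toy objectives and the weighted-sum scalarization below are illustrative assumptions (the paper also studies other scalarizations), and only the reward side is shown.

```python
# Preference-conditioned scalarization sketch for multi-objective generation.
import numpy as np

rng = np.random.default_rng(0)

def objectives(x: np.ndarray) -> np.ndarray:
    """Toy stand-in for per-candidate objectives (e.g. potency, synthesizability)."""
    return np.array([np.sin(x).mean() + 1.0, np.cos(x).mean() + 1.0])

def scalarized_reward(x: np.ndarray, w: np.ndarray) -> float:
    """Weighted-sum scalarization R(x | w) = sum_i w_i * R_i(x)."""
    return float(w @ objectives(x))

def sample_preference(num_objectives: int = 2) -> np.ndarray:
    """Dirichlet sample on the simplex; each draw targets a different trade-off."""
    return rng.dirichlet(np.ones(num_objectives))

# During training, each trajectory is generated by a policy conditioned on w
# and rewarded with scalarized_reward(x, w); here we only print a few draws.
for _ in range(3):
    w = sample_preference()
    x = rng.normal(size=8)            # stand-in for a sampled candidate
    print(w.round(2), round(scalarized_reward(x, w), 3))
```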

Compositional Attention
Compositional Attention: Disentangling Search and Retrieval
Sarthak Mittal, Sharath Chandra Raparthy, Irina Rish, Yoshua Bengio and Guillaume Lajoie
International Conference on Learning Representations (ICLR), 2022 (Spotlight)

We view the standard multi-head attention mechanism from a "search and retrieval" perspective and highlight the rigid pairing between each head's search and its retrieval. We propose Compositional Attention, a drop-in replacement that removes this redundancy by disentangling searches and retrievals and composing them dynamically in a context-dependent way.
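
A rough sketch of the mechanism under a simplified parameterization: S search patterns and R retrievals are computed independently, every search is applied to every retrieval, and a learned, context-dependent soft selection decides which retrieval each search uses at each position. This illustrates the idea, not the paper's exact formulation.

```python
# Simplified compositional attention: disentangled searches and retrievals
# composed via a learned soft selection.
import torch
import torch.nn as nn

class CompositionalAttention(nn.Module):
    def __init__(self, dim, n_searches=4, n_retrievals=2):
        super().__init__()
        self.S, self.R, self.dh = n_searches, n_retrievals, dim // n_searches
        self.q = nn.Linear(dim, n_searches * self.dh)
        self.k = nn.Linear(dim, n_searches * self.dh)
        self.v = nn.Linear(dim, n_retrievals * self.dh)
        self.sel_q = nn.Linear(dim, n_searches * self.dh)   # queries for retrieval selection
        self.sel_k = nn.Linear(self.dh, self.dh)            # keys from retrieved values
        self.out = nn.Linear(n_searches * self.dh, dim)

    def forward(self, x):                                   # x: (B, T, dim)
        B, T, _ = x.shape
        Q = self.q(x).view(B, T, self.S, self.dh).transpose(1, 2)   # (B, S, T, dh)
        K = self.k(x).view(B, T, self.S, self.dh).transpose(1, 2)
        V = self.v(x).view(B, T, self.R, self.dh).transpose(1, 2)   # (B, R, T, dh)
        A = torch.softmax(Q @ K.transpose(-1, -2) / self.dh ** 0.5, dim=-1)  # (B, S, T, T)
        # Apply every search pattern to every retrieval: (B, S, R, T, dh)
        retrieved = torch.einsum("bstu,brud->bsrtd", A, V)
        # Context-dependent soft selection over retrievals, per search and position.
        sq = self.sel_q(x).view(B, T, self.S, self.dh).transpose(1, 2)       # (B, S, T, dh)
        scores = torch.einsum("bstd,bsrtd->bsrt", sq, self.sel_k(retrieved)) / self.dh ** 0.5
        weights = torch.softmax(scores, dim=2)                                # over R
        out = (weights.unsqueeze(-1) * retrieved).sum(dim=2)                  # (B, S, T, dh)
        return self.out(out.transpose(1, 2).reshape(B, T, self.S * self.dh))

x = torch.randn(2, 10, 64)
print(CompositionalAttention(64)(x).shape)   # torch.Size([2, 10, 64])
```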

Continual Learning
Continual Learning In Environments With Polynomial Mixing Times
Matthew Riemer*, Sharath Chandra Raparthy*, Ignacio Cases, Gopeshh Subbaraj, Maximilian Puelma Touzel and Irina Rish
Neural Information Processing Systems (NeurIPS), 2022

We identify the mixing time of the Markov chain induced by a policy as a major contributor to poor scaling in continual RL. We formalize a class of continual RL problems as scalable MDPs, formally demonstrate that these exhibit polynomial mixing times, and propose three algorithms that demonstrate improved sample efficiency.
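
For reference, the mixing time in question is the standard epsilon-mixing time of the chain a policy induces; a minimal statement of the usual total-variation definition, with transition kernel and stationary distribution written for policy pi:

```latex
% \epsilon-mixing time of the chain induced by policy \pi, with transition
% kernel P_\pi and stationary distribution \mu_\pi (standard definition).
t_{\mathrm{mix}}(\epsilon) \;=\; \min\Big\{\, t \;:\; \max_{s}\,
  \big\lVert P_\pi^{\,t}(s,\cdot) - \mu_\pi \big\rVert_{\mathrm{TV}} \le \epsilon \,\Big\}
```

Roughly speaking, "polynomial mixing" means this quantity grows polynomially with the size of the scalable MDP, so the amount of experience needed before estimates reflect a policy's long-run behaviour grows with problem scale.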

Curriculum in Meta-RL
Curriculum in Gradient-Based Meta-Reinforcement Learning
Bhairav Mehta, Tristan Deleu*, Sharath Chandra Raparthy*, Christopher Pal, Liam Paull
ICLR BeTR-RL Workshop, 2021

In this work we study an under-explored component of meta-learning: the task distribution. We show that MAML is sensitive to the task distribution, and that learning a curriculum over tasks, instead of sampling them uniformly, substantially improves adaptation performance.
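
A minimal sketch of curriculum-based task sampling inside a MAML-style loop, assuming a simple learning-progress score; the scoring rule here (absolute change in post-adaptation return) is an illustrative stand-in, not the paper's exact criterion.

```python
# Curriculum over meta-training tasks: sample tasks in proportion to a
# learning-progress score instead of uniformly.
import random

class TaskCurriculum:
    def __init__(self, tasks, temperature=1.0):
        self.tasks = tasks
        self.temperature = temperature
        self.last_return = {t: 0.0 for t in tasks}
        self.progress = {t: 1.0 for t in tasks}       # optimistic init: every task gets tried

    def sample(self):
        weights = [self.progress[t] ** (1.0 / self.temperature) + 1e-6 for t in self.tasks]
        return random.choices(self.tasks, weights=weights, k=1)[0]

    def update(self, task, post_adaptation_return):
        # Learning progress ~ how much the adapted policy's return moved on this task.
        self.progress[task] = abs(post_adaptation_return - self.last_return[task])
        self.last_return[task] = post_adaptation_return

# Usage inside a MAML-style loop (inner/outer updates omitted):
curriculum = TaskCurriculum(tasks=["goal_left", "goal_right", "goal_far"])
for step in range(100):
    task = curriculum.sample()
    post_return = random.gauss(step * 0.01, 1.0)      # stand-in for post-adaptation evaluation
    curriculum.update(task, post_return)
```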

CuNAS
CuNAS — CUriosity-driven Neural-Augmented Simulator
Sharath Chandra Raparthy, Melissa Mozifian, Liam Paull and Florian Golemo
RSS Sim2Real Workshop, 2021

Transfer of policies from simulation to physical robots is an important open problem in deep RL. We propose a simple extension to Neural-Augmented Simulators based on artificial curiosity, leading to better exploration and consequently better sim-to-real transfer performance.
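
A minimal sketch of the curiosity signal under the usual reading of artificial curiosity: the neural augmentation is a learned correction on top of the simulator, and its prediction error on real transitions serves as an intrinsic reward, pushing the agent to explore where the augmented simulator is still wrong. The linear correction model and toy dynamics are illustrative assumptions.

```python
# Curiosity from the prediction error of a sim-to-real correction model.
import numpy as np

rng = np.random.default_rng(0)

def sim_step(s, a):
    return s + 0.1 * a                        # idealized simulator dynamics

def real_step(s, a):
    return s + 0.1 * a + 0.02 * np.sin(s)     # "real" dynamics with an unmodeled term

class CorrectionModel:
    """Predicts the residual between real and simulated next states."""
    def __init__(self, dim, lr=0.1):
        self.W = np.zeros((dim, dim))
        self.lr = lr

    def predict(self, s):
        return self.W @ s

    def update(self, s, residual):
        error = residual - self.predict(s)
        self.W += self.lr * np.outer(error, s)    # one SGD step on squared error
        return float(np.linalg.norm(error))        # curiosity = prediction error

model, s = CorrectionModel(dim=3), rng.normal(size=3)
for _ in range(5):
    a = rng.normal(size=3)                         # stand-in for the policy's action
    residual = real_step(s, a) - sim_step(s, a)
    intrinsic_reward = model.update(s, residual)   # high where the augmentation is inaccurate
    s = real_step(s, a)
    print(round(intrinsic_reward, 4))
```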