Sharath

Sharath Chandra Raparthy

I am currently a Member of Technical Staff at Reka AI, working on general-purpose multimodal AI agents.

Prior to joining Reka AI, I was an AI Resident at FAIR at Meta, where I collaborated closely with Roberta Raileanu. I was a core contributor to Llama 3, where I worked on tool-use and mathematical reasoning capabilities for the Llama 3 models. My research at FAIR primarily focused on LLM reasoning and refinement, open-ended learning, and in-context reinforcement learning. I co-led Rainbow Teaming, a method that identifies vulnerabilities in LLMs and generates high-quality, diverse synthetic data to improve LLM robustness.

Before joining FAIR, I completed a Master's (with thesis) at Mila under the guidance of Prof. Irina Rish. My academic journey also included a valuable stint at Recursion, where I worked on GFlowNets for drug discovery.

Outside of AI research, my passions include photography, long-distance running, reading and cooking.

Email / GitHub / Google Scholar

News

Oct 2024: Joined Reka AI as a Member of Technical Staff.
Sep 2024: Rainbow Teaming was accepted at NeurIPS 2024.
Jul 2024: Excited to share that I'm a core contributor to The Llama 3 herd of models paper, now available on arXiv.
Jun 2024: Our GLoRe and in-context RL papers were accepted at ICML 2024.
Apr 2024: Super happy to release the Llama-3 preview models.
Mar 2024: New preprint on Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts is out.
Mar 2024: New preprint on GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements is out.
Mar 2024: New preprint on Teaching large language models to reason with reinforcement learning is out.
Feb 2024: Featured on TalkRL podcast to discuss our work on In-context Learning for Sequential Decision Making.
Dec 2023: New preprint on Generalization to New Sequential Decision Making Tasks with In-Context Learning is out.
Oct 2022: Our work Multi-Objective GFlowNets was accepted at ICML 2023
Aug 2022: Our work Continual Learning In Environments With Polynomial Mixing Times got accepted at NeurIPS 2022
Aug 2022: Co-organizing Machine Learning Reproducibility Challenge - 2022
Aug 2022: Joining Meta AI as an AI Resident
Apr 2022: Joining Recursion as a research intern
Oct 2021: Co-organizing Machine Learning Reproducibility Challenge - 2021
Oct 2021: Our work on compositional attention got accepted at ICLR 2022 as a spotlight presentation.
Oct 2021: New preprint out: Continual Learning In Environments With Polynomial Mixing Times
Sep 2020: Started my master's at Mila

Research

	The Llama 3 herd of models Llama Team [ Blog / Arxiv / Model Card ] We open-source Llama 3.1, a new family of foundation models with native support for multilinguality, coding, reasoning, and tool usage, with the largest model having 405B parameters and a context window of up to 128K tokens. The models show performance comparable to GPT-4 across a range of tasks, and the release includes Llama Guard 3 for safety.
	Llama-3 Preview Models Llama Team [ Blog ] We introduce the Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. We achieve state-of-the-art performance among LLMs at these scales.
	Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu Neural Information Processing Systems (NeurIPS), 2024* [ Paper / Website / tl;dr ] Introducing Rainbow Teaming, a new method for generating diverse adversarial prompts for LLMs via LLMs. It's a versatile tool 🛠️ for diagnosing model vulnerabilities across domains and creating data to enhance robustness & safety.
	GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements Alex Havrilla, Sharath Chandra Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Roberta Raileanu International Conference on Machine Learning (ICML), 2024 [ Paper / tl;dr ] How can we bootstrap the reasoning refinement capabilities of LLMs using synthetic data? Introducing "GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements". Applied to GSM8K, we improve a strong RL-finetuned Llama-2 13B by 12%.
	Teaching Large Language Models to Reason with Reinforcement Learning Alex Havrilla, Yuqing Du, Sharath Chandra Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Sainbayar Sukhbaatar, Roberta Raileanu arXiv [ Paper / tl;dr ] In this work, we set out to understand how different algorithms fare at improving LLM reasoning from feedback. We compare expert iteration, PPO, and return-conditioned RL using Llama-2 as the base model.
	Generalization to New Sequential Decision Making Tasks with In-Context Learning Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu International Conference on Machine Learning (ICML), 2024 [ Paper / Code ] Training autonomous agents to learn new tasks from a few demonstrations is challenging, especially for sequential decision making, which is sensitive to errors. In this paper, we show that training transformers on diverse offline datasets of trajectories enables in-context learning of out-of-distribution sequential decision-making tasks from just a handful of demonstrations.
	Multi-Objective GFlowNets Moksh Jain, Sharath Chandra Raparthy, Alex Hernandez-Garcia, Jarrid Rector-Brooks, Yoshua Bengio, Santiago Miret, Emmanuel Bengio International Conference on Machine Learning (ICML), 2023 [ Paper / Code ] We examine the standard approach to multi-objective optimization in machine learning applications like drug discovery and material design from a fresh perspective, noting the failure of existing methods to achieve a diverse set of Pareto-optimal candidates. Motivated by the successful use of GFlowNets in single-objective settings, we introduce a new approach, Multi-Objective GFlowNets (MOGFNs), which features a novel Conditional GFlowNet to handle a variety of single-objective sub-problems derived from decomposing the multi-objective problem. Our research, the first to empirically test Conditional GFlowNets, shows that MOGFNs outperform existing methods in Hypervolume, R2-distance, and candidate diversity, and are also effective in active learning settings.
	Compositional Attention: Disentangling Search and Retrieval Sarthak Mittal, Sharath Chandra Raparthy, Irina Rish, Yoshua Bengio and Guillaume Lajoie International Conference on Learning Representations (ICLR) 2022 Spotlight Presentation [ Paper / Code ] We view the standard Multi-Head attention mechanism from a "Search-Retrieval" perspective and highlight the rigid association between keys and values. We propose a new drop-in replacement mechanism, Compositional Attention, which addresses the resulting redundancies by disentangling Searches and Retrievals and composing them dynamically in a context-dependent way.
	Continual Learning In Environments With Polynomial Mixing Times Matthew Riemer, Sharath Chandra Raparthy, Ignacio Cases, Gopeshh Subbaraj, Maximilian Puelma Touzel and Irina Rish Neural Information Processing Systems (NeurIPS) 2022 [ Paper / Code ] In this work, we focus on a major contributor to poor scaling: the mixing time of the Markov chain induced by a policy. When ignored, mixing times can create myopic biases in the optimization and hence impede success in the continual RL problems of greatest interest. We categorize these continual RL problems as Scalable MDPs and formally demonstrate that they exhibit polynomial mixing times. We discuss why existing RL algorithms struggle in this regime and propose three algorithms that demonstrate clear gains in sample efficiency.
	Curriculum in Gradient-Based Meta-Reinforcement Learning Bhairav Mehta, Tristan Deleu, Sharath Chandra Raparthy, Christopher Pal, Liam Paull ICLR BeTR-RL workshop (2021) [ Paper ] In this work, we study an under-explored aspect of meta-learning: the task distribution. We show that Model-Agnostic Meta-Learning (MAML) is sensitive to the task distribution, and that learning a curriculum of tasks instead of sampling them uniformly substantially improves adaptation performance.
	CuNAS - CUriosity-driven Neural-Augmented Simulator Sharath Chandra Raparthy, Melissa Mozifian, Liam Paull and Florian Golemo RSS Sim2Real workshop (2021) [ Slides / Talk ] Transfer of policies from simulation to physical robots is an important open problem in deep reinforcement learning. Prior work introduced the model-based Neural-Augmented Simulator (NAS) method, which uses task-independent data to model the differences between a simulated and a real robot. In this work, we show that this method is sensitive to the sampling of motor actions and to the control frequency. To overcome this problem, we propose a simple extension based on artificial curiosity. We demonstrate on a physical robot that this leads to better exploration of the state space and consequently better transfer performance than the NAS baseline.
