Sharath Chandra Raparthy
I am currently a Member of Technical Staff at
Reka AI, working on
general-purpose multi-modal AI agents.
Prior to joining Reka AI, I was an AI Resident at
FAIR at Meta,
where I collaborated closely with Roberta Raileanu. I was a core
contributor to Llama 3, where I worked on tool-use and
mathematical reasoning capabilities for the Llama 3 models. My
research at FAIR primarily included LLM
reasoning and refinement, open-ended learning, and in-context
reinforcement learning. I co-led Rainbow Teaming, a method that
identifies vulnerabilities in LLMs and generates
high-quality, diverse synthetic data to improve LLM
robustness.
Before joining FAIR, I completed a Master's (with thesis)
at Mila under the
guidance of Prof.
Irina Rish. My academic journey also included a valuable stint at
Recursion, where I
worked on GFlowNets for drug discovery.
Outside of AI research, my passions include photography,
long-distance running, reading and cooking.
Email / GitHub / Google Scholar
Research
The Llama 3 Herd of Models
Llama Team
[Blog / arXiv / Model Card]
We open-source Llama 3.1, a new family of foundation
models with native support for multilinguality, coding,
reasoning, and tool use, featuring a 405B-parameter model
with a 128K-token context window. The models perform
comparably to GPT-4 across a variety of tasks and ship
with Llama Guard 3 for safety.
Llama-3 Preview Models
Llama Team
[Blog]
We introduce the Llama 3 family of large language models
(LLMs), a collection of pretrained and instruction-tuned
generative text models in 8B and 70B parameter sizes,
achieving state-of-the-art performance at these scales.
Rainbow Teaming: Open-Ended Generation of Diverse
Adversarial Prompts
Mikayel Samvelyan*,
Sharath Chandra Raparthy*, Andrei Lupu*,
Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao,
Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim
Rocktäschel, Roberta Raileanu
Neural Information Processing Systems (NeurIPS), 2024
[Paper / Website / tl;dr]
Introducing Rainbow Teaming, a new method for generating
diverse adversarial prompts for LLMs via LLMs. It's a
versatile tool 🛠️ for diagnosing model vulnerabilities
across domains and creating data to enhance robustness &
safety.
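Under the hood, the method runs a MAP-Elites-style quality-diversity loop over an archive of prompts. A minimal sketch of that loop in Python, where mutate_prompt, judge_success, and the two archive axes are hypothetical stand-ins for the paper's LLM-backed operators:

    import random

    RISK_CATEGORIES = ["fraud", "violence", "privacy"]      # archive axis 1 (illustrative)
    ATTACK_STYLES = ["role_play", "hypothetical", "slang"]  # archive axis 2 (illustrative)

    def mutate_prompt(prompt, category, style):
        # Stand-in for the paper's LLM-based mutation operator.
        return f"[{category}/{style}] {prompt}"

    def judge_success(prompt):
        # Stand-in for the judge LLM that scores the target model's response.
        return random.random()

    archive = {}  # (category, style) -> (prompt, attack-success score)
    seed_prompt = "a seed adversarial prompt"
    for _ in range(1000):
        cat, sty = random.choice(RISK_CATEGORIES), random.choice(ATTACK_STYLES)
        parent, _ = archive.get((cat, sty), (seed_prompt, 0.0))
        child = mutate_prompt(parent, cat, sty)
        score = judge_success(child)
        # Keep the mutated prompt only if it beats the cell's incumbent.
        if score > archive.get((cat, sty), (None, -1.0))[1]:
            archive[(cat, sty)] = (child, score)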
GLoRe: When, Where, and How to Improve LLM Reasoning
via Global and Local Refinements
Alex Havrilla, Sharath Chandra Raparthy,
Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym
Zhuravinskyi, Eric Hambro, Roberta Raileanu
International Conference on Machine Learning (ICML),
2024
[Paper / tl;dr]
How can we bootstrap the reasoning-refinement capabilities of
LLMs using synthetic data? We introduce GLoRe, which decides
when, where, and how to refine via global and local refinement
models. Applied to GSM8K, this improves a strong RL-finetuned
Llama-2 13B model by 12%.
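A minimal sketch of the decide-then-refine idea, assuming hypothetical stand-ins (score_steps for a stepwise reward model, global_refine and local_refine for the refinement models):

    THRESHOLD = 0.5  # illustrative decision threshold

    def refine_solution(question, steps, score_steps, global_refine, local_refine):
        scores = score_steps(question, steps)  # per-step correctness estimates
        if min(scores) >= THRESHOLD:
            return steps  # "when": the solution looks fine, do not refine
        worst = min(range(len(steps)), key=lambda i: scores[i])
        if sum(scores) / len(scores) < THRESHOLD:
            return global_refine(question)  # "how": rewrite the whole solution
        # "where": keep the trusted prefix, locally rewrite from the weak step
        return steps[:worst] + local_refine(question, steps[:worst])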
Teaching Large Language Models to Reason with
Reinforcement Learning
Alex Havrilla, Yuqing Du,
Sharath Chandra Raparthy, Christoforos
Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric
Hambro, Sainbayar Sukhbaatar, Roberta Raileanu
arXiv preprint.
[Paper / tl;dr]
In this work, we set out to understand how different
algorithms fare at improving LLM reasoning from feedback.
We compare expert iteration, PPO, and return-conditioned
RL using Llama-2 as the base model.
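As one example, expert iteration alternates between sampling, filtering with a correctness check, and fine-tuning on the survivors. A minimal sketch, with sample_solutions, is_correct, and finetune as assumed helpers:

    def expert_iteration(model, problems, sample_solutions, is_correct,
                         finetune, rounds=3, k=8):
        for _ in range(rounds):
            dataset = []
            for problem in problems:
                for solution in sample_solutions(model, problem, k):  # k attempts
                    if is_correct(problem, solution):  # keep only verified ones
                        dataset.append((problem, solution))
            model = finetune(model, dataset)  # imitate the filtered "expert" data
        return model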
Generalization to New Sequential Decision Making Tasks
with In-Context Learning
Sharath Chandra Raparthy, Eric Hambro,
Robert Kirk, Mikael Henaff, Roberta Raileanu
International Conference on Machine Learning (ICML),
2024
[Paper / Code]
Training autonomous agents to learn new tasks from a few
demonstrations is challenging, especially for sequential
decision-making, which is sensitive to errors. In this
paper, we show that training transformers on diverse
offline datasets of trajectories enables in-context
learning of out-of-distribution sequential decision tasks
from just a handful of demonstrations.
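A minimal sketch of the few-shot inference setup, assuming a trained causal transformer and a tokenize helper for trajectories (both hypothetical names):

    import torch

    def predict_next_action(model, tokenize, demos, current_episode):
        context = []
        for trajectory in demos:                   # few demonstration trajectories
            context.extend(tokenize(trajectory))
        context.extend(tokenize(current_episode))  # partially completed episode
        tokens = torch.tensor([context])
        logits = model(tokens)                     # causal transformer forward pass
        return logits[0, -1].argmax().item()       # greedy next-action token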
Multi-Objective GFlowNets
Moksh Jain, Sharath Chandra Raparthy, Alex
Hernandez-Garcia, Jarrid Rector-Brooks, Yoshua Bengio,
Santiago Miret, Emmanuel Bengio
International Conference on Machine Learning (ICML),
2024
[Paper / Code]
We examine the standard approach to multi-objective
optimization in machine learning applications like drug
discovery and material design from a fresh perspective,
noting the failure of existing methods to achieve a
diverse set of Pareto-optimal candidates. Motivated by the
successful use of GFlowNets in single-objective settings,
we introduce a new approach, Multi-Objective GFlowNets
(MOGFNs), which features a novel Conditional GFlowNet to
handle a variety of single-objective sub-problems derived
from decomposing the multi-objective problem. Our
research, the first to empirically test Conditional
GFlowNets, shows that MOGFNs outperform existing methods
in Hypervolume, R2-distance, and candidate diversity, even
demonstrating their effectiveness in active learning
settings.
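A minimal sketch of the preference-conditioning idea: sample a preference vector, scalarize the objectives into one reward, and train the conditional policy on it (objectives and train_gflownet_step are hypothetical stand-ins):

    import numpy as np

    def scalarized_reward(x, prefs, objectives):
        # Weighted-sum decomposition into a single-objective sub-problem.
        return sum(w * f(x) for w, f in zip(prefs, objectives))

    def training_step(policy, objectives, train_gflownet_step):
        # Fresh preferences each step, so one conditional GFlowNet covers
        # a whole family of single-objective sub-problems.
        prefs = np.random.dirichlet(np.ones(len(objectives)))
        reward_fn = lambda x: scalarized_reward(x, prefs, objectives)
        return train_gflownet_step(policy, reward_fn, condition=prefs)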
Compositional Attention: Disentangling Search and
Retrieval
Sarthak Mittal,
Sharath Chandra Raparthy, Irina Rish,
Yoshua Bengio and Guillaume Lajoie
International Conference on Learning Representations
(ICLR), 2022
Spotlight Presentation
[Paper / Code]
We view the standard multi-head attention mechanism from
the "search-retrieval" perspective and highlight the rigid
pairing of keys and values. We propose a new drop-in
replacement mechanism, Compositional Attention, which
addresses this redundancy by disentangling searches from
retrievals and composing them dynamically in a
context-dependent way.
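A minimal sketch of the idea: compute S search (query-key) patterns and R retrieval (value) outputs, then let each search softly select among retrievals instead of owning exactly one. Dimensions and the selection rule are illustrative, not the paper's exact parameterization:

    import torch
    import torch.nn.functional as F

    def compositional_attention(x, Wq, Wk, Wv, Wsel):
        # x: (seq, dim); Wq/Wk: S projections; Wv: R projections; Wsel: S selectors
        outputs = []
        for s in range(len(Wq)):
            q, k = x @ Wq[s], x @ Wk[s]
            attn = F.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)   # search s
            # The same attention pattern is applied to every retrieval's values.
            cands = torch.stack([attn @ (x @ Wv_r) for Wv_r in Wv])  # (R, seq, d)
            sel = F.softmax(x @ Wsel[s], dim=-1)                     # (seq, R)
            outputs.append(torch.einsum("nr,rnd->nd", sel, cands))
        return torch.cat(outputs, dim=-1)  # concatenate over searches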
Continual Learning In Environments With Polynomial
Mixing Times
Matthew Riemer*, Sharath Chandra Raparthy*,
Ignacio Cases, Gopeshh Subbaraj, Maximilian Puelma Touzel
and Irina Rish
Neural Information Processing Systems (NeurIPS) 2022
[Paper / Code]
In this work, we concentrate on a major contributor to
poor scaling: the mixing time of the Markov chain induced
by a policy. When ignored, mixing times create myopic
biases in the optimization and are thus an impediment to
success on the continual RL problems of greatest interest.
We categorize these continual RL problems as scalable MDPs
and formally demonstrate that they exhibit polynomial
mixing times. We discuss how existing RL algorithms face
difficulties in this regime and propose three algorithms
that clearly demonstrate improved sample efficiency.
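For intuition, the mixing time of a finite chain is governed by its spectral gap. A small illustrative example (the transition matrix is made up):

    import numpy as np

    # A "sticky" 3-state chain: the policy rarely moves between states.
    P = np.array([[0.98, 0.01, 0.01],
                  [0.01, 0.98, 0.01],
                  [0.01, 0.01, 0.98]])

    eigvals = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    spectral_gap = 1.0 - eigvals[1]       # 1 - |second-largest eigenvalue|
    relaxation_time = 1.0 / spectral_gap  # mixing time scales with this
    print(f"gap={spectral_gap:.3f}, relaxation time ~ {relaxation_time:.0f} steps")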
Curriculum in Gradient-Based Meta-Reinforcement
Learning
Bhairav Mehta, Tristan Deleu*,
Sharath Chandra Raparthy*, Christopher Pal,
Liam Paull
ICLR BeTR-RL workshop (2021)
[Paper]
In this work, we study an under-explored parameter of
meta-learning: the task distribution. We show that
Model-Agnostic Meta-Learning (MAML) is sensitive to the
task distribution, and that learning a curriculum of tasks
instead of sampling them uniformly substantially improves
adaptation performance.
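A minimal sketch of one way to realize such a curriculum, sampling tasks in proportion to a simple learning-progress score (the scoring rule is an assumption, not the paper's exact criterion):

    import numpy as np

    def sample_task(progress, temperature=1.0):
        # Tasks with higher recent improvement are sampled more often.
        logits = np.asarray(progress) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return np.random.choice(len(progress), p=probs)

    def update_progress(progress, task_id, old_loss, new_loss, ema=0.9):
        # Learning progress tracked as an EMA of per-task loss improvement.
        progress[task_id] = ema * progress[task_id] + (1 - ema) * (old_loss - new_loss)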
CuNAS - CUriosity-driven Neural-Augmented
Simulator
Sharath Chandra Raparthy, Melissa Mozifian,
Liam Paull and Florian Golemo
RSS Sim2Real workshop (2021)
[Slides / Talk]
Transfer of policies from simulation to physical robots is
an important open problem in deep reinforcement learning.
Prior work introduced the model-based Neural-Augmented
Simulator (NAS) method, which uses task-independent data
to model the differences between the simulated and the
real robot. In this work, we show that this method is
sensitive to the sampling of motor actions and to the
control frequency. To overcome this problem, we propose a
simple extension based on artificial curiosity. We
demonstrate, on a physical robot, that this leads to
better exploration of the state space and consequently
better transfer performance compared to the NAS baseline.
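A minimal sketch of the curiosity signal: reward state-action pairs that the learned dynamics model still predicts poorly, so data collection is driven toward under-modeled regions (forward_model is an assumed learned next-state predictor):

    import numpy as np

    def curiosity_reward(forward_model, state, action, next_state):
        predicted = forward_model(state, action)         # model's prediction
        surprise = np.linalg.norm(predicted - next_state)
        return surprise  # high where the sim-to-real model is still wrong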