ray-project/ray

▶ doc/source/data/shuffling-data.rst

1

Convergence Analysis of Distributed Stochastic Gradient Descent with Shuffling

Qi Meng, Wei Chen, Yue Wang, Zhi-Ming Ma, Tie-Yan Liu

2017

1 reference

When using stochastic gradient descent to solve large-scale machine learning problems, a common practice of data processing is to shuffle the training data, partition the data across multiple machines...

▶ doc/source/ray-contribute/whitepaper.rst

1

Exoshuffle: An Extensible Shuffle Architecture

Frank Sifei Luan, Stephanie Wang, Samyukta Yagati, Sean Kim, Kenneth Lien, Isaac Ong, Tony Hong, San...

2022

4 references

Shuffle is one of the most expensive communication primitives in distributed data processing and is difficult to scale. Prior work addresses the scalability challenges of shuffle by building monolithi...

▶ doc/source/ray-overview/examples/e2e-timeseries/README.md

1

Are Transformers Effective for Time Series Forecasting?

Ailing Zeng, Muxi Chen, Lei Zhang, Qiang Xu

2022

1 reference

Recently, there has been a surge of Transformer-based solutions for the long-term time series forecasting (LTSF) task. Despite the growing performance over the past few years, we question the validity...

▶ doc/source/ray-overview/getting-started.md

6

Exoshuffle: An Extensible Shuffle Architecture

Frank Sifei Luan, Stephanie Wang, Samyukta Yagati, Sean Kim, Kenneth Lien, Isaac Ong, Tony Hong, San...

2022

4 references

Shuffle is one of the most expensive communication primitives in distributed data processing and is difficult to scale. Prior work addresses the scalability challenges of shuffle by building monolithi...

RLlib: Abstractions for Distributed Reinforcement Learning

Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzale...

2017

3 references

Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation. We argue for distr...

RLlib Flow: Distributed Reinforcement Learning is a Dataflow Problem

Eric Liang, Zhanghao Wu, Michael Luo, Sven Mika, Joseph E. Gonzalez, Ion Stoica

2020

1 reference

Researchers and practitioners in the field of reinforcement learning (RL) frequently leverage parallel computation, which has led to a plethora of new algorithms and systems in the last few years. In ...

Tune: A Research Platform for Distributed Model Selection and Training

Richard Liaw, Eric Liang, Robert Nishihara, Philipp Moritz, Joseph E. Gonzalez, Ion Stoica

2018

6 references

Modern machine learning algorithms are increasingly computationally demanding, requiring specialized hardware and distributed computation to achieve high performance in a reasonable time frame. Many h...

Ray: A Distributed Framework for Emerging AI Applications

Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih El...

2017

2 references

The next generation of AI applications will continuously interact with the environment and learn from these interactions. These applications impose new and demanding systems requirements, both in term...

Real-Time Machine Learning: The Missing Pieces

Robert Nishihara, Philipp Moritz, Stephanie Wang, Alexey Tumanov, William Paul, Johann Schleier-Smit...

2017

2 references

Machine learning applications are increasingly deployed not only to serve predictions using static models, but also as tightly-integrated components of feedback loops involving dynamic, real-time deci...

▶ doc/source/rllib/index.rst

1

RLlib: Abstractions for Distributed Reinforcement Learning

Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzale...

2017

3 references

Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation. We argue for distr...

▶ doc/source/rllib/rllib-algorithms.rst

15

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov

2017

4 references

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective fu...

Playing Atari with Deep Reinforcement Learning

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Mar...

2013

1 reference

We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, ...

Rainbow: Combining Improvements in Deep Reinforcement Learning

Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horga...

2017

3 references

The deep reinforcement learning community has made several independent improvements to the DQN algorithm. However, it is unclear which of these extensions are complementary and can be fruitfully combi...

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine

2018

7 references

Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challe...

Soft Actor-Critic Algorithms and Applications

Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar,...

2018

4 references

Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. However, these methods typically suffer fr...

IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks

Michael Luo, Jiahao Yao, Richard Liaw, Eric Liang, Ion Stoica

2019

6 references

The practical usage of reinforcement learning agents is often bottlenecked by the duration of training time. To accelerate training, practitioners often turn to distributed reinforcement learning arch...

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymir Mnih, Tom Ward, Yotam Doron, Vla...

2018

6 references

In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and exte...

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

Conservative Q-Learning for Offline Reinforcement Learning

Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine

2020

3 references

Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is a key challenge for large-scale real-world applications. Offline RL algorithms promise to learn effective ...

Offline Reinforcement Learning with Implicit Q-Learning

Ilya Kostrikov, Ashvin Nair, Sergey Levine

2021

1 reference

Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that improves over the behavior policy that collected the dataset, while at the same time minimizing the dev...

Curiosity-driven Exploration by Self-supervised Prediction

Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell

2017

6 references

In many real-world scenarios, rewards extrinsic to the agent are extremely sparse, or absent altogether. In such cases, curiosity can serve as an intrinsic reward signal to enable the agent to explore...

▶ doc/source/rllib/rllib-examples.rst

4

Curiosity-driven Exploration by Self-supervised Prediction

Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell

2017

6 references

In many real-world scenarios, rewards extrinsic to the agent are extremely sparse, or absent altogether. In such cases, curiosity can serve as an intrinsic reward signal to enable the agent to explore...

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shi...

2018

2 references

In many real-world settings, a team of agents must coordinate their behaviour while acting in a decentralised way. At the same time, it is often possible to train the agents in a centralised fashion i...

The Emergence of Adversarial Communication in Multi-Agent Reinforcement Learning

Jan Blumenkamp, Amanda Prorok

2020

1 reference

Many real-world problems require the coordination of multiple autonomous agents. Recent work has shown the promise of Graph Neural Networks (GNNs) to learn explicit communication strategies that enabl...

Multi-agent Reinforcement Learning in Sequential Social Dilemmas

Joel Z. Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, Thore Graepel

2017

1 reference

Matrix games like Prisoner's Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social di...

▶ doc/source/rllib/rllib-offline.rst

3

Horizon: Facebook's Open Source Applied Reinforcement Learning Platform

Jason Gauci, Edoardo Conti, Yitao Liang, Kittipat Virochsiri, Yuchen He, Zachary Kaden, Vivek Naraya...

2018

2 references

In this paper we present Horizon, Facebook's open source applied reinforcement learning (RL) platform. Horizon is an end-to-end platform designed to solve industry applied RL problems where datasets a...

High-Dimensional Continuous Control Using Generalized Advantage Estimation

John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel

2015

6 references

Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators...

▶ doc/source/rllib/rllib-replay-buffers.rst

2

Prioritized Experience Replay

Tom Schaul, John Quan, Ioannis Antonoglou, David Silver

2015

5 references

Experience replay lets online reinforcement learning agents remember and reuse experiences from the past. In prior work, experience transitions were uniformly sampled from a replay memory. However, th...

▶ doc/source/rllib/rl-modules.rst

1

Horizon: Facebook's Open Source Applied Reinforcement Learning Platform

Jason Gauci, Edoardo Conti, Yitao Liang, Kittipat Virochsiri, Yuchen He, Zachary Kaden, Vivek Naraya...

2018

2 references

In this paper we present Horizon, Facebook's open source applied reinforcement learning (RL) platform. Horizon is an end-to-end platform designed to solve industry applied RL problems where datasets a...

▶ doc/source/templates/04_finetuning_llms_with_deepspeed/finetune_hf_llm.py

1

LoRA: Low-Rank Adaptation of Large Language Models

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu...

2021

2 references

An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine...

▶ doc/source/templates/04_finetuning_llms_with_deepspeed/README.md

1

LoRA: Low-Rank Adaptation of Large Language Models

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu...

2021

2 references

An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine...

▶ doc/source/tune/api/schedulers.rst

2

Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization

Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, Ameet Talwalkar

2016

2 references

Performance of machine learning algorithms depends critically on identifying a good set of hyperparameters. While recent approaches use Bayesian optimization to adaptively select configurations, we fo...

Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits

Jack Parker-Holder, Vu Nguyen, Stephen Roberts

2020

2 references

Many of the recent triumphs in machine learning are dependent on well-tuned hyperparameters. This is particularly prominent in reinforcement learning (RL) where a small change in the configuration can...

▶ doc/source/tune/api/suggestion.rst

1

BOHB: Robust and Efficient Hyperparameter Optimization at Scale

Stefan Falkner, Aaron Klein, Frank Hutter

2018

1 reference

Modern deep learning methods are very sensitive to many hyperparameters, and, due to the long training times of state-of-the-art models, vanilla Bayesian hyperparameter optimization is typically compu...

▶ doc/source/tune/examples/includes/async_hyperband_example.rst

1

A System for Massively Parallel Hyperparameter Tuning

Liam Li, Kevin Jamieson, Afshin Rostamizadeh, Ekaterina Gonina, Moritz Hardt, Benjamin Recht, Ameet ...

2018

2 references

Modern learning models are characterized by large hyperparameter spaces and long training times. These properties, coupled with the rise of parallel computing and the growing demand to productionize m...

▶ doc/source/tune/index.rst

3

NeuroCard: One Cardinality Estimator for All Tables

2020

1 reference

Query optimizers rely on accurate cardinality estimates to produce good execution plans. Despite decades of research, existing cardinality estimators are inaccurate for complex queries, due to making ...

Tune: A Research Platform for Distributed Model Selection and Training

Richard Liaw, Eric Liang, Robert Nishihara, Philipp Moritz, Joseph E. Gonzalez, Ion Stoica

2018

6 references

Modern machine learning algorithms are increasingly computationally demanding, requiring specialized hardware and distributed computation to achieve high performance in a reasonable time frame. Many h...

▶ doc/source/tune/key-concepts.rst

2

Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization

Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, Ameet Talwalkar

2016

2 references

Performance of machine learning algorithms depends critically on identifying a good set of hyperparameters. While recent approaches use Bayesian optimization to adaptively select configurations, we fo...

Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits

Jack Parker-Holder, Vu Nguyen, Stephen Roberts

2020

2 references

Many of the recent triumphs in machine learning are dependent on well-tuned hyperparameters. This is particularly prominent in reinforcement learning (RL) where a small change in the configuration can...

▶ python/ray/data/_internal/planner/exchange/interfaces.py

1

Volcano - An Extensible and Parallel Query Evaluation System

G. Graefe

1994

1 reference

To investigate the interactions of extensibility and parallelism in database query processing, we have developed a new dataflow query execution system called Volcano. The Volcano effort provides a ric...

▶ python/ray/data/_internal/planner/exchange/pull_based_shuffle_task_scheduler.py

1

MapReduce: simplified data processing on large clusters

Jeffrey Dean, Sanjay Ghemawat

2008

1 reference

MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in...

▶ python/ray/data/_internal/planner/exchange/push_based_shuffle_task_scheduler.py

1

Exoshuffle: An Extensible Shuffle Architecture

Frank Sifei Luan, Stephanie Wang, Samyukta Yagati, Sean Kim, Kenneth Lien, Isaac Ong, Tony Hong, San...

2022

4 references

Shuffle is one of the most expensive communication primitives in distributed data processing and is difficult to scale. Prior work addresses the scalability challenges of shuffle by building monolithi...

▶ python/ray/tune/README.rst

2

Tune: A Research Platform for Distributed Model Selection and Training

Richard Liaw, Eric Liang, Robert Nishihara, Philipp Moritz, Joseph E. Gonzalez, Ion Stoica

2018

6 references

Modern machine learning algorithms are increasingly computationally demanding, requiring specialized hardware and distributed computation to achieve high performance in a reasonable time frame. Many h...

▶ python/ray/tune/schedulers/async_hyperband.py

1

A System for Massively Parallel Hyperparameter Tuning

Liam Li, Kevin Jamieson, Afshin Rostamizadeh, Ekaterina Gonina, Moritz Hardt, Benjamin Recht, Ameet ...

2018

2 references

Modern learning models are characterized by large hyperparameter spaces and long training times. These properties, coupled with the rise of parallel computing and the growing demand to productionize m...

▶ python/ray/tune/schedulers/pb2.py

1

Population Based Training of Neural Networks

Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M. Czarnecki, Jeff Donahue, Ali Razavi, O...

2017

2 references

Neural networks dominate the modern machine learning landscape, but their training and success still suffer from sensitivity to empirical choices of hyperparameters such as model architecture, loss fu...

▶ python/ray/tune/schedulers/pbt.py

1

Population Based Training of Neural Networks

Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M. Czarnecki, Jeff Donahue, Ali Razavi, O...

2017

2 references

Neural networks dominate the modern machine learning landscape, but their training and success still suffer from sensitivity to empirical choices of hyperparameters such as model architecture, loss fu...

▶ README.rst

5

Exoshuffle: An Extensible Shuffle Architecture

Frank Sifei Luan, Stephanie Wang, Samyukta Yagati, Sean Kim, Kenneth Lien, Isaac Ong, Tony Hong, San...

2022

4 references

Shuffle is one of the most expensive communication primitives in distributed data processing and is difficult to scale. Prior work addresses the scalability challenges of shuffle by building monolithi...

Ray: A Distributed Framework for Emerging AI Applications

Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih El...

2017

2 references

The next generation of AI applications will continuously interact with the environment and learn from these interactions. These applications impose new and demanding systems requirements, both in term...

Real-Time Machine Learning: The Missing Pieces

Robert Nishihara, Philipp Moritz, Stephanie Wang, Alexey Tumanov, William Paul, Johann Schleier-Smit...

2017

2 references

Machine learning applications are increasingly deployed not only to serve predictions using static models, but also as tightly-integrated components of feedback loops involving dynamic, real-time deci...

RLlib: Abstractions for Distributed Reinforcement Learning

Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzale...

2017

3 references

Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation. We argue for distr...

Tune: A Research Platform for Distributed Model Selection and Training

Richard Liaw, Eric Liang, Robert Nishihara, Philipp Moritz, Joseph E. Gonzalez, Ion Stoica

2018

6 references

Modern machine learning algorithms are increasingly computationally demanding, requiring specialized hardware and distributed computation to achieve high performance in a reasonable time frame. Many h...

▶ rllib/algorithms/appo/appo.py

2

IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks

Michael Luo, Jiahao Yao, Richard Liaw, Eric Liang, Ion Stoica

2019

6 references

The practical usage of reinforcement learning agents is often bottlenecked by the duration of training time. To accelerate training, practitioners often turn to distributed reinforcement learning arch...

High-Dimensional Continuous Control Using Generalized Advantage Estimation

John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel

2015

6 references

Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators...

▶ rllib/algorithms/appo/README.md

2

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov

2017

4 references

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective fu...

IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks

Michael Luo, Jiahao Yao, Richard Liaw, Eric Liang, Ion Stoica

2019

6 references

The practical usage of reinforcement learning agents is often bottlenecked by the duration of training time. To accelerate training, practitioners often turn to distributed reinforcement learning arch...

▶ rllib/algorithms/appo/torch/appo_torch_learner.py

1

IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks

Michael Luo, Jiahao Yao, Richard Liaw, Eric Liang, Ion Stoica

2019

6 references

The practical usage of reinforcement learning agents is often bottlenecked by the duration of training time. To accelerate training, practitioners often turn to distributed reinforcement learning arch...

▶ rllib/algorithms/appo/utils.py

1

IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks

Michael Luo, Jiahao Yao, Richard Liaw, Eric Liang, Ion Stoica

2019

6 references

The practical usage of reinforcement learning agents is often bottlenecked by the duration of training time. To accelerate training, practitioners often turn to distributed reinforcement learning arch...

▶ rllib/algorithms/cql/README.md

1

Conservative Q-Learning for Offline Reinforcement Learning

Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine

2020

3 references

Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is a key challenge for large-scale real-world applications. Offline RL algorithms promise to learn effective ...

▶ rllib/algorithms/cql/torch/cql_torch_learner.py

2

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine

2018

7 references

Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challe...

Conservative Q-Learning for Offline Reinforcement Learning

Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine

2020

3 references

Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is a key challenge for large-scale real-world applications. Offline RL algorithms promise to learn effective ...

▶ rllib/algorithms/dqn/README.md

5

Deep Reinforcement Learning with Double Q-learning

Hado van Hasselt, Arthur Guez, David Silver

2015

1 reference

The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm p...

Dueling Network Architectures for Deep Reinforcement Learning

Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas

2015

1 reference

In recent years there have been many successes of using deep representations in reinforcement learning. Still, many of these applications use conventional architectures, such as convolutional networks...

A Distributional Perspective on Reinforcement Learning

Marc G. Bellemare, Will Dabney, Rémi Munos

2017

1 reference

In this paper we argue for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent. This is in contrast to the common app...

Distributed Prioritized Experience Replay

Dan Horgan, John Quan, David Budden, Gabriel Barth-Maron, Matteo Hessel, Hado van Hasselt, David Sil...

2018

2 references

We propose a distributed architecture for deep reinforcement learning at scale, that enables agents to learn effectively from orders of magnitude more data than previously possible. The algorithm deco...

Rainbow: Combining Improvements in Deep Reinforcement Learning

Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horga...

2017

3 references

The deep reinforcement learning community has made several independent improvements to the DQN algorithm. However, it is unclear which of these extensions are complementary and can be fruitfully combi...

▶ rllib/algorithms/dreamerv3/dreamerv3_learner.py

2

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

Mastering Atari with Discrete World Models

Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

2020

20 references

Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to incr...

▶ rllib/algorithms/dreamerv3/dreamerv3.py

2

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

Mastering Atari with Discrete World Models

Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

2020

20 references

Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to incr...

▶ rllib/algorithms/dreamerv3/__init__.py

2

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

Mastering Atari with Discrete World Models

Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

2020

20 references

Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to...

▶ rllib/algorithms/dreamerv3/README.md

4

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

Mastering Atari with Discrete World Models

Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

2020

20 references

Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to...

▶ rllib/algorithms/dreamerv3/torch/dreamerv3_torch_learner.py

2

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

Mastering Atari with Discrete World Models

Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

2020

20 references

Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to...

▶ rllib/algorithms/dreamerv3/torch/dreamerv3_torch_rl_module.py

2

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

Mastering Atari with Discrete World Models

Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

2020

20 references

Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to...

▶ rllib/algorithms/dreamerv3/torch/models/actor_network.py

1

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

▶ rllib/algorithms/dreamerv3/torch/models/components/cnn_atari.py

1

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

▶ rllib/algorithms/dreamerv3/torch/models/components/continue_predictor.py

1

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

▶ rllib/algorithms/dreamerv3/torch/models/components/conv_transpose_atari.py

1

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

▶ rllib/algorithms/dreamerv3/torch/models/components/dynamics_predictor.py

1

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

▶ rllib/algorithms/dreamerv3/torch/models/components/mlp.py

2

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

Mastering Atari with Discrete World Models

Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

2020

20 references

Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to...

▶ rllib/algorithms/dreamerv3/torch/models/components/representation_layer.py

2

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

Mastering Atari with Discrete World Models

Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

2020

20 references

Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to...

▶ rllib/algorithms/dreamerv3/torch/models/components/reward_predictor_layer.py

2

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

Mastering Atari with Discrete World Models

Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

2020

20 references

Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to...

▶ rllib/algorithms/dreamerv3/torch/models/components/reward_predictor.py

1

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

▶ rllib/algorithms/dreamerv3/torch/models/components/sequence_model.py

1

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

▶ rllib/algorithms/dreamerv3/torch/models/components/vector_decoder.py

1

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

▶ rllib/algorithms/dreamerv3/torch/models/critic_network.py

1

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

▶ rllib/algorithms/dreamerv3/torch/models/dreamer_model.py

1

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

▶ rllib/algorithms/dreamerv3/torch/models/world_model.py

1

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

▶ rllib/algorithms/dreamerv3/utils/__init__.py

1

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

▶ rllib/algorithms/dreamerv3/utils/summaries.py

2

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

Mastering Atari with Discrete World Models

Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

2020

20 references

Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to...

▶ rllib/algorithms/impala/torch/vtrace_torch_v2.py

1

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymir Mnih, Tom Ward, Yotam Doron, Vla...

2018

6 references

In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and exte...

▶ rllib/algorithms/impala/vtrace_tf.py

1

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymir Mnih, Tom Ward, Yotam Doron, Vla...

2018

6 references

In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and exte...

▶ rllib/algorithms/impala/vtrace_torch.py

1

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymir Mnih, Tom Ward, Yotam Doron, Vla...

2018

6 references

In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and exte...

▶ rllib/algorithms/ppo/ppo.py

1

High-Dimensional Continuous Control Using Generalized Advantage Estimation

John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel

2015

6 references

Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators...

▶ rllib/algorithms/ppo/README.md

2

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov

2017

4 references

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective fu...

Trust Region Policy Optimization

John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel

2015

1 reference

We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified procedure, we develop a practical alg...

▶ rllib/algorithms/sac/README.md

3

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Tuomas Haarnoja, Aurick Zhou, P. Abbeel, S. Levine

2018

7 references

Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challe...

Soft Actor-Critic for Discrete Action Settings

Petros Christodoulou

2019

1 reference

Soft Actor-Critic is a state-of-the-art reinforcement learning algorithm for continuous action settings that is not applicable to discrete action settings. Many important settings involve discrete act...

▶ rllib/algorithms/sac/torch/sac_torch_learner.py

3

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Tuomas Haarnoja, Aurick Zhou, P. Abbeel, S. Levine

2018

7 references

Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challe...

▶ rllib/evaluation/postprocessing.py

1

High-Dimensional Continuous Control Using Generalized Advantage Estimation

John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel

2015

6 references

Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators...

▶ rllib/examples/algorithms/dqn/benchmark_dqn_atari.py

1

Soft Actor-Critic Algorithms and Applications

Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar,...

2018

4 references

Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. However, these methods typically suffer fr...

▶ rllib/examples/algorithms/dqn/benchmark_dqn_atari_rllib_preprocessing.py

1

Soft Actor-Critic Algorithms and Applications

Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar,...

2018

4 references

Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. However, these methods typically suffer fr...

▶ rllib/examples/algorithms/dreamerv3/atari_100k_dreamerv3.py

3

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

Mastering Atari with Discrete World Models

Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

2020

20 references

Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to...

Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents (Extended Abstract)

Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, J. Veness, Matthew J. Hausknecht, Michael Bowli...

2018

2 references

The Arcade Learning Environment (ALE) is an evaluation platform that poses the challenge of building AI agents with general competency across dozens of Atari 2600 games. It supports a variety of diffe...

▶ rllib/examples/algorithms/dreamerv3/atari_200M_dreamerv3.py

3

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

Mastering Atari with Discrete World Models

Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

2020

20 references

Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to...

Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents (Extended Abstract)

Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, J. Veness, Matthew J. Hausknecht, Michael Bowli...

2018

2 references

The Arcade Learning Environment (ALE) is an evaluation platform that poses the challenge of building AI agents with general competency across dozens of Atari 2600 games. It supports a variety of diffe...

▶ rllib/examples/algorithms/dreamerv3/cartpole_dreamerv3.py

2

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

Mastering Atari with Discrete World Models

Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

2020

20 references

Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to...

▶ rllib/examples/algorithms/dreamerv3/dm_control_suite_vision_dreamerv3.py

2

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

Mastering Atari with Discrete World Models

Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

2020

20 references

Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to...

▶ rllib/examples/algorithms/dreamerv3/flappy_bird_dreamerv3.py

2

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

Mastering Atari with Discrete World Models

Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

2020

20 references

Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to...

▶ rllib/examples/algorithms/dreamerv3/frozenlake_2x2_dreamerv3.py

2

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

Mastering Atari with Discrete World Models

Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

2020

20 references

Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to...

▶ rllib/examples/algorithms/dreamerv3/frozenlake_4x4_deterministic_dreamerv3.py

2

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

Mastering Atari with Discrete World Models

Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

2020

20 references

Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to...

▶ rllib/examples/algorithms/dreamerv3/gymnasium_robotics_dreamerv3.py

2

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

Mastering Atari with Discrete World Models

Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

2020

20 references

Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to...

▶ rllib/examples/algorithms/dreamerv3/highway_env_dreamerv3.py

2

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

Mastering Atari with Discrete World Models

Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

2020

20 references

Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to...

▶ rllib/examples/algorithms/dreamerv3/pendulum_dreamerv3.py

2

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

Mastering Atari with Discrete World Models

Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba

2020

20 references

Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to...

▶ rllib/examples/algorithms/ppo/benchmark_ppo_mujoco.py

1

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov

2017

4 references

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective fu...

▶ rllib/examples/algorithms/sac/benchmark_sac_mujoco.py

1

Soft Actor-Critic Algorithms and Applications

Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar,...

2018

4 references

Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. However, these methods typically suffer fr...

▶ rllib/examples/compute_adapted_gae_on_postprocess_trajectory.py

1

High-Dimensional Continuous Control Using Generalized Advantage Estimation

John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel

2015

6 references

Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators...

▶ rllib/examples/curiosity/intrinsic_curiosity_model_based_curiosity.py

1

Curiosity-driven Exploration by Self-supervised Prediction

Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell

2017

6 references

In many real-world scenarios, rewards extrinsic to the agent are extremely sparse, or absent altogether. In such cases, curiosity can serve as an intrinsic reward signal to enable the agent to explore...

▶ rllib/examples/envs/classes/multi_agent/bandit_envs_discrete.py

1

Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

Carlos Riquelme, George Tucker, Jasper Snoek

2018

1 reference

Recent advances in deep reinforcement learning have made significant strides in performance on applications such as Go and Atari games. However, developing practical methods to balance exploration and...

▶ rllib/examples/learners/classes/intrinsic_curiosity_learners.py

1

Curiosity-driven Exploration by Self-supervised Prediction

Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell

2017

6 references

In many real-world scenarios, rewards extrinsic to the agent are extremely sparse, or absent altogether. In such cases, curiosity can serve as an intrinsic reward signal to enable the agent to explore...

▶ rllib/examples/multi_agent/two_step_game_with_grouped_agents.py

1

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shi...

2018

2 references

In many real-world settings, a team of agents must coordinate their behaviour while acting in a decentralised way. At the same time, it is often possible to train the agents in a centralised fashion i...

▶ rllib/examples/rl_modules/classes/intrinsic_curiosity_model_rlm.py

1

Curiosity-driven Exploration by Self-supervised Prediction

Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell

2017

6 references

In many real-world scenarios, rewards extrinsic to the agent are extremely sparse, or absent altogether. In such cases, curiosity can serve as an intrinsic reward signal to enable the agent to explore...

▶ rllib/models/tf/attention_net.py

2

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kais...

2017

25 references

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and d...

Stabilizing Transformers for Reinforcement Learning

Emilio Parisotto, H. Francis Song, Jack W. Rae, Razvan Pascanu, Caglar Gulcehre, Siddhant M. Jayakum...

2019

2 references

Owing to their ability to both effectively integrate information over long time horizons and scale to massive amounts of data, self-attention architectures have recently shown breakthrough success in ...

▶ rllib/models/tf/layers/multi_head_attention.py

1

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kais...

2017

25 references

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and d...

▶ rllib/models/tf/tf_action_dist.py

2

Categorical Reparameterization with Gumbel-Softmax

Eric Jang, Shixiang Gu, Ben Poole

2016

5 references

Categorical variables are a natural choice for representing discrete structure in the world. However, stochastic neural networks rarely use categorical latent variables due to the inability to backpro...

The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables

Chris J. Maddison, Andriy Mnih, Yee Whye Teh

2016

9 references

The reparameterization trick enables optimizing large scale stochastic computation graphs via gradient descent. The essence of the trick is to refactor each stochastic node into a differentiable funct...

▶ rllib/models/torch/attention_net.py

2

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kais...

2017

25 references

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and d...

Stabilizing Transformers for Reinforcement Learning

Emilio Parisotto, H. Francis Song, Jack W. Rae, Razvan Pascanu, Caglar Gulcehre, Siddhant M. Jayakum...

2019

2 references

Owing to their ability to both effectively integrate information over long time horizons and scale to massive amounts of data, self-attention architectures have recently shown breakthrough success in ...

▶ rllib/models/torch/mingpt.py

1

Gaussian Error Linear Units (GELUs)

Dan Hendrycks, Kevin Gimpel

2016

13 references

We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU activation function is $x\Phi(x)$, where $\Phi(x)$ is the standard Gaussian cumulative dis...

▶ rllib/models/torch/modules/multi_head_attention.py

1

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kais...

2017

25 references

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and d...

▶ rllib/offline/estimators/direct_method.py

1

Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning

Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue

2019

5 references

We offer an experimental benchmark and empirical study for off-policy policy evaluation (OPE) in reinforcement learning, which is a key problem in many safety critical applications. Given the increasi...

▶ rllib/offline/estimators/doubly_robust.py

1

Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning

Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue

2019

5 references

We offer an experimental benchmark and empirical study for off-policy policy evaluation (OPE) in reinforcement learning, which is a key problem in many safety critical applications. Given the increasi...

▶ rllib/offline/estimators/fqe_torch_model.py

1

Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning

Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue

2019

5 references

We offer an experimental benchmark and empirical study for off-policy policy evaluation (OPE) in reinforcement learning, which is a key problem in many safety critical applications. Given the increasi...

▶ rllib/offline/estimators/importance_sampling.py

1

Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning

Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue

2019

5 references

We offer an experimental benchmark and empirical study for off-policy policy evaluation (OPE) in reinforcement learning, which is a key problem in many safety critical applications. Given the increasi...

▶ rllib/offline/estimators/weighted_importance_sampling.py

1

Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning

Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue

2019

5 references

We offer an experimental benchmark and empirical study for off-policy policy evaluation (OPE) in reinforcement learning, which is a key problem in many safety critical applications. Given the increasi...

▶ rllib/utils/exploration/curiosity.py

1

Curiosity-driven Exploration by Self-supervised Prediction

Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell

2017

6 references

In many real-world scenarios, rewards extrinsic to the agent are extremely sparse, or absent altogether. In such cases, curiosity can serve as an intrinsic reward signal to enable the agent to explore...

▶ rllib/utils/exploration/parameter_noise.py

1

Parameter Space Noise for Exploration

Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Y. Chen, Xi Chen, Tamim ...

2017

1 reference

Deep reinforcement learning (RL) methods generally engage in exploratory behavior through noise injection in the action space. An alternative is to add noise directly to the agent's parameters, which ...

▶ rllib/utils/exploration/per_worker_epsilon_greedy.py

1

Distributed Prioritized Experience Replay

Dan Horgan, John Quan, David Budden, Gabriel Barth-Maron, Matteo Hessel, Hado van Hasselt, David Sil...

2018

2 references

We propose a distributed architecture for deep reinforcement learning at scale, that enables agents to learn effectively from orders of magnitude more data than previously possible. The algorithm deco...

▶ rllib/utils/exploration/random_encoder.py

1

State Entropy Maximization with Random Encoders for Efficient Exploration

Younggyo Seo, Lili Chen, Jinwoo Shin, Honglak Lee, Pieter Abbeel, Kimin Lee

2021

1 reference

Recent exploration methods have proven to be a recipe for improving sample-efficiency in deep reinforcement learning (RL). However, efficient exploration in high-dimensional observation spaces still r...

▶ rllib/utils/replay_buffers/multi_agent_prioritized_episode_buffer.py

1

Prioritized Experience Replay

Tom Schaul, John Quan, Ioannis Antonoglou, David Silver

2015

5 references

Experience replay lets online reinforcement learning agents remember and reuse experiences from the past. In prior work, experience transitions were uniformly sampled from a replay memory. However, th...

▶ rllib/utils/replay_buffers/prioritized_episode_buffer.py

1

Prioritized Experience Replay

Tom Schaul, John Quan, Ioannis Antonoglou, David Silver

2015

5 references

Experience replay lets online reinforcement learning agents remember and reuse experiences from the past. In prior work, experience transitions were uniformly sampled from a replay memory. However, th...

▶ rllib/utils/replay_buffers/prioritized_replay_buffer.py

1

Prioritized Experience Replay

Tom Schaul, John Quan, Ioannis Antonoglou, David Silver

2015

5 references

Experience replay lets online reinforcement learning agents remember and reuse experiences from the past. In prior work, experience transitions were uniformly sampled from a replay memory. However, th...

▶ rllib/utils/tf_utils.py

3

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

▶ rllib/utils/torch_utils.py

3

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap

2023

41 references

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor...

▶ src/ray/thirdparty/dlmalloc.c

1

Dynamic storage allocation: A survey and critical review

Paul R. Wilson, Mark S. Johnstone, Michael J. Neely, David B. Boles

1995

3 references

Paper References by File