Machine Learning

A Distributional Perspective on Reinforcement Learning

Marc G. Bellemare, Will Dabney, Rémi Munos

2017

1 reference

In this paper we argue for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent. This is in contrast to the common approach to reinforcement learning which models the e...

Are Transformers Effective for Time Series Forecasting?

Ailing Zeng, Muxi Chen, Lei Zhang, Qiang Xu

2022

1 reference

Recently, there has been a surge of Transformer-based solutions for the long-term time series forecasting (LTSF) task. Despite the growing performance over the past few years, we question the validity of this line of research in this work. Specifical...

Deep Reinforcement Learning with Double Q-learning

Hado van Hasselt, Arthur Guez, David Silver

2015

1 reference

The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can generally be prev...

Distributed Prioritized Experience Replay

Dan Horgan, John Quan, David Budden, Gabriel Barth-Maron, Matteo Hessel, Hado van Hasselt, David Sil...

2018

2 references

We propose a distributed architecture for deep reinforcement learning at scale, that enables agents to learn effectively from orders of magnitude more data than previously possible. The algorithm decouples acting from learning: the actors interact wi...

Dueling Network Architectures for Deep Reinforcement Learning

Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas

2015

1 reference

In recent years there have been many successes of using deep representations in reinforcement learning. Still, many of these applications use conventional architectures, such as convolutional networks, LSTMs, or auto-encoders. In this paper, we prese...

Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning

Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue

2019

5 references

We offer an experimental benchmark and empirical study for off-policy policy evaluation (OPE) in reinforcement learning, which is a key problem in many safety critical applications. Given the increasing interest in deploying learning-based methods, t...

Exoshuffle: An Extensible Shuffle Architecture

Frank Sifei Luan, Stephanie Wang, Samyukta Yagati, Sean Kim, Kenneth Lien, Isaac Ong, Tony Hong, San...

2022

4 references

Shuffle is one of the most expensive communication primitives in distributed data processing and is difficult to scale. Prior work addresses the scalability challenges of shuffle by building monolithic shuffle systems. These systems are costly to dev...

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymir Mnih, Tom Ward, Yotam Doron, Vla...

2018

6 references

In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distri...

LoRA: Low-Rank Adaptation of Large Language Models

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu...

2021

2 references

An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, beco...

Parameter Space Noise for Exploration

Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Y. Chen, Xi Chen, Tamim ...

2017

1 reference

Deep reinforcement learning (RL) methods generally engage in exploratory behavior through noise injection in the action space. An alternative is to add noise directly to the agent's parameters, which can lead to more consistent exploration and a rich...

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov

2017

4 references

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas s...

Ray: A Distributed Framework for Emerging AI Applications

Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih El...

2017

2 references

The next generation of AI applications will continuously interact with the environment and learn from these interactions. These applications impose new and demanding systems requirements, both in terms of performance and flexibility. In this paper, w...

Real-Time Machine Learning: The Missing Pieces

Robert Nishihara, Philipp Moritz, Stephanie Wang, Alexey Tumanov, William Paul, Johann Schleier-Smit...

2017

2 references

Machine learning applications are increasingly deployed not only to serve predictions using static models, but also as tightly-integrated components of feedback loops involving dynamic, real-time decision making. These applications pose a new set of ...

RLlib: Abstractions for Distributed Reinforcement Learning

Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzale...

2017

3 references

Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation. We argue for distributing RL components in a composable way by adapt...

RLlib Flow: Distributed Reinforcement Learning is a Dataflow Problem

Eric Liang, Zhanghao Wu, Michael Luo, Sven Mika, Joseph E. Gonzalez, Ion Stoica

2020

1 reference

Researchers and practitioners in the field of reinforcement learning (RL) frequently leverage parallel computation, which has led to a plethora of new algorithms and systems in the last few years. In this paper, we re-examine the challenges posed by ...

Soft Actor-Critic for Discrete Action Settings

Petros Christodoulou

2019

1 reference

Soft Actor-Critic is a state-of-the-art reinforcement learning algorithm for continuous action settings that is not applicable to discrete action settings. Many important settings involve discrete actions, however, and so here we derive an alternativ...

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine

2018

7 references

Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and brittle conv...

State Entropy Maximization with Random Encoders for Efficient Exploration

Younggyo Seo, Lili Chen, Jinwoo Shin, Honglak Lee, Pieter Abbeel, Kimin Lee

2021

1 reference

Recent exploration methods have proven to be a recipe for improving sample-efficiency in deep reinforcement learning (RL). However, efficient exploration in high-dimensional observation spaces still remains a challenge. This paper presents Random Enc...

Trust Region Policy Optimization

John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel

2015

1 reference

We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified procedure, we develop a practical algorithm, called Trust Region Policy Optimization (T...

Tune: A Research Platform for Distributed Model Selection and Training

Richard Liaw, Eric Liang, Robert Nishihara, Philipp Moritz, Joseph E. Gonzalez, Ion Stoica

2018

6 references

Modern machine learning algorithms are increasingly computationally demanding, requiring specialized hardware and distributed computation to achieve high performance in a reasonable time frame. Many hyperparameter search algorithms have been proposed...

Repositories

huggingface/transformers

microsoft/onnxruntime

mlflow/mlflow

pytorch/pytorch

ray-project/ray

scikit-learn/scikit-learn

tensorflow/tensorflow
