Start Date: 06/02/2019
Course Type: Common Course
Course Link: https://www.coursera.org/learn/practical-rl
Welcome to the Reinforcement Learning course. Here you will find out about:

- foundations of RL methods: value/policy iteration, q-learning, policy gradient, etc. --- with math & batteries included
- using deep neural networks for RL tasks --- also known as "the hype train"
- state of the art RL algorithms --- and how to apply duct tape to them for practical problems
- and, of course, teaching your neural network to play games --- because that's what everyone thinks RL is about. We'll also use it for seq2seq and contextual bandits.

Jump in. It's gonna be fun!
Article | Example |
---|---|
Reinforcement learning | Successes of reinforcement learning are listed here. |
Reinforcement learning | The basic reinforcement learning model consists of: a set of environment states; a set of agent actions; rules of transitioning between states; rules that determine the scalar immediate reward of a transition; and rules that describe what the agent observes. |
Reinforcement learning | Two components make reinforcement learning powerful: the use of samples to optimize performance and the use of function approximation to deal with large environments. |
Reinforcement learning | There is also a growing interest in real life applications of reinforcement learning. |
Reinforcement learning | A reinforcement learning agent interacts with its environment in discrete time steps. |
Reinforcement learning | Multiagent or Distributed Reinforcement Learning is also a topic of interest in current research. |
Reinforcement learning | In machine learning, the environment is typically formulated as a Markov decision process (MDP), as many reinforcement learning algorithms for this context utilize dynamic programming techniques. The main difference between the classical techniques and reinforcement learning algorithms is that the latter do not need knowledge about the MDP and they target large MDPs where exact methods become infeasible. |
Reinforcement learning | Thanks to these two key components, reinforcement learning can be used in large environments in any of the following situations: a model of the environment is known, but an analytic solution is not available; only a simulation model of the environment is given (the subject of simulation-based optimization); or the only way to collect information about the environment is to interact with it. |
Reinforcement learning | There are multiple applications of reinforcement learning to generate models and train them to play video games, such as Atari games. In these models, reinforcement learning finds the actions with the best reward at each play. Combined with deep neural networks, this method is widely used to teach computers to play Atari video games. |
Reinforcement | A great many researchers subsequently expanded our understanding of reinforcement and challenged some of Skinner's conclusions. For example, Azrin and Holz defined punishment as a “consequence of behavior that reduces the future probability of that behavior,” and some studies have shown that positive reinforcement and punishment are equally effective in modifying behavior. Research on the effects of positive reinforcement, negative reinforcement and punishment continues today, as those concepts are fundamental to learning theory and apply to many practical applications of that theory. |
Reinforcement learning | Reinforcement learning algorithms such as TD learning are also being investigated as a model for Dopamine-based learning in the brain. In this model, the dopaminergic projections from the substantia nigra to the basal ganglia function as the prediction error. Reinforcement learning has also been used as a part of the model for human skill learning, especially in relation to the interaction between implicit and explicit learning in skill acquisition (the first publication on this application was in 1995-1996, and there have been many follow-up studies). |
Reinforcement learning | Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected. Further, there is a focus on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The exploration vs. exploitation trade-off in reinforcement learning has been most thoroughly studied through the multi-armed bandit problem and in finite MDPs. |
Reinforcement learning | Reinforcement learning is an area of machine learning inspired by behaviorist psychology, concerned with how software agents ought to take "actions" in an "environment" so as to maximize some notion of cumulative "reward". The problem, due to its generality, is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics, and genetic algorithms. In the operations research and control literature, the field where reinforcement learning methods are studied is called "approximate dynamic programming". The problem has been studied in the theory of optimal control, though most studies are concerned with the existence of optimal solutions and their characterization, and not with the learning or approximation aspects. In economics and game theory, reinforcement learning may be used to explain how equilibrium may arise under bounded rationality. |
Reinforcement learning | The first two of these problems could be considered planning problems (since some form of the model is available), while the last one could be considered a genuine learning problem. However, under a reinforcement learning methodology, both planning problems would be converted to machine learning problems. |
Reinforcement learning | In reinforcement learning methods, expectations are approximated by averaging over samples, and function approximation techniques are used to cope with the need to represent value functions over large state-action spaces. |
Reinforcement learning | Thus, reinforcement learning is particularly well-suited to problems which include a long-term versus short-term reward trade-off. It has been applied successfully to various problems, including robot control, elevator scheduling, telecommunications, backgammon, checkers and go (AlphaGo). |
Reinforcement learning | So far, the discussion was restricted to how policy iteration can be used as a basis for designing reinforcement learning algorithms. Equally importantly, value iteration can also be used as a starting point, giving rise to the Q-learning algorithm and its many variants. |
Reinforcement learning | The goal of a reinforcement learning agent is to collect as much reward as possible. The agent can choose any action as a function of the history and it can even randomize its action selection. |
Reinforcement learning | Most reinforcement learning papers are published at the major machine learning and AI conferences (ICML, NIPS, AAAI, IJCAI, UAI, AI and Statistics) and journals (JAIR, JMLR, Machine learning journal, IEEE T-CIAIG). Some theory papers are published at COLT and ALT. However, many papers appear in robotics conferences (IROS, ICRA) and the "agent" conference AAMAS. Operations researchers publish their papers at the INFORMS conference and, for example, in the Operations Research and Mathematics of Operations Research journals. Control researchers publish their papers at the CDC and ACC conferences, or, e.g., in the journals IEEE Transactions on Automatic Control, or Automatica, although applied works tend to be published in more specialized journals. The Winter Simulation Conference also publishes many relevant papers. Other than this, papers are also published in the major conferences of the neural networks, fuzzy, and evolutionary computation communities. The annual IEEE symposium titled Approximate Dynamic Programming and Reinforcement Learning (ADPRL) and the biannual European Workshop on Reinforcement Learning (EWRL) are two regularly held meetings where RL researchers meet. |
Reinforcement learning | In inverse reinforcement learning (IRL), no reward function is given. Instead, one tries to extract the reward function from an expert's observed behavior. The idea is to mimic the observed behavior, which is often optimal or close to optimal. |
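The table's row on discrete time steps can be made concrete with a minimal agent-environment loop. The `LineWorld` environment below is a made-up toy (not from the course or the article); it only illustrates the observe-act-reward cycle at each time step t.

```python
import random

# Toy environment: the agent moves left/right on a line of cells and is
# rewarded for reaching the rightmost cell. Invented for illustration.
class LineWorld:
    def __init__(self, size=5):
        self.size = size
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action: -1 (left) or +1 (right)
        self.state = max(0, min(self.size - 1, self.state + action))
        done = self.state == self.size - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done

env = LineWorld()
state, total_reward = env.reset(), 0.0
for t in range(100):                 # discrete time steps t = 0, 1, 2, ...
    action = random.choice([-1, 1])  # a random policy, for illustration
    state, reward, done = env.step(action)
    total_reward += reward
    if done:                         # episode ends at the terminal state
        break
```

A learning agent would replace `random.choice` with a policy that improves from the observed rewards.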
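When the MDP's transition model is known, the classical dynamic-programming techniques the table mentions can solve it exactly. A minimal value-iteration sketch on an invented five-state chain (all dynamics and constants here are assumptions for illustration, not part of any specific algorithm in the source):

```python
# Value iteration: repeatedly apply the Bellman optimality backup
#   V(s) <- max_a [ r(s, a) + gamma * V(s') ]
# until values stop changing. The chain MDP is a toy: states 0..4,
# actions 0=left / 1=right, reward 1 for entering the terminal state 4.
N_STATES, gamma, theta = 5, 0.9, 1e-8
TERMINAL = N_STATES - 1

def model(s, a):  # known deterministic model: returns (next_state, reward)
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == TERMINAL else 0.0)

V = [0.0] * N_STATES
while True:
    delta = 0.0
    for s in range(N_STATES):
        if s == TERMINAL:
            continue
        best = max(r + gamma * V[s2]
                   for s2, r in (model(s, a) for a in (0, 1)))
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < theta:  # converged
        break
```

Model-free methods like Q-learning replace the explicit `model` call with sampled transitions.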
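The exploration-versus-exploitation trade-off that the table ties to the multi-armed bandit problem is commonly sketched with epsilon-greedy action selection. The arm reward probabilities below are invented for illustration:

```python
import random

# Epsilon-greedy bandit: with probability eps pull a random arm (explore),
# otherwise pull the arm with the highest estimated value (exploit).
def run_bandit(true_probs, steps=10000, eps=0.1, seed=0):
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms    # pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    for _ in range(steps):
        if rng.random() < eps:                       # explore
            arm = rng.randrange(n_arms)
        else:                                        # exploit
            arm = max(range(n_arms), key=lambda a: values[a])
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        # incremental mean update: V <- V + (r - V) / n
        values[arm] += (reward - values[arm]) / counts[arm]
    return values, counts

values, counts = run_bandit([0.2, 0.5, 0.8])
```

With enough steps the best arm dominates the pull counts, while the eps fraction of random pulls keeps the estimates of the other arms from going stale.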
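Q-learning, named in the table as the family of algorithms arising from value iteration, can be sketched in tabular form. The chain environment and every hyperparameter below are made up for illustration; only the update rule itself is the standard one:

```python
import random

# Tabular Q-learning on a toy chain MDP (states 0..4, actions 0=left,
# 1=right; entering state 4 yields reward 1 and ends the episode).
# Update rule: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
N_STATES, ACTIONS = 5, (0, 1)
alpha, gamma, eps = 0.5, 0.9, 0.1
rng = random.Random(0)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):  # sampled transition; the agent never sees this model
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

for episode in range(500):
    s = 0
    for _ in range(10000):  # safety cap on episode length
        # epsilon-greedy behaviour policy (breaking ties randomly)
        if rng.random() < eps or Q[(s, 0)] == Q[(s, 1)]:
            a = rng.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        target = r + (0.0 if done else gamma * max(Q[(s2, x)] for x in ACTIONS))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
        if done:
            break
```

Note that no knowledge of the transition model is needed: the agent learns purely from sampled `(s, a, r, s')` tuples, which is exactly the contrast with classical dynamic programming drawn in the table.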