Posted 22-12-2017 00:20
by Michail Lagoudakis
Author's email: lagoudakis<at>tuc.gr
Updated:
-
Position: Faculty, School of ECE.
Tomorrow, Friday, December 22, 2017, at 5 pm, in the Intelligence Laboratory room (141.Α14-2, 1st floor, Sciences Building), Ioannis Rexakis, PhD candidate of the School of ECE, will present the topic of his doctoral dissertation to his three-member committee (Assoc. Prof. M. Lagoudakis, supervisor; Prof. E. Petrakis; Prof. A. Potamianos, via teleconference from the USA). Anyone interested in attending is welcome. Details (title, abstract) follow. On this occasion, warm wishes for a Merry Christmas and a Happy New Year!
Directed Exploration of Policy Space in Reinforcement Learning
Ioannis Rexakis, Ph.D. Candidate, School of ECE, TUC
Abstract
Reinforcement learning refers to a broad class of learning problems in which autonomous agents try to learn how to achieve a goal solely by interacting with their environment, performing a trial-and-error search and receiving delayed rewards (or penalties). The challenge is to learn a good, or even optimal, policy: one that maximizes the total long-term reward.
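For concreteness (my addition, using the standard discounted formulation, which the abstract does not spell out), the quantity being maximized is typically the expected discounted return:

```latex
% Expected discounted return of a policy \pi (standard formulation;
% the discount factor \gamma \in [0,1) weights delayed rewards r_t).
J(\pi) = \mathbb{E}\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, r_{t} \;\middle|\; \pi \,\right]
```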
A decision policy for an autonomous agent encodes what to do in any possible state in order to achieve the long-term goal efficiently. Several recent learning approaches to decision making under uncertainty suggest the use of classifiers for the compact (approximate) representation of policies. However, the space of possible policies, even under such structured representations, is huge and must be searched carefully to avoid computationally expensive policy simulations.
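As a minimal illustration of this idea (a sketch of my own, not code from the dissertation; the example states, action labels, and the RBF-kernel SVM are placeholder choices), a policy over a continuous state space can be stored as a multi-class classifier that maps states to discrete actions:

```python
# Sketch: a policy represented compactly as a multi-class classifier.
import numpy as np
from sklearn.svm import SVC

# Hypothetical training set: states labeled with their dominant action,
# e.g. (angle, angular velocity) pairs for an inverted pendulum,
# with actions {0: left, 1: none, 2: right}.
states = np.array([[-0.3, 1.2], [0.1, -0.4], [0.5, 0.9], [0.6, -1.1]])
actions = np.array([2, 1, 0, 0])

policy = SVC(kernel="rbf")   # compact (approximate) policy representation
policy.fit(states, actions)

def act(state):
    """Return the action the learned policy selects in the given state."""
    return int(policy.predict(np.asarray(state).reshape(1, -1))[0])
```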
In this dissertation, our first contribution uncovers the structure present in optimal policies by deriving optimal policies for two standard two-dimensional reinforcement learning domains, namely the Inverted Pendulum and the Mountain Car. We found that optimal policies exhibit significant structure and a high degree of locality, i.e., dominant actions persist over large contiguous areas of the state space. This observation justifies the appropriateness of classifiers for approximate policy representation.
Our main contribution is a pair of Directed Policy Search algorithms for the efficient exploration of the policy space induced by Support Vector Machine and Relevance Vector Machine policy representations. The first exploits the structure of the classifiers used for policy representation; the second uses an importance function to rank states by action prevalence. In both approaches, the search focuses on areas where action domination changes. This directed focus on critical parts of the state space iteratively refines and improves the underlying policy and delivers excellent control policies in only a few iterations with a relatively small rollout budget, yielding significant savings in computation time.
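A rough sketch of how such a loop might look (my reading of the abstract, not the dissertation's algorithm; `rollout_best_action` and `sample_near_boundary` are hypothetical stand-ins for the rollout-labeling and directed-sampling steps, and the SVM here stands for either classifier type):

```python
# Sketch: directed policy search with a classifier-based policy.
import numpy as np
from sklearn.svm import SVC

def directed_policy_search(initial_states, rollout_best_action,
                           sample_near_boundary, iterations=10):
    """Iteratively refine a classifier policy, spending the rollout
    budget near the current decision boundary."""
    states = list(initial_states)
    policy = None
    for _ in range(iterations):
        # Label every candidate state with its rollout-dominant action.
        labels = [rollout_best_action(s, policy) for s in states]
        policy = SVC(kernel="rbf").fit(np.array(states), np.array(labels))
        # Direct new samples to critical regions where action domination
        # changes, instead of covering the state space uniformly.
        states += list(sample_near_boundary(policy))
    return policy
```

Concentrating new samples near the decision boundary is what would keep the rollout budget small: states deep inside a region where one action dominates add little information to the classifier.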
We demonstrate the proposed algorithms and compare them to prior work on three standard reinforcement learning domains: the Inverted Pendulum (two-dimensional), the Mountain Car (two-dimensional), and the Acrobot (four-dimensional).