Result details

Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations

ABATE, A.; ČEŠKA, M.; KWIATKOWSKA, M. Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations. In Proceedings of 14th International Symposium on Automated Technology for Verification and Analysis. Lecture Notes in Computer Science. Heidelberg: Springer Verlag, 2016. p. 13-31. ISBN: 978-3-319-46519-7.

Type

conference proceedings paper

Language

English

Authors

Abate Alessandro
Češka Milan, doc. RNDr., Ph.D., UITS (FIT)
Kwiatkowska Marta

Abstract

We consider the problem of finding an optimal policy in a Markov decision process that maximises the expected discounted sum of rewards over an infinite time horizon. Since the explicit iterative dynamical programming scheme does not scale when increasing the dimension of the state space, a number of approximate methods have been developed. These are typically based on value or policy iteration, enabling further speedups through lumped and distributed updates, or by employing succinct representations of the value functions. However, none of the existing approximate techniques provides general, explicit and tunable bounds on the approximation error, a problem particularly relevant when the level of accuracy affects the optimality of the policy. In this paper we propose a new approximate policy iteration scheme that mitigates the state-space explosion problem by adaptive state-space aggregation, at the same time providing rigorous and explicit error bounds that can be used to control the optimality level of the obtained policy. We evaluate the new approach on a case study, demonstrating evidence that the state-space reduction results in considerable acceleration of the policy iteration scheme, while being able to meet the required level of precision.

Keywords

Markov Decision Process, Policy Iteration, Approximation, Adaptive Aggregation

URL

http://link.springer.com/chapter/10.1007%2F978-3-319-46520-3_2

Year

2016

Pages

13–31

Proceedings

Proceedings of 14th International Symposium on Automated Technology for Verification and Analysis

Series

Lecture Notes in Computer Science

Volume

9938

Conference

14th International Symposium on Automated Technology for Verification and Analysis

ISBN

978-3-319-46519-7

Publisher

Springer Verlag

Place

Heidelberg

DOI

10.1007/978-3-319-46520-3_2

UT WoS

000389808100002

EID Scopus

2-s2.0-84992488738

BibTeX

@inproceedings{BUT130999,
  author="Alessandro {Abate} and Milan {Češka} and Marta {Kwiatkowska}",
  title="Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations",
  booktitle="Proceedings of 14th International Symposium on Automated Technology for Verification and Analysis",
  year="2016",
  series="Lecture Notes in Computer Science",
  volume="9938",
  pages="13--31",
  publisher="Springer Verlag",
  address="Heidelberg",
  doi="10.1007/978-3-319-46520-3_2",
  isbn="978-3-319-46519-7",
  url="http://link.springer.com/chapter/10.1007%2F978-3-319-46520-3_2"
}

Projects

Přibližná ekvivalence pro aproximativní počítání, GAČR, Standard projects, GA16-17538S, start: 2016-01-01, end: 2018-12-31, completed

Research groups

Automated Analysis and Verification Research Group - VeriFIT (VZ VERIFIT)

Department

Department of Intelligent Systems (UITS)