Heuristics based on projection occupation measures for probabilistic planning with dead-ends and risk

Fernández, Milton Raúl Condori

doi:10.11606/D.45.2020.tde-30062021-195910

Master's Dissertation

DOI

https://doi.org/10.11606/D.45.2020.tde-30062021-195910

Document

Master's Dissertation

Author

Fernández, Milton Raúl Condori

Full name

Milton Raúl Condori Fernández

Institute/School/College

Instituto de Matemática e Estatística

Knowledge Area

Computer Science

Date of Defense

2020-02-18

Published

São Paulo, 2020

Supervisor

Barros, Leliane Nunes de

Committee

Barros, Leliane Nunes de (President)
Silva, Valdinei Freire da
Trevizan, Felipe Werndl

Title in English

Heuristics based on projection occupation measures for probabilistic planning with dead-ends and risk

Keywords in English

Planning as heuristic search
Probabilistic planning
Risk sensitive SSP
SSP as dual linear program

Abstract in English

In probabilistic planning, an agent interacts with an environment, and the objective is to find an optimal policy (a mapping from states to actions) that allows the agent to reach a goal state from an initial state while minimizing the expected accumulated cost. Efficient solutions for large instances of probabilistic planning are, in general, based on Stochastic Shortest Path (SSP) problems and use heuristic search techniques. However, these approaches have two limitations: (1) they cannot guarantee optimal policies in the presence of dead-ends (states from which it is not possible to reach the goal), and (2) they may exhibit high variance in cost. In instances where unavoidable dead-ends exist, we can plan in two phases, first maximizing the probability of reaching the goal (maxprob) and then minimizing the expected cost (mincost); alternatively, we can define a penalty for reaching a dead-end state and only minimize the expected cost (mincost-with-penalty). While there exist several heuristics to solve the mincost problem, there are no efficient heuristics to solve the maxprob problem. A recent work proposed the first heuristic that takes probabilities into account, called h_pom, which solves a relaxed version of an SSP as a linear program in the dual space. In this work we propose two new heuristics based on h_pom to solve probabilistic planning problems with unavoidable dead-ends; both include new variables and constraints for dead-end states. The first, h_p_pom(s), estimates the maximum probability of reaching the goal from s, and is used to efficiently solve maxprob problems by ignoring action costs and considering only the probabilities. The second, h_pe_pom(s), is used to solve mincost-with-penalty problems: it estimates the minimal cost to reach the goal from state s and adds an expected penalty for reaching dead-ends.
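To make the difference between the maxprob and mincost-with-penalty criteria concrete, the sketch below runs plain value iteration on a one-state toy SSP with an unavoidable dead-end. It is only an illustration of the two criteria, not the heuristics proposed in the dissertation: the state and action names, costs, probabilities, and penalty values are all hypothetical, and no linear program or occupation measure is involved.

```python
# Toy SSP with an unavoidable dead-end (hypothetical numbers, not from the dissertation).
# Layout: state -> action -> (cost, {successor: probability}).
SSP = {
    "s0": {
        "risky": (1.0, {"goal": 0.5, "dead": 0.5}),
        "careful": (2.0, {"goal": 0.6, "s0": 0.3, "dead": 0.1}),
    },
}


def value_iteration(terminal_values, backup, choose, sweeps=200):
    """Generic Bellman sweeps; `choose` is max (maxprob) or min (mincost)."""
    values = {state: 0.0 for state in SSP}
    values.update(terminal_values)  # fixed values for goal and dead-end
    policy = {}
    for _ in range(sweeps):
        for state, actions in SSP.items():
            q = {}
            for action, (cost, successors) in actions.items():
                expected = sum(p * values[t] for t, p in successors.items())
                q[action] = backup(cost, expected)
            policy[state] = choose(q, key=q.get)
            values[state] = q[policy[state]]
    return values, policy


def maxprob():
    """Maximize goal-reaching probability; action costs are ignored."""
    return value_iteration({"goal": 1.0, "dead": 0.0},
                           lambda cost, expected: expected, max)


def mincost_with_penalty(penalty):
    """Minimize expected cost, charging `penalty` for entering the dead-end."""
    return value_iteration({"goal": 0.0, "dead": penalty},
                           lambda cost, expected: cost + expected, min)


if __name__ == "__main__":
    print("maxprob:", maxprob())
    print("small penalty:", mincost_with_penalty(2.0))
    print("large penalty:", mincost_with_penalty(100.0))
```

In this toy instance the maxprob criterion prefers the `careful` action regardless of cost, whereas the mincost-with-penalty choice depends on how large the assumed dead-end penalty is: a small penalty favors `risky`, a large one favors `careful`.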
To deal with the second limitation of traditional SSP solutions, we propose a third heuristic, called h_rs_pom, also based on h_pom, for a modified version of an SSP called risk-sensitive SSP (RS-SSP). Its optimization criterion is to minimize an exponential utility function that includes a risk factor characterizing the agent's attitude as: (i) risk-averse (risk factor > 0); (ii) risk-prone (risk factor < 0); or (iii) risk-neutral (risk factor of 0). Empirical results show that the proposed heuristics can solve larger planning instances than the state-of-the-art solutions for SSPs with dead-ends and for RS-SSP problems.
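The effect of the risk factor on an exponential utility criterion can be illustrated with a short sketch. It assumes the common certainty-equivalent form, log E[exp(risk_factor * cost)] / risk_factor, which may differ in detail from the formulation used in the dissertation; the parameter name `risk_factor` and the two cost distributions are hypothetical and chosen only so that both have the same expected cost.

```python
import math


def certainty_equivalent(outcomes, risk_factor):
    """Certainty-equivalent cost under an exponential utility (assumed form).

    outcomes: list of (probability, accumulated_cost) pairs.
    risk_factor > 0 penalizes variance (risk-averse), < 0 rewards it
    (risk-prone), and 0 falls back to the plain expected cost (risk-neutral).
    """
    if abs(risk_factor) < 1e-12:
        return sum(p * cost for p, cost in outcomes)
    expected_utility = sum(p * math.exp(risk_factor * cost) for p, cost in outcomes)
    return math.log(expected_utility) / risk_factor


# Two hypothetical policies with the same expected cost (10) but different variance.
steady = [(1.0, 10.0)]
gamble = [(0.5, 2.0), (0.5, 18.0)]

if __name__ == "__main__":
    for risk_factor in (0.5, 0.0, -0.5):
        print(risk_factor,
              certainty_equivalent(steady, risk_factor),
              certainty_equivalent(gamble, risk_factor))
```

Under these assumptions, a positive risk factor makes the high-variance `gamble` look more expensive than `steady`, a negative one makes it look cheaper, and a zero risk factor leaves the agent indifferent between them.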

Title in Portuguese

Heurísticas baseadas na projeção de medidas de ocupação para planejamento probabilístico

Keywords in Portuguese

Heurística
Planejamento probabilístico

Abstract in Portuguese

Not available

WARNING - Viewing this document is conditioned on your acceptance of the following terms of use:
This document is only for private use in research and teaching activities. Reproduction for commercial use is forbidden. These rights cover all data about this document as well as its contents. Any use or copy of this document, in whole or in part, must include the author's name.

Thesis__Milton.pdf (2.54 Mbytes)

Publishing Date

2021-09-03
