The Markov decision process (MDP) is a well-known framework for devising optimal decision-making strategies under uncertainty. Typically, the decision maker assumes a stationary environment characterized by a time-invariant transition probability matrix. However, in many real-world scenarios this assumption is not justified, so the optimal strategy might not deliver the expected performance. In this paper, we study the performance of the classic value iteration algorithm for solving an MDP problem in nonstationary environments. Specifically, the nonstationary environment is modeled as a sequence of time-variant transition probability matrices governed by an adiabatic evolution inspired by quantum mechanics. We characterize the performance of the value iteration algorithm with respect to the rate of change of the underlying environment. The performance is measured in terms of the convergence rate to the optimal average reward. We present two examples of queuing systems that make use of our analysis framework.
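As a rough illustration of the setting described in the abstract, the sketch below runs relative value iteration on a hypothetical two-state, two-action MDP whose transition matrices drift slowly from one matrix to another. Relative value iteration is one common average-reward variant; the abstract does not say which variant the paper analyzes, so this choice is an assumption. The matrices `P_START` and `P_END`, the rewards, the linear drift schedule, and all function names are likewise illustrative assumptions and are not taken from the paper.

```python
# Hypothetical 2-state, 2-action MDP. P[a][s][j] is the probability of moving
# from state s to state j under action a. All numbers are made up.
P_START = [[[0.9, 0.1], [0.2, 0.8]],   # action 0
           [[0.5, 0.5], [0.6, 0.4]]]   # action 1
P_END = [[[0.6, 0.4], [0.5, 0.5]],     # action 0
         [[0.3, 0.7], [0.8, 0.2]]]     # action 1
REWARD = [[1.0, 0.0], [2.0, 0.5]]      # REWARD[s][a]


def blend(p_start, p_end, w):
    """Entrywise convex combination (1 - w) * p_start + w * p_end."""
    return [[[(1 - w) * x + w * y for x, y in zip(row_s, row_e)]
             for row_s, row_e in zip(act_s, act_e)]
            for act_s, act_e in zip(p_start, p_end)]


def bellman_backup(p, reward, h):
    """One undiscounted Bellman backup: max over actions of r + P h."""
    n_states = len(h)
    n_actions = len(p)
    return [max(reward[s][a] + sum(p[a][s][j] * h[j] for j in range(n_states))
                for a in range(n_actions))
            for s in range(n_states)]


def relative_value_iteration(p_start, p_end, reward, rate, n_steps):
    """Relative value iteration while the environment drifts.

    At step t the transition matrices are blend(p_start, p_end, min(1, rate*t)),
    an assumed linear drift; rate = 0 gives a stationary environment.
    Returns the per-step average-reward (gain) estimates and the final
    relative values, normalized so that state 0 has value 0.
    """
    h = [0.0] * len(reward)
    gains = []
    for t in range(n_steps):
        p_t = blend(p_start, p_end, min(1.0, rate * t))
        backed_up = bellman_backup(p_t, reward, h)
        gain = backed_up[0]                 # state 0 is the reference state
        h = [v - gain for v in backed_up]   # keep the iterates bounded
        gains.append(gain)
    return gains, h


# Slow drift from P_START to P_END, then a long stationary tail.
gains, h = relative_value_iteration(P_START, P_END, REWARD, 0.01, 2000)
print("final gain estimate:", gains[-1])
```

In this sketch a smaller `rate` corresponds to a more slowly changing environment; comparing the sequence `gains` for different rates gives an informal picture of how the drift affects how closely the iterates track the optimal average reward of the current matrices.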
June 2016
Research-Article
Adiabatic Markov Decision Process: Convergence of Value Iteration Algorithm
Thai Duong
School of Electrical Engineering and Computer Science,
Oregon State University,
Corvallis, OR 97331
e-mail: duong@eecs.oregonstate.edu
Duong Nguyen-Huu
School of Electrical Engineering and Computer Science,
Oregon State University,
Corvallis, OR 97331
e-mail: nguyendu@eecs.oregonstate.edu
Thinh Nguyen
School of Electrical Engineering and Computer Science,
Oregon State University,
Corvallis, OR 97331
e-mail: thinhq@eecs.oregonstate.edu
Contributed by the Dynamic Systems Division of ASME for publication in the JOURNAL OF DYNAMIC SYSTEMS, MEASUREMENT, AND CONTROL. Manuscript received November 7, 2014; final manuscript received February 22, 2016; published online April 6, 2016. Assoc. Editor: Srinivasa M. Salapaka.
J. Dyn. Sys., Meas., Control. Jun 2016, 138(6): 061009 (12 pages)
Published Online: April 6, 2016
Article history
Received: November 7, 2014
Revised: February 22, 2016
Citation
Duong, T., Nguyen-Huu, D., and Nguyen, T. (April 6, 2016). "Adiabatic Markov Decision Process: Convergence of Value Iteration Algorithm." ASME. J. Dyn. Sys., Meas., Control. June 2016; 138(6): 061009. https://doi.org/10.1115/1.4032875