IDEAS home Printed from https://ideas.repec.org/a/eee/ejores/v211y2011i2p343-351.html
   My bibliography  Save this article

Approximate dynamic programming via direct search in the space of value function approximations

Author

Listed:
  • Arruda, E.F.
  • Fragoso, M.D.
  • do Val, J.B.R.

Abstract

This paper deals with approximate value iteration (AVI) algorithms applied to discounted dynamic programming (DP) problems. For a fixed control policy, the span semi-norm of the so-called Bellman residual is shown to be convex in the Banach space of candidate solutions to the DP problem. This fact motivates the introduction of an AVI algorithm with local search that seeks to minimize the span semi-norm of the Bellman residual in a convex value function approximation space. The novelty here is that the optimality of a point in the approximation architecture is characterized by means of convex optimization concepts and necessary and sufficient conditions to local optimality are derived. The procedure employs the classical AVI algorithm direction (Bellman residual) combined with a set of independent search directions, to improve the convergence rate. It has guaranteed convergence and satisfies, at least, the necessary optimality conditions over a prescribed set of directions. To illustrate the method, examples are presented that deal with a class of problems from the literature and a large state space queueing problem setting.

Suggested Citation

  • Arruda, E.F. & Fragoso, M.D. & do Val, J.B.R., 2011. "Approximate dynamic programming via direct search in the space of value function approximations," European Journal of Operational Research, Elsevier, vol. 211(2), pages 343-351, June.
  • Handle: RePEc:eee:ejores:v:211:y:2011:i:2:p:343-351
    as

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0377-2217(10)00791-5
    Download Restriction: Full text for ScienceDirect subscribers only
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    as
    1. Sox, Charles R. & Jackson, Peter L. & Bowman, Alan & Muckstadt, John A., 1999. "A review of the stochastic lot scheduling problem," International Journal of Production Economics, Elsevier, vol. 62(3), pages 181-200, September.
    2. Ishai Menache & Shie Mannor & Nahum Shimkin, 2005. "Basis Function Adaptation in Temporal Difference Reinforcement Learning," Annals of Operations Research, Springer, vol. 134(1), pages 215-238, February.
    3. Benjamin Van Roy, 2006. "Performance Loss Bounds for Approximate Value Iteration with State Aggregation," Mathematics of Operations Research, INFORMS, vol. 31(2), pages 234-244, May.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Arruda, E.F. & Fragoso, M.D., 2015. "Solving average cost Markov decision processes by means of a two-phase time aggregation algorithm," European Journal of Operational Research, Elsevier, vol. 240(3), pages 697-705.
    2. Arruda, Edilson F. & Ourique, Fabrício O. & LaCombe, Jason & Almudevar, Anthony, 2013. "Accelerating the convergence of value iteration by using partial transition functions," European Journal of Operational Research, Elsevier, vol. 229(1), pages 190-198.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kerkkanen, Annastiina, 2007. "Determining semi-finished products to be stocked when changing the MTS-MTO policy: Case of a steel mill," International Journal of Production Economics, Elsevier, vol. 108(1-2), pages 111-118, July.
    2. Beemsterboer, Bart & Land, Martin & Teunter, Ruud, 2016. "Hybrid MTO-MTS production planning: An explorative study," European Journal of Operational Research, Elsevier, vol. 248(2), pages 453-461.
    3. Manuel Castejón-Limas & Joaquín Ordieres-Meré & Ana González-Marcos & Víctor González-Castro, 2011. "Effort estimates through project complexity," Annals of Operations Research, Springer, vol. 186(1), pages 395-406, June.
    4. Soman, Chetan Anil & Pieter van Donk, Dirk & Gaalman, Gerard, 2006. "Comparison of dynamic scheduling policies for hybrid make-to-order and make-to-stock production systems with stochastic demand," International Journal of Production Economics, Elsevier, vol. 104(2), pages 441-453, December.
    5. Brander, Par & Forsberg, Rolf, 2006. "Determination of safety stocks for cyclic schedules with stochastic demands," International Journal of Production Economics, Elsevier, vol. 104(2), pages 271-295, December.
    6. Lopez de Haro, Santiago & Gershwin, Stanley B. & Rosenfield, Donald B., 2009. "Schedule evaluation in unstable manufacturing environments," International Journal of Production Economics, Elsevier, vol. 121(1), pages 183-194, September.
    7. Vaughan, Timothy S., 2007. "Cyclical schedules vs. dynamic sequencing: Replenishment dynamics and inventory efficiency," International Journal of Production Economics, Elsevier, vol. 107(2), pages 518-527, June.
    8. George Liberopoulos & Dimitrios Pandelis & Olympia Hatzikonstantinou, 2013. "The stochastic economic lot sizing problem for non-stop multi-grade production with sequence-restricted setup changeovers," Annals of Operations Research, Springer, vol. 209(1), pages 179-205, October.
    9. Ashayeri, J. & Heuts, R.J.M. & Lansdaal, H.G.L. & Strijbosch, L.W.G., 2006. "Cyclic production-inventory planning and control in the pre-Deco industry: A case study," International Journal of Production Economics, Elsevier, vol. 103(2), pages 715-725, October.
    10. Vidal-Carreras, Pilar I. & Garcia-Sabater, Jose P. & Coronado-Hernandez, Jairo R., 2012. "Economic lot scheduling with deliberated and controlled coproduction," European Journal of Operational Research, Elsevier, vol. 219(2), pages 396-404.
    11. Fujimoto, Takahiro & Park, Young Won, 2014. "Balancing supply chain competitiveness and robustness through “virtual dual sourcing”: Lessons from the Great East Japan Earthquake," International Journal of Production Economics, Elsevier, vol. 147(PB), pages 429-436.
    12. Leven, Erik & Segerstedt, Anders, 2007. "A scheduling policy for adjusting economic lot quantities to a feasible solution," European Journal of Operational Research, Elsevier, vol. 179(2), pages 414-423, June.
    13. Smits, Sanne R. & Wagner, Michael & G. de Kok, Ton, 2004. "Determination of an order-up-to policy in the stochastic economic lot scheduling model," International Journal of Production Economics, Elsevier, vol. 90(3), pages 377-389, August.
    14. Zheng Wen & Benjamin Van Roy, 2017. "Efficient Reinforcement Learning in Deterministic Systems with Value Function Generalization," Mathematics of Operations Research, INFORMS, vol. 42(3), pages 762-782, August.
    15. Curcio, Eduardo & Amorim, Pedro & Zhang, Qi & Almada-Lobo, Bernardo, 2018. "Adaptation and approximate strategies for solving the lot-sizing and scheduling problem under multistage demand uncertainty," International Journal of Production Economics, Elsevier, vol. 202(C), pages 81-96.
    16. Tempelmeier, Horst, 2011. "A column generation heuristic for dynamic capacitated lot sizing with random demand under a fill rate constraint," Omega, Elsevier, vol. 39(6), pages 627-633, December.
    17. Chevalier, Philippe & Lamas, Alejandro & Lu, Liang & Mlinar, Tanja, 2015. "Revenue management for operations with urgent orders," European Journal of Operational Research, Elsevier, vol. 240(2), pages 476-487.
    18. Jodlbauer, Herbert & Reitner, Sonja, 2012. "Optimizing service-level and relevant cost for a stochastic multi-item cyclic production system," International Journal of Production Economics, Elsevier, vol. 136(2), pages 306-317.
    19. Donald D. Eisenstein, 2005. "Recovering Cyclic Schedules Using Dynamic Produce-Up-To Policies," Operations Research, INFORMS, vol. 53(4), pages 675-688, August.
    20. Kamath B, Narasimha & Bhattacharya, Subir, 2007. "Lead time minimization of a multi-product, single-processor system: A comparison of cyclic policies," International Journal of Production Economics, Elsevier, vol. 106(1), pages 28-40, March.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:ejores:v:211:y:2011:i:2:p:343-351. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu (email available below). General contact details of provider: http://www.elsevier.com/locate/eor .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.