Avoiding Overfitting in Variable-Order Markov Models: a Cross-Validation Approach

Author

Listed:

Valeria Secchini
Javier Garcia-Bernardo
Petr Janský

Abstract

Higher-order Markov chain models are widely used to represent agent transitions in dynamic systems, such as passengers in transport networks. They capture transitions in complex systems by considering not only the current state but also the path of previously visited states. For example, the likelihood of train passengers traveling from Paris (current state) to Rome could increase significantly if their journey originated in Italy (prior state). Although this approach provides a more faithful representation of the system than first-order models, we find that commonly used methods, relying on Kullback-Leibler divergence, frequently overfit the data, mistaking fluctuations for higher-order dependencies and undermining forecasts and resource allocation. Here, we introduce DIVOP (Detection of Informative Variable-Order Paths), an algorithm that employs cross-validation to robustly distinguish meaningful higher-order dependencies from noise. In both synthetic and real-world datasets, DIVOP outperforms two state-of-the-art algorithms by achieving higher precision, recall, and sparser representations of the underlying dynamics. When applied to global corporate ownership data, DIVOP reveals that tax havens appear in 82% of all significant higher-order dependencies, underscoring their outsized influence in corporate networks. By mitigating overfitting, DIVOP enables more reliable multi-step predictions and decision-making, paving the way toward deeper insights into the hidden structures that drive modern interconnected systems.
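The general idea of using held-out data to decide whether a longer context is informative can be illustrated with a minimal Python sketch. This is not the DIVOP algorithm, whose details the abstract does not give. It is a simplified stand-in that compares the cross-validated log-likelihood of a first-order and a second-order model on the same transitions. The function names (`transition_counts`, `cv_log_likelihood`, `cv_gain`), the additive smoothing parameter `alpha`, the fold assignment by path index, and the toy Paris/Rome paths are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def transition_counts(paths, order):
    """Count next-state occurrences for every context of length `order`."""
    counts = defaultdict(Counter)
    for path in paths:
        for i in range(order, len(path)):
            counts[tuple(path[i - order:i])][path[i]] += 1
    return counts


def cv_log_likelihood(paths, order, k=5, start=2, alpha=1.0):
    """Sum of held-out log-likelihoods over k folds.

    Only positions i >= start are scored, so models of different
    order are compared on exactly the same transitions.
    Additive smoothing (alpha) is an assumption of this sketch.
    """
    states = sorted({s for p in paths for s in p})
    total = 0.0
    for fold in range(k):
        # Fold membership is decided by path index (hypothetical choice).
        train = [p for j, p in enumerate(paths) if j % k != fold]
        test = [p for j, p in enumerate(paths) if j % k == fold]
        counts = transition_counts(train, order)
        for path in test:
            for i in range(start, len(path)):
                ctx = counts.get(tuple(path[i - order:i]), Counter())
                num = ctx[path[i]] + alpha
                den = sum(ctx.values()) + alpha * len(states)
                total += math.log(num / den)
    return total


def cv_gain(paths, k=5):
    """Held-out gain of the second-order model over the first-order one."""
    return cv_log_likelihood(paths, 2, k) - cv_log_likelihood(paths, 1, k)


# Toy data: the destination depends on where the journey started.
dependent = [("Italy", "Paris", "Rome")] * 20 + [("Spain", "Paris", "Madrid")] * 20

# Toy data: the destination is unrelated to the origin.
independent = [
    ("Italy", "Paris", "Rome"),
    ("Italy", "Paris", "Madrid"),
    ("Spain", "Paris", "Rome"),
    ("Spain", "Paris", "Madrid"),
] * 10

gain_dependent = cv_gain(dependent)      # positive: the longer context helps
gain_independent = cv_gain(independent)  # negative: the longer context only adds noise
```

In this sketch a longer context is kept only when it improves predictions on data it was not fitted to, which is the intuition behind preferring cross-validation to an in-sample divergence threshold.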

Suggested Citation

Valeria Secchini & Javier Garcia-Bernardo & Petr Janský, 2025. "Avoiding Overfitting in Variable-Order Markov Models: a Cross-Validation Approach," Papers 2501.14476, arXiv.org.

Handle: RePEc:arx:papers:2501.14476
