
A Novel Ensemble Method of Divide-and-Conquer Markov Boundary Discovery for Causal Feature Selection

Author

Listed:
  • Hao Li

    (Hunan Institute of Advanced Technology, Changsha 410073, China
    College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
    These authors contributed equally to this work.)

  • Jianjun Zhan

    (College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
    Cainiao Network, Hangzhou 311100, China
    These authors contributed equally to this work.)

  • Haosen Wang

    (College of Systems Engineering, National University of Defense Technology, Changsha 410073, China)

  • Zipeng Zhao

    (College of Systems Engineering, National University of Defense Technology, Changsha 410073, China)

Abstract

Markov boundary discovery is highly effective at identifying features that are causally related to the target variable, and the resulting feature sets offer strong interpretability and robustness. While there are numerous methods for discovering Markov boundaries in real-world applications, no single method suits all datasets. To balance precision and recall, we therefore propose an ensemble framework for divide-and-conquer Markov boundary discovery algorithms based on a U-I selection strategy. We place three divide-and-conquer Markov boundary methods in the framework to obtain an ensemble algorithm, EDMB, which focuses on judging controversial parent–child variables to further balance precision and recall. By combining multiple algorithms, the ensemble algorithm can leverage their respective strengths and analyze the cause-and-effect relationships of the target variable more thoroughly from several perspectives. It also improves robustness and reduces dependence on any single algorithm. In the experiments, we compare EDMB with four advanced Markov boundary discovery algorithms on nine benchmark Bayesian networks and three real-world datasets. The results show that EDMB ranks first overall, which illustrates the superiority of the ensemble algorithm and the effectiveness of the adopted U-I selection strategy. The main contribution of this paper is an ensemble framework for divide-and-conquer Markov boundary discovery algorithms that balances precision and recall through the U-I selection strategy and judges controversial parent–child variables to enhance performance and robustness. The advantage of the U-I selection strategy, and its difference from existing methods, is its ability to independently obtain the maximum precision and recall of multiple algorithms within the ensemble framework. By assessing controversial parent–child variables, it further balances precision and recall, leading to results that are closer to the true Markov boundary.
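
The abstract does not spell out the U-I selection strategy, so the following is only a rough sketch of the general idea of combining several Markov boundary estimates. It assumes that "U-I" refers to the union and intersection of the candidate sets returned by the individual algorithms: the intersection favors precision, the union favors recall, and the variables in between are the controversial ones. The function names (`ui_ensemble`, `majority_vote`, `precision_recall`), the toy variable names, and the majority-vote rule for controversial variables are all hypothetical stand-ins, not the procedure used in the paper.

```python
from typing import Callable, Iterable, Set, Tuple


def ui_ensemble(
    candidate_sets: Iterable[Set[str]],
    keep_controversial: Callable[[str], bool],
) -> Set[str]:
    """Combine several Markov boundary estimates for one target variable.

    Assumption: variables chosen by every algorithm are accepted outright,
    and variables chosen by only some algorithms are "controversial" and
    are decided by the supplied callback.
    """
    sets = [set(s) for s in candidate_sets]
    if not sets:
        return set()
    agreed = set.intersection(*sets)   # selected by every algorithm
    union = set.union(*sets)           # selected by at least one algorithm
    controversial = union - agreed
    return agreed | {v for v in controversial if keep_controversial(v)}


def majority_vote(sets: Iterable[Set[str]]) -> Callable[[str], bool]:
    """Hypothetical stand-in judge: keep a variable chosen by most algorithms."""
    sets = [set(s) for s in sets]

    def keep(variable: str) -> bool:
        votes = sum(variable in s for s in sets)
        return votes * 2 > len(sets)

    return keep


def precision_recall(estimated: Set[str], true: Set[str]) -> Tuple[float, float]:
    """Precision and recall of an estimated Markov boundary (0.0 if undefined)."""
    tp = len(estimated & true)
    precision = tp / len(estimated) if estimated else 0.0
    recall = tp / len(true) if true else 0.0
    return precision, recall


# Toy outputs of three hypothetical base algorithms for the same target.
outputs = [{"A", "B", "C"}, {"A", "B", "D"}, {"A", "C"}]
estimated = ui_ensemble(outputs, majority_vote(outputs))
precision, recall = precision_recall(estimated, true={"A", "B", "C", "E"})

print(sorted(estimated))   # ['A', 'B', 'C']
print(precision, recall)   # 1.0 0.75
```

In this toy run only "A" is chosen by all three algorithms; "B" and "C" are kept because two of three algorithms selected them, while "D" is dropped. A real judge for controversial variables would more plausibly rely on conditional independence tests against the data than on a vote.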

Suggested Citation

  • Hao Li & Jianjun Zhan & Haosen Wang & Zipeng Zhao, 2024. "A Novel Ensemble Method of Divide-and-Conquer Markov Boundary Discovery for Causal Feature Selection," Mathematics, MDPI, vol. 12(18), pages 1-21, September.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:18:p:2927-:d:1481999
    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/18/2927/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/18/2927/
    Download Restriction: no