Printed from https://ideas.repec.org/a/eee/csdana/v193y2024ics0167947324000021.html

Integrated subgroup identification from multi-source data

Authors
  • Shao, Lihui
  • Wu, Jiaqi
  • Zhang, Weiping
  • Chen, Yu

Abstract

Subgroup identification is crucial for dealing with heterogeneous populations and has wide applications in areas such as clinical trials and market segmentation. With the prevalence of multi-source data, there is a practical need to identify subgroups based on data from multiple sources. This paper proposes a working-independence pseudo-loglikelihood and integrates the parameters of each source into a pairwise fusion penalty for simultaneous parameter estimation and subgroup identification. To implement the proposed method, an alternating direction method of multipliers (ADMM) algorithm is derived. Furthermore, the weak oracle properties of the parameter estimators are established, showing that the latent subgroups can be consistently identified. Finally, numerical simulations and an analysis of a randomized trial on reduced nicotine standards for cigarettes are conducted to evaluate the performance of the proposed method.
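The abstract describes the estimator only in words. What follows is a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes each source s contributes a unit-variance Gaussian log-likelihood for a subject-specific mean beta[i][s], so the working-independence pseudo-loglikelihood is just the sum of per-source terms. The per-subject parameter vectors are then fused with a group MCP penalty on ||beta_i - beta_j||, solved by an ADMM whose eta-update follows the concave pairwise fusion scheme of Ma & Huang (2017), listed in the references below. The function name `fuse_subgroups`, the Gaussian model, and the tuning values `lam`, `gamma`, `theta` are hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def _soft_threshold(z, t):
    """Group soft-thresholding: (1 - t / ||z||)_+ * z."""
    norm = math.sqrt(sum(x * x for x in z))
    if norm <= t:
        return [0.0] * len(z)
    return [(1.0 - t / norm) * x for x in z]


def _mcp_prox(delta, lam, gamma, theta):
    """eta-update for a group MCP penalty (assumed form, after Ma & Huang 2017)."""
    norm = math.sqrt(sum(x * x for x in delta))
    if norm > gamma * lam:
        return list(delta)  # flat region of MCP: difference is left unshrunk
    st = _soft_threshold(delta, lam / theta)
    return [x / (1.0 - 1.0 / (gamma * theta)) for x in st]


def fuse_subgroups(y, lam=1.0, gamma=3.0, theta=1.0, max_iter=1000, tol=1e-9):
    """Hypothetical sketch. y[i][s] is the observation of subject i from source s.

    Minimizes 0.5 * sum_i sum_s (y[i][s] - beta[i][s])**2
              + sum_{i<j} MCP(||beta_i - beta_j||; lam, gamma)
    by ADMM with splitting variables eta_ij = beta_i - beta_j.
    """
    if gamma * theta <= 1.0:
        raise ValueError("the MCP eta-update needs gamma * theta > 1")
    n, n_src = len(y), len(y[0])
    pairs = list(combinations(range(n), 2))
    beta = [list(row) for row in y]
    eta = {(i, j): [beta[i][s] - beta[j][s] for s in range(n_src)] for i, j in pairs}
    v = {p: [0.0] * n_src for p in pairs}  # scaled-free dual variables
    for _ in range(max_iter):
        # beta-update: solve (I + theta * D^T D) B = Y + D^T (theta * eta - v)
        rhs = [list(row) for row in y]
        for i, j in pairs:
            for s in range(n_src):
                term = theta * eta[(i, j)][s] - v[(i, j)][s]
                rhs[i][s] += term
                rhs[j][s] -= term
        col_sums = [sum(rhs[i][s] for i in range(n)) for s in range(n_src)]
        # D^T D = n I - 1 1^T, so the system has a closed-form inverse
        beta = [[(rhs[i][s] + theta * col_sums[s]) / (1.0 + theta * n)
                 for s in range(n_src)] for i in range(n)]
        # eta- and v-updates, accumulating the squared primal residual
        resid = 0.0
        for i, j in pairs:
            diff = [beta[i][s] - beta[j][s] for s in range(n_src)]
            delta = [diff[s] + v[(i, j)][s] / theta for s in range(n_src)]
            eta[(i, j)] = _mcp_prox(delta, lam, gamma, theta)
            r = [diff[s] - eta[(i, j)][s] for s in range(n_src)]
            v[(i, j)] = [v[(i, j)][s] + theta * r[s] for s in range(n_src)]
            resid += sum(x * x for x in r)
        if math.sqrt(resid) < tol:
            break
    # subjects i and j share a subgroup when eta_ij was shrunk exactly to zero
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for (i, j), e in eta.items():
        if all(x == 0.0 for x in e):
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return beta, sorted(groups.values())
```

With two well-separated clusters of subjects, the within-cluster differences are shrunk exactly to zero, while the between-cluster differences fall in the flat region of MCP (beyond `gamma * lam`) and are not shrunk at all, so each fitted beta_i should settle at its cluster mean. Here `lam` is fixed by hand; the abstract does not say how the tuning parameter is chosen in the paper.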

Suggested Citation

  • Shao, Lihui & Wu, Jiaqi & Zhang, Weiping & Chen, Yu, 2024. "Integrated subgroup identification from multi-source data," Computational Statistics & Data Analysis, Elsevier, vol. 193(C).
  • Handle: RePEc:eee:csdana:v:193:y:2024:i:c:s0167947324000021
    DOI: 10.1016/j.csda.2024.107918

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947324000021
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Juan Shen & Xuming He, 2015. "Inference for Subgroup Analysis With a Structured Logistic-Normal Mixture Model," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(509), pages 303-312, March.
    2. Susan Wei & Michael R. Kosorok, 2013. "Latent Supervised Learning," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 108(503), pages 957-970, September.
    3. Lawrence Hubert & Phipps Arabie, 1985. "Comparing partitions," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 193-218, December.
    4. Yan Li & Chun Yu & Yize Zhao & Weixin Yao & Robert H. Aseltine & Kun Chen, 2022. "Pursuing sources of heterogeneity in modeling clustered population," Biometrics, The International Biometric Society, vol. 78(2), pages 716-729, June.
    5. Xin Gao & Raymond J. Carroll, 2017. "Data integration with high dimensionality," Biometrika, Biometrika Trust, vol. 104(2), pages 251-272.
    6. Yi Zhao & Lexin Li & Brian S. Caffo, 2021. "Multimodal neuroimaging data integration and pathway analysis," Biometrics, The International Biometric Society, vol. 77(3), pages 879-889, September.
    7. Abbas Khalili & Jiahua Chen, 2007. "Variable Selection in Finite Mixture of Regression Models," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 1025-1038, September.
    8. Xiwei Tang & Fei Xue & Annie Qu, 2021. "Individualized Multidirectional Variable Selection," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 116(535), pages 1280-1296, July.
    9. Jianqing Fan & Runze Li, 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    10. Hansheng Wang & Runze Li & Chih-Ling Tsai, 2007. "Tuning parameter selectors for the smoothly clipped absolute deviation method," Biometrika, Biometrika Trust, vol. 94(3), pages 553-568.
    11. Shujie Ma & Jian Huang, 2017. "A Concave Pairwise Fusion Approach to Subgroup Analysis," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(517), pages 410-423, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cai, Tingting & Li, Jianbo & Zhou, Qin & Yin, Songlou & Zhang, Riquan, 2024. "Subgroup detection based on partially linear additive individualized model with missing data in response," Computational Statistics & Data Analysis, Elsevier, vol. 192(C).
    2. Sakyajit Bhattacharya & Paul McNicholas, 2014. "A LASSO-penalized BIC for mixture model selection," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 8(1), pages 45-61, March.
    3. Liu, Lili & Lin, Lu, 2019. "Subgroup analysis for heterogeneous additive partially linear models and its application to car sales data," Computational Statistics & Data Analysis, Elsevier, vol. 138(C), pages 239-259.
    4. Wang, Xin & Zhu, Zhengyuan & Zhang, Hao Helen, 2023. "Spatial heterogeneity automatic detection and estimation," Computational Statistics & Data Analysis, Elsevier, vol. 180(C).
    5. Pei, Youquan & Peng, Heng & Xu, Jinfeng, 2024. "A latent class Cox model for heterogeneous time-to-event data," Journal of Econometrics, Elsevier, vol. 239(2).
    6. Baosheng Liang & Peng Wu & Xingwei Tong & Yanping Qiu, 2020. "Regression and subgroup detection for heterogeneous samples," Computational Statistics, Springer, vol. 35(4), pages 1853-1878, December.
    7. Shuang Zhang & Xingdong Feng, 2022. "Distributed identification of heterogeneous treatment effects," Computational Statistics, Springer, vol. 37(1), pages 57-89, March.
    8. Zhang, Xiaochen & Zhang, Qingzhao & Ma, Shuangge & Fang, Kuangnan, 2022. "Subgroup analysis for high-dimensional functional regression," Journal of Multivariate Analysis, Elsevier, vol. 192(C).
    9. Ping Zeng & Yongyue Wei & Yang Zhao & Jin Liu & Liya Liu & Ruyang Zhang & Jianwei Gou & Shuiping Huang & Feng Chen, 2014. "Variable selection approach for zero-inflated count data via adaptive lasso," Journal of Applied Statistics, Taylor & Francis Journals, vol. 41(4), pages 879-894, April.
    10. Li-Pang Chen, 2022. "Network-Based Discriminant Analysis for Multiclassification," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 410-431, November.
    11. Yan Li & Chun Yu & Yize Zhao & Weixin Yao & Robert H. Aseltine & Kun Chen, 2022. "Pursuing sources of heterogeneity in modeling clustered population," Biometrics, The International Biometric Society, vol. 78(2), pages 716-729, June.
    12. Mehrabani, Ali, 2023. "Estimation and identification of latent group structures in panel data," Journal of Econometrics, Elsevier, vol. 235(2), pages 1464-1482.
    13. Weirong Li & Wensheng Zhu, 2024. "Subgroup analysis with concave pairwise fusion penalty for ordinal response," Statistical Papers, Springer, vol. 65(6), pages 3327-3355, August.
    14. Galimberti, Giuliano & Montanari, Angela & Viroli, Cinzia, 2009. "Penalized factor mixture analysis for variable selection in clustered data," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4301-4310, October.
    15. Lu, Wenqi & Qin, Guoyou & Zhu, Zhongyi & Tu, Dongsheng, 2021. "Multiply robust subgroup identification for longitudinal data with dropouts via median regression," Journal of Multivariate Analysis, Elsevier, vol. 181(C).
    16. Okhrin, Ostap & Ristig, Alexander & Sheen, Jeffrey R. & Trück, Stefan, 2015. "Conditional systemic risk with penalized copula," SFB 649 Discussion Papers 2015-038, Humboldt University Berlin, Collaborative Research Center 649: Economic Risk.
    17. Peng, Heng & Lu, Ying, 2012. "Model selection in linear mixed effect models," Journal of Multivariate Analysis, Elsevier, vol. 109(C), pages 109-129.
    18. Jun Zhu & Hsin‐Cheng Huang & Perla E. Reyes, 2010. "On selection of spatial linear models for lattice data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(3), pages 389-402, June.
    19. Ye, Mao & Lu, Zhao-Hua & Li, Yimei & Song, Xinyuan, 2019. "Finite mixture of varying coefficient model: Estimation and component selection," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 452-474.
    20. Tang, Linjun & Zhou, Zhangong & Wu, Changchun, 2012. "Weighted composite quantile estimation and variable selection method for censored regression model," Statistics & Probability Letters, Elsevier, vol. 82(3), pages 653-663.
