
Utilizing Pre-trained and Large Language Models for 10-K Items Segmentation

Author

Listed:
  • Hsin-Min Lu
  • Yu-Tai Chien
  • Huan-Hsun Yen
  • Yen-Hsiu Chen

Abstract

Extracting specific items from 10-K reports remains challenging because document formats and item presentation vary widely. Traditional rule-based item segmentation approaches often yield suboptimal results. This study introduces two item segmentation methods that leverage language models: (1) GPT4ItemSeg, which uses a novel line-ID-based prompting mechanism to apply GPT4 to item segmentation, and (2) BERT4ItemSeg, which combines BERT embeddings with a Bi-LSTM model in a hierarchical structure to overcome context window constraints. Trained and evaluated on 3,737 annotated 10-K reports, BERT4ItemSeg achieved a macro-F1 of 0.9825 on the core items (1, 1A, 3, and 7), surpassing GPT4ItemSeg (0.9567), a conditional random field (0.9818), and rule-based methods (0.9048). These approaches enhance item segmentation performance, improving text analytics in accounting and finance. BERT4ItemSeg achieved the highest macro-F1 of the methods compared, while GPT4ItemSeg can easily adapt to regulatory changes. Together, they offer practical benefits for researchers and practitioners, enabling reliable empirical studies and automated 10-K item segmentation.
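The abstract names a line-ID-based prompting mechanism and a macro-F1 score without detailing either. The following is a minimal sketch of the general idea, not the authors' implementation. It assumes that each line of a report is prefixed with a numeric ID, that a model answers with the first line ID of each item, and that macro-F1 is the unweighted mean of per-item F1 over line labels. The function names, the `[i]` prefix format, the `"O"` label for lines outside any item, and the toy report are all hypothetical.

```python
from typing import Dict, List


def number_lines(lines: List[str]) -> str:
    """Prefix each line with an ID so a model can refer to lines by number."""
    return "\n".join(f"[{i}] {text}" for i, text in enumerate(lines))


def labels_from_starts(n_lines: int, starts: Dict[str, int]) -> List[str]:
    """Expand {item: first line ID} into one label per line.

    Assumption: an item runs until the next item's start line;
    lines before the first item are labeled "O" (outside).
    """
    labels = ["O"] * n_lines
    ordered = sorted(starts.items(), key=lambda kv: kv[1])
    for idx, (item, start) in enumerate(ordered):
        end = ordered[idx + 1][1] if idx + 1 < len(ordered) else n_lines
        for i in range(start, end):
            labels[i] = item
    return labels


def macro_f1(gold: List[str], pred: List[str], classes: List[str]) -> float:
    """Unweighted mean of per-class F1 computed over line labels."""
    scores = []
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy report (hypothetical text, not taken from a real filing).
report = [
    "PART I",
    "Item 1. Business",
    "We make widgets.",
    "Item 1A. Risk Factors",
    "Demand may fall.",
    "Item 3. Legal Proceedings",
    "None.",
]
prompt_body = number_lines(report)

# Annotated start lines versus a hypothetical model answer that
# places the start of Item 1A one line too late.
gold = labels_from_starts(len(report), {"1": 1, "1A": 3, "3": 5})
pred = labels_from_starts(len(report), {"1": 1, "1A": 4, "3": 5})
score = macro_f1(gold, pred, ["1", "1A", "3"])
print(round(score, 4))
```

In this toy case the per-item F1 values are 0.8 (Item 1), 2/3 (Item 1A), and 1.0 (Item 3), so a single misplaced boundary lowers the scores of both neighboring items. Asking the model for line IDs rather than for the item text keeps the response short regardless of how long each item is.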

Suggested Citation

  • Hsin-Min Lu & Yu-Tai Chien & Huan-Hsun Yen & Yen-Hsiu Chen, 2025. "Utilizing Pre-trained and Large Language Models for 10-K Items Segmentation," Papers 2502.08875, arXiv.org.
  • Handle: RePEc:arx:papers:2502.08875

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2502.08875
    File Function: Latest version
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lebelle, Martin & Lajili Jarjir, Souad & Sassi, Syrine, 2022. "The effect of issuance documentation disclosure and readability on liquidity: Evidence from green bonds," Global Finance Journal, Elsevier, vol. 51(C).
    2. Drago, Carlo & Ginesti, Gianluca & Pongelli, Claudia & Sciascia, Salvatore, 2018. "Reporting strategies: What makes family firms beat around the bush? Family-related antecedents of annual report readability," Journal of Family Business Strategy, Elsevier, vol. 9(2), pages 142-150.
    3. Bui, Dien Giau & Chen, Yehning & Chen, Yan-Shing & Lin, Chih-Yung, 2023. "Managerial ability and financial statement disaggregation decisions," Journal of Empirical Finance, Elsevier, vol. 74(C).
    4. Minxing Sun & Weike Xu, 2024. "Short selling and readability in financial disclosures: A controlled experiment," The Financial Review, Eastern Finance Association, vol. 59(2), pages 265-292, May.
    5. Muhammad Farhan Malik & Yuan George Shan & Jamie Yixing Tong, 2022. "Do auditors price litigious tone?," Accounting and Finance, Accounting and Finance Association of Australia and New Zealand, vol. 62(S1), pages 1715-1760, April.
    6. Qiu, Meng & Gu, Kai & Zhang, Zhichao & Zhang, Junrui, 2023. "Political uncertainty and financial statement readability," Research in International Business and Finance, Elsevier, vol. 66(C).
    7. Le Wang & Xiaoyan Chen & Xing Li & Gaoliang Tian, 2021. "MD&A readability, auditor characteristics, and audit fees," Accounting and Finance, Accounting and Finance Association of Australia and New Zealand, vol. 61(4), pages 5025-5050, December.
    8. Doshi, Hitesh & Patel, Saurin & Ramani, Srikanth & Sooy, Matthew, 2023. "Uncertain tone, asset volatility and credit default swap spreads," Journal of Contemporary Accounting and Economics, Elsevier, vol. 19(3).
    9. Chen, Chen & Hanlon, Dean & Khedmati, Mehdi & Wake, James, 2023. "Annual report readability and equity mispricing," Journal of Contemporary Accounting and Economics, Elsevier, vol. 19(3).
    10. Yin, Shiyan & Chevapatrakul, Thanaset & Yao, Kai, 2022. "The causal effect of improved readability of financial reporting on stock price crash risk: Evidence from the Plain Writing Act of 2010," Economics Letters, Elsevier, vol. 216(C).
    11. Sun, Li, 2023. "Asset redeployability and readability of annual report," Research in International Business and Finance, Elsevier, vol. 64(C).
    12. Rjiba, Hatem & Saadi, Samir & Boubaker, Sabri & Ding, Xiaoya (Sara), 2021. "Annual report readability and the cost of equity capital," Journal of Corporate Finance, Elsevier, vol. 67(C).
    13. Cao Thi Mien Thuy & Trinh Quoc Trung & Nguyen Vinh Khuong & Nguyen Thanh Liem, 2021. "From Corporate Social Responsibility to Stock Price Crash Risk: Modelling the Mediating Role of Firm Performance in an Emerging Market," Sustainability, MDPI, vol. 13(22), pages 1-17, November.
    14. Zhongtian Li & Jing Jia & Larelle J. Chapple, 2022. "Textual characteristics of corporate sustainability disclosure and corporate sustainability performance: evidence from Australia," Meditari Accountancy Research, Emerald Group Publishing Limited, vol. 31(3), pages 786-816, February.
    15. Mousa, Gehan A. & Elamir, Elsayed A.H. & Hussainey, Khaled, 2022. "The effect of annual report narratives on the cost of capital in the Middle East and North Africa: A machine learning approach," Research in International Business and Finance, Elsevier, vol. 62(C).
    16. Meng, Qingxi & He, Yan & Zhang, Anting & Gong, Xiaoyun, 2023. "Does mandatory operating information disclosure affect stock price crash risk? Evidence from China," Pacific-Basin Finance Journal, Elsevier, vol. 82(C).
    17. Xu, Qiao & Fernando, Guy D. & Tam, Kinsun, 2018. "Executive age and the readability of financial reports," Advances in accounting, Elsevier, vol. 43(C), pages 70-81.
    18. Nicolás Gambetta & Laura Sierra‐García & María Antonia García‐Benau & Josefina Novejarque‐Civera, 2023. "The Informative Value of Key Audit Matters in the Audit Report: Understanding the Impact of the Audit Firm and KAM Type," Australian Accounting Review, CPA Australia, vol. 33(2), pages 114-134, June.
    19. Bin Yan Ding & Feng Wei, 2022. "Executive resume information disclosure and corporate innovation: Evidence from China," Managerial and Decision Economics, John Wiley & Sons, Ltd., vol. 43(8), pages 3593-3610, December.
    20. Lin, Sin-Jin & Zeng, Jhih-Hong & Chang, Te-Min & Hsu, Ming-Fu, 2024. "Linguistic complexity consideration for advanced risk decision making and handling," Research in International Business and Finance, Elsevier, vol. 69(C).
