
What have you read? based Multi-Document Summarization

Author

Listed:
  • Sabina Irum

    (National University of Modern Languages Islamabad Pakistan)

  • Jamal Abdul Nasir

    (Department of Computer Science Business Information Systems NUI Galway, Ireland)

  • Zakia Jalil

    (Faculty of Basic and Applied Sciences International Islamic University, Islamabad, Pakistan)

Abstract

Given the tremendous amount of data available today, extracting essential information from it is difficult, particularly for text documents, which demand significant time from the user to read the material and pull out useful information. The major problems are identifying the documents relevant to the user, extracting the most significant pieces of information, determining document relevancy, excluding extraneous information, reducing detail, and generating a compact, consistent report. To address these issues, we propose a novel technique that extracts important information from a large amount of text data and uses previously read documents to generate summaries of new documents. Our technique focuses on extracting topics (also known as topic signatures) from the previously read documents and then selecting the sentences most relevant to these topics, based on update summary generation. In addition, an overlapping value is used to identify meaningful words and word similarities. Our work also uses the Dice Coefficient, which measures the intersection of words between document sets and helps to eliminate redundancy. The generated summary consists of diverse, highly representative sentences of average length. Empirically, we observed that the proposed technique outperformed baseline competitors on the real-world TAC2008 dataset.
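
The Dice Coefficient named in the abstract has a standard definition over two word sets A and B: 2|A ∩ B| / (|A| + |B|). The sketch below is illustrative only and is not the authors' implementation. The function names, the simple whitespace tokenizer, the greedy filtering loop, and the 0.5 threshold are all assumptions chosen to show how such a score could be used to drop redundant sentences.

```python
def dice_coefficient(a, b):
    """Dice coefficient of two word collections: 2|A & B| / (|A| + |B|)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0  # convention chosen here for two empty inputs
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def tokenize(sentence):
    """Lowercase, split on whitespace, strip surrounding punctuation (assumed tokenizer)."""
    words = (w.strip(".,;:!?\"'()") for w in sentence.lower().split())
    return [w for w in words if w]


def filter_redundant(candidates, threshold=0.5):
    """Greedily keep sentences whose Dice overlap with every kept sentence is below threshold.

    The threshold value is a hypothetical parameter, not taken from the paper.
    """
    kept = []
    for sentence in candidates:
        tokens = tokenize(sentence)
        if all(dice_coefficient(tokens, tokenize(k)) < threshold for k in kept):
            kept.append(sentence)
    return kept


if __name__ == "__main__":
    sentences = [
        "The cat sat on the mat.",
        "The cat sat on a mat.",
        "Stocks fell sharply today.",
    ]
    for s in filter_redundant(sentences):
        print(s)
```

In this toy input the second sentence shares nearly all of its words with the first, so it is filtered out, while the unrelated third sentence is kept. A real system would likely compare candidate sentences against the previously read document set as well, which this sketch does not attempt.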

Suggested Citation

  • Sabina Irum & Jamal Abdul Nasir & Zakia Jalil, 2022. "What have you read? based Multi-Document Summarization," International Journal of Innovations in Science & Technology, 50sea, vol. 4(5), pages 94-102, June.
  • Handle: RePEc:abq:ijist1:v:4:y:2022:i:5:p:94-102
    DOI: 10.33411/IJIST/2022040508

    Download full text from publisher

    File URL: https://journal.50sea.com/index.php/IJIST/article/view/331/253
    Download Restriction: no

    File URL: https://journal.50sea.com/index.php/IJIST/article/view/331
    Download Restriction: no

    File URL: https://libkey.io/10.33411/IJIST/2022040508?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Rashid Amin & Muzammal Majeed & Farrukh Shoukat Ali & Adeel Ahmed & Mudassar Hussain, 2022. "Reliability Awareness Multiple Path Installation in Software Defined Networking using Machine Learning Algorithm," International Journal of Innovations in Science & Technology, 50sea, vol. 4(5), pages 158-172, July.
    2. Muhammad Adeel Abbasa & Zeshan Iqbal, 2022. "Double Auction used Artificial Neural Network in Cloud Computing," International Journal of Innovations in Science & Technology, 50sea, vol. 4(5), pages 65-76, June.
    3. Muhammad Sardaraz & Muhammad Tahir & Usman Aziz, 2022. "Critical Review of Blockchain Consensus Algorithms: challenges and opportunities," International Journal of Innovations in Science & Technology, 50sea, vol. 4(5), pages 52-64, June.

    More about this item

    Keywords

    Data mining; Text mining; Text summarization; Topic Signature; Density peak; Update Summarization;

    JEL classification:

    • R00 - Urban, Rural, Regional, Real Estate, and Transportation Economics - - General - - - General
    • Z0 - Other Special Topics - - General
