
A knowledge‐based approach to Information Extraction for semantic interoperability in the archaeology domain

Author

Listed:
  • Andreas Vlachidis
  • Douglas Tudhope

Abstract

The article presents a method for automatic semantic indexing of archaeological grey‐literature reports using empirical (rule‐based) Information Extraction techniques in combination with domain‐specific knowledge organization systems. The semantic annotation system (OPTIMA) performs the tasks of Named Entity Recognition, Relation Extraction, Negation Detection, and Word‐Sense Disambiguation using hand‐crafted rules and terminological resources for associating contextual abstractions with classes of the standard ontology CIDOC Conceptual Reference Model (CRM) for cultural heritage and its archaeological extension, CRM‐EH. Relation Extraction (RE) performance benefits from a syntactic‐based definition of RE patterns derived from domain‐oriented corpus analysis. The evaluation also shows clear benefit in the use of assistive natural language processing (NLP) modules relating to Word‐Sense Disambiguation, Negation Detection, and Noun Phrase Validation, together with controlled thesaurus expansion. The semantic indexing results demonstrate the capacity of rule‐based Information Extraction techniques to deliver interoperable semantic abstractions (semantic annotations) with respect to the CIDOC CRM and archaeological thesauri. Major contributions include recognition of relevant entities using shallow parsing NLP techniques driven by a complementary use of ontological and terminological domain resources and empirical derivation of context‐driven RE rules for the recognition of semantic relationships from phrases of unstructured text.
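
As a rough illustration of the kind of pipeline the abstract describes, the sketch below combines a term lookup (standing in for thesaurus-driven Named Entity Recognition) with a simple cue-word check (standing in for Negation Detection). It is not the OPTIMA implementation: the `GAZETTEER` entries, the term-to-class mapping, the cue list, and the fixed four-token window are all assumptions made for this example, and the class labels are only meant to suggest how annotations might be tied to CIDOC CRM classes.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-gazetteer: term -> ontology class label.
# The mapping is illustrative only, not taken from the article.
GAZETTEER = {
    "pit": "E53 Place",
    "ditch": "E53 Place",
    "pottery": "E19 Physical Object",
    "coin": "E19 Physical Object",
    "roman": "E49 Time Appellation",
    "medieval": "E49 Time Appellation",
}

# Assumed negation cues and look-back window (in tokens).
NEGATION_CUES = {"no", "not", "without", "absence"}
NEGATION_WINDOW = 4


@dataclass
class Annotation:
    text: str        # matched surface form, original casing
    crm_class: str   # ontology class label from the gazetteer
    start: int       # character offset in the sentence
    negated: bool    # True if a negation cue precedes the match


def annotate(sentence: str) -> list[Annotation]:
    """Tag gazetteer terms and flag those preceded by a negation cue."""
    tokens = [(m.group().lower(), m.start())
              for m in re.finditer(r"[A-Za-z]+", sentence)]
    annotations = []
    for i, (token, pos) in enumerate(tokens):
        crm_class = GAZETTEER.get(token)
        if crm_class is None:
            continue
        # Look back a fixed number of tokens for a negation cue.
        preceding = {t for t, _ in tokens[max(0, i - NEGATION_WINDOW):i]}
        negated = bool(preceding & NEGATION_CUES)
        surface = sentence[pos:pos + len(token)]
        annotations.append(Annotation(surface, crm_class, pos, negated))
    return annotations


if __name__ == "__main__":
    for ann in annotate("No evidence of Roman pottery was found in the ditch."):
        print(ann)
```

A fixed look-back window is a crude stand-in for the syntactic patterns the abstract mentions; it is used here only to keep the example short.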

Suggested Citation

  • Andreas Vlachidis & Douglas Tudhope, 2016. "A knowledge‐based approach to Information Extraction for semantic interoperability in the archaeology domain," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 67(5), pages 1138-1152, May.
  • Handle: RePEc:bla:jinfst:v:67:y:2016:i:5:p:1138-1152
    DOI: 10.1002/asi.23485

    Download full text from publisher

    File URL: https://doi.org/10.1002/asi.23485
    Download Restriction: no


    References listed on IDEAS

    1. Marcia J. Bates, 1986. "Subject access in online catalogs: A design model," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 37(6), pages 357-376, November.
    2. Koraljka Golub & Douglas Tudhope & Marcia Lei Zeng & Maja Žumer, 2014. "Terminology registries for knowledge organization systems: Functionality, use, and attributes," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 65(9), pages 1901-1916, September.
    3. Gondy Leroy & Hsinchun Chen, 2005. "Genescene: An ontology‐enhanced integration of linguistic and co‐occurrence based relations in biomedical texts," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 56(5), pages 457-468, March.
    4. Marcia Lei Zeng & Lois Mai Chan, 2004. "Trends and issues in establishing interoperability among knowledge organization systems," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 55(5), pages 377-395, March.

    Citations



    Cited by:

    1. Wang, Zhenhua & Ren, Ming & Gao, Dong & Li, Zhuang, 2023. "A Zipf's law-based text generation approach for addressing imbalance in entity extraction," Journal of Informetrics, Elsevier, vol. 17(4).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Stubkjær, Erik & Çağdaş, Volkan, 2021. "Alignment of standards through semantic tools – The case of land administration," Land Use Policy, Elsevier, vol. 104(C).
    2. Saeedeh Ahmadi & Saeed Khanagha & Luca Berchicci & Justin J. P. Jansen, 2017. "Are Managers Motivated to Explore in the Face of a New Technological Change? The Role of Regulatory Focus, Fit, and Complexity of Decision‐Making," Journal of Management Studies, Wiley Blackwell, vol. 54(2), pages 209-237, March.
    3. Qingqiang Wu & Yichen Kuang & Qingqi Hong & Yingying She, 2019. "Frontier knowledge discovery and visualization in cancer field based on KOS and LDA," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(3), pages 979-1010, March.
    4. Wei Du & Raymond Yiu Keung Lau & Jian Ma & Wei Xu, 2015. "A multi-faceted method for science classification schemes (SCSs) mapping in networking scientific resources," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 2035-2056, December.
    5. Stephann Makri, 2020. "Information informing design: Information Science research with implications for the design of digital information environments," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 71(11), pages 1402-1412, November.
    6. Jan Pawlowski & Markus Bick & René Peinl & Stefan Thalmann & Ronald Maier & Lars Hetmank & Paul Kruse & Malte Martensen & Henri Pirkkalainen, 2014. "Social Knowledge Environments," Business & Information Systems Engineering: The International Journal of WIRTSCHAFTSINFORMATIK, Springer;Gesellschaft für Informatik e.V. (GI), vol. 6(2), pages 81-88, April.
    7. Isto Huvila & Heidi Enwald & Kristina Eriksson‐Backa & Ying‐Hsang Liu & Noora Hirvonen, 2022. "Information behavior and practices research informing information systems design," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(7), pages 1043-1057, July.
    8. Isto Huvila, 2022. "Making and taking information," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(4), pages 528-541, April.
    9. Wei Du & Xusen Cheng & Chen Yang & Jianshan Sun & Jian Ma, 2017. "Establishing interoperability among knowledge organization systems for research management: a social network approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(3), pages 1489-1506, September.
