
The SearchEngine: A holistic approach to matching

Author

Listed:
  • Doherr, Thorsten

Abstract

The SearchEngine is an open-source project providing an integrated framework for diverse matching activities, especially the linkage of large-scale firm data by fuzzy criteria such as company names and addresses. At its core is an efficient candidate retrieval mechanism that implements a word-driven (more generally, token-driven) heuristic. Every record in one table becomes a search term used to retrieve similar candidate records from the base table according to a search strategy, which replaces the blocking strategies of conventional matching efforts. Because similarity is inherently established by the candidate selection, the only remaining step is to filter out false positives, which is done with a machine learning approach built on the metadata export file derived from the matching heuristic. This paper discusses the general foundation of the heuristic and the algorithm, and two detailed walkthroughs of company linkages provide practical examples.
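
To make the idea of token-driven candidate retrieval concrete, the following is a minimal Python sketch of the general technique. It is not the SearchEngine's actual heuristic or code. The function names, the rarity weighting (a logarithmic, IDF-like weight) and the threshold value are all assumptions chosen for illustration. Each record of a search table is treated as a query against an inverted token index of the base table, and base records that share enough weighted tokens are returned as candidates.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(base_records):
    """Map each token to the set of base-table record ids that contain it."""
    index = defaultdict(set)
    for rec_id, text in base_records.items():
        for token in set(tokenize(text)):
            index[token].add(rec_id)
    return index


def retrieve_candidates(query, index, n_base, threshold=0.5):
    """Return (record id, score) pairs, best first, for one search record.

    Assumed weighting: rare tokens count more than frequent ones.
    The score is the share of the query's total token weight that a
    base record covers, so it lies between 0 and 1.
    """
    tokens = set(tokenize(query))
    # Tokens unknown to the base table are weighted as if they occurred once.
    weights = {
        t: math.log(1 + n_base / max(1, len(index.get(t, ()))))
        for t in tokens
    }
    total = sum(weights.values())
    scores = defaultdict(float)
    for token, weight in weights.items():
        for rec_id in index.get(token, ()):
            scores[rec_id] += weight
    hits = [(r, s / total) for r, s in scores.items() if s / total >= threshold]
    return sorted(hits, key=lambda hit: -hit[1])


# Hypothetical base table of firm names, invented for this example.
base = {
    1: "Acme Machine Tools GmbH",
    2: "Acme Software GmbH",
    3: "Baker Machine Works AG",
}
index = build_index(base)

# Only record 1 clears the default threshold for this query.
hits = retrieve_candidates("ACME Machine Tools", index, n_base=len(base))
print(hits)
```

In this sketch no blocking key is needed: the index lookup itself limits the comparison to base records that share at least one token with the search record. Lowering the threshold returns more candidates, which a downstream classifier would then have to filter.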

Suggested Citation

  • Doherr, Thorsten, 2023. "The SearchEngine: A holistic approach to matching," ZEW Discussion Papers 23-001, ZEW - Leibniz Centre for European Economic Research.
  • Handle: RePEc:zbw:zewdip:23001
    Download full text from publisher

    File URL: https://www.econstor.eu/bitstream/10419/268428/1/1832674266.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Doherr, Thorsten, 2021. "Disambiguation by namesake risk assessment," ZEW Discussion Papers 21-021, ZEW - Leibniz Centre for European Economic Research.
    2. Doherr, Thorsten, 2017. "Inventor mobility index: A method to disambiguate inventor careers," ZEW Discussion Papers 17-018, ZEW - Leibniz Centre for European Economic Research.

    Citations

    Cited by:

    1. Gottschalk, Sandra & Hauer, Annegret & Ahrens, Jan-Philipp, 2023. "Die volkswirtschaftliche Bedeutung der Familienunternehmen: 6. Auflage und Schwerpunkt 'Ausbildungsengagement'," Studien, Stiftung Familienunternehmen / Foundation for Family Businesses, number 281018, March.
    2. Arabzadeh, Hamzeh & Balleer, Almut & Gehrke, Britta & Taskin, Ahmet Ali, 2024. "Minimum wages, wage dispersion and financial constraints in firms," European Economic Review, Elsevier, vol. 163(C).
    3. Breithaupt, Patrick & Hottenrott, Hanna & Rammer, Christian & Römer, Konstantin, 2023. "Mapping employee mobility and employer networks using professional network data," ZEW Discussion Papers 23-041, ZEW - Leibniz Centre for European Economic Research.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jordan Bisset & Dirk Czarnitzki & Thorsten Doherr, 2022. "High Skilled Mobility Under Uncertainty," Working Papers of Department of Management, Strategy and Innovation, Leuven 700195, KU Leuven, Faculty of Economics and Business (FEB), Department of Management, Strategy and Innovation, Leuven.
    2. Bisset, Jordan & Czarnitzki, Dirk & Doherr, Thorsten, 2022. "Policy uncertainty and inventor mobility," ZEW Discussion Papers 22-044, ZEW - Leibniz Centre for European Economic Research.
    3. Bastian Krieger & Maikel Pellens & Knut Blind & Sonia Gruber & Torben Schubert, 2021. "Are firms withdrawing from basic research? An analysis of firm-level publication behaviour in Germany," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(12), pages 9677-9698, December.
    4. Doherr, Thorsten, 2021. "Disambiguation by namesake risk assessment," ZEW Discussion Papers 21-021, ZEW - Leibniz Centre for European Economic Research.
    5. Bisset, Jordan & Czarnitzki, Dirk & Doherr, Thorsten, 2024. "Inventor mobility under uncertainty," Research Policy, Elsevier, vol. 53(1).

    More about this item

    Keywords

    data linkage; firm matching; entity resolution; machine learning

    JEL classification:

    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • C88 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Other Computer Software
