Printed from https://ideas.repec.org/p/zbw/glodps/1214.html

Creating Data from Unstructured Text with Context Rule Assisted Machine Learning (CRAML)

Authors

  • Meisenbacher, Stephen
  • Norlander, Peter

Abstract

Popular approaches to building data from unstructured text come with limitations in scalability, interpretability, replicability, and real-world applicability. These can be overcome with Context Rule Assisted Machine Learning (CRAML), a method and no-code suite of software tools that builds structured, labeled datasets which are accurate and reproducible. CRAML enables domain experts to access uncommon constructs within a document corpus in a low-resource, transparent, and flexible manner. CRAML produces document-level datasets for quantitative research and makes qualitative classification schemes scalable over large volumes of text. We demonstrate that the method is useful for bibliographic analysis, transparent analysis of proprietary data, and expert classification of any documents with any scheme. To demonstrate this process for building data from text with machine learning, we publish open-source resources: the software, a new public document corpus, and a replicable analysis to build an interpretable classifier of suspected "no poach" clauses in franchise documents.

Suggested Citation

  • Meisenbacher, Stephen & Norlander, Peter, 2022. "Creating Data from Unstructured Text with Context Rule Assisted Machine Learning (CRAML)," GLO Discussion Paper Series 1214, Global Labor Organization (GLO).
  • Handle: RePEc:zbw:glodps:1214

    Download full text from publisher

    File URL: https://www.econstor.eu/bitstream/10419/267553/1/GLO-DP-1214.pdf
    Download Restriction: no

    More about this item

    Keywords

    machine learning; natural language processing; text classification; big data

    JEL classification:

    • B41 - Schools of Economic Thought and Methodology - - Economic Methodology - - - Economic Methodology
    • C38 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Classification Methods; Cluster Analysis; Principal Components; Factor Analysis
    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • C88 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Other Computer Software
    • J08 - Labor and Demographic Economics - - General - - - Labor Economics Policies
    • J41 - Labor and Demographic Economics - - Particular Labor Markets - - - Labor Contracts
    • J42 - Labor and Demographic Economics - - Particular Labor Markets - - - Monopsony; Segmented Labor Markets
    • J47 - Labor and Demographic Economics - - Particular Labor Markets - - - Coercive Labor Markets
    • J53 - Labor and Demographic Economics - - Labor-Management Relations, Trade Unions, and Collective Bargaining - - - Labor-Management Relations; Industrial Jurisprudence
    • Z13 - Other Special Topics - - Cultural Economics - - - Economic Sociology; Economic Anthropology; Language; Social and Economic Stratification

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.