
Classification With Unstructured Predictors and an Application to Sentiment Analysis

Authors

Listed:
  • Junhui Wang
  • Xiaotong Shen
  • Yiwen Sun
  • Annie Qu

Abstract

Unstructured data refer to information that lacks a fixed structure and cannot be organized in a predefined fashion. Such data often involve words, texts, graphs, objects, or multimedia files that are difficult to process and analyze with traditional computational tools and statistical methods. This work explores ordinal classification for unstructured predictors with ordered class categories, where imprecise information concerning strengths of association between predictors is available for predicting class labels. Here, the imprecise information is expressed as a directed graph, with each node representing a predictor and each directed edge carrying the pairwise strength of association between two nodes. One targeted application for unstructured data is sentiment analysis, which identifies and extracts the relevant content or opinion of a document concerning a specific event of interest. We integrate the imprecise predictor relations into linear relational constraints over the coefficients of the classification function, and introduce large margin ordinal classifiers subject to many quadratically linear constraints. The proposed classifiers are then applied to sentiment analysis using binary word predictors. Computationally, we implement ordinal support vector machines and ψ-learning through a scalable quadratic programming package based on sparse word representations. Theoretically, we show that using relationships among unstructured predictors significantly improves classification accuracy. We illustrate the approach with sentiment analysis of consumer text reviews and movie review data. Supplementary materials for this article are available online.
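
The ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the whitespace tokenizer, the threshold-based ordinal rule, and especially the constraint form `coef[j] >= s * coef[k]` for an edge `(j, k, s)` are assumptions made here for illustration only; the abstract does not state the exact form of the relational constraints.

```python
from typing import List, Sequence, Tuple

def binary_word_predictors(docs: Sequence[str], vocab: Sequence[str]) -> List[List[int]]:
    """Sparse 0/1 representation: for each document, the sorted indices of the
    vocabulary words it contains. Tokenization is a plain lowercase whitespace
    split (an assumption; punctuation is not stripped)."""
    index = {word: j for j, word in enumerate(vocab)}
    rows = []
    for doc in docs:
        present = {index[tok] for tok in doc.lower().split() if tok in index}
        rows.append(sorted(present))
    return rows

def relational_violations(coef: Sequence[float],
                          edges: Sequence[Tuple[int, int, float]]) -> List[Tuple[int, int]]:
    """Check linear constraints derived from a directed graph on predictors.
    ASSUMED form: an edge (j, k, s) with strength s requires coef[j] >= s * coef[k].
    Returns the (j, k) pairs whose constraint is violated."""
    return [(j, k) for j, k, s in edges if coef[j] < s * coef[k]]

def predict_ordinal(row: Sequence[int], coef: Sequence[float],
                    thresholds: Sequence[float]) -> int:
    """Generic threshold-based ordinal rule (an assumption): the linear score is
    the sum of coefficients of the words present, and the predicted class is the
    number of increasing thresholds that the score exceeds."""
    score = sum(coef[j] for j in row)
    return sum(1 for t in thresholds if score > t)

# Hypothetical toy data, for illustration only.
vocab = ["great", "terrible", "battery"]
docs = ["Great phone great battery", "terrible battery"]
rows = binary_word_predictors(docs, vocab)
coef = [1.0, -1.0, 0.1]
edges = [(0, 2, 0.5), (1, 0, 0.8)]      # (from node, to node, strength)
labels = [predict_ordinal(r, coef, [-0.5, 0.5]) for r in rows]
violated = relational_violations(coef, edges)
```

In an actual fit, constraints of this kind would be imposed inside the quadratic program that estimates the coefficients rather than checked afterwards; the check above only shows how a directed edge translates into a linear inequality on two coefficients.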

Suggested Citation

  • Junhui Wang & Xiaotong Shen & Yiwen Sun & Annie Qu, 2016. "Classification With Unstructured Predictors and an Application to Sentiment Analysis," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(515), pages 1242-1253, July.
  • Handle: RePEc:taf:jnlasa:v:111:y:2016:i:515:p:1242-1253
    DOI: 10.1080/01621459.2015.1089771

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/01621459.2015.1089771
    Download Restriction: Access to full text is restricted to subscribers.


    Citations


    Cited by:

    1. Neil Dey & Matthew Singer & Jonathan P. Williams & Srijan Sengupta, 2024. "Word Embeddings as Statistical Estimators," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 86(2), pages 415-441, November.
