Printed from https://ideas.repec.org/p/nbr/nberwo/27962.html

Sparse Network Asymptotics for Logistic Regression under Possible Misspecification

Author

Listed:
  • Bryan S. Graham

Abstract

Consider a bipartite network where N consumers choose to buy or not to buy M different products. This paper considers the properties of the logit fit of the N × M array of “i-buys-j” purchase decisions, $\mathbf{Y} = [Y_{ij}]_{1 \le i \le N,\, 1 \le j \le M}$, onto a vector of known functions of consumer and product attributes under asymptotic sequences where (i) both N and M grow large, (ii) the average number of products purchased per consumer is finite in the limit, (iii) there exists dependence across elements in the same row or same column of $\mathbf{Y}$ (i.e., dyadic dependence) and (iv) the true conditional probability of making a purchase may, or may not, take the assumed logit form. Condition (ii) implies that the limiting network of purchases is sparse: only a vanishing fraction of all possible purchases are actually made. Under sparse network asymptotics, I show that the parameter indexing the logit approximation solves a particular Kullback–Leibler Information Criterion (KLIC) minimization problem (defined with respect to a certain Poisson population). This finding provides a simple characterization of the logit pseudo-true parameter under general misspecification. With respect to sampling theory, sparseness implies that the first and last terms in an extended Hoeffding-type variance decomposition of the score of the logit pseudo composite log-likelihood are of equal order. In contrast, under dense network asymptotics, the last term is asymptotically negligible. Asymptotic normality of the logistic regression coefficients is shown using a martingale central limit theorem (CLT) for triangular arrays. Unlike in the dense case, the normality result derived here also holds under degeneracy of the network graphon. Relatedly, when there “happens to be” no dyadic dependence in the dataset in hand, it specializes to recently derived results on the behavior of logistic regression with rare events and iid data. Simulation results suggest that sparse network asymptotics better approximate the finite network distribution of the logit estimator.
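The sparse setting in condition (ii) can be illustrated with a small simulation. The sketch below is not the paper's simulation design; it is a minimal example under stated assumptions: consumer and product attributes are uniform draws, the single regressor is the hypothetical distance x_ij = |w_i - z_j|, the logit is correctly specified, purchases are independent given the attributes, and the intercept is set to log(mean_degree / M) so that purchases per consumer stay roughly constant as M grows. All function and parameter names are illustrative.

```python
import math
import random


def logistic(v):
    """Logistic CDF, the link function of the logit model."""
    return 1.0 / (1.0 + math.exp(-v))


def simulate_sparse_purchases(n_consumers, n_products, mean_degree, beta, rng):
    """Simulate dyads with P(y_ij = 1 | x_ij) = logistic(alpha + beta * x_ij).

    Assumed design (not from the paper): alpha = log(mean_degree / n_products)
    falls as the number of products grows, so purchases per consumer stay
    bounded while the share of realized purchases shrinks.
    """
    alpha = math.log(mean_degree / n_products)
    w = [rng.random() for _ in range(n_consumers)]  # consumer attributes
    z = [rng.random() for _ in range(n_products)]   # product attributes
    x, y = [], []
    for w_i in w:
        for z_j in z:
            x_ij = abs(w_i - z_j)  # known function of both attributes
            x.append(x_ij)
            y.append(1 if rng.random() < logistic(alpha + beta * x_ij) else 0)
    return alpha, x, y


def fit_logit(x, y, max_iter=50, tol=1e-10):
    """Fit an intercept-plus-slope logit by Newton-Raphson on the pooled dyads."""
    y_bar = sum(y) / len(y)
    # Start the intercept at the log-odds of the sample purchase rate.
    a, b = math.log(y_bar / (1.0 - y_bar)), 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x_ij, y_ij in zip(x, y):
            p = logistic(a + b * x_ij)
            resid, weight = y_ij - p, p * (1.0 - p)
            g0 += resid                    # score, intercept component
            g1 += resid * x_ij             # score, slope component
            h00 += weight                  # entries of the 2x2 information matrix
            h01 += weight * x_ij
            h11 += weight * x_ij * x_ij
        det = h00 * h11 - h01 * h01
        step_a = (h11 * g0 - h01 * g1) / det
        step_b = (h00 * g1 - h01 * g0) / det
        a, b = a + step_a, b + step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b


if __name__ == "__main__":
    rng = random.Random(0)
    n, m = 200, 200
    alpha, x, y = simulate_sparse_purchases(n, m, mean_degree=3.0, beta=-1.0, rng=rng)
    a_hat, b_hat = fit_logit(x, y)
    print("purchases per consumer:", sum(y) / n)
    print("purchase density:", sum(y) / (n * m))
    print("true (alpha, beta):", (alpha, -1.0))
    print("logit fit (alpha, beta):", (a_hat, b_hat))
```

Rerunning with larger n and m at the same mean_degree should leave purchases per consumer roughly unchanged while the purchase density falls toward zero, which is the sense in which the limiting network is sparse. The sketch does not add the dyadic dependence or the misspecification that the paper allows for, and it does not compute standard errors.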

Suggested Citation

  • Bryan S. Graham, 2020. "Sparse Network Asymptotics for Logistic Regression under Possible Misspecification," NBER Working Papers 27962, National Bureau of Economic Research, Inc.
  • Handle: RePEc:nbr:nberwo:27962
    Note: DEV IO ITI LS TWP

    Download full text from publisher

    File URL: http://www.nber.org/papers/w27962.pdf
    Download Restriction: no

    Citations

    Cited by:

    1. Harold D Chiang & Yukitoshi Matsushita & Taisuke Otsu, 2021. "Multiway empirical likelihood," Papers 2108.04852, arXiv.org, revised Aug 2024.
    2. Stéphane Bonhomme & Koen Jochmans & Martin Weidner, 2024. "A Neyman-Orthogonalization Approach to the Incidental Parameter Problem," Papers 2412.10304, arXiv.org.
    3. Yong Cai, 2022. "Linear Regression with Centrality Measures," Papers 2210.10024, arXiv.org.
    4. Konrad Menzel, 2021. "Bootstrap With Cluster‐Dependence in Two or More Dimensions," Econometrica, Econometric Society, vol. 89(5), pages 2143-2188, September.
    5. Harold D Chiang & Yukitoshi Matsushita & Taisuke Otsu, 2021. "Multiway empirical likelihood," STICERD - Econometrics Paper Series 617, Suntory and Toyota International Centres for Economics and Related Disciplines, LSE.

    More about this item

    JEL classification:

    • C01 - Mathematical and Quantitative Methods > General > Econometrics
    • C31 - Mathematical and Quantitative Methods > Multiple or Simultaneous Equation Models; Multiple Variables > Cross-Sectional Models; Spatial Models; Treatment Effect Models; Quantile Regressions; Social Interaction Models
    • C33 - Mathematical and Quantitative Methods > Multiple or Simultaneous Equation Models; Multiple Variables > Models with Panel Data; Spatio-temporal Models
    • C55 - Mathematical and Quantitative Methods > Econometric Modeling > Large Data Sets: Modeling and Analysis

