
Doubly Weighted Estimation Approach for Linear Regression Analysis with Two-stage Cluster Samples

Author

Listed:
  • Brajendra C. Sutradhar

    (Memorial University)

Abstract

In a two-stage cluster sampling (TSCS) setup, a sample of clusters is chosen at the first stage from a large number of clusters belonging to a finite population (FP), and at the second stage a random sample of individuals is chosen from each selected cluster. In this sampling setup, it is of interest to collect responses along with certain multi-dimensional fixed covariates from all individuals selected in the second-stage cluster, and to examine the effects of such covariates on the responses. In some studies, the fixed covariates from the so-called sampling frame, consisting of all first-stage clustered individuals, may be available. Because the responses in a given cluster share a common random cluster effect, they are correlated. Thus, if the first-stage cluster-based data were all available, one could estimate the regression parameters/effects by using the standard infinite-population-based generalized least squares (GLS) approach, which produces efficient estimates compared to the simpler ordinary least squares (OLS) estimates. In the present TSCS setup, however, the first-stage clustered data are not available, and hence the estimation has to be done using second-stage clusters, where the responses can no longer be assumed to arise from the infinite population; rather, there is a sampling effect to consider in order to develop appropriate estimating equations for the regression parameters. Nevertheless, the existing studies spanning four decades, including a pioneering work by Prasad and Rao (J. Am. Stat. Assoc., 85, 163–171, 1990), used the same GLS estimation by treating the second-stage clusters as the first-stage clusters, following a super-population model based correlation structure.
In this paper, we revisit this important inference issue and find that, because the existing second-stage cluster-based GLS approach is constructed ignoring the sampling effect (of the first-stage clusters), this approach, far from delivering an efficiency gain, produces biased and hence inconsistent estimates for the regression parameters and other related subsequent effects. As a remedy, on top of the sampling weights we introduce an inverse correlation weight for the second-stage clustered elements and provide a doubly weighted GLS (DWGLS) estimation approach, which produces unbiased and consistent estimates of the regression parameters. The correlation parameters are also consistently estimated. Without any loss of generality, a numerical illustration under a simpler specialized linear cluster model with no covariates, using a hypothetical two-stage cluster sample, is provided to show the estimation biases caused by sampling mis-specification. For the general regression case, the unbiasedness and consistency properties of the proposed estimator of the regression parameter, which is of main interest, are studied analytically in detail. The asymptotic normality of the regression estimator is also studied for the construction of confidence intervals when needed.
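
As background for the OLS versus GLS comparison mentioned in the abstract, the following is a minimal sketch of the standard infinite-population setting only, in which responses within a cluster share one random cluster effect. It is not the DWGLS estimator of the paper: it uses no sampling weights and does not model the second-stage sampling effect. The function names, the simulated data, and the assumption that the two variance components are known are all hypothetical choices made for illustration.

```python
import numpy as np


def exchangeable_cov(n, sigma_gamma2, sigma_eps2):
    """Covariance of n responses that share one random cluster effect."""
    return sigma_eps2 * np.eye(n) + sigma_gamma2 * np.ones((n, n))


def ols(X_list, y_list):
    """Ordinary least squares on the pooled data, ignoring clustering."""
    X = np.vstack(X_list)
    y = np.concatenate(y_list)
    return np.linalg.solve(X.T @ X, X.T @ y)


def gls(X_list, y_list, sigma_gamma2, sigma_eps2):
    """Generalized least squares with known (assumed) variance components."""
    p = X_list[0].shape[1]
    A = np.zeros((p, p))
    b = np.zeros(p)
    for X, y in zip(X_list, y_list):
        V = exchangeable_cov(len(y), sigma_gamma2, sigma_eps2)
        Vinv_X = np.linalg.solve(V, X)  # V^{-1} X, with V symmetric
        A += X.T @ Vinv_X               # accumulates X' V^{-1} X
        b += Vinv_X.T @ y               # accumulates X' V^{-1} y
    return np.linalg.solve(A, b)


def simulate(K, n, beta, sigma_gamma2, sigma_eps2, rng):
    """Hypothetical data: K clusters of size n with a random intercept."""
    X_list, y_list = [], []
    for _ in range(K):
        # Covariates are drawn once and then treated as fixed.
        X = np.column_stack([np.ones(n), rng.normal(size=n)])
        gamma = rng.normal(scale=np.sqrt(sigma_gamma2))      # cluster effect
        eps = rng.normal(scale=np.sqrt(sigma_eps2), size=n)  # individual error
        X_list.append(X)
        y_list.append(X @ beta + gamma + eps)
    return X_list, y_list


rng = np.random.default_rng(0)
beta_true = np.array([1.0, 2.0])
X_list, y_list = simulate(200, 5, beta_true, 1.0, 1.0, rng)

beta_ols = ols(X_list, y_list)
beta_gls = gls(X_list, y_list, 1.0, 1.0)
print("OLS estimate:", beta_ols)
print("GLS estimate:", beta_gls)
```

When the cluster-effect variance is set to zero, the GLS weights reduce to a multiple of the identity and the two estimators coincide. In practice the variance components would have to be estimated, and, as the abstract argues, a second-stage sample additionally calls for the sampling effect to be taken into account.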

Suggested Citation

  • Brajendra C. Sutradhar, 2024. "Doubly Weighted Estimation Approach for Linear Regression Analysis with Two-stage Cluster Samples," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 86(1), pages 55-90, May.
  • Handle: RePEc:spr:sankhb:v:86:y:2024:i:1:d:10.1007_s13571-023-00321-9
    DOI: 10.1007/s13571-023-00321-9

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s13571-023-00321-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

