
Using prior information in privacy-protecting survey designs for categorical sensitive variables

Author

Listed:
  • Heiko Groenitz

Abstract

To gather data on sensitive characteristics, such as annual income, tax evasion, insurance fraud or students’ cheating behavior, direct questioning is problematic, because it often results in answer refusal or untruthful responses. For this reason, several randomized response (RR) and nonrandomized response (NRR) survey designs, which increase cooperation by protecting the respondents’ privacy, have been proposed in the literature. In the first part of this paper, we present a Bayesian extension of a recently published, innovative NRR method for multichotomous sensitive variables. With this extension, the investigator is able to incorporate prior information on the parameter, e.g., based on a previous study, into the estimation and to improve the estimation precision. In particular, we derive different point and interval estimates by the EM algorithm and data augmentation. The performance of the considered estimators is evaluated in a simulation study. In the second part of this article, we show that for any RR or NRR model addressing the estimation of the distribution of a categorical sensitive characteristic, the design matrices of the model play the central role for the Bayes estimation whereas the concrete answer scheme is irrelevant. This observation enables us to widely generalize the calculations from the first part and to establish a common approach for Bayes inference in RR and NRR designs for categorical sensitive variables. This unified approach covers even multi-stage models and models that require more than one sample.

Copyright Springer-Verlag Berlin Heidelberg 2015
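The role of the design matrix in the Bayes estimation can be illustrated with a minimal sketch. It is not taken from the paper. It assumes a single sample, a known design matrix whose entry (i, j) is the probability of giving answer i when the true sensitive category is j, and a Dirichlet prior on the category distribution (the choice of a Dirichlet prior is an assumption made here for illustration). The function name `em_posterior_mode` and all parameter names are hypothetical. Under these assumptions, an EM iteration for the posterior mode alternates between imputing expected counts of the true categories and updating the distribution:

```python
from typing import List, Sequence


def em_posterior_mode(
    counts: Sequence[float],
    design: Sequence[Sequence[float]],
    alpha: Sequence[float],
    iterations: int = 500,
) -> List[float]:
    """Posterior mode of pi under an assumed Dirichlet(alpha) prior, via EM.

    counts[i]    : number of respondents who gave answer i
    design[i][j] : assumed P(answer i | true category j); columns sum to 1
    alpha[j]     : Dirichlet prior parameters, each >= 1 so that the
                   M-step update stays inside the probability simplex
    """
    k = len(alpha)
    m = len(counts)
    pi = [1.0 / k] * k  # start from the uniform distribution
    denom = sum(counts) + sum(alpha) - k
    for _ in range(iterations):
        # Marginal answer probabilities: lam[i] = sum_j design[i][j] * pi[j].
        lam = [sum(design[i][j] * pi[j] for j in range(k)) for i in range(m)]
        # E-step: expected number of respondents in each true category,
        # given the observed answers and the current pi.
        z = [
            sum(
                counts[i] * design[i][j] * pi[j] / lam[i]
                for i in range(m)
                if counts[i] > 0
            )
            for j in range(k)
        ]
        # M-step: mode of the Dirichlet(alpha + z) complete-data posterior.
        pi = [(z[j] + alpha[j] - 1.0) / denom for j in range(k)]
    return pi


if __name__ == "__main__":
    # Hypothetical two-category design and made-up answer counts.
    design = [[0.8, 0.2], [0.2, 0.8]]
    print(em_posterior_mode([38, 62], design, alpha=[1.0, 1.0]))
```

Only the answer counts and the design matrix enter the update, so the concrete answer scheme that produced the matrix plays no role in this sketch. With a flat prior (all `alpha[j]` equal to 1) the iteration reduces to EM for the maximum likelihood estimate.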

Suggested Citation

  • Heiko Groenitz, 2015. "Using prior information in privacy-protecting survey designs for categorical sensitive variables," Statistical Papers, Springer, vol. 56(1), pages 167-189, February.
  • Handle: RePEc:spr:stpapr:v:56:y:2015:i:1:p:167-189
    DOI: 10.1007/s00362-013-0573-3

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1007/s00362-013-0573-3
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. Arijit Chaudhuri & Mausumi Bose & Kajal Dihidar, 2011. "Estimation of a sensitive proportion by Warner’s randomized response data through inverse sampling," Statistical Papers, Springer, vol. 52(2), pages 343-354, May.
    2. Lucio Barabesi & Sara Franceschi & Marzia Marcheselli, 2012. "A randomized response procedure for multiple-sensitive questions," Statistical Papers, Springer, vol. 53(3), pages 703-718, August.
    3. Tan, Ming T. & Tian, Guo-Liang & Tang, Man-Lai, 2009. "Sample Surveys With Sensitive Questions: A Nonrandomized Response Approach," The American Statistician, American Statistical Association, vol. 63(1), pages 9-16.
    4. Migon, Helio S. & Tachibana, Vilma M., 1997. "Bayesian approximations in randomized response model," Computational Statistics & Data Analysis, Elsevier, vol. 24(4), pages 401-409, June.
    5. Lucio Barabesi & Marzia Marcheselli, 2010. "Bayesian estimation of proportion and sensitivity level in randomized response procedures," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 72(1), pages 75-88, July.
    6. van den Hout, Ardo & van der Heijden, Peter G.M. & Gilchrist, Robert, 2007. "The logistic regression model with response variables subject to randomized response," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 6060-6069, August.
    7. Jun-Wu Yu & Guo-Liang Tian & Man-Lai Tang, 2008. "Two new models for survey sampling with sensitive characteristic: design and analysis," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 67(3), pages 251-263, April.
    8. Gerty J. L. M. Lensvelt‐Mulders & Peter G. M. Van Der Heijden & Olav Laudy & Ger Van Gils, 2006. "A validation of a computer‐assisted randomized response survey to estimate the prevalence of fraud in social security," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 169(2), pages 305-318, March.

    Citations

    Cited by:

    1. Truong-Nhat Le & Shen-Ming Lee & Phuoc-Loc Tran & Chin-Shang Li, 2023. "Randomized Response Techniques: A Systematic Review from the Pioneering Work of Warner (1965) to the Present," Mathematics, MDPI, vol. 11(7), pages 1-26, April.
    2. Shen‐Ming Lee & Truong‐Nhat Le & Phuoc‐Loc Tran & Chin‐Shang Li, 2022. "Investigating the association of a sensitive attribute with a random variable using the Christofides generalised randomised response design and Bayesian methods," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1471-1502, November.
    3. Shen-Ming Lee & Phuoc-Loc Tran & Truong-Nhat Le & Chin-Shang Li, 2023. "Prediction of a Sensitive Feature under Indirect Questioning via Warner’s Randomized Response Technique and Latent Class Model," Mathematics, MDPI, vol. 11(2), pages 1-21, January.
    4. Balgobin Nandram & Yuan Yu, 2019. "Bayesian Analysis of Sparse Counts Obtained From the Unrelated Question Design," International Journal of Statistics and Probability, Canadian Center of Science and Education, vol. 8(5), pages 66-84, September.
    5. Groenitz, Heiko, 2016. "A covariate nonrandomized response model for multicategorical sensitive variables," Computational Statistics & Data Analysis, Elsevier, vol. 103(C), pages 124-138.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Truong-Nhat Le & Shen-Ming Lee & Phuoc-Loc Tran & Chin-Shang Li, 2023. "Randomized Response Techniques: A Systematic Review from the Pioneering Work of Warner (1965) to the Present," Mathematics, MDPI, vol. 11(7), pages 1-26, April.
    2. Groenitz, Heiko, 2016. "A covariate nonrandomized response model for multicategorical sensitive variables," Computational Statistics & Data Analysis, Elsevier, vol. 103(C), pages 124-138.
    3. Andreas Lagerås & Mathias Lindholm, 2020. "How to ask sensitive multiple‐choice questions," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 47(2), pages 397-424, June.
    4. Lucio Barabesi & Giancarlo Diana & Pier Perri, 2013. "Design-based distribution function estimation for stigmatized populations," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 76(7), pages 919-935, October.
    5. Sabrina Giordano & Pier Perri, 2012. "Efficiency comparison of unrelated question models based on same privacy protection degree," Statistical Papers, Springer, vol. 53(4), pages 987-999, November.
    6. Guo-Liang Tian, 2014. "A new non-randomized response model: The parallel model," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 68(4), pages 293-323, November.
    7. Lucio Barabesi & Giancarlo Diana & Pier Perri, 2015. "Gini index estimation in randomized response surveys," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 99(1), pages 45-62, January.
    8. Qiu, Shi-Fang & Zou, G.Y. & Tang, Man-Lai, 2014. "Sample size determination for estimating prevalence and a difference between two prevalences of sensitive attributes using the non-randomized triangular design," Computational Statistics & Data Analysis, Elsevier, vol. 77(C), pages 157-169.
    9. Raghunath Arnab & Dahud Kehinde Shangodoyin & Antonio Arcos, 2019. "Nonrandomized Response Model For Complex Survey Designs," Statistics in Transition New Series, Polish Statistical Association, vol. 20(1), pages 67-86, March.
    10. Liu, Yin & Tian, Guo-Liang, 2013. "A variant of the parallel model for sample surveys with sensitive characteristics," Computational Statistics & Data Analysis, Elsevier, vol. 67(C), pages 115-135.
    11. Andreas Quatember, 2019. "A discussion of the two different aspects of privacy protection in indirect questioning designs," Quality & Quantity: International Journal of Methodology, Springer, vol. 53(1), pages 269-282, January.
    12. Heiko Groenitz, 2014. "A new privacy-protecting survey design for multichotomous sensitive variables," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 77(2), pages 211-224, February.
    13. Carlos Barros, 2012. "Sustainable Tourism in Inhambane-Mozambique," CEsA Working Papers 105, CEsA - Centre for African and Development Studies.
    14. Burgstaller, Lilith & Feld, Lars P. & Pfeil, Katharina, 2022. "Working in the shadow: Survey techniques for measuring and explaining undeclared work," Journal of Economic Behavior & Organization, Elsevier, vol. 200(C), pages 661-671.
    15. Elisabeth Coutts & Ben Jann, 2011. "Sensitive Questions in Online Surveys: Experimental Results for the Randomized Response Technique (RRT) and the Unmatched Count Technique (UCT)," Sociological Methods & Research, vol. 40(1), pages 169-193, February.
    16. Kazuo Yamaguchi, 2016. "Cross-sectional and Panel Data Analyses of an Incompletely Observed Variable Derived From the Nonrandomized Method for Surveying Sensitive Questions," Sociological Methods & Research, vol. 45(1), pages 41-68, February.
    17. Pavel Dietz & Anne Quermann & Mireille Nicoline Maria van Poppel & Heiko Striegel & Hannes Schröter & Rolf Ulrich & Perikles Simon, 2018. "Physical and cognitive doping in university students using the unrelated question model (UQM): Assessing the influence of the probability of receiving the sensitive question on prevalence estimation," PLOS ONE, Public Library of Science, vol. 13(5), pages 1-12, May.
    18. Hua Xin & Jianping Zhu & Tzong-Ru Tsai & Chieh-Yi Hung, 2021. "Hierarchical Bayesian Modeling and Randomized Response Method for Inferring the Sensitive-Nature Proportion," Mathematics, MDPI, vol. 9(19), pages 1-12, October.
    19. van den Hout, Ardo & Kooiman, Peter, 2006. "Estimating the linear regression model with categorical covariates subject to randomized response," Computational Statistics & Data Analysis, Elsevier, vol. 50(11), pages 3311-3323, July.
    20. Horng-Jinh Chang & Mei-Pei Kuo, 2012. "Estimation of population proportion in randomized response sampling using weighted confidence interval construction," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 75(5), pages 655-672, July.
