IDEAS home Printed from https://ideas.repec.org/a/eee/matcom/v186y2021icp19-28.html

Evaluating Machine Learning methods for estimation in online surveys with superpopulation modeling

Author

Listed:
  • Ferri-García, Ramón
  • Castro-Martín, Luis
  • Rueda, María del Mar

Abstract

Online surveys, despite their advantages in cost and effort, are particularly prone to selection bias because of differences between the target population and the population that can potentially be covered (the online population). Estimates from online samples are therefore unreliable unless further adjustments are applied. Several techniques addressing this issue have emerged in recent years; among them, superpopulation modeling can be useful in Big Data contexts where censuses are accessible. This technique uses the sample to train a model that captures the behavior of the target variable to be estimated, then applies the model to the nonsampled individuals to obtain population-level estimates. The modeling step has usually been done with linear regression or LASSO models, but machine learning (ML) algorithms have been pointed out as promising alternatives. In this study we examine the use of these algorithms in the online survey context, in order to evaluate and compare their performance and their suitability for the problem. A simulation study shows that ML algorithms can effectively reduce volunteering bias to a greater extent than traditional methods in several scenarios.
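
The model-based estimation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the simulated population, the volunteering mechanism, and the function name `model_based_mean` are assumptions made for illustration, and ordinary least squares stands in for whichever predictive model (linear regression, LASSO, or an ML algorithm) is being evaluated.

```python
import numpy as np


def model_based_mean(x_sample, y_sample, x_nonsample):
    """Hypothetical helper: estimate the population mean of y.

    Fits a linear model on the sample (ordinary least squares), predicts y
    for the nonsampled units from their covariates, and combines observed
    and predicted values into a population-level mean.
    """
    # Design matrices with an intercept column.
    a_sample = np.column_stack([np.ones(len(x_sample)), x_sample])
    a_nonsample = np.column_stack([np.ones(len(x_nonsample)), x_nonsample])

    # Train the model on the sampled units only.
    beta, *_ = np.linalg.lstsq(a_sample, y_sample, rcond=None)

    # Apply the model to the nonsampled units.
    y_pred = a_nonsample @ beta

    # Observed values for the sample, predictions for everyone else.
    total = y_sample.sum() + y_pred.sum()
    return total / (len(y_sample) + len(y_pred))


# Simulated population (assumption): the covariate x is known for every
# unit, as it would be when a census is accessible.
rng = np.random.default_rng(0)
N = 10_000
x = rng.normal(size=N)
y = 2.0 + 3.0 * x + rng.normal(size=N)

# Volunteering mechanism (assumption): units with larger x are more
# likely to take part in the online survey, which biases the raw sample.
p_volunteer = 1.0 / (1.0 + np.exp(-(x - 1.5)))
in_sample = rng.random(N) < p_volunteer

true_mean = y.mean()
naive_mean = y[in_sample].mean()
estimate = model_based_mean(x[in_sample], y[in_sample], x[~in_sample])

print(f"sample size:          {in_sample.sum()}")
print(f"true population mean: {true_mean:.3f}")
print(f"naive sample mean:    {naive_mean:.3f}")
print(f"model-based estimate: {estimate:.3f}")
```

In this sketch the model is correctly specified and participation depends only on the observed covariate, so the model-based estimate should land much closer to the true mean than the naive sample mean. Swapping the least-squares fit for another predictor changes only the training and prediction steps inside the helper.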

Suggested Citation

  • Ferri-García, Ramón & Castro-Martín, Luis & Rueda, María del Mar, 2021. "Evaluating Machine Learning methods for estimation in online surveys with superpopulation modeling," Mathematics and Computers in Simulation (MATCOM), Elsevier, vol. 186(C), pages 19-28.
  • Handle: RePEc:eee:matcom:v:186:y:2021:i:c:p:19-28
    DOI: 10.1016/j.matcom.2020.03.005

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0378475420300793
    Download Restriction: Full text for ScienceDirect subscribers only



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Luis Castro-Martín & Maria del Mar Rueda & Ramón Ferri-García, 2020. "Inference from Non-Probability Surveys with Statistical Matching and Propensity Score Adjustment Using Modern Prediction Techniques," Mathematics, MDPI, vol. 8(6), pages 1-19, June.
    2. Luciano Rogério Braatz de Andrade & Massaine Bandeira e Sousa & Eder Jorge Oliveira & Marcos Deon Vilela de Resende & Camila Ferreira Azevedo, 2019. "Cassava yield traits predicted by genomic selection methods," PLOS ONE, Public Library of Science, vol. 14(11), pages 1-22, November.
    3. Vinny Davies & Richard Reeve & William T. Harvey & Francois F. Maree & Dirk Husmeier, 2017. "A sparse hierarchical Bayesian model for detecting relevant antigenic sites in virus evolution," Computational Statistics, Springer, vol. 32(3), pages 803-843, September.
    4. JANSSENS, Jochen & DE CORTE, Annelies & SÖRENSEN, Kenneth, 2016. "Water distribution network design optimisation with respect to reliability," Working Papers 2016007, University of Antwerp, Faculty of Business and Economics.
    5. Li, Chunyu & Lou, Chenxin & Luo, Dan & Xing, Kai, 2021. "Chinese corporate distress prediction using LASSO: The role of earnings management," International Review of Financial Analysis, Elsevier, vol. 76(C).
    6. Raymond Hernandez & Elizabeth A. Pyatak & Cheryl L. P. Vigen & Haomiao Jin & Stefan Schneider & Donna Spruijt-Metz & Shawn C. Roll, 2021. "Understanding Worker Well-Being Relative to High-Workload and Recovery Activities across a Whole Day: Pilot Testing an Ecological Momentary Assessment Technique," IJERPH, MDPI, vol. 18(19), pages 1-17, October.
    7. Elisabeth Beckmann & Lukas Olbrich & Joseph Sakshaug, 2024. "Multivariate assessment of interviewer-related errors in a cross-national economic survey (Lukas Olbrich, Elisabeth Beckmann, Joseph W. Sakshaug)," Working Papers 253, Oesterreichische Nationalbank (Austrian Central Bank).
    8. Armagan, Artin & Dunson, David, 2011. "Sparse variational analysis of linear mixed models for large data sets," Statistics & Probability Letters, Elsevier, vol. 81(8), pages 1056-1062, August.
    9. Valentina Krenz & Arjen Alink & Tobias Sommer & Benno Roozendaal & Lars Schwabe, 2023. "Time-dependent memory transformation in hippocampus and neocortex is semantic in nature," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    10. Morán-Ordóñez, Alejandra & Ameztegui, Aitor & De Cáceres, Miquel & de-Miguel, Sergio & Lefèvre, François & Brotons, Lluís & Coll, Lluís, 2020. "Future trade-offs and synergies among ecosystem services in Mediterranean forests under global change scenarios," Ecosystem Services, Elsevier, vol. 45(C).
    11. Damian M. Herz & Manuel Bange & Gabriel Gonzalez-Escamilla & Miriam Auer & Keyoumars Ashkan & Petra Fischer & Huiling Tan & Rafal Bogacz & Muthuraman Muthuraman & Sergiu Groppa & Peter Brown, 2022. "Dynamic control of decision and movement speed in the human basal ganglia," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    12. Ramón Ferri-García & María del Mar Rueda, 2022. "Variable selection in Propensity Score Adjustment to mitigate selection bias in online surveys," Statistical Papers, Springer, vol. 63(6), pages 1829-1881, December.
    13. Dongyan Liu & Chongran Zhou & John K. Keesing & Oscar Serrano & Axel Werner & Yin Fang & Yingjun Chen & Pere Masque & Janine Kinloch & Aleksey Sadekov & Yan Du, 2022. "Wildfires enhance phytoplankton production in tropical oceans," Nature Communications, Nature, vol. 13(1), pages 1-9, December.
    14. Zhaogeng Yang & Yanhui Li & Peijin Hu & Jun Ma & Yi Song, 2020. "Prevalence of Anemia and its Associated Factors among Chinese 9-, 12-, and 14-Year-Old Children: Results from 2014 Chinese National Survey on Students Constitution and Health," IJERPH, MDPI, vol. 17(5), pages 1-10, February.
    15. Marco Lopez-Cruz & Fernando M. Aguate & Jacob D. Washburn & Natalia Leon & Shawn M. Kaeppler & Dayane Cristina Lima & Ruijuan Tan & Addie Thompson & Laurence Willard Bretonne & Gustavo los Campos, 2023. "Leveraging data from the Genomes-to-Fields Initiative to investigate genotype-by-environment interactions in maize in North America," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    16. Baumann, Elias & Kern, Jana & Lessmann, Stefan, 2019. "Usage Continuance in Software-as-a-Service," IRTG 1792 Discussion Papers 2019-005, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    17. repec:cup:judgdm:v:16:y:2021:i:1:p:201-237 is not listed on IDEAS
    18. C. Gabriel Hidalgo Pizango & Eurídice N. Honorio Coronado & Jhon del Águila-Pasquel & Gerardo Flores Llampazo & Johan de Jong & César J. Córdova Oroche & José M. Reyna Huaymacari & Steve J. Carver & D, 2022. "Sustainable palm fruit harvesting as a pathway to conserve Amazon peatland forests," Nature Sustainability, Nature, vol. 5(6), pages 479-487, June.
    19. Martin Feldkircher & Florian Huber & Gary Koop & Michael Pfarrhofer, 2022. "APPROXIMATE BAYESIAN INFERENCE AND FORECASTING IN HUGE‐DIMENSIONAL MULTICOUNTRY VARs," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 63(4), pages 1625-1658, November.
    20. Eliaz, Kfir & Spiegler, Ran, 2022. "On incentive-compatible estimators," Games and Economic Behavior, Elsevier, vol. 132(C), pages 204-220.
    21. Lin-Lin Wang & Zachary Y. Huang & Wen-Fei Dai & Yong-Ping Yang & Yuan-Wen Duan, 2024. "Mixed effects of honey bees on pollination function in the Tibetan alpine grasslands," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
