Printed from https://ideas.repec.org/a/vrs/offsta/v35y2019i1p137-165n7.html

An Evolutionary Schema for Using “it-is-what-it-is” Data in Official Statistics

Author

Listed:
  • Lothian Jack

    (360 Hinton Ave S, Ottawa ON K1Y 1A5, Canada.)

  • Holmberg Anders

    (Statistics Norway, Division for Methodology, Akersveien 26 Oslo, Norway.)

  • Seyb Allyson

    (Stats NZ, Statistical Methods, Private Bag 4741, Christchurch 8011, New Zealand.)

Abstract

The linking of disparate data sets across time, space and sources is probably the foremost current issue facing Central Statistical Agencies (CSAs). If one reviews the current literature looking for the prevalent challenges facing CSAs, three issues stand out: 1) using administrative data effectively; 2) big data and what it means for CSAs; and 3) integrating disparate data sets (such as health, education and wealth) to provide measurable facts that can guide policy makers. CSAs are being pushed to confront the same kinds of challenges faced by Google, Facebook, and Yahoo, which are using graphical/semantic web models for organizing, searching and analysing data. Additionally, time and space (geography) are becoming more important dimensions (domains) for CSAs as they start to explore new data sources and ways to integrate them to study relationships. Central agency methodologists are being pushed to include these new perspectives in their standard theories, practices and policies. Like most methodologists, the authors see surveys and the publication of their results as a process where estimation is the key tool to achieve the final goal of an accurate statistical output. Randomness and sampling exist to support this goal, and early on it was clear to us that the incoming “it-is-what-it-is” data sources were not randomly selected. These sources were obviously biased and thus would produce biased estimates. So, we set out to design a strategy to deal with this issue. This article presents a schema for integrating and linking traditional and non-traditional datasets. Like all survey methodologies, this schema addresses the fundamental issues of representativeness, estimation and total survey error measurement.

Suggested Citation

  • Lothian Jack & Holmberg Anders & Seyb Allyson, 2019. "An Evolutionary Schema for Using “it-is-what-it-is” Data in Official Statistics," Journal of Official Statistics, Sciendo, vol. 35(1), pages 137-165, March.
  • Handle: RePEc:vrs:offsta:v:35:y:2019:i:1:p:137-165:n:7
    DOI: 10.2478/jos-2019-0007

    Download full text from publisher

    File URL: https://doi.org/10.2478/jos-2019-0007
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Maciej Beręsewicz & Greta Białkowska & Krzysztof Marcinkowski & Magdalena Maślak & Piotr Opiela & Robert Pater & Katarzyna Zadroga, 2019. "Enhancing the Demand for Labour survey by including skills from online job advertisements using model-assisted calibration," Papers 1908.06731, arXiv.org.
    2. Serena Pattaro & Nick Bailey & Chris Dibben, 2020. "Using Linked Longitudinal Administrative Data to Identify Social Disadvantage," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 147(3), pages 865-895, February.
    3. Jae‐Kwang Kim & Siu‐Ming Tam, 2021. "Data Integration by Combining Big Data and Survey Sample Data for Finite Population Inference," International Statistical Review, International Statistical Institute, vol. 89(2), pages 382-401, August.
    4. Debashis Ghosh & Michael S. Sabel, 2022. "A Weighted Sample Framework to Incorporate External Calculators for Risk Modeling," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 14(3), pages 363-379, December.
    5. Peter G. M. van der Heijden & Maarten Cruyff & Paul A. Smith & Christine Bycroft & Patrick Graham & Nathaniel Matheson‐Dunning, 2022. "Multiple system estimation using covariates having missing values and measurement error: Estimating the size of the Māori population in New Zealand," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(1), pages 156-177, January.
    6. Domingo Morales & María del Mar Rueda & Dolores Esteban, 2018. "Model-Assisted Estimation of Small Area Poverty Measures: An Application within the Valencia Region in Spain," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 138(3), pages 873-900, August.
    7. Bakker Bart F.M. & Heijden Peter G.M. van der & Scholtus Sander, 2015. "Preface," Journal of Official Statistics, Sciendo, vol. 31(3), pages 349-355, September.
    8. Fulvia Cerroni & Grazia Di Bella & Lorena Galiè, 2014. "Evaluating administrative data quality as input of the statistical production process," Rivista di statistica ufficiale, ISTAT - Italian National Institute of Statistics - (Rome, ITALY), vol. 16(1-2), pages 117-146.
    9. Ieva Burakauskaitė & Andrius Čiginas, 2023. "An Approach to Integrating a Non-Probability Sample in the Population Census," Mathematics, MDPI, vol. 11(8), pages 1-14, April.
    10. Jonas F. Schenkel & Li‐Chun Zhang, 2022. "Adjusting misclassification using a second classifier with an external validation sample," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(4), pages 1882-1902, October.
    11. Fabrizio Antolini & Laura Grassini, 2020. "Methodological problems in the economic measurement of tourism: the need for new sources of information," Quality & Quantity: International Journal of Methodology, Springer, vol. 54(5), pages 1769-1780, December.
    12. M. Rueda & I. Sánchez-Borrego & A. Arcos & S. Martínez, 2010. "Model-calibration estimation of the distribution function using nonparametric regression," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 71(1), pages 33-44, January.
    13. Elżbieta Gołata, 2016. "Shift In Methodology And Population Census Quality," Statistics in Transition New Series, Polish Statistical Association, vol. 17(4), pages 631-658, December.
    14. Denis Devaud & Yves Tillé, 2019. "Deville and Särndal’s calibration: revisiting a 25-years-old successful optimization problem," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(4), pages 1033-1065, December.
    15. Stephanie Coffey, PhD. & Jaya Damineni & John Eltinge, PhD. & Anup Mathur, PhD. & Kayla Varela & Allison Zotti, 2023. "Some Open Questions on Multiple-Source Extensions of Adaptive-Survey Design Concepts and Methods," Working Papers 23-03, Center for Economic Studies, U.S. Census Bureau.
    16. Li-Chun Zhang & Ib Thomsen & Øyvin Kleven, 2013. "On the Use of Auxiliary and Paradata for Dealing With Non-sampling Errors in Household Surveys," International Statistical Review, International Statistical Institute, vol. 81(2), pages 270-288, August.
    17. A. Arcos & M. Rueda & M. Martínez-Miranda, 2005. "Using multiparametric auxiliary information at the estimation stage," Statistical Papers, Springer, vol. 46(3), pages 339-358, July.
    18. Barranco-Chamorro, I. & Jiménez-Gamero, M.D. & Moreno-Rebollo, J.L. & Muñoz-Pichardo, J.M., 2012. "Case-deletion type diagnostics for calibration estimators in survey sampling," Computational Statistics & Data Analysis, Elsevier, vol. 56(7), pages 2219-2236.
    19. Ton de Waal & Arnout van Delden & Sander Scholtus, 2020. "Multi‐source Statistics: Basic Situations and Methods," International Statistical Review, International Statistical Institute, vol. 88(1), pages 203-228, April.
    20. Kim, Jong-Min & Sungur, Engin A. & Heo, Tae-Young, 2007. "Calibration approach estimators in stratified sampling," Statistics & Probability Letters, Elsevier, vol. 77(1), pages 99-103, January.
