Improving the Output Quality of Official Statistics Based on Machine Learning Algorithms

Author

Meertens Q.A.
(Statistics Netherlands, Henri Faasdreef 312, 2492 JP The Hague, the Netherlands.)
Diks C.G.H.
(University of Amsterdam, Center for Nonlinear Dynamics in Economics and Finance, Roetersstraat 11, 1018 WB Amsterdam, the Netherlands.)
van den Herik H.J.
(Leiden University, Niels Bohrweg 1, 2333 CA Leiden, the Netherlands.)
Takes F.W.
(Leiden University, Niels Bohrweg 1, 2333 CA Leiden, the Netherlands.)

Abstract

National statistical institutes are currently investigating how to improve the output quality of official statistics that are based on machine learning algorithms. A key issue is concept drift, that is, a change over time in the joint distribution of the independent variables and a dependent (categorical) variable. Under concept drift, a statistical model requires regular updating to prevent it from becoming biased. However, updating a model requires additional data, which are not always available. An alternative is to reduce the bias by means of bias correction methods. In this article, we focus on estimating the proportion (base rate) of a category of interest and we compare two popular bias correction methods: the misclassification estimator and the calibration estimator. For prior probability shift (a specific type of concept drift), we investigate the two methods analytically as well as numerically. Our analytical results are expressions for the bias and variance of both methods. As a numerical result, we present a decision boundary for the relative performance of the two methods. Our results provide a better understanding of the effect of prior probability shift on output quality. Consequently, we may recommend a novel approach to using machine learning algorithms in the context of official statistics.
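To make the comparison in the abstract concrete, the following is a minimal Python sketch of the two bias correction methods for a binary classifier. It is an illustration only. The formulas follow the definitions commonly used in the quantification literature and may differ in notation from the article. The function names, the classifier quality (p00 for specificity, p11 for sensitivity), and the base rates 0.3 and 0.5 are assumptions chosen for the example. The sketch works with population-level expectations, so it shows bias only and ignores the sampling variance that the article also analyses.

```python
# Illustrative sketch; all names and numbers are assumptions, not the article's.
# p00 = P(predict 0 | true 0) (specificity), p11 = P(predict 1 | true 1) (sensitivity).
# Under prior probability shift, p00 and p11 stay fixed while the base rate changes.


def expected_predicted_share(alpha, p00, p11):
    """Expected share of items classified as positive at true base rate alpha."""
    return alpha * p11 + (1 - alpha) * (1 - p00)


def misclassification_estimator(alpha_star, p00, p11):
    """Invert the misclassification probabilities to correct the raw share."""
    return (alpha_star + p00 - 1) / (p00 + p11 - 1)


def calibration_probabilities(alpha, p00, p11):
    """Return c11 = P(true 1 | predict 1) and c10 = P(true 1 | predict 0)."""
    pred_pos = expected_predicted_share(alpha, p00, p11)
    c11 = alpha * p11 / pred_pos
    c10 = alpha * (1 - p11) / (1 - pred_pos)
    return c11, c10


def calibration_estimator(alpha_star, c11, c10):
    """Reweight the predicted shares by the calibration probabilities."""
    return alpha_star * c11 + (1 - alpha_star) * c10


# Hypothetical setting: classifier quality is fixed, the base rate shifts.
p00, p11 = 0.9, 0.8
alpha_train, alpha_target = 0.3, 0.5

# Calibration probabilities are estimated on the (old) training population.
c11, c10 = calibration_probabilities(alpha_train, p00, p11)

# Raw classify-and-count share observed in the (new) target population.
alpha_star = expected_predicted_share(alpha_target, p00, p11)

est_misclassification = misclassification_estimator(alpha_star, p00, p11)
est_calibration = calibration_estimator(alpha_star, c11, c10)

# Without shift, the calibration estimator is unbiased at the population level.
alpha_star_no_shift = expected_predicted_share(alpha_train, p00, p11)
est_calibration_no_shift = calibration_estimator(alpha_star_no_shift, c11, c10)

print(est_misclassification, est_calibration, est_calibration_no_shift)
```

In this setting the misclassification estimator recovers the shifted base rate of 0.5, whereas the calibration estimator stays closer to the training base rate (about 0.396). With finite test data the picture is less one-sided, because the two estimators also differ in variance, which is why the article studies a decision boundary for their relative performance.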

Suggested Citation

Meertens Q.A. & Diks C.G.H. & van den Herik H.J. & Takes F.W., 2022. "Improving the Output Quality of Official Statistics Based on Machine Learning Algorithms," Journal of Official Statistics, Sciendo, vol. 38(2), pages 485-508, June.

Handle: RePEc:vrs:offsta:v:38:y:2022:i:2:p:485-508:n:8
DOI: 10.2478/jos-2022-0023

Citations

Cited by:

Younes Saidani & Florian Dumpert & Christian Borgs & Alexander Brand & Andreas Nickl & Alexandra Rittmann & Johannes Rohde & Christian Salwiczek & Nina Storfinger & Selina Straub, 2023. "Qualitätsdimensionen maschinellen Lernens in der amtlichen Statistik [Quality Dimensions of Machine Learning in Official Statistics]," AStA Wirtschafts- und Sozialstatistisches Archiv, Springer;Deutsche Statistische Gesellschaft - German Statistical Society, vol. 17(3), pages 253-303, December.

More about this item

Keywords

Output quality; concept drift; prior probability shift; misclassification bias
