Printed from https://ideas.repec.org/a/eee/csdana/v144y2020ics0167947319302403.html

Computing confidence intervals from massive data via penalized quantile smoothing splines

Authors

  • Zhang, Likun
  • Castillo, Enrique del
  • Berglund, Andrew J.
  • Tingley, Martin P.
  • Govind, Nirmal

Abstract

New methodology is presented for the computation of pointwise confidence intervals from massive response data sets in one or two covariates using robust and flexible quantile regression splines. Novel aspects of the method include a new cross-validation procedure for selecting the penalization coefficient and a reformulation of the quantile smoothing problem based on a weighted data representation. These innovations permit uncertainty quantification and fast parameter selection in very large data sets via a distributed "bag of little bootstraps". Experiments with synthetic data demonstrate that the computed confidence intervals have empirical coverage rates that are generally within 2% of the nominal rates. The approach is broadly applicable to the analysis of large data sets in one or two dimensions. Comparative (or "A/B") experiments conducted at Netflix to optimize the quality of streaming video originally motivated this work, but the proposed methods have general applicability. The methodology is illustrated using an open source application: the comparison of geo-spatial climate model scenarios from NASA's Earth Exchange.
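As a rough illustration of why a weighted data representation pairs well with the bag of little bootstraps (Kleiner et al., 2014), the Python sketch below computes a confidence interval for a single unconditional tau-quantile. It deliberately does not reproduce the paper's penalized quantile smoothing splines. Each size-n bootstrap resample drawn from a small subset of b points is stored as b multinomial counts, so the weighted check-loss minimizer (a weighted quantile) is computed without ever materializing n values. The function names `weighted_quantile` and `blb_quantile_ci`, the subset-size exponent `gamma = 0.7`, and the default numbers of subsets and resamples are hypothetical choices for this sketch, not values taken from the paper.

```python
import random
from statistics import fmean


def weighted_quantile(values, weights, tau):
    """Return a minimizer of the weighted check loss sum_i w_i * rho_tau(v_i - q).

    This is the smallest value whose cumulative weight reaches tau * total weight.
    """
    pairs = sorted(zip(values, weights))
    threshold = tau * sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return pairs[-1][0]


def blb_quantile_ci(data, tau=0.5, level=0.95, subsets=5, gamma=0.7,
                    resamples=40, seed=0):
    """Bag-of-little-bootstraps percentile CI for the tau-quantile of `data` (a list).

    Hypothetical parameters: `gamma` sets the subset size b = n**gamma;
    `subsets` and `resamples` play the roles of the BLB constants s and r.
    """
    rng = random.Random(seed)
    n = len(data)
    b = min(n, max(2, int(n ** gamma)))
    alpha = 1.0 - level
    lows, highs = [], []
    for _ in range(subsets):
        # Each (possibly distributed) worker sees only b points, drawn without replacement.
        subset = rng.sample(data, b)
        estimates = []
        for _ in range(resamples):
            # Weighted representation: a size-n bootstrap resample of the subset
            # is stored as b multinomial counts instead of n individual values.
            counts = [0] * b
            for index in rng.choices(range(b), k=n):
                counts[index] += 1
            estimates.append(weighted_quantile(subset, counts, tau))
        # Percentile interval from this subset's bootstrap distribution.
        unit = [1] * len(estimates)
        lows.append(weighted_quantile(estimates, unit, alpha / 2))
        highs.append(weighted_quantile(estimates, unit, 1 - alpha / 2))
    # Combine subsets by averaging their interval endpoints.
    return fmean(lows), fmean(highs)
```

For example, `blb_quantile_ci([random.gauss(0, 1) for _ in range(5000)], tau=0.5)` should return a narrow interval near 0, the median of a standard normal. Averaging per-subset endpoints is one common BLB aggregation choice. Because each subset loop depends only on its own b points and counts, the outer loop is the natural unit to distribute across workers.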

Suggested Citation

  • Zhang, Likun & Castillo, Enrique del & Berglund, Andrew J. & Tingley, Martin P. & Govind, Nirmal, 2020. "Computing confidence intervals from massive data via penalized quantile smoothing splines," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
  • Handle: RePEc:eee:csdana:v:144:y:2020:i:c:s0167947319302403
    DOI: 10.1016/j.csda.2019.106885

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947319302403
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. Koenker, Roger W & Bassett, Gilbert, Jr, 1978. "Regression Quantiles," Econometrica, Econometric Society, vol. 46(1), pages 33-50, January.
    2. Bosch, Ronald J. & Ye, Yinyu & Woodworth, George G., 1995. "A convergent algorithm for quantile regression with smoothing splines," Computational Statistics & Data Analysis, Elsevier, vol. 19(6), pages 613-630, June.
    3. Hee‐Seok Oh & Doug Nychka & Tim Brown & Paul Charbonneau, 2004. "Period analysis of variable stars by robust smoothing," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 53(1), pages 15-30, January.
    4. Max Sommerfeld & Stephan Sain & Armin Schwartzman, 2018. "Confidence Regions for Spatial Excursion Sets From Repeated Random Field Observations, With an Application to Climate," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1327-1340, July.
    5. Reiss, Philip T. & Huang, Lei, 2012. "Smoothness Selection for Penalized Quantile Regression Splines," The International Journal of Biostatistics, De Gruyter, vol. 8(1), pages 1-27, May.
    6. Ariel Kleiner & Ameet Talwalkar & Purnamrita Sarkar & Michael I. Jordan, 2014. "A scalable bootstrap for massive data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 76(4), pages 795-816, September.
    7. Yuan, Ming, 2006. "GACV for quantile smoothing splines," Computational Statistics & Data Analysis, Elsevier, vol. 50(3), pages 813-829, February.
    8. Detlef Vuuren & Jae Edmonds & Mikiko Kainuma & Keywan Riahi & Allison Thomson & Kathy Hibbard & George Hurtt & Tom Kram & Volker Krey & Jean-Francois Lamarque & Toshihiko Masui & Malte Meinshausen & N, 2011. "The representative concentration pathways: an overview," Climatic Change, Springer, vol. 109(1), pages 5-31, November.
    9. Roger Koenker & Ivan Mizera, 2004. "Penalized triograms: total variation regularization for bivariate smoothing," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 66(1), pages 145-163, February.
    10. Dikta, Gerhard, 1990. "Bootstrap approximation of nearest neighbor regression function estimates," Journal of Multivariate Analysis, Elsevier, vol. 32(2), pages 213-229, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Otto-Sobotka, Fabian & Salvati, Nicola & Ranalli, Maria Giovanna & Kneib, Thomas, 2019. "Adaptive semiparametric M-quantile regression," Econometrics and Statistics, Elsevier, vol. 11(C), pages 116-129.
    2. Charlier, Isabelle & Paindaveine, Davy & Saracco, Jérôme, 2015. "Conditional quantile estimation based on optimal quantization: From theory to practice," Computational Statistics & Data Analysis, Elsevier, vol. 91(C), pages 20-39.
    3. Marcio Laurini, 2007. "A note on the use of quantile regression in beta convergence analysis," Economics Bulletin, AccessEcon, vol. 3(52), pages 1-8.
    4. Sungwan Bang & Soo-Heang Eo & Yong Mee Cho & Myoungshic Jhun & HyungJun Cho, 2016. "Non-crossing weighted kernel quantile regression with right censored data," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 22(1), pages 100-121, January.
    5. Elisabeth Waldmann & Thomas Kneib & Yu Ryan Yu & Stefan Lang, 2012. "Bayesian semiparametric additive quantile regression," Working Papers 2012-06, Faculty of Economics and Statistics, Universität Innsbruck.
    6. Monica Pratesi & M. Ranalli & Nicola Salvati, 2009. "Nonparametric M-quantile regression using penalised splines," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 21(3), pages 287-304.
    7. Maria Marino & Alessio Farcomeni, 2015. "Linear quantile regression models for longitudinal experiments: an overview," METRON, Springer;Sapienza Università di Roma, vol. 73(2), pages 229-247, August.
    8. Jooyong Shim & Changha Hwang & Kyungha Seok, 2014. "Composite support vector quantile regression estimation," Computational Statistics, Springer, vol. 29(6), pages 1651-1665, December.
    9. Lian, Heng & Meng, Jie & Fan, Zengyan, 2015. "Simultaneous estimation of linear conditional quantiles with penalized splines," Journal of Multivariate Analysis, Elsevier, vol. 141(C), pages 1-21.
    10. Takuma Yoshida, 2021. "Additive models for extremal quantile regression with Pareto-type distributions," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 105(1), pages 103-134, March.
    11. Park, Jinho, 2017. "Solution path for quantile regression with epsilon-insensitive loss in a reproducing kernel Hilbert space," Statistics & Probability Letters, Elsevier, vol. 126(C), pages 205-211.
    12. Bang, Sungwan & Jhun, Myoungshic, 2012. "Simultaneous estimation and factor selection in quantile regression via adaptive sup-norm regularization," Computational Statistics & Data Analysis, Elsevier, vol. 56(4), pages 813-826.
    13. Poletti Laurini, Márcio & Moura, Marcelo, 2010. "Constrained smoothing B-splines for the term structure of interest rates," Insurance: Mathematics and Economics, Elsevier, vol. 46(2), pages 339-350, April.
    14. Jooyong Shim & Yongtae Kim & Jangtaek Lee & Changha Hwang, 2012. "Estimating value at risk with semiparametric support vector quantile regression," Computational Statistics, Springer, vol. 27(4), pages 685-700, December.
    15. Fritsch, Markus & Haupt, Harry & Ng, Pin T., 2016. "Urban house price surfaces near a World Heritage Site: Modeling conditional price and spatial heterogeneity," Regional Science and Urban Economics, Elsevier, vol. 60(C), pages 260-275.
    16. Park, Jinho & Kim, Jeankyung, 2011. "Quantile regression with an epsilon-insensitive loss in a reproducing kernel Hilbert space," Statistics & Probability Letters, Elsevier, vol. 81(1), pages 62-70, January.
    17. Reiss, Philip T. & Huang, Lei, 2012. "Smoothness Selection for Penalized Quantile Regression Splines," The International Journal of Biostatistics, De Gruyter, vol. 8(1), pages 1-27, May.
    18. Shim, Jooyong & Hwang, Changha, 2009. "Support vector censored quantile regression under random censoring," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 912-919, February.
    19. Yue, Yu Ryan & Rue, Håvard, 2011. "Bayesian inference for additive mixed quantile regression models," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 84-96, January.
    20. Wu, Chaojiang & Yu, Yan, 2014. "Partially linear modeling of conditional quantiles using penalized splines," Computational Statistics & Data Analysis, Elsevier, vol. 77(C), pages 170-187.
