A machine learning approach to big data regression analysis of real estate prices for inferential and predictive purposes

Author

Listed:
  • Jorge Iván Pérez-Rave
  • Juan Carlos Correa-Morales
  • Favián González-Echavarría

Abstract

Hedonic price regressions have mainly been used for inference. In contrast, machine learning applied to big data has great potential for prediction. To help integrate these two strategies, this article proposes a machine learning approach to the regression analysis of big data, specifically real estate prices, for both inferential and predictive purposes. The methodology incorporates a new variable selection procedure called ‘incremental sample with resampling’ (MINREM) and consists of two stages. The first stage selects the important variables using MINREM; the second follows the traditional training and validation procedure for machine learning, adding three activities. The methodology is tested on two cases. The first uses data from web advertisements for used homes for sale in Colombia (61,826 observations). The second uses data (58,888 observations) from a sample of the Metropolitan American Housing Survey 2011, obtained and prepared by a reference study. In both test cases, the methodology proves useful for obtaining highly parsimonious models that are stable across different sample sizes, and for exploiting both the inferential and predictive uses of the resulting regression functions. This paper contributes an original methodology for big data regression analysis.

Suggested Citation

  • Jorge Iván Pérez-Rave & Juan Carlos Correa-Morales & Favián González-Echavarría, 2019. "A machine learning approach to big data regression analysis of real estate prices for inferential and predictive purposes," Journal of Property Research, Taylor & Francis Journals, vol. 36(1), pages 59-96, January.
  • Handle: RePEc:taf:jpropr:v:36:y:2019:i:1:p:59-96
    DOI: 10.1080/09599916.2019.1587489

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/09599916.2019.1587489
    Download Restriction: Access to full text is restricted to subscribers.

    Citations

    Cited by:

    1. Juergen Deppner & Marcelo Cajias, 2024. "Accounting for Spatial Autocorrelation in Algorithm-Driven Hedonic Models: A Spatial Cross-Validation Approach," The Journal of Real Estate Finance and Economics, Springer, vol. 68(2), pages 235-273, February.
    2. Bricongne, Jean-Charles & Meunier, Baptiste & Pouget, Sylvain, 2023. "Web-scraping housing prices in real-time: The Covid-19 crisis in the UK," Journal of Housing Economics, Elsevier, vol. 59(PB).
    3. Jens Kolbe & Rainer Schulz & Martin Wersing & Axel Werwatz, 2021. "Real estate listings and their usefulness for hedonic regressions," Empirical Economics, Springer, vol. 61(6), pages 3239-3269, December.
    4. Ti-Ching Peng, 2021. "The effect of hazard shock and disclosure information on property and land prices: a machine-learning assessment in the case of Japan," Review of Regional Research: Jahrbuch für Regionalwissenschaft, Springer;Gesellschaft für Regionalforschung (GfR), vol. 41(1), pages 1-32, February.
    5. Dieudonné Tchuente & Serge Nyawa, 2022. "Real estate price estimation in French cities using geocoding and machine learning," Annals of Operations Research, Springer, vol. 308(1), pages 571-608, January.
    6. Jorge Iván Pérez-Rave & Rafael Fernández Guerrero & Andrés Salas Vallina & Favián González Echavarría, 2023. "A measurement model of dynamic capabilities of the continuous improvement project and its role in the renewal of the company’s products/services," Operations Management Research, Springer, vol. 16(1), pages 126-140, March.
    7. Vladimir Vargas-Calderón & Jorge E. Camargo, 2020. "Towards robust and speculation-reduction real estate pricing models based on a data-driven strategy," Papers 2012.09115, arXiv.org.
    8. Chirilus Alexandru I., 2023. "Forecasting Real Estate Prices in Romania: A Lag Optimized Linear Approach," Baltic Journal of Real Estate Economics and Construction Management, Sciendo, vol. 11(1), pages 120-132, January.
    9. Daniel Lo & Kwong Wing Chau & Siu Kei Wong & Michael McCord & Martin Haran, 2022. "Factors Affecting Spatial Autocorrelation in Residential Property Prices," Land, MDPI, vol. 11(6), pages 1-16, June.
    10. Jorge Iván Pérez Rave & Gloria Patricia Jaramillo Álvarez & Juan Carlos Correa Morales, 2023. "Psycho-managerial text mining (PMTM): a framework for developing and validating psychological/managerial constructs from a theory/text-driven approach," Journal of Marketing Analytics, Palgrave Macmillan, vol. 11(4), pages 777-808, December.
    11. Sebastian Gnat, 2021. "Property Mass Valuation on Small Markets," Land, MDPI, vol. 10(4), pages 1-14, April.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:taf:jpropr:v:36:y:2019:i:1:p:59-96. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Longhurst (email available below). General contact details of provider: http://www.tandfonline.com/RJPR20 .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.