
Use of Data Mining for Analysis of Czech Real Estate Market

Author

Listed:
  • Ilya Tsakunov
  • David Chudán

Abstract

This paper analyses data from the Czech real estate market, scraped from the bezrealitky.cz portal and covering both sales and rentals. A total of 3546 records with 54 attributes were obtained. Exploratory data analysis gave an overview of the data and identified basic characteristics such as the average price of sold and rented flats. More specific results were obtained by applying data mining methods: regression (linear, lasso and ridge regression) to predict flat prices and utility payments, and classification (support vector machines, KNN, Gaussian naïve Bayes, decision tree and random forest) to estimate the PENB class (building energy performance certificate) and building condition. Lasso regression performed best in predicting the rent price (R² = 0.76). Among the classification tasks, random forest achieved the best result, with an accuracy over 80% in some cases. Other tasks included clustering (k-means and k-modes) and anomaly detection (isolation forest). The main focus was on descriptive data mining, especially clustering. K-means clustering of flats by geographic coordinates (silhouette score of 0.78) showed that the most expensive flats are on average in the Bohemian regions, followed by Silesia, while the cheapest are in central Moravia. Another clustering application identified flats in the Moravian-Silesian region with very high utility payments (silhouette score of 0.56). The models can help estimate the value of flats based on their attributes and location.
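
The abstract reports k-means clusters over geographic coordinates and judges them by silhouette score. As a rough illustration of how these two pieces fit together, the sketch below runs plain Lloyd's k-means on made-up coordinate pairs and computes the mean silhouette coefficient. It is not the authors' code: the points are synthetic (not the bezrealitky.cz data), the helper names `kmeans`, `silhouette_score` and `make_blob` are hypothetical, and coordinates are treated as flat Euclidean points for simplicity.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def make_blob(center, n, spread, rng):
    """Synthetic points scattered around a centre (illustration only)."""
    return [
        (rng.gauss(center[0], spread), rng.gauss(center[1], spread))
        for _ in range(n)
    ]


rng = random.Random(42)
# Made-up (longitude, latitude) pairs forming two groups; NOT data from the paper.
points = make_blob((14.4, 50.1), 30, 0.1, rng) + make_blob((18.3, 49.8), 30, 0.1, rng)
labels, centroids = kmeans(points, k=2)
score = silhouette_score(points, labels)
print(f"silhouette = {score:.2f}")
```

A silhouette score close to 1 means that points lie much closer to their own cluster than to the nearest other cluster, which is the sense in which the reported 0.78 indicates better-separated clusters than the reported 0.56.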

Suggested Citation

  • Ilya Tsakunov & David Chudán, 2023. "Use of Data Mining for Analysis of Czech Real Estate Market," Acta Informatica Pragensia, Prague University of Economics and Business, vol. 2023(2), pages 275-295.
  • Handle: RePEc:prg:jnlaip:v:2023:y:2023:i:2:id:215:p:275-295
    DOI: 10.18267/j.aip.215

    Download full text from publisher

    File URL: http://aip.vse.cz/doi/10.18267/j.aip.215.html
    Download Restriction: free of charge

    File URL: http://aip.vse.cz/doi/10.18267/j.aip.215.pdf
    Download Restriction: free of charge

