
Outlier Detection in High Dimensional Data

Author

Listed:
  • Firuz Kamalov

    (Canadian University Dubai, Dubai, UAE)

  • Ho Hon Leung

    (UAE University, UAE)

Abstract

High-dimensional data poses unique challenges for outlier detection. Most existing algorithms fail to properly address the issues stemming from a large number of features. In particular, outlier detection algorithms perform poorly on small datasets with a large number of features. In this paper, we propose a novel outlier detection algorithm based on principal component analysis and kernel density estimation. The proposed method is designed to address the challenges of high-dimensional data by projecting the original data onto a lower-dimensional space and using the innate structure of the data to calculate an anomaly score for each data point. Numerical experiments on synthetic and real-life data show that our method performs well on high-dimensional data. In particular, the proposed method outperforms the benchmark methods as measured by F1-score. Our method also produces better-than-average execution times compared with the benchmark methods.
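
The general idea in the abstract (project the data with principal component analysis, then score each point by its estimated density) can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given in this record. It assumes a single principal component found by power iteration, a Gaussian kernel, Silverman's rule-of-thumb bandwidth, and the negative log density as the anomaly score. The function names `top_component` and `pca_kde_scores` and the synthetic data are hypothetical.

```python
import math
import random


def top_component(rows, iters=200, seed=0):
    """Leading principal direction of already-centered rows (power iteration)."""
    rng = random.Random(seed)
    d = len(rows[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random start vector
    for _ in range(iters):
        # Apply X^T X to v without forming the d x d scatter matrix.
        proj = [sum(r[j] * v[j] for j in range(d)) for r in rows]
        w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate data (all rows identical)
            break
        v = [x / norm for x in w]
    return v


def pca_kde_scores(data):
    """Anomaly score per row: negative log KDE density of its 1-D projection."""
    n, d = len(data), len(data[0])
    means = [sum(r[j] for r in data) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in data]
    v = top_component(centered)
    z = [sum(r[j] * v[j] for j in range(d)) for r in centered]

    # Assumed bandwidth choice: Silverman's rule of thumb for a Gaussian kernel.
    mu = sum(z) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in z) / (n - 1))
    h = (1.06 * sd * n ** (-0.2)) or 1.0  # fall back to 1.0 if sd is zero

    const = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    scores = []
    for x in z:
        dens = const * sum(math.exp(-0.5 * ((x - y) / h) ** 2) for y in z)
        scores.append(-math.log(dens))  # low density -> high anomaly score
    return scores


# Synthetic example: 60 Gaussian inliers in 20 dimensions plus one shifted point.
rng = random.Random(42)
data = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(60)]
data.append([6.0] * 20)  # planted outlier, last row

scores = pca_kde_scores(data)
print("most anomalous row:", scores.index(max(scores)))
```

A single component keeps the sketch short. A fuller version would probably retain several components and estimate a multivariate density, and would turn scores into outlier labels with a threshold before computing an F1-score.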

Suggested Citation

  • Firuz Kamalov & Ho Hon Leung, 2020. "Outlier Detection in High Dimensional Data," Journal of Information & Knowledge Management (JIKM), World Scientific Publishing Co. Pte. Ltd., vol. 19(01), pages 1-16, March.
  • Handle: RePEc:wsi:jikmxx:v:19:y:2020:i:01:n:s0219649220400134
    DOI: 10.1142/S0219649220400134

    Download full text from publisher

    File URL: https://www.worldscientific.com/doi/abs/10.1142/S0219649220400134
    Download Restriction: Access to full text is restricted to subscribers


    References listed on IDEAS

    1. Firuz Kamalov & Fadi Thabtah, 2017. "A Feature Selection Method Based on Ranked Vector Scores of Features for Classification," Annals of Data Science, Springer, vol. 4(4), pages 483-502, December.
    2. Ikhlaas Gurrib & Firuz Kamalov, 2019. "The implementation of an adjusted relative strength index model in foreign currency and energy markets of emerging and developed economies," Macroeconomics and Finance in Emerging Market Economies, Taylor & Francis Journals, vol. 12(2), pages 105-123, May.
    3. van Capelleveen, Guido & Poel, Mannes & Mueller, Roland M. & Thornton, Dallas & van Hillegersberg, Jos, 2016. "Outlier detection in healthcare fraud: A case study in the Medicaid dental domain," International Journal of Accounting Information Systems, Elsevier, vol. 21(C), pages 18-31.

    Citations

    Cited by:

    1. Firuz Kamalov & Ho Hon Leung & Sherif Moussa, 2022. "Monotonicity of the $\chi^2$-statistic and Feature Selection," Annals of Data Science, Springer, vol. 9(6), pages 1223-1241, December.
    2. Firuz Kamalov & Linda Smail & Ikhlaas Gurrib, 2021. "Stock price forecast with deep learning," Papers 2103.14081, arXiv.org.
    3. Firuz Kamalov & Linda Smail & Ikhlaas Gurrib, 2021. "Forecasting with Deep Learning: S&P 500 index," Papers 2103.14080, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ikhlaas Gurrib & Mohammad Nourani & Rajesh Kumar Bhaskaran, 2022. "Energy crypto currencies and leading U.S. energy stock prices: are Fibonacci retracements profitable?," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 8(1), pages 1-27, December.
    2. Ikhlaas Gurrib & Firuz Kamalov & Elgilani Elshareif, 2021. "Can the Leading US Energy Stock Prices be Predicted using the Ichimoku Cloud?," International Journal of Energy Economics and Policy, Econjournals, vol. 11(1), pages 41-51.
    3. Firuz Kamalov & Linda Smail & Ikhlaas Gurrib, 2021. "Stock price forecast with deep learning," Papers 2103.14081, arXiv.org.
    4. Ikhlaas Gurrib & Firuz Kamalov & Olga Starkova & Adham Makki & Anita Mirchandani & Namrata Gupta, 2023. "Performance of Equity Investments in Sustainable Environmental Markets," Sustainability, MDPI, vol. 15(9), pages 1-28, May.
    5. Mohammed Rajab & Dennis Wang, 2020. "Practical Challenges and Recommendations of Filter Methods for Feature Selection," Journal of Information & Knowledge Management (JIKM), World Scientific Publishing Co. Pte. Ltd., vol. 19(01), pages 1-15, March.
    6. Koreff, Jared & Weisner, Martin & Sutton, Steve G., 2021. "Data analytics (ab) use in healthcare fraud audits," International Journal of Accounting Information Systems, Elsevier, vol. 42(C).
    7. Firuz Kamalov & Fadi Thabtah & Ho Hon Leung, 2023. "Feature Selection in Imbalanced Data," Annals of Data Science, Springer, vol. 10(6), pages 1527-1541, December.
    8. Guohua Zeng & Peiying Wu & Xinxin Yuan, 2023. "Has the Development of the Digital Economy Reduced the Regional Energy Intensity—From the Perspective of Factor Market Distortion, Industrial Structure Upgrading and Technological Progress?," Sustainability, MDPI, vol. 15(7), pages 1-19, March.
    9. Ikhlaas Gurrib, 2022. "Technical Analysis, Energy Cryptos and Energy Equity Markets," International Journal of Energy Economics and Policy, Econjournals, vol. 12(2), pages 249-267, March.
    10. Ikhlaas Gurrib, 2023. "Momentum in Low Carbon and Fossil Fuel Free Equity Investing," International Journal of Energy Economics and Policy, Econjournals, vol. 13(5), pages 461-471, September.
    11. Firuz Kamalov & Ho Hon Leung & Sherif Moussa, 2022. "Monotonicity of the $\chi^2$-statistic and Feature Selection," Annals of Data Science, Springer, vol. 9(6), pages 1223-1241, December.
    12. Majed Rajab, 2019. "Visualisation Model Based on Phishing Features," Journal of Information & Knowledge Management (JIKM), World Scientific Publishing Co. Pte. Ltd., vol. 18(01), pages 1-17, March.
    13. Pei, Duo & Vasarhelyi, Miklos A., 2020. "Big data and algorithmic trading against periodic and tangible asset reporting: The need for U-XBRL," International Journal of Accounting Information Systems, Elsevier, vol. 37(C).
    14. Firuz Kamalov, 2019. "Forecasting significant stock price changes using neural networks," Papers 1912.08791, arXiv.org.
    15. Alyssa J. Rolfe, 2021. "Weighted risk models for dynamic healthcare fraud detection," Risk Management and Insurance Review, American Risk and Insurance Association, vol. 24(2), pages 143-150, June.
    16. Ludivia Hernandez Aros & Luisa Ximena Bustamante Molano & Fernando Gutierrez-Portela & John Johver Moreno Hernandez & Mario Samuel Rodríguez Barrero, 2024. "Financial fraud detection through the application of machine learning techniques: a literature review," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-22, December.
    17. O'Malley, A. James & Bubolz, Thomas A. & Skinner, Jonathan S., 2023. "The diffusion of health care fraud: A bipartite network analysis," Social Science & Medicine, Elsevier, vol. 327(C).
