TNN: A transfer learning classifier based on weighted nearest neighbors

Author

Listed:
  • Sheng, Haiyang
  • Yu, Guan

Abstract

Weighted nearest neighbors (WNN) classifiers are popular non-parametric classifiers. Despite the significant progress in WNN, most existing WNN classifiers are designed for traditional supervised learning problems where both training samples and test samples are assumed to be independent and identically distributed. However, in many real applications, it can be difficult or expensive to obtain training samples from the distribution of interest. Therefore, data collected from related distributions are often used as supplementary training data for the classification task under the distribution of interest. It is essential to develop effective classification methods that can incorporate both training samples from the distribution of interest (if they exist) and supplementary training samples from a different but related distribution. To address this challenge, we propose a novel Transfer learning weighted Nearest Neighbors (TNN) classifier. As a WNN classifier, TNN determines the weights on the class labels of training samples adaptively for each test sample by minimizing an upper bound on the conditional expectation of the estimation error of the regression function. It puts decreasing weights on the class labels of successively more distant neighbors. To accommodate the difference between training samples from the distribution of interest and supplementary training samples, TNN adds a non-negative offset to the distance between each supplementary training sample and the test sample, and thus limits the influence of the supplementary training samples on the prediction. Our theoretical studies show that, under certain conditions, TNN is consistent and minimax optimal (up to a logarithmic factor) in the covariate shift setting.
In the posterior drift or the more general setting where both covariate shift and posterior drift exist, the excess risk of TNN depends on the maximum posterior discrepancy between the distribution of the supplementary training samples and the distribution of interest. Both our simulation studies and an application to the land use/land cover mapping problem in geography demonstrate that TNN outperforms other existing methods. It can serve as an effective tool for transfer learning.
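
The offset idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `offset_wnn_predict`, the parameters `offset` and `k`, and the linearly decreasing weights are all assumptions made for illustration. The paper chooses the weights adaptively by minimizing an upper bound on the estimation error, which this sketch does not attempt to reproduce; it only shows how a non-negative offset on supplementary samples changes which neighbors dominate a weighted vote.

```python
import numpy as np


def offset_wnn_predict(x, X_target, y_target, X_source, y_source,
                       offset=0.5, k=5):
    """Binary (0/1) weighted nearest neighbors vote with a distance offset.

    Hypothetical illustration only:
    - X_target / y_target: samples from the distribution of interest.
    - X_source / y_source: supplementary samples from a related distribution.
    - offset: non-negative constant added to every supplementary distance.
    - Weights decrease linearly with neighbor rank (an assumption, not the
      adaptive weights derived in the paper).
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")

    x = np.asarray(x, dtype=float).ravel()
    X_target = np.asarray(X_target, dtype=float).reshape(-1, x.size)
    X_source = np.asarray(X_source, dtype=float).reshape(-1, x.size)

    # Euclidean distances; supplementary samples are pushed farther away.
    d_target = np.linalg.norm(X_target - x, axis=1)
    d_source = np.linalg.norm(X_source - x, axis=1) + offset

    dists = np.concatenate([d_target, d_source])
    labels = np.concatenate([np.asarray(y_target, dtype=float),
                             np.asarray(y_source, dtype=float)])
    if dists.size == 0:
        raise ValueError("at least one training sample is required")

    # Indices of the k nearest samples under the offset distances.
    k = min(k, dists.size)
    nearest = np.argsort(dists, kind="stable")[:k]

    # Rank-based weights k, k-1, ..., 1, normalised to sum to one.
    weights = np.arange(k, 0, -1, dtype=float)
    weights /= weights.sum()

    # Weighted estimate of P(Y = 1 | x), thresholded at 1/2.
    score = float(weights @ labels[nearest])
    return int(score >= 0.5), score
```

With `offset=0.0` the function reduces to an ordinary rank-weighted nearest neighbors vote over the pooled data. Increasing `offset` moves supplementary samples down the neighbor ranking, so nearby samples from the distribution of interest receive the larger weights.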

Suggested Citation

  • Sheng, Haiyang & Yu, Guan, 2023. "TNN: A transfer learning classifier based on weighted nearest neighbors," Journal of Multivariate Analysis, Elsevier, vol. 193(C).
  • Handle: RePEc:eee:jmvana:v:193:y:2023:i:c:s0047259x22001178
    DOI: 10.1016/j.jmva.2022.105126

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0047259X22001178
    Download Restriction: Full text for ScienceDirect subscribers only

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dylan Brewer & Alyssa Carlson, 2024. "Addressing sample selection bias for machine learning methods," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 39(3), pages 383-400, April.
    2. Darima Fotheringham & Michael A. Wiles, 2023. "The effect of implementing chatbot customer service on stock returns: an event study analysis," Journal of the Academy of Marketing Science, Springer, vol. 51(4), pages 802-822, July.
    3. Song, Wei-Ling & Uzmanoglu, Cihan, 2016. "TARP announcement, bank health, and borrowers’ credit risk," Journal of Financial Stability, Elsevier, vol. 22(C), pages 22-32.
    4. Raymundo M. Campos-Vázquez, 2013. "Efectos de los ingresos no reportados en el nivel y tendencia de la pobreza laboral en México," Ensayos Revista de Economia, Universidad Autonoma de Nuevo Leon, Facultad de Economia, vol. 0(2), pages 23-54, November.
    5. Stephen Brown & William Goetzmann & Bing Liang & Christopher Schwarz, 2008. "Mandatory Disclosure and Operational Risk: Evidence from Hedge Fund Registration," Journal of Finance, American Finance Association, vol. 63(6), pages 2785-2815, December.
    6. Paul W. Miller & Barry R. Chiswick, 2002. "Immigrant earnings: Language skills, linguistic concentrations and the business cycle," Journal of Population Economics, Springer;European Society for Population Economics, vol. 15(1), pages 31-57.
    7. Chul‐Woo Kwon & Peter F. Orazem & Daniel M. Otto, 2006. "Off‐farm labor supply responses to permanent and transitory farm income," Agricultural Economics, International Association of Agricultural Economists, vol. 34(1), pages 59-67, January.
    8. Jonathan Gruber & Aaron Yelowitz, 1999. "Public Health Insurance and Private Savings," Journal of Political Economy, University of Chicago Press, vol. 107(6), pages 1249-1274, December.
    9. Jean-Louis Arcand & Linguère M'Baye, 2013. "Braving the waves: the role of time and risk preferences in illegal migration from Senegal," CERDI Working papers halshs-00855937, HAL.
    10. Sandra Müllbacher & Wolfgang Nagl, 2017. "Labour supply in Austria: an assessment of recent developments and the effects of a tax reform," Empirica, Springer;Austrian Institute for Economic Research;Austrian Economic Association, vol. 44(3), pages 465-486, August.
    11. Campbell, Randall C. & Nagel, Gregory L., 2016. "Private information and limitations of Heckman's estimator in banking and corporate finance research," Journal of Empirical Finance, Elsevier, vol. 37(C), pages 186-195.
    12. Leye Li & Louise Yi Lu & Dongyue Wang, 2022. "External labour market competitions and stock price crash risk: evidence from exposures to competitor CEOs’ award‐winning events," Accounting and Finance, Accounting and Finance Association of Australia and New Zealand, vol. 62(S1), pages 1421-1460, April.
    13. Jože P. Damijan & Mark Knell, 2005. "How Important Is Trade and Foreign Ownership in Closing the Technology Gap? Evidence from Estonia and Slovenia," Review of World Economics (Weltwirtschaftliches Archiv), Springer;Institut für Weltwirtschaft (Kiel Institute for the World Economy), vol. 141(2), pages 271-295, July.
    14. Calcagno, R. & Renneboog, L.D.R., 2004. "Capital Structure and Managerial Compensation : The Effects of Renumeration Seniority," Discussion Paper 2004-120, Tilburg University, Center for Economic Research.
    15. Nakashima, Kiyotaka & Ogawa, Toshiaki, 2020. "The Impacts of Strengthening Regulatory Surveillance on Bank Behavior: A Dynamic Analysis from Incomplete to Complete Enforcement of Capital Regulation in Microprudential Policy," MPRA Paper 99938, University Library of Munich, Germany.
    16. Sarah Bridges & David Lawson, 2008. "Health and Labour Market Participation in Uganda," WIDER Working Paper Series DP2008-07, World Institute for Development Economic Research (UNU-WIDER).
    17. Ahn T. Le, 2003. "Female Labour Market Participation: Differences Between Primary and Tied Movers," Economics Discussion / Working Papers 03-17, The University of Western Australia, Department of Economics.
    18. Inmaculada García-Mainar & Víctor M. Montuenga-Gómez, 2017. "Subjective educational mismatch and signalling in Spain," Documentos de Trabajo dt2017-03, Facultad de Ciencias Económicas y Empresariales, Universidad de Zaragoza.
    19. Insik Min & Jong‐Ho Kim, 2003. "Modeling Credit Card Borrowing: A Comparison of Type I and Type II Tobit Approaches," Southern Economic Journal, John Wiley & Sons, vol. 70(1), pages 128-143, July.
    20. Son K. Lam & Thomas E. DeCarlo & Ashish Sharma, 2019. "Salesperson ambidexterity in customer engagement: do customer base characteristics matter?," Journal of the Academy of Marketing Science, Springer, vol. 47(4), pages 659-680, July.
