Scalable Data Fusion with Selection Correction: An Application to Customer Base Analysis

Authors

  • Daniel Minh McCarthy (Department of Marketing, Emory University, Atlanta, Georgia 30322)
  • Elliot Shin Oblander (Marketing Division, Columbia University, New York, New York 10027)

Abstract

Increasingly, applied researchers study problems for which multiple sources of data are available. These sources may come with varying degrees of aggregation, and some of them may not be representative of the population of interest. Using multiple data sources could lead to richer insights. However, existing data fusion approaches do not correct for selection bias in data sources that may not be representative and either do not scale to large populations or are statistically inefficient. We propose an aggregate-disaggregate data fusion method that corrects for selection bias and is both computationally scalable and statistically efficient. We apply the method to estimate a model of customer acquisition and churn at subscription-based firms. We bring the model to life using a large credit card panel and public data from Spotify, the music streaming service. This application and supporting simulations show that incorporating the granular data through our data fusion method enhances identification and offers richer insights than extant approaches. We find, for example, that previously churned customers remain with Spotify longer than newly adopted subscribers do, implying a more sanguine view of Spotify’s future retention profile than previous approaches that do not use multiple data sources.

Suggested Citation

  • Daniel Minh McCarthy & Elliot Shin Oblander, 2021. "Scalable Data Fusion with Selection Correction: An Application to Customer Base Analysis," Marketing Science, INFORMS, vol. 40(3), pages 459-480, May.
  • Handle: RePEc:inm:ormksc:v:40:y:2021:i:3:p:459-480
    DOI: 10.1287/mksc.2020.1259

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/mksc.2020.1259
    Download Restriction: no



    Citations

    Cited by:

    1. Pasirayi, Simbarashe & Fennell, Patrick B. & Sen, Argha, 2023. "The effect of third-party delivery partnerships on firm value," Journal of Business Research, Elsevier, vol. 167(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zenetti, German & Klapper, Daniel, 2016. "Advertising Effects Under Consumer Heterogeneity – The Moderating Role of Brand Experience, Advertising Recall and Attitude," Journal of Retailing, Elsevier, vol. 92(3), pages 352-372.
    2. Lapo Filistrucchi & Tobias J. Klein, 2013. "Price Competition in Two-Sided Markets with Heterogeneous Consumers and Network Effects," Working Papers 13-20, NET Institute.
    3. Daniel Ackerberg, 2009. "A new use of importance sampling to reduce computational burden in simulation estimation," Quantitative Marketing and Economics (QME), Springer, vol. 7(4), pages 343-376, December.
    4. Keane, Michael & Ketcham, Jonathan & Kuminoff, Nicolai & Neal, Timothy, 2021. "Evaluating consumers’ choices of Medicare Part D plans: A study in behavioral welfare economics," Journal of Econometrics, Elsevier, vol. 222(1), pages 107-140.
    5. Train, Kenneth & Wilson, Wesley W., 2008. "Estimation on stated-preference experiments constructed from revealed-preference choices," Transportation Research Part B: Methodological, Elsevier, vol. 42(3), pages 191-203, March.
    6. Staus, Alexander, 2011. "Which household attitudes determine the store type choice for meat?," Journal of Retailing and Consumer Services, Elsevier, vol. 18(3), pages 224-234.
    7. Tat Chan & Chakravarthi Narasimhan & Ying Xie, 2013. "Treatment Effectiveness and Side Effects: A Model of Physician Learning," Management Science, INFORMS, vol. 59(6), pages 1309-1325, June.
    8. Hugo Molina, 2024. "Buyer Alliances in Vertically Related Markets," Working Papers hal-03340176, HAL.
    9. Christopher Conlon & Jeff Gortmaker, 2020. "Best practices for differentiated products demand estimation with PyBLP," RAND Journal of Economics, RAND Corporation, vol. 51(4), pages 1108-1161, December.
    10. Pereira, Pedro & Ribeiro, Tiago, 2011. "The impact on broadband access to the Internet of the dual ownership of telephone and cable networks," International Journal of Industrial Organization, Elsevier, vol. 29(2), pages 283-293, March.
    11. Paleti, Rajesh, 2018. "Generalized multinomial probit Model: Accommodating constrained random parameters," Transportation Research Part B: Methodological, Elsevier, vol. 118(C), pages 248-262.
    12. Nathan H. Miller, 2008. "Competition When Consumers Value Firm Scope," EAG Discussions Papers 200807, Department of Justice, Antitrust Division.
    13. Konrad Menzel, 2021. "Structural Sieves," Papers 2112.01377, arXiv.org, revised Apr 2022.
    14. Andrés Elberg & Pedro M. Gardete & Rosario Macera & Carlos Noton, 2019. "Dynamic effects of price promotions: field evidence, consumer search, and supply-side implications," Quantitative Marketing and Economics (QME), Springer, vol. 17(1), pages 1-58, March.
    15. Davide Viviano & Jelena Bradic, 2019. "Synthetic learner: model-free inference on treatments over time," Papers 1904.01490, arXiv.org, revised Aug 2022.
    16. Pierre Dubois & Rachel Griffith & Martin O'Connell, 2020. "How Well Targeted Are Soda Taxes?," American Economic Review, American Economic Association, vol. 110(11), pages 3661-3704, November.
    17. Makoto Chikaraishi & Akimasa Fujiwara & Junyi Zhang & Kay Axhausen, 2011. "Identifying variations and co-variations in discrete choice models," Transportation, Springer, vol. 38(6), pages 993-1016, November.
    18. Ida, Takanori & Goto, Rei & Takahashi, Yuko & Nishimura, Shuzo, 2011. "Can economic-psychological parameters predict successful smoking cessation?," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 40(3), pages 285-295, May.
    19. Mittelhammer, Ron C. & Judge, George, 2011. "A family of empirical likelihood functions and estimators for the binary response model," Journal of Econometrics, Elsevier, vol. 164(2), pages 207-217, October.
    20. Steve Berry & Oliver B. Linton & Ariel Pakes, 2004. "Limit Theorems for Estimating the Parameters of Differentiated Product Demand Systems," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 71(3), pages 613-654.
