Printed from https://ideas.repec.org/a/hin/complx/2818251.html

A MapReduce-Based Parallel Frequent Pattern Growth Algorithm for Spatiotemporal Association Analysis of Mobile Trajectory Big Data

Author

Listed:
  • Dawen Xia
  • Xiaonan Lu
  • Huaqing Li
  • Wendong Wang
  • Yantao Li
  • Zili Zhang

Abstract

Frequent pattern mining is an effective approach for spatiotemporal association analysis of mobile trajectory big data in data-driven intelligent transportation systems. While existing parallel algorithms have been successfully applied to frequent pattern mining of large-scale trajectory data, two major challenges remain: how to overcome the inherent defects of Hadoop in handling taxi trajectory big data that consists of massive numbers of small files, and how to discover implicit spatiotemporal frequent patterns with MapReduce. To address these challenges, this paper presents a MapReduce-based Parallel Frequent Pattern growth (MR-PFP) algorithm that analyzes the spatiotemporal characteristics of taxi operations from large-scale taxi trajectories, using strategies for processing massive numbers of small files on a Hadoop platform. More specifically, we first implement three methods, namely Hadoop Archives (HAR), CombineFileInputFormat (CFIF), and Sequence Files (SF), to overcome these defects of Hadoop, and then propose two strategies based on their performance evaluations. Next, we incorporate SF into the Frequent Pattern growth (FP-growth) algorithm and implement the optimized FP-growth algorithm on a MapReduce framework. Finally, we use MR-PFP to analyze, in parallel, the characteristics of taxi operations in both the spatial and temporal dimensions. The results demonstrate that MR-PFP outperforms the existing Parallel FP-growth (PFP) algorithm in efficiency and scalability.
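The abstract does not include MR-PFP's code. As a rough illustration of the kind of parallel FP-growth it is compared against, the sketch below simulates, in a single Python process, the three steps of the standard PFP scheme (Li et al., 2008): global support counting to build an F-list, a map step that emits "group-dependent" transaction prefixes to item groups, and a reduce step that runs ordinary FP-growth on each group's shard. The function names, the round-robin item grouping, and the toy item labels are assumptions for illustration only; the sketch does not model Hadoop, SequenceFile input, or the authors' optimizations.

```python
from collections import Counter, defaultdict
from itertools import chain


class Node:
    """One FP-tree node: an item, its accumulated count, and tree links."""
    __slots__ = ("item", "count", "parent", "children")

    def __init__(self, item, parent):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}


def build_tree(weighted_txns, min_support):
    """Build an FP-tree from (items, count) pairs; return item supports and the header table."""
    freq = Counter()
    for items, cnt in weighted_txns:
        for it in set(items):
            freq[it] += cnt
    freq = {it: c for it, c in freq.items() if c >= min_support}
    root, header = Node(None, None), defaultdict(list)
    for items, cnt in weighted_txns:
        # Keep frequent items only, ordered by descending support (ties broken by item).
        node = root
        for it in sorted({i for i in items if i in freq}, key=lambda i: (-freq[i], i)):
            child = node.children.get(it)
            if child is None:
                child = node.children[it] = Node(it, node)
                header[it].append(child)
            child.count += cnt
            node = child
    return freq, header


def fp_growth(weighted_txns, min_support, suffix=()):
    """Serial FP-growth: return {frozenset(itemset): support} for all frequent itemsets."""
    freq, header = build_tree(weighted_txns, min_support)
    results = {}
    for item in sorted(freq, key=lambda i: (freq[i], i)):
        pattern = (item,) + suffix
        results[frozenset(pattern)] = freq[item]
        # Conditional pattern base: the prefix path above every tree node holding `item`.
        cond = []
        for node in header[item]:
            path, p = [], node.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                cond.append((path, node.count))
        results.update(fp_growth(cond, min_support, pattern))
    return results


def pfp(transactions, min_support, num_groups):
    """Single-process simulation of the three PFP MapReduce steps."""
    # Step 1 (parallel counting): global supports -> F-list in descending-support order.
    counts = Counter(chain.from_iterable(set(t) for t in transactions))
    flist = sorted((i for i, c in counts.items() if c >= min_support),
                   key=lambda i: (-counts[i], i))
    rank = {it: r for r, it in enumerate(flist)}
    group_of = {it: rank[it] % num_groups for it in flist}  # round-robin grouping (assumption)

    # Step 2 (map): for each group g, emit the transaction's prefix ending at its
    # last item belonging to g, at most once per group ("group-dependent transactions").
    shards = defaultdict(list)
    for t in transactions:
        items = sorted((i for i in set(t) if i in rank), key=rank.get)
        emitted = set()
        for j in range(len(items) - 1, -1, -1):
            gid = group_of[items[j]]
            if gid not in emitted:
                emitted.add(gid)
                shards[gid].append((items[: j + 1], 1))

    # Step 3 (reduce): mine each shard independently; a shard reports only itemsets whose
    # last item in F-list order (their least frequent item) belongs to its group, so every
    # itemset is reported exactly once and with its full support.
    results = {}
    for gid, shard in shards.items():
        for itemset, support in fp_growth(shard, min_support).items():
            if group_of[max(itemset, key=rank.get)] == gid:
                results[itemset] = support
    return results


if __name__ == "__main__":
    # Toy transactions; each token might stand for a (hypothetical) grid cell or time slot.
    txns = [["A", "B", "C"], ["A", "B"], ["A", "C"], ["B", "C"], ["A", "B", "C", "D"]]
    for itemset, support in sorted(pfp(txns, 2, 2).items(),
                                   key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
```

For any positive `num_groups`, `pfp(txns, min_support, num_groups)` should return the same dictionary as the serial `fp_growth([(t, 1) for t in txns], min_support)`. In a real Hadoop job the map and reduce loops would run as separate tasks, and the input would be read from packed containers such as SequenceFiles rather than from an in-memory list.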

Suggested Citation

  • Dawen Xia & Xiaonan Lu & Huaqing Li & Wendong Wang & Yantao Li & Zili Zhang, 2018. "A MapReduce-Based Parallel Frequent Pattern Growth Algorithm for Spatiotemporal Association Analysis of Mobile Trajectory Big Data," Complexity, Hindawi, vol. 2018, pages 1-16, January.
  • Handle: RePEc:hin:complx:2818251
    DOI: 10.1155/2018/2818251

    Download full text from publisher

    File URL: http://downloads.hindawi.com/journals/8503/2018/2818251.pdf
    Download Restriction: no

    File URL: http://downloads.hindawi.com/journals/8503/2018/2818251.xml
    Download Restriction: no

    File URL: https://libkey.io/10.1155/2018/2818251?utm_source=ideas

    References listed on IDEAS

    1. Robert Ross & Philip Carns & David Metheny, 2009. "Parallel File Systems," International Series in Operations Research & Management Science, in: Yupo Chan & John Talburt & Terry M. Talley (ed.), Data Engineering, chapter 8, pages 143-168, Springer.
    2. Vivien Marx, 2013. "The big challenges of big data," Nature, Nature, vol. 498(7453), pages 255-260, June.

    Citations

    Cited by:

    1. Maryam Barzegar & Abolghasem Sadeghi-Niaraki & Maryam Shakeri & Soo-Mi Choi, 2019. "An Improved Route-Finding Algorithm Using Ubiquitous Ontology-Based Experiences Modeling," Complexity, Hindawi, vol. 2019, pages 1-15, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lin Zhu & Xiantao Liu & Sha He & Jun Shi & Ming Pang, 2015. "Keywords co-occurrence mapping knowledge domain research base on the theory of Big Data in oil and gas industry," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(1), pages 249-260, October.
    2. Zhang, Yi & Huang, Ying & Porter, Alan L. & Zhang, Guangquan & Lu, Jie, 2019. "Discovering and forecasting interactions in big data research: A learning-enhanced bibliometric study," Technological Forecasting and Social Change, Elsevier, vol. 146(C), pages 795-807.
    3. Stefano Bianchini & Moritz Müller & Pierre Pelletier, 2022. "Artificial intelligence in science: An emerging general method of invention," Post-Print hal-03958025, HAL.
    4. Jun Feng & Zhenting Li & Shizhen Zhang & Chun Bao & Jingxian Fang & Yun Yin & Bolei Chen & Lei Pan & Bing Wang & Yu Zheng, 2023. "A Microimage-Processing-Based Technique for Detecting Qualitative and Quantitative Characteristics of Plant Cells," Agriculture, MDPI, vol. 13(9), pages 1-16, September.
    5. Tang, Ming & Liao, Huchang, 2021. "From conventional group decision making to large-scale group decision making: What are the challenges and how to meet them in big data era? A state-of-the-art survey," Omega, Elsevier, vol. 100(C).
    6. Janssen, Marijn & van der Voort, Haiko & Wahyudi, Agung, 2017. "Factors influencing big data decision-making quality," Journal of Business Research, Elsevier, vol. 70(C), pages 338-345.
    7. Haitham Nobanee & Mehroz Nida Dilshad & Mona Al Dhanhani & Maitha Al Neyadi & Sultan Al Qubaisi & Saeed Al Shamsi, 2021. "Big Data Applications the Banking Sector: A Bibliometric Analysis Approach," SAGE Open, vol. 11(4), pages 21582440211, December.
    8. Reza Farrahi Moghaddam & Fereydoun Farrahi Moghaddam & Mohamed Cheriet, 2014. "A Multi-Entity Input Output (MEIO) Approach to Sustainability - Water-Energy-GHG (WEG) Footprint Statements in Use Cases from Auto and Telco Industries," Papers 1404.6227, arXiv.org, revised Apr 2014.
    9. Yoshiyuki Ogata & Kazuto Mannen & Yasuto Kotani & Naohiro Kimura & Nozomu Sakurai & Daisuke Shibata & Hideyuki Suzuki, 2018. "ConfeitoGUI: A toolkit for size-sensitive community detection from a correlation network," PLOS ONE, Public Library of Science, vol. 13(10), pages 1-18, October.
    10. S. Vijayakumar Bharathi, 2017. "Prioritizing and Ranking the Big Data Information Security Risk Spectrum," Global Journal of Flexible Systems Management, Springer;Global Institute of Flexible Systems Management, vol. 18(3), pages 183-201, September.
    11. Jonathan E Butner & Ascher K Munion & Brian R W Baucom & Alexander Wong, 2019. "Ghost hunting in the nonlinear dynamic machine," PLOS ONE, Public Library of Science, vol. 14(12), pages 1-21, December.
    12. Subhroshekhar Ghosh & Soumendu Sundar Mukherjee, 2022. "Learning with latent group sparsity via heat flow dynamics on networks," Papers 2201.08326, arXiv.org.
    13. J. Lars Kirkby & Dang H. Nguyen & Duy Nguyen & Nhu N. Nguyen, 2022. "Inversion-free subsampling Newton’s method for large sample logistic regression," Statistical Papers, Springer, vol. 63(3), pages 943-963, June.
    14. Zbysław Dobrowolski, 2021. "Internet of Things and Other E-Solutions in Supply Chain Management May Generate Threats in the Energy Sector—The Quest for Preventive Measures," Energies, MDPI, vol. 14(17), pages 1-11, August.
    15. Matteo Fontana & Massimo Tavoni & Simone Vantini, 2019. "Functional Data Analysis of high-frequency load curves reveals drivers of residential electricity consumption," PLOS ONE, Public Library of Science, vol. 14(6), pages 1-16, June.
    16. Lu Jiang & Xinyu Kang & Shan Huang & Bo Yang, 2022. "A refinement strategy for identification of scientific software from bioinformatics publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(6), pages 3293-3316, June.
    17. Gamermann, Daniel & Antunes, Felipe Leite, 2018. "Statistical analysis of Brazilian electoral campaigns via Benford’s law," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 496(C), pages 171-188.
    18. Alberto Fernández & Sara Río & Abdullah Bawakid & Francisco Herrera, 2017. "Fuzzy rule based classification systems for big data with MapReduce: granularity analysis," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 11(4), pages 711-730, December.
    19. Akshay Mani & Resmi Ravindran & Soujanya Mannepalli & Daniel Vang & Paul A Luciw & Michael Hogarth & Imran H Khan & Viswanathan V Krishnan, 2015. "Data Mining Strategies to Improve Multiplex Microbead Immunoassay Tolerance in a Mouse Model of Infectious Diseases," PLOS ONE, Public Library of Science, vol. 10(1), pages 1-19, January.
    20. Felwa Abukhodair & Wafaa Alsaggaf & Amani Tariq Jamal & Sayed Abdel-Khalek & Romany F. Mansour, 2021. "An Intelligent Metaheuristic Binary Pigeon Optimization-Based Feature Selection and Big Data Classification in a MapReduce Environment," Mathematics, MDPI, vol. 9(20), pages 1-14, October.
