Printed from https://ideas.repec.org/a/gam/jsusta/v6y2014i10p6529-6552d40740.html

A Focused Crawler for Borderlands Situation Information with Geographical Properties of Place Names

Author

Listed:
  • Dongyang Hou

    (School of Environment Science and Spatial Informatics, China University of Mining and Technology, Xuzhou 221116, China
    National Geomatics Center of China, 28 Lianhuachi West Road, Beijing 100830, China)

  • Hao Wu

    (National Geomatics Center of China, 28 Lianhuachi West Road, Beijing 100830, China)

  • Jun Chen

    (School of Environment Science and Spatial Informatics, China University of Mining and Technology, Xuzhou 221116, China
    National Geomatics Center of China, 28 Lianhuachi West Road, Beijing 100830, China)

  • Ran Li

    (National Geomatics Center of China, 28 Lianhuachi West Road, Beijing 100830, China)

Abstract

Place names are an important component of borderlands situation information and play a significant role in collecting it from the Internet with focused crawlers. However, current focused crawlers treat a place name like any other common keyword, ignoring its geographical properties, which may reduce their effectiveness. To solve this problem, this paper first discusses the importance of place names in focused crawlers in terms of location and spatial relation, and then proposes a two-tuple-based topic representation method to express place names and common keywords, respectively. Spatial relations between place names are then introduced into the calculation of the relevance between given topics and webpages, which makes the calculation more accurate. On this basis, a focused crawler prototype for borderlands situation information collection is designed and implemented. Crawling speed and F-Score are adopted to evaluate its efficiency and effectiveness. Experimental results indicate that the efficiency of the proposed focused crawler is consistent with the polite access interval and that it can meet the daily demand of borderlands situation information collection. Additionally, the F-Score of the proposed focused crawler increases by around 7%, which means that it is more effective than the traditional best-first focused crawler.
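
The abstract names F-Score as its effectiveness measure but this record does not give the formula. As a rough illustration only, the sketch below computes the standard F-measure (the weighted harmonic mean of precision and recall) from raw page counts. The function name, its parameters, and the example counts are hypothetical and are not taken from the paper, which may define its variant differently.

```python
def f_score(relevant_retrieved: int, retrieved: int, relevant_total: int,
            beta: float = 1.0) -> float:
    """Standard F-beta score from raw counts (illustrative, not the paper's code).

    relevant_retrieved: crawled pages judged relevant to the topic
    retrieved:          all pages the crawler downloaded
    relevant_total:     all relevant pages in the evaluation set
    """
    # No downloaded pages or no relevant pages: the score is defined as 0 here.
    if retrieved == 0 or relevant_total == 0:
        return 0.0
    precision = relevant_retrieved / retrieved
    recall = relevant_retrieved / relevant_total
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 100 pages crawled, 60 of them relevant,
# 80 relevant pages in the evaluation set.
score = f_score(relevant_retrieved=60, retrieved=100, relevant_total=80)
print(round(score, 4))
```

With beta = 1 the measure weights precision and recall equally, so a crawler that downloads many irrelevant pages is penalized as much as one that misses relevant ones.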

Suggested Citation

  • Dongyang Hou & Hao Wu & Jun Chen & Ran Li, 2014. "A Focused Crawler for Borderlands Situation Information with Geographical Properties of Place Names," Sustainability, MDPI, vol. 6(10), pages 1-24, September.
  • Handle: RePEc:gam:jsusta:v:6:y:2014:i:10:p:6529-6552:d:40740

    Download full text from publisher

    File URL: https://www.mdpi.com/2071-1050/6/10/6529/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2071-1050/6/10/6529/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kritana Prueksakorn & Cheng-Xu Piao & Hyunchul Ha & Taehyeung Kim, 2015. "Computational and Experimental Investigation for an Optimal Design of Industrial Windows to Allow Natural Ventilation during Wind-Driven Rain," Sustainability, MDPI, vol. 7(8), pages 1-22, August.
    2. Hualin Xie & Jinlang Zou & Hailing Jiang & Ning Zhang & Yongrok Choi, 2014. "Spatiotemporal Pattern and Driving Forces of Arable Land-Use Intensity in China: Toward Sustainable Land Management Using Emergy Analysis," Sustainability, MDPI, vol. 6(6), pages 1-17, May.
    3. Stephan E. Maurer & Andrei V. Potlogea, 2021. "Male‐biased Demand Shocks and Women's Labour Force Participation: Evidence from Large Oil Field Discoveries," Economica, London School of Economics and Political Science, vol. 88(349), pages 167-188, January.
    4. Tie Hua Zhou & Ling Wang & Keun Ho Ryu, 2015. "Supporting Keyword Search for Image Retrieval with Integration of Probabilistic Annotation," Sustainability, MDPI, vol. 7(5), pages 1-18, May.
    5. T. Karski, 2019. "Opinions and Controversies in Problem of The So-Called Idiopathic Scoliosis. Information About Etiology, New Classification and New Therapy," Biomedical Journal of Scientific & Technical Research, Biomedical Research Network+, LLC, vol. 12(5), pages 9612-9616, January.
    6. Sung-Won Park & Sung-Yong Son, 2017. "Cost Analysis for a Hybrid Advanced Metering Infrastructure in Korea," Energies, MDPI, vol. 10(9), pages 1-18, September.
    7. Wesley Mendes-da-Silva, 2020. "What Makes an Article be More Cited?," RAC - Revista de Administração Contemporânea (Journal of Contemporary Administration), ANPAD - Associação Nacional de Pós-Graduação e Pesquisa em Administração, vol. 24(6), pages 507-513.
    8. Martin Valtierra-Rodriguez & Juan Pablo Amezquita-Sanchez & Arturo Garcia-Perez & David Camarena-Martinez, 2019. "Complete Ensemble Empirical Mode Decomposition on FPGA for Condition Monitoring of Broken Bars in Induction Motors," Mathematics, MDPI, vol. 7(9), pages 1-19, August.
    9. Akca Yasar & Gokhan Ozer, 2016. "Determination the Factors that Affect the Use of Enterprise Resource Planning Information System through Technology Acceptance Model," International Journal of Business and Management, Canadian Center of Science and Education, vol. 11(10), pages 1-91, September.
    10. Julián Miranda & Angélica Flórez & Gustavo Ospina & Ciro Gamboa & Carlos Flórez & Miguel Altuve, 2020. "Proposal for a System Model for Offline Seismic Event Detection in Colombia," Future Internet, MDPI, vol. 12(12), pages 1-17, December.
    11. Wisdom Akpalu & Mintewab Bezabih, 2015. "Tenure Insecurity, Climate Variability and Renting out Decisions among Female Small-Holder Farmers in Ethiopia," Sustainability, MDPI, vol. 7(6), pages 1-16, June.
    12. Wei Chen & Shu-Yu Liu & Chih-Han Chen & Yi-Shan Lee, 2011. "Bounded Memory, Inertia, Sampling and Weighting Model for Market Entry Games," Games, MDPI, vol. 2(1), pages 1-13, March.
    13. David Harborth & Sebastian Pape, 2020. "Empirically Investigating Extraneous Influences on the “APCO” Model—Childhood Brand Nostalgia and the Positivity Bias," Future Internet, MDPI, vol. 12(12), pages 1-16, December.
    14. Ping Wang & Jie Wang & Guiwu Wei & Cun Wei, 2019. "Similarity Measures of q-Rung Orthopair Fuzzy Sets Based on Cosine Function and Their Applications," Mathematics, MDPI, vol. 7(4), pages 1-23, April.
    15. Peterson, Willis L., 1973. "Publication Productivities Of U.S. Economics Department Graduates," Staff Papers 14105, University of Minnesota, Department of Applied Economics.
    16. Taeyeoun Roh & Yujin Jeong & Byungun Yoon, 2017. "Developing a Methodology of Structuring and Layering Technological Information in Patent Documents through Natural Language Processing," Sustainability, MDPI, vol. 9(11), pages 1-19, November.
    17. He-Yau Kang & Amy H. I. Lee & Tzu-Ting Huang, 2016. "Project Management for a Wind Turbine Construction by Applying Fuzzy Multiple Objective Linear Programming Models," Energies, MDPI, vol. 9(12), pages 1-15, December.
    18. Vasilyeva, Olga, 2021. "Agro-food clusters in the Republic of Kazakhstan: assessment and prospects of development," Economic Consultant, Roman I. Ostapenko, vol. 34(2), pages 13-20.
    19. Chris Lytridis & Anna Lekova & Christos Bazinas & Michail Manios & Vassilis G. Kaburlasos, 2020. "WINkNN: Windowed Intervals’ Number kNN Classifier for Efficient Time-Series Applications," Mathematics, MDPI, vol. 8(3), pages 1-14, March.
    20. Richard J. Ciotola & Jay F. Martin & Juan M. Castaño & Jiyoung Lee & Frederick Michel, 2013. "Microbial Community Response to Seasonal Temperature Variation in a Small-Scale Anaerobic Digester," Energies, MDPI, vol. 6(10), pages 1-18, October.
