Printed from https://ideas.repec.org/getdata.html
 

RePEc: getting the metadata

RePEc is highly decentralized and pulled together from many sources. It can thus be quite difficult to get all you need even though it is freely available. This document should guide you to get what you need. Some data is available on request only, in particular when privacy concerns come into play. Email addresses are released under no circumstances.

Note that the archives participating in RePEc as well as the people volunteering with RePEc do so with the understanding that the collected data will be put to good use. This does not include commercial use. If you want to use RePEc data for commercial use, please first contact RePEc. Typically, we would require substantial contributions of data to RePEc for a commercial use to have a chance of being tolerated.

We strongly discourage scraping the data from the websites. This puts unnecessary strain on our servers, and we have repeatedly noticed misconfigured scraping scripts running amok. You are also very unlikely to get complete data that way.

Principles

The basic metadata is provided by publishers. Every RePEc service gets the metadata directly from the publishers and massages it in various ways for its users. Some services then provide additional data and make it available.

Anybody interested in handling RePEc metadata should become familiar with the Guilford Protocol, which defines how the data files can be found and how they are structured, and with the ReDIF format, which defines the metadata fields and conventions.
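To give a feel for what ReDIF data looks like once downloaded, here is a minimal Python sketch of a reader for "Field-Name: value" templates. It is not the official parser (ReDIF-perl remains the reference implementation) and it simplifies: it assumes that every template starts with a Template-Type field, that a field name is followed by a colon and whitespace, and that any other non-empty line continues the previous field. The function name parse_redif and the sample records (including the archive code "aaa") are hypothetical, made up for illustration.

```python
import re
from typing import Dict, List

# A field line: a name made of letters, digits and hyphens, then a colon,
# then whitespace and a value (or nothing). Simplifying assumption.
FIELD_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):(?:\s+(.*))?$")

# Hypothetical sample data; "aaa" and "wpaper" are placeholder codes.
SAMPLE = """\
Template-Type: ReDIF-Paper 1.0
Title: An example
 title continued
Author-Name: A. Author
Author-Name: B. Author
Handle: RePEc:aaa:wpaper:001

Template-Type: ReDIF-Paper 1.0
Title: Second
Handle: RePEc:aaa:wpaper:002
"""


def parse_redif(text: str) -> List[Dict[str, List[str]]]:
    """Split ReDIF-like text into templates, each mapping field -> values."""
    templates: List[Dict[str, List[str]]] = []
    current = None      # template being filled
    last_field = None   # field that a continuation line would extend
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        match = FIELD_RE.match(raw)
        if match:
            name = match.group(1)
            value = (match.group(2) or "").strip()
            if name.lower() == "template-type":
                # Assumption: Template-Type opens a new template.
                current = {}
                templates.append(current)
            if current is None:
                continue  # ignore fields before the first template
            # Fields may repeat (e.g. several authors), so keep a list.
            current.setdefault(name, []).append(value)
            last_field = name
        elif current is not None and last_field is not None:
            # Continuation line: append to the most recent value.
            joined = current[last_field][-1] + " " + raw.strip()
            current[last_field][-1] = joined.strip()
    return templates


if __name__ == "__main__":
    for template in parse_redif(SAMPLE):
        print(template["Handle"][0], "|", template["Title"][0])
```

Values are kept as lists because fields such as Author-Name can occur several times in one template. The sketch does not model the grouping of related fields (for example, an author's name with that author's affiliation), which a real consumer of the data would need.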

Publisher data

Each publisher holds its RePEc metadata on its own web or anonymous FTP server. The addresses are listed in the archive templates, all of which are listed at the RePEc:all archive. This is the standard way to acquire the core RePEc data: get the metadata from each of the publisher archives. The remi software is very useful for acquiring this metadata from all the archives, and ReDIF-perl is useful for interpreting the data.

All the data can also be accessed in one place. There is, however, no guarantee that this copy is accurate or up to date; only the publisher archives can guarantee that. It is available as:

  1. ReDIF format
  2. AMF format
  3. OAI/PMH (sometimes flaky)
  4. Rsync

Person data

Basic metadata about people registered through the RePEc Author Service is available through the RePEc:per archive. Note that it does not contain citation data, and that it can only be interpreted alongside the full RePEc metadata, because it refers to works by their RePEc handles throughout.
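Since so much of the data is keyed by handles, a small helper for taking them apart can be handy. The Python sketch below assumes the usual colon-separated layout "RePEc:archive:series:item", where the item part may itself contain colons (as in journal article handles built from volume, year, and page). The names Handle and split_handle are hypothetical, and the codes "aaa" and "ssssss" used in the usage example are placeholders rather than real archives or series.

```python
from typing import NamedTuple, Optional


class Handle(NamedTuple):
    """Parts of a RePEc handle; series and item are None when absent."""
    archive: str
    series: Optional[str]
    item: Optional[str]


def split_handle(handle: str) -> Handle:
    """Split 'RePEc:archive[:series[:item]]' into its components."""
    # Split at most three times so colons inside the item part survive.
    parts = handle.strip().split(":", 3)
    if len(parts) < 2 or parts[0].lower() != "repec" or not parts[1]:
        raise ValueError(f"not a RePEc handle: {handle!r}")
    archive = parts[1]
    series = parts[2] if len(parts) > 2 else None
    item = parts[3] if len(parts) > 3 else None
    return Handle(archive, series, item)


if __name__ == "__main__":
    # Placeholder handle, not a real record.
    print(split_handle("RePEc:aaa:ssssss:v:1:y:2020:p:1-10"))
```

With such a helper, the handles found in person records can be grouped by archive or series before looking them up in the publisher metadata.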

Any additional person data is subject to privacy requirements.

Citation data

The citation data from the CitEc project can be obtained in two ways:
  1. AMF format
  2. plain lists of handles

Ranking data

The various impact factors for journals and series are available for all years and for the last ten years. Download and abstract view counts can be found at LogEc. Instructions for programmatic access to this data are here. The big files with monthly historic data for the latter are here. Additional ranking data is available at IDEAS, including historic data. Note that due to privacy concerns, data beyond what can be "screenscraped" is only available on request, and generally only in anonymized form for research purposes. We will work with national bodies if they want to use RePEc rankings for evaluation purposes.

Other data

There are two sources if you want to know which paper has been disseminated through which NEP report: the first and the second.

The data output from the CollEc project is here.

We link different versions of the same work to each other. The database with those links is found here.

Handles are supposed to be permanent, but sometimes series or journals move to a different archive. To translate the handles, use this.
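As an illustration of how such a translation might be applied, here is a minimal Python sketch. It assumes, purely for the example, that the translation data has already been loaded into a dictionary mapping an old series prefix ("RePEc:archive:series") to its new prefix; the actual layout of the translation data may well differ and should be checked first. The function translate_handle and every code in the usage example ("aaa", "bbb", "oldser", "newser") are hypothetical.

```python
from typing import Dict


def translate_handle(handle: str, moves: Dict[str, str]) -> str:
    """Rewrite a handle whose series has moved; otherwise return it as is.

    `moves` maps an old 'RePEc:archive:series' prefix to the new one.
    This structure is an assumption made for the sketch.
    """
    parts = handle.split(":", 3)
    if len(parts) < 3:
        return handle  # no series part, nothing to translate
    old_prefix = ":".join(parts[:3])
    new_prefix = moves.get(old_prefix)
    if new_prefix is None:
        return handle  # series has not moved
    # Re-attach the item part, if any, to the new prefix.
    return ":".join([new_prefix] + parts[3:])


if __name__ == "__main__":
    # Hypothetical move of one series from archive "aaa" to archive "bbb".
    moves = {"RePEc:aaa:oldser": "RePEc:bbb:newser"}
    print(translate_handle("RePEc:aaa:oldser:2020-01", moves))
```

The sketch applies one translation step only; if a series has moved more than once, the lookup would need to be repeated until no further mapping applies.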

The EconPapers syntax checker reports on syntax errors and warnings for ReDIF templates. It also checks the URLs of all links in the templates; look for results by archive in the files starting with "url_" in this directory.

There is more compiled data, but it may not be available because it has never been requested.

API

An API is now available. This tool is meant as a substitute when the options above do not work. For example, an API is not suited to downloading all the data, but rather to fetching specific slices at regular intervals or making repeated quick calls for small bits of data. An API limited to citation and reference data is also available through CitEc.
IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.