OpenAPC - A contribution to transparency in fee-based Open Access publishing

Christoph Broschinski, Bielefeld University Library, <broschinski@uni-bielefeld.de>

This presentation is also available at http://www.ub.uni-bielefeld.de/~cbroschinski





OpenAPC in a nutshell

OpenAPC is an Open Data project focused on Article Processing Charges (APCs)

Key properties

Statistics (as of 09/16)

Timeline

2014: OpenAPC started as a private endeavour by our colleague Najko Jahn (Project Manager at Bielefeld University Library)

2015: Application for DFG funding together with 2 partnering initiatives:

Funding granted (for 3 years, starting in October 2015); the partners teamed up under the new label INTACT

General Structure

OpenAPC is designed as a git project and hosted on GitHub

Why git?

Why GitHub?

In addition, OpenAPC operates a blog on GitHub Pages to report recent data submissions.

Data: Schema, Submission and Storage

column | description | source | input_required
--- | --- | --- | ---
institution | Top-level organisation which covered the fee, e.g. Bielefeld University | none | mandatory
period | Year of APC payment (YYYY) | none | mandatory
euro | The amount that was paid in EURO. Includes VAT and additional fees | none | mandatory
doi | Digital Object Identifier | none | mandatory
is_hybrid | Has the article been published in a toll access journal? | none | mandatory
publisher | Name of publication house that has charged the fee | CrossRef | optional
journal_full_title | Full name of periodical that contains the article | CrossRef | optional
issn | International Standard Serial Number | CrossRef | optional
issn_print | International Standard Serial Number - print version | CrossRef | no
issn_electronic | International Standard Serial Number - electronic version | CrossRef | no
issn_l | Linking International Standard Serial Number | ISSN International Centre | no
license_ref | License under which the research paper has been published | CrossRef | no
indexed_in_crossref | Checks if the contribution is registered with the DOI agency CrossRef | CrossRef | no
pmid | ID for metadata records indexed in Europe PubMed Central (Europe PMC) | Europe PMC | no
pmcid | ID for articles available in Europe PubMed Central full text collection | Europe PMC | no
ut | Web of Science unique item ID | Web of Science | no
url | URL to article if no DOI is available | none | optional
doaj | Is the journal indexed in the Directory of Open Access Journals (DOAJ)? | DOAJ | no
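To make the mandatory part of the schema concrete, here is a minimal Python sketch of a per-row check. It is not OpenAPC's own tooling: the function name `validate_row`, the error messages and the sample rows (including the DOI) are made up for illustration, and treating `url` as an acceptable fallback for a missing `doi` is an assumption based on the description of the `url` column.

```python
import csv
import io
import re

# Columns the schema marks as mandatory. "doi" is handled separately below,
# because the schema also offers "url" for articles without a DOI (assumption).
MANDATORY = ("institution", "period", "euro", "is_hybrid")


def validate_row(row):
    """Return a list of problems found in one submitted row (empty if none)."""
    problems = []
    for column in MANDATORY:
        if not (row.get(column) or "").strip():
            problems.append(f"missing {column}")
    has_doi = bool((row.get("doi") or "").strip())
    has_url = bool((row.get("url") or "").strip())
    if not has_doi and not has_url:
        problems.append("missing doi (and no url given as fallback)")
    period = (row.get("period") or "").strip()
    if period and not re.fullmatch(r"\d{4}", period):
        problems.append("period is not a four-digit year")
    euro = (row.get("euro") or "").strip()
    if euro:
        try:
            if float(euro) <= 0:
                problems.append("euro is not positive")
        except ValueError:
            problems.append("euro is not a number")
    return problems


# Hypothetical submission: the first data row is fine, the second is not.
sample = io.StringIO(
    "institution,period,euro,doi,is_hybrid\n"
    "Bielefeld University,2015,1200.50,10.1234/example.1,FALSE\n"
    "Bielefeld University,15,abc,,TRUE\n"
)
for line_no, row in enumerate(csv.DictReader(sample), start=2):
    print(line_no, validate_row(row))
```

A check like this only covers the columns an institution has to supply itself; the columns with a named source in the table are filled in later from external services.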

Institutions are required to follow the schema when contributing data. However:

Institutions may submit their data in one of two ways:

Open Data: The next Level

Managing a large data aggregation project like OpenAPC is time-consuming and prone to mistakes. What techniques can be employed to reduce the workload and improve results?

Problem 1: Submitted files are often inconsistent, yet they have to be enriched with metadata from several external sources
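The following Python sketch shows one way such a clean-up and enrichment step could look. It is an illustration under stated assumptions, not the actual OpenAPC enrichment script: all function names are hypothetical, the accepted spellings for booleans and amounts are guesses at typical inconsistencies, and `lookup` stands in for whatever call fetches metadata from an external source such as CrossRef.

```python
DOI_PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/", "doi:",
)
TRUE_VALUES = {"true", "t", "yes", "y", "1", "ja"}
FALSE_VALUES = {"false", "f", "no", "n", "0", "nein"}


def normalise_doi(raw):
    """Strip whitespace and resolver prefixes; DOIs are case-insensitive."""
    doi = raw.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()


def normalise_euro(raw):
    """Accept '1.200,50', '1200,50' and '1200.50'; return a float."""
    value = raw.strip().replace("€", "").replace(" ", "")
    if "," in value:
        # Assumption: a comma means German-style notation, where '.' groups
        # thousands and ',' marks decimals. '1,200.50' would be misread.
        value = value.replace(".", "").replace(",", ".")
    return float(value)


def normalise_bool(raw):
    """Map the various spellings of yes/no to 'TRUE' / 'FALSE'."""
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return "TRUE"
    if value in FALSE_VALUES:
        return "FALSE"
    raise ValueError(f"cannot interpret {raw!r} as a boolean")


def enrich(row, lookup):
    """Clean one row, then fill empty metadata columns via `lookup(doi)`."""
    clean = dict(row)
    clean["doi"] = normalise_doi(row["doi"])
    clean["euro"] = normalise_euro(row["euro"])
    clean["is_hybrid"] = normalise_bool(row["is_hybrid"])
    metadata = lookup(clean["doi"]) or {}
    for column in ("publisher", "journal_full_title", "issn"):
        if not clean.get(column):
            clean[column] = metadata.get(column, "")
    return clean


# A dictionary stands in for the external metadata service here.
fake_source = {"10.1234/abc": {"publisher": "Example Press"}}
row = {"doi": "doi:10.1234/ABC", "euro": "1.200,50", "is_hybrid": "nein"}
print(enrich(row, fake_source.get))
```

Passing the lookup in as a parameter keeps the normalisation testable without network access, and values supplied by the institution are never overwritten by fetched metadata.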

Problem 2: Statistics and numbers on the GitHub front page have to be kept up to date whenever the core data file changes.
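One way to avoid updating such figures by hand is to regenerate them from the data file on every change. The Python sketch below illustrates the idea only; it does not describe how the OpenAPC front page is actually produced, and the function names, the output layout and the sample figures are invented.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, median


def summarise(rows):
    """Compute per-institution article counts and fee statistics."""
    fees_by_institution = defaultdict(list)
    for row in rows:
        fees_by_institution[row["institution"]].append(float(row["euro"]))
    return {
        name: {
            "articles": len(fees),
            "total": sum(fees),
            "mean": mean(fees),
            "median": median(fees),
        }
        for name, fees in sorted(fees_by_institution.items())
    }


def render(summary):
    """Turn the summary into a plain-text table for a front page."""
    lines = ["institution | articles | total EUR | mean EUR | median EUR"]
    for name, s in summary.items():
        lines.append(
            f"{name} | {s['articles']} | {s['total']:.2f} | "
            f"{s['mean']:.2f} | {s['median']:.2f}"
        )
    return "\n".join(lines)


# Made-up sample standing in for the core data file.
core_data = io.StringIO(
    "institution,period,euro\n"
    "University A,2014,1000\n"
    "University A,2015,2000\n"
    "University B,2015,1500\n"
)
print(render(summarise(csv.DictReader(core_data))))
```

Run as part of the commit or build process, a script like this keeps the published numbers tied to the current state of the data file.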

Problem 3: Even with the enrichment script taking care of the syntax, every submission may still introduce semantic errors into the core data set
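Semantic errors are those a syntax check cannot see, for example the same article reported twice or one journal flagged as both hybrid and non-hybrid. The Python sketch below shows two such checks over the whole data set. The checks, function names and sample rows are illustrative assumptions, not the project's actual test suite.

```python
from collections import Counter, defaultdict


def find_duplicate_dois(rows):
    """Return DOIs that occur more than once (compared case-insensitively)."""
    dois = [(row.get("doi") or "").strip().lower() for row in rows]
    counts = Counter(doi for doi in dois if doi)
    return sorted(doi for doi, n in counts.items() if n > 1)


def find_inconsistent_hybrid_flags(rows):
    """Return journals that appear with more than one is_hybrid value."""
    flags = defaultdict(set)
    for row in rows:
        flags[row["journal_full_title"]].add(row["is_hybrid"])
    return sorted(title for title, seen in flags.items() if len(seen) > 1)


# Made-up rows: one DOI is reported by two institutions, and one journal
# carries contradictory hybrid flags.
rows = [
    {"doi": "10.1234/a", "journal_full_title": "Journal One", "is_hybrid": "FALSE"},
    {"doi": "10.1234/A", "journal_full_title": "Journal One", "is_hybrid": "TRUE"},
    {"doi": "10.1234/b", "journal_full_title": "Journal Two", "is_hybrid": "TRUE"},
]
print("duplicate DOIs:", find_duplicate_dois(rows))
print("inconsistent journals:", find_inconsistent_hybrid_flags(rows))
```

A hit from the second check is better treated as a warning than as a hard failure, since a journal can legitimately change its publishing model between payment years.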

Exploration and reuse: The OLAP server

The core data file is in CSV format. While this makes the data platform-independent and easy to process with software tools, it is not well suited to data exploration or querying.
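To illustrate the kind of question that is awkward to ask of a flat file, the Python sketch below performs a simple OLAP-style aggregation: filter the rows by fixed values (a "cut"), group them by one or more dimensions (a "drilldown"), and sum the fees per group. The function, its parameter names and the sample rows are illustrative only and not taken from the OpenAPC server.

```python
from collections import defaultdict


def aggregate(rows, drilldown, cut=None):
    """Count rows and sum fees per combination of the `drilldown` columns.

    `cut` maps column names to required values; rows that do not match
    every entry are skipped.
    """
    cut = cut or {}
    cells = defaultdict(lambda: {"count": 0, "euro_sum": 0.0})
    for row in rows:
        if any(row[column] != value for column, value in cut.items()):
            continue
        key = tuple(row[column] for column in drilldown)
        cells[key]["count"] += 1
        cells[key]["euro_sum"] += float(row["euro"])
    return dict(cells)


# Made-up rows standing in for the core data file.
rows = [
    {"institution": "University A", "period": "2014", "publisher": "Publisher X", "euro": "1000"},
    {"institution": "University A", "period": "2015", "publisher": "Publisher X", "euro": "1500"},
    {"institution": "University B", "period": "2015", "publisher": "Publisher Y", "euro": "2000"},
    {"institution": "University B", "period": "2015", "publisher": "Publisher X", "euro": "500"},
]

# "How much was paid to each publisher in 2015?"
for key, cell in sorted(aggregate(rows, ["publisher"], cut={"period": "2015"}).items()):
    print(key, cell)
```

An OLAP server answers queries of this shape on request, so users do not have to download the CSV file and write the grouping logic themselves.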

Examples:

Visualisation: Interactive treemaps

OLAP is a powerful tool for data analysis, but it is still ill-suited to gaining a quick overview of the data or spotting interesting facts.

To enable a visual approach for data exploration, a web site providing treemap visualisations was set up: http://treemaps.intact-project.org

Further plans & outlook

More participants, more data

Increase reuse options