
OpenAPC: Technical Background

Christoph Broschinski, Bielefeld University Library, <broschinski@uni-bielefeld.de>

This presentation is also available at http://www.ub.uni-bielefeld.de/~cbroschinski





Timeline

OpenAPC: Started in 2014 as a private endeavour by our colleague Najko Jahn (Project Manager at Bielefeld University Library)

First participants: German university libraries that received APC funding through the Open Access Publishing Programme of the DFG (German Research Foundation)

Good opportunity to start an open data project on APCs:

2015: Application for DFG funding together with 2 partnering initiatives:

Funding granted (for 3 years, starting in October 2015), partners teamed up under new label INTACT

Data Storage and Schema

| column              | description                                                                          | source         | input_required |
|---------------------|--------------------------------------------------------------------------------------|----------------|----------------|
| institution         | Top-level organisation which covered the fee, e.g. Bielefeld University              | none           | mandatory      |
| period              | Year of APC payment (YYYY)                                                           | none           | mandatory      |
| euro                | The amount that was paid in EUR, including VAT and additional fees                   | none           | mandatory      |
| doi                 | Digital Object Identifier                                                            | none           | mandatory      |
| is_hybrid           | Has the article been published in a toll-access journal?                             | none           | mandatory      |
| publisher           | Name of the publishing house that charged the fee                                    | CrossRef       | optional       |
| journal_full_title  | Full name of the periodical that contains the article                                | CrossRef       | optional       |
| issn                | International Standard Serial Number. If more than one is available, collapse with ";" | CrossRef     | optional       |
| issn_print          | International Standard Serial Number, print version                                  | CrossRef       | no             |
| issn_electronic     | International Standard Serial Number, electronic version                             | CrossRef       | no             |
| license_ref         | Licence under which the article has been published                                   | CrossRef       | no             |
| indexed_in_crossref | Whether the contribution is registered with the DOI agency CrossRef                  | CrossRef       | no             |
| pmid                | ID for metadata records indexed in Europe PubMed Central (Europe PMC)                | Europe PMC     | no             |
| pmcid               | ID for articles available in the Europe PMC full-text collection                     | Europe PMC     | no             |
| ut                  | Web of Science unique item ID                                                        | Web of Science | no             |
| url                 | URL of the article if no DOI is available                                            | none           | optional       |
| doaj                | Is the journal indexed in the Directory of Open Access Journals (DOAJ)?              | DOAJ           | no             |

Institutions are required to follow the schema when contributing data. However:

Open APC schema "philosophy": Keep the set of mandatory fields as small as possible and infer as much data as possible automatically from external services!
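This philosophy can be sketched as a small validation step: only the five mandatory columns must be present and non-empty in a contributed row. The field names come from the schema table above; the validator itself (`check_row`) is a hypothetical illustration, not the project's actual tooling.

```python
import csv
import io

# The five mandatory Open APC columns (from the schema table above).
MANDATORY = ["institution", "period", "euro", "doi", "is_hybrid"]

def check_row(row):
    """Return the list of mandatory fields that are missing or empty."""
    return [f for f in MANDATORY if not row.get(f, "").strip()]

# Minimal example: one well-formed CSV row.
sample = io.StringIO(
    "institution,period,euro,doi,is_hybrid\n"
    "Bielefeld U,2014,1200.00,10.1000/xyz123,FALSE\n"
)
for row in csv.DictReader(sample):
    print(check_row(row))  # prints [] when all mandatory fields are filled
```

All other columns can stay empty on submission and be filled in later by the enrichment step.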

Data Submission Workflow

1) Institutions may submit their data in one of two ways:

2) Submitted files usually require minor editing in Excel (institution abbreviations, currency format, column header names)

3) Files are (re)exported to CSV format

4) Files are processed by our automatic metadata enrichment script

5) Contents of script output file can be directly copied into Open APC core data file!
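Step 4, the metadata enrichment, can be sketched against the Crossref REST API (`https://api.crossref.org/works/<DOI>`). The field names in the returned record (`publisher`, `container-title`, `ISSN`, `license`) are real Crossref response fields; the mapping function below is an illustrative sketch, not the actual OpenAPC enrichment script.

```python
def crossref_url(doi):
    """Build the Crossref REST API request URL for a DOI."""
    return "https://api.crossref.org/works/" + doi

def extract_metadata(message):
    """Map a Crossref 'message' record to Open APC schema columns,
    falling back to "NA" when a field is absent."""
    return {
        "publisher": message.get("publisher", "NA"),
        "journal_full_title": (message.get("container-title") or ["NA"])[0],
        # The schema asks to collapse multiple ISSNs with ";"
        "issn": ";".join(message.get("ISSN", [])) or "NA",
        "license_ref": (message.get("license") or [{}])[0].get("URL", "NA"),
    }
```

In this way a contribution consisting of little more than a DOI and a paid amount can be expanded into a full core data row.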

Report Generation

The same principle is used to generate the blog posts on our github.io blog

Automated integrity testing

Letting a script generate the contents of the Open APC core data file will ensure its syntactic correctness. But what about semantic correctness?
For example:

Solution: a Python-based testing script checks the core data for these problems.
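The kinds of checks such a script might perform can be sketched as follows. The specific rules shown (duplicate DOIs, non-positive amounts, malformed years) are plausible examples of semantic errors, not necessarily the actual test suite:

```python
import re

def find_issues(rows):
    """Scan core data rows (as dicts) for semantic problems and
    return a list of (row index, description) tuples."""
    issues = []
    seen_dois = set()
    for i, row in enumerate(rows):
        doi = row["doi"]
        # A DOI should identify exactly one article.
        if doi != "NA" and doi in seen_dois:
            issues.append((i, "duplicate DOI"))
        seen_dois.add(doi)
        # A paid fee should be a positive number.
        try:
            if float(row["euro"]) <= 0:
                issues.append((i, "non-positive euro value"))
        except ValueError:
            issues.append((i, "euro not numeric"))
        # The period must be a four-digit year.
        if not re.fullmatch(r"\d{4}", row["period"]):
            issues.append((i, "period not a YYYY year"))
    return issues
```

Running such checks on every contribution keeps errors from silently accumulating in the core data file.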

Advanced Visualisation and Analysis

Problem 1: The Open APC core data file is difficult to analyse without tool support.

Solution: An OLAP (Online Analytical Processing) server was set up to allow easier querying of the OpenAPC data.

Example: http://olap.intact-project.org/cube/openapc/aggregate?drilldown=publisher&cut=period:2014 (Meaning: "How much money did each individual publisher receive in 2014?")
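The example query uses the `drilldown` and `cut` parameters of the Cubes OLAP server. A small helper for building such URLs might look like this; the `|` separator for multiple cuts follows the Cubes convention, while the helper function itself is hypothetical:

```python
from urllib.parse import urlencode

def aggregate_url(drilldown, **cuts):
    """Build an aggregate query URL against the OpenAPC cube."""
    base = "http://olap.intact-project.org/cube/openapc/aggregate"
    params = {"drilldown": drilldown}
    # Multiple cuts are joined with "|" in the Cubes query syntax.
    cut = "|".join("%s:%s" % (k, v) for k, v in cuts.items())
    if cut:
        params["cut"] = cut
    return base + "?" + urlencode(params, safe=":")

# Reproduces the example query from the slide:
print(aggregate_url("publisher", period="2014"))
```

Any HTTP client can then fetch the resulting URL and receive the aggregated figures as JSON.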

Problem 2: The Open APC data is hard to explore / there are only static visualisations

Solution: The OLAP server is used as a backend for a treemap visualisation project

So I heard you want to start your own Open APC project...

Recommendations:

0. Have an initial group of institutions join and acquire some testing data to start with (Done)

1. Simply copy (most of) the structure and files of our GitHub repository

2. Define a data schema

3. Familiarise yourselves with the processing script, enrich some data and build an initial core data file

4. Adapt the README.Rmd template and use it to generate a custom README.md file from your data.

Questions?

Otherwise: Thanks for your attention...

...and let's get to work!