Christoph Broschinski, Bielefeld University Library,
<broschinski@uni-bielefeld.de>
This presentation is also available at http://www.ub.uni-bielefeld.de/~cbroschinski
OpenAPC: Started in 2014 as a private endeavour by our colleague Najko Jahn (Project Manager at Bielefeld University Library)
First Participants: German University Libraries receiving funding for paying APCs from the Open Access Publishing Programme set up by the DFG (German Research Foundation)
Good opportunity to start an open data project on APCs:
2015: Application for DFG funding together with 2 partnering initiatives:
Funding granted (for 3 years, starting in October 2015), partners teamed up under new label INTACT
column | description | source | input_required
---|---|---|---
institution | Top-level organisation which covered the fee, e.g. Bielefeld University | none | mandatory
period | Year of APC payment (YYYY) | none | mandatory
euro | The amount that was paid in EURO, including VAT and additional fees | none | mandatory
doi | Digital Object Identifier | none | mandatory
is_hybrid | Has the article been published in a toll access journal? | none | mandatory
publisher | Name of the publishing house that charged the fee | CrossRef | optional
journal_full_title | Full name of the periodical that contains the article | CrossRef | optional
issn | International Standard Serial Number; if more than one is available, collapse with ";" | CrossRef | optional
issn_print | International Standard Serial Number (print version) | CrossRef | no
issn_electronic | International Standard Serial Number (electronic version) | CrossRef | no
license_ref | Licence under which the research paper has been published | CrossRef | no
indexed_in_crossref | Is the contribution registered with the DOI agency CrossRef? | CrossRef | no
pmid | ID for metadata records indexed in Europe PubMed Central (Europe PMC) | Europe PMC | no
pmcid | ID for articles available in the Europe PMC full text collection | Europe PMC | no
ut | Web of Science unique item ID | Web of Science | no
url | URL of the article if no DOI is available | none | optional
doaj | Is the journal indexed in the Directory of Open Access Journals (DOAJ)? | DOAJ | no
Institutions are required to follow the schema when contributing data. However:
Open APC schema "philosophy": Keep the set of mandatory fields as small as possible and infer as much data as possible automatically from external services!
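This philosophy can be sketched in a few lines of Python. The snippet below is illustrative only (not the official Open APC tooling), and the sample DOI `10.1000/182` is just a placeholder: it checks that the five mandatory schema fields are present and non-empty in a contributed record.

```python
# Illustrative schema check (not the official Open APC tooling):
# verify that the mandatory fields are present and non-empty.

MANDATORY_FIELDS = ["institution", "period", "euro", "doi", "is_hybrid"]

def missing_mandatory(record):
    """Return the mandatory fields that are absent or empty in a record."""
    return [f for f in MANDATORY_FIELDS if not record.get(f, "").strip()]

row = {
    "institution": "Bielefeld University",
    "period": "2014",
    "euro": "1200.00",
    "doi": "10.1000/182",   # placeholder DOI, for illustration only
    "is_hybrid": "FALSE",
}

print(missing_mandatory(row))   # a complete row yields an empty list
print(missing_mandatory({}))    # an empty record flags all five fields
```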
1) Institutions may submit their data in one of two ways:
2) Submitted files usually require minor corrections in Excel (institution abbreviations, currency format, column header names)
3) Files are (re)exported to CSV format
4) Files are processed by our automatic metadata enrichment script
5) Contents of script output file can be directly copied into Open APC core data file!
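The lookup in step 4 can be sketched as follows. This is an assumption-laden illustration, not the actual enrichment script: it builds a request URL for the public CrossRef REST API (`api.crossref.org/works/<doi>`) and extracts the schema fields from a works response; to keep the example runnable offline, it parses a small canned JSON snippet instead of making a live request.

```python
# Sketch of the kind of lookup the enrichment step performs (step 4):
# given a DOI, the CrossRef works API can supply publisher, journal
# title and ISSNs. Illustrative only, not the Open APC script itself.
import json
from urllib.parse import quote

def crossref_url(doi):
    """Build the CrossRef works URL for a DOI."""
    return "https://api.crossref.org/works/" + quote(doi)

def extract_metadata(crossref_json):
    """Pull the schema fields sourced from CrossRef out of a works response."""
    msg = crossref_json["message"]
    titles = msg.get("container-title", [])
    return {
        "publisher": msg.get("publisher", "NA"),
        "journal_full_title": titles[0] if titles else "NA",
        "issn": ";".join(msg.get("ISSN", [])),  # collapse multiple ISSNs with ";"
    }

# Canned, abridged works response so the example runs offline:
sample = json.loads("""{"message": {"publisher": "Example Press",
  "container-title": ["Journal of Examples"], "ISSN": ["1234-5678"]}}""")

print(crossref_url("10.1000/182"))
print(extract_metadata(sample))
```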
R -e "knitr::knit('README.Rmd')"
from the Open APC main directory. The same principle is used to generate the blog posts on our github.io blog.
Letting a script generate the contents of the Open APC core data file will ensure its syntactic correctness. But what about semantic correctness?
For example:
Solution: A Python-based testing script that checks the core data for these problems.
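One kind of semantic check such a script can apply is validating ISSN check digits, which catches most typos in ISSN columns. The sketch below is illustrative (the actual test suite is not reproduced here); it implements the standard ISSN checksum (weights 8 down to 2, modulo 11, with "X" standing for 10).

```python
# Illustrative semantic check: validate the check digit of an ISSN.
# The last character of an ISSN is a checksum over the first seven digits
# (weights 8..2, modulo 11, "X" represents the value 10).

def issn_is_valid(issn):
    """Check the format and check digit of an ISSN like '0378-5955'."""
    digits = issn.replace("-", "")
    if len(digits) != 8:
        return False
    try:
        total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    except ValueError:
        return False  # non-digit character among the first seven positions
    check = (11 - total % 11) % 11
    return digits[7] == ("X" if check == 10 else str(check))

print(issn_is_valid("0378-5955"))  # valid check digit
print(issn_is_valid("0378-5954"))  # check digit off by one
```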
Problem 1: The Open APC core data file is difficult to analyse without tool support.
Solution: An OLAP (Online Analytical Processing) server was set up to allow easier querying of OpenAPC data.
Example: http://olap.intact-project.org/cube/openapc/aggregate?drilldown=publisher&cut=period:2014 (Meaning: "How much money did each individual publisher receive in 2014?")
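Such queries can also be composed programmatically. The sketch below is based only on the `drilldown`/`cut` parameters visible in the example URL above; the `|` separator for combining multiple cuts is an assumption borrowed from the cubes OLAP framework's conventions.

```python
# Sketch: composing an aggregate query against the OLAP server, using the
# drilldown/cut parameters shown in the example URL. The "|" joiner for
# multiple cuts is assumed (a cubes framework convention).
from urllib.parse import urlencode

BASE = "http://olap.intact-project.org/cube/openapc/aggregate"

def aggregate_url(drilldown, **cuts):
    """Compose an aggregate URL, e.g. spending per publisher in one year."""
    params = {"drilldown": drilldown}
    if cuts:
        params["cut"] = "|".join(f"{k}:{v}" for k, v in cuts.items())
    return BASE + "?" + urlencode(params, safe=":|")

print(aggregate_url("publisher", period=2014))
# -> the example query above: spending per publisher in 2014
```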
Problem 2: The Open APC data is hard to explore / there are only static visualisations
Solution: The OLAP server is used as a backend for a treemap visualisation project
Recommendations:
0. Have an initial group of institutions join and acquire some testing data to start with (Done)
1. Simply copy (most of) the structure and files of our GitHub repository
2. Define a data schema
3. Familiarise yourselves with the processing script, enrich some data and build an initial core data file
4. Adapt the README.Rmd template and use it to generate a custom README.md file from your data.
Otherwise: Thanks for your attention...
...and let's get to work!