The untapped value of research data

We sometimes lament that UK investment in research and science is a small
fraction of GDP, but it still runs at £4.6bn per annum [1]. Quite rightly,
the UK Research Councils [2] expect the results of this research, including
the underpinning research data, to be made openly accessible where
possible, both so that others can build on the work and so that the
research can be repeated and verified [3].

In some cases, national ‘data centres’ support the long-term availability
and reuse of research outputs, for example the UK Data Archive, the
Archaeology Data Service, and the British Atmospheric Data Centre.

A recent Jisc-sponsored cost/benefit analysis of some of these data
centres shows very clearly that they have a strongly positive impact [4].
To quote the report:

“Very significant increases in research, teaching and studying efficiency
were realised by the users as a result of their use of the data centres;

The value to users exceeds the investment made in data sharing and
curation via the centres in all three cases; and

By facilitating additional use, the data centres significantly increase
the measurable returns on investment in the creation/collection of the
data hosted.”

But in many disciplines there simply isn’t a national service.  Or, in the
case of the Arts and Humanities Data Service, it no longer exists.  It
then falls to an institution or individual researchers to manage and make
accessible their research outputs.  It’s up to the institution to somehow
achieve these benefits, either alone or in collaboration with others.  I
don’t envy anyone within an institution trying to do this in today’s tough
climate!

Contrast this with a report to the Australian National Data Service, which
shows that the economic benefits of research data curation and access can
be huge [5].

“Our estimates suggest that the potential value of research data
repositories for Australia might be at least $1.8 billion and possibly up
to $5.5 billion per annum.”

“If current data curation and sharing is in the range of 10% to 20% of the
research data being produced, then some $1.4 billion to $4.9 billion in
annualised benefits may remain as yet unrealised.”
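The two ranges in the quote appear to fit together with simple arithmetic. As a minimal sketch, assuming (our reading, not something the quoted text states) that the unrealised benefit is the total potential value multiplied by the share of data not yet curated and shared:

```python
# Hedged reconstruction of the arithmetic behind the ANDS figures.
# Assumption (ours, not the report's stated method):
#   unrealised benefit = potential value x (1 - share already curated and shared)

def unrealised_range(value_low, value_high, shared_low, shared_high):
    """Return (low, high) unrealised annual benefit, in $ billion.

    The low end pairs the smallest potential value with the largest share
    already shared; the high end pairs the largest value with the smallest share.
    """
    low = value_low * (1 - shared_high)
    high = value_high * (1 - shared_low)
    return low, high

# Inputs taken from the quoted report: $1.8bn-$5.5bn potential, 10%-20% shared.
low, high = unrealised_range(1.8, 5.5, 0.10, 0.20)
print(f"${low:.2f}bn to ${high:.2f}bn")
```

This gives $1.44bn to $4.95bn, which, cut to one decimal place, matches the quoted $1.4 billion to $4.9 billion. The report may have reached its figures differently; this only shows that the numbers are consistent.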

Back in the UK, even where there is a national data centre, less than 1.5%
of Research Council budgets goes into supporting these services. Not only
is there an apparent shortfall in investment in realising the value of
research data, both nationally and within institutions, but there are also
worrying consequences if research data isn’t properly curated, preserved
and shared.

A study published in Current Biology [6] looked at 516 articles published
between 2 and 22 years earlier, and found that the odds of a data set
being available fell by 17% per year.
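A 17% annual fall in the odds compounds year on year. The sketch below is a simplification, assuming a constant yearly odds factor of 0.83 applied to every article. Odds are not the same as probability, and the starting odds of 4 (an 80% chance) are a hypothetical value chosen only for illustration, not a figure from the study:

```python
def odds_after(years, start_odds, annual_factor=0.83):
    """Odds that a data set is still available after `years`, assuming the
    odds shrink by the same factor every year (0.83 means a 17% fall)."""
    return start_odds * annual_factor ** years

def odds_to_probability(odds):
    """Convert odds (available : unavailable) into a probability."""
    return odds / (1 + odds)

# Hypothetical starting point: odds of 4:1 (an 80% chance) for a new article.
for years in (0, 5, 10, 20):
    odds = odds_after(years, start_odds=4.0)
    print(f"{years:2d} years: odds {odds:.2f}, "
          f"probability {odds_to_probability(odds):.0%}")
```

Under these illustrative assumptions the odds after ten years are about 15% of their starting value (0.83^10 ≈ 0.155), and after twenty years about 2.4%.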

“Our results reinforce the notion that, in the long term, research data
cannot be reliably preserved by individual researchers, and further
demonstrate the urgent need for policies mandating data sharing via public
archives.”

A much larger study published in PLOS ONE last year [7] found that one in
five Science, Technology and Medicine (STM) articles suffers from
‘reference rot’: “For over one million references to web resources
extracted from over 3.5 million articles … we find one out of five STM
articles suffering from reference rot, meaning it is impossible to revisit
the web context that surrounds them some time after their publication.”

This is worrying, given that a detailed review [8] of biomedical and
life-science research articles indexed by PubMed as retracted found that
67.4% of retractions were attributable to misconduct, including fraud or
suspected fraud (43.4%), duplicate publication (14.2%), and plagiarism
(9.8%). The percentage of scientific articles retracted because of fraud
has increased roughly 10-fold since 1975.

Without the availability of data, how can this research be verified and
trusted, let alone shared and reused?

On the one hand, the economic benefits of research data repositories and
curation seem clear. On the other, the consequences of not having open,
persistent and trustworthy data sources seem equally clear. It is all
‘grist to the mill’, as they say, when building a case at all levels for
more investment in research data management and services. With the CREST
RDMS project looking at new and cost-effective ways to build these
services, it really does look like a great time to be involved in getting
this off the ground!

Matthew Addis
Chief Technology Officer

web:            www.arkivum.com
twitter:        @arkivum

…..

[1] https://www.gov.uk/government/publications/2010-to-2015-government-policy-research-and-development/2010-to-2015-government-policy-research-and-development

[2] http://www.rcuk.ac.uk/

[3] http://www.rcuk.ac.uk/research/datapolicy/

[4] Jisc (2014): The value and impact of data sharing and curation –
synthesis of three recent UK studies.
http://repository.jisc.ac.uk/5568/1/iDF308_-_Digital_Infrastructure_Directions_Report,_Jan14_v1-04.pdf

[5] Open Research Data Report to the Australian National Data Service
(ANDS). November 2014.
http://ands.org.au/resource/open-research-data-report.pdf

[6] The Availability of Research Data Declines Rapidly with Article Age.
Current Biology, 6 January 2014.
http://www.cell.com/current-biology/abstract/S0960-9822%2813%2901400-0

[7] Scholarly Context Not Found: One in Five Articles Suffers from
Reference Rot. PLOS ONE, 26 December 2014.
DOI: 10.1371/journal.pone.0115253
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0115253

[8] Misconduct accounts for the majority of retracted scientific
publications.  Proc. National Academy of Sciences, 2012.
http://www.pnas.org/content/109/42/17028.full