The Natural Asset Register Data Portal (NAR:DP) is a website funded by the Scottish Government that provides open access to research outputs related to Scotland's natural assets.

The spatial datasets listed on the portal are produced by researchers in six Scottish Environment, Food and Agricultural Research Institutes (SEFARI). The portal is built on the CKAN platform and hosted by the James Hutton Institute (JHI).

According to the portal's metadata catalogue, all but one of the 45 research outputs are available for re-use on open terms, because they are covered by an open licence or are in the public domain.
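Because the portal runs on CKAN, the licence labels in its catalogue can also be read programmatically through CKAN's Action API. The sketch below assumes the standard `package_search` endpoint is enabled on the portal and uses a placeholder base URL, because the portal's address is not given here. `fetch_packages` and `tally_licences` are hypothetical helper names, not part of CKAN.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

# Placeholder, not the portal's real address: substitute the CKAN site root.
BASE_URL = "https://ckan.example.org"


def fetch_packages(base_url: str, rows: int = 100) -> list:
    """Return dataset records from CKAN's Action API (package_search).

    Assumes the catalogue holds no more than `rows` records; a larger
    catalogue would need paging with the `start` parameter.
    """
    query = urllib.parse.urlencode({"rows": rows})
    url = f"{base_url}/api/3/action/package_search?{query}"
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    return payload["result"]["results"]


def tally_licences(packages: list) -> Counter:
    """Count the licence label each record declares ('(none)' if missing)."""
    return Counter(p.get("license_title") or "(none)" for p in packages)


# Example (performs a network request):
# for title, n in tally_licences(fetch_packages(BASE_URL)).most_common():
#     print(n, title)
```

A tally like this only reports what the metadata declares. The labels still have to be checked against the data files and their documentation.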

I have downloaded each of the available datasets, looked at the documentation, and put my notes into a spreadsheet.

In summary, I found that:

  • 7 records only make data available as a web mapping service (WMS) layer,
  • 2 records do not make any data available,
  • 36 records make data available for download,
  • 35 records make data available in an open format, and
  • only 6 records make data available with an open licence or status.

In other words, there is far less open data available from the Natural Asset Register Data Portal than the metadata suggests.


Conditions for open data

'Open data' is defined in the Open Definition, which is maintained by the Open Knowledge Foundation. (OKF is also the 'purpose trustee' for the CKAN project.)

Among other criteria, the Open Definition requires that data be available in an open format and be provided under an open licence – or be in the public domain, i.e. not covered by any exclusive intellectual property rights.

Data published in an open format may be 'open access', but that is not sufficient to make it open data. Open data requires both open access and open re-use.
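These requirements work as an all-of test: failing any one condition means the data is not open data. A minimal sketch, assuming a hypothetical `Record` type whose three boolean fields stand in for the checks described in this post (they are not fields of CKAN or of the portal's metadata):

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical summary of one catalogue record (illustrative only)."""
    title: str
    downloadable: bool   # data offered as files, not only as a WMS layer
    open_format: bool    # e.g. False for a proprietary-only file format
    open_licence: bool   # conformant open licence, or genuinely public domain


def is_open_data(record: Record) -> bool:
    """Open access and open re-use are both required: every check must pass."""
    return record.downloadable and record.open_format and record.open_licence
```

For example, a CC-BY dataset that is only served as a WMS layer fails on `downloadable` even though its licence is open.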

SEFARI is plainly aware of the Open Definition. All catalogue records on the Natural Asset Register Data Portal that are labelled either with specific open licences or as 'Other (Public Domain)' or 'Other (Open)' include one of OKF's open data buttons and a link to the Open Definition.


Non-licensing issues

The Natural Asset Register Data Portal publishes a range of interesting and useful spatial datasets. Most of the data is both packaged for download and served as WMS layers. In some cases that is not obvious from the metadata records: the user has to click the WMS preview to find the download link.

Most datasets are available in file formats that conform to the Open Definition. The exception is a collection of spatial layers from Glensaugh, one of JHI's research farms, which are only provided in ESRI's proprietary Layer Package (.lpkx) format.

Two catalogue records do not provide any data. One of those links to an ArcGIS storymap about Glentrool, a biosphere community in the Galloway Forest.

The other record relates to a study output, Functional and epiphytic biodiversity differences between nine tree species in the UK. The dataset is described as embargoed and to be "made available by 1 November 2021 at the latest". Using Google, I found the dataset published on CEH's EIDC portal under the Open Government Licence.


Licensing errors

One dataset is clearly labelled as non-open: Enhancing agrobiodiversity through participatory research. The licence is given as Creative Commons Non-Commercial (Any), with a link to the Attribution-NonCommercial 2.0 Generic licence (CC BY-NC 2.0).

However, according to the data file itself, the dataset is covered by the Attribution-NonCommercial-ShareAlike licence (CC BY-NC-SA). That is also a non-commercial licence, but it adds a share-alike condition: adaptations must be distributed under the same licence.

One dataset is labelled incorrectly as re-usable under the UK Open Government Licence: Daily streamflow data from gauge stations on main rivers of Scotland (1947-2017). The dataset contains records downloaded from CEH's National River Flow Archive website and is therefore subject to the NRFA data licence. The NRFA licence is not open, as it is not perpetual.


Lack of information

Seven catalogue records provide data for download without any information about the terms of re-use. One example is Areas of likely high exposure to radon in groundwater drinking water supplies in Scotland.

Most of those datasets are labelled as 'Other (Public Domain)', but it seems unlikely that they are free of copyright. Without more information on their provenance, it would be unwise to re-use those datasets for any serious business purpose.


The James Hutton Institute "Open Data Licence"

Twenty-two of the catalogue records describe datasets that are covered by the 'James Hutton Institute Open Data Licence'. This is a bespoke licence that incorporates terms from v2.0 of the Open Government Licence.

OGL v2.0 was deprecated in 2014, but remains conformant with the Open Definition.

The JHI's licence, however, is not conformant with the Open Definition and is not an open data licence.

The licence contains various additional conditions that effectively restrict re-use of the data in some fields of endeavour.

In particular, the terms require licensees to notify JHI when they create new data using JHI's data, and to license any derived data back to JHI:

4.2 You agree to let the Institute know of any data which have been created by you using the Data including (without limitation) any adaptation of the Data but excluding mere replication or reproduction of the Data (Derived Data), by email to: soils@hutton.ac.uk or writing to: Head of Research Support, The James Hutton Institute, Invergowrie, Dundee, DD2 5DA . This will assist the Institute to build a database of applications for the Data and to help improve the Institute's understanding of the uses of the Data.

4.3 If the Institute requests, you agree to provide a copy of any Derived Data to the Institute, to the address in section 4.2, and you hereby grant the Institute an irrevocable, royalty-free, non-exclusive, non-transferable licence to use any Derived Data for the purposes of building a database of applications for the Data and to help improve the Institute's understanding of the uses of the Data. The Institute would only share any Derived Data with third parties with your prior written permission.

Those conditions will be unacceptable in some re-use cases, where work with the data may be commercially confidential or the licensee may not wish to disclose it to JHI for other reasons.


The actual open data

Five datasets on the portal are covered by CC-BY, a conformant open licence. Four of those are open data – the fifth, Peat Surveys, is only available as a WMS layer.

Another dataset, Land Cover, is a collection of mapping outputs derived from the 1988 Land Cover of Scotland project (LCS88). The files are packaged with a copy of the old OS OpenData Licence. I am somewhat sceptical that Ordnance Survey is the only owner of IP rights in the data. However, the OS OpenData Licence is conformant with the Open Definition.

The sixth open dataset is Restoration of portion of Logie Burn, Scotland (2011 to 2014). Although the catalogue record itself does not specify an open licence, it links to further metadata on the PANGAEA website that identifies the licence as CC-BY.


Recommendations

SEFARI should review the metadata catalogue on the Natural Asset Register Data Portal, correct the licensing errors, and make the licensing status of the remaining datasets clearer. This will increase the potential for re-use of the data.

The James Hutton Institute should adopt a recognised open data licence that is conformant with the Open Definition, such as the Creative Commons Attribution licence (CC-BY) or the Open Government Licence. JHI's bespoke licence is not an open data licence. Replacing it will make the majority of the datasets on the portal open data. Adopting a standard licence will also increase the interoperability of JHI's datasets with spatial data from other open sources.