Owen Boswarva's blog

This is the first in a series of posts about open data in biodiversity. All views are personal.

For most of the past year I've been acting as open data lead at the Joint Nature Conservation Committee (JNCC). JNCC is a public body that advises the UK Government and devolved administrations on UK-wide and international nature conservation.

JNCC and the UK's statutory nature conservation bodies have a shared ambition to improve the open availability of biodiversity data by 2020.

In May JNCC produced an Open Data Policy and released it for re-use under the Open Government Licence. This month JNCC also published an inventory of data assets.

I'll talk more about JNCC datasets and areas of work in future posts. In this post I want to make some general observations about the potential for open data in biodiversity.

Data flows and supply chains

Biodiversity has particularly complex data flows. Data is captured, quality assured, curated, aggregated, analysed, and disseminated for use – and will often change hands at each step in that flow.

Parties involved include individual recorders (who are often volunteers or contributors to citizen science schemes), local record centres, academics, government agencies, charities and other non-governmental organisations (NGOs), industry bodies, and repositories.

For useful diagrams of data flows in biodiversity see this GiGL post from 2014 on butterfly data in London, and this 2011 report for NBN on marine data.

Most of this activity needs to be funded, which means the data flows also represent supply chains. Funding determines what data gets collected, where it goes, and what it will be used for. In the UK much but not all of the funding for collection of biodiversity data comes from government.

Squaring the circle in partnership agreements

This dynamic has a number of implications for open data in biodiversity.

Different types of organisation have different motives for releasing data (or not releasing it). When an organisation is funded by the taxpayer we can make the argument that their data belongs to the public. The principle of "open by default" is endorsed in government policy and, to some extent, by law.

Businesses and NGOs, on the other hand, have to find their own reasons – and they are likely to be sector specific. Within biodiversity the strongest arguments for open data are probably public engagement, verifiability, and the persistence of the scientific record.

However, government controls most of the funding for collection of biodiversity data. Within biodiversity there are strong signs of a "second wave" of policy on open data. Public bodies are no longer content with making their own data open – they increasingly expect NGOs to release any data they collect or compile with public funding.

This can be a complex negotiation, particularly as some partnership agreements between government and NGOs are long-standing.

NGOs are not necessarily opposed to open data, of course. Some do continue to make their data available only by request or under non-commercial licences, without understanding how that limits the dissemination and impact of their data as scientific evidence. However, the complex nature of data flows within biodiversity makes it easy to realise common benefits from open data. Web downloads and permissive licensing remove a lot of friction from the process of sharing data between multiple organisations.

Some things that need doing

These are a few of the challenges we face in opening up biodiversity data.

Some biodiversity datasets have been knocking around for decades and nobody is quite sure who owns them. Within the field that hasn't always been a problem, because much sharing of data is informal. But open data involves sharing of data under licence, for a wide range of purposes. Some of those purposes are only viable if the data has clear provenance and ownership. Better record-keeping and a keener understanding of intellectual property rights are key to the growth of open data in biodiversity.

UK biodiversity data is over-reliant on flows that accumulate data in a few hubs to facilitate public access – the NBN Atlas in particular. The wider open data community is moving away from this model, in favour of an approach based on multiple sources of data, unified by common data standards that facilitate discovery through normal web searches. We need more organisations to release data themselves, at every stage of collection and aggregation. We also need fewer sites with access controls, and simpler interfaces for downloading data.
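To make the idea of discovery through common standards a little more concrete, here is a minimal sketch of how an organisation might describe a dataset on its own website so that general web search engines can pick it up. It assumes the schema.org Dataset vocabulary, which is one widely used option and not a standard named in this post. The function name, the dataset details, the publisher, and the example.org address are all hypothetical.

```python
import json


def dataset_jsonld(name, description, url, license_url, publisher):
    """Build a schema.org Dataset description as a JSON-LD dictionary.

    Assumption: schema.org is used as the common metadata standard.
    All argument values passed below are illustrative placeholders.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        "publisher": {"@type": "Organization", "name": publisher},
    }


# Hypothetical dataset published by a hypothetical local record centre.
record = dataset_jsonld(
    name="Example butterfly records",
    description="Illustrative species occurrence records for one county.",
    url="https://example.org/data/butterfly-records",
    license_url="https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/",
    publisher="Example Local Record Centre",
)

# The JSON text would typically be embedded in the dataset's web page
# inside a <script type="application/ld+json"> element.
print(json.dumps(record, indent=2))
```

The point of the sketch is that the description lives alongside the data on the publisher's own site, with an explicit licence, so nobody has to go through a central hub to find it.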

Data protection is a concern. The biodiversity field has always been alert to the importance of safeguarding data that is environmentally sensitive. However, as in other sectors, the privacy aspects of traditional data handling practices have come under fresh scrutiny with the introduction of GDPR. Biodiversity is also an area where there is substantial potential for use of new technologies, such as earth observation and drones, that have privacy considerations.

It's important to judge the audience when talking about open data. In the wider community we like to talk about "data as infrastructure", as a counterpoint to extractive metaphors such as "data is the new oil". The oil metaphor has never been popular in biodiversity data, for obvious reasons. But I'm not sure infrastructure resonates well either. I'm learning to think of data in terms of ecosystems – susceptible to damage, but perpetuated by complex inter-dependencies.

Can we make biodiversity data more open?
