Owen Boswarva's blog

"Openwashing" was identified as a marketing phenomenon by Michelle Thorne in 2009, and defined by Audrey Watters in 2012 as: "having an appearance of open-source and open-licensing for marketing purposes, while continuing proprietary practices." Watters expanded on this idea in a presentation at OpenCon 2014.

The risk of openwashing has been raised in many open contexts – open source, open access, open government, and so on. However, openwashing can sometimes be difficult to recognise because the concept of "open" itself has no consistent meaning across those contexts.

Broadly speaking, openwashing is a pejorative term applied when an organisation seeks kudos from association with an open agenda without meeting minimum criteria or making a sincere commitment to that agenda.

Within open data, deliberate openwashing is usually easy to recognise because open data has a widely adopted definition. However, openwashing can also arise unintentionally from a lack of familiarity with open data requirements.

Openwashed data is sometimes referred to as "fauxpen data" (though not very often).

This post highlights recent examples of organisations misrepresenting their datasets and initiatives as "open data".

Image: NOT OPEN DATA stamp

Microsoft

In June Microsoft launched Microsoft Research Open Data, "a new data repository in the cloud dedicated to facilitating collaboration across the global research community."

However, most of the free datasets available from Microsoft's repository are licensed under the Microsoft Research Data License Agreement. This agreement permits non-commercial use only, is time-limited, and doesn't allow modification or redistribution of the data – and it gives Microsoft rights to commercially exploit derivative works.

The remaining datasets are licensed under the Linux Foundation's Community Data License Agreement - Permissive. The CDLA licences were launched last year and although the Linux Foundation intends them to be suitable for open data, they are not currently listed as conformant with the Open Definition – and may not be. (The privacy language in Sections 4 and 7 looks potentially problematic to me.)

Microsoft again

In September Microsoft joined with Adobe and SAP to announce an "Open Data Initiative" that has even less to do with open data. This initiative seems to be simply a corporate partnership aimed at making it easier for clients to share their own data and customer data across systems, securely and with some AI-flavoured business intelligence functions.

DigitalGlobe

Earth observation is an area in which the benefits of open data have been widely realised, through publicly funded satellite data programmes such as Landsat and Copernicus. This presents a marketing challenge for commercial providers of higher-resolution satellite data.

DigitalGlobe operates an "Open Data Program" that provides free imagery to support recovery in the wake of natural disasters such as the recent Hurricane Michael landfall in Florida. While the objectives of this programme are laudable, the imagery isn't open, as it may be used for non-commercial purposes only.

Bloomberg

In September Bloomberg announced the launch of Enterprise Access Point, an "online Open Data and Linked Data Platform" that "provides normalized reference, pricing, regulatory and historical datasets for Bloomberg Data License clients."

This platform has no obvious connection to open data and seems to be simply an API-based delivery system for Bloomberg's proprietary datasets.

Open Banking Limited

Last year Open Banking Limited, a UK banking industry entity set up to support compliance with reforms mandated by the CMA, launched an "Open Data Licence" to cover re-use of financial product information and reference information such as ATM locations. However, neither the datasets nor the licence is remotely open.

I've written about this in more detail in an earlier post. As the banking industry had input from the Open Data Institute in planning this initiative, it's unlikely that this example of openwashing is inadvertent.

Canal & River Trust

The Canal & River Trust invites us to use its "open" GIS data, available for download from a portal built on Esri's Open Data platform. However, the datasets are all covered by either an INSPIRE End User Licence or the Trust's own data licence, neither of which is an open licence.

Earlier versions of some of the same datasets were available under the Open Government Licence prior to the privatisation of British Waterways in 2012.

Individual datasets

Sometimes organisations present individual datasets as "open data" when publishing them under non-open licences. In such cases it's often difficult to tell whether this is deliberate or simply results from a lack of awareness. Publishers may think that all Creative Commons licences are open licences, or even that the act of making data free to download on the web is enough to meet open data criteria.

For example, the Open University's Listening Experience Database is described as "linked open data" when the licence is CC BY-NC-SA, and CEH's Integrated Hydrological Units of the United Kingdom dataset was incorrectly described as open data when it was released in 2015. Those may just be errors.
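
The point that not every Creative Commons licence is an open licence can be shown with a quick check. What follows is only a minimal sketch, assuming licences are written as short identifiers such as "CC BY-NC-SA"; the function name and the identifier format are hypothetical choices for illustration, not part of any official tool. It treats the NonCommercial (NC) and NoDerivatives (ND) elements as the ones that stop a Creative Commons licence from being open.

```python
# Illustrative sketch only: the function name and identifier format are assumptions.
# NC (NonCommercial) and ND (NoDerivatives) restrict re-use in ways open licences do not.
NON_OPEN_ELEMENTS = {"NC", "ND"}


def is_open_cc_licence(identifier: str) -> bool:
    """Return True if a Creative Commons identifier has no NC or ND element."""
    # Normalise "cc by-nc-sa 4.0" into ["CC", "BY", "NC", "SA", "4.0"].
    parts = identifier.upper().replace("-", " ").split()
    if not parts or parts[0] not in {"CC", "CC0"}:
        raise ValueError(f"not a Creative Commons identifier: {identifier!r}")
    return NON_OPEN_ELEMENTS.isdisjoint(parts[1:])


for licence in ["CC0", "CC BY", "CC BY-SA", "CC BY-NC-SA", "CC BY-ND"]:
    status = "open" if is_open_cc_licence(licence) else "not open"
    print(f"{licence}: {status}")
```

A check like this only looks at the licence label. It says nothing about whether the data is actually available to download or whether other terms have been attached to it.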

The big picture

The above are all examples of real or apparent openwashing at the project or dataset level. But the charge of openwashing can also be levelled at data policy or strategy.

When a government signs up to the principle of "open by default" but excludes much of its national information infrastructure, isn't that openwashing?

What about when a data-rich public authority talks about its "open approach" to data while publishing only relatively minor datasets under an open licence?

In general I think the open data community should be vigilant about openwashing and be prepared to call out organisations that use it cynically as a marketing tool.

However, openwashing may also be seen as evidence that the open data "brand" is becoming more widely recognised, and worth cultivating if not co-opting.