ODUG benefits case for open data release of an authoritative GP dataset

Post: 22 July 2014

Updated: 24 July 2014 (see bottom of the post)

This week the Open Data User Group has published a benefits case arguing for open data release of an “authoritative” GP dataset.

ODUG calls on the Department of Health to organise an open dataset of all GP and dental practices, to include practice details, opening times, location, contact details, patient acceptance criteria, and a list of individual practitioners.

Geographic coverage is not mentioned, but as the call is to DoH I’m assuming ODUG is focused only (or at least mainly) on the data for England.

This is the first new benefits case from ODUG since last summer (list of previous benefits cases), so it’s worth taking a look at both the case itself and the related blog post by Giuseppe Sollazzo. My comments are below.

Existing Datasets

The current best sources for core bulk data on GP practices in England (codes, addresses, contacts, etc.) are:

Downloads from the HSCIC site
Downloads from the NHS Choices site
Download from the CQC site

Those datasets are all reusable under the Open Government Licence, i.e. they are open data.

Several side points before I get into the substance of the ODUG case:

1. NHS Choices staff are employed by HSCIC, so the first two datasets are effectively the responsibility of the same public authority. However there are substantial differences between the datasets as they reflect the underlying purposes for which they are maintained.

2. ODUG criticises the NHS Choices dataset as follows:

“the branding of the NHS Choices dataset as a ‘Freedom Of Information’ dataset is troubling from an Open Data perspective, mainly for [its] ‘on demand’ nature: a FOI data release, being a reactive response to a request, does not establish an ongoing process; while data release under an Open licence often comes proactively from the publishing entity, which in doing so creates a sustainable data update procedure”.

I think this is rather over the top. NHS Choices hasn’t “branded” the data as a FOI dataset. It has merely made it available, along with a number of other useful data files, in the FOI section of its site. It would be nice if the NHS Choices site also had a dedicated open data landing page. However it’s perfectly sensible to draw users’ attention to existing datasets that they may want to know about before submitting a FOI request. NHS Choices says the data files are updated daily, so they are clearly not being published as a “reactive response” to FOI requests.

3. ODUG maintains that the GP practices data on the HSCIC site is not open data, and points to a page about “responsibilities in using the ODS data”. However HSCIC has recorded that dataset (EGPCUR) on Data.gov.uk as reusable under the OGL. (The ODS “responsibilities” page seems to be written for NHS users. A literal reading only permits use of the data in connection with NHS-related activities, which is obviously not the actual licensing position.)

It’s also worth noting that elsewhere on its site HSCIC publishes an open dataset of practice codes, names and addresses as part of its monthly release of GP prescribing data.

Why are the HSCIC/NHS Choices datasets not “authoritative”?

There’s nothing wrong with arguing that existing datasets could be made more useful by improving the quality, or updating them more frequently, or appending data from other sources.

But we can have those arguments about most of the nation’s information infrastructure. A dataset doesn’t need to be ideal to be authoritative in practice.

The HSCIC and NHS Choices datasets are produced by the relevant official body, they are in wide use, and there are currently no better equivalents. The datasets are therefore, on the face of it, authoritative.

ODUG proposes that DoH establishes “an ongoing process to build, update and maintain on data.gov.uk an authoritative dataset of medical practices and operating practitioners, drawing on the datasets made available by HSCIC and NHS Choices”.

I’m not sure how ODUG expects DoH to build an authoritative dataset by drawing on datasets it has dismissed as non-authoritative. ODUG’s call is to DoH, but in practice DoH would surely delegate any such new process to HSCIC. So what is ODUG proposing HSCIC should do differently?

Maintaining the new dataset on Data.gov.uk is also unlikely to add credibility, given the current state of the DGU catalogue and other functionality. HSCIC already has its own platforms and they seem serviceable for the publication of data. What in the ODUG proposal requires the involvement of Data.gov.uk?

Release of open data or creation of a new data product?

The typical model of open data activism is to argue for the release of existing data assets (usually those held by public authorities) for reuse under an open licence. ODUG was originally set up to frame those arguments based on views from UK data users (within terms of reference from the Cabinet Office).

I’ve never been entirely on board with the idea of submitting “benefits cases” for release of open data, because it seems to conflict with the principle of “open by default”. In my view the onus should be reversed; public authorities should be required to demonstrate why we should not be able to reuse data that they hold. Benefits cases should only be necessary when there are significant costs involved in extracting and publishing the data.

However that model of open data release assumes we are talking about data that the public authority already holds and maintains in order to deliver its public task.

In this instance ODUG seems to be arguing for creation of a new data product, combining the existing HSCIC/NHS Choices datasets with data from other sources such as GMC’s Medical Register and patient acceptance criteria for each GP practice.

That last source in particular would probably involve quite a bit of ongoing administration and processing, as patient acceptance criteria are not held centrally or in a standard format.

Conclusions

Arguing for release of existing data is one thing. Arguing for the creation of new data products and new processes is something more.

I have no doubt there is room for improvement in the existing open data that HSCIC publishes on GP and dental practices. However public datasets are mainly produced to support a public task. I will be surprised if DoH takes up these ODUG recommendations without a more detailed demonstration of why the existing data and processes are inadequate to meet the requirements of the agencies and public bodies it supports.

For purposes of reuse beyond the needs of the health system itself, I think we are already quite well served by the existing open data on GP and dental practices. The ODUG benefits case is somewhat perfunctory; in the absence of more detailed analysis I am unconvinced by its attempts to talk down the value of the existing open datasets.

In my view the most interesting element of the ODUG benefits case is the idea that the Government should require the General Medical Council to release data from the Medical Register on individual practitioners. This register is an existing, useful source of public data that is not currently available for reuse under an open licence. I think a focus on that element, properly explicated, would make a more practical and worthwhile proposal.

Update (24 July 2014)

Giuseppe Sollazzo has written a new post in response to my post above, and I am grateful to him for engaging in further discussion.

Giuseppe’s new post provides a useful gloss on the benefits case and the thinking behind it. However in general the post seems to be more about what ODUG meant to say than what it did say. I cannot find much in there that changes my perspective on the benefits case itself.

There are sound arguments for releasing additional open data in this space, such as data from the Medical Register and data on patient acceptance criteria (if the administrative costs of collecting and maintaining that data can be justified).

But with respect to the core bulk data on GP practices, it seems to me that the key question is whether HSCIC’s EGPCUR dataset – the most robust of the existing sources – is currently available under the Open Government Licence.

ODUG may well be in contact with users who consider the EGPCUR dataset to be inadequate for technical or qualitative reasons. But the benefits case doesn’t get into that detail. As far as I can see the EGPCUR dataset is pretty good for general purposes – provided it is reusable as open data.

So why is this in doubt? ODUG’s benefits case states plainly that the HSCIC dataset is not open data. I assume ODUG may also have given that advice to users who have consulted it. ODUG’s public profile is such that some potential users may take that as definitive and be discouraged from using the data. This is worrisome.

Giuseppe’s new post says ODUG is “asking for clarity”. But if ODUG has already engaged directly and positively with NHS England, why was the licensing position not simply confirmed with the data publisher at the earliest stage?

I have difficulty taking seriously the proposition that EGPCUR is not open data, given that the Department of Health listed it on Data.gov.uk as an Open Government Licence dataset more than three years ago. However this question makes quite a difference to the strength of ODUG’s case. The ongoing availability of a credible open dataset of GP practices is likely to be of wider concern to the open data community than enhancements to existing datasets.