As covered in my earlier post, Unique Property Reference Numbers (UPRNs) have been allocated to about 92% of domestic Energy Performance Certificates in DLUHC's most recent release of bulk EPC data.

Energy assessors have allocated UPRNs to 7.7% of records. DLUHC, which manages the EPC registers, has allocated UPRNs to 84.7% of records by matching the addresses to OS AddressBase. The remaining 7.6% of records currently have no UPRNs.

An individual property may have multiple EPCs, but should have only one UPRN. About 25% of the 15,678,307 unique UPRNs in the domestic EPC dataset are allocated to more than one EPC record.

The vast majority of UPRNs in the EPC dataset have been allocated correctly.

However, in this post I highlight examples of errors in the allocation of UPRNs.


Incorrect UPRNs added by energy assessors

The most obvious errors tend to appear in the allocation of individual UPRNs by energy assessors. Since September 2020 any energy assessor submitting an EPC record to the register has been able to allocate a UPRN from AddressBase.

Some energy assessors may be less than diligent in this task, or be unaware of pitfalls in finding the correct address.

For example, the following records for a 60 m² mid-terrace house in Leeds and a 244 m² semi-detached in Hebden Bridge have the same UPRN in the EPC dataset. The first, allocated by an energy assessor, is incorrect:

LMK_KEY: cd415ab1dc443b5f0facee97d38286cdad6e9fb1bd9c1b145df3010ddec9351e
Address: 32 SPRINGFIELD LANE, LEEDS LS27 9PH
Allocated UPRN: 10035031171
Correct UPRN: 72359675

LMK_KEY: 82557929802011020400345543597498
Address: Allswell Farm Wadsworth HEBDEN BRIDGE HX7 8TF
Allocated UPRN: 10035031171
Correct UPRN: 10035031171

Similarly, these records for a 227 m² detached house in Doncaster and a 130 m² semi-detached in Lincoln have the same UPRN in the EPC dataset. The first is incorrect:

LMK_KEY: 1b079e2969e084eb286bfef0dbf4da78498ff18ecf110e77e710088e3b7ab6ab
Address: 3A CHURCH LANE, DONCASTER DN4 6QB
Allocated UPRN: 100032089741
Correct UPRN: 10006573417

LMK_KEY: 61518009242018043020222749487308
Address: 35, Eagle Road, North Scarle, LINCOLN LN6 9EW
Allocated UPRN: 100032089741
Correct UPRN: 100032089741

It is unclear whether DLUHC carries out any centralised checking or verification of UPRNs provided by energy assessors. If not, errors from this source will become a bigger problem in future as the proportion of UPRNs from assessors increases.


UPRNs from address mis-matching of flats attached to houses

UPRNs that have been allocated in error by address matching are more difficult to find in the dataset. However, there are two scenarios where these errors seem to occur.

The first is where an annex has been added to an existing house, or the basement of a house has been converted into a flat, and the new property has a very similar address to the main property.

In this example, a semi-detached house in Ealing has a ground-floor flat with its own UPRN. However, in the EPC dataset both records have been address matched to the UPRN for the house:

LMK_KEY: 938542693552013052411043199070708
Address: 10 Ribblesdale Avenue UB5 4NF
Allocated UPRN: 12047058
Correct UPRN: 12047058

LMK_KEY: 8ddfd051fb427a42a2b7e5b72078d82ef80c7c01dcda168b338e2228d4c125fd
Address: Side Flat, 10 Ribblesdale Avenue UB5 4NF
Allocated UPRN: 12047058
Correct UPRN: 12159739


Historic UPRNs mixed up with current UPRNs where addresses are re-used

The second address matching scenario involves confusion between existing properties and properties in demolished or converted buildings, where the address has been re-used.

There is no direct mechanism for removing EPC records from the register when properties cease to exist.

UPRNs are unique to a property, but not unique to an address. DLUHC's address matching method seems to have used only contemporary versions of AddressBase, which may not be sufficient to recognise historical use of addresses.

There are numerous examples in the EPC dataset where DLUHC has allocated the same UPRN to both an existing property and an earlier property, because the address is the same.

Consider, for example, Leys Court in Ruddington. In 2013-14, 30 supported flats that were previously rented to elderly people were converted into 21 general needs flats. The local planning authority allocated a new UPRN to each of those 21 flats.

EPCs were registered when the flats were marketed for sale and, in most cases, rented out again. DLUHC has correctly allocated the new UPRNs to those records.

However, there are 14 records (CSV) where the street addresses are the same and DLUHC has also allocated those new UPRNs to the EPCs for the pre-conversion flats, instead of the correct historic UPRNs.

This data quality issue seems to be quite common in the EPC dataset.


How can re-users of the EPC dataset detect bad UPRNs?

UPRNs can be verified authoritatively by reference to OS AddressBase. But AddressBase is a commercial product and not freely available.

Public sector organisations have free access to AddressBase under the terms of the Public Sector Geospatial Agreement. However, the scope for PSGA use is restricted and some staff may face internal barriers to quick access.

Users can look up individual UPRNs and addresses using GeoPlace's public FindMyAddress website. The site draws on AddressBase data but is, unfortunately, designed to prevent programmatic access or checking of large numbers of UPRNs.

Some UPRN errors in the EPC dataset can be detected by comparing the values in the POSTCODE field to the postcodes in one of ONS's UPRN datasets: the National Statistics UPRN Lookup (NSUL) or ONS UPRN Directory (ONSUD). If the postcodes differ, that will sometimes (but not usually) be because the UPRN is wrong.
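As a rough illustration of that postcode comparison, here is a minimal Python sketch. It assumes the EPC rows have been loaded as dictionaries with UPRN and POSTCODE keys, and that the ONS lookup has been reduced to a dictionary from UPRN to postcode. The function names are hypothetical, and the one-entry lookup is built by hand from the Leeds and Hebden Bridge example earlier in this post rather than read from NSUL or ONSUD.

```python
def normalise_postcode(postcode):
    """Upper-case and remove whitespace so 'ls27 9ph' compares equal to 'LS27 9PH'."""
    return "".join(postcode.split()).upper()


def postcode_mismatches(epc_rows, uprn_to_postcode):
    """Yield EPC rows whose POSTCODE differs from the lookup postcode for their UPRN."""
    for row in epc_rows:
        uprn = (row.get("UPRN") or "").strip()
        if not uprn:
            continue  # no UPRN allocated to this record
        lookup_postcode = uprn_to_postcode.get(uprn)
        if lookup_postcode is None:
            continue  # UPRN not in the lookup, so nothing to compare
        if normalise_postcode(row["POSTCODE"]) != normalise_postcode(lookup_postcode):
            yield row


# Illustrative rows based on the first example in this post.
epc_rows = [
    {"LMK_KEY": "cd415ab1dc443b5f0facee97d38286cdad6e9fb1bd9c1b145df3010ddec9351e",
     "POSTCODE": "LS27 9PH", "UPRN": "10035031171"},
    {"LMK_KEY": "82557929802011020400345543597498",
     "POSTCODE": "HX7 8TF", "UPRN": "10035031171"},
]
# Hand-built stand-in for an NSUL/ONSUD extract (assumption, not real lookup data).
uprn_to_postcode = {"10035031171": "HX7 8TF"}

flagged = list(postcode_mismatches(epc_rows, uprn_to_postcode))
for row in flagged:
    print(row["LMK_KEY"], row["POSTCODE"])
```

A flagged row is only a candidate for manual checking, since most postcode differences will have causes other than a wrong UPRN.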

The ONS datasets or OS Open UPRN can be used to geocode the UPRNs in the EPC dataset, which means that anomalies can sometimes be detected using GIS techniques.
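One simple GIS-style check, sketched below in Python, is to geocode each record by its UPRN and flag records that sit a long way from the other records sharing the same postcode. This is only one possible approach, not a description of any established method. It assumes a dictionary from UPRN to British National Grid easting and northing in metres (for example, built from the X_COORDINATE and Y_COORDINATE columns of OS Open UPRN). The function name, the 2 km threshold, the UPRNs, postcode and coordinates are all made up for illustration.

```python
import math
from collections import defaultdict
from statistics import median


def distant_uprns(epc_rows, uprn_to_xy, max_metres=2000, min_group=3):
    """Flag rows whose UPRN lies far from the median location of their postcode group."""
    by_postcode = defaultdict(list)
    for row in epc_rows:
        xy = uprn_to_xy.get(row.get("UPRN") or "")
        if xy is not None:
            by_postcode[row["POSTCODE"]].append((row, xy))

    flagged = []
    for members in by_postcode.values():
        if len(members) < min_group:
            continue  # too few geocoded records for a meaningful median
        mid_x = median(xy[0] for _, xy in members)
        mid_y = median(xy[1] for _, xy in members)
        for row, (x, y) in members:
            # Grid coordinates are in metres, so straight-line distance is adequate here.
            if math.hypot(x - mid_x, y - mid_y) > max_metres:
                flagged.append(row)
    return flagged


# Entirely fictional data: four records in one postcode, one geocoded far away.
sample_rows = [
    {"UPRN": "1", "POSTCODE": "ZZ1 1ZZ"},
    {"UPRN": "2", "POSTCODE": "ZZ1 1ZZ"},
    {"UPRN": "3", "POSTCODE": "ZZ1 1ZZ"},
    {"UPRN": "4", "POSTCODE": "ZZ1 1ZZ"},
]
sample_xy = {
    "1": (1000, 1000),
    "2": (1010, 1005),
    "3": (1020, 990),
    "4": (250000, 400000),
}

outliers = distant_uprns(sample_rows, sample_xy)
print([row["UPRN"] for row in outliers])
```

The median is used rather than the mean so that a single badly placed UPRN does not drag the reference point towards itself. Postcodes with only one or two geocoded records are skipped, which limits how much of the dataset this can cover.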

Where a UPRN has been allocated to multiple EPC records, comparison of other information in the records can highlight errors. For example, the UPRN is likely to be wrong for one or more of the records if the addresses are in different parts of the country. Inconsistent information in the BUILT_FORM field can indicate a problem. Comparing the LODGEMENT_DATE and TRANSACTION_TYPE fields is also useful, because it is unlikely that the same property will be marketed for rent before it is lodged as a new dwelling.
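The three comparisons can be sketched in Python as follows. The field names are those of the EPC dataset, but the function names and reason strings are hypothetical. The sketch assumes LODGEMENT_DATE is in ISO format (YYYY-MM-DD) and that new-build records carry the TRANSACTION_TYPE value "new dwelling". The first two sample rows reuse the postcodes and built forms from the Leeds and Hebden Bridge example, while the rows for UPRN "999" are invented.

```python
import re
from collections import defaultdict
from datetime import date


def postcode_area(postcode):
    """Return the leading letters of a postcode, e.g. 'LS' for 'LS27 9PH'."""
    match = re.match(r"[A-Za-z]+", postcode.strip())
    return match.group(0).upper() if match else ""


def lodgement_dates(rows, new_dwelling):
    """Collect lodgement dates for new-dwelling rows (True) or all other rows (False)."""
    return [
        date.fromisoformat(r["LODGEMENT_DATE"])
        for r in rows
        if r.get("LODGEMENT_DATE") and r.get("TRANSACTION_TYPE")
        and (r["TRANSACTION_TYPE"] == "new dwelling") == new_dwelling
    ]


def suspicious_uprns(epc_rows):
    """Map each shared UPRN to the reasons its records look inconsistent."""
    groups = defaultdict(list)
    for row in epc_rows:
        if row.get("UPRN"):
            groups[row["UPRN"]].append(row)

    findings = {}
    for uprn, rows in groups.items():
        if len(rows) < 2:
            continue  # nothing to compare
        reasons = []
        if len({postcode_area(r["POSTCODE"]) for r in rows}) > 1:
            reasons.append("postcode areas differ")
        if len({r["BUILT_FORM"] for r in rows if r.get("BUILT_FORM")}) > 1:
            reasons.append("built forms differ")
        new_dates = lodgement_dates(rows, new_dwelling=True)
        other_dates = lodgement_dates(rows, new_dwelling=False)
        if new_dates and other_dates and min(other_dates) < min(new_dates):
            reasons.append("record lodged before new-dwelling record")
        if reasons:
            findings[uprn] = reasons
    return findings


sample = [
    {"UPRN": "10035031171", "POSTCODE": "LS27 9PH", "BUILT_FORM": "Mid-Terrace"},
    {"UPRN": "10035031171", "POSTCODE": "HX7 8TF", "BUILT_FORM": "Semi-Detached"},
    # Invented rows illustrating the date check.
    {"UPRN": "999", "POSTCODE": "ZZ1 1ZZ", "BUILT_FORM": "Detached",
     "LODGEMENT_DATE": "2012-03-01", "TRANSACTION_TYPE": "rental (private)"},
    {"UPRN": "999", "POSTCODE": "ZZ1 1ZZ", "BUILT_FORM": "Detached",
     "LODGEMENT_DATE": "2015-06-01", "TRANSACTION_TYPE": "new dwelling"},
]

findings = suspicious_uprns(sample)
for uprn, reasons in findings.items():
    print(uprn, "; ".join(reasons))
```

None of these checks says which record holds the wrong UPRN, only that the group deserves a closer look.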

Unfortunately, the above approaches are more useful for finding examples of errors than they are for consistent detection of bad UPRNs across the whole EPC dataset.