 
 The process of collecting, cleaning, analysing – and eventually visualising – tens of thousands of financial records gets very complicated very quickly.
This project overwhelmingly involves data generated and reported by humans, and humans make mistakes. They add extra zeros; they forget decimal places; they spell people’s names wrong; they spell their own names wrong. Even when they’re not making explicit errors, people also introduce an inevitable level of variability to any dataset.
Wrangling this kind of data into a format that’s usable for analysis and investigation – especially at this scale – is always going to involve making a series of decisions about how to standardise nonstandard data.
Below is a record of where we found the data and how we collected it, but it’s also a log of all those decisions and – when necessary – why we made them.
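To build the database, we pulled information from three main sources, each of which corresponds to one of the groups or individuals investigated by the project: the Register of Members’ Financial Interests (MPs), the Register of All-Party Parliamentary Groups (APPGs) and the Electoral Commission’s register of donations (political parties).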
To collect the first two, we scraped each individual record from the Parliament website. For the data on donations to parties, we manually downloaded the relevant files from the Electoral Commission’s website. All three will be updated on an ongoing basis. Every time a new register, or a new version of one, is published, the data will be collected, cleaned, tested and then added to the database.
In addition to these datasets, the database also includes basic information about MPs (e.g. party, constituency, gender) and parties (e.g. abbreviation, whether it’s in government), which we collected via Parliament’s Members API.
Our guiding methodological principle for this project was to err on the side of taking register entries at face value. When mistakes were very clear, or we felt correcting an entry would increase the quality of the data, we did amend entries, but our threshold for doing so was high: the intended meaning of the original entry had to be obvious. This mostly allowed us to fix simple errors like typos, missing decimals and illogical dates, but it also informed other, more complex decisions like the ones below.
Dates
The database is limited to the current Parliament. It only includes donations, payments, members, all-party parliamentary groups and parties that were made or have operated from 19 December 2019 onwards.
In most cases, we used the “date registered” – or the date on which the payment, donation or benefit was registered with the corresponding oversight body – to determine whether an interest was relevant to the current Parliament. When this date was not provided, we used the date on which the register containing the item was published. However, we only included earnings from secondary employment for work performed on or after 19 December 2019, regardless of the date on which the payment was received or registered. The exception to this rule was royalty payments for books written before the current Parliament, which we did include.
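The date rules above can be sketched as a single check. This is a minimal illustration, not the project’s actual code: the `Interest` class and all of its field names are hypothetical, and it assumes that book royalties (and earnings with no recorded work date) fall back to the “date registered” rule.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

PARLIAMENT_START = date(2019, 12, 19)  # cut-off stated in the text


@dataclass
class Interest:
    # All field names are hypothetical, chosen for this illustration.
    kind: str                               # e.g. "earning" or "donation"
    register_published: date                # publication date of the containing register
    date_registered: Optional[date] = None  # may be missing in the source
    work_performed: Optional[date] = None   # only meaningful for earnings
    is_book_royalty: bool = False


def in_current_parliament(item: Interest) -> bool:
    """Return True if the item is relevant to the current Parliament."""
    # Earnings are judged by when the work was done, except book royalties.
    if (
        item.kind == "earning"
        and not item.is_book_royalty
        and item.work_performed is not None
    ):
        return item.work_performed >= PARLIAMENT_START
    # Everything else: date registered, falling back to the register's
    # publication date when no registration date was provided.
    reference = item.date_registered or item.register_published
    return reference >= PARLIAMENT_START
```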
All members, APPGs and political parties that have operated at some point during the current Parliament were included. If an MP left office for any reason and was replaced in a by-election, both members were included in the database. All parties that have been represented by an MP in the House of Commons at any point during the current Parliament – regardless of that MP’s current status or party membership – were included. We used the date of the latest register on which an APPG appeared to determine whether it was active during the current Parliament. If that register was the most recent one, we assumed the group is still active.
Values
Although the database retains all of the values as they were originally entered into the registers, all the values displayed by the tool itself are rounded, including aggregate figures. The scale of rounding depends on the value itself:
Updates
Existing entries are frequently updated by members. When the next register is published, the new entry replaces the old one. This was often easy to spot because members clearly indicated in the new entry that it was an update of an existing one, and we updated those entries in the database accordingly. We also did our best to identify entries in which members did not explicitly highlight corrections, link the new and the old together, and update them to avoid counting the same donations or payments twice. But it’s inevitable that some of those entries were missed.
From time to time, there are also entries that appear for a register or two and then silently disappear without replacement. Entries are meant to remain on the register for 12 months, which makes this odd. It could be an error on the part of the recipient (i.e. MP, APPG or party) or the office maintaining the register, but there’s no way for us to know exactly what’s happened. When entries only appear on a single register, we have removed them. When entries appear on multiple registers, but for less than 12 months without explanation, we have retained them.
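The retention rule for disappearing entries could be expressed roughly as follows. This is a sketch under assumptions: the function name and inputs are hypothetical, and it assumes an entry still present on the latest register is always kept, since nothing has disappeared yet.

```python
from datetime import date
from typing import Iterable


def keep_entry(seen_on: Iterable[date], latest_register: date) -> bool:
    """Decide whether to keep an entry, given the registers it appeared on."""
    registers = set(seen_on)
    if latest_register in registers:
        # Assumption: the entry is still live, so the rule does not apply.
        return True
    # Vanished without replacement: drop it if it only ever appeared once,
    # keep it if it appeared on more than one register.
    return len(registers) > 1
```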
Source Matching
The source of a donation or payment can be either an organisation (e.g. a company or institution) or an individual.
Often the name of an organisation was recorded slightly differently in different entries (e.g. “Unite” and “Unite the Union”), and when possible we combined these into a single source. For two organisations that are legally distinct but related in some way (e.g. one organisation is a subsidiary of another), we mostly left these as separate sources.
For individuals, we almost never attempted to merge name variations together. In a very small number of cases, we combined name variations for public figures who are well known for making political donations (e.g. Lord Peter Cruddas). So with the exception of those individuals, each name is as it was originally entered in the public record.
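One simple way to picture this kind of matching is a hand-maintained alias table that maps name variants to a single canonical source. The table and function below are hypothetical and only illustrate the idea; the handful of merged public figures mentioned above is not modelled here.

```python
# Hypothetical alias table: normalised variant -> canonical source name.
ORG_ALIASES = {
    "unite": "Unite the Union",
    "unite the union": "Unite the Union",
}


def canonical_source(name: str, is_organisation: bool) -> str:
    """Map an organisation's name variants to one source; leave individuals alone."""
    if not is_organisation:
        # Individuals are kept exactly as entered in the public record.
        return name
    # Normalise case and whitespace before looking up the alias.
    key = " ".join(name.lower().split())
    # Unknown organisations are returned unchanged.
    return ORG_ALIASES.get(key, name)
```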
Overlap Between Entries
We know that there is some overlap between payments to members, payments to parties and payments to APPGs, but we have no way of measuring how much.
For example, a central party might receive a donation and then pass all or part of that donation on to one of its MPs. If both the party and the member declare that donation, the same sum of money would then appear on both the Electoral Commission’s register and the Register of Members’ Financial Interests. In our database, that payment would also appear twice. The same scenario is also possible with a member and an APPG.
Because there’s no shared system of identification that would allow us to track payments across the three different registers, we almost always left untouched any two payments we suspected to be the same payment. In a very small number of cases, we believed that removing obvious duplicates between different sources would increase the quality of the data, and we removed the duplicated entries from the database.
Every current Member of Parliament is required by law to declare certain financial interests within 28 days of acquiring them. Those declarations then appear – and remain for an entire year – on the Register of Members’ Financial Interests, which the Parliamentary Commissioner for Standards updates and publishes roughly every fortnight.
Those financial interests can be separated into two broad categories, each of which had to be cleaned and analysed in slightly different ways: earnings from secondary employment and donations, benefits and gifts.
Secondary Employment and Earnings
Members must report all earnings over £100 (as well as those under £100 for every source from which they receive more than £300 in total in one calendar year) which they receive in return for their labour or services.
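To make the two thresholds concrete, here is a small sketch of the reporting rule as worded above. It illustrates the rule members follow, not any step in our pipeline; the function name and the tuple layout are assumptions.

```python
from collections import defaultdict
from datetime import date

SINGLE_PAYMENT_THRESHOLD = 100  # pounds, per individual payment
SOURCE_YEAR_THRESHOLD = 300     # pounds, per source per calendar year


def registrable(payments):
    """Return the payments that must be reported.

    `payments` is a list of (source, date, amount) tuples.
    """
    # Total received from each source in each calendar year.
    totals = defaultdict(int)
    for source, when, amount in payments:
        totals[(source, when.year)] += amount

    # A payment is reportable if it exceeds the single-payment threshold,
    # or if its source paid more than the yearly threshold in that year.
    return [
        (source, when, amount)
        for source, when, amount in payments
        if amount > SINGLE_PAYMENT_THRESHOLD
        or totals[(source, when.year)] > SOURCE_YEAR_THRESHOLD
    ]
```

Under this reading, three payments of £80, £90 and £150 from one source in the same year are all reportable, because together they exceed £300.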
Most of these earnings are individual, one-off payments recorded in the register as one item on one date. For example, an MP wrote an article for a newspaper and earned £300. These are relatively straightforward, but when adding them to the database, we made the following methodological decisions:
A smaller number of earnings are recurring payments, usually recorded in the register as one item that includes some sort of time period and payment frequency. These are much more complicated to add to the database because there’s more variation in how members report them, but we did our best to calculate approximate values.
We multiplied the total number of days a member worked across a given time period by the average daily income the member would have made across the same time period given the payment frequency. However, not all of these values were provided for each recurring payment, nor could the values always be extrapolated cleanly from the information that was provided, and we had to make the following methodological decisions:
One final note on recurring payments: We made the choice to actively split up each of these recurring payments into individual ones because the Westminster Accounts project, as a whole, is focused on not just the origin, type, and size of payments and donations, but also the frequency of those payments and donations. This was not a decision we made lightly. And while it’s arguably the largest deviation from the public record we made, it’s also consistent with the historical register, which required MPs to report recurring payments as individual interests until 2015. In the end, it took considerably more time and resources to harvest the individual payments from the original entries, but without doing so it would have been impossible to assess the cadence and regularity of a member’s income – and thus the cadence and regularity of their contact with the sources of that income.
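The calculation for recurring payments (days worked multiplied by average daily income, then split into individual payments) can be sketched as below. This is an approximation under stated assumptions: the function names, the frequency labels and the day counts per period are all hypothetical, the date range is treated as inclusive, and the dates of the individual payments are not modelled.

```python
from datetime import date

# Assumed number of days covered by one payment at each frequency.
DAYS_PER_PERIOD = {
    "weekly": 7,
    "monthly": 365 / 12,
    "quarterly": 365 / 4,
    "yearly": 365,
}


def estimate_total(amount_per_payment, frequency, start, end):
    """Approximate total value: days worked x average daily income."""
    days_worked = (end - start).days + 1  # inclusive of both end dates
    daily_income = amount_per_payment / DAYS_PER_PERIOD[frequency]
    return days_worked * daily_income


def split_payments(amount_per_payment, frequency, start, end):
    """Split the estimated total into equal individual payments."""
    total = estimate_total(amount_per_payment, frequency, start, end)
    days_worked = (end - start).days + 1
    # Number of payments implied by the period length and frequency.
    count = max(1, round(days_worked / DAYS_PER_PERIOD[frequency]))
    return [total / count] * count
```

For example, £700 a week from 1 to 28 January 2021 covers 28 days at £100 a day, which this sketch turns into four payments totalling £2,800.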
Donations, Benefits and Gifts
Members also must report all donations over £1,500 (as well as those over £500 for every source from which they receive more than £1,500 in total in one calendar year). For non-cash donations, they must provide an equivalent value.
These were relatively straightforward to add to the database, but we made the following methodological decisions for specific scenarios:
Like members, APPGs are required to report certain financial interests. Those declarations then appear on the Register of All-Party Parliamentary Groups, which the Parliamentary Commissioner for Standards updates and publishes roughly every six weeks.
On the register, those financial interests appear in one of two categories: cash donations over £1,500 and benefits-in-kind worth more than £1,500. To include these in the database, we made the following methodological decisions:
Political parties are required to report all donations to the central party over £7,500, and to any other party unit over £1,500, to the Electoral Commission, the independent body which oversees elections and political finance in the UK. Although the data published by the Commission every quarter includes donations to both the central political party and other party units (e.g. local branches), we excluded the latter from our database.
We have also excluded “short money”, a form of public funding that opposition parties receive from the House of Commons.
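The two exclusions above amount to a simple filter over the Commission’s records. The sketch below is illustrative only: the record keys `recipient_type` and `is_short_money` are hypothetical and do not correspond to actual column names in the Electoral Commission’s files.

```python
def include_party_donation(record: dict) -> bool:
    """Keep only central-party donations that are not short money."""
    # Hypothetical flag marking public funding from the House of Commons.
    if record["is_short_money"]:
        return False
    # Hypothetical field: "central_party" versus any other party unit.
    return record["recipient_type"] == "central_party"
```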
Contacts
You can direct any questions or comments to westminster.accounts@tortoisemedia.com.