Posts Tagged ‘error’
I have a puzzle for you: thousands of patients are apparently missing from the English waiting list. I don’t know where they are (though I’ll have a go at guessing), and I’m hoping some of you can help me.
Here’s the problem.
In principle, we should be able to start with, say, the 4-5 week waiters from the end-of-January waiting list, take away those patients who were admitted and non-admitted from the cohort during February, and (because February was exactly 4 weeks long) end up with an estimate of the 8-9 week waiters on the end-of-February waiting list.
That method would miss any patients who were removed without being seen or treated (for instance ‘validated’ patients who had been reported on the January waiting list in error), but that error should all be in one direction: to make the reported February figure smaller than our estimate. Patients cannot appear on the waiting list with several weeks on the clock out of thin air, can they? So our estimate, minus the reported end-of-February list, should always produce an anomaly that is positive and which reflects validation during February.
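To make the arithmetic concrete, here is a minimal sketch of that cohort-tracking sum in Python. The function and argument names are mine, purely for illustration, and do not reflect the published data layout:

```python
def cohort_anomaly(jan_cohort, feb_admitted, feb_nonadmitted, feb_reported):
    """Compare an estimated cohort with the reported figure a month later.

    jan_cohort      -- e.g. patients waiting 4-5 weeks at the end of January
    feb_admitted    -- admitted clock stops from that cohort during February
    feb_nonadmitted -- non-admitted clock stops from that cohort during February
    feb_reported    -- reported 8-9 week waiters at the end of February
    """
    estimate = jan_cohort - feb_admitted - feb_nonadmitted
    # If the only other removals are validations, the anomaly should be >= 0;
    # a negative anomaly means patients have appeared 'out of thin air'.
    return estimate - feb_reported
```

If that anomaly comes out negative, something other than treatment or validation has been going on.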
Sounds great. But if you actually do the sums you come across some oddities. Several, in fact, as you can see from the supposedly-impossible negative values in the chart below.
1) Missing very-short waiters
The first oddity is for the very shortest waiters. If you take the number of patients across England who have waited 1-2 weeks at the end of January, and knock off February’s admitted and non-admitted patients, then the expected number of 5-6 week waiters at the end of February should be no more than about 177,720. But in fact some 179,087 were reported in the end-of-February waiting list data: more than a thousand too many. That’s the small negative anomaly at 5-6 weeks in the chart above. A thousand-odd patients have appeared in the February figures out of thin air. Where did they come from?
They weren’t new referrals being treated immediately (those could only affect February’s 4-5 week cohort, which should really be counted as part of this oddity as well). So these patients must have appeared on the waiting list only a week or so after referral. That, as far as I am aware, is quite common, because paper referrals are often graded for urgency by the consultant before being recorded on PAS, and the grading can take as long as a week or two. If late recording is the explanation, it accounts neatly for the first oddity.
2) Missing 9-week waiters
The second oddity crops up at 8-10 weeks, and this is larger and more mysterious. At the end of January there were 233,003 patients on the waiting list who had waited 4-6 weeks since referral. After deducting the relevant admitted and non-admitted patients, you are left with an upper limit for 8-10 week waiters at the end of February of about 129,045. But in fact the reported figures show there were 144,434: some 15,389 too many, and causing the large negative anomaly in the chart. That’s a lot of patients suddenly appearing in the February figures. Where did they come from?
I don’t know the answer to this one, which is why I’m asking. But my guess is that this has something to do with cancer pathways. Could it be that some cancer patients are not being reported in the incomplete pathways statistics, but are being reported in the admitted and non-admitted figures? The NHS Standard Contract specifies that cancer patients should be treated within 62 days of referral, which is 9 weeks and coincides nearly enough with this anomaly. If large numbers of cancer patients are not being recorded in hospitals’ mainstream computer systems, which this explanation implies, then that in itself could be worrying because parallel and duplicate administrative systems can lead to patients getting lost.
3) Missing 17-week waiters
The third oddity is around 18 week waits. It isn’t large enough to appear as a negative anomaly in the national statistics charted above (though it does show as a step-change), but if you drill down to Trust level it does produce a negative anomaly for some individual Trusts. Because the cohort-tracking sums are inexact, and because quite a few Trusts crop up in this analysis, I am not going to name Trusts individually but instead will look at the overall pattern.
At some Trusts, the reported number of patients waiting 17-18 weeks at the end of February is higher than you would expect (a negative anomaly at Trust level), with no matching negative anomaly for 18-19 week waiters. In most cases the anomaly is small, whether in absolute numbers or as a percentage. But in a handful of Trusts it does look significant: in other words, significantly more patients are being reported just within the 18-week target than you would expect.
Again I don’t know what the explanation is, but my guess is that some Trusts (or some parts of some Trusts) might be applying clock pauses to their waiting list figures. That is strictly forbidden; the guidance says (emphasis in original):
“Clock pauses may be applied to incomplete/open pathways locally – to aid good waiting list management and to ensure patients are treated in order of clinical priority – however, adjustments must not be applied to either non-admitted or incomplete pathways RTT data reported in monthly RTT returns to the Department of Health.”
4) Disappearing 18-week breaches
The final oddity is just above the 18-week mark, and this anomaly goes in the opposite direction. From 18-22 weeks, the end-of-February waiting list is around half the expected size, so the anomaly is much more positive than expected.
My guess is that this is the result of waiting list validation being targeted at over-18-week waiters so that they don’t score against the admitted and non-admitted standards. This is a largely redundant tactic now that the main focus of the penalties, from April, is on incomplete pathways; Trusts today would be better advised to focus their validation efforts on patients approaching 18 weeks, rather than those who have already breached.
So there are four oddities in the data. If you can help explain any of them, or at least explain what is happening where you work, then do leave a comment below this post on the HSJ website (either anonymously or otherwise), or contact me in confidence by email or publicly on Twitter.
If you want to dive into the figures, you can download a spreadsheet that contains all the detailed calculations here.
A few more suggestions that have been put to me since I posted this:
Some missing waiters around the nine-week mark could be Choose & Book patients, who were told by C&B that no appointments were available and therefore raised an ASI (Appointment Slot Issue). Those patients might then be managed on paper by the hospital until their slot is arranged, which might take several weeks, during which they might not be reported as incomplete pathways. (Incidentally, this is a wasteful and risky administrative process, and the patient usually ends up in a similarly-dated slot to the one they would have had if C&B polling ranges had simply been extended.)
Some missing patients close to the 18-week mark at Trust level (though not at national level) are tertiary referrals. These arrive at the tertiary centre with time already on the clock (although there is now the option for the referring provider to take the ‘hit’ on any breaches caused by delays at their end: http://transparency.dh.gov.uk/files/2012/06/RTT-Reporting-patients-who-transfer-between-NHS-Trusts.pdf).
Here is a comment left at the HSJ website:
Anonymous | 2-May-2013 11:13 am
A few points come to mind in response to this article:
- As a general comment, early this (calendar) year, the impending financial penalties for >52 week waiters resulted in a flurry of activity to clear up waiting lists and address data quality issues. This almost certainly has created lots of apparent anomalies that are in fact data quality corrections.
- The >52 week penalties are contained in the standard NHS contract template – you will find that some CCGs have chosen not to include them in the final versions used for their providers. I think this may happen in situations where the provider is on a block contract. This is probably not a major factor though.
- My experience suggests that providers will not stop validating 18 week breaches against the clock stop targets – I am not sure any board or exec would simply not be worried about breaches that aren’t really breaches, financial penalty or not. It is still a core operational standard (as defined by the NTDA) so will still create a fuss if not achieved.
- as regards the missing very short waiters, grading for urgency by clinicians has definitely been known to take longer than 2 weeks. A less than one per cent discrepancy could easily be explained by late grading and, probably more commonly, hospitals without single points of referral receipt not getting things on the system in a timely fashion e.g. letters going directly to med secs who sit on them for too long. If you know the patient won’t be seen for >10 weeks, why bother getting them on the system – this is the attitude in some cases at least!
Official statistics aren’t perfect, and that goes for the waiting list too. Sometimes Trusts discover waiting lists that they should have been reporting, but weren’t. Sometimes they find problems with their data, take a ‘reporting break’ for a while, and then resume on a different basis. And data can also be discontinuous when Trusts are abolished and created, or when services shut down or move.
So stuff happens, and it all affects the reported number of patients on the waiting list. The question is: when you add up all these changes, could they explain the apparent growth in the English waiting list? Funnily enough it turns out that, yes, they could.
Here is the officially-reported number of patients on the English waiting list (count of incomplete pathways) since the 18-week target was achieved ‘properly’ in summer 2009. You may recognise this chart from my monthly reports on waiting times in England, and as you can see the red line is looking high for the time of year.
But if you trawl through all the detail at Trust-specialty level, and strip out any apparent step-changes in counting, the chart looks like this instead:
As if by magic, the increase has disappeared. It isn’t proof, but it’s enough to cast serious doubt on the apparent increase, and I think we can all be more relaxed about it. After adjustment, the size of the waiting list looks pretty stable year after year, and any increases and decreases are lost in the noise without any discernible trend.
You may be feeling sceptical at this point, which is perfectly reasonable. So now I’ll explain exactly how I adjusted the official figures to produce the second chart, and you can make your own mind up about the conclusions.
Fans of statistical process control may be thinking of 3-sigma variations or CUSUM charting at this point, but those methods all rely on deviations from an intended or mean central value, and the size of a waiting list has no such central value. So we need a different approach. Instead I applied two rules to detect steps that may be caused by counting changes; either:
1) the reported list size falls to zero, or rises from zero, which should detect new or closed services and ‘reporting holidays’; or
2) the average of the next 4 months differs from the average of the previous 4 months by more than 2 standard deviations (where standard deviation is measured month by month over the whole time series), which should detect ‘newly-discovered’ waiting lists and major validation exercises.
The two tests were applied month by month to list size data from August 2009 to January 2013, at Trust-specialty level, which is the most granular data publicly available and therefore gives the best chance of detecting service-level changes. Steps in the data were detected in 2.4 per cent of months, which is equivalent to a step-change every 3.5 years at Trust-specialty level.
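For readers who prefer code to prose, here is a rough sketch of those two rules in Python. It is only my reading of them, and the handling of the “month by month” standard deviation in particular is an assumption:

```python
import numpy as np

def detect_steps(series, window=4, threshold=2.0):
    """Flag months in one Trust-specialty series where a counting change may have occurred."""
    s = np.asarray(series, dtype=float)
    if len(s) < 2:
        return []
    sigma = np.std(np.diff(s))  # month-on-month variation over the whole series (my interpretation)
    steps = []
    for i in range(1, len(s)):
        # Rule 1: the list falls to zero, or rises from zero
        if (s[i] == 0) != (s[i - 1] == 0):
            steps.append(i)
            continue
        # Rule 2: the next 4-month average differs from the previous
        # 4-month average by more than 2 standard deviations
        prev = s[max(0, i - window):i]
        nxt = s[i:i + window]
        if len(prev) == window and len(nxt) == window:
            if abs(nxt.mean() - prev.mean()) > threshold * sigma:
                steps.append(i)
    return steps
```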
The data trawl was based on the current list of Trusts, so further adjustments were made for Trusts who existed in the March 2012 data but not the following month (principally pre-merger Barts). No Trusts disappeared from the data series in the month following March 2011 or March 2010.
If you have ever tried to detect anomalous deviations in time series data, you will know how frustrating it is. Sometimes your eye tells you there is a screaming change in the data, but your formula doesn’t pick it up. Other times your formula picks up a deviation that your eye tells you is just noise. The eye is very good at pattern-recognition, but it is also subjective, easily-led, and gets tired. So with 2,622 Trust-specialties to trawl, it’s better to let the computer do the work and hope the errors come out in the wash.
Let’s take a look at some examples of steps detected by the two rules. In each chart, the blue line is the list size (count of incomplete pathways) for one specialty in one Trust, and the yellow column indicates where a step up or down has been detected by the rules.
Here is a new Trust coming into existence:
Here the size of waiting list steps up, perhaps after the Trust discovered an unrecorded waiting list:
In this one, a Trust discovered a problem with its waiting list data, took a ‘reporting holiday’, and resumed reporting with corrected data:
I mentioned that sometimes the eyeball and the computer disagree with each other, and here are a couple of examples. Firstly, here is an example where the computer detected a step but the eyeball says it’s just noise:
And here is some data where the eyeball says this is a service that is being progressively shut down. The algorithm, however, doesn’t detect the early stages of the closure because the standard deviation is so high that the steps don’t exceed the two-sigma threshold, and only the final closure down to zero is detected.
To end the examples on a positive, here is some noisy data where no steps are detected by either the computer or the eyeball.
Whenever a step is detected, the later data is assumed to be correct, and all months prior to the step are adjusted by the size of the step. For instance, if the waiting list steps up by 1,000 patients in June 2011, then 1,000 patients are added to every month before June 2011.
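A sketch of that back-adjustment, on the assumption that the step size is simply the jump between the two adjacent months (the real calculation may measure it differently):

```python
def adjust_for_steps(series, step_indices):
    """Shift all months before each detected step so that the later data is taken as correct."""
    adjusted = list(series)
    for i in step_indices:
        delta = series[i] - series[i - 1]  # size of the step (assumed: adjacent-month jump)
        for j in range(i):                 # adjust every month before the step
            adjusted[j] += delta
    return adjusted
```

So a step up of 1,000 patients in June 2011 adds 1,000 to every month before June 2011, exactly as described above.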
The total size of the adjustments across all Trusts and specialties is:
The adjustments made are shown by the green line and, as we saw, they are enough to put the waiting list on the same path as in previous years. Given that the total list size is a decent leading indicator of long-wait pressures feeding through, that would indicate that (at least so far) pressure is not building on the waiting list itself.
The constant caveat, of course, is that the list size does not tell the whole story because referral restrictions may be holding up patients before they get that far.
UPDATE: This methodology is now incorporated into my regular monthly analysis of the English waiting list, with a couple of differences. Firstly, independent sector providers will be included. Secondly, hospitals admitting fewer than 50 patients in the most recent month will be excluded. The overall conclusions remain the same despite the changes.
Why is forward planning such a slog in the NHS? Fundamentally, all we are doing is this:
- Take what happened last year
- Add a bit
- Adjust for any specific pathway and demand management changes
- Apply some agreed performance assumptions using well-known equations
- Output the results as activity, capacity and money.
- Profile it all into a monthly plan.
The first thing that makes it difficult is the sheer volume of numbers involved. Your plans need to break everything down at least by specialty (treatment function code), or by HRG chapter, or even by HRG. Then you need to separate out emergency spells, elective spells, A&E, first outpatients, etc. And you need to split it by commissioner or provider, and possibly by provider site as well. All in all, you are looking at dozens of service lines at least, and quite possibly hundreds.
The second problem is that different kinds of data come from different places in different formats (including notes of meetings and scraps of paper). Some of the performance assumptions are broad-brush, some are detailed, and some are exceptions to a general rule. They somehow need knitting together into a single planning model. And they keep changing: time goes by and more recent activity data becomes available; performance assumptions and pathways are negotiated and amended; new guidance comes down from the Department of Health (and in future the Commissioning Board).
The third problem is that some of the historical data is prone to errors: activity is not completely or correctly coded, there are delays in recording events on the system, there are duplicates and omissions, and changing customs and practices cause coding drift and other systematic error. To some extent, these errors can be detected and corrected automatically; in many cases they can’t.
The fourth problem is that well-known equations do not exist for some of the workings. Waiting time standards of the form “90% of patients must be treated within 8 weeks” have historically been a high-profile example; the standard is easy to state, but to model it properly you need to take in the effect of clinical urgency, cancellations, whether you are running a fully-booked or partially-booked system, and other factors. If you try to simplify the problem by assuming that current practice reflects how things ought to be, then you are ignoring (often substantial) opportunities to improve.
There are similar problems with monthly profiling: you can profile non-elective work based on historical patterns; but what agreed methods are there for profiling inpatient elective work around peaks in non-elective demand, when the 18-week waiting time limit means that you can’t slow down surgery very much over the winter?
The fifth problem is that you probably have the wrong tools for the job. The suggested tool for presenting your plans is usually a spreadsheet, and (despite the well-known problems with spreadsheet errors, and their limitations when it comes to iterative calculations) they are the cultural default.
How much does this matter? Aren’t these plans just shelfware? Feeding the beast, and all that?
Actually, no. Although your painstakingly-crafted plans may end up on the shelf afterwards, there are two good reasons why the effort is important:
- The planning process causes lots of conversations to happen that do change the way healthcare is delivered, and the numbers make sure those conversations are tough enough.
- The financial squeeze is now adding urgency: PCTs will not be allowed to create a legacy of debt for future GP consortia, continually-rising demand is no longer affordable, and hospitals have a capacity overhang from the boom years… and so back to point 1 above.
It is natural when planning to focus on the correctness of the calculations. The complexity of the process can make this all-consuming.
But it is equally important to make sure that everyone else involved can keep track of the performance and pathway assumptions being used. Why? Because when clinicians and managers make changes to healthcare in real life, they are implementing changes to these assumptions.
Of course the calculations must be right, and the “bottom line” results are crucial in showing how much further negotiation will be needed. But it is also worth paying attention to the presentation of those key assumptions. If other people can easily see what they are, what they mean, and how they change during negotiations, then better decisions will be made about them, and the planning process will be a more powerful force for improvement in the real world.
First law of forecasting
Forecasts are always wrong.
Second law of forecasting
Detailed forecasts are worse than aggregate forecasts.
Third law of forecasting
The further into the future, the less reliable the forecast will be.
Factory Physics, p.441
So if all forecasts are wrong, why bother? Well, the “first law” is a bit mischievous; instead of “wrong” perhaps “inaccurate” would be closer to the truth. As a Professor of Statistics once said:
All models are wrong but some are useful.
We cannot avoid forecasting. Even if we refuse to make explicit forecasts, and just carry on as usual, then we are effectively forecasting that the future will be like the past. So we make forecasts because we expect the future to be different in some way, or because we expect the analysis to tell us something useful that we don’t already know… or perhaps because someone told us to.
All forecasting starts by estimating future demand, and in healthcare there are two main ways of doing this. We could look at population, morbidity, medical advance, and anything else we can think of, and try to work out from first principles how much demand there should be for healthcare. Try it if you like, but you’ll be massively and embarrassingly wrong. The better alternative is to start by looking at actual demand in the recent past, and estimating how it might be affected by future trends.
And how do we measure demand? In theory we want to get as close to the source of demand as we can: which from a GP Commissioner’s point of view means evaluating all contacts between primary care practitioners and patients; and from an acute hospital’s point of view means evaluating GP and consultant referrals and A&E arrivals. Which is all very well, but in practice does not give us a complete enough picture; we don’t know what is wrong with patients when they first arrive, and so we don’t know what activity will be needed to care for them. So in healthcare, we end up using activity as a proxy for demand.
Starting with observed activity as our baseline, we then apply some kind of trend growth rate. This trend might indeed be based on demographics and medical advance (but these usually underestimate growth by a large margin), or worked backwards from financial affordability (which at best shows the scale of the challenge facing us, or at worst is merely wishful thinking), or simply estimated by looking at what happened in recent years (which is pragmatic and usually best).
Whichever method we pick, it is still going to be either inaccurate or a fluke. No trend continues forever, and these errors in future demand trends are a big source of error in any healthcare forecasting model. The more detailed we make our plan (HRGs, monthly profiles…), the more volatile the numbers; the further into the future we go (25-year PFI capacity plans…) the worse our trend assumptions. The second and third laws of forecasting are right about all that.
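If you do want to write the trend step down, it is nothing more sophisticated than compound growth. A minimal sketch, with illustrative parameter names:

```python
def project_demand(baseline, annual_growth, years):
    """Project baseline activity forward at a constant trend rate.

    e.g. project_demand(10000, 0.03, 5) grows 10,000 spells at 3% a year for 5 years.
    As the second and third laws warn, the answer gets less reliable the further out you go.
    """
    return baseline * (1 + annual_growth) ** years
```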
Given the inaccuracies around demand, there is little point in being over-sophisticated about the rest of the forecast. But there are a few other things that make a big enough difference to matter:
- If we’re using part-year historical data in a highly-seasonal area such as medicine or trauma, then we need to smooth it for the seasonal effects to make the baseline representative. (Though it’s usually easier just to use a full year’s data.)
- If we’ve been doing a lot of non-recurring activity (or failing to keep up with demand) in the past, then we need to adjust our baseline demand accordingly.
- What if there are specific things we know we are going to change, such as diverting COPD patients to a primary care led service, ceasing a low-effectiveness treatment, or stopping activity that does not address demand? The best way to handle these is to change the baseline activity as if the change were already in place.
Other than correcting for those kinds of things, the emphasis of our forecasting should not be on trying to improve accuracy any further: we have done enough.
Instead we should focus on making our forecasting useful. What capacity will providers need? What will waiting times be? How much will it cost? Where can we disinvest? How should we present the results so that we can understand them and take the right action?
There is another benefit to keeping forecasting simple and pragmatic: it makes it easier to relate our high-level, longer-term forecasts to our more detailed, shorter-term operating plans. If both use common assumptions, then when reality turns out differently from the forecast and the operating plans have to adapt, we can still relate our local knowledge to the big picture.
Garbage in, garbage out.
When you’re planning for next financial year, you don’t want your modelling to be shot through with data errors. But neither do you want to have to pick through your data, line by line, looking for errors and fixing them manually (and perhaps inconsistently).
So how can you detect and fix common data errors automatically? Naturally, it depends on what’s wrong, and the difficulties come in several flavours.
The data you get directly from your activity database is likely to be pretty complete (accuracy is a different matter), so that takes care of completeness for activity counts, actual lengths of stay, urgency rates (which are essential for calculating waiting times), and so on. You are more likely to find gaps in data that comes from elsewhere, such as:
- demand growth assumptions
- waiting time targets
- waiting list sizes
- removal rates
- bed occupancy and bed utilisation assumptions
- bed, theatre and clinic performance assumptions
The important thing is to be explicit about the assumptions you want to make when any of these data items are missing. In many cases zero will be an acceptable default when data is missing, but sometimes it won’t and it’s dangerous to assume.
For instance if you know the waiting list size at the start, but not the end, of your historical data period, then zero would not be a safe default because your model would then be based on a rapidly shrinking list size (and therefore a high level of historical non-recurring activity). So it would be more sensible to assume the waiting list had remained a constant size, and populate the missing end list by copying the start list. Or vice versa, if it’s the start list size that’s missing.
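Expressed as a hedged default rule (the function and argument names are mine):

```python
def fill_missing_list_size(start_list, end_list):
    """If only one waiting list snapshot is known, assume the list stayed a constant size.

    Defaulting the missing snapshot to zero would imply a rapidly shrinking
    (or growing) list and distort the non-recurring activity in the baseline.
    """
    if end_list is None:
        end_list = start_list
    elif start_list is None:
        start_list = end_list
    return start_list, end_list
```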
For demand growth (or waiting time targets) you may have a standard set of assumptions, such as 3% growth for non-electives (or 90% admitted within 8 weeks), in which case you just need to ensure that your standard assumption is used wherever it is needed (but without over-writing any exceptions).
For removals, zero is often a good-enough default because any systematic errors in the removal rate should be second-order in a well-constructed demand calculation.
Capacity performance assumptions scale directly into the capacity being calculated, so it is important to get these numbers right. However it is common for very broad assumptions to be made without proper consideration: for instance, 85 per cent bed occupancy is often assumed to be a suitable buffer against fluctuations, only for this figure to be arbitrarily raised to 95 per cent when the calculations show that a bigger hospital would be needed! This is a big subject in its own right, but the broad message is that Trusts would benefit from paying closer attention to their capacity assumptions.
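To see why the occupancy assumption matters so much, here is the rule-of-thumb relationship between workload and beds. It is a deliberate simplification that ignores variability, and the parameter names are illustrative:

```python
def beds_required(admissions_per_day, avg_los_days, occupancy=0.85):
    """Average occupied beds divided by planned occupancy.

    Raising the occupancy assumption from 0.85 to 0.95 makes the same workload
    look as if it needs about 10 per cent fewer beds - which is why quietly
    changing the assumption is so tempting, and so dangerous.
    """
    average_occupied = admissions_per_day * avg_los_days
    return average_occupied / occupancy
```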
Even if the raw data in a low-volume service is accurate, it can still be misleading for modelling purposes because the data is “noisy”. For instance, in a service that is provided only occasionally by one consultant, demand might fluctuate wildly:
- 10 in 2007-08
- 5 in 2008-09
- 18 in 2009-10
You can see the danger. If you used only the years from 2008-09 to 2009-10 when calculating the trend, you might conclude that demand was growing at 260% per year and so future demand would be:
- 65 in 2010-11
- 233 in 2011-12
- 840 in 2012-13
Which would be ridiculous, but not necessarily easy to spot if you’re crunching dozens (or even hundreds) of services automatically in a giant spreadsheet.
Instead, you need to cap demand growth within sensible limits. It is also sensible to avoid conducting detailed waiting time modelling on very small services (again because of noisy data leading you astray), and instead assume simply that demand must be met.
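One way to apply such a cap is to bound the year-on-year growth rate before projecting forward. The limits below are illustrative only:

```python
def capped_growth_rate(previous_year, latest_year, floor=-0.05, cap=0.10):
    """Year-on-year growth, capped within sensible limits.

    Protects the model from noisy low-volume services, such as the
    5 -> 18 example above, which would otherwise imply 260% annual growth.
    """
    if previous_year <= 0:
        return 0.0  # no meaningful baseline to grow from
    raw = (latest_year - previous_year) / previous_year
    return max(floor, min(cap, raw))
```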
Additions data is used in waiting time calculations, but this data source is notoriously unreliable. The standard check for data quality (in the absence of suspensions and deferrals) is the reconciliation formula:
start list + additions = end list + activity + removals
So in most cases you can cross-check additions data against the other data, and if it lies outside a defined tolerance then you can cap it. So far, so simple.
However there is a complication when it comes to admitted patients. Daycases who stay overnight automatically become inpatients. So when you use the formula, you might find that it is out of balance because some of the daycase additions ended up as inpatient activity.
The solution is to consider daycases and inpatients together when detecting errors in the additions figures for a given specialty. If the reconciliation works out well in total, then the separate additions figures do not need adjusting.
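Putting the reconciliation check into code might look something like this; the tolerance, and the decision to cap back to the implied value, are illustrative choices rather than a fixed recipe:

```python
def check_additions(start_list, additions, end_list, activity, removals, tolerance=0.05):
    """Cross-check reported additions against the reconciliation formula.

    start_list + additions should equal end_list + activity + removals
    (in the absence of suspensions and deferrals). For admitted patients,
    pass daycases and inpatients combined, so that daycases who stayed
    overnight and became inpatients do not unbalance the check.
    """
    implied = end_list + activity + removals - start_list
    if implied > 0 and abs(additions - implied) / implied > tolerance:
        return implied  # outside tolerance: cap additions to the implied value
    return additions
```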
Waiting list data
Sometimes there is a delay between receiving an elective referral (or making a decision to admit) and logging that patient onto the IT system as an addition to the outpatient (or inpatient/daycase) waiting list. So if you were to extract yesterday’s waiting list you would miss any patients that haven’t been keyed-in yet.
For planning purposes, this problem can usually be avoided just by using older data. When planning for next financial year, some months before it even starts, it makes little difference whether you rely on data up to the end of last month or the month before.
Gooroo Planner has comprehensive error detection and correction built-in, under the control of settings tables that are pre-set with defaults and editable by the user. This ensures that automated error handling is performed consistently and under user control.
Type “spreadsheet error” into Google and a litany of woes will unfurl. Spreadsheet error is serious stuff: so serious that it’s a research field in its own right, and even has an international organisation devoted to it. The leading researcher in the field, a fellow called Raymond Panko, concludes:
All in all, the research done to date in spreadsheet development presents a very disturbing picture. Every study that has attempted to measure errors, without exception, has found them at rates that would be unacceptable in any organization. … With such high cell error rates, most large spreadsheets will have multiple errors, and even relatively small “scratch pad” spreadsheets will have a significant probability of error.
Surely any serious errors would be picked up before they did lasting damage? That’s what everybody hopes, but unfortunately the answer is: sometimes. And sometimes not. At the height of the banking crisis, for instance, Barclays Capital accidentally bought 179 trading contracts they didn’t want from the collapsed Lehman Brothers, all because of a spreadsheet formatting error. As one study of spreadsheet errors put it:
We draw two fundamental conclusions… First, it is clear that spreadsheet errors sometimes lead to major losses and/or bad decisions in practice. Indeed, we heard about managers losing their jobs because of inadequate spreadsheet quality control. Second, many senior decision makers whose organizations produce erroneous spreadsheets do not report serious losses or bad decisions stemming from those flawed spreadsheets. Hence, it seems in no way inevitable that errors in spreadsheets that inform decisions automatically lead to bad decisions.
But then again, it is in no way inevitable that they won’t. (Unless spreadsheets totally lack influence in your organisation, in which case why create them?)
Of course spreadsheets have their place. You use them all the time, and so do I, because they’re easy, flexible, and great for doing things “on the fly”. But a useful spreadsheet also has a tendency to grow like Topsy, until you start to worry that you’re not entirely sure exactly how it works, keep finding mistakes in it, notice “odd” results that don’t reconcile, and find that you’re spending inordinate amounts of time on it with relatively little to show for your efforts.
Or, to get down to the nitty gritty, you might remember times when:
- a formula was “temporarily” changed to a number, but not changed back again;
- a formula was changed, but the change was not copied across to other cells;
- an error was made when typing in data from a different source;
- cells were linked to an external table, but some of the links were misaligned.
We’ve all been there.
Once a spreadsheet becomes complex, or needs to be flooded with data from a database, or needs to run very similar calculations over and over again with different numbers, then it probably shouldn’t be a spreadsheet any more. NHS planning spreadsheets will typically tick all those boxes.