Categories: Visualizations

Detecting fraud in utility billing


An energy utility approached us with an interesting problem:

“We know our meter readings are incorrect. This is for various reasons, but fraud is a key component. We don’t, however, have the concrete proof we need to act on this.”

Part of their problem was a lack of experience with the tools and analyses needed to identify such patterns. The other part was the volume of data: the meter readings for just one city came to 2 gigabytes.

We took the data in the raw database format, extracted it, and ran it through our toolset. The first step was to look at the frequency of subscribers at various meter readings.
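As a rough illustration of this first step, the frequency of subscribers at each reading can be tallied in a few lines. This is only a minimal sketch: the sample readings are made up, and the function name `reading_frequencies` is hypothetical rather than part of our toolset.

```python
from collections import Counter

def reading_frequencies(readings):
    """Count how many subscribers reported each meter reading (in units)."""
    return Counter(readings)

# Hypothetical sample: one monthly reading (in units) per subscriber.
sample_readings = [48, 49, 50, 50, 50, 50, 51, 99, 100, 100, 100, 101]

freq = reading_frequencies(sample_readings)
for reading in sorted(freq):
    print(reading, freq[reading])
```

Plotting these counts against the reading gives the kind of frequency chart discussed next.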

The resulting distribution looks mostly log-normal, except for large spikes at 50 units, 100 units and 200 units. Interestingly, those are exactly the slab boundaries. Subscribers who consume even one unit more than 50 would pay at a higher rate, and similarly for 100 and 200.

It is statistically impossible (p < 10⁻¹⁸) for this to be a normal event. This clearly shows fraud of some kind.
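The article does not say how this p-value was computed. One possible way to put a number on such a spike, shown here purely as an assumption, is to treat the count at a reading as roughly Poisson around the level suggested by its neighbours and use a normal approximation for the upper tail. The counts below are invented for illustration.

```python
import math

def spike_z_score(observed, expected):
    """Standard deviations by which `observed` exceeds `expected`,
    treating the count as Poisson (variance equal to the mean)."""
    return (observed - expected) / math.sqrt(expected)

def upper_tail_p(z):
    """One-sided p-value for a z-score under the normal approximation."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical counts: neighbours suggest 10,000 subscribers, 11,000 observed.
z = spike_z_score(11_000, 10_000)
p = upper_tail_p(z)
print(z, p)
```

With counts in the millions, as in the worked example later in this post, the z-score becomes so large that the p-value underflows to zero in floating point.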

This is not “random fraud” either – it’s not a random set of people who are benefitting from this just-at-the-boundary slab reading. There is a relatively small group of people who consistently have the same set of readings. Here are the monthly meter readings of 10 subscribers:

Notice the pattern on the first row: 200, 200, 200, 200, 200… Such precision in usage would be admirable if it were believable.
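Subscribers like the one in the first row can be surfaced by checking how often each subscriber's monthly reading lands exactly on a slab boundary. The sketch below assumes a simple mapping from subscriber ID to monthly readings; the IDs, the readings and the 80% threshold are all hypothetical choices, not values from the actual analysis.

```python
SLAB_BOUNDARIES = {50, 100, 200}  # slab limits named in the article

def boundary_share(monthly_readings):
    """Fraction of a subscriber's readings that sit exactly on a slab boundary."""
    hits = sum(1 for r in monthly_readings if r in SLAB_BOUNDARIES)
    return hits / len(monthly_readings)

def flag_subscribers(history, min_share=0.8):
    """Return subscriber IDs whose boundary share is at least `min_share`."""
    return sorted(
        sub for sub, readings in history.items()
        if readings and boundary_share(readings) >= min_share
    )

# Hypothetical six-month histories for three subscribers.
history = {
    "S001": [200, 200, 200, 200, 200, 200],
    "S002": [187, 203, 176, 210, 195, 201],
    "S003": [100, 100, 100, 98, 100, 100],
}

flagged = flag_subscribers(history)
print(flagged)
```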

What’s also interesting are the smaller spikes at 10, 20, 30, … 90. For the spikes at 50 and 100, there’s an economic reason. For these smaller spikes, there appears to be no economic reason. However, in this case, a different vice was suggested: laziness. These would represent meter readings that were never taken in the first place, and were just entered as round numbers.

So we have a mechanism to detect not just fraud, but laziness too!
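A quick way to look for this kind of rounding, offered as a sketch rather than the method we used, is to check the last digit of each reading. If readings were taken honestly, one would expect, as a rough assumption, each final digit to appear about 10% of the time, so a much larger share of readings ending in 0 hints at values entered as round numbers. The sample data is invented.

```python
from collections import Counter

def last_digit_shares(readings):
    """Share of readings ending in each digit 0-9."""
    counts = Counter(r % 10 for r in readings)
    total = len(readings)
    return {digit: counts.get(digit, 0) / total for digit in range(10)}

# Hypothetical readings with a suspicious number of round values.
sample = [10, 20, 30, 40, 23, 57, 61, 90, 70, 18]

shares = last_digit_shares(sample)
print(shares[0])
```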

To measure the fraud and narrow it down by region, we took the height of the spike as a proxy for the extent of fraud. So if the average number of subscribers with a reading of 99 or 101 is 1 million, but 1.5 million subscribers had a reading of 100, the extent of fraud is:

Fraud = (1.5 million – 1 million) / 1 million = 50%.
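The same calculation can be written as a small function. This is a minimal sketch of the formula above, with the neighbouring readings (one unit below and one unit above the boundary) used as the baseline; the name `fraud_extent` is hypothetical.

```python
def fraud_extent(freq, boundary):
    """Excess of the count at `boundary` over the average count of its
    two neighbours, expressed as a fraction of that average."""
    baseline = (freq[boundary - 1] + freq[boundary + 1]) / 2
    if baseline == 0:
        raise ValueError("no subscribers at the neighbouring readings")
    return (freq[boundary] - baseline) / baseline

# The figures from the worked example: 1 million at 99 and 101, 1.5 million at 100.
example_freq = {99: 1_000_000, 100: 1_500_000, 101: 1_000_000}

extent = fraud_extent(example_freq, 100)
print(f"{extent:.0%}")  # prints "50%"
```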

We then plotted the extent of fraud by different sections in a city.

This is sorted in descending order of fraud. Section 1 has fraud of around 100% – which means there are nearly twice as many subscribers with a reading of 100 as compared to 99 or 101. While this number has fluctuated a bit, it’s remained quite high right through.
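To rank sections this way, the readings are grouped by section before the spike measure is applied. The sketch below assumes records arrive as (section, reading) pairs and uses invented counts; extending the grouping key to (section, month) would give the month-by-month view. Function names are hypothetical.

```python
from collections import Counter, defaultdict

def spike_excess(freq, boundary=100):
    """Excess at `boundary` relative to the average of its two neighbours,
    or None when there is no neighbouring data to compare against."""
    baseline = (freq[boundary - 1] + freq[boundary + 1]) / 2
    if baseline == 0:
        return None
    return (freq[boundary] - baseline) / baseline

def excess_by_section(records, boundary=100):
    """Group (section, reading) pairs by section and rank sections
    by spike excess, highest first."""
    per_section = defaultdict(Counter)
    for section, reading in records:
        per_section[section][reading] += 1
    scores = {s: spike_excess(f, boundary) for s, f in per_section.items()}
    scored = {s: v for s, v in scores.items() if v is not None}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical records for two sections.
records = (
    [("Section 1", 99)] * 2 + [("Section 1", 100)] * 4 + [("Section 1", 101)] * 2
    + [("Section 9", 99)] * 4 + [("Section 9", 100)] * 5 + [("Section 9", 101)] * 4
)

ranking = excess_by_section(records)
print(ranking)
```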

In contrast, Section 9 has relatively little fraud, ranging only up to 37%. (That’s still a huge number, of course.)

Section 5 shows a strange pattern. In June 2010, fraud dipped dramatically. Then, almost as if to make up, it shot back up in September 2010. In our discussions, we identified the cause behind this, but we’ll leave this for you to work out as a lateral thinking puzzle: what do you think caused the anomalous pattern in Section 5?

Gramener - A Straive Company

Gramener – A Straive Company is a design-led data science firm. We build custom Data & AI solutions that help solve complex business problems with actionable insights and compelling data stories.

Published by
Gramener - A Straive Company
Tags: Utilities
